2022-05-09 04:08:28

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: [PATCH V2 0/7] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer

From: Oleksandr Tyshchenko <[email protected]>

Hello all.

The purpose of this patch series is to add support for restricting memory access under Xen using specific
grant table [1] based DMA-mapping layer. Patch series is based on Juergen Gross’ initial work [2] which implies
using grant references instead of raw guest physical addresses (GPA) for the virtio communications (some
kind of the software IOMMU).

You can find RFC-V1 patch series (and previous discussions) at [3].

The high level idea is to create new Xen’s grant table based DMA-mapping layer for the guest Linux whose main
purpose is to provide a special 64-bit DMA address which is formed by using the grant reference (for a page
to be shared with the backend) with offset and setting the highest address bit (this is for the backend to
be able to distinguish grant ref based DMA address from normal GPA). For this to work we need the ability
to allocate contiguous (consecutive) grant references for multi-page allocations. And the backend then needs
to offer VIRTIO_F_ACCESS_PLATFORM and VIRTIO_F_VERSION_1 feature bits (it must support virtio-mmio modern
transport for 64-bit addresses in the virtqueue).

Xen's grant mapping mechanism is the secure and safe solution to share pages between domains which proven
to work and works for years (in the context of traditional Xen PV drivers for example). So far, the foreign
mapping is used for the virtio backend to map and access guest memory. With the foreign mapping, the backend
is able to map arbitrary pages from the guest memory (or even from Dom0 memory). And as the result, the malicious
backend which runs in a non-trusted domain can take advantage of this. Instead, with the grant mapping
the backend is only allowed to map pages which were explicitly granted by the guest before and nothing else.
According to the discussions in various mainline threads this solution would likely be welcome because it
perfectly fits in the security model Xen provides.

What is more, the grant table based solution requires zero changes to the Xen hypervisor itself at least
with virtio-mmio and DT (in comparison, for example, with "foreign mapping + virtio-iommu" solution which would
require the whole new complex emulator in hypervisor in addition to new functionality/hypercall to pass IOVA
from the virtio backend running elsewhere to the hypervisor and translate it to the GPA before mapping into
P2M or denying the foreign mapping request if no corresponding IOVA-GPA mapping present in the IOMMU page table
for that particular device). We only need to update toolstack to insert a new "xen,dev-domid" property to
the virtio-mmio device node when creating a guest device-tree (this is an indicator for the guest to use grants
and the ID of Xen domain where the corresponding backend resides, it is used as an argument to the grant mapping
APIs). It worth mentioning that toolstack patch is based on non upstreamed yet “Virtio support for toolstack
on Arm” series which is on review now [4].

Please note the following:
- Patch series only covers Arm and virtio-mmio (device-tree) for now. To enable the restricted memory access
feature on Arm the following option should be set:
CONFIG_XEN_VIRTIO = y
- Patch series is based on "platform_has()" patch series which is on review now [5]
- Xen should be built with the following options:
CONFIG_IOREQ_SERVER=y
CONFIG_EXPERT=y

Patch series is rebased on Linux 5.18-rc4 tag with "platform_has()" series applied and tested on Renesas
Salvator-X board + H3 ES3.0 SoC (Arm64) with standalone userspace (non-Qemu) virtio-mmio based virtio-disk
backend running in Driver domain and Linux guest running on existing virtio-blk driver (frontend).
No issues were observed. Guest domain 'reboot/destroy' use-cases work properly. I have also tested other
use-cases such as assigning several virtio block devices or a mix of virtio and Xen PV block devices
to the guest. Patch series was build-tested on Arm32 and x86.

1. Xen changes located at (last patch):
https://github.com/otyshchenko1/xen/commits/libxl_virtio_next1
2. Linux changes located at (last 7 patches):
https://github.com/otyshchenko1/linux/commits/virtio_grant7
3. virtio-disk changes located at:
https://github.com/otyshchenko1/virtio-disk/commits/virtio_grant

Any feedback/help would be highly appreciated.

[1] https://xenbits.xenproject.org/docs/4.16-testing/misc/grant-tables.txt
[2] https://www.youtube.com/watch?v=IrlEdaIUDPk
[3] https://lore.kernel.org/xen-devel/[email protected]/
https://lore.kernel.org/xen-devel/[email protected]/
[4] https://lore.kernel.org/xen-devel/[email protected]/
[5] https://lore.kernel.org/xen-devel/[email protected]/

Juergen Gross (3):
xen/grants: support allocating consecutive grants
xen/grant-dma-ops: Add option to restrict memory access under Xen
xen/virtio: Enable restricted memory access using Xen grant mappings

Oleksandr Tyshchenko (4):
arm/xen: Introduce xen_setup_dma_ops()
dt-bindings: Add xen,dev-domid property description for xen-grant DMA
ops
xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices
arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices

.../devicetree/bindings/arm/xen,dev-domid.yaml | 37 +++
Documentation/devicetree/bindings/virtio/mmio.yaml | 7 +
arch/arm/include/asm/xen/xen-ops.h | 2 +
arch/arm/mm/dma-mapping.c | 7 +-
arch/arm/xen/enlighten.c | 2 +
arch/arm64/include/asm/xen/xen-ops.h | 2 +
arch/arm64/mm/dma-mapping.c | 7 +-
arch/x86/xen/enlighten_hvm.c | 2 +
arch/x86/xen/enlighten_pv.c | 2 +
drivers/xen/Kconfig | 15 +
drivers/xen/Makefile | 1 +
drivers/xen/grant-dma-ops.c | 324 +++++++++++++++++++++
drivers/xen/grant-table.c | 238 +++++++++++++--
include/xen/arm/xen-ops.h | 18 ++
include/xen/grant_table.h | 4 +
include/xen/xen-ops.h | 13 +
include/xen/xen.h | 8 +
17 files changed, 647 insertions(+), 42 deletions(-)
create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
create mode 100644 arch/arm/include/asm/xen/xen-ops.h
create mode 100644 arch/arm64/include/asm/xen/xen-ops.h
create mode 100644 drivers/xen/grant-dma-ops.c
create mode 100644 include/xen/arm/xen-ops.h

--
2.7.4



2022-05-09 05:03:24

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: [PATCH V2 2/7] xen/grants: support allocating consecutive grants

From: Juergen Gross <[email protected]>

For support of virtio via grant mappings in rare cases larger mappings
using consecutive grants are needed. Support those by adding a bitmap
of free grants.

As consecutive grants will be needed only in very rare cases (e.g. when
configuring a virtio device with a multi-page ring), optimize for the
normal case of non-consecutive allocations.

Signed-off-by: Juergen Gross <[email protected]>
---
Changes RFC -> V1:
- no changes

Changes V1 -> V2:
- no changes
---
drivers/xen/grant-table.c | 238 +++++++++++++++++++++++++++++++++++++++-------
include/xen/grant_table.h | 4 +
2 files changed, 210 insertions(+), 32 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 8ccccac..1b458c0 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -33,6 +33,7 @@

#define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt

+#include <linux/bitmap.h>
#include <linux/memblock.h>
#include <linux/sched.h>
#include <linux/mm.h>
@@ -72,9 +73,32 @@

static grant_ref_t **gnttab_list;
static unsigned int nr_grant_frames;
+
+/*
+ * Handling of free grants:
+ *
+ * Free grants are in a simple list anchored in gnttab_free_head. They are
+ * linked by grant ref, the last element contains GNTTAB_LIST_END. The number
+ * of free entries is stored in gnttab_free_count.
+ * Additionally there is a bitmap of free entries anchored in
+ * gnttab_free_bitmap. This is being used for simplifying allocation of
+ * multiple consecutive grants, which is needed e.g. for support of virtio.
+ * gnttab_last_free is used to add free entries of new frames at the end of
+ * the free list.
+ * gnttab_free_tail_ptr specifies the variable which references the start
+ * of consecutive free grants ending with gnttab_last_free. This pointer is
+ * updated in a rather defensive way, in order to avoid performance hits in
+ * hot paths.
+ * All those variables are protected by gnttab_list_lock.
+ */
static int gnttab_free_count;
-static grant_ref_t gnttab_free_head;
+static unsigned int gnttab_size;
+static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
+static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
+static grant_ref_t *gnttab_free_tail_ptr;
+static unsigned long *gnttab_free_bitmap;
static DEFINE_SPINLOCK(gnttab_list_lock);
+
struct grant_frames xen_auto_xlat_grant_frames;
static unsigned int xen_gnttab_version;
module_param_named(version, xen_gnttab_version, uint, 0);
@@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)

ref = head = gnttab_free_head;
gnttab_free_count -= count;
- while (count-- > 1)
- head = gnttab_entry(head);
+ while (count--) {
+ bitmap_clear(gnttab_free_bitmap, head, 1);
+ if (gnttab_free_tail_ptr == __gnttab_entry(head))
+ gnttab_free_tail_ptr = &gnttab_free_head;
+ if (count)
+ head = gnttab_entry(head);
+ }
gnttab_free_head = gnttab_entry(head);
gnttab_entry(head) = GNTTAB_LIST_END;

+ if (!gnttab_free_count) {
+ gnttab_last_free = GNTTAB_LIST_END;
+ gnttab_free_tail_ptr = NULL;
+ }
+
spin_unlock_irqrestore(&gnttab_list_lock, flags);

return ref;
}

+static int get_seq_entry_count(void)
+{
+ if (gnttab_last_free == GNTTAB_LIST_END || !gnttab_free_tail_ptr ||
+ *gnttab_free_tail_ptr == GNTTAB_LIST_END)
+ return 0;
+
+ return gnttab_last_free - *gnttab_free_tail_ptr + 1;
+}
+
+/* Rebuilds the free grant list and tries to find count consecutive entries. */
+static int get_free_seq(unsigned int count)
+{
+ int ret = -ENOSPC;
+ unsigned int from, to;
+ grant_ref_t *last;
+
+ gnttab_free_tail_ptr = &gnttab_free_head;
+ last = &gnttab_free_head;
+
+ for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
+ from < gnttab_size;
+ from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
+ to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
+ from + 1);
+ if (ret < 0 && to - from >= count) {
+ ret = from;
+ bitmap_clear(gnttab_free_bitmap, ret, count);
+ from += count;
+ gnttab_free_count -= count;
+ if (from == to)
+ continue;
+ }
+
+ while (from < to) {
+ *last = from;
+ last = __gnttab_entry(from);
+ gnttab_last_free = from;
+ from++;
+ }
+ if (to < gnttab_size)
+ gnttab_free_tail_ptr = __gnttab_entry(to - 1);
+ }
+
+ *last = GNTTAB_LIST_END;
+ if (gnttab_last_free != gnttab_size - 1)
+ gnttab_free_tail_ptr = NULL;
+
+ return ret;
+}
+
+static int get_free_entries_seq(unsigned int count)
+{
+ unsigned long flags;
+ int ret = 0;
+
+ spin_lock_irqsave(&gnttab_list_lock, flags);
+
+ if (gnttab_free_count < count) {
+ ret = gnttab_expand(count - gnttab_free_count);
+ if (ret < 0)
+ goto out;
+ }
+
+ if (get_seq_entry_count() < count) {
+ ret = get_free_seq(count);
+ if (ret >= 0)
+ goto out;
+ ret = gnttab_expand(count - get_seq_entry_count());
+ if (ret < 0)
+ goto out;
+ }
+
+ ret = *gnttab_free_tail_ptr;
+ *gnttab_free_tail_ptr = gnttab_entry(ret + count - 1);
+ gnttab_free_count -= count;
+ if (!gnttab_free_count)
+ gnttab_free_tail_ptr = NULL;
+ bitmap_clear(gnttab_free_bitmap, ret, count);
+
+ out:
+ spin_unlock_irqrestore(&gnttab_list_lock, flags);
+
+ return ret;
+}
+
static void do_free_callbacks(void)
{
struct gnttab_free_callback *callback, *next;
@@ -206,17 +325,48 @@ static inline void check_free_callbacks(void)
do_free_callbacks();
}

-static void put_free_entry(grant_ref_t ref)
+static void put_free_entry_locked(grant_ref_t ref)
{
- unsigned long flags;
- spin_lock_irqsave(&gnttab_list_lock, flags);
gnttab_entry(ref) = gnttab_free_head;
gnttab_free_head = ref;
+ if (!gnttab_free_count)
+ gnttab_last_free = ref;
+ if (gnttab_free_tail_ptr == &gnttab_free_head)
+ gnttab_free_tail_ptr = __gnttab_entry(ref);
gnttab_free_count++;
+ bitmap_set(gnttab_free_bitmap, ref, 1);
+}
+
+static void put_free_entry(grant_ref_t ref)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&gnttab_list_lock, flags);
+ put_free_entry_locked(ref);
check_free_callbacks();
spin_unlock_irqrestore(&gnttab_list_lock, flags);
}

+static void gnttab_set_free(unsigned int start, unsigned int n)
+{
+ unsigned int i;
+
+ for (i = start; i < start + n - 1; i++)
+ gnttab_entry(i) = i + 1;
+
+ gnttab_entry(i) = GNTTAB_LIST_END;
+ if (!gnttab_free_count) {
+ gnttab_free_head = start;
+ gnttab_free_tail_ptr = &gnttab_free_head;
+ } else {
+ gnttab_entry(gnttab_last_free) = start;
+ }
+ gnttab_free_count += n;
+ gnttab_last_free = i;
+
+ bitmap_set(gnttab_free_bitmap, start, n);
+}
+
/*
* Following applies to gnttab_update_entry_v1 and gnttab_update_entry_v2.
* Introducing a valid entry into the grant table:
@@ -448,23 +598,31 @@ void gnttab_free_grant_references(grant_ref_t head)
{
grant_ref_t ref;
unsigned long flags;
- int count = 1;
- if (head == GNTTAB_LIST_END)
- return;
+
spin_lock_irqsave(&gnttab_list_lock, flags);
- ref = head;
- while (gnttab_entry(ref) != GNTTAB_LIST_END) {
- ref = gnttab_entry(ref);
- count++;
+ while (head != GNTTAB_LIST_END) {
+ ref = gnttab_entry(head);
+ put_free_entry_locked(head);
+ head = ref;
}
- gnttab_entry(ref) = gnttab_free_head;
- gnttab_free_head = head;
- gnttab_free_count += count;
check_free_callbacks();
spin_unlock_irqrestore(&gnttab_list_lock, flags);
}
EXPORT_SYMBOL_GPL(gnttab_free_grant_references);

+void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count)
+{
+ unsigned long flags;
+ unsigned int i;
+
+ spin_lock_irqsave(&gnttab_list_lock, flags);
+ for (i = count; i > 0; i--)
+ put_free_entry_locked(head + i - 1);
+ check_free_callbacks();
+ spin_unlock_irqrestore(&gnttab_list_lock, flags);
+}
+EXPORT_SYMBOL_GPL(gnttab_free_grant_reference_seq);
+
int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
{
int h = get_free_entries(count);
@@ -478,6 +636,24 @@ int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
}
EXPORT_SYMBOL_GPL(gnttab_alloc_grant_references);

+int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first)
+{
+ int h;
+
+ if (count == 1)
+ h = get_free_entries(1);
+ else
+ h = get_free_entries_seq(count);
+
+ if (h < 0)
+ return -ENOSPC;
+
+ *first = h;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(gnttab_alloc_grant_reference_seq);
+
int gnttab_empty_grant_references(const grant_ref_t *private_head)
{
return (*private_head == GNTTAB_LIST_END);
@@ -570,16 +746,13 @@ static int grow_gnttab_list(unsigned int more_frames)
goto grow_nomem;
}

+ gnttab_set_free(gnttab_size, extra_entries);

- for (i = grefs_per_frame * nr_grant_frames;
- i < grefs_per_frame * new_nr_grant_frames - 1; i++)
- gnttab_entry(i) = i + 1;
-
- gnttab_entry(i) = gnttab_free_head;
- gnttab_free_head = grefs_per_frame * nr_grant_frames;
- gnttab_free_count += extra_entries;
+ if (!gnttab_free_tail_ptr)
+ gnttab_free_tail_ptr = __gnttab_entry(gnttab_size);

nr_grant_frames = new_nr_grant_frames;
+ gnttab_size += extra_entries;

check_free_callbacks();

@@ -1424,7 +1597,6 @@ int gnttab_init(void)
int i;
unsigned long max_nr_grant_frames;
unsigned int max_nr_glist_frames, nr_glist_frames;
- unsigned int nr_init_grefs;
int ret;

gnttab_request_version();
@@ -1452,6 +1624,13 @@ int gnttab_init(void)
}
}

+ i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
+ gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
+ if (!gnttab_free_bitmap) {
+ ret = -ENOMEM;
+ goto ini_nomem;
+ }
+
ret = arch_gnttab_init(max_nr_grant_frames,
nr_status_frames(max_nr_grant_frames));
if (ret < 0)
@@ -1462,15 +1641,9 @@ int gnttab_init(void)
goto ini_nomem;
}

- nr_init_grefs = nr_grant_frames *
- gnttab_interface->grefs_per_grant_frame;
-
- for (i = NR_RESERVED_ENTRIES; i < nr_init_grefs - 1; i++)
- gnttab_entry(i) = i + 1;
+ gnttab_size = nr_grant_frames * gnttab_interface->grefs_per_grant_frame;

- gnttab_entry(nr_init_grefs - 1) = GNTTAB_LIST_END;
- gnttab_free_count = nr_init_grefs - NR_RESERVED_ENTRIES;
- gnttab_free_head = NR_RESERVED_ENTRIES;
+ gnttab_set_free(NR_RESERVED_ENTRIES, gnttab_size - NR_RESERVED_ENTRIES);

printk("Grant table initialized\n");
return 0;
@@ -1479,6 +1652,7 @@ int gnttab_init(void)
for (i--; i >= 0; i--)
free_page((unsigned long)gnttab_list[i]);
kfree(gnttab_list);
+ bitmap_free(gnttab_free_bitmap);
return ret;
}
EXPORT_SYMBOL_GPL(gnttab_init);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index dfd5bf3..d815e1d 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -129,10 +129,14 @@ int gnttab_try_end_foreign_access(grant_ref_t ref);
*/
int gnttab_alloc_grant_references(u16 count, grant_ref_t *pprivate_head);

+int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first);
+
void gnttab_free_grant_reference(grant_ref_t ref);

void gnttab_free_grant_references(grant_ref_t head);

+void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count);
+
int gnttab_empty_grant_references(const grant_ref_t *pprivate_head);

int gnttab_claim_grant_reference(grant_ref_t *pprivate_head);
--
2.7.4


2022-05-09 05:56:14

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: [PATCH V2 6/7] xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices

From: Oleksandr Tyshchenko <[email protected]>

Use the presence of recently introduced "xen,dev-domid" property
in the device node as a clear indicator of enabling Xen grant
mappings scheme for that device and read the ID of Xen domain where
the corresponding backend resides. The ID (domid) is used as
an argument to the Xen grant mapping APIs.

Also introduce xen_is_grant_dma_device() to check whether xen-grant
DMA ops need to be set for a passed device.

Remove the hardcoded domid 0 in xen_grant_setup_dma_ops().

Signed-off-by: Oleksandr Tyshchenko <[email protected]>
---
Changes RFC -> V1:
- new patch, split required changes from commit:
"[PATCH 4/6] virtio: Various updates to xen-virtio DMA ops layer"
- update checks in xen_virtio_setup_dma_ops() to only support
DT devices for now
- remove the "virtio,mmio" check from xen_is_virtio_device()
- remane everything according to the new naming scheme:
s/virtio/grant_dma

Changes V1 -> V2:
- remove dev_is_pci() check in xen_grant_setup_dma_ops()
- remove EXPORT_SYMBOL_GPL(xen_is_grant_dma_device);
---
drivers/xen/grant-dma-ops.c | 24 +++++++++++++++++-------
include/xen/xen-ops.h | 5 +++++
2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index 29ad7bf..8924178 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -55,11 +55,6 @@ static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
* Such a DMA address is formed by using the grant reference as a frame
* number and setting the highest address bit (this bit is for the backend
* to be able to distinguish it from e.g. a mmio address).
- *
- * Note that for now we hard wire dom0 to be the backend domain. In order
- * to support any domain as backend we'd need to add a way to communicate
- * the domid of this backend, e.g. via Xenstore, via the PCI-device's
- * config space or DT/ACPI.
*/
static void *xen_grant_dma_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp,
@@ -275,6 +270,15 @@ static const struct dma_map_ops xen_grant_dma_ops = {
.dma_supported = xen_grant_dma_supported,
};

+bool xen_is_grant_dma_device(struct device *dev)
+{
+ /* XXX Handle only DT devices for now */
+ if (!dev->of_node)
+ return false;
+
+ return of_property_read_bool(dev->of_node, "xen,dev-domid");
+}
+
void xen_grant_setup_dma_ops(struct device *dev)
{
struct xen_grant_dma_data *data;
@@ -286,8 +290,14 @@ void xen_grant_setup_dma_ops(struct device *dev)
return;
}

- /* XXX The dom0 is hardcoded as the backend domain for now */
- dev_domid = 0;
+ /* XXX ACPI device unsupported for now */
+ if (!dev->of_node)
+ goto err;
+
+ if (of_property_read_u32(dev->of_node, "xen,dev-domid", &dev_domid)) {
+ dev_err(dev, "xen,dev-domid property is not present\n");
+ goto err;
+ }

data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
if (!data)
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 4f9fad5..62be9dc 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -223,10 +223,15 @@ static inline void xen_preemptible_hcall_end(void) { }

#ifdef CONFIG_XEN_GRANT_DMA_OPS
void xen_grant_setup_dma_ops(struct device *dev);
+bool xen_is_grant_dma_device(struct device *dev);
#else
static inline void xen_grant_setup_dma_ops(struct device *dev)
{
}
+static inline bool xen_is_grant_dma_device(struct device *dev)
+{
+ return false;
+}
#endif /* CONFIG_XEN_GRANT_DMA_OPS */

#endif /* INCLUDE_XEN_OPS_H */
--
2.7.4


2022-05-09 08:42:16

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: [PATCH V2 1/7] arm/xen: Introduce xen_setup_dma_ops()

From: Oleksandr Tyshchenko <[email protected]>

This patch introduces new helper and places it in new header.
The helper's purpose is to assign any Xen specific DMA ops in
a single place. For now, we deal with xen-swiotlb DMA ops only.
The one of the subsequent commits in current series will add
xen-grant DMA ops case.

Also re-use the xen_swiotlb_detect() check on Arm32.

Signed-off-by: Oleksandr Tyshchenko <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
---
Changes RFC -> V1:
- update commit description
- move commit to the beginning of the series
- move #ifdef CONFIG_XEN from dma-mapping.c to xen-ops.h

Changes V1 -> V2:
- add Stefano's R-b
- add missing SPDX-License-Identifier to xen-ops.h
---
arch/arm/include/asm/xen/xen-ops.h | 2 ++
arch/arm/mm/dma-mapping.c | 7 ++-----
arch/arm64/include/asm/xen/xen-ops.h | 2 ++
arch/arm64/mm/dma-mapping.c | 7 ++-----
include/xen/arm/xen-ops.h | 15 +++++++++++++++
5 files changed, 23 insertions(+), 10 deletions(-)
create mode 100644 arch/arm/include/asm/xen/xen-ops.h
create mode 100644 arch/arm64/include/asm/xen/xen-ops.h
create mode 100644 include/xen/arm/xen-ops.h

diff --git a/arch/arm/include/asm/xen/xen-ops.h b/arch/arm/include/asm/xen/xen-ops.h
new file mode 100644
index 00000000..7ebb7eb
--- /dev/null
+++ b/arch/arm/include/asm/xen/xen-ops.h
@@ -0,0 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <xen/arm/xen-ops.h>
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82ffac6..059cce0 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -33,7 +33,7 @@
#include <asm/dma-iommu.h>
#include <asm/mach/map.h>
#include <asm/system_info.h>
-#include <xen/swiotlb-xen.h>
+#include <asm/xen/xen-ops.h>

#include "dma.h"
#include "mm.h"
@@ -2287,10 +2287,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,

set_dma_ops(dev, dma_ops);

-#ifdef CONFIG_XEN
- if (xen_initial_domain())
- dev->dma_ops = &xen_swiotlb_dma_ops;
-#endif
+ xen_setup_dma_ops(dev);
dev->archdata.dma_ops_setup = true;
}

diff --git a/arch/arm64/include/asm/xen/xen-ops.h b/arch/arm64/include/asm/xen/xen-ops.h
new file mode 100644
index 00000000..7ebb7eb
--- /dev/null
+++ b/arch/arm64/include/asm/xen/xen-ops.h
@@ -0,0 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <xen/arm/xen-ops.h>
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 6719f9e..6099c81 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -9,9 +9,9 @@
#include <linux/dma-map-ops.h>
#include <linux/dma-iommu.h>
#include <xen/xen.h>
-#include <xen/swiotlb-xen.h>

#include <asm/cacheflush.h>
+#include <asm/xen/xen-ops.h>

void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
enum dma_data_direction dir)
@@ -52,8 +52,5 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
if (iommu)
iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);

-#ifdef CONFIG_XEN
- if (xen_swiotlb_detect())
- dev->dma_ops = &xen_swiotlb_dma_ops;
-#endif
+ xen_setup_dma_ops(dev);
}
diff --git a/include/xen/arm/xen-ops.h b/include/xen/arm/xen-ops.h
new file mode 100644
index 00000000..288deb1
--- /dev/null
+++ b/include/xen/arm/xen-ops.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM_XEN_OPS_H
+#define _ASM_ARM_XEN_OPS_H
+
+#include <xen/swiotlb-xen.h>
+
+static inline void xen_setup_dma_ops(struct device *dev)
+{
+#ifdef CONFIG_XEN
+ if (xen_swiotlb_detect())
+ dev->dma_ops = &xen_swiotlb_dma_ops;
+#endif
+}
+
+#endif /* _ASM_ARM_XEN_OPS_H */
--
2.7.4


2022-05-09 08:58:09

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: [PATCH V2 4/7] xen/virtio: Enable restricted memory access using Xen grant mappings

From: Juergen Gross <[email protected]>

In order to support virtio in Xen guests add a config option XEN_VIRTIO
enabling the user to specify whether in all Xen guests virtio should
be able to access memory via Xen grant mappings only on the host side.

Also set PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS feature from the guest
initialization code on Arm and x86 if CONFIG_XEN_VIRTIO is enabled.

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Oleksandr Tyshchenko <[email protected]>
---
Changes V1 -> V2:
- new patch, split required changes from commit:
"[PATCH V1 3/6] xen/virtio: Add option to restrict memory access under Xen"
- rework according to new platform_has() infrastructure
---
arch/arm/xen/enlighten.c | 2 ++
arch/x86/xen/enlighten_hvm.c | 2 ++
arch/x86/xen/enlighten_pv.c | 2 ++
drivers/xen/Kconfig | 11 +++++++++++
include/xen/xen.h | 8 ++++++++
5 files changed, 25 insertions(+)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 07eb69f..1f9c3ba 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -443,6 +443,8 @@ static int __init xen_guest_init(void)
if (!xen_domain())
return 0;

+ xen_set_restricted_virtio_memory_access();
+
if (!acpi_disabled)
xen_acpi_guest_init();
else
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 517a9d8..8b71b1d 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -195,6 +195,8 @@ static void __init xen_hvm_guest_init(void)
if (xen_pv_domain())
return;

+ xen_set_restricted_virtio_memory_access();
+
init_hvm_pv_info();

reserve_shared_info();
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 5038edb..fcd5d5d 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -109,6 +109,8 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);

static void __init xen_pv_init_platform(void)
{
+ xen_set_restricted_virtio_memory_access();
+
populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));

set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 313a9127..a7bd8ce 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -339,4 +339,15 @@ config XEN_GRANT_DMA_OPS
bool
select DMA_OPS

+config XEN_VIRTIO
+ bool "Xen virtio support"
+ depends on VIRTIO
+ select XEN_GRANT_DMA_OPS
+ help
+ Enable virtio support for running as Xen guest. Depending on the
+ guest type this will require special support on the backend side
+ (qemu or kernel, depending on the virtio device types used).
+
+ If in doubt, say n.
+
endmenu
diff --git a/include/xen/xen.h b/include/xen/xen.h
index a99bab8..0780a81 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -52,6 +52,14 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
extern u64 xen_saved_max_mem_size;
#endif

+#include <linux/platform-feature.h>
+
+static inline void xen_set_restricted_virtio_memory_access(void)
+{
+ if (IS_ENABLED(CONFIG_XEN_VIRTIO) && xen_domain())
+ platform_set(PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS);
+}
+
#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);
--
2.7.4


2022-05-09 09:32:11

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

From: Oleksandr Tyshchenko <[email protected]>

Introduce Xen specific binding for the virtualized device (e.g. virtio)
to be used by Xen grant DMA-mapping layer in the subsequent commit.

This binding indicates that Xen grant mappings scheme needs to be
enabled for the device which DT node contains that property and specifies
the ID of Xen domain where the corresponding backend resides. The ID
(domid) is used as an argument to the grant mapping APIs.

This is needed for the option to restrict memory access using Xen grant
mappings to work which primary goal is to enable using virtio devices
in Xen guests.

Signed-off-by: Oleksandr Tyshchenko <[email protected]>
---
Changes RFC -> V1:
- update commit subject/description and text in description
- move to devicetree/bindings/arm/

Changes V1 -> V2:
- update text in description
- change the maintainer of the binding
- fix validation issue
- reference xen,dev-domid.yaml schema from virtio/mmio.yaml
---
.../devicetree/bindings/arm/xen,dev-domid.yaml | 37 ++++++++++++++++++++++
Documentation/devicetree/bindings/virtio/mmio.yaml | 7 ++++
2 files changed, 44 insertions(+)
create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml

diff --git a/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
new file mode 100644
index 00000000..750e89e
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
@@ -0,0 +1,37 @@
+# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/arm/xen,dev-domid.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xen specific binding for virtualized devices (e.g. virtio)
+
+maintainers:
+ - Stefano Stabellini <[email protected]>
+
+select: true
+
+description:
+ This binding indicates that Xen grant mappings need to be enabled for
+ the device, and it specifies the ID of the domain where the corresponding
+ device (backend) resides. The property is required to restrict memory
+ access using Xen grant mappings.
+
+properties:
+ xen,dev-domid:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ The domid (domain ID) of the domain where the device (backend) is running.
+
+additionalProperties: true
+
+examples:
+ - |
+ virtio@3000 {
+ compatible = "virtio,mmio";
+ reg = <0x3000 0x100>;
+ interrupts = <41>;
+
+ /* The device is located in Xen domain with ID 1 */
+ xen,dev-domid = <1>;
+ };
diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
index 10c22b5..29a0932 100644
--- a/Documentation/devicetree/bindings/virtio/mmio.yaml
+++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
@@ -13,6 +13,9 @@ description:
See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
more details.

+allOf:
+ - $ref: /schemas/arm/xen,dev-domid.yaml#
+
properties:
compatible:
const: virtio,mmio
@@ -33,6 +36,10 @@ properties:
description: Required for devices making accesses thru an IOMMU.
maxItems: 1

+ xen,dev-domid:
+ description: Required when Xen grant mappings need to be enabled for device.
+ $ref: /schemas/types.yaml#/definitions/uint32
+
required:
- compatible
- reg
--
2.7.4


2022-05-09 09:58:40

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: [PATCH V2 3/7] xen/grant-dma-ops: Add option to restrict memory access under Xen

From: Juergen Gross <[email protected]>

Introduce Xen grant DMA-mapping layer which contains special DMA-mapping
routines for providing grant references as DMA addresses to be used by
frontends (e.g. virtio) in Xen guests.

Add the needed functionality by providing a special set of DMA ops
handling the needed grant operations for the I/O pages.

The subsequent commit will introduce the use case for xen-grant DMA ops
layer to enable using virtio devices in Xen guests in a safe manner.

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Oleksandr Tyshchenko <[email protected]>
---
Changes RFC -> V1:
- squash with almost all changes from commit (except handling "xen,dev-domid"
property):
"[PATCH 4/6] virtio: Various updates to xen-virtio DMA ops layer"
- update commit subject/description and comments in code
- leave only single Kconfig option XEN_VIRTIO and remove architectural
dependencies
- introduce common xen_has_restricted_virtio_memory_access() in xen.h
and update arch_has_restricted_virtio_memory_access() for both
Arm and x86 to call new helper
- use (1ULL << 63) instead of 0x8000000000000000ULL for XEN_GRANT_ADDR_OFF
- implement xen_virtio_dma_map(unmap)_sg() using example in swiotlb-xen.c
- optimize padding by moving "broken" field in struct xen_virtio_data
- remove unneeded per-device spinlock
- remove the inclusion of virtio_config.h
- remane everything according to the new naming scheme:
s/virtio/grant_dma
- add new hidden config option XEN_GRANT_DMA_OPS

Changes V1 -> V2:
- fix checkpatch.pl warnings
- remove the inclusion of linux/pci.h
- rework to use xarray for data context
- remove EXPORT_SYMBOL_GPL(xen_grant_setup_dma_ops);
- remove the line of * after SPDX-License-Identifier
- split changes into grant-dma-ops.c and arch_has_restricted_virtio_memory_access()
and update commit subject/description accordingly
- remove "default n" for config XEN_VIRTIO
- implement xen_grant_dma_alloc(free)_pages()
---
drivers/xen/Kconfig | 4 +
drivers/xen/Makefile | 1 +
drivers/xen/grant-dma-ops.c | 314 ++++++++++++++++++++++++++++++++++++++++++++
include/xen/xen-ops.h | 8 ++
4 files changed, 327 insertions(+)
create mode 100644 drivers/xen/grant-dma-ops.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 120d32f..313a9127 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -335,4 +335,8 @@ config XEN_UNPOPULATED_ALLOC
having to balloon out RAM regions in order to obtain physical memory
space to create such mappings.

+config XEN_GRANT_DMA_OPS
+ bool
+ select DMA_OPS
+
endmenu
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 5aae66e..1a23cb0 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -39,3 +39,4 @@ xen-gntalloc-y := gntalloc.o
xen-privcmd-y := privcmd.o privcmd-buf.o
obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF) += xen-front-pgdir-shbuf.o
obj-$(CONFIG_XEN_UNPOPULATED_ALLOC) += unpopulated-alloc.o
+obj-$(CONFIG_XEN_GRANT_DMA_OPS) += grant-dma-ops.o
diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
new file mode 100644
index 00000000..29ad7bf
--- /dev/null
+++ b/drivers/xen/grant-dma-ops.c
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Xen grant DMA-mapping layer - contains special DMA-mapping routines
+ * for providing grant references as DMA addresses to be used by frontends
+ * (e.g. virtio) in Xen guests
+ *
+ * Copyright (c) 2021, Juergen Gross <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/dma-map-ops.h>
+#include <linux/of.h>
+#include <linux/pfn.h>
+#include <linux/xarray.h>
+#include <xen/xen.h>
+#include <xen/grant_table.h>
+
+struct xen_grant_dma_data {
+ /* The ID of backend domain */
+ domid_t dev_domid;
+ /* Is device behaving sane? */
+ bool broken;
+};
+
+static DEFINE_XARRAY(xen_grant_dma_devices);
+
+#define XEN_GRANT_DMA_ADDR_OFF (1ULL << 63)
+
+static inline dma_addr_t grant_to_dma(grant_ref_t grant)
+{
+ return XEN_GRANT_DMA_ADDR_OFF | ((dma_addr_t)grant << PAGE_SHIFT);
+}
+
+static inline grant_ref_t dma_to_grant(dma_addr_t dma)
+{
+ return (grant_ref_t)((dma & ~XEN_GRANT_DMA_ADDR_OFF) >> PAGE_SHIFT);
+}
+
+static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
+{
+ struct xen_grant_dma_data *data;
+
+ xa_lock(&xen_grant_dma_devices);
+ data = xa_load(&xen_grant_dma_devices, (unsigned long)dev);
+ xa_unlock(&xen_grant_dma_devices);
+
+ return data;
+}
+
+/*
+ * DMA ops for Xen frontends (e.g. virtio).
+ *
+ * Used to act as a kind of software IOMMU for Xen guests by using grants as
+ * DMA addresses.
+ * Such a DMA address is formed by using the grant reference as a frame
+ * number and setting the highest address bit (this bit is for the backend
+ * to be able to distinguish it from e.g. a mmio address).
+ *
+ * Note that for now we hard wire dom0 to be the backend domain. In order
+ * to support any domain as backend we'd need to add a way to communicate
+ * the domid of this backend, e.g. via Xenstore, via the PCI-device's
+ * config space or DT/ACPI.
+ */
+static void *xen_grant_dma_alloc(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t gfp,
+ unsigned long attrs)
+{
+ struct xen_grant_dma_data *data;
+ unsigned int i, n_pages = PFN_UP(size);
+ unsigned long pfn;
+ grant_ref_t grant;
+ void *ret;
+
+ data = find_xen_grant_dma_data(dev);
+ if (!data)
+ return NULL;
+
+ if (unlikely(data->broken))
+ return NULL;
+
+ ret = alloc_pages_exact(n_pages * PAGE_SIZE, gfp);
+ if (!ret)
+ return NULL;
+
+ pfn = virt_to_pfn(ret);
+
+ if (gnttab_alloc_grant_reference_seq(n_pages, &grant)) {
+ free_pages_exact(ret, n_pages * PAGE_SIZE);
+ return NULL;
+ }
+
+ for (i = 0; i < n_pages; i++) {
+ gnttab_grant_foreign_access_ref(grant + i, data->dev_domid,
+ pfn_to_gfn(pfn + i), 0);
+ }
+
+ *dma_handle = grant_to_dma(grant);
+
+ return ret;
+}
+
+static void xen_grant_dma_free(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, unsigned long attrs)
+{
+ struct xen_grant_dma_data *data;
+ unsigned int i, n_pages = PFN_UP(size);
+ grant_ref_t grant;
+
+ data = find_xen_grant_dma_data(dev);
+ if (!data)
+ return;
+
+ if (unlikely(data->broken))
+ return;
+
+ grant = dma_to_grant(dma_handle);
+
+ for (i = 0; i < n_pages; i++) {
+ if (unlikely(!gnttab_end_foreign_access_ref(grant + i))) {
+ dev_alert(dev, "Grant still in use by backend domain, disabled for further use\n");
+ data->broken = true;
+ return;
+ }
+ }
+
+ gnttab_free_grant_reference_seq(grant, n_pages);
+
+ free_pages_exact(vaddr, n_pages * PAGE_SIZE);
+}
+
+static struct page *xen_grant_dma_alloc_pages(struct device *dev, size_t size,
+ dma_addr_t *dma_handle,
+ enum dma_data_direction dir,
+ gfp_t gfp)
+{
+ void *vaddr;
+
+ vaddr = xen_grant_dma_alloc(dev, size, dma_handle, gfp, 0);
+ if (!vaddr)
+ return NULL;
+
+ return virt_to_page(vaddr);
+}
+
+static void xen_grant_dma_free_pages(struct device *dev, size_t size,
+ struct page *vaddr, dma_addr_t dma_handle,
+ enum dma_data_direction dir)
+{
+ xen_grant_dma_free(dev, size, page_to_virt(vaddr), dma_handle, 0);
+}
+
+static dma_addr_t xen_grant_dma_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ struct xen_grant_dma_data *data;
+ unsigned int i, n_pages = PFN_UP(size);
+ grant_ref_t grant;
+ dma_addr_t dma_handle;
+
+ if (WARN_ON(dir == DMA_NONE))
+ return DMA_MAPPING_ERROR;
+
+ data = find_xen_grant_dma_data(dev);
+ if (!data)
+ return DMA_MAPPING_ERROR;
+
+ if (unlikely(data->broken))
+ return DMA_MAPPING_ERROR;
+
+ if (gnttab_alloc_grant_reference_seq(n_pages, &grant))
+ return DMA_MAPPING_ERROR;
+
+ for (i = 0; i < n_pages; i++) {
+ gnttab_grant_foreign_access_ref(grant + i, data->dev_domid,
+ xen_page_to_gfn(page) + i, dir == DMA_TO_DEVICE);
+ }
+
+ dma_handle = grant_to_dma(grant) + offset;
+
+ return dma_handle;
+}
+
+static void xen_grant_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ struct xen_grant_dma_data *data;
+ unsigned int i, n_pages = PFN_UP(size);
+ grant_ref_t grant;
+
+ if (WARN_ON(dir == DMA_NONE))
+ return;
+
+ data = find_xen_grant_dma_data(dev);
+ if (!data)
+ return;
+
+ if (unlikely(data->broken))
+ return;
+
+ grant = dma_to_grant(dma_handle);
+
+ for (i = 0; i < n_pages; i++) {
+ if (unlikely(!gnttab_end_foreign_access_ref(grant + i))) {
+ dev_alert(dev, "Grant still in use by backend domain, disabled for further use\n");
+ data->broken = true;
+ return;
+ }
+ }
+
+ gnttab_free_grant_reference_seq(grant, n_pages);
+}
+
+static void xen_grant_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
+ int nents, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ struct scatterlist *s;
+ unsigned int i;
+
+ if (WARN_ON(dir == DMA_NONE))
+ return;
+
+ for_each_sg(sg, s, nents, i)
+ xen_grant_dma_unmap_page(dev, s->dma_address, sg_dma_len(s), dir,
+ attrs);
+}
+
+static int xen_grant_dma_map_sg(struct device *dev, struct scatterlist *sg,
+ int nents, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ struct scatterlist *s;
+ unsigned int i;
+
+ if (WARN_ON(dir == DMA_NONE))
+ return -EINVAL;
+
+ for_each_sg(sg, s, nents, i) {
+ s->dma_address = xen_grant_dma_map_page(dev, sg_page(s), s->offset,
+ s->length, dir, attrs);
+ if (s->dma_address == DMA_MAPPING_ERROR)
+ goto out;
+
+ sg_dma_len(s) = s->length;
+ }
+
+ return nents;
+
+out:
+ xen_grant_dma_unmap_sg(dev, sg, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
+ sg_dma_len(sg) = 0;
+
+ return -EIO;
+}
+
+static int xen_grant_dma_supported(struct device *dev, u64 mask)
+{
+ return mask == DMA_BIT_MASK(64);
+}
+
+static const struct dma_map_ops xen_grant_dma_ops = {
+ .alloc = xen_grant_dma_alloc,
+ .free = xen_grant_dma_free,
+ .alloc_pages = xen_grant_dma_alloc_pages,
+ .free_pages = xen_grant_dma_free_pages,
+ .mmap = dma_common_mmap,
+ .get_sgtable = dma_common_get_sgtable,
+ .map_page = xen_grant_dma_map_page,
+ .unmap_page = xen_grant_dma_unmap_page,
+ .map_sg = xen_grant_dma_map_sg,
+ .unmap_sg = xen_grant_dma_unmap_sg,
+ .dma_supported = xen_grant_dma_supported,
+};
+
+void xen_grant_setup_dma_ops(struct device *dev)
+{
+ struct xen_grant_dma_data *data;
+ uint32_t dev_domid;
+
+ data = find_xen_grant_dma_data(dev);
+ if (data) {
+ dev_err(dev, "Xen grant DMA data is already created\n");
+ return;
+ }
+
+ /* XXX The dom0 is hardcoded as the backend domain for now */
+ dev_domid = 0;
+
+ data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+ if (!data)
+ goto err;
+
+ data->dev_domid = dev_domid;
+
+ if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
+ GFP_KERNEL))) {
+ dev_err(dev, "Cannot store Xen grant DMA data\n");
+ goto err;
+ }
+
+ dev->dma_ops = &xen_grant_dma_ops;
+
+ return;
+
+err:
+ dev_err(dev, "Сannot set up Xen grant DMA ops, retain platform DMA ops\n");
+}
+
+MODULE_DESCRIPTION("Xen grant DMA-mapping layer");
+MODULE_AUTHOR("Juergen Gross <[email protected]>");
+MODULE_LICENSE("GPL");
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index a3584a3..4f9fad5 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -221,4 +221,12 @@ static inline void xen_preemptible_hcall_end(void) { }

#endif /* CONFIG_XEN_PV && !CONFIG_PREEMPTION */

+#ifdef CONFIG_XEN_GRANT_DMA_OPS
+void xen_grant_setup_dma_ops(struct device *dev);
+#else
+static inline void xen_grant_setup_dma_ops(struct device *dev)
+{
+}
+#endif /* CONFIG_XEN_GRANT_DMA_OPS */
+
#endif /* INCLUDE_XEN_OPS_H */
--
2.7.4


2022-05-09 21:54:32

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH V2 4/7] xen/virtio: Enable restricted memory access using Xen grant mappings

On Sat, 7 May 2022, Oleksandr Tyshchenko wrote:
> From: Juergen Gross <[email protected]>
>
> In order to support virtio in Xen guests add a config option XEN_VIRTIO
> enabling the user to specify whether in all Xen guests virtio should
> be able to access memory via Xen grant mappings only on the host side.
>
> Also set PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS feature from the guest
> initialization code on Arm and x86 if CONFIG_XEN_VIRTIO is enabled.
>
> Signed-off-by: Juergen Gross <[email protected]>
> Signed-off-by: Oleksandr Tyshchenko <[email protected]>

The patch looks OK to me

Reviewed-by: Stefano Stabellini <[email protected]>


> ---
> Changes V1 -> V2:
> - new patch, split required changes from commit:
> "[PATCH V1 3/6] xen/virtio: Add option to restrict memory access under Xen"
> - rework according to new platform_has() infrastructure
> ---
> arch/arm/xen/enlighten.c | 2 ++
> arch/x86/xen/enlighten_hvm.c | 2 ++
> arch/x86/xen/enlighten_pv.c | 2 ++
> drivers/xen/Kconfig | 11 +++++++++++
> include/xen/xen.h | 8 ++++++++
> 5 files changed, 25 insertions(+)
>
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 07eb69f..1f9c3ba 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -443,6 +443,8 @@ static int __init xen_guest_init(void)
> if (!xen_domain())
> return 0;
>
> + xen_set_restricted_virtio_memory_access();
> +
> if (!acpi_disabled)
> xen_acpi_guest_init();
> else
> diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
> index 517a9d8..8b71b1d 100644
> --- a/arch/x86/xen/enlighten_hvm.c
> +++ b/arch/x86/xen/enlighten_hvm.c
> @@ -195,6 +195,8 @@ static void __init xen_hvm_guest_init(void)
> if (xen_pv_domain())
> return;
>
> + xen_set_restricted_virtio_memory_access();
> +
> init_hvm_pv_info();
>
> reserve_shared_info();
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 5038edb..fcd5d5d 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -109,6 +109,8 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
>
> static void __init xen_pv_init_platform(void)
> {
> + xen_set_restricted_virtio_memory_access();
> +
> populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));
>
> set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 313a9127..a7bd8ce 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -339,4 +339,15 @@ config XEN_GRANT_DMA_OPS
> bool
> select DMA_OPS
>
> +config XEN_VIRTIO
> + bool "Xen virtio support"
> + depends on VIRTIO
> + select XEN_GRANT_DMA_OPS
> + help
> + Enable virtio support for running as Xen guest. Depending on the
> + guest type this will require special support on the backend side
> + (qemu or kernel, depending on the virtio device types used).
> +
> + If in doubt, say n.
> +
> endmenu
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index a99bab8..0780a81 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -52,6 +52,14 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> extern u64 xen_saved_max_mem_size;
> #endif
>
> +#include <linux/platform-feature.h>
> +
> +static inline void xen_set_restricted_virtio_memory_access(void)
> +{
> + if (IS_ENABLED(CONFIG_XEN_VIRTIO) && xen_domain())
> + platform_set(PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS);
> +}
> +
> #ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages);
> void xen_free_unpopulated_pages(unsigned int nr_pages, struct page **pages);

2022-05-09 21:55:22

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen, dev-domid property description for xen-grant DMA ops

On Sat, 7 May 2022, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <[email protected]>
>
> Introduce Xen specific binding for the virtualized device (e.g. virtio)
> to be used by Xen grant DMA-mapping layer in the subsequent commit.
>
> This binding indicates that Xen grant mappings scheme needs to be
> enabled for the device which DT node contains that property and specifies
> the ID of Xen domain where the corresponding backend resides. The ID
> (domid) is used as an argument to the grant mapping APIs.
>
> This is needed for the option to restrict memory access using Xen grant
> mappings to work which primary goal is to enable using virtio devices
> in Xen guests.
>
> Signed-off-by: Oleksandr Tyshchenko <[email protected]>

The binding is OK and the wording is OK too.

Reviewed-by: Stefano Stabellini <[email protected]>

I am not an expert on the details of writing a good schema, I'll defer
to Rob if he has any comments on that.


> ---
> Changes RFC -> V1:
> - update commit subject/description and text in description
> - move to devicetree/bindings/arm/
>
> Changes V1 -> V2:
> - update text in description
> - change the maintainer of the binding
> - fix validation issue
> - reference xen,dev-domid.yaml schema from virtio/mmio.yaml
> ---
> .../devicetree/bindings/arm/xen,dev-domid.yaml | 37 ++++++++++++++++++++++
> Documentation/devicetree/bindings/virtio/mmio.yaml | 7 ++++
> 2 files changed, 44 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
>
> diff --git a/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> new file mode 100644
> index 00000000..750e89e
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> @@ -0,0 +1,37 @@
> +# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/arm/xen,dev-domid.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Xen specific binding for virtualized devices (e.g. virtio)
> +
> +maintainers:
> + - Stefano Stabellini <[email protected]>
> +
> +select: true
> +
> +description:
> + This binding indicates that Xen grant mappings need to be enabled for
> + the device, and it specifies the ID of the domain where the corresponding
> + device (backend) resides. The property is required to restrict memory
> + access using Xen grant mappings.
> +
> +properties:
> + xen,dev-domid:
> + $ref: /schemas/types.yaml#/definitions/uint32
> + description:
> + The domid (domain ID) of the domain where the device (backend) is running.
> +
> +additionalProperties: true
> +
> +examples:
> + - |
> + virtio@3000 {
> + compatible = "virtio,mmio";
> + reg = <0x3000 0x100>;
> + interrupts = <41>;
> +
> + /* The device is located in Xen domain with ID 1 */
> + xen,dev-domid = <1>;
> + };
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
> index 10c22b5..29a0932 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> @@ -13,6 +13,9 @@ description:
> See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
> more details.
>
> +allOf:
> + - $ref: /schemas/arm/xen,dev-domid.yaml#
> +
> properties:
> compatible:
> const: virtio,mmio
> @@ -33,6 +36,10 @@ properties:
> description: Required for devices making accesses thru an IOMMU.
> maxItems: 1
>
> + xen,dev-domid:
> + description: Required when Xen grant mappings need to be enabled for device.
> + $ref: /schemas/types.yaml#/definitions/uint32
> +
> required:
> - compatible
> - reg
> --
> 2.7.4
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

2022-05-09 22:33:10

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH V2 6/7] xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices

On Sat, 7 May 2022, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <[email protected]>
>
> Use the presence of recently introduced "xen,dev-domid" property
> in the device node as a clear indicator of enabling Xen grant
> mappings scheme for that device and read the ID of Xen domain where
> the corresponding backend resides. The ID (domid) is used as
> an argument to the Xen grant mapping APIs.
>
> Also introduce xen_is_grant_dma_device() to check whether xen-grant
> DMA ops need to be set for a passed device.
>
> Remove the hardcoded domid 0 in xen_grant_setup_dma_ops().
>
> Signed-off-by: Oleksandr Tyshchenko <[email protected]>

Reviewed-by: Stefano Stabellini <[email protected]>


> ---
> Changes RFC -> V1:
> - new patch, split required changes from commit:
> "[PATCH 4/6] virtio: Various updates to xen-virtio DMA ops layer"
> - update checks in xen_virtio_setup_dma_ops() to only support
> DT devices for now
> - remove the "virtio,mmio" check from xen_is_virtio_device()
> - remane everything according to the new naming scheme:
> s/virtio/grant_dma
>
> Changes V1 -> V2:
> - remove dev_is_pci() check in xen_grant_setup_dma_ops()
> - remove EXPORT_SYMBOL_GPL(xen_is_grant_dma_device);
> ---
> drivers/xen/grant-dma-ops.c | 24 +++++++++++++++++-------
> include/xen/xen-ops.h | 5 +++++
> 2 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> index 29ad7bf..8924178 100644
> --- a/drivers/xen/grant-dma-ops.c
> +++ b/drivers/xen/grant-dma-ops.c
> @@ -55,11 +55,6 @@ static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
> * Such a DMA address is formed by using the grant reference as a frame
> * number and setting the highest address bit (this bit is for the backend
> * to be able to distinguish it from e.g. a mmio address).
> - *
> - * Note that for now we hard wire dom0 to be the backend domain. In order
> - * to support any domain as backend we'd need to add a way to communicate
> - * the domid of this backend, e.g. via Xenstore, via the PCI-device's
> - * config space or DT/ACPI.
> */
> static void *xen_grant_dma_alloc(struct device *dev, size_t size,
> dma_addr_t *dma_handle, gfp_t gfp,
> @@ -275,6 +270,15 @@ static const struct dma_map_ops xen_grant_dma_ops = {
> .dma_supported = xen_grant_dma_supported,
> };
>
> +bool xen_is_grant_dma_device(struct device *dev)
> +{
> + /* XXX Handle only DT devices for now */
> + if (!dev->of_node)
> + return false;
> +
> + return of_property_read_bool(dev->of_node, "xen,dev-domid");
> +}
> +
> void xen_grant_setup_dma_ops(struct device *dev)
> {
> struct xen_grant_dma_data *data;
> @@ -286,8 +290,14 @@ void xen_grant_setup_dma_ops(struct device *dev)
> return;
> }
>
> - /* XXX The dom0 is hardcoded as the backend domain for now */
> - dev_domid = 0;
> + /* XXX ACPI device unsupported for now */
> + if (!dev->of_node)
> + goto err;
> +
> + if (of_property_read_u32(dev->of_node, "xen,dev-domid", &dev_domid)) {
> + dev_err(dev, "xen,dev-domid property is not present\n");
> + goto err;
> + }
>
> data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> if (!data)
> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
> index 4f9fad5..62be9dc 100644
> --- a/include/xen/xen-ops.h
> +++ b/include/xen/xen-ops.h
> @@ -223,10 +223,15 @@ static inline void xen_preemptible_hcall_end(void) { }
>
> #ifdef CONFIG_XEN_GRANT_DMA_OPS
> void xen_grant_setup_dma_ops(struct device *dev);
> +bool xen_is_grant_dma_device(struct device *dev);
> #else
> static inline void xen_grant_setup_dma_ops(struct device *dev)
> {
> }
> +static inline bool xen_is_grant_dma_device(struct device *dev)
> +{
> + return false;
> +}
> #endif /* CONFIG_XEN_GRANT_DMA_OPS */
>
> #endif /* INCLUDE_XEN_OPS_H */
> --
> 2.7.4
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

2022-05-09 22:34:36

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH V2 3/7] xen/grant-dma-ops: Add option to restrict memory access under Xen

On Sat, 7 May 2022, Oleksandr Tyshchenko wrote:
> From: Juergen Gross <[email protected]>
>
> Introduce Xen grant DMA-mapping layer which contains special DMA-mapping
> routines for providing grant references as DMA addresses to be used by
> frontends (e.g. virtio) in Xen guests.
>
> Add the needed functionality by providing a special set of DMA ops
> handling the needed grant operations for the I/O pages.
>
> The subsequent commit will introduce the use case for xen-grant DMA ops
> layer to enable using virtio devices in Xen guests in a safe manner.
>
> Signed-off-by: Juergen Gross <[email protected]>
> Signed-off-by: Oleksandr Tyshchenko <[email protected]>

Reviewed-by: Stefano Stabellini <[email protected]>


> ---
> Changes RFC -> V1:
> - squash with almost all changes from commit (except handling "xen,dev-domid"
> property):
> "[PATCH 4/6] virtio: Various updates to xen-virtio DMA ops layer"
> - update commit subject/description and comments in code
> - leave only single Kconfig option XEN_VIRTIO and remove architectural
> dependencies
> - introduce common xen_has_restricted_virtio_memory_access() in xen.h
> and update arch_has_restricted_virtio_memory_access() for both
> Arm and x86 to call new helper
> - use (1ULL << 63) instead of 0x8000000000000000ULL for XEN_GRANT_ADDR_OFF
> - implement xen_virtio_dma_map(unmap)_sg() using example in swiotlb-xen.c
> - optimize padding by moving "broken" field in struct xen_virtio_data
> - remove unneeded per-device spinlock
> - remove the inclusion of virtio_config.h
> - remane everything according to the new naming scheme:
> s/virtio/grant_dma
> - add new hidden config option XEN_GRANT_DMA_OPS
>
> Changes V1 -> V2:
> - fix checkpatch.pl warnings
> - remove the inclusion of linux/pci.h
> - rework to use xarray for data context
> - remove EXPORT_SYMBOL_GPL(xen_grant_setup_dma_ops);
> - remove the line of * after SPDX-License-Identifier
> - split changes into grant-dma-ops.c and arch_has_restricted_virtio_memory_access()
> and update commit subject/description accordingly
> - remove "default n" for config XEN_VIRTIO
> - implement xen_grant_dma_alloc(free)_pages()
> ---
> drivers/xen/Kconfig | 4 +
> drivers/xen/Makefile | 1 +
> drivers/xen/grant-dma-ops.c | 314 ++++++++++++++++++++++++++++++++++++++++++++
> include/xen/xen-ops.h | 8 ++
> 4 files changed, 327 insertions(+)
> create mode 100644 drivers/xen/grant-dma-ops.c
>
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 120d32f..313a9127 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -335,4 +335,8 @@ config XEN_UNPOPULATED_ALLOC
> having to balloon out RAM regions in order to obtain physical memory
> space to create such mappings.
>
> +config XEN_GRANT_DMA_OPS
> + bool
> + select DMA_OPS
> +
> endmenu
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 5aae66e..1a23cb0 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -39,3 +39,4 @@ xen-gntalloc-y := gntalloc.o
> xen-privcmd-y := privcmd.o privcmd-buf.o
> obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF) += xen-front-pgdir-shbuf.o
> obj-$(CONFIG_XEN_UNPOPULATED_ALLOC) += unpopulated-alloc.o
> +obj-$(CONFIG_XEN_GRANT_DMA_OPS) += grant-dma-ops.o
> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> new file mode 100644
> index 00000000..29ad7bf
> --- /dev/null
> +++ b/drivers/xen/grant-dma-ops.c
> @@ -0,0 +1,314 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Xen grant DMA-mapping layer - contains special DMA-mapping routines
> + * for providing grant references as DMA addresses to be used by frontends
> + * (e.g. virtio) in Xen guests
> + *
> + * Copyright (c) 2021, Juergen Gross <[email protected]>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/dma-map-ops.h>
> +#include <linux/of.h>
> +#include <linux/pfn.h>
> +#include <linux/xarray.h>
> +#include <xen/xen.h>
> +#include <xen/grant_table.h>
> +
> +struct xen_grant_dma_data {
> + /* The ID of backend domain */
> + domid_t dev_domid;
> + /* Is device behaving sane? */
> + bool broken;
> +};
> +
> +static DEFINE_XARRAY(xen_grant_dma_devices);
> +
> +#define XEN_GRANT_DMA_ADDR_OFF (1ULL << 63)
> +
> +static inline dma_addr_t grant_to_dma(grant_ref_t grant)
> +{
> + return XEN_GRANT_DMA_ADDR_OFF | ((dma_addr_t)grant << PAGE_SHIFT);
> +}
> +
> +static inline grant_ref_t dma_to_grant(dma_addr_t dma)
> +{
> + return (grant_ref_t)((dma & ~XEN_GRANT_DMA_ADDR_OFF) >> PAGE_SHIFT);
> +}
> +
> +static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
> +{
> + struct xen_grant_dma_data *data;
> +
> + xa_lock(&xen_grant_dma_devices);
> + data = xa_load(&xen_grant_dma_devices, (unsigned long)dev);
> + xa_unlock(&xen_grant_dma_devices);
> +
> + return data;
> +}
> +
> +/*
> + * DMA ops for Xen frontends (e.g. virtio).
> + *
> + * Used to act as a kind of software IOMMU for Xen guests by using grants as
> + * DMA addresses.
> + * Such a DMA address is formed by using the grant reference as a frame
> + * number and setting the highest address bit (this bit is for the backend
> + * to be able to distinguish it from e.g. a mmio address).
> + *
> + * Note that for now we hard wire dom0 to be the backend domain. In order
> + * to support any domain as backend we'd need to add a way to communicate
> + * the domid of this backend, e.g. via Xenstore, via the PCI-device's
> + * config space or DT/ACPI.
> + */
> +static void *xen_grant_dma_alloc(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp,
> + unsigned long attrs)
> +{
> + struct xen_grant_dma_data *data;
> + unsigned int i, n_pages = PFN_UP(size);
> + unsigned long pfn;
> + grant_ref_t grant;
> + void *ret;
> +
> + data = find_xen_grant_dma_data(dev);
> + if (!data)
> + return NULL;
> +
> + if (unlikely(data->broken))
> + return NULL;
> +
> + ret = alloc_pages_exact(n_pages * PAGE_SIZE, gfp);
> + if (!ret)
> + return NULL;
> +
> + pfn = virt_to_pfn(ret);
> +
> + if (gnttab_alloc_grant_reference_seq(n_pages, &grant)) {
> + free_pages_exact(ret, n_pages * PAGE_SIZE);
> + return NULL;
> + }
> +
> + for (i = 0; i < n_pages; i++) {
> + gnttab_grant_foreign_access_ref(grant + i, data->dev_domid,
> + pfn_to_gfn(pfn + i), 0);
> + }
> +
> + *dma_handle = grant_to_dma(grant);
> +
> + return ret;
> +}
> +
> +static void xen_grant_dma_free(struct device *dev, size_t size, void *vaddr,
> + dma_addr_t dma_handle, unsigned long attrs)
> +{
> + struct xen_grant_dma_data *data;
> + unsigned int i, n_pages = PFN_UP(size);
> + grant_ref_t grant;
> +
> + data = find_xen_grant_dma_data(dev);
> + if (!data)
> + return;
> +
> + if (unlikely(data->broken))
> + return;
> +
> + grant = dma_to_grant(dma_handle);
> +
> + for (i = 0; i < n_pages; i++) {
> + if (unlikely(!gnttab_end_foreign_access_ref(grant + i))) {
> + dev_alert(dev, "Grant still in use by backend domain, disabled for further use\n");
> + data->broken = true;
> + return;
> + }
> + }
> +
> + gnttab_free_grant_reference_seq(grant, n_pages);
> +
> + free_pages_exact(vaddr, n_pages * PAGE_SIZE);
> +}
> +
> +static struct page *xen_grant_dma_alloc_pages(struct device *dev, size_t size,
> + dma_addr_t *dma_handle,
> + enum dma_data_direction dir,
> + gfp_t gfp)
> +{
> + void *vaddr;
> +
> + vaddr = xen_grant_dma_alloc(dev, size, dma_handle, gfp, 0);
> + if (!vaddr)
> + return NULL;
> +
> + return virt_to_page(vaddr);
> +}
> +
> +static void xen_grant_dma_free_pages(struct device *dev, size_t size,
> + struct page *vaddr, dma_addr_t dma_handle,
> + enum dma_data_direction dir)
> +{
> + xen_grant_dma_free(dev, size, page_to_virt(vaddr), dma_handle, 0);
> +}
> +
> +static dma_addr_t xen_grant_dma_map_page(struct device *dev, struct page *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction dir,
> + unsigned long attrs)
> +{
> + struct xen_grant_dma_data *data;
> + unsigned int i, n_pages = PFN_UP(size);
> + grant_ref_t grant;
> + dma_addr_t dma_handle;
> +
> + if (WARN_ON(dir == DMA_NONE))
> + return DMA_MAPPING_ERROR;
> +
> + data = find_xen_grant_dma_data(dev);
> + if (!data)
> + return DMA_MAPPING_ERROR;
> +
> + if (unlikely(data->broken))
> + return DMA_MAPPING_ERROR;
> +
> + if (gnttab_alloc_grant_reference_seq(n_pages, &grant))
> + return DMA_MAPPING_ERROR;
> +
> + for (i = 0; i < n_pages; i++) {
> + gnttab_grant_foreign_access_ref(grant + i, data->dev_domid,
> + xen_page_to_gfn(page) + i, dir == DMA_TO_DEVICE);
> + }
> +
> + dma_handle = grant_to_dma(grant) + offset;
> +
> + return dma_handle;
> +}
> +
> +static void xen_grant_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
> + size_t size, enum dma_data_direction dir,
> + unsigned long attrs)
> +{
> + struct xen_grant_dma_data *data;
> + unsigned int i, n_pages = PFN_UP(size);
> + grant_ref_t grant;
> +
> + if (WARN_ON(dir == DMA_NONE))
> + return;
> +
> + data = find_xen_grant_dma_data(dev);
> + if (!data)
> + return;
> +
> + if (unlikely(data->broken))
> + return;
> +
> + grant = dma_to_grant(dma_handle);
> +
> + for (i = 0; i < n_pages; i++) {
> + if (unlikely(!gnttab_end_foreign_access_ref(grant + i))) {
> + dev_alert(dev, "Grant still in use by backend domain, disabled for further use\n");
> + data->broken = true;
> + return;
> + }
> + }
> +
> + gnttab_free_grant_reference_seq(grant, n_pages);
> +}
> +
> +static void xen_grant_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
> + int nents, enum dma_data_direction dir,
> + unsigned long attrs)
> +{
> + struct scatterlist *s;
> + unsigned int i;
> +
> + if (WARN_ON(dir == DMA_NONE))
> + return;
> +
> + for_each_sg(sg, s, nents, i)
> + xen_grant_dma_unmap_page(dev, s->dma_address, sg_dma_len(s), dir,
> + attrs);
> +}
> +
> +static int xen_grant_dma_map_sg(struct device *dev, struct scatterlist *sg,
> + int nents, enum dma_data_direction dir,
> + unsigned long attrs)
> +{
> + struct scatterlist *s;
> + unsigned int i;
> +
> + if (WARN_ON(dir == DMA_NONE))
> + return -EINVAL;
> +
> + for_each_sg(sg, s, nents, i) {
> + s->dma_address = xen_grant_dma_map_page(dev, sg_page(s), s->offset,
> + s->length, dir, attrs);
> + if (s->dma_address == DMA_MAPPING_ERROR)
> + goto out;
> +
> + sg_dma_len(s) = s->length;
> + }
> +
> + return nents;
> +
> +out:
> + xen_grant_dma_unmap_sg(dev, sg, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
> + sg_dma_len(sg) = 0;
> +
> + return -EIO;
> +}
> +
> +static int xen_grant_dma_supported(struct device *dev, u64 mask)
> +{
> + return mask == DMA_BIT_MASK(64);
> +}
> +
> +static const struct dma_map_ops xen_grant_dma_ops = {
> + .alloc = xen_grant_dma_alloc,
> + .free = xen_grant_dma_free,
> + .alloc_pages = xen_grant_dma_alloc_pages,
> + .free_pages = xen_grant_dma_free_pages,
> + .mmap = dma_common_mmap,
> + .get_sgtable = dma_common_get_sgtable,
> + .map_page = xen_grant_dma_map_page,
> + .unmap_page = xen_grant_dma_unmap_page,
> + .map_sg = xen_grant_dma_map_sg,
> + .unmap_sg = xen_grant_dma_unmap_sg,
> + .dma_supported = xen_grant_dma_supported,
> +};
> +
> +void xen_grant_setup_dma_ops(struct device *dev)
> +{
> + struct xen_grant_dma_data *data;
> + uint32_t dev_domid;
> +
> + data = find_xen_grant_dma_data(dev);
> + if (data) {
> + dev_err(dev, "Xen grant DMA data is already created\n");
> + return;
> + }
> +
> + /* XXX The dom0 is hardcoded as the backend domain for now */
> + dev_domid = 0;
> +
> + data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> + if (!data)
> + goto err;
> +
> + data->dev_domid = dev_domid;
> +
> + if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
> + GFP_KERNEL))) {
> + dev_err(dev, "Cannot store Xen grant DMA data\n");
> + goto err;
> + }
> +
> + dev->dma_ops = &xen_grant_dma_ops;
> +
> + return;
> +
> +err:
> + dev_err(dev, "Сannot set up Xen grant DMA ops, retain platform DMA ops\n");
> +}
> +
> +MODULE_DESCRIPTION("Xen grant DMA-mapping layer");
> +MODULE_AUTHOR("Juergen Gross <[email protected]>");
> +MODULE_LICENSE("GPL");
> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
> index a3584a3..4f9fad5 100644
> --- a/include/xen/xen-ops.h
> +++ b/include/xen/xen-ops.h
> @@ -221,4 +221,12 @@ static inline void xen_preemptible_hcall_end(void) { }
>
> #endif /* CONFIG_XEN_PV && !CONFIG_PREEMPTION */
>
> +#ifdef CONFIG_XEN_GRANT_DMA_OPS
> +void xen_grant_setup_dma_ops(struct device *dev);
> +#else
> +static inline void xen_grant_setup_dma_ops(struct device *dev)
> +{
> +}
> +#endif /* CONFIG_XEN_GRANT_DMA_OPS */
> +
> #endif /* INCLUDE_XEN_OPS_H */
> --
> 2.7.4
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

2022-05-12 07:20:55

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants


On 5/11/22 2:00 PM, Oleksandr wrote:
>
> On 07.05.22 21:19, Oleksandr Tyshchenko wrote:
>
> Hello Boris, Stefano
>
>
>> From: Juergen Gross <[email protected]>
>>
>> For support of virtio via grant mappings in rare cases larger mappings
>> using consecutive grants are needed. Support those by adding a bitmap
>> of free grants.
>>
>> As consecutive grants will be needed only in very rare cases (e.g. when
>> configuring a virtio device with a multi-page ring), optimize for the
>> normal case of non-consecutive allocations.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> Changes RFC -> V1:
>>     - no changes
>>     Changes V1 -> V2:
>>     - no changes
>
>
> May I please ask for the review here?



I had a quick look but I am stuck on get_free_seq(), I need to stare at it some more. Unless someone else reviews this, I will try to get to this in the next couple of days.


One thing I did notice is


>
>> @@ -1452,6 +1624,13 @@ int gnttab_init(void)
>>           }
>>       }
>>   +    i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
>> +    gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
>> +    if (!gnttab_free_bitmap) {
>> +        ret = -ENOMEM;
>> +        goto ini_nomem;
>> +    }


This overwrites 'i' and will break error handling at ini_nomem.


-boris



2022-05-12 17:13:50

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants


On 07.05.22 21:19, Oleksandr Tyshchenko wrote:

Hello Boris, Stefano


> From: Juergen Gross <[email protected]>
>
> For support of virtio via grant mappings in rare cases larger mappings
> using consecutive grants are needed. Support those by adding a bitmap
> of free grants.
>
> As consecutive grants will be needed only in very rare cases (e.g. when
> configuring a virtio device with a multi-page ring), optimize for the
> normal case of non-consecutive allocations.
>
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> Changes RFC -> V1:
> - no changes
>
> Changes V1 -> V2:
> - no changes


May I please ask for the review here?


This patch has been tested in various scenarios:

1. Guest with Xen PV drivers only (gnttab_alloc(free)_grant_reference()
usage only)

2. Guest with Virtio drivers only
(gnttab_alloc(free)_grant_reference_seq() usage only)

3. Guest with Virtio and Xen PV drivers (combined
gnttab_alloc(free)_grant_reference() and
gnttab_alloc(free)_grant_reference_seq() usage)


> ---
> drivers/xen/grant-table.c | 238 +++++++++++++++++++++++++++++++++++++++-------
> include/xen/grant_table.h | 4 +
> 2 files changed, 210 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 8ccccac..1b458c0 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -33,6 +33,7 @@
>
> #define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt
>
> +#include <linux/bitmap.h>
> #include <linux/memblock.h>
> #include <linux/sched.h>
> #include <linux/mm.h>
> @@ -72,9 +73,32 @@
>
> static grant_ref_t **gnttab_list;
> static unsigned int nr_grant_frames;
> +
> +/*
> + * Handling of free grants:
> + *
> + * Free grants are in a simple list anchored in gnttab_free_head. They are
> + * linked by grant ref, the last element contains GNTTAB_LIST_END. The number
> + * of free entries is stored in gnttab_free_count.
> + * Additionally there is a bitmap of free entries anchored in
> + * gnttab_free_bitmap. This is being used for simplifying allocation of
> + * multiple consecutive grants, which is needed e.g. for support of virtio.
> + * gnttab_last_free is used to add free entries of new frames at the end of
> + * the free list.
> + * gnttab_free_tail_ptr specifies the variable which references the start
> + * of consecutive free grants ending with gnttab_last_free. This pointer is
> + * updated in a rather defensive way, in order to avoid performance hits in
> + * hot paths.
> + * All those variables are protected by gnttab_list_lock.
> + */
> static int gnttab_free_count;
> -static grant_ref_t gnttab_free_head;
> +static unsigned int gnttab_size;
> +static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
> +static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
> +static grant_ref_t *gnttab_free_tail_ptr;
> +static unsigned long *gnttab_free_bitmap;
> static DEFINE_SPINLOCK(gnttab_list_lock);
> +
> struct grant_frames xen_auto_xlat_grant_frames;
> static unsigned int xen_gnttab_version;
> module_param_named(version, xen_gnttab_version, uint, 0);
> @@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)
>
> ref = head = gnttab_free_head;
> gnttab_free_count -= count;
> - while (count-- > 1)
> - head = gnttab_entry(head);
> + while (count--) {
> + bitmap_clear(gnttab_free_bitmap, head, 1);
> + if (gnttab_free_tail_ptr == __gnttab_entry(head))
> + gnttab_free_tail_ptr = &gnttab_free_head;
> + if (count)
> + head = gnttab_entry(head);
> + }
> gnttab_free_head = gnttab_entry(head);
> gnttab_entry(head) = GNTTAB_LIST_END;
>
> + if (!gnttab_free_count) {
> + gnttab_last_free = GNTTAB_LIST_END;
> + gnttab_free_tail_ptr = NULL;
> + }
> +
> spin_unlock_irqrestore(&gnttab_list_lock, flags);
>
> return ref;
> }
>
> +static int get_seq_entry_count(void)
> +{
> + if (gnttab_last_free == GNTTAB_LIST_END || !gnttab_free_tail_ptr ||
> + *gnttab_free_tail_ptr == GNTTAB_LIST_END)
> + return 0;
> +
> + return gnttab_last_free - *gnttab_free_tail_ptr + 1;
> +}
> +
> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
> +static int get_free_seq(unsigned int count)
> +{
> + int ret = -ENOSPC;
> + unsigned int from, to;
> + grant_ref_t *last;
> +
> + gnttab_free_tail_ptr = &gnttab_free_head;
> + last = &gnttab_free_head;
> +
> + for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
> + from < gnttab_size;
> + from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
> + to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
> + from + 1);
> + if (ret < 0 && to - from >= count) {
> + ret = from;
> + bitmap_clear(gnttab_free_bitmap, ret, count);
> + from += count;
> + gnttab_free_count -= count;
> + if (from == to)
> + continue;
> + }
> +
> + while (from < to) {
> + *last = from;
> + last = __gnttab_entry(from);
> + gnttab_last_free = from;
> + from++;
> + }
> + if (to < gnttab_size)
> + gnttab_free_tail_ptr = __gnttab_entry(to - 1);
> + }
> +
> + *last = GNTTAB_LIST_END;
> + if (gnttab_last_free != gnttab_size - 1)
> + gnttab_free_tail_ptr = NULL;
> +
> + return ret;
> +}
> +
> +static int get_free_entries_seq(unsigned int count)
> +{
> + unsigned long flags;
> + int ret = 0;
> +
> + spin_lock_irqsave(&gnttab_list_lock, flags);
> +
> + if (gnttab_free_count < count) {
> + ret = gnttab_expand(count - gnttab_free_count);
> + if (ret < 0)
> + goto out;
> + }
> +
> + if (get_seq_entry_count() < count) {
> + ret = get_free_seq(count);
> + if (ret >= 0)
> + goto out;
> + ret = gnttab_expand(count - get_seq_entry_count());
> + if (ret < 0)
> + goto out;
> + }
> +
> + ret = *gnttab_free_tail_ptr;
> + *gnttab_free_tail_ptr = gnttab_entry(ret + count - 1);
> + gnttab_free_count -= count;
> + if (!gnttab_free_count)
> + gnttab_free_tail_ptr = NULL;
> + bitmap_clear(gnttab_free_bitmap, ret, count);
> +
> + out:
> + spin_unlock_irqrestore(&gnttab_list_lock, flags);
> +
> + return ret;
> +}
> +
> static void do_free_callbacks(void)
> {
> struct gnttab_free_callback *callback, *next;
> @@ -206,17 +325,48 @@ static inline void check_free_callbacks(void)
> do_free_callbacks();
> }
>
> -static void put_free_entry(grant_ref_t ref)
> +static void put_free_entry_locked(grant_ref_t ref)
> {
> - unsigned long flags;
> - spin_lock_irqsave(&gnttab_list_lock, flags);
> gnttab_entry(ref) = gnttab_free_head;
> gnttab_free_head = ref;
> + if (!gnttab_free_count)
> + gnttab_last_free = ref;
> + if (gnttab_free_tail_ptr == &gnttab_free_head)
> + gnttab_free_tail_ptr = __gnttab_entry(ref);
> gnttab_free_count++;
> + bitmap_set(gnttab_free_bitmap, ref, 1);
> +}
> +
> +static void put_free_entry(grant_ref_t ref)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&gnttab_list_lock, flags);
> + put_free_entry_locked(ref);
> check_free_callbacks();
> spin_unlock_irqrestore(&gnttab_list_lock, flags);
> }
>
> +static void gnttab_set_free(unsigned int start, unsigned int n)
> +{
> + unsigned int i;
> +
> + for (i = start; i < start + n - 1; i++)
> + gnttab_entry(i) = i + 1;
> +
> + gnttab_entry(i) = GNTTAB_LIST_END;
> + if (!gnttab_free_count) {
> + gnttab_free_head = start;
> + gnttab_free_tail_ptr = &gnttab_free_head;
> + } else {
> + gnttab_entry(gnttab_last_free) = start;
> + }
> + gnttab_free_count += n;
> + gnttab_last_free = i;
> +
> + bitmap_set(gnttab_free_bitmap, start, n);
> +}
> +
> /*
> * Following applies to gnttab_update_entry_v1 and gnttab_update_entry_v2.
> * Introducing a valid entry into the grant table:
> @@ -448,23 +598,31 @@ void gnttab_free_grant_references(grant_ref_t head)
> {
> grant_ref_t ref;
> unsigned long flags;
> - int count = 1;
> - if (head == GNTTAB_LIST_END)
> - return;
> +
> spin_lock_irqsave(&gnttab_list_lock, flags);
> - ref = head;
> - while (gnttab_entry(ref) != GNTTAB_LIST_END) {
> - ref = gnttab_entry(ref);
> - count++;
> + while (head != GNTTAB_LIST_END) {
> + ref = gnttab_entry(head);
> + put_free_entry_locked(head);
> + head = ref;
> }
> - gnttab_entry(ref) = gnttab_free_head;
> - gnttab_free_head = head;
> - gnttab_free_count += count;
> check_free_callbacks();
> spin_unlock_irqrestore(&gnttab_list_lock, flags);
> }
> EXPORT_SYMBOL_GPL(gnttab_free_grant_references);
>
> +void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count)
> +{
> + unsigned long flags;
> + unsigned int i;
> +
> + spin_lock_irqsave(&gnttab_list_lock, flags);
> + for (i = count; i > 0; i--)
> + put_free_entry_locked(head + i - 1);
> + check_free_callbacks();
> + spin_unlock_irqrestore(&gnttab_list_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_free_grant_reference_seq);
> +
> int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
> {
> int h = get_free_entries(count);
> @@ -478,6 +636,24 @@ int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
> }
> EXPORT_SYMBOL_GPL(gnttab_alloc_grant_references);
>
> +int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first)
> +{
> + int h;
> +
> + if (count == 1)
> + h = get_free_entries(1);
> + else
> + h = get_free_entries_seq(count);
> +
> + if (h < 0)
> + return -ENOSPC;
> +
> + *first = h;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_alloc_grant_reference_seq);
> +
> int gnttab_empty_grant_references(const grant_ref_t *private_head)
> {
> return (*private_head == GNTTAB_LIST_END);
> @@ -570,16 +746,13 @@ static int grow_gnttab_list(unsigned int more_frames)
> goto grow_nomem;
> }
>
> + gnttab_set_free(gnttab_size, extra_entries);
>
> - for (i = grefs_per_frame * nr_grant_frames;
> - i < grefs_per_frame * new_nr_grant_frames - 1; i++)
> - gnttab_entry(i) = i + 1;
> -
> - gnttab_entry(i) = gnttab_free_head;
> - gnttab_free_head = grefs_per_frame * nr_grant_frames;
> - gnttab_free_count += extra_entries;
> + if (!gnttab_free_tail_ptr)
> + gnttab_free_tail_ptr = __gnttab_entry(gnttab_size);
>
> nr_grant_frames = new_nr_grant_frames;
> + gnttab_size += extra_entries;
>
> check_free_callbacks();
>
> @@ -1424,7 +1597,6 @@ int gnttab_init(void)
> int i;
> unsigned long max_nr_grant_frames;
> unsigned int max_nr_glist_frames, nr_glist_frames;
> - unsigned int nr_init_grefs;
> int ret;
>
> gnttab_request_version();
> @@ -1452,6 +1624,13 @@ int gnttab_init(void)
> }
> }
>
> + i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
> + gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
> + if (!gnttab_free_bitmap) {
> + ret = -ENOMEM;
> + goto ini_nomem;
> + }
> +
> ret = arch_gnttab_init(max_nr_grant_frames,
> nr_status_frames(max_nr_grant_frames));
> if (ret < 0)
> @@ -1462,15 +1641,9 @@ int gnttab_init(void)
> goto ini_nomem;
> }
>
> - nr_init_grefs = nr_grant_frames *
> - gnttab_interface->grefs_per_grant_frame;
> -
> - for (i = NR_RESERVED_ENTRIES; i < nr_init_grefs - 1; i++)
> - gnttab_entry(i) = i + 1;
> + gnttab_size = nr_grant_frames * gnttab_interface->grefs_per_grant_frame;
>
> - gnttab_entry(nr_init_grefs - 1) = GNTTAB_LIST_END;
> - gnttab_free_count = nr_init_grefs - NR_RESERVED_ENTRIES;
> - gnttab_free_head = NR_RESERVED_ENTRIES;
> + gnttab_set_free(NR_RESERVED_ENTRIES, gnttab_size - NR_RESERVED_ENTRIES);
>
> printk("Grant table initialized\n");
> return 0;
> @@ -1479,6 +1652,7 @@ int gnttab_init(void)
> for (i--; i >= 0; i--)
> free_page((unsigned long)gnttab_list[i]);
> kfree(gnttab_list);
> + bitmap_free(gnttab_free_bitmap);
> return ret;
> }
> EXPORT_SYMBOL_GPL(gnttab_init);
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index dfd5bf3..d815e1d 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -129,10 +129,14 @@ int gnttab_try_end_foreign_access(grant_ref_t ref);
> */
> int gnttab_alloc_grant_references(u16 count, grant_ref_t *pprivate_head);
>
> +int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first);
> +
> void gnttab_free_grant_reference(grant_ref_t ref);
>
> void gnttab_free_grant_references(grant_ref_t head);
>
> +void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count);
> +
> int gnttab_empty_grant_references(const grant_ref_t *pprivate_head);
>
> int gnttab_claim_grant_reference(grant_ref_t *pprivate_head);

--
Regards,

Oleksandr Tyshchenko


2022-05-13 22:50:06

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants

On 12.05.22 22:01, Boris Ostrovsky wrote:
>
> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
>> +
>> +/*
>> + * Handling of free grants:
>> + *
>> + * Free grants are in a simple list anchored in gnttab_free_head. They are
>> + * linked by grant ref, the last element contains GNTTAB_LIST_END. The number
>> + * of free entries is stored in gnttab_free_count.
>> + * Additionally there is a bitmap of free entries anchored in
>> + * gnttab_free_bitmap. This is being used for simplifying allocation of
>> + * multiple consecutive grants, which is needed e.g. for support of virtio.
>> + * gnttab_last_free is used to add free entries of new frames at the end of
>> + * the free list.
>> + * gnttab_free_tail_ptr specifies the variable which references the start
>
>
> If this references the start of the free interval, why is it called
> gnttab_free_*tail*_ptr?

Because this is the tail of the whole area which is free.

It could be renamed to gnttab_free_tail_start_ptr. :-)

>
>
>> + * of consecutive free grants ending with gnttab_last_free. This pointer is
>> + * updated in a rather defensive way, in order to avoid performance hits in
>> + * hot paths.
>> + * All those variables are protected by gnttab_list_lock.
>> + */
>>   static int gnttab_free_count;
>> -static grant_ref_t gnttab_free_head;
>> +static unsigned int gnttab_size;
>> +static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
>> +static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
>> +static grant_ref_t *gnttab_free_tail_ptr;
>> +static unsigned long *gnttab_free_bitmap;
>>   static DEFINE_SPINLOCK(gnttab_list_lock);
>> +
>>   struct grant_frames xen_auto_xlat_grant_frames;
>>   static unsigned int xen_gnttab_version;
>>   module_param_named(version, xen_gnttab_version, uint, 0);
>> @@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)
>>       ref = head = gnttab_free_head;
>>       gnttab_free_count -= count;
>> -    while (count-- > 1)
>> -        head = gnttab_entry(head);
>> +    while (count--) {
>> +        bitmap_clear(gnttab_free_bitmap, head, 1);
>> +        if (gnttab_free_tail_ptr == __gnttab_entry(head))
>> +            gnttab_free_tail_ptr = &gnttab_free_head;
>> +        if (count)
>> +            head = gnttab_entry(head);
>> +    }
>>       gnttab_free_head = gnttab_entry(head);
>>       gnttab_entry(head) = GNTTAB_LIST_END;
>> +    if (!gnttab_free_count) {
>> +        gnttab_last_free = GNTTAB_LIST_END;
>> +        gnttab_free_tail_ptr = NULL;
>> +    }
>> +
>>       spin_unlock_irqrestore(&gnttab_list_lock, flags);
>>       return ref;
>>   }
>> +static int get_seq_entry_count(void)
>> +{
>> +    if (gnttab_last_free == GNTTAB_LIST_END || !gnttab_free_tail_ptr ||
>> +        *gnttab_free_tail_ptr == GNTTAB_LIST_END)
>> +        return 0;
>> +
>> +    return gnttab_last_free - *gnttab_free_tail_ptr + 1;
>> +}
>> +
>> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
>> +static int get_free_seq(unsigned int count)
>> +{
>> +    int ret = -ENOSPC;
>> +    unsigned int from, to;
>> +    grant_ref_t *last;
>> +
>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>> +    last = &gnttab_free_head;
>> +
>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>> +         from < gnttab_size;
>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>> +                    from + 1);
>> +        if (ret < 0 && to - from >= count) {
>> +            ret = from;
>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>> +            from += count;
>> +            gnttab_free_count -= count;
>
>
> IIUIC we can have multiple passes over this, meaning that the gnttab_free_count
> may be decremented more than once. Is that intentional?

After the first pass decrementing gnttab_free_cnt, ret will no
longer be less than zero, so this can be hit only once.

>
>
>> +            if (from == to)
>> +                continue;
>> +        }
>> +
>> +        while (from < to) {
>> +            *last = from;
>> +            last = __gnttab_entry(from);
>> +            gnttab_last_free = from;
>> +            from++;
>> +        }
>
>
> I have been looking at this loop and I can't understand what it is doing ;-( Can
> you enlighten me?

It is recreating the free list in order to have it properly sorted.
This is needed to make sure that the free tail has the maximum
possible size (you can take the tail off the list without having
to worry about breaking the linked list because of references into
the tail).


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments

2022-05-14 01:21:01

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants


On 13.05.22 08:33, Juergen Gross wrote:
> On 12.05.22 22:01, Boris Ostrovsky wrote:


Hello Juergen, Boris


[Juergen, thank you for your explanation]


>
>>
>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
>>> +
>>> +/*
>>> + * Handling of free grants:
>>> + *
>>> + * Free grants are in a simple list anchored in gnttab_free_head.
>>> They are
>>> + * linked by grant ref, the last element contains GNTTAB_LIST_END.
>>> The number
>>> + * of free entries is stored in gnttab_free_count.
>>> + * Additionally there is a bitmap of free entries anchored in
>>> + * gnttab_free_bitmap. This is being used for simplifying
>>> allocation of
>>> + * multiple consecutive grants, which is needed e.g. for support of
>>> virtio.
>>> + * gnttab_last_free is used to add free entries of new frames at
>>> the end of
>>> + * the free list.
>>> + * gnttab_free_tail_ptr specifies the variable which references the
>>> start
>>
>>
>> If this references the start of the free interval, why is it called
>> gnttab_free_*tail*_ptr?
>
> Because this is the tail of the whole area which is free.
>
> It could be renamed to gnttab_free_tail_start_ptr. :-)
>
>>
>>
>>> + * of consecutive free grants ending with gnttab_last_free. This
>>> pointer is
>>> + * updated in a rather defensive way, in order to avoid performance
>>> hits in
>>> + * hot paths.
>>> + * All those variables are protected by gnttab_list_lock.
>>> + */
>>>   static int gnttab_free_count;
>>> -static grant_ref_t gnttab_free_head;
>>> +static unsigned int gnttab_size;
>>> +static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
>>> +static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
>>> +static grant_ref_t *gnttab_free_tail_ptr;
>>> +static unsigned long *gnttab_free_bitmap;
>>>   static DEFINE_SPINLOCK(gnttab_list_lock);
>>> +
>>>   struct grant_frames xen_auto_xlat_grant_frames;
>>>   static unsigned int xen_gnttab_version;
>>>   module_param_named(version, xen_gnttab_version, uint, 0);
>>> @@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)
>>>       ref = head = gnttab_free_head;
>>>       gnttab_free_count -= count;
>>> -    while (count-- > 1)
>>> -        head = gnttab_entry(head);
>>> +    while (count--) {
>>> +        bitmap_clear(gnttab_free_bitmap, head, 1);
>>> +        if (gnttab_free_tail_ptr == __gnttab_entry(head))
>>> +            gnttab_free_tail_ptr = &gnttab_free_head;
>>> +        if (count)
>>> +            head = gnttab_entry(head);
>>> +    }
>>>       gnttab_free_head = gnttab_entry(head);
>>>       gnttab_entry(head) = GNTTAB_LIST_END;
>>> +    if (!gnttab_free_count) {
>>> +        gnttab_last_free = GNTTAB_LIST_END;
>>> +        gnttab_free_tail_ptr = NULL;
>>> +    }
>>> +
>>>       spin_unlock_irqrestore(&gnttab_list_lock, flags);
>>>       return ref;
>>>   }
>>> +static int get_seq_entry_count(void)
>>> +{
>>> +    if (gnttab_last_free == GNTTAB_LIST_END ||
>>> !gnttab_free_tail_ptr ||
>>> +        *gnttab_free_tail_ptr == GNTTAB_LIST_END)
>>> +        return 0;
>>> +
>>> +    return gnttab_last_free - *gnttab_free_tail_ptr + 1;
>>> +}
>>> +
>>> +/* Rebuilds the free grant list and tries to find count consecutive
>>> entries. */
>>> +static int get_free_seq(unsigned int count)
>>> +{
>>> +    int ret = -ENOSPC;
>>> +    unsigned int from, to;
>>> +    grant_ref_t *last;
>>> +
>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>> +    last = &gnttab_free_head;
>>> +
>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>> +         from < gnttab_size;
>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to +
>>> 1)) {
>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>> +                    from + 1);
>>> +        if (ret < 0 && to - from >= count) {
>>> +            ret = from;
>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>> +            from += count;
>>> +            gnttab_free_count -= count;
>>
>>
>> IIUIC we can have multiple passes over this, meaning that the
>> gnttab_free_count may be decremented more than once. Is that
>> intentional?
>
> After the first pass decrementing gnttab_free_cnt, ret will no
> longer be less than zero, so this can be hit only once.
>
>>
>>
>>> +            if (from == to)
>>> +                continue;
>>> +        }
>>> +
>>> +        while (from < to) {
>>> +            *last = from;
>>> +            last = __gnttab_entry(from);
>>> +            gnttab_last_free = from;
>>> +            from++;
>>> +        }
>>
>>
>> I have been looking at this loop and I can't understand what it is
>> doing ;-( Can you enlighten me?
>
> It is recreating the free list in order to have it properly sorted.
> This is needed to make sure that the free tail has the maximum
> possible size (you can take the tail off the list without having
> to worry about breaking the linked list because of references into
> the tail).

I think the loop deserves a comment on top, I will add it.



>
>
>
> Juergen

--
Regards,

Oleksandr Tyshchenko


2022-05-14 01:44:04

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants


On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
> +
> +/*
> + * Handling of free grants:
> + *
> + * Free grants are in a simple list anchored in gnttab_free_head. They are
> + * linked by grant ref, the last element contains GNTTAB_LIST_END. The number
> + * of free entries is stored in gnttab_free_count.
> + * Additionally there is a bitmap of free entries anchored in
> + * gnttab_free_bitmap. This is being used for simplifying allocation of
> + * multiple consecutive grants, which is needed e.g. for support of virtio.
> + * gnttab_last_free is used to add free entries of new frames at the end of
> + * the free list.
> + * gnttab_free_tail_ptr specifies the variable which references the start


If this references the start of the free interval, why is it called gnttab_free_*tail*_ptr?


> + * of consecutive free grants ending with gnttab_last_free. This pointer is
> + * updated in a rather defensive way, in order to avoid performance hits in
> + * hot paths.
> + * All those variables are protected by gnttab_list_lock.
> + */
> static int gnttab_free_count;
> -static grant_ref_t gnttab_free_head;
> +static unsigned int gnttab_size;
> +static grant_ref_t gnttab_free_head = GNTTAB_LIST_END;
> +static grant_ref_t gnttab_last_free = GNTTAB_LIST_END;
> +static grant_ref_t *gnttab_free_tail_ptr;
> +static unsigned long *gnttab_free_bitmap;
> static DEFINE_SPINLOCK(gnttab_list_lock);
> +
> struct grant_frames xen_auto_xlat_grant_frames;
> static unsigned int xen_gnttab_version;
> module_param_named(version, xen_gnttab_version, uint, 0);
> @@ -170,16 +194,111 @@ static int get_free_entries(unsigned count)
>
> ref = head = gnttab_free_head;
> gnttab_free_count -= count;
> - while (count-- > 1)
> - head = gnttab_entry(head);
> + while (count--) {
> + bitmap_clear(gnttab_free_bitmap, head, 1);
> + if (gnttab_free_tail_ptr == __gnttab_entry(head))
> + gnttab_free_tail_ptr = &gnttab_free_head;
> + if (count)
> + head = gnttab_entry(head);
> + }
> gnttab_free_head = gnttab_entry(head);
> gnttab_entry(head) = GNTTAB_LIST_END;
>
> + if (!gnttab_free_count) {
> + gnttab_last_free = GNTTAB_LIST_END;
> + gnttab_free_tail_ptr = NULL;
> + }
> +
> spin_unlock_irqrestore(&gnttab_list_lock, flags);
>
> return ref;
> }
>
> +static int get_seq_entry_count(void)
> +{
> + if (gnttab_last_free == GNTTAB_LIST_END || !gnttab_free_tail_ptr ||
> + *gnttab_free_tail_ptr == GNTTAB_LIST_END)
> + return 0;
> +
> + return gnttab_last_free - *gnttab_free_tail_ptr + 1;
> +}
> +
> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
> +static int get_free_seq(unsigned int count)
> +{
> + int ret = -ENOSPC;
> + unsigned int from, to;
> + grant_ref_t *last;
> +
> + gnttab_free_tail_ptr = &gnttab_free_head;
> + last = &gnttab_free_head;
> +
> + for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
> + from < gnttab_size;
> + from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
> + to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
> + from + 1);
> + if (ret < 0 && to - from >= count) {
> + ret = from;
> + bitmap_clear(gnttab_free_bitmap, ret, count);
> + from += count;
> + gnttab_free_count -= count;


IIUIC we can have multiple passes over this, meaning that the gnttab_free_count may be decremented more than once. Is that intentional?


> + if (from == to)
> + continue;
> + }
> +
> + while (from < to) {
> + *last = from;
> + last = __gnttab_entry(from);
> + gnttab_last_free = from;
> + from++;
> + }


I have been looking at this loop and I can't understand what it is doing ;-( Can you enlighten me?



-boris



> + if (to < gnttab_size)
> + gnttab_free_tail_ptr = __gnttab_entry(to - 1);
> + }
> +
> + *last = GNTTAB_LIST_END;
> + if (gnttab_last_free != gnttab_size - 1)
> + gnttab_free_tail_ptr = NULL;
> +
> + return ret;
> +}
> +
> +static int get_free_entries_seq(unsigned int count)
> +{
> + unsigned long flags;
> + int ret = 0;
> +
> + spin_lock_irqsave(&gnttab_list_lock, flags);
> +
> + if (gnttab_free_count < count) {
> + ret = gnttab_expand(count - gnttab_free_count);
> + if (ret < 0)
> + goto out;
> + }
> +
> + if (get_seq_entry_count() < count) {
> + ret = get_free_seq(count);
> + if (ret >= 0)
> + goto out;
> + ret = gnttab_expand(count - get_seq_entry_count());
> + if (ret < 0)
> + goto out;
> + }
> +
> + ret = *gnttab_free_tail_ptr;
> + *gnttab_free_tail_ptr = gnttab_entry(ret + count - 1);
> + gnttab_free_count -= count;
> + if (!gnttab_free_count)
> + gnttab_free_tail_ptr = NULL;
> + bitmap_clear(gnttab_free_bitmap, ret, count);
> +
> + out:
> + spin_unlock_irqrestore(&gnttab_list_lock, flags);
> +
> + return ret;
> +}
> +
> static void do_free_callbacks(void)
> {
> struct gnttab_free_callback *callback, *next;
> @@ -206,17 +325,48 @@ static inline void check_free_callbacks(void)
> do_free_callbacks();
> }
>
> -static void put_free_entry(grant_ref_t ref)
> +static void put_free_entry_locked(grant_ref_t ref)
> {
> - unsigned long flags;
> - spin_lock_irqsave(&gnttab_list_lock, flags);
> gnttab_entry(ref) = gnttab_free_head;
> gnttab_free_head = ref;
> + if (!gnttab_free_count)
> + gnttab_last_free = ref;
> + if (gnttab_free_tail_ptr == &gnttab_free_head)
> + gnttab_free_tail_ptr = __gnttab_entry(ref);
> gnttab_free_count++;
> + bitmap_set(gnttab_free_bitmap, ref, 1);
> +}
> +
> +static void put_free_entry(grant_ref_t ref)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&gnttab_list_lock, flags);
> + put_free_entry_locked(ref);
> check_free_callbacks();
> spin_unlock_irqrestore(&gnttab_list_lock, flags);
> }
>
> +static void gnttab_set_free(unsigned int start, unsigned int n)
> +{
> + unsigned int i;
> +
> + for (i = start; i < start + n - 1; i++)
> + gnttab_entry(i) = i + 1;
> +
> + gnttab_entry(i) = GNTTAB_LIST_END;
> + if (!gnttab_free_count) {
> + gnttab_free_head = start;
> + gnttab_free_tail_ptr = &gnttab_free_head;
> + } else {
> + gnttab_entry(gnttab_last_free) = start;
> + }
> + gnttab_free_count += n;
> + gnttab_last_free = i;
> +
> + bitmap_set(gnttab_free_bitmap, start, n);
> +}
> +
> /*
> * Following applies to gnttab_update_entry_v1 and gnttab_update_entry_v2.
> * Introducing a valid entry into the grant table:
> @@ -448,23 +598,31 @@ void gnttab_free_grant_references(grant_ref_t head)
> {
> grant_ref_t ref;
> unsigned long flags;
> - int count = 1;
> - if (head == GNTTAB_LIST_END)
> - return;
> +
> spin_lock_irqsave(&gnttab_list_lock, flags);
> - ref = head;
> - while (gnttab_entry(ref) != GNTTAB_LIST_END) {
> - ref = gnttab_entry(ref);
> - count++;
> + while (head != GNTTAB_LIST_END) {
> + ref = gnttab_entry(head);
> + put_free_entry_locked(head);
> + head = ref;
> }
> - gnttab_entry(ref) = gnttab_free_head;
> - gnttab_free_head = head;
> - gnttab_free_count += count;
> check_free_callbacks();
> spin_unlock_irqrestore(&gnttab_list_lock, flags);
> }
> EXPORT_SYMBOL_GPL(gnttab_free_grant_references);
>
> +void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count)
> +{
> + unsigned long flags;
> + unsigned int i;
> +
> + spin_lock_irqsave(&gnttab_list_lock, flags);
> + for (i = count; i > 0; i--)
> + put_free_entry_locked(head + i - 1);
> + check_free_callbacks();
> + spin_unlock_irqrestore(&gnttab_list_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_free_grant_reference_seq);
> +
> int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
> {
> int h = get_free_entries(count);
> @@ -478,6 +636,24 @@ int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
> }
> EXPORT_SYMBOL_GPL(gnttab_alloc_grant_references);
>
> +int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first)
> +{
> + int h;
> +
> + if (count == 1)
> + h = get_free_entries(1);
> + else
> + h = get_free_entries_seq(count);
> +
> + if (h < 0)
> + return -ENOSPC;
> +
> + *first = h;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_alloc_grant_reference_seq);
> +
> int gnttab_empty_grant_references(const grant_ref_t *private_head)
> {
> return (*private_head == GNTTAB_LIST_END);
> @@ -570,16 +746,13 @@ static int grow_gnttab_list(unsigned int more_frames)
> goto grow_nomem;
> }
>
> + gnttab_set_free(gnttab_size, extra_entries);
>
> - for (i = grefs_per_frame * nr_grant_frames;
> - i < grefs_per_frame * new_nr_grant_frames - 1; i++)
> - gnttab_entry(i) = i + 1;
> -
> - gnttab_entry(i) = gnttab_free_head;
> - gnttab_free_head = grefs_per_frame * nr_grant_frames;
> - gnttab_free_count += extra_entries;
> + if (!gnttab_free_tail_ptr)
> + gnttab_free_tail_ptr = __gnttab_entry(gnttab_size);
>
> nr_grant_frames = new_nr_grant_frames;
> + gnttab_size += extra_entries;
>
> check_free_callbacks();
>
> @@ -1424,7 +1597,6 @@ int gnttab_init(void)
> int i;
> unsigned long max_nr_grant_frames;
> unsigned int max_nr_glist_frames, nr_glist_frames;
> - unsigned int nr_init_grefs;
> int ret;
>
> gnttab_request_version();
> @@ -1452,6 +1624,13 @@ int gnttab_init(void)
> }
> }
>
> + i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
> + gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
> + if (!gnttab_free_bitmap) {
> + ret = -ENOMEM;
> + goto ini_nomem;
> + }
> +
> ret = arch_gnttab_init(max_nr_grant_frames,
> nr_status_frames(max_nr_grant_frames));
> if (ret < 0)
> @@ -1462,15 +1641,9 @@ int gnttab_init(void)
> goto ini_nomem;
> }
>
> - nr_init_grefs = nr_grant_frames *
> - gnttab_interface->grefs_per_grant_frame;
> -
> - for (i = NR_RESERVED_ENTRIES; i < nr_init_grefs - 1; i++)
> - gnttab_entry(i) = i + 1;
> + gnttab_size = nr_grant_frames * gnttab_interface->grefs_per_grant_frame;
>
> - gnttab_entry(nr_init_grefs - 1) = GNTTAB_LIST_END;
> - gnttab_free_count = nr_init_grefs - NR_RESERVED_ENTRIES;
> - gnttab_free_head = NR_RESERVED_ENTRIES;
> + gnttab_set_free(NR_RESERVED_ENTRIES, gnttab_size - NR_RESERVED_ENTRIES);
>
> printk("Grant table initialized\n");
> return 0;
> @@ -1479,6 +1652,7 @@ int gnttab_init(void)
> for (i--; i >= 0; i--)
> free_page((unsigned long)gnttab_list[i]);
> kfree(gnttab_list);
> + bitmap_free(gnttab_free_bitmap);
> return ret;
> }
> EXPORT_SYMBOL_GPL(gnttab_init);
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index dfd5bf3..d815e1d 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -129,10 +129,14 @@ int gnttab_try_end_foreign_access(grant_ref_t ref);
> */
> int gnttab_alloc_grant_references(u16 count, grant_ref_t *pprivate_head);
>
> +int gnttab_alloc_grant_reference_seq(unsigned int count, grant_ref_t *first);
> +
> void gnttab_free_grant_reference(grant_ref_t ref);
>
> void gnttab_free_grant_references(grant_ref_t head);
>
> +void gnttab_free_grant_reference_seq(grant_ref_t head, unsigned int count);
> +
> int gnttab_empty_grant_references(const grant_ref_t *pprivate_head);
>
> int gnttab_claim_grant_reference(grant_ref_t *pprivate_head);

2022-05-14 01:51:51

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants


On 12.05.22 00:09, Boris Ostrovsky wrote:


Hello Boris


>
> On 5/11/22 2:00 PM, Oleksandr wrote:
>>
>> On 07.05.22 21:19, Oleksandr Tyshchenko wrote:
>>
>> Hello Boris, Stefano
>>
>>
>>> From: Juergen Gross <[email protected]>
>>>
>>> For support of virtio via grant mappings in rare cases larger mappings
>>> using consecutive grants are needed. Support those by adding a bitmap
>>> of free grants.
>>>
>>> As consecutive grants will be needed only in very rare cases (e.g. when
>>> configuring a virtio device with a multi-page ring), optimize for the
>>> normal case of non-consecutive allocations.
>>>
>>> Signed-off-by: Juergen Gross <[email protected]>
>>> ---
>>> Changes RFC -> V1:
>>>     - no changes
>>>     Changes V1 -> V2:
>>>     - no changes
>>
>>
>> May I please ask for the review here?
>
>
>
> I had a quick look but I am stuck on get_free_seq(), I need to stare
> at it some more. Unless someone else reviews this, I will try to get
> to this in the next couple of days.


Thank you!


>
>
>
> One thing I did notice is
>
>
>>
>>> @@ -1452,6 +1624,13 @@ int gnttab_init(void)
>>>           }
>>>       }
>>>   +    i = gnttab_interface->grefs_per_grant_frame *
>>> max_nr_grant_frames;
>>> +    gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
>>> +    if (!gnttab_free_bitmap) {
>>> +        ret = -ENOMEM;
>>> +        goto ini_nomem;
>>> +    }
>
>
> This overwrites 'i' and will break error handling at ini_nomem.


Indeed, will fix. Thank you for pointing this out.


>
>
> -boris
>
>
--
Regards,

Oleksandr Tyshchenko


2022-05-14 04:32:41

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants



On 5/13/22 1:33 AM, Juergen Gross wrote:
> On 12.05.22 22:01, Boris Ostrovsky wrote:
>>
>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:

>>> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
>>> +static int get_free_seq(unsigned int count)
>>> +{
>>> +    int ret = -ENOSPC;
>>> +    unsigned int from, to;
>>> +    grant_ref_t *last;
>>> +
>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>> +    last = &gnttab_free_head;
>>> +
>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>> +         from < gnttab_size;
>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>> +                    from + 1);
>>> +        if (ret < 0 && to - from >= count) {
>>> +            ret = from;
>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>> +            from += count;
>>> +            gnttab_free_count -= count;
>>
>>
>> IIUIC we can have multiple passes over this, meaning that the gnttab_free_count may be decremented more than once. Is that intentional?
>
> After the first pass decrementing gnttab_free_cnt, ret will no
> longer be less than zero, so this can be hit only once.

Oh, yes, of course.

>
>>
>>
>>> +            if (from == to)
>>> +                continue;
>>> +        }
>>> +
>>> +        while (from < to) {
>>> +            *last = from;
>>> +            last = __gnttab_entry(from);
>>> +            gnttab_last_free = from;
>>> +            from++;
>>> +        }
>>
>>
>> I have been looking at this loop and I can't understand what it is doing ;-( Can you enlighten me?
>
> It is recreating the free list in order to have it properly sorted.
> This is needed to make sure that the free tail has the maximum
> possible size (you can take the tail off the list without having
> to worry about breaking the linked list because of references into
> the tail).


So let's say we have the (one-dimensional) table of length 13

idx .. 2 3 ... 10 11 12

grant 12 11 2 -1 3


and gnttab_free_head is 10. I.e. the free list is 2, 12, 3, 11.

What will this look like after the 2 iterations of the outer loop?

(I am really having a mental block on this).



-boris


2022-05-16 09:28:47

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants

On 14.05.22 04:34, Boris Ostrovsky wrote:
>
>
> On 5/13/22 1:33 AM, Juergen Gross wrote:
>> On 12.05.22 22:01, Boris Ostrovsky wrote:
>>>
>>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
>
>>>> +/* Rebuilds the free grant list and tries to find count consecutive
>>>> entries. */
>>>> +static int get_free_seq(unsigned int count)
>>>> +{
>>>> +    int ret = -ENOSPC;
>>>> +    unsigned int from, to;
>>>> +    grant_ref_t *last;
>>>> +
>>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>>> +    last = &gnttab_free_head;
>>>> +
>>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>>> +         from < gnttab_size;
>>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>>> +                    from + 1);
>>>> +        if (ret < 0 && to - from >= count) {
>>>> +            ret = from;
>>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>>> +            from += count;
>>>> +            gnttab_free_count -= count;
>>>
>>>
>>> IIUIC we can have multiple passes over this, meaning that the
>>> gnttab_free_count may be decremented more than once. Is that intentional?
>>
>> After the first pass decrementing gnttab_free_cnt, ret will no
>> longer be less than zero, so this can be hit only once.
>
> Oh, yes, of course.
>
>>
>>>
>>>
>>>> +            if (from == to)
>>>> +                continue;
>>>> +        }
>>>> +
>>>> +        while (from < to) {
>>>> +            *last = from;
>>>> +            last = __gnttab_entry(from);
>>>> +            gnttab_last_free = from;
>>>> +            from++;
>>>> +        }
>>>
>>>
>>> I have been looking at this loop and I can't understand what it is doing ;-(
>>> Can you enlighten me?
>>
>> It is recreating the free list in order to have it properly sorted.
>> This is needed to make sure that the free tail has the maximum
>> possible size (you can take the tail off the list without having
>> to worry about breaking the linked list because of references into
>> the tail).
>
>
> So let's say we have the (one-dimensional) table of length 13
>
> idx    ..    2    3  ...  10  11  12
>
> grant       12   11        2  -1   3
>
>
> and gnttab_free_head is 10. I.e. the free list is 2, 12, 3, 11.

You meant 10, 2, 12, 3, 11, I guess?

>
> What will this look like after the 2 iterations of the outer loop?

idx .. 2 3 ... 10 11 12

grant 3 10 11 12 -1

with gnttab_free_head being 2, i.e the free list is now 2, 3, 10, 11, 12.


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments

2022-05-16 23:40:04

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants


On 5/16/22 2:30 PM, Oleksandr wrote:
>
> On 16.05.22 19:00, Boris Ostrovsky wrote:
>
>
>>
>>
>> With the error handling in gnttab_init() fixed
>
> yes, this is a diff that I am going to apply for the next version:
>
>
> [snip]
>
> @@ -1596,19 +1601,20 @@ static int gnttab_expand(unsigned int req_entries)
>  int gnttab_init(void)
>  {
>         int i;
> -       unsigned long max_nr_grant_frames;
> +       unsigned long max_nr_grant_frames, max_nr_grefs;
>         unsigned int max_nr_glist_frames, nr_glist_frames;
>         int ret;
>
>         gnttab_request_version();
>         max_nr_grant_frames = gnttab_max_grant_frames();
> +       max_nr_grefs = max_nr_grant_frames *
> + gnttab_interface->grefs_per_grant_frame;
>         nr_grant_frames = 1;
>
>         /* Determine the maximum number of frames required for the
>          * grant reference free list on the current hypervisor.
>          */
> -       max_nr_glist_frames = (max_nr_grant_frames *
> - gnttab_interface->grefs_per_grant_frame / RPP);
> +       max_nr_glist_frames = max_nr_grefs / RPP;
>
>         gnttab_list = kmalloc_array(max_nr_glist_frames,
>                                     sizeof(grant_ref_t *),
> @@ -1625,8 +1631,7 @@ int gnttab_init(void)
>                 }
>         }
>
> -       i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
> -       gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
> +       gnttab_free_bitmap = bitmap_zalloc(max_nr_grefs, GFP_KERNEL);
>         if (!gnttab_free_bitmap) {
>                 ret = -ENOMEM;
>                 goto ini_nomem;
>


Looks good, thanks.


-boris


2022-05-17 01:27:50

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants


On 16.05.22 19:00, Boris Ostrovsky wrote:


Hello Boris


>
>
> On 5/16/22 1:59 AM, Juergen Gross wrote:
>> On 14.05.22 04:34, Boris Ostrovsky wrote:
>>>
>>>
>>> On 5/13/22 1:33 AM, Juergen Gross wrote:
>>>> On 12.05.22 22:01, Boris Ostrovsky wrote:
>>>>>
>>>>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
>>>
>>>>>> +/* Rebuilds the free grant list and tries to find count
>>>>>> consecutive entries. */
>>>>>> +static int get_free_seq(unsigned int count)
>>>>>> +{
>>>>>> +    int ret = -ENOSPC;
>>>>>> +    unsigned int from, to;
>>>>>> +    grant_ref_t *last;
>>>>>> +
>>>>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>>>>> +    last = &gnttab_free_head;
>>>>>> +
>>>>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>>>>> +         from < gnttab_size;
>>>>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size,
>>>>>> to + 1)) {
>>>>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>>>>> +                    from + 1);
>>>>>> +        if (ret < 0 && to - from >= count) {
>>>>>> +            ret = from;
>>>>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>>>>> +            from += count;
>>>>>> +            gnttab_free_count -= count;
>>>>>
>>>>>
>>>>> IIUIC we can have multiple passes over this, meaning that the
>>>>> gnttab_free_count may be decremented more than once. Is that
>>>>> intentional?
>>>>
>>>> After the first pass decrementing gnttab_free_cnt, ret will no
>>>> longer be less than zero, so this can be hit only once.
>>>
>>> Oh, yes, of course.
>>>
>>>>
>>>>>
>>>>>
>>>>>> +            if (from == to)
>>>>>> +                continue;
>>>>>> +        }
>>>>>> +
>>>>>> +        while (from < to) {
>>>>>> +            *last = from;
>>>>>> +            last = __gnttab_entry(from);
>>>>>> +            gnttab_last_free = from;
>>>>>> +            from++;
>>>>>> +        }
>>>>>
>>>>>
>>>>> I have been looking at this loop and I can't understand what it is
>>>>> doing ;-( Can you enlighten me?
>>>>
>>>> It is recreating the free list in order to have it properly sorted.
>>>> This is needed to make sure that the free tail has the maximum
>>>> possible size (you can take the tail off the list without having
>>>> to worry about breaking the linked list because of references into
>>>> the tail).
>>>
>>>
>>> So let's say we have the (one-dimensional) table of length 13
>>>
>>> idx    ..    2    3  ...  10  11  12
>>>
>>> grant       12   11        2  -1   3
>>>
>>>
>>> and gnttab_free_head is 10. I.e. the free list is 2, 12, 3, 11.
>>
>> You meant 10, 2, 12, 3, 11, I guess?
>>
>>>
>>> What will this look like after the 2 iterations of the outer loop?
>>
>> idx    ..    2    3  ...  10  11  12
>>
>> grant        3   10       11  12  -1
>>
>> with gnttab_free_head being 2, i.e the free list is now 2, 3, 10, 11,
>> 12.
>
>
>
> OK, thanks, that helped. I couldn't link the free chunks in my head
>
>
> With the error handling in gnttab_init() fixed

yes, this is a diff that I am going to apply for the next version:


[snip]

@@ -1596,19 +1601,20 @@ static int gnttab_expand(unsigned int req_entries)
 int gnttab_init(void)
 {
        int i;
-       unsigned long max_nr_grant_frames;
+       unsigned long max_nr_grant_frames, max_nr_grefs;
        unsigned int max_nr_glist_frames, nr_glist_frames;
        int ret;

        gnttab_request_version();
        max_nr_grant_frames = gnttab_max_grant_frames();
+       max_nr_grefs = max_nr_grant_frames *
+ gnttab_interface->grefs_per_grant_frame;
        nr_grant_frames = 1;

        /* Determine the maximum number of frames required for the
         * grant reference free list on the current hypervisor.
         */
-       max_nr_glist_frames = (max_nr_grant_frames *
- gnttab_interface->grefs_per_grant_frame / RPP);
+       max_nr_glist_frames = max_nr_grefs / RPP;

        gnttab_list = kmalloc_array(max_nr_glist_frames,
                                    sizeof(grant_ref_t *),
@@ -1625,8 +1631,7 @@ int gnttab_init(void)
                }
        }

-       i = gnttab_interface->grefs_per_grant_frame * max_nr_grant_frames;
-       gnttab_free_bitmap = bitmap_zalloc(i, GFP_KERNEL);
+       gnttab_free_bitmap = bitmap_zalloc(max_nr_grefs, GFP_KERNEL);
        if (!gnttab_free_bitmap) {
                ret = -ENOMEM;
                goto ini_nomem;


>
>
> Reviewed-by: Boris Ostrovsky <[email protected]>

Thanks!



>
>
>
> -boris

--
Regards,

Oleksandr Tyshchenko


2022-05-17 03:44:36

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

On Sat, May 07, 2022 at 09:19:06PM +0300, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <[email protected]>
>
> Introduce Xen specific binding for the virtualized device (e.g. virtio)
> to be used by Xen grant DMA-mapping layer in the subsequent commit.
>
> This binding indicates that Xen grant mappings scheme needs to be
> enabled for the device which DT node contains that property and specifies
> the ID of Xen domain where the corresponding backend resides. The ID
> (domid) is used as an argument to the grant mapping APIs.
>
> This is needed for the option to restrict memory access using Xen grant
> mappings to work which primary goal is to enable using virtio devices
> in Xen guests.
>
> Signed-off-by: Oleksandr Tyshchenko <[email protected]>
> ---
> Changes RFC -> V1:
> - update commit subject/description and text in description
> - move to devicetree/bindings/arm/
>
> Changes V1 -> V2:
> - update text in description
> - change the maintainer of the binding
> - fix validation issue
> - reference xen,dev-domid.yaml schema from virtio/mmio.yaml
> ---
> .../devicetree/bindings/arm/xen,dev-domid.yaml | 37 ++++++++++++++++++++++
> Documentation/devicetree/bindings/virtio/mmio.yaml | 7 ++++
> 2 files changed, 44 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
>
> diff --git a/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> new file mode 100644
> index 00000000..750e89e
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
> @@ -0,0 +1,37 @@
> +# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/arm/xen,dev-domid.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Xen specific binding for virtualized devices (e.g. virtio)
> +
> +maintainers:
> + - Stefano Stabellini <[email protected]>
> +
> +select: true

Omit. No need to apply this on every single node.

> +
> +description:
> + This binding indicates that Xen grant mappings need to be enabled for
> + the device, and it specifies the ID of the domain where the corresponding
> + device (backend) resides. The property is required to restrict memory
> + access using Xen grant mappings.
> +
> +properties:
> + xen,dev-domid:

I kind of think 'dev' is redundant. Is there another kind of domid
possible? Maybe xen,backend-domid or just xen,domid? I don't know Xen
too well, so ultimately up to you all.

> + $ref: /schemas/types.yaml#/definitions/uint32
> + description:
> + The domid (domain ID) of the domain where the device (backend) is running.
> +
> +additionalProperties: true
> +
> +examples:
> + - |
> + virtio@3000 {
> + compatible = "virtio,mmio";
> + reg = <0x3000 0x100>;
> + interrupts = <41>;
> +
> + /* The device is located in Xen domain with ID 1 */
> + xen,dev-domid = <1>;
> + };
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
> index 10c22b5..29a0932 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> @@ -13,6 +13,9 @@ description:
> See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
> more details.
>
> +allOf:
> + - $ref: /schemas/arm/xen,dev-domid.yaml#
> +
> properties:
> compatible:
> const: virtio,mmio
> @@ -33,6 +36,10 @@ properties:
> description: Required for devices making accesses thru an IOMMU.
> maxItems: 1
>
> + xen,dev-domid:
> + description: Required when Xen grant mappings need to be enabled for device.
> + $ref: /schemas/types.yaml#/definitions/uint32

No need to define the type again nor describe it again.

Instead, just change additionalProperties to unevaluateProperties in
this doc. The diff is the latter takes $ref's into account.

> +
> required:
> - compatible
> - reg
> --
> 2.7.4
>
>

2022-05-18 00:42:00

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH V2 2/7] xen/grants: support allocating consecutive grants



On 5/16/22 1:59 AM, Juergen Gross wrote:
> On 14.05.22 04:34, Boris Ostrovsky wrote:
>>
>>
>> On 5/13/22 1:33 AM, Juergen Gross wrote:
>>> On 12.05.22 22:01, Boris Ostrovsky wrote:
>>>>
>>>> On 5/7/22 2:19 PM, Oleksandr Tyshchenko wrote:
>>
>>>>> +/* Rebuilds the free grant list and tries to find count consecutive entries. */
>>>>> +static int get_free_seq(unsigned int count)
>>>>> +{
>>>>> +    int ret = -ENOSPC;
>>>>> +    unsigned int from, to;
>>>>> +    grant_ref_t *last;
>>>>> +
>>>>> +    gnttab_free_tail_ptr = &gnttab_free_head;
>>>>> +    last = &gnttab_free_head;
>>>>> +
>>>>> +    for (from = find_first_bit(gnttab_free_bitmap, gnttab_size);
>>>>> +         from < gnttab_size;
>>>>> +         from = find_next_bit(gnttab_free_bitmap, gnttab_size, to + 1)) {
>>>>> +        to = find_next_zero_bit(gnttab_free_bitmap, gnttab_size,
>>>>> +                    from + 1);
>>>>> +        if (ret < 0 && to - from >= count) {
>>>>> +            ret = from;
>>>>> +            bitmap_clear(gnttab_free_bitmap, ret, count);
>>>>> +            from += count;
>>>>> +            gnttab_free_count -= count;
>>>>
>>>>
>>>> IIUIC we can have multiple passes over this, meaning that the gnttab_free_count may be decremented more than once. Is that intentional?
>>>
>>> After the first pass decrementing gnttab_free_cnt, ret will no
>>> longer be less than zero, so this can be hit only once.
>>
>> Oh, yes, of course.
>>
>>>
>>>>
>>>>
>>>>> +            if (from == to)
>>>>> +                continue;
>>>>> +        }
>>>>> +
>>>>> +        while (from < to) {
>>>>> +            *last = from;
>>>>> +            last = __gnttab_entry(from);
>>>>> +            gnttab_last_free = from;
>>>>> +            from++;
>>>>> +        }
>>>>
>>>>
>>>> I have been looking at this loop and I can't understand what it is doing ;-( Can you enlighten me?
>>>
>>> It is recreating the free list in order to have it properly sorted.
>>> This is needed to make sure that the free tail has the maximum
>>> possible size (you can take the tail off the list without having
>>> to worry about breaking the linked list because of references into
>>> the tail).
>>
>>
>> So let's say we have the (one-dimensional) table of length 13
>>
>> idx    ..    2    3  ...  10  11  12
>>
>> grant       12   11        2  -1   3
>>
>>
>> and gnttab_free_head is 10. I.e. the free list is 2, 12, 3, 11.
>
> You meant 10, 2, 12, 3, 11, I guess?
>
>>
>> What will this look like after the 2 iterations of the outer loop?
>
> idx    ..    2    3  ...  10  11  12
>
> grant        3   10       11  12  -1
>
> with gnttab_free_head being 2, i.e the free list is now 2, 3, 10, 11, 12.



OK, thanks, that helped. I couldn't link the free chunks in my head


With the error handling in gnttab_init() fixed

Reviewed-by: Boris Ostrovsky <[email protected]>



-boris

2022-05-18 14:18:37

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops


On 17.05.22 03:27, Rob Herring wrote:

Hello Rob, all


> On Sat, May 07, 2022 at 09:19:06PM +0300, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <[email protected]>
>>
>> Introduce Xen specific binding for the virtualized device (e.g. virtio)
>> to be used by Xen grant DMA-mapping layer in the subsequent commit.
>>
>> This binding indicates that Xen grant mappings scheme needs to be
>> enabled for the device which DT node contains that property and specifies
>> the ID of Xen domain where the corresponding backend resides. The ID
>> (domid) is used as an argument to the grant mapping APIs.
>>
>> This is needed for the option to restrict memory access using Xen grant
>> mappings to work which primary goal is to enable using virtio devices
>> in Xen guests.
>>
>> Signed-off-by: Oleksandr Tyshchenko <[email protected]>
>> ---
>> Changes RFC -> V1:
>> - update commit subject/description and text in description
>> - move to devicetree/bindings/arm/
>>
>> Changes V1 -> V2:
>> - update text in description
>> - change the maintainer of the binding
>> - fix validation issue
>> - reference xen,dev-domid.yaml schema from virtio/mmio.yaml
>> ---
>> .../devicetree/bindings/arm/xen,dev-domid.yaml | 37 ++++++++++++++++++++++
>> Documentation/devicetree/bindings/virtio/mmio.yaml | 7 ++++
>> 2 files changed, 44 insertions(+)
>> create mode 100644 Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
>>
>> diff --git a/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
>> new file mode 100644
>> index 00000000..750e89e
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/arm/xen,dev-domid.yaml
>> @@ -0,0 +1,37 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/arm/xen,dev-domid.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Xen specific binding for virtualized devices (e.g. virtio)
>> +
>> +maintainers:
>> + - Stefano Stabellini <[email protected]>
>> +
>> +select: true
> Omit. No need to apply this on every single node.

ok, will do


>
>> +
>> +description:
>> + This binding indicates that Xen grant mappings need to be enabled for
>> + the device, and it specifies the ID of the domain where the corresponding
>> + device (backend) resides. The property is required to restrict memory
>> + access using Xen grant mappings.
>> +
>> +properties:
>> + xen,dev-domid:
> I kind of think 'dev' is redundant. Is there another kind of domid
> possible?


In general, yes. It is driver(frontend) domid. But, at least for now, I
don't see why we will need an additional property for that.


> Maybe xen,backend-domid or just xen,domid? I don't know Xen
> too well, so ultimately up to you all.

xen,domid sounds ambiguous to me.

xen,backend-domid sounds perfectly fine to me, I even think it fits better.



Stefano, Juergen, would you be happy with new xen,backend-domid name?

If yes, Stefano could you please clarify, would you be OK if I retained
your R-b tags (for all patches in current series which touch that
property) after doing such renaming?




>
>> + $ref: /schemas/types.yaml#/definitions/uint32
>> + description:
>> + The domid (domain ID) of the domain where the device (backend) is running.
>> +
>> +additionalProperties: true
>> +
>> +examples:
>> + - |
>> + virtio@3000 {
>> + compatible = "virtio,mmio";
>> + reg = <0x3000 0x100>;
>> + interrupts = <41>;
>> +
>> + /* The device is located in Xen domain with ID 1 */
>> + xen,dev-domid = <1>;
>> + };
>> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
>> index 10c22b5..29a0932 100644
>> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
>> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
>> @@ -13,6 +13,9 @@ description:
>> See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
>> more details.
>>
>> +allOf:
>> + - $ref: /schemas/arm/xen,dev-domid.yaml#
>> +
>> properties:
>> compatible:
>> const: virtio,mmio
>> @@ -33,6 +36,10 @@ properties:
>> description: Required for devices making accesses thru an IOMMU.
>> maxItems: 1
>>
>> + xen,dev-domid:
>> + description: Required when Xen grant mappings need to be enabled for device.
>> + $ref: /schemas/types.yaml#/definitions/uint32
> No need to define the type again nor describe it again.
>
> Instead, just change additionalProperties to unevaluateProperties in
> this doc. The diff is the latter takes $ref's into account.

ok, will do. Could you please clarify, shall I use?

unevaluatedProperties: false

or

unevaluatedProperties:

type: object


I am not too familiar with this stuff. Both variants seem to pass
validation.


Thank you.


>
>> +
>> required:
>> - compatible
>> - reg
>> --
>> 2.7.4
>>
>>
--
Regards,

Oleksandr Tyshchenko


2022-05-18 14:34:26

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <[email protected]> wrote:
>
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
> index 10c22b5..29a0932 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> @@ -13,6 +13,9 @@ description:
> See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
> more details.
>
> +allOf:
> + - $ref: /schemas/arm/xen,dev-domid.yaml#
> +
> properties:
> compatible:
> const: virtio,mmio
> @@ -33,6 +36,10 @@ properties:
> description: Required for devices making accesses thru an IOMMU.
> maxItems: 1
>
> + xen,dev-domid:
> + description: Required when Xen grant mappings need to be enabled for device.
> + $ref: /schemas/types.yaml#/definitions/uint32
> +
> required:
> - compatible
> - reg

Sorry for joining the discussion late. Have you considered using the
generic iommu
binding here instead of a custom property? This would mean having a device
node for the grant-table mechanism that can be referred to using the 'iommus'
phandle property, with the domid as an additional argument.

It does not quite fit the model that Linux currently uses for iommus,
as that has an allocator for dma_addr_t space, but it would think it's
conceptually close enough that it makes sense for the binding.

Arnd

2022-05-18 16:12:37

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops


On 18.05.22 17:32, Arnd Bergmann wrote:


Hello Arnd


> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <[email protected]> wrote:
>> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
>> index 10c22b5..29a0932 100644
>> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
>> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
>> @@ -13,6 +13,9 @@ description:
>> See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
>> more details.
>>
>> +allOf:
>> + - $ref: /schemas/arm/xen,dev-domid.yaml#
>> +
>> properties:
>> compatible:
>> const: virtio,mmio
>> @@ -33,6 +36,10 @@ properties:
>> description: Required for devices making accesses thru an IOMMU.
>> maxItems: 1
>>
>> + xen,dev-domid:
>> + description: Required when Xen grant mappings need to be enabled for device.
>> + $ref: /schemas/types.yaml#/definitions/uint32
>> +
>> required:
>> - compatible
>> - reg
> Sorry for joining the discussion late. Have you considered using the
> generic iommu
> binding here instead of a custom property?

I have to admit - no, I haven't. I was thinking that Xen specific
feature should be communicated using Xen specific DT property.


> This would mean having a device
> node for the grant-table mechanism that can be referred to using the 'iommus'
> phandle property, with the domid as an additional argument.

I assume, you are speaking about something like the following?


xen_dummy_iommu {
   compatible = "xen,dummy-iommu";
   #iommu-cells = <1>;
};

virtio@3000 {
   compatible = "virtio,mmio";
   reg = <0x3000 0x100>;
   interrupts = <41>;

   /* The device is located in Xen domain with ID 1 */
   iommus = <&xen_dummy_iommu 1>;
};


>
> It does not quite fit the model that Linux currently uses for iommus,
> as that has an allocator for dma_addr_t space

yes (# 3/7 adds grant-table based allocator)


> , but it would think it's
> conceptually close enough that it makes sense for the binding.

Interesting idea. I am wondering, do we need an extra actions for this
to work in Linux guest (dummy IOMMU driver, etc)?


>
> Arnd

--
Regards,

Oleksandr Tyshchenko


2022-05-18 16:42:05

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
> On 18.05.22 17:32, Arnd Bergmann wrote:
> > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <[email protected]> wrote:
>
> > This would mean having a device
> > node for the grant-table mechanism that can be referred to using the 'iommus'
> > phandle property, with the domid as an additional argument.
>
> I assume, you are speaking about something like the following?
>
>
> xen_dummy_iommu {
> compatible = "xen,dummy-iommu";
> #iommu-cells = <1>;
> };
>
> virtio@3000 {
> compatible = "virtio,mmio";
> reg = <0x3000 0x100>;
> interrupts = <41>;
>
> /* The device is located in Xen domain with ID 1 */
> iommus = <&xen_dummy_iommu 1>;
> };

Right, that's that's the idea, except I would not call it a 'dummy'.
From the perspective of the DT, this behaves just like an IOMMU,
even if the exact mechanism is different from most hardware IOMMU
implementations.

> > It does not quite fit the model that Linux currently uses for iommus,
> > as that has an allocator for dma_addr_t space
>
> yes (# 3/7 adds grant-table based allocator)
>
>
> > , but it would think it's
> > conceptually close enough that it makes sense for the binding.
>
> Interesting idea. I am wondering, do we need an extra actions for this
> to work in Linux guest (dummy IOMMU driver, etc)?

It depends on how closely the guest implementation can be made to
resemble a normal iommu. If you do allocate dma_addr_t addresses,
it may actually be close enough that you can just turn the grant-table
code into a normal iommu driver and change nothing else.

Arnd

2022-05-18 18:59:26

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

On Wed, May 18, 2022 at 03:32:27PM +0100, Arnd Bergmann wrote:
> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <[email protected]> wrote:
> >
> > diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
> > index 10c22b5..29a0932 100644
> > --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> > +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> > @@ -13,6 +13,9 @@ description:
> > See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
> > more details.
> >
> > +allOf:
> > + - $ref: /schemas/arm/xen,dev-domid.yaml#
> > +
> > properties:
> > compatible:
> > const: virtio,mmio
> > @@ -33,6 +36,10 @@ properties:
> > description: Required for devices making accesses thru an IOMMU.
> > maxItems: 1
> >
> > + xen,dev-domid:
> > + description: Required when Xen grant mappings need to be enabled for device.
> > + $ref: /schemas/types.yaml#/definitions/uint32
> > +
> > required:
> > - compatible
> > - reg
>
> Sorry for joining the discussion late. Have you considered using the
> generic iommu
> binding here instead of a custom property? This would mean having a device
> node for the grant-table mechanism that can be referred to using the 'iommus'
> phandle property, with the domid as an additional argument.
>
> It does not quite fit the model that Linux currently uses for iommus,
> as that has an allocator for dma_addr_t space, but it would think it's
> conceptually close enough that it makes sense for the binding.

Something common is almost always better.

That may also have the issue that fw_devlink will make the 'iommu'
driver a dependency to probe.

Rob

2022-05-18 23:50:31

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops


On 18.05.22 19:39, Arnd Bergmann wrote:


Hello Arnd


> On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <[email protected]> wrote:
>>> This would mean having a device
>>> node for the grant-table mechanism that can be referred to using the 'iommus'
>>> phandle property, with the domid as an additional argument.
>> I assume, you are speaking about something like the following?
>>
>>
>> xen_dummy_iommu {
>> compatible = "xen,dummy-iommu";
>> #iommu-cells = <1>;
>> };
>>
>> virtio@3000 {
>> compatible = "virtio,mmio";
>> reg = <0x3000 0x100>;
>> interrupts = <41>;
>>
>> /* The device is located in Xen domain with ID 1 */
>> iommus = <&xen_dummy_iommu 1>;
>> };
> Right, that's that's the idea,

thank you for the confirmation



> except I would not call it a 'dummy'.
> From the perspective of the DT, this behaves just like an IOMMU,
> even if the exact mechanism is different from most hardware IOMMU
> implementations.

well, agree


>
>>> It does not quite fit the model that Linux currently uses for iommus,
>>> as that has an allocator for dma_addr_t space
>> yes (# 3/7 adds grant-table based allocator)
>>
>>
>>> , but it would think it's
>>> conceptually close enough that it makes sense for the binding.
>> Interesting idea. I am wondering, do we need an extra actions for this
>> to work in Linux guest (dummy IOMMU driver, etc)?
> It depends on how closely the guest implementation can be made to
> resemble a normal iommu. If you do allocate dma_addr_t addresses,
> it may actually be close enough that you can just turn the grant-table
> code into a normal iommu driver and change nothing else.

Unfortunately, I failed to find a way how use grant references at the
iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
am not too familiar with that, so what is written below might be wrong
or at least not precise.

The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
itself, it just maps (IOVA-PA) what was requested to be mapped by the
upper layer. The DMA address allocation is done by the upper layer
(DMA-IOMMU which is the glue layer between DMA API and IOMMU API
allocates IOVA for PA?). But, all what we need here is just to allocate
our specific grant-table based DMA addresses (DMA address = grant
reference + offset in the page), so let’s say we need an entity to take
a physical address as parameter and return a DMA address (what actually
commit #3/7 is doing), and that’s all. So working at the dma_ops layer
we get exactly what we need, with the minimal changes to guest
infrastructure. In our case the Xen itself acts as an IOMMU.

Assuming that we want to reuse the IOMMU infrastructure somehow for our
needs. I think, in that case we will likely need to introduce a new
specific IOVA allocator (alongside with a generic one) to be hooked up
by the DMA-IOMMU layer if we run on top of Xen. But, even having the
specific IOVA allocator to return what we indeed need (DMA address =
grant reference + offset in the page) we will still need the specific
minimal required IOMMU driver to be present in the system anyway in
order to track the mappings(?) and do nothing with them, returning a
success (this specific IOMMU driver should have all mandatory callbacks
implemented).

I completely agree, it would be really nice to reuse generic IOMMU
bindings rather than introducing Xen specific property if what we are
trying to implement in current patch series fits in the usage of
"iommus" in Linux more-less. But, if we will have to add more
complexity/more components to the code for the sake of reusing device
tree binding, this raises a question whether that’s worthwhile.

Or I really missed something?


>
> Arnd

--
Regards,

Oleksandr Tyshchenko


2022-05-19 02:14:25

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops


On 18.05.22 21:59, Rob Herring wrote:

Hello Rob, Arnd

> On Wed, May 18, 2022 at 03:32:27PM +0100, Arnd Bergmann wrote:
>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko <[email protected]> wrote:
>>> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml
>>> index 10c22b5..29a0932 100644
>>> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
>>> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
>>> @@ -13,6 +13,9 @@ description:
>>> See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
>>> more details.
>>>
>>> +allOf:
>>> + - $ref: /schemas/arm/xen,dev-domid.yaml#
>>> +
>>> properties:
>>> compatible:
>>> const: virtio,mmio
>>> @@ -33,6 +36,10 @@ properties:
>>> description: Required for devices making accesses thru an IOMMU.
>>> maxItems: 1
>>>
>>> + xen,dev-domid:
>>> + description: Required when Xen grant mappings need to be enabled for device.
>>> + $ref: /schemas/types.yaml#/definitions/uint32
>>> +
>>> required:
>>> - compatible
>>> - reg
>> Sorry for joining the discussion late. Have you considered using the
>> generic iommu
>> binding here instead of a custom property? This would mean having a device
>> node for the grant-table mechanism that can be referred to using the 'iommus'
>> phandle property, with the domid as an additional argument.
>>
>> It does not quite fit the model that Linux currently uses for iommus,
>> as that has an allocator for dma_addr_t space, but it would think it's
>> conceptually close enough that it makes sense for the binding.
> Something common is almost always better.

agree


>
> That may also have the issue that fw_devlink will make the 'iommu'
> driver a dependency to probe.

Looks like I ran into it while experimenting. I generated the following
nodes in guest DT using Xen toolstack:

[snip]

        xen_dummy_iommu {
                compatible = "xen,dummy-iommu";
                #iommu-cells = <0x01>;
                phandle = <0xfde9>;
        };
        virtio@2000000 {
                compatible = "virtio,mmio";
                reg = <0x00 0x2000000 0x00 0x200>;
                interrupts = <0x00 0x01 0xf01>;
                interrupt-parent = <0xfde8>;
                dma-coherent;
                iommus = <0xfde9 0x01>;
        };

[snip]


And got:

virtio-mmio 2000000.virtio: deferred probe timeout, ignoring dependency


>
> Rob

--
Regards,

Oleksandr Tyshchenko


2022-05-19 06:12:19

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops


On 19.05.22 04:06, Stefano Stabellini wrote:


Hello Stefano

> On Thu, 19 May 2022, Oleksandr wrote:
>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>> <[email protected]> wrote:
>>>>> This would mean having a device
>>>>> node for the grant-table mechanism that can be referred to using the
>>>>> 'iommus'
>>>>> phandle property, with the domid as an additional argument.
>>>> I assume, you are speaking about something like the following?
>>>>
>>>>
>>>> xen_dummy_iommu {
>>>> compatible = "xen,dummy-iommu";
>>>> #iommu-cells = <1>;
>>>> };
>>>>
>>>> virtio@3000 {
>>>> compatible = "virtio,mmio";
>>>> reg = <0x3000 0x100>;
>>>> interrupts = <41>;
>>>>
>>>> /* The device is located in Xen domain with ID 1 */
>>>> iommus = <&xen_dummy_iommu 1>;
>>>> };
>>> Right, that's that's the idea,
>> thank you for the confirmation
>>
>>
>>
>>> except I would not call it a 'dummy'.
>>> From the perspective of the DT, this behaves just like an IOMMU,
>>> even if the exact mechanism is different from most hardware IOMMU
>>> implementations.
>> well, agree
>>
>>
>>>>> It does not quite fit the model that Linux currently uses for iommus,
>>>>> as that has an allocator for dma_addr_t space
>>>> yes (# 3/7 adds grant-table based allocator)
>>>>
>>>>
>>>>> , but it would think it's
>>>>> conceptually close enough that it makes sense for the binding.
>>>> Interesting idea. I am wondering, do we need an extra actions for this
>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>> It depends on how closely the guest implementation can be made to
>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>> it may actually be close enough that you can just turn the grant-table
>>> code into a normal iommu driver and change nothing else.
>> Unfortunately, I failed to find a way how use grant references at the
>> iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I am
>> not too familiar with that, so what is written below might be wrong or at
>> least not precise.
>>
>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by itself, it
>> just maps (IOVA-PA) what was requested to be mapped by the upper layer. The
>> DMA address allocation is done by the upper layer (DMA-IOMMU which is the glue
>> layer between DMA API and IOMMU API allocates IOVA for PA?). But, all what we
>> need here is just to allocate our specific grant-table based DMA addresses
>> (DMA address = grant reference + offset in the page), so let’s say we need an
>> entity to take a physical address as parameter and return a DMA address (what
>> actually commit #3/7 is doing), and that’s all. So working at the dma_ops
>> layer we get exactly what we need, with the minimal changes to guest
>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>
>> Assuming that we want to reuse the IOMMU infrastructure somehow for our needs.
>> I think, in that case we will likely need to introduce a new specific IOVA
>> allocator (alongside with a generic one) to be hooked up by the DMA-IOMMU
>> layer if we run on top of Xen. But, even having the specific IOVA allocator to
>> return what we indeed need (DMA address = grant reference + offset in the
>> page) we will still need the specific minimal required IOMMU driver to be
>> present in the system anyway in order to track the mappings(?) and do nothing
>> with them, returning a success (this specific IOMMU driver should have all
>> mandatory callbacks implemented).
>>
>> I completely agree, it would be really nice to reuse generic IOMMU bindings
>> rather than introducing Xen specific property if what we are trying to
>> implement in current patch series fits in the usage of "iommus" in Linux
>> more-less. But, if we will have to add more complexity/more components to the
>> code for the sake of reusing device tree binding, this raises a question
>> whether that’s worthwhile.
>>
>> Or I really missed something?
> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> bindings, not necessarily the IOMMU drivers framework in Linux (although
> that would be an added bonus.)
>
> I know from previous discussions with you that making the grant table
> fit in the existing IOMMU drivers model is difficult, but just reusing
> the Device Tree bindings seems feasible?

I started experimenting with that. As wrote in a separate email, I got a
deferred probe timeout,

after inserting required nodes into guest device tree, which seems to be
a consequence of the unavailability of IOMMU, I will continue to
investigate this question.



--
Regards,

Oleksandr Tyshchenko


2022-05-19 06:23:54

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

On Thu, 19 May 2022, Oleksandr wrote:
> > On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
> > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > <[email protected]> wrote:
> > > > This would mean having a device
> > > > node for the grant-table mechanism that can be referred to using the
> > > > 'iommus'
> > > > phandle property, with the domid as an additional argument.
> > > I assume, you are speaking about something like the following?
> > >
> > >
> > > xen_dummy_iommu {
> > > compatible = "xen,dummy-iommu";
> > > #iommu-cells = <1>;
> > > };
> > >
> > > virtio@3000 {
> > > compatible = "virtio,mmio";
> > > reg = <0x3000 0x100>;
> > > interrupts = <41>;
> > >
> > > /* The device is located in Xen domain with ID 1 */
> > > iommus = <&xen_dummy_iommu 1>;
> > > };
> > Right, that's that's the idea,
>
> thank you for the confirmation
>
>
>
> > except I would not call it a 'dummy'.
> > From the perspective of the DT, this behaves just like an IOMMU,
> > even if the exact mechanism is different from most hardware IOMMU
> > implementations.
>
> well, agree
>
>
> >
> > > > It does not quite fit the model that Linux currently uses for iommus,
> > > > as that has an allocator for dma_addr_t space
> > > yes (# 3/7 adds grant-table based allocator)
> > >
> > >
> > > > , but it would think it's
> > > > conceptually close enough that it makes sense for the binding.
> > > Interesting idea. I am wondering, do we need an extra actions for this
> > > to work in Linux guest (dummy IOMMU driver, etc)?
> > It depends on how closely the guest implementation can be made to
> > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > it may actually be close enough that you can just turn the grant-table
> > code into a normal iommu driver and change nothing else.
>
> Unfortunately, I failed to find a way how use grant references at the
> iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I am
> not too familiar with that, so what is written below might be wrong or at
> least not precise.
>
> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by itself, it
> just maps (IOVA-PA) what was requested to be mapped by the upper layer. The
> DMA address allocation is done by the upper layer (DMA-IOMMU which is the glue
> layer between DMA API and IOMMU API allocates IOVA for PA?). But, all what we
> need here is just to allocate our specific grant-table based DMA addresses
> (DMA address = grant reference + offset in the page), so let’s say we need an
> entity to take a physical address as parameter and return a DMA address (what
> actually commit #3/7 is doing), and that’s all. So working at the dma_ops
> layer we get exactly what we need, with the minimal changes to guest
> infrastructure. In our case the Xen itself acts as an IOMMU.
>
> Assuming that we want to reuse the IOMMU infrastructure somehow for our needs.
> I think, in that case we will likely need to introduce a new specific IOVA
> allocator (alongside with a generic one) to be hooked up by the DMA-IOMMU
> layer if we run on top of Xen. But, even having the specific IOVA allocator to
> return what we indeed need (DMA address = grant reference + offset in the
> page) we will still need the specific minimal required IOMMU driver to be
> present in the system anyway in order to track the mappings(?) and do nothing
> with them, returning a success (this specific IOMMU driver should have all
> mandatory callbacks implemented).
>
> I completely agree, it would be really nice to reuse generic IOMMU bindings
> rather than introducing Xen specific property if what we are trying to
> implement in current patch series fits in the usage of "iommus" in Linux
> more-less. But, if we will have to add more complexity/more components to the
> code for the sake of reusing device tree binding, this raises a question
> whether that’s worthwhile.
>
> Or I really missed something?

I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
bindings, not necessarily the IOMMU drivers framework in Linux (although
that would be an added bonus.)

I know from previous discussions with you that making the grant table
fit in the existing IOMMU drivers model is difficult, but just reusing
the Device Tree bindings seems feasible?

2022-05-23 18:19:32

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops


On 19.05.22 09:03, Oleksandr wrote:

Hello Stefano, all


>
> On 19.05.22 04:06, Stefano Stabellini wrote:
>
>
> Hello Stefano, all
>
>> On Thu, 19 May 2022, Oleksandr wrote:
>>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
>>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>>> <[email protected]> wrote:
>>>>>>     This would mean having a device
>>>>>> node for the grant-table mechanism that can be referred to using the
>>>>>> 'iommus'
>>>>>> phandle property, with the domid as an additional argument.
>>>>> I assume, you are speaking about something like the following?
>>>>>
>>>>>
>>>>> xen_dummy_iommu {
>>>>>       compatible = "xen,dummy-iommu";
>>>>>       #iommu-cells = <1>;
>>>>> };
>>>>>
>>>>> virtio@3000 {
>>>>>       compatible = "virtio,mmio";
>>>>>       reg = <0x3000 0x100>;
>>>>>       interrupts = <41>;
>>>>>
>>>>>       /* The device is located in Xen domain with ID 1 */
>>>>>       iommus = <&xen_dummy_iommu 1>;
>>>>> };
>>>> Right, that's that's the idea,
>>> thank you for the confirmation
>>>
>>>
>>>
>>>>    except I would not call it a 'dummy'.
>>>>   From the perspective of the DT, this behaves just like an IOMMU,
>>>> even if the exact mechanism is different from most hardware IOMMU
>>>> implementations.
>>> well, agree
>>>
>>>
>>>>>> It does not quite fit the model that Linux currently uses for
>>>>>> iommus,
>>>>>> as that has an allocator for dma_addr_t space
>>>>> yes (# 3/7 adds grant-table based allocator)
>>>>>
>>>>>
>>>>>> , but it would think it's
>>>>>> conceptually close enough that it makes sense for the binding.
>>>>> Interesting idea. I am wondering, do we need an extra actions for
>>>>> this
>>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>>> It depends on how closely the guest implementation can be made to
>>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>>> it may actually be close enough that you can just turn the grant-table
>>>> code into a normal iommu driver and change nothing else.
>>> Unfortunately, I failed to find a way how use grant references at the
>>> iommu_ops level (I mean to fully pretend that we are an IOMMU
>>> driver). I am
>>> not too familiar with that, so what is written below might be wrong
>>> or at
>>> least not precise.
>>>
>>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
>>> itself, it
>>> just maps (IOVA-PA) what was requested to be mapped by the upper
>>> layer. The
>>> DMA address allocation is done by the upper layer (DMA-IOMMU which
>>> is the glue
>>> layer between DMA API and IOMMU API allocates IOVA for PA?). But,
>>> all what we
>>> need here is just to allocate our specific grant-table based DMA
>>> addresses
>>> (DMA address = grant reference + offset in the page), so let’s say
>>> we need an
>>> entity to take a physical address as parameter and return a DMA
>>> address (what
>>> actually commit #3/7 is doing), and that’s all. So working at the
>>> dma_ops
>>> layer we get exactly what we need, with the minimal changes to guest
>>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>>
>>> Assuming that we want to reuse the IOMMU infrastructure somehow for
>>> our needs.
>>> I think, in that case we will likely need to introduce a new
>>> specific IOVA
>>> allocator (alongside with a generic one) to be hooked up by the
>>> DMA-IOMMU
>>> layer if we run on top of Xen. But, even having the specific IOVA
>>> allocator to
>>> return what we indeed need (DMA address = grant reference + offset
>>> in the
>>> page) we will still need the specific minimal required IOMMU driver
>>> to be
>>> present in the system anyway in order to track the mappings(?) and
>>> do nothing
>>> with them, returning a success (this specific IOMMU driver should
>>> have all
>>> mandatory callbacks implemented).
>>>
>>> I completely agree, it would be really nice to reuse generic IOMMU
>>> bindings
>>> rather than introducing Xen specific property if what we are trying to
>>> implement in current patch series fits in the usage of "iommus" in
>>> Linux
>>> more-less. But, if we will have to add more complexity/more
>>> components to the
>>> code for the sake of reusing device tree binding, this raises a
>>> question
>>> whether that’s worthwhile.
>>>
>>> Or I really missed something?
>> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
>> bindings, not necessarily the IOMMU drivers framework in Linux (although
>> that would be an added bonus.)
>>
>> I know from previous discussions with you that making the grant table
>> fit in the existing IOMMU drivers model is difficult, but just reusing
>> the Device Tree bindings seems feasible?
>
> I started experimenting with that. As wrote in a separate email, I got
> a deferred probe timeout,
>
> after inserting required nodes into guest device tree, which seems to
> be a consequence of the unavailability of IOMMU, I will continue to
> investigate this question.


I have experimented with that. Yes, just reusing the Device Tree
bindings is technically feasible (and we are able to do this by only
touching grant-dma-ops.c), although deferred probe timeout still stands
(as there is no IOMMU driver being present actually).

[    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout,
ignoring dependency
[    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks
(2.10 GB/1.95 GiB)


Below the working diff (on top of current series):

diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index da9c7ff..6586152 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {

 bool xen_is_grant_dma_device(struct device *dev)
 {
+       struct device_node *iommu_np;
+       bool has_iommu;
+
        /* XXX Handle only DT devices for now */
        if (!dev->of_node)
                return false;

-       return of_property_read_bool(dev->of_node, "xen,backend-domid");
+       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
+       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
"xen,grant-dma");
+       of_node_put(iommu_np);
+
+       return has_iommu;
 }

 void xen_grant_setup_dma_ops(struct device *dev)
 {
        struct xen_grant_dma_data *data;
-       uint32_t domid;
+       struct of_phandle_args iommu_spec;

        data = find_xen_grant_dma_data(dev);
        if (data) {
@@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
        if (!dev->of_node)
                goto err;

-       if (of_property_read_u32(dev->of_node, "xen,backend-domid",
&domid)) {
-               dev_err(dev, "xen,backend-domid property is not present\n");
+       if (of_parse_phandle_with_args(dev->of_node, "iommus",
"#iommu-cells",
+                       0, &iommu_spec)) {
+               dev_err(dev, "Cannot parse iommus property\n");
+               goto err;
+       }
+
+       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
+                       iommu_spec.args_count != 1) {
+               dev_err(dev, "Incompatible IOMMU node\n");
+               of_node_put(iommu_spec.np);
                goto err;
        }

+       of_node_put(iommu_spec.np);
+
        data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
        if (!data)
                goto err;

-       data->backend_domid = domid;
+       /*
+        * The endpoint ID here means the ID of the domain where the
corresponding
+        * backend is running
+        */
+       data->backend_domid = iommu_spec.args[0];

        if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev,
data,
                        GFP_KERNEL))) {
(END)



Below, the nodes generated by Xen toolstack:

        xen_grant_dma {
                compatible = "xen,grant-dma";
                #iommu-cells = <0x01>;
                phandle = <0xfde9>;
        };

        virtio@2000000 {
                compatible = "virtio,mmio";
                reg = <0x00 0x2000000 0x00 0x200>;
                interrupts = <0x00 0x01 0xf01>;
                interrupt-parent = <0xfde8>;
                dma-coherent;
                iommus = <0xfde9 0x01>;
        };



I am wondering, would be the proper solution to eliminate deferred probe
timeout issue in our particular case (without introducing an extra IOMMU
driver)?




>
>
>
>
--
Regards,

Oleksandr Tyshchenko


2022-05-24 02:54:48

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

On Mon, 23 May 2022, Oleksandr wrote:
> > > On Thu, 19 May 2022, Oleksandr wrote:
> > > > > On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
> > > > > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > > > > <[email protected]> wrote:
> > > > > > >     This would mean having a device
> > > > > > > node for the grant-table mechanism that can be referred to using
> > > > > > > the
> > > > > > > 'iommus'
> > > > > > > phandle property, with the domid as an additional argument.
> > > > > > I assume, you are speaking about something like the following?
> > > > > >
> > > > > >
> > > > > > xen_dummy_iommu {
> > > > > >       compatible = "xen,dummy-iommu";
> > > > > >       #iommu-cells = <1>;
> > > > > > };
> > > > > >
> > > > > > virtio@3000 {
> > > > > >       compatible = "virtio,mmio";
> > > > > >       reg = <0x3000 0x100>;
> > > > > >       interrupts = <41>;
> > > > > >
> > > > > >       /* The device is located in Xen domain with ID 1 */
> > > > > >       iommus = <&xen_dummy_iommu 1>;
> > > > > > };
> > > > > Right, that's that's the idea,
> > > > thank you for the confirmation
> > > >
> > > >
> > > >
> > > > >    except I would not call it a 'dummy'.
> > > > >   From the perspective of the DT, this behaves just like an IOMMU,
> > > > > even if the exact mechanism is different from most hardware IOMMU
> > > > > implementations.
> > > > well, agree
> > > >
> > > >
> > > > > > > It does not quite fit the model that Linux currently uses for
> > > > > > > iommus,
> > > > > > > as that has an allocator for dma_addr_t space
> > > > > > yes (# 3/7 adds grant-table based allocator)
> > > > > >
> > > > > >
> > > > > > > , but it would think it's
> > > > > > > conceptually close enough that it makes sense for the binding.
> > > > > > Interesting idea. I am wondering, do we need an extra actions for
> > > > > > this
> > > > > > to work in Linux guest (dummy IOMMU driver, etc)?
> > > > > It depends on how closely the guest implementation can be made to
> > > > > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > > > > it may actually be close enough that you can just turn the grant-table
> > > > > code into a normal iommu driver and change nothing else.
> > > > Unfortunately, I failed to find a way how use grant references at the
> > > > iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
> > > > am
> > > > not too familiar with that, so what is written below might be wrong or
> > > > at
> > > > least not precise.
> > > >
> > > > The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
> > > > itself, it
> > > > just maps (IOVA-PA) what was requested to be mapped by the upper layer.
> > > > The
> > > > DMA address allocation is done by the upper layer (DMA-IOMMU which is
> > > > the glue
> > > > layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
> > > > what we
> > > > need here is just to allocate our specific grant-table based DMA
> > > > addresses
> > > > (DMA address = grant reference + offset in the page), so let’s say we
> > > > need an
> > > > entity to take a physical address as parameter and return a DMA address
> > > > (what
> > > > actually commit #3/7 is doing), and that’s all. So working at the
> > > > dma_ops
> > > > layer we get exactly what we need, with the minimal changes to guest
> > > > infrastructure. In our case the Xen itself acts as an IOMMU.
> > > >
> > > > Assuming that we want to reuse the IOMMU infrastructure somehow for our
> > > > needs.
> > > > I think, in that case we will likely need to introduce a new specific
> > > > IOVA
> > > > allocator (alongside with a generic one) to be hooked up by the
> > > > DMA-IOMMU
> > > > layer if we run on top of Xen. But, even having the specific IOVA
> > > > allocator to
> > > > return what we indeed need (DMA address = grant reference + offset in
> > > > the
> > > > page) we will still need the specific minimal required IOMMU driver to
> > > > be
> > > > present in the system anyway in order to track the mappings(?) and do
> > > > nothing
> > > > with them, returning a success (this specific IOMMU driver should have
> > > > all
> > > > mandatory callbacks implemented).
> > > >
> > > > I completely agree, it would be really nice to reuse generic IOMMU
> > > > bindings
> > > > rather than introducing Xen specific property if what we are trying to
> > > > implement in current patch series fits in the usage of "iommus" in Linux
> > > > more-less. But, if we will have to add more complexity/more components
> > > > to the
> > > > code for the sake of reusing device tree binding, this raises a question
> > > > whether that’s worthwhile.
> > > >
> > > > Or I really missed something?
> > > I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> > > bindings, not necessarily the IOMMU drivers framework in Linux (although
> > > that would be an added bonus.)
> > >
> > > I know from previous discussions with you that making the grant table
> > > fit in the existing IOMMU drivers model is difficult, but just reusing
> > > the Device Tree bindings seems feasible?
> >
> > I started experimenting with that. As wrote in a separate email, I got a
> > deferred probe timeout,
> >
> > after inserting required nodes into guest device tree, which seems to be a
> > consequence of the unavailability of IOMMU, I will continue to investigate
> > this question.
>
>
> I have experimented with that. Yes, just reusing the Device Tree bindings is
> technically feasible (and we are able to do this by only touching
> grant-dma-ops.c), although deferred probe timeout still stands (as there is no
> IOMMU driver being present actually).
>
> [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
> dependency
> [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
> GB/1.95 GiB)
>
>
> Below the working diff (on top of current series):
>
> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> index da9c7ff..6586152 100644
> --- a/drivers/xen/grant-dma-ops.c
> +++ b/drivers/xen/grant-dma-ops.c
> @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
>
>  bool xen_is_grant_dma_device(struct device *dev)
>  {
> +       struct device_node *iommu_np;
> +       bool has_iommu;
> +
>         /* XXX Handle only DT devices for now */
>         if (!dev->of_node)
>                 return false;
>
> -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
> +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
> +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
> "xen,grant-dma");
> +       of_node_put(iommu_np);
> +
> +       return has_iommu;
>  }
>
>  void xen_grant_setup_dma_ops(struct device *dev)
>  {
>         struct xen_grant_dma_data *data;
> -       uint32_t domid;
> +       struct of_phandle_args iommu_spec;
>
>         data = find_xen_grant_dma_data(dev);
>         if (data) {
> @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
>         if (!dev->of_node)
>                 goto err;
>
> -       if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
> -               dev_err(dev, "xen,backend-domid property is not present\n");
> +       if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
> +                       0, &iommu_spec)) {
> +               dev_err(dev, "Cannot parse iommus property\n");
> +               goto err;
> +       }
> +
> +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
> +                       iommu_spec.args_count != 1) {
> +               dev_err(dev, "Incompatible IOMMU node\n");
> +               of_node_put(iommu_spec.np);
>                 goto err;
>         }
>
> +       of_node_put(iommu_spec.np);
> +
>         data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>         if (!data)
>                 goto err;
>
> -       data->backend_domid = domid;
> +       /*
> +        * The endpoint ID here means the ID of the domain where the
> corresponding
> +        * backend is running
> +        */
> +       data->backend_domid = iommu_spec.args[0];
>
>         if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
>                         GFP_KERNEL))) {
> (END)
>
>
>
> Below, the nodes generated by Xen toolstack:
>
>         xen_grant_dma {
>                 compatible = "xen,grant-dma";
>                 #iommu-cells = <0x01>;
>                 phandle = <0xfde9>;
>         };
>
>         virtio@2000000 {
>                 compatible = "virtio,mmio";
>                 reg = <0x00 0x2000000 0x00 0x200>;
>                 interrupts = <0x00 0x01 0xf01>;
>                 interrupt-parent = <0xfde8>;
>                 dma-coherent;
>                 iommus = <0xfde9 0x01>;
>         };

Not bad! I like it.


> I am wondering, would be the proper solution to eliminate deferred probe
> timeout issue in our particular case (without introducing an extra IOMMU
> driver)?

In reality I don't think there is a way to do that. I would create an
empty skelethon IOMMU driver for xen,grant-dma.

2022-05-24 16:22:17

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops


On 24.05.22 04:58, Stefano Stabellini wrote:

Hello Stefano, all

> On Mon, 23 May 2022, Oleksandr wrote:
>>>> On Thu, 19 May 2022, Oleksandr wrote:
>>>>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
>>>>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>>>>> <[email protected]> wrote:
>>>>>>>>     This would mean having a device
>>>>>>>> node for the grant-table mechanism that can be referred to using
>>>>>>>> the
>>>>>>>> 'iommus'
>>>>>>>> phandle property, with the domid as an additional argument.
>>>>>>> I assume, you are speaking about something like the following?
>>>>>>>
>>>>>>>
>>>>>>> xen_dummy_iommu {
>>>>>>>       compatible = "xen,dummy-iommu";
>>>>>>>       #iommu-cells = <1>;
>>>>>>> };
>>>>>>>
>>>>>>> virtio@3000 {
>>>>>>>       compatible = "virtio,mmio";
>>>>>>>       reg = <0x3000 0x100>;
>>>>>>>       interrupts = <41>;
>>>>>>>
>>>>>>>       /* The device is located in Xen domain with ID 1 */
>>>>>>>       iommus = <&xen_dummy_iommu 1>;
>>>>>>> };
>>>>>> Right, that's that's the idea,
>>>>> thank you for the confirmation
>>>>>
>>>>>
>>>>>
>>>>>>    except I would not call it a 'dummy'.
>>>>>>   From the perspective of the DT, this behaves just like an IOMMU,
>>>>>> even if the exact mechanism is different from most hardware IOMMU
>>>>>> implementations.
>>>>> well, agree
>>>>>
>>>>>
>>>>>>>> It does not quite fit the model that Linux currently uses for
>>>>>>>> iommus,
>>>>>>>> as that has an allocator for dma_addr_t space
>>>>>>> yes (# 3/7 adds grant-table based allocator)
>>>>>>>
>>>>>>>
>>>>>>>> , but it would think it's
>>>>>>>> conceptually close enough that it makes sense for the binding.
>>>>>>> Interesting idea. I am wondering, do we need an extra actions for
>>>>>>> this
>>>>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>>>>> It depends on how closely the guest implementation can be made to
>>>>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>>>>> it may actually be close enough that you can just turn the grant-table
>>>>>> code into a normal iommu driver and change nothing else.
>>>>> Unfortunately, I failed to find a way how use grant references at the
>>>>> iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
>>>>> am
>>>>> not too familiar with that, so what is written below might be wrong or
>>>>> at
>>>>> least not precise.
>>>>>
>>>>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
>>>>> itself, it
>>>>> just maps (IOVA-PA) what was requested to be mapped by the upper layer.
>>>>> The
>>>>> DMA address allocation is done by the upper layer (DMA-IOMMU which is
>>>>> the glue
>>>>> layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
>>>>> what we
>>>>> need here is just to allocate our specific grant-table based DMA
>>>>> addresses
>>>>> (DMA address = grant reference + offset in the page), so let’s say we
>>>>> need an
>>>>> entity to take a physical address as parameter and return a DMA address
>>>>> (what
>>>>> actually commit #3/7 is doing), and that’s all. So working at the
>>>>> dma_ops
>>>>> layer we get exactly what we need, with the minimal changes to guest
>>>>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>>>>
>>>>> Assuming that we want to reuse the IOMMU infrastructure somehow for our
>>>>> needs.
>>>>> I think, in that case we will likely need to introduce a new specific
>>>>> IOVA
>>>>> allocator (alongside with a generic one) to be hooked up by the
>>>>> DMA-IOMMU
>>>>> layer if we run on top of Xen. But, even having the specific IOVA
>>>>> allocator to
>>>>> return what we indeed need (DMA address = grant reference + offset in
>>>>> the
>>>>> page) we will still need the specific minimal required IOMMU driver to
>>>>> be
>>>>> present in the system anyway in order to track the mappings(?) and do
>>>>> nothing
>>>>> with them, returning a success (this specific IOMMU driver should have
>>>>> all
>>>>> mandatory callbacks implemented).
>>>>>
>>>>> I completely agree, it would be really nice to reuse generic IOMMU
>>>>> bindings
>>>>> rather than introducing Xen specific property if what we are trying to
>>>>> implement in current patch series fits in the usage of "iommus" in Linux
>>>>> more-less. But, if we will have to add more complexity/more components
>>>>> to the
>>>>> code for the sake of reusing device tree binding, this raises a question
>>>>> whether that’s worthwhile.
>>>>>
>>>>> Or I really missed something?
>>>> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
>>>> bindings, not necessarily the IOMMU drivers framework in Linux (although
>>>> that would be an added bonus.)
>>>>
>>>> I know from previous discussions with you that making the grant table
>>>> fit in the existing IOMMU drivers model is difficult, but just reusing
>>>> the Device Tree bindings seems feasible?
>>> I started experimenting with that. As wrote in a separate email, I got a
>>> deferred probe timeout,
>>>
>>> after inserting required nodes into guest device tree, which seems to be a
>>> consequence of the unavailability of IOMMU, I will continue to investigate
>>> this question.
>>
>> I have experimented with that. Yes, just reusing the Device Tree bindings is
>> technically feasible (and we are able to do this by only touching
>> grant-dma-ops.c), although deferred probe timeout still stands (as there is no
>> IOMMU driver being present actually).
>>
>> [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
>> dependency
>> [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
>> GB/1.95 GiB)
>>
>>
>> Below the working diff (on top of current series):
>>
>> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
>> index da9c7ff..6586152 100644
>> --- a/drivers/xen/grant-dma-ops.c
>> +++ b/drivers/xen/grant-dma-ops.c
>> @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
>>
>>  bool xen_is_grant_dma_device(struct device *dev)
>>  {
>> +       struct device_node *iommu_np;
>> +       bool has_iommu;
>> +
>>         /* XXX Handle only DT devices for now */
>>         if (!dev->of_node)
>>                 return false;
>>
>> -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
>> +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
>> +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
>> "xen,grant-dma");
>> +       of_node_put(iommu_np);
>> +
>> +       return has_iommu;
>>  }
>>
>>  void xen_grant_setup_dma_ops(struct device *dev)
>>  {
>>         struct xen_grant_dma_data *data;
>> -       uint32_t domid;
>> +       struct of_phandle_args iommu_spec;
>>
>>         data = find_xen_grant_dma_data(dev);
>>         if (data) {
>> @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
>>         if (!dev->of_node)
>>                 goto err;
>>
>> -       if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
>> -               dev_err(dev, "xen,backend-domid property is not present\n");
>> +       if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
>> +                       0, &iommu_spec)) {
>> +               dev_err(dev, "Cannot parse iommus property\n");
>> +               goto err;
>> +       }
>> +
>> +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
>> +                       iommu_spec.args_count != 1) {
>> +               dev_err(dev, "Incompatible IOMMU node\n");
>> +               of_node_put(iommu_spec.np);
>>                 goto err;
>>         }
>>
>> +       of_node_put(iommu_spec.np);
>> +
>>         data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>>         if (!data)
>>                 goto err;
>>
>> -       data->backend_domid = domid;
>> +       /*
>> +        * The endpoint ID here means the ID of the domain where the
>> corresponding
>> +        * backend is running
>> +        */
>> +       data->backend_domid = iommu_spec.args[0];
>>
>>         if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
>>                         GFP_KERNEL))) {
>> (END)
>>
>>
>>
>> Below, the nodes generated by Xen toolstack:
>>
>>         xen_grant_dma {
>>                 compatible = "xen,grant-dma";
>>                 #iommu-cells = <0x01>;
>>                 phandle = <0xfde9>;
>>         };
>>
>>         virtio@2000000 {
>>                 compatible = "virtio,mmio";
>>                 reg = <0x00 0x2000000 0x00 0x200>;
>>                 interrupts = <0x00 0x01 0xf01>;
>>                 interrupt-parent = <0xfde8>;
>>                 dma-coherent;
>>                 iommus = <0xfde9 0x01>;
>>         };
>
> Not bad! I like it.


Good.



>
>
>> I am wondering, would be the proper solution to eliminate deferred probe
>> timeout issue in our particular case (without introducing an extra IOMMU
>> driver)?
> In reality I don't think there is a way to do that. I would create an
> empty skelethon IOMMU driver for xen,grant-dma.

Ok, I found yet another option how we can avoid deferred probe timeout
issue. I am not sure whether it will be welcome. But it doesn't really
require introducing stub IOMMU driver or other changes in the guest. The
idea is to make IOMMU device unavailable (status = "disabled"), this way
of_iommu_configure() will treat that as success condition also.

https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L31
https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L149

        xen_grant_dma {
                compatible = "xen,grant-dma";
                #iommu-cells = <0x01>;
                phandle = <0xfde9>;
                status = "disabled";
        };
        virtio@2000000 {
                compatible = "virtio,mmio";
                reg = <0x00 0x2000000 0x00 0x200>;
                interrupts = <0x00 0x01 0xf01>;
                interrupt-parent = <0xfde8>;
                dma-coherent;
                iommus = <0xfde9 0x01>;
        };

I have checked, this "fixes" deferred probe timeout issue.


Or we indeed need to introduce stub IOMMU driver (I placed it to
driver/xen instead of driver/iommu, also we can even squash it with
grant-dma-ops.c?).
This stub driver also results in NO_IOMMU condition (as "of_xlate"
callback is not implemented).

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index a7bd8ce..35b91b9 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -335,6 +335,10 @@ config XEN_UNPOPULATED_ALLOC
          having to balloon out RAM regions in order to obtain physical
memory
          space to create such mappings.

+config XEN_GRANT_DMA_IOMMU
+       bool
+       select IOMMU_API
+
 config XEN_GRANT_DMA_OPS
        bool
        select DMA_OPS
@@ -343,6 +347,7 @@ config XEN_VIRTIO
        bool "Xen virtio support"
        depends on VIRTIO
        select XEN_GRANT_DMA_OPS
+       select XEN_GRANT_DMA_IOMMU
        help
          Enable virtio support for running as Xen guest. Depending on the
          guest type this will require special support on the backend side
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 1a23cb0..c0503f1 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -40,3 +40,4 @@ xen-privcmd-y                         := privcmd.o
privcmd-buf.o
 obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF)    += xen-front-pgdir-shbuf.o
 obj-$(CONFIG_XEN_UNPOPULATED_ALLOC)    += unpopulated-alloc.o
 obj-$(CONFIG_XEN_GRANT_DMA_OPS)                += grant-dma-ops.o
+obj-$(CONFIG_XEN_GRANT_DMA_IOMMU)      += grant-dma-iommu.o
diff --git a/drivers/xen/grant-dma-iommu.c b/drivers/xen/grant-dma-iommu.c
new file mode 100644
index 00000000..b8aad8a
--- /dev/null
+++ b/drivers/xen/grant-dma-iommu.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Stub IOMMU driver which does nothing.
+ * The main purpose of it being present is to reuse generic device-tree
IOMMU
+ * bindings by Xen grant DMA-mapping layer.
+ */
+
+#include <linux/iommu.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+
+struct grant_dma_iommu_device {
+       struct device *dev;
+       struct iommu_device iommu;
+};
+
+/* Nothing is really needed here */
+static const struct iommu_ops grant_dma_iommu_ops;
+
+static const struct of_device_id grant_dma_iommu_of_match[] = {
+       { .compatible = "xen,grant-dma" },
+       { },
+};
+
+static int grant_dma_iommu_probe(struct platform_device *pdev)
+{
+       struct grant_dma_iommu_device *mmu;
+       int ret;
+
+       mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
+       if (!mmu)
+               return -ENOMEM;
+
+       mmu->dev = &pdev->dev;
+
+       ret = iommu_device_register(&mmu->iommu, &grant_dma_iommu_ops,
&pdev->dev);
+       if (ret)
+               return ret;
+
+       platform_set_drvdata(pdev, mmu);
+
+       return 0;
+}
+
+static int grant_dma_iommu_remove(struct platform_device *pdev)
+{
+       struct grant_dma_iommu_device *mmu = platform_get_drvdata(pdev);
+
+       platform_set_drvdata(pdev, NULL);
+       iommu_device_unregister(&mmu->iommu);
+
+       return 0;
+}
+
+static struct platform_driver grant_dma_iommu_driver = {
+       .driver = {
+               .name = "grant-dma-iommu",
+               .of_match_table = grant_dma_iommu_of_match,
+       },
+       .probe = grant_dma_iommu_probe,
+       .remove = grant_dma_iommu_remove,
+};
+
+static int __init grant_dma_iommu_init(void)
+{
+       struct device_node *iommu_np;
+
+       iommu_np = of_find_matching_node(NULL, grant_dma_iommu_of_match);
+       if (!iommu_np)
+               return 0;
+
+       of_node_put(iommu_np);
+
+       return platform_driver_register(&grant_dma_iommu_driver);
+}
+subsys_initcall(grant_dma_iommu_init);

I have checked, this also "fixes" deferred probe timeout issue.

Personally I would prefer the first option, but I would be also happy to
use second option in order to unblock the series.

What do the maintainers think?


--
Regards,

Oleksandr Tyshchenko


2022-05-25 03:26:12

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

"

On Tue, May 24, 2022 at 9:01 AM Rob Herring <[email protected]> wrote:
>
> +Saravana
>
> On Mon, May 23, 2022 at 06:58:13PM -0700, Stefano Stabellini wrote:
> > On Mon, 23 May 2022, Oleksandr wrote:
> > > > > On Thu, 19 May 2022, Oleksandr wrote:
> > > > > > > On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
> > > > > > > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > > > > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > > This would mean having a device
> > > > > > > > > node for the grant-table mechanism that can be referred to using
> > > > > > > > > the
> > > > > > > > > 'iommus'
> > > > > > > > > phandle property, with the domid as an additional argument.
> > > > > > > > I assume, you are speaking about something like the following?
> > > > > > > >
> > > > > > > >
> > > > > > > > xen_dummy_iommu {
> > > > > > > > compatible = "xen,dummy-iommu";
> > > > > > > > #iommu-cells = <1>;
> > > > > > > > };
> > > > > > > >
> > > > > > > > virtio@3000 {
> > > > > > > > compatible = "virtio,mmio";
> > > > > > > > reg = <0x3000 0x100>;
> > > > > > > > interrupts = <41>;
> > > > > > > >
> > > > > > > > /* The device is located in Xen domain with ID 1 */
> > > > > > > > iommus = <&xen_dummy_iommu 1>;
> > > > > > > > };
> > > > > > > Right, that's that's the idea,
> > > > > > thank you for the confirmation
> > > > > >
> > > > > >
> > > > > >
> > > > > > > except I would not call it a 'dummy'.
> > > > > > > From the perspective of the DT, this behaves just like an IOMMU,
> > > > > > > even if the exact mechanism is different from most hardware IOMMU
> > > > > > > implementations.
> > > > > > well, agree
> > > > > >
> > > > > >
> > > > > > > > > It does not quite fit the model that Linux currently uses for
> > > > > > > > > iommus,
> > > > > > > > > as that has an allocator for dma_addr_t space
> > > > > > > > yes (# 3/7 adds grant-table based allocator)
> > > > > > > >
> > > > > > > >
> > > > > > > > > , but it would think it's
> > > > > > > > > conceptually close enough that it makes sense for the binding.
> > > > > > > > Interesting idea. I am wondering, do we need an extra actions for
> > > > > > > > this
> > > > > > > > to work in Linux guest (dummy IOMMU driver, etc)?
> > > > > > > It depends on how closely the guest implementation can be made to
> > > > > > > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > > > > > > it may actually be close enough that you can just turn the grant-table
> > > > > > > code into a normal iommu driver and change nothing else.
> > > > > > Unfortunately, I failed to find a way how use grant references at the
> > > > > > iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
> > > > > > am
> > > > > > not too familiar with that, so what is written below might be wrong or
> > > > > > at
> > > > > > least not precise.
> > > > > >
> > > > > > The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
> > > > > > itself, it
> > > > > > just maps (IOVA-PA) what was requested to be mapped by the upper layer.
> > > > > > The
> > > > > > DMA address allocation is done by the upper layer (DMA-IOMMU which is
> > > > > > the glue
> > > > > > layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
> > > > > > what we
> > > > > > need here is just to allocate our specific grant-table based DMA
> > > > > > addresses
> > > > > > (DMA address = grant reference + offset in the page), so let’s say we
> > > > > > need an
> > > > > > entity to take a physical address as parameter and return a DMA address
> > > > > > (what
> > > > > > actually commit #3/7 is doing), and that’s all. So working at the
> > > > > > dma_ops
> > > > > > layer we get exactly what we need, with the minimal changes to guest
> > > > > > infrastructure. In our case the Xen itself acts as an IOMMU.
> > > > > >
> > > > > > Assuming that we want to reuse the IOMMU infrastructure somehow for our
> > > > > > needs.
> > > > > > I think, in that case we will likely need to introduce a new specific
> > > > > > IOVA
> > > > > > allocator (alongside with a generic one) to be hooked up by the
> > > > > > DMA-IOMMU
> > > > > > layer if we run on top of Xen. But, even having the specific IOVA
> > > > > > allocator to
> > > > > > return what we indeed need (DMA address = grant reference + offset in
> > > > > > the
> > > > > > page) we will still need the specific minimal required IOMMU driver to
> > > > > > be
> > > > > > present in the system anyway in order to track the mappings(?) and do
> > > > > > nothing
> > > > > > with them, returning a success (this specific IOMMU driver should have
> > > > > > all
> > > > > > mandatory callbacks implemented).
> > > > > >
> > > > > > I completely agree, it would be really nice to reuse generic IOMMU
> > > > > > bindings
> > > > > > rather than introducing Xen specific property if what we are trying to
> > > > > > implement in current patch series fits in the usage of "iommus" in Linux
> > > > > > more-less. But, if we will have to add more complexity/more components
> > > > > > to the
> > > > > > code for the sake of reusing device tree binding, this raises a question
> > > > > > whether that’s worthwhile.
> > > > > >
> > > > > > Or I really missed something?
> > > > > I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> > > > > bindings, not necessarily the IOMMU drivers framework in Linux (although
> > > > > that would be an added bonus.)
> > > > >
> > > > > I know from previous discussions with you that making the grant table
> > > > > fit in the existing IOMMU drivers model is difficult, but just reusing
> > > > > the Device Tree bindings seems feasible?
> > > >
> > > > I started experimenting with that. As wrote in a separate email, I got a
> > > > deferred probe timeout,
> > > >
> > > > after inserting required nodes into guest device tree, which seems to be a
> > > > consequence of the unavailability of IOMMU, I will continue to investigate
> > > > this question.
> > >
> > >
> > > I have experimented with that. Yes, just reusing the Device Tree bindings is
> > > technically feasible (and we are able to do this by only touching
> > > grant-dma-ops.c), although deferred probe timeout still stands (as there is no
> > > IOMMU driver being present actually).
> > >
> > > [ 0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
> > > dependency
> > > [ 0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
> > > GB/1.95 GiB)
> > >
> > >
> > > Below the working diff (on top of current series):
> > >
> > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > > index da9c7ff..6586152 100644
> > > --- a/drivers/xen/grant-dma-ops.c
> > > +++ b/drivers/xen/grant-dma-ops.c
> > > @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
> > >
> > > bool xen_is_grant_dma_device(struct device *dev)
> > > {
> > > + struct device_node *iommu_np;
> > > + bool has_iommu;
> > > +
> > > /* XXX Handle only DT devices for now */
> > > if (!dev->of_node)
> > > return false;
> > >
> > > - return of_property_read_bool(dev->of_node, "xen,backend-domid");
> > > + iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
> > > + has_iommu = iommu_np && of_device_is_compatible(iommu_np,
> > > "xen,grant-dma");
> > > + of_node_put(iommu_np);
> > > +
> > > + return has_iommu;
> > > }
> > >
> > > void xen_grant_setup_dma_ops(struct device *dev)
> > > {
> > > struct xen_grant_dma_data *data;
> > > - uint32_t domid;
> > > + struct of_phandle_args iommu_spec;
> > >
> > > data = find_xen_grant_dma_data(dev);
> > > if (data) {
> > > @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
> > > if (!dev->of_node)
> > > goto err;
> > >
> > > - if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
> > > - dev_err(dev, "xen,backend-domid property is not present\n");
> > > + if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
> > > + 0, &iommu_spec)) {
> > > + dev_err(dev, "Cannot parse iommus property\n");
> > > + goto err;
> > > + }
> > > +
> > > + if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
> > > + iommu_spec.args_count != 1) {
> > > + dev_err(dev, "Incompatible IOMMU node\n");
> > > + of_node_put(iommu_spec.np);
> > > goto err;
> > > }
> > >
> > > + of_node_put(iommu_spec.np);
> > > +
> > > data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> > > if (!data)
> > > goto err;
> > >
> > > - data->backend_domid = domid;
> > > + /*
> > > + * The endpoint ID here means the ID of the domain where the
> > > corresponding
> > > + * backend is running
> > > + */
> > > + data->backend_domid = iommu_spec.args[0];
> > >
> > > if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
> > > GFP_KERNEL))) {
> > > (END)
> > >
> > >
> > >
> > > Below, the nodes generated by Xen toolstack:
> > >
> > > xen_grant_dma {
>
> Nit: iommu {
>
> > > compatible = "xen,grant-dma";
> > > #iommu-cells = <0x01>;
> > > phandle = <0xfde9>;
> > > };
> > >
> > > virtio@2000000 {
> > > compatible = "virtio,mmio";
> > > reg = <0x00 0x2000000 0x00 0x200>;
> > > interrupts = <0x00 0x01 0xf01>;
> > > interrupt-parent = <0xfde8>;
> > > dma-coherent;
> > > iommus = <0xfde9 0x01>;
> > > };
> >
> > Not bad! I like it.
> >
> >
> > > I am wondering, would be the proper solution to eliminate deferred probe
> > > timeout issue in our particular case (without introducing an extra IOMMU
> > > driver)?
> >
> > In reality I don't think there is a way to do that. I would create an
> > empty skelethon IOMMU driver for xen,grant-dma.
>
> Does it have to be an empty driver? Originally, IOMMU 'drivers' were not
> drivers, but they've been getting converted. Can that be done here?
>
> Short of that, I think we could have some sort of skip probe list for
> deferred probe. Not sure if that would be easiest as IOMMU specific or
> global.

Hi Oleksandr,

If you do fw_devlink.strict=1, you'll notice that the consumers of
this "iommu" won't probe at all or will delay the boot by some number
of seconds. The eventual goal is to go towards fw_devlink.strict=1
being the default.

From a fw_devlik perspective, please implement a driver. Ideally a
real one, but at least an empty one. The empty one doesn't need to be
an IOMMU driver, but at least just do a return 0 in the probe
function. Also, if it's not a device, why even have a "compatible"
property (removing it won't necessarily remove the deferred probe
timeout issue you see)? Will any code be using "xen,grant-dma" to look
up the node? If so, that driver could be the one that probes this
device. At least from a fw_devlink perspective, it just needs to have
a driver that binds to this device.

Also, if we aren't going to implement a driver and have the supplier
("xen,grant-dma") behave like a device (as in, have a driver that
probes), I'd rather that the iommu binding not be used at all as this
would be an exception to how every other iommu device behaves.

-Saravana

2022-05-25 09:51:21

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

On Tue, 24 May 2022, Oleksandr wrote:
> > On Mon, 23 May 2022, Oleksandr wrote:
> > > > > On Thu, 19 May 2022, Oleksandr wrote:
> > > > > > > On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]>
> > > > > > > wrote:
> > > > > > > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > > > > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > >     This would mean having a device
> > > > > > > > > node for the grant-table mechanism that can be referred to
> > > > > > > > > using
> > > > > > > > > the
> > > > > > > > > 'iommus'
> > > > > > > > > phandle property, with the domid as an additional argument.
> > > > > > > > I assume, you are speaking about something like the following?
> > > > > > > >
> > > > > > > >
> > > > > > > > xen_dummy_iommu {
> > > > > > > >       compatible = "xen,dummy-iommu";
> > > > > > > >       #iommu-cells = <1>;
> > > > > > > > };
> > > > > > > >
> > > > > > > > virtio@3000 {
> > > > > > > >       compatible = "virtio,mmio";
> > > > > > > >       reg = <0x3000 0x100>;
> > > > > > > >       interrupts = <41>;
> > > > > > > >
> > > > > > > >       /* The device is located in Xen domain with ID 1 */
> > > > > > > >       iommus = <&xen_dummy_iommu 1>;
> > > > > > > > };
> > > > > > > Right, that's that's the idea,
> > > > > > thank you for the confirmation
> > > > > >
> > > > > >
> > > > > >
> > > > > > >    except I would not call it a 'dummy'.
> > > > > > >   From the perspective of the DT, this behaves just like an
> > > > > > > IOMMU,
> > > > > > > even if the exact mechanism is different from most hardware IOMMU
> > > > > > > implementations.
> > > > > > well, agree
> > > > > >
> > > > > >
> > > > > > > > > It does not quite fit the model that Linux currently uses for
> > > > > > > > > iommus,
> > > > > > > > > as that has an allocator for dma_addr_t space
> > > > > > > > yes (# 3/7 adds grant-table based allocator)
> > > > > > > >
> > > > > > > >
> > > > > > > > > , but it would think it's
> > > > > > > > > conceptually close enough that it makes sense for the binding.
> > > > > > > > Interesting idea. I am wondering, do we need an extra actions
> > > > > > > > for
> > > > > > > > this
> > > > > > > > to work in Linux guest (dummy IOMMU driver, etc)?
> > > > > > > It depends on how closely the guest implementation can be made to
> > > > > > > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > > > > > > it may actually be close enough that you can just turn the
> > > > > > > grant-table
> > > > > > > code into a normal iommu driver and change nothing else.
> > > > > > Unfortunately, I failed to find a way how use grant references at
> > > > > > the
> > > > > > iommu_ops level (I mean to fully pretend that we are an IOMMU
> > > > > > driver). I
> > > > > > am
> > > > > > not too familiar with that, so what is written below might be wrong
> > > > > > or
> > > > > > at
> > > > > > least not precise.
> > > > > >
> > > > > > The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
> > > > > > itself, it
> > > > > > just maps (IOVA-PA) what was requested to be mapped by the upper
> > > > > > layer.
> > > > > > The
> > > > > > DMA address allocation is done by the upper layer (DMA-IOMMU which
> > > > > > is
> > > > > > the glue
> > > > > > layer between DMA API and IOMMU API allocates IOVA for PA?). But,
> > > > > > all
> > > > > > what we
> > > > > > need here is just to allocate our specific grant-table based DMA
> > > > > > addresses
> > > > > > (DMA address = grant reference + offset in the page), so let’s say
> > > > > > we
> > > > > > need an
> > > > > > entity to take a physical address as parameter and return a DMA
> > > > > > address
> > > > > > (what
> > > > > > actually commit #3/7 is doing), and that’s all. So working at the
> > > > > > dma_ops
> > > > > > layer we get exactly what we need, with the minimal changes to guest
> > > > > > infrastructure. In our case the Xen itself acts as an IOMMU.
> > > > > >
> > > > > > Assuming that we want to reuse the IOMMU infrastructure somehow for
> > > > > > our
> > > > > > needs.
> > > > > > I think, in that case we will likely need to introduce a new
> > > > > > specific
> > > > > > IOVA
> > > > > > allocator (alongside with a generic one) to be hooked up by the
> > > > > > DMA-IOMMU
> > > > > > layer if we run on top of Xen. But, even having the specific IOVA
> > > > > > allocator to
> > > > > > return what we indeed need (DMA address = grant reference + offset
> > > > > > in
> > > > > > the
> > > > > > page) we will still need the specific minimal required IOMMU driver
> > > > > > to
> > > > > > be
> > > > > > present in the system anyway in order to track the mappings(?) and
> > > > > > do
> > > > > > nothing
> > > > > > with them, returning a success (this specific IOMMU driver should
> > > > > > have
> > > > > > all
> > > > > > mandatory callbacks implemented).
> > > > > >
> > > > > > I completely agree, it would be really nice to reuse generic IOMMU
> > > > > > bindings
> > > > > > rather than introducing Xen specific property if what we are trying
> > > > > > to
> > > > > > implement in current patch series fits in the usage of "iommus" in
> > > > > > Linux
> > > > > > more-less. But, if we will have to add more complexity/more
> > > > > > components
> > > > > > to the
> > > > > > code for the sake of reusing device tree binding, this raises a
> > > > > > question
> > > > > > whether that’s worthwhile.
> > > > > >
> > > > > > Or I really missed something?
> > > > > I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> > > > > bindings, not necessarily the IOMMU drivers framework in Linux
> > > > > (although
> > > > > that would be an added bonus.)
> > > > >
> > > > > I know from previous discussions with you that making the grant table
> > > > > fit in the existing IOMMU drivers model is difficult, but just reusing
> > > > > the Device Tree bindings seems feasible?
> > > > I started experimenting with that. As wrote in a separate email, I got a
> > > > deferred probe timeout,
> > > >
> > > > after inserting required nodes into guest device tree, which seems to be
> > > > a
> > > > consequence of the unavailability of IOMMU, I will continue to
> > > > investigate
> > > > this question.
> > >
> > > I have experimented with that. Yes, just reusing the Device Tree bindings
> > > is
> > > technically feasible (and we are able to do this by only touching
> > > grant-dma-ops.c), although deferred probe timeout still stands (as there
> > > is no
> > > IOMMU driver being present actually).
> > >
> > > [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout,
> > > ignoring
> > > dependency
> > > [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks
> > > (2.10
> > > GB/1.95 GiB)
> > >
> > >
> > > Below the working diff (on top of current series):
> > >
> > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > > index da9c7ff..6586152 100644
> > > --- a/drivers/xen/grant-dma-ops.c
> > > +++ b/drivers/xen/grant-dma-ops.c
> > > @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops =
> > > {
> > >
> > >  bool xen_is_grant_dma_device(struct device *dev)
> > >  {
> > > +       struct device_node *iommu_np;
> > > +       bool has_iommu;
> > > +
> > >         /* XXX Handle only DT devices for now */
> > >         if (!dev->of_node)
> > >                 return false;
> > >
> > > -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
> > > +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
> > > +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
> > > "xen,grant-dma");
> > > +       of_node_put(iommu_np);
> > > +
> > > +       return has_iommu;
> > >  }
> > >
> > >  void xen_grant_setup_dma_ops(struct device *dev)
> > >  {
> > >         struct xen_grant_dma_data *data;
> > > -       uint32_t domid;
> > > +       struct of_phandle_args iommu_spec;
> > >
> > >         data = find_xen_grant_dma_data(dev);
> > >         if (data) {
> > > @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
> > >         if (!dev->of_node)
> > >                 goto err;
> > >
> > > -       if (of_property_read_u32(dev->of_node, "xen,backend-domid",
> > > &domid)) {
> > > -               dev_err(dev, "xen,backend-domid property is not
> > > present\n");
> > > +       if (of_parse_phandle_with_args(dev->of_node, "iommus",
> > > "#iommu-cells",
> > > +                       0, &iommu_spec)) {
> > > +               dev_err(dev, "Cannot parse iommus property\n");
> > > +               goto err;
> > > +       }
> > > +
> > > +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
> > > +                       iommu_spec.args_count != 1) {
> > > +               dev_err(dev, "Incompatible IOMMU node\n");
> > > +               of_node_put(iommu_spec.np);
> > >                 goto err;
> > >         }
> > >
> > > +       of_node_put(iommu_spec.np);
> > > +
> > >         data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> > >         if (!data)
> > >                 goto err;
> > >
> > > -       data->backend_domid = domid;
> > > +       /*
> > > +        * The endpoint ID here means the ID of the domain where the
> > > corresponding
> > > +        * backend is running
> > > +        */
> > > +       data->backend_domid = iommu_spec.args[0];
> > >
> > >         if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev,
> > > data,
> > >                         GFP_KERNEL))) {
> > > (END)
> > >
> > >
> > >
> > > Below, the nodes generated by Xen toolstack:
> > >
> > >         xen_grant_dma {
> > >                 compatible = "xen,grant-dma";
> > >                 #iommu-cells = <0x01>;
> > >                 phandle = <0xfde9>;
> > >         };
> > >
> > >         virtio@2000000 {
> > >                 compatible = "virtio,mmio";
> > >                 reg = <0x00 0x2000000 0x00 0x200>;
> > >                 interrupts = <0x00 0x01 0xf01>;
> > >                 interrupt-parent = <0xfde8>;
> > >                 dma-coherent;
> > >                 iommus = <0xfde9 0x01>;
> > >         };
> > Not bad! I like it.
>
>
> Good.
>
>
>
> >
> > > I am wondering, would be the proper solution to eliminate deferred probe
> > > timeout issue in our particular case (without introducing an extra IOMMU
> > > driver)?
> > In reality I don't think there is a way to do that. I would create an
> > empty skelethon IOMMU driver for xen,grant-dma.
>
> Ok, I found yet another option how we can avoid deferred probe timeout issue.
> I am not sure whether it will be welcome. But it doesn't really require
> introducing stub IOMMU driver or other changes in the guest. The idea is to
> make IOMMU device unavailable (status = "disabled"), this way
> of_iommu_configure() will treat that as success condition also.
>
> https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L31
> https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L149
>
>         xen_grant_dma {
>                 compatible = "xen,grant-dma";
>                 #iommu-cells = <0x01>;
>                 phandle = <0xfde9>;
>                 status = "disabled";
>         };
>         virtio@2000000 {
>                 compatible = "virtio,mmio";
>                 reg = <0x00 0x2000000 0x00 0x200>;
>                 interrupts = <0x00 0x01 0xf01>;
>                 interrupt-parent = <0xfde8>;
>                 dma-coherent;
>                 iommus = <0xfde9 0x01>;
>         };
>
> I have checked, this "fixes" deferred probe timeout issue.
>
>
> Or we indeed need to introduce stub IOMMU driver (I placed it to driver/xen
> instead of driver/iommu, also we can even squash it with grant-dma-ops.c?).
> This stub driver also results in NO_IOMMU condition (as "of_xlate" callback is
> not implemented).
>
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index a7bd8ce..35b91b9 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -335,6 +335,10 @@ config XEN_UNPOPULATED_ALLOC
>           having to balloon out RAM regions in order to obtain physical memory
>           space to create such mappings.
>
> +config XEN_GRANT_DMA_IOMMU
> +       bool
> +       select IOMMU_API
> +
>  config XEN_GRANT_DMA_OPS
>         bool
>         select DMA_OPS
> @@ -343,6 +347,7 @@ config XEN_VIRTIO
>         bool "Xen virtio support"
>         depends on VIRTIO
>         select XEN_GRANT_DMA_OPS
> +       select XEN_GRANT_DMA_IOMMU
>         help
>           Enable virtio support for running as Xen guest. Depending on the
>           guest type this will require special support on the backend side
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 1a23cb0..c0503f1 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -40,3 +40,4 @@ xen-privcmd-y                         := privcmd.o
> privcmd-buf.o
>  obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF)    += xen-front-pgdir-shbuf.o
>  obj-$(CONFIG_XEN_UNPOPULATED_ALLOC)    += unpopulated-alloc.o
>  obj-$(CONFIG_XEN_GRANT_DMA_OPS)                += grant-dma-ops.o
> +obj-$(CONFIG_XEN_GRANT_DMA_IOMMU)      += grant-dma-iommu.o
> diff --git a/drivers/xen/grant-dma-iommu.c b/drivers/xen/grant-dma-iommu.c
> new file mode 100644
> index 00000000..b8aad8a
> --- /dev/null
> +++ b/drivers/xen/grant-dma-iommu.c
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Stub IOMMU driver which does nothing.
> + * The main purpose of it being present is to reuse generic device-tree IOMMU
> + * bindings by Xen grant DMA-mapping layer.
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +
> +struct grant_dma_iommu_device {
> +       struct device *dev;
> +       struct iommu_device iommu;
> +};
> +
> +/* Nothing is really needed here */
> +static const struct iommu_ops grant_dma_iommu_ops;
> +
> +static const struct of_device_id grant_dma_iommu_of_match[] = {
> +       { .compatible = "xen,grant-dma" },
> +       { },
> +};
> +
> +static int grant_dma_iommu_probe(struct platform_device *pdev)
> +{
> +       struct grant_dma_iommu_device *mmu;
> +       int ret;
> +
> +       mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
> +       if (!mmu)
> +               return -ENOMEM;
> +
> +       mmu->dev = &pdev->dev;
> +
> +       ret = iommu_device_register(&mmu->iommu, &grant_dma_iommu_ops,
> &pdev->dev);
> +       if (ret)
> +               return ret;
> +
> +       platform_set_drvdata(pdev, mmu);
> +
> +       return 0;
> +}
> +
> +static int grant_dma_iommu_remove(struct platform_device *pdev)
> +{
> +       struct grant_dma_iommu_device *mmu = platform_get_drvdata(pdev);
> +
> +       platform_set_drvdata(pdev, NULL);
> +       iommu_device_unregister(&mmu->iommu);
> +
> +       return 0;
> +}
> +
> +static struct platform_driver grant_dma_iommu_driver = {
> +       .driver = {
> +               .name = "grant-dma-iommu",
> +               .of_match_table = grant_dma_iommu_of_match,
> +       },
> +       .probe = grant_dma_iommu_probe,
> +       .remove = grant_dma_iommu_remove,
> +};
> +
> +static int __init grant_dma_iommu_init(void)
> +{
> +       struct device_node *iommu_np;
> +
> +       iommu_np = of_find_matching_node(NULL, grant_dma_iommu_of_match);
> +       if (!iommu_np)
> +               return 0;
> +
> +       of_node_put(iommu_np);
> +
> +       return platform_driver_register(&grant_dma_iommu_driver);
> +}
> +subsys_initcall(grant_dma_iommu_init);
>
> I have checked, this also "fixes" deferred probe timeout issue.
>
> Personally I would prefer the first option, but I would be also happy to use
> second option in order to unblock the series.
>
> What do the maintainers think?


I don't think it is a good idea to mark the fake IOMMU as disabled
because it implies that there is no need to use it (no need to use
dma_ops) which is a problem.

If we don't want the skelethon driver then Rob's suggestion of having a
skip list for deferred probe is better.

I think the skelethon driver also is totally fine.

2022-05-26 00:40:11

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops

+Saravana

On Mon, May 23, 2022 at 06:58:13PM -0700, Stefano Stabellini wrote:
> On Mon, 23 May 2022, Oleksandr wrote:
> > > > On Thu, 19 May 2022, Oleksandr wrote:
> > > > > > On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
> > > > > > > On 18.05.22 17:32, Arnd Bergmann wrote:
> > > > > > > > On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
> > > > > > > > <[email protected]> wrote:
> > > > > > > >     This would mean having a device
> > > > > > > > node for the grant-table mechanism that can be referred to using
> > > > > > > > the
> > > > > > > > 'iommus'
> > > > > > > > phandle property, with the domid as an additional argument.
> > > > > > > I assume, you are speaking about something like the following?
> > > > > > >
> > > > > > >
> > > > > > > xen_dummy_iommu {
> > > > > > >       compatible = "xen,dummy-iommu";
> > > > > > >       #iommu-cells = <1>;
> > > > > > > };
> > > > > > >
> > > > > > > virtio@3000 {
> > > > > > >       compatible = "virtio,mmio";
> > > > > > >       reg = <0x3000 0x100>;
> > > > > > >       interrupts = <41>;
> > > > > > >
> > > > > > >       /* The device is located in Xen domain with ID 1 */
> > > > > > >       iommus = <&xen_dummy_iommu 1>;
> > > > > > > };
> > > > > > Right, that's that's the idea,
> > > > > thank you for the confirmation
> > > > >
> > > > >
> > > > >
> > > > > >    except I would not call it a 'dummy'.
> > > > > >   From the perspective of the DT, this behaves just like an IOMMU,
> > > > > > even if the exact mechanism is different from most hardware IOMMU
> > > > > > implementations.
> > > > > well, agree
> > > > >
> > > > >
> > > > > > > > It does not quite fit the model that Linux currently uses for
> > > > > > > > iommus,
> > > > > > > > as that has an allocator for dma_addr_t space
> > > > > > > yes (# 3/7 adds grant-table based allocator)
> > > > > > >
> > > > > > >
> > > > > > > > , but it would think it's
> > > > > > > > conceptually close enough that it makes sense for the binding.
> > > > > > > Interesting idea. I am wondering, do we need an extra actions for
> > > > > > > this
> > > > > > > to work in Linux guest (dummy IOMMU driver, etc)?
> > > > > > It depends on how closely the guest implementation can be made to
> > > > > > resemble a normal iommu. If you do allocate dma_addr_t addresses,
> > > > > > it may actually be close enough that you can just turn the grant-table
> > > > > > code into a normal iommu driver and change nothing else.
> > > > > Unfortunately, I failed to find a way how use grant references at the
> > > > > iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
> > > > > am
> > > > > not too familiar with that, so what is written below might be wrong or
> > > > > at
> > > > > least not precise.
> > > > >
> > > > > The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
> > > > > itself, it
> > > > > just maps (IOVA-PA) what was requested to be mapped by the upper layer.
> > > > > The
> > > > > DMA address allocation is done by the upper layer (DMA-IOMMU which is
> > > > > the glue
> > > > > layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
> > > > > what we
> > > > > need here is just to allocate our specific grant-table based DMA
> > > > > addresses
> > > > > (DMA address = grant reference + offset in the page), so let’s say we
> > > > > need an
> > > > > entity to take a physical address as parameter and return a DMA address
> > > > > (what
> > > > > actually commit #3/7 is doing), and that’s all. So working at the
> > > > > dma_ops
> > > > > layer we get exactly what we need, with the minimal changes to guest
> > > > > infrastructure. In our case the Xen itself acts as an IOMMU.
> > > > >
> > > > > Assuming that we want to reuse the IOMMU infrastructure somehow for our
> > > > > needs.
> > > > > I think, in that case we will likely need to introduce a new specific
> > > > > IOVA
> > > > > allocator (alongside with a generic one) to be hooked up by the
> > > > > DMA-IOMMU
> > > > > layer if we run on top of Xen. But, even having the specific IOVA
> > > > > allocator to
> > > > > return what we indeed need (DMA address = grant reference + offset in
> > > > > the
> > > > > page) we will still need the specific minimal required IOMMU driver to
> > > > > be
> > > > > present in the system anyway in order to track the mappings(?) and do
> > > > > nothing
> > > > > with them, returning a success (this specific IOMMU driver should have
> > > > > all
> > > > > mandatory callbacks implemented).
> > > > >
> > > > > I completely agree, it would be really nice to reuse generic IOMMU
> > > > > bindings
> > > > > rather than introducing Xen specific property if what we are trying to
> > > > > implement in current patch series fits in the usage of "iommus" in Linux
> > > > > more-less. But, if we will have to add more complexity/more components
> > > > > to the
> > > > > code for the sake of reusing device tree binding, this raises a question
> > > > > whether that’s worthwhile.
> > > > >
> > > > > Or I really missed something?
> > > > I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
> > > > bindings, not necessarily the IOMMU drivers framework in Linux (although
> > > > that would be an added bonus.)
> > > >
> > > > I know from previous discussions with you that making the grant table
> > > > fit in the existing IOMMU drivers model is difficult, but just reusing
> > > > the Device Tree bindings seems feasible?
> > >
> > > I started experimenting with that. As wrote in a separate email, I got a
> > > deferred probe timeout,
> > >
> > > after inserting required nodes into guest device tree, which seems to be a
> > > consequence of the unavailability of IOMMU, I will continue to investigate
> > > this question.
> >
> >
> > I have experimented with that. Yes, just reusing the Device Tree bindings is
> > technically feasible (and we are able to do this by only touching
> > grant-dma-ops.c), although deferred probe timeout still stands (as there is no
> > IOMMU driver being present actually).
> >
> > [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
> > dependency
> > [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
> > GB/1.95 GiB)
> >
> >
> > Below the working diff (on top of current series):
> >
> > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > index da9c7ff..6586152 100644
> > --- a/drivers/xen/grant-dma-ops.c
> > +++ b/drivers/xen/grant-dma-ops.c
> > @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
> >
> >  bool xen_is_grant_dma_device(struct device *dev)
> >  {
> > +       struct device_node *iommu_np;
> > +       bool has_iommu;
> > +
> >         /* XXX Handle only DT devices for now */
> >         if (!dev->of_node)
> >                 return false;
> >
> > -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
> > +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
> > +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
> > "xen,grant-dma");
> > +       of_node_put(iommu_np);
> > +
> > +       return has_iommu;
> >  }
> >
> >  void xen_grant_setup_dma_ops(struct device *dev)
> >  {
> >         struct xen_grant_dma_data *data;
> > -       uint32_t domid;
> > +       struct of_phandle_args iommu_spec;
> >
> >         data = find_xen_grant_dma_data(dev);
> >         if (data) {
> > @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
> >         if (!dev->of_node)
> >                 goto err;
> >
> > -       if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
> > -               dev_err(dev, "xen,backend-domid property is not present\n");
> > +       if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
> > +                       0, &iommu_spec)) {
> > +               dev_err(dev, "Cannot parse iommus property\n");
> > +               goto err;
> > +       }
> > +
> > +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
> > +                       iommu_spec.args_count != 1) {
> > +               dev_err(dev, "Incompatible IOMMU node\n");
> > +               of_node_put(iommu_spec.np);
> >                 goto err;
> >         }
> >
> > +       of_node_put(iommu_spec.np);
> > +
> >         data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> >         if (!data)
> >                 goto err;
> >
> > -       data->backend_domid = domid;
> > +       /*
> > +        * The endpoint ID here means the ID of the domain where the
> > corresponding
> > +        * backend is running
> > +        */
> > +       data->backend_domid = iommu_spec.args[0];
> >
> >         if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
> >                         GFP_KERNEL))) {
> > (END)
> >
> >
> >
> > Below, the nodes generated by Xen toolstack:
> >
> >         xen_grant_dma {

Nit: iommu {

> >                 compatible = "xen,grant-dma";
> >                 #iommu-cells = <0x01>;
> >                 phandle = <0xfde9>;
> >         };
> >
> >         virtio@2000000 {
> >                 compatible = "virtio,mmio";
> >                 reg = <0x00 0x2000000 0x00 0x200>;
> >                 interrupts = <0x00 0x01 0xf01>;
> >                 interrupt-parent = <0xfde8>;
> >                 dma-coherent;
> >                 iommus = <0xfde9 0x01>;
> >         };
>
> Not bad! I like it.
>
>
> > I am wondering, would be the proper solution to eliminate deferred probe
> > timeout issue in our particular case (without introducing an extra IOMMU
> > driver)?
>
> In reality I don't think there is a way to do that. I would create an
> empty skelethon IOMMU driver for xen,grant-dma.

Does it have to be an empty driver? Originally, IOMMU 'drivers' were not
drivers, but they've been getting converted. Can that be done here?

Short of that, I think we could have some sort of skip probe list for
deferred probe. Not sure if that would be easiest as IOMMU specific or
global.

Rob

2022-05-26 00:40:30

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops


On 24.05.22 20:59, Stefano Stabellini wrote:

Hello Stefano

> On Tue, 24 May 2022, Oleksandr wrote:
>>> On Mon, 23 May 2022, Oleksandr wrote:
>>>>>> On Thu, 19 May 2022, Oleksandr wrote:
>>>>>>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>>>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>     This would mean having a device
>>>>>>>>>> node for the grant-table mechanism that can be referred to
>>>>>>>>>> using
>>>>>>>>>> the
>>>>>>>>>> 'iommus'
>>>>>>>>>> phandle property, with the domid as an additional argument.
>>>>>>>>> I assume, you are speaking about something like the following?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> xen_dummy_iommu {
>>>>>>>>>       compatible = "xen,dummy-iommu";
>>>>>>>>>       #iommu-cells = <1>;
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> virtio@3000 {
>>>>>>>>>       compatible = "virtio,mmio";
>>>>>>>>>       reg = <0x3000 0x100>;
>>>>>>>>>       interrupts = <41>;
>>>>>>>>>
>>>>>>>>>       /* The device is located in Xen domain with ID 1 */
>>>>>>>>>       iommus = <&xen_dummy_iommu 1>;
>>>>>>>>> };
>>>>>>>> Right, that's that's the idea,
>>>>>>> thank you for the confirmation
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>    except I would not call it a 'dummy'.
>>>>>>>>   From the perspective of the DT, this behaves just like an
>>>>>>>> IOMMU,
>>>>>>>> even if the exact mechanism is different from most hardware IOMMU
>>>>>>>> implementations.
>>>>>>> well, agree
>>>>>>>
>>>>>>>
>>>>>>>>>> It does not quite fit the model that Linux currently uses for
>>>>>>>>>> iommus,
>>>>>>>>>> as that has an allocator for dma_addr_t space
>>>>>>>>> yes (# 3/7 adds grant-table based allocator)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> , but it would think it's
>>>>>>>>>> conceptually close enough that it makes sense for the binding.
>>>>>>>>> Interesting idea. I am wondering, do we need an extra actions
>>>>>>>>> for
>>>>>>>>> this
>>>>>>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>>>>>>> It depends on how closely the guest implementation can be made to
>>>>>>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>>>>>>> it may actually be close enough that you can just turn the
>>>>>>>> grant-table
>>>>>>>> code into a normal iommu driver and change nothing else.
>>>>>>> Unfortunately, I failed to find a way how use grant references at
>>>>>>> the
>>>>>>> iommu_ops level (I mean to fully pretend that we are an IOMMU
>>>>>>> driver). I
>>>>>>> am
>>>>>>> not too familiar with that, so what is written below might be wrong
>>>>>>> or
>>>>>>> at
>>>>>>> least not precise.
>>>>>>>
>>>>>>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
>>>>>>> itself, it
>>>>>>> just maps (IOVA-PA) what was requested to be mapped by the upper
>>>>>>> layer.
>>>>>>> The
>>>>>>> DMA address allocation is done by the upper layer (DMA-IOMMU which
>>>>>>> is
>>>>>>> the glue
>>>>>>> layer between DMA API and IOMMU API allocates IOVA for PA?). But,
>>>>>>> all
>>>>>>> what we
>>>>>>> need here is just to allocate our specific grant-table based DMA
>>>>>>> addresses
>>>>>>> (DMA address = grant reference + offset in the page), so let’s say
>>>>>>> we
>>>>>>> need an
>>>>>>> entity to take a physical address as parameter and return a DMA
>>>>>>> address
>>>>>>> (what
>>>>>>> actually commit #3/7 is doing), and that’s all. So working at the
>>>>>>> dma_ops
>>>>>>> layer we get exactly what we need, with the minimal changes to guest
>>>>>>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>>>>>>
>>>>>>> Assuming that we want to reuse the IOMMU infrastructure somehow for
>>>>>>> our
>>>>>>> needs.
>>>>>>> I think, in that case we will likely need to introduce a new
>>>>>>> specific
>>>>>>> IOVA
>>>>>>> allocator (alongside with a generic one) to be hooked up by the
>>>>>>> DMA-IOMMU
>>>>>>> layer if we run on top of Xen. But, even having the specific IOVA
>>>>>>> allocator to
>>>>>>> return what we indeed need (DMA address = grant reference + offset
>>>>>>> in
>>>>>>> the
>>>>>>> page) we will still need the specific minimal required IOMMU driver
>>>>>>> to
>>>>>>> be
>>>>>>> present in the system anyway in order to track the mappings(?) and
>>>>>>> do
>>>>>>> nothing
>>>>>>> with them, returning a success (this specific IOMMU driver should
>>>>>>> have
>>>>>>> all
>>>>>>> mandatory callbacks implemented).
>>>>>>>
>>>>>>> I completely agree, it would be really nice to reuse generic IOMMU
>>>>>>> bindings
>>>>>>> rather than introducing Xen specific property if what we are trying
>>>>>>> to
>>>>>>> implement in current patch series fits in the usage of "iommus" in
>>>>>>> Linux
>>>>>>> more-less. But, if we will have to add more complexity/more
>>>>>>> components
>>>>>>> to the
>>>>>>> code for the sake of reusing device tree binding, this raises a
>>>>>>> question
>>>>>>> whether that’s worthwhile.
>>>>>>>
>>>>>>> Or I really missed something?
>>>>>> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
>>>>>> bindings, not necessarily the IOMMU drivers framework in Linux
>>>>>> (although
>>>>>> that would be an added bonus.)
>>>>>>
>>>>>> I know from previous discussions with you that making the grant table
>>>>>> fit in the existing IOMMU drivers model is difficult, but just reusing
>>>>>> the Device Tree bindings seems feasible?
>>>>> I started experimenting with that. As wrote in a separate email, I got a
>>>>> deferred probe timeout,
>>>>>
>>>>> after inserting required nodes into guest device tree, which seems to be
>>>>> a
>>>>> consequence of the unavailability of IOMMU, I will continue to
>>>>> investigate
>>>>> this question.
>>>> I have experimented with that. Yes, just reusing the Device Tree bindings
>>>> is
>>>> technically feasible (and we are able to do this by only touching
>>>> grant-dma-ops.c), although deferred probe timeout still stands (as there
>>>> is no
>>>> IOMMU driver being present actually).
>>>>
>>>> [    0.583771] virtio-mmio 2000000.virtio: deferred probe timeout,
>>>> ignoring
>>>> dependency
>>>> [    0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks
>>>> (2.10
>>>> GB/1.95 GiB)
>>>>
>>>>
>>>> Below the working diff (on top of current series):
>>>>
>>>> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
>>>> index da9c7ff..6586152 100644
>>>> --- a/drivers/xen/grant-dma-ops.c
>>>> +++ b/drivers/xen/grant-dma-ops.c
>>>> @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops =
>>>> {
>>>>
>>>>  bool xen_is_grant_dma_device(struct device *dev)
>>>>  {
>>>> +       struct device_node *iommu_np;
>>>> +       bool has_iommu;
>>>> +
>>>>         /* XXX Handle only DT devices for now */
>>>>         if (!dev->of_node)
>>>>                 return false;
>>>>
>>>> -       return of_property_read_bool(dev->of_node, "xen,backend-domid");
>>>> +       iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
>>>> +       has_iommu = iommu_np && of_device_is_compatible(iommu_np,
>>>> "xen,grant-dma");
>>>> +       of_node_put(iommu_np);
>>>> +
>>>> +       return has_iommu;
>>>>  }
>>>>
>>>>  void xen_grant_setup_dma_ops(struct device *dev)
>>>>  {
>>>>         struct xen_grant_dma_data *data;
>>>> -       uint32_t domid;
>>>> +       struct of_phandle_args iommu_spec;
>>>>
>>>>         data = find_xen_grant_dma_data(dev);
>>>>         if (data) {
>>>> @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
>>>>         if (!dev->of_node)
>>>>                 goto err;
>>>>
>>>> -       if (of_property_read_u32(dev->of_node, "xen,backend-domid",
>>>> &domid)) {
>>>> -               dev_err(dev, "xen,backend-domid property is not
>>>> present\n");
>>>> +       if (of_parse_phandle_with_args(dev->of_node, "iommus",
>>>> "#iommu-cells",
>>>> +                       0, &iommu_spec)) {
>>>> +               dev_err(dev, "Cannot parse iommus property\n");
>>>> +               goto err;
>>>> +       }
>>>> +
>>>> +       if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
>>>> +                       iommu_spec.args_count != 1) {
>>>> +               dev_err(dev, "Incompatible IOMMU node\n");
>>>> +               of_node_put(iommu_spec.np);
>>>>                 goto err;
>>>>         }
>>>>
>>>> +       of_node_put(iommu_spec.np);
>>>> +
>>>>         data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>>>>         if (!data)
>>>>                 goto err;
>>>>
>>>> -       data->backend_domid = domid;
>>>> +       /*
>>>> +        * The endpoint ID here means the ID of the domain where the
>>>> corresponding
>>>> +        * backend is running
>>>> +        */
>>>> +       data->backend_domid = iommu_spec.args[0];
>>>>
>>>>         if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev,
>>>> data,
>>>>                         GFP_KERNEL))) {
>>>> (END)
>>>>
>>>>
>>>>
>>>> Below, the nodes generated by Xen toolstack:
>>>>
>>>>         xen_grant_dma {
>>>>                 compatible = "xen,grant-dma";
>>>>                 #iommu-cells = <0x01>;
>>>>                 phandle = <0xfde9>;
>>>>         };
>>>>
>>>>         virtio@2000000 {
>>>>                 compatible = "virtio,mmio";
>>>>                 reg = <0x00 0x2000000 0x00 0x200>;
>>>>                 interrupts = <0x00 0x01 0xf01>;
>>>>                 interrupt-parent = <0xfde8>;
>>>>                 dma-coherent;
>>>>                 iommus = <0xfde9 0x01>;
>>>>         };
>>> Not bad! I like it.
>>
>> Good.
>>
>>
>>
>>>
>>>> I am wondering, would be the proper solution to eliminate deferred probe
>>>> timeout issue in our particular case (without introducing an extra IOMMU
>>>> driver)?
>>> In reality I don't think there is a way to do that. I would create an
>>> empty skelethon IOMMU driver for xen,grant-dma.
>> Ok, I found yet another option how we can avoid deferred probe timeout issue.
>> I am not sure whether it will be welcome. But it doesn't really require
>> introducing stub IOMMU driver or other changes in the guest. The idea is to
>> make IOMMU device unavailable (status = "disabled"), this way
>> of_iommu_configure() will treat that as success condition also.
>>
>> https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L31
>> https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L149
>>
>>         xen_grant_dma {
>>                 compatible = "xen,grant-dma";
>>                 #iommu-cells = <0x01>;
>>                 phandle = <0xfde9>;
>>                 status = "disabled";
>>         };
>>         virtio@2000000 {
>>                 compatible = "virtio,mmio";
>>                 reg = <0x00 0x2000000 0x00 0x200>;
>>                 interrupts = <0x00 0x01 0xf01>;
>>                 interrupt-parent = <0xfde8>;
>>                 dma-coherent;
>>                 iommus = <0xfde9 0x01>;
>>         };
>>
>> I have checked, this "fixes" deferred probe timeout issue.
>>
>>
>> Or we indeed need to introduce stub IOMMU driver (I placed it to driver/xen
>> instead of driver/iommu, also we can even squash it with grant-dma-ops.c?).
>> This stub driver also results in NO_IOMMU condition (as "of_xlate" callback is
>> not implemented).
>>
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index a7bd8ce..35b91b9 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -335,6 +335,10 @@ config XEN_UNPOPULATED_ALLOC
>>           having to balloon out RAM regions in order to obtain physical memory
>>           space to create such mappings.
>>
>> +config XEN_GRANT_DMA_IOMMU
>> +       bool
>> +       select IOMMU_API
>> +
>>  config XEN_GRANT_DMA_OPS
>>         bool
>>         select DMA_OPS
>> @@ -343,6 +347,7 @@ config XEN_VIRTIO
>>         bool "Xen virtio support"
>>         depends on VIRTIO
>>         select XEN_GRANT_DMA_OPS
>> +       select XEN_GRANT_DMA_IOMMU
>>         help
>>           Enable virtio support for running as Xen guest. Depending on the
>>           guest type this will require special support on the backend side
>> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
>> index 1a23cb0..c0503f1 100644
>> --- a/drivers/xen/Makefile
>> +++ b/drivers/xen/Makefile
>> @@ -40,3 +40,4 @@ xen-privcmd-y                         := privcmd.o
>> privcmd-buf.o
>>  obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF)    += xen-front-pgdir-shbuf.o
>>  obj-$(CONFIG_XEN_UNPOPULATED_ALLOC)    += unpopulated-alloc.o
>>  obj-$(CONFIG_XEN_GRANT_DMA_OPS)                += grant-dma-ops.o
>> +obj-$(CONFIG_XEN_GRANT_DMA_IOMMU)      += grant-dma-iommu.o
>> diff --git a/drivers/xen/grant-dma-iommu.c b/drivers/xen/grant-dma-iommu.c
>> new file mode 100644
>> index 00000000..b8aad8a
>> --- /dev/null
>> +++ b/drivers/xen/grant-dma-iommu.c
>> @@ -0,0 +1,76 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Stub IOMMU driver which does nothing.
>> + * The main purpose of it being present is to reuse generic device-tree IOMMU
>> + * bindings by Xen grant DMA-mapping layer.
>> + */
>> +
>> +#include <linux/iommu.h>
>> +#include <linux/of.h>
>> +#include <linux/platform_device.h>
>> +
>> +struct grant_dma_iommu_device {
>> +       struct device *dev;
>> +       struct iommu_device iommu;
>> +};
>> +
>> +/* Nothing is really needed here */
>> +static const struct iommu_ops grant_dma_iommu_ops;
>> +
>> +static const struct of_device_id grant_dma_iommu_of_match[] = {
>> +       { .compatible = "xen,grant-dma" },
>> +       { },
>> +};
>> +
>> +static int grant_dma_iommu_probe(struct platform_device *pdev)
>> +{
>> +       struct grant_dma_iommu_device *mmu;
>> +       int ret;
>> +
>> +       mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
>> +       if (!mmu)
>> +               return -ENOMEM;
>> +
>> +       mmu->dev = &pdev->dev;
>> +
>> +       ret = iommu_device_register(&mmu->iommu, &grant_dma_iommu_ops,
>> &pdev->dev);
>> +       if (ret)
>> +               return ret;
>> +
>> +       platform_set_drvdata(pdev, mmu);
>> +
>> +       return 0;
>> +}
>> +
>> +static int grant_dma_iommu_remove(struct platform_device *pdev)
>> +{
>> +       struct grant_dma_iommu_device *mmu = platform_get_drvdata(pdev);
>> +
>> +       platform_set_drvdata(pdev, NULL);
>> +       iommu_device_unregister(&mmu->iommu);
>> +
>> +       return 0;
>> +}
>> +
>> +static struct platform_driver grant_dma_iommu_driver = {
>> +       .driver = {
>> +               .name = "grant-dma-iommu",
>> +               .of_match_table = grant_dma_iommu_of_match,
>> +       },
>> +       .probe = grant_dma_iommu_probe,
>> +       .remove = grant_dma_iommu_remove,
>> +};
>> +
>> +static int __init grant_dma_iommu_init(void)
>> +{
>> +       struct device_node *iommu_np;
>> +
>> +       iommu_np = of_find_matching_node(NULL, grant_dma_iommu_of_match);
>> +       if (!iommu_np)
>> +               return 0;
>> +
>> +       of_node_put(iommu_np);
>> +
>> +       return platform_driver_register(&grant_dma_iommu_driver);
>> +}
>> +subsys_initcall(grant_dma_iommu_init);
>>
>> I have checked, this also "fixes" deferred probe timeout issue.
>>
>> Personally I would prefer the first option, but I would be also happy to use
>> second option in order to unblock the series.
>>
>> What do the maintainers think?
>
>
> I don't think it is a good idea to mark the fake IOMMU as disabled
> because it implies that there is no need to use it (no need to use
> dma_ops) which is a problem.

I got your point. You are right, this indeed sounds weird. I expected
this simple solution wouldn't be welcome.


>
> If we don't want the skelethon driver then Rob's suggestion of having a
> skip list for deferred probe is better.


I am not sure I understand the idea completely.

Does it mean that we will need new command line option (?) to pass some
string (I assume, the "compatible" for IOMMU device) to not defer probe
if corresponding driver is missing,

but just return NO_IOMMU right away?


>
> I think the skelethon driver also is totally fine.

ok, thank you for the feedback.


--
Regards,

Oleksandr Tyshchenko


2022-05-27 11:57:37

by Oleksandr Tyshchenko

[permalink] [raw]
Subject: Re: [PATCH V2 5/7] dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops


On 24.05.22 21:34, Saravana Kannan wrote:

Hello all

> "
>
> On Tue, May 24, 2022 at 9:01 AM Rob Herring <[email protected]> wrote:
>> +Saravana
>>
>> On Mon, May 23, 2022 at 06:58:13PM -0700, Stefano Stabellini wrote:
>>> On Mon, 23 May 2022, Oleksandr wrote:
>>>>>> On Thu, 19 May 2022, Oleksandr wrote:
>>>>>>>> On Wed, May 18, 2022 at 5:06 PM Oleksandr <[email protected]> wrote:
>>>>>>>>> On 18.05.22 17:32, Arnd Bergmann wrote:
>>>>>>>>>> On Sat, May 7, 2022 at 7:19 PM Oleksandr Tyshchenko
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> This would mean having a device
>>>>>>>>>> node for the grant-table mechanism that can be referred to using
>>>>>>>>>> the
>>>>>>>>>> 'iommus'
>>>>>>>>>> phandle property, with the domid as an additional argument.
>>>>>>>>> I assume, you are speaking about something like the following?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> xen_dummy_iommu {
>>>>>>>>> compatible = "xen,dummy-iommu";
>>>>>>>>> #iommu-cells = <1>;
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> virtio@3000 {
>>>>>>>>> compatible = "virtio,mmio";
>>>>>>>>> reg = <0x3000 0x100>;
>>>>>>>>> interrupts = <41>;
>>>>>>>>>
>>>>>>>>> /* The device is located in Xen domain with ID 1 */
>>>>>>>>> iommus = <&xen_dummy_iommu 1>;
>>>>>>>>> };
>>>>>>>> Right, that's that's the idea,
>>>>>>> thank you for the confirmation
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> except I would not call it a 'dummy'.
>>>>>>>> From the perspective of the DT, this behaves just like an IOMMU,
>>>>>>>> even if the exact mechanism is different from most hardware IOMMU
>>>>>>>> implementations.
>>>>>>> well, agree
>>>>>>>
>>>>>>>
>>>>>>>>>> It does not quite fit the model that Linux currently uses for
>>>>>>>>>> iommus,
>>>>>>>>>> as that has an allocator for dma_addr_t space
>>>>>>>>> yes (# 3/7 adds grant-table based allocator)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> , but it would think it's
>>>>>>>>>> conceptually close enough that it makes sense for the binding.
>>>>>>>>> Interesting idea. I am wondering, do we need an extra actions for
>>>>>>>>> this
>>>>>>>>> to work in Linux guest (dummy IOMMU driver, etc)?
>>>>>>>> It depends on how closely the guest implementation can be made to
>>>>>>>> resemble a normal iommu. If you do allocate dma_addr_t addresses,
>>>>>>>> it may actually be close enough that you can just turn the grant-table
>>>>>>>> code into a normal iommu driver and change nothing else.
>>>>>>> Unfortunately, I failed to find a way how use grant references at the
>>>>>>> iommu_ops level (I mean to fully pretend that we are an IOMMU driver). I
>>>>>>> am
>>>>>>> not too familiar with that, so what is written below might be wrong or
>>>>>>> at
>>>>>>> least not precise.
>>>>>>>
>>>>>>> The normal IOMMU driver in Linux doesn’t allocate DMA addresses by
>>>>>>> itself, it
>>>>>>> just maps (IOVA-PA) what was requested to be mapped by the upper layer.
>>>>>>> The
>>>>>>> DMA address allocation is done by the upper layer (DMA-IOMMU which is
>>>>>>> the glue
>>>>>>> layer between DMA API and IOMMU API allocates IOVA for PA?). But, all
>>>>>>> what we
>>>>>>> need here is just to allocate our specific grant-table based DMA
>>>>>>> addresses
>>>>>>> (DMA address = grant reference + offset in the page), so let’s say we
>>>>>>> need an
>>>>>>> entity to take a physical address as parameter and return a DMA address
>>>>>>> (what
>>>>>>> actually commit #3/7 is doing), and that’s all. So working at the
>>>>>>> dma_ops
>>>>>>> layer we get exactly what we need, with the minimal changes to guest
>>>>>>> infrastructure. In our case the Xen itself acts as an IOMMU.
>>>>>>>
>>>>>>> Assuming that we want to reuse the IOMMU infrastructure somehow for our
>>>>>>> needs.
>>>>>>> I think, in that case we will likely need to introduce a new specific
>>>>>>> IOVA
>>>>>>> allocator (alongside with a generic one) to be hooked up by the
>>>>>>> DMA-IOMMU
>>>>>>> layer if we run on top of Xen. But, even having the specific IOVA
>>>>>>> allocator to
>>>>>>> return what we indeed need (DMA address = grant reference + offset in
>>>>>>> the
>>>>>>> page) we will still need the specific minimal required IOMMU driver to
>>>>>>> be
>>>>>>> present in the system anyway in order to track the mappings(?) and do
>>>>>>> nothing
>>>>>>> with them, returning a success (this specific IOMMU driver should have
>>>>>>> all
>>>>>>> mandatory callbacks implemented).
>>>>>>>
>>>>>>> I completely agree, it would be really nice to reuse generic IOMMU
>>>>>>> bindings
>>>>>>> rather than introducing Xen specific property if what we are trying to
>>>>>>> implement in current patch series fits in the usage of "iommus" in Linux
>>>>>>> more-less. But, if we will have to add more complexity/more components
>>>>>>> to the
>>>>>>> code for the sake of reusing device tree binding, this raises a question
>>>>>>> whether that’s worthwhile.
>>>>>>>
>>>>>>> Or I really missed something?
>>>>>> I think Arnd was primarily suggesting to reuse the IOMMU Device Tree
>>>>>> bindings, not necessarily the IOMMU drivers framework in Linux (although
>>>>>> that would be an added bonus.)
>>>>>>
>>>>>> I know from previous discussions with you that making the grant table
>>>>>> fit in the existing IOMMU drivers model is difficult, but just reusing
>>>>>> the Device Tree bindings seems feasible?
>>>>> I started experimenting with that. As wrote in a separate email, I got a
>>>>> deferred probe timeout,
>>>>>
>>>>> after inserting required nodes into guest device tree, which seems to be a
>>>>> consequence of the unavailability of IOMMU, I will continue to investigate
>>>>> this question.
>>>>
>>>> I have experimented with that. Yes, just reusing the Device Tree bindings is
>>>> technically feasible (and we are able to do this by only touching
>>>> grant-dma-ops.c), although deferred probe timeout still stands (as there is no
>>>> IOMMU driver being present actually).
>>>>
>>>> [ 0.583771] virtio-mmio 2000000.virtio: deferred probe timeout, ignoring
>>>> dependency
>>>> [ 0.615556] virtio_blk virtio0: [vda] 4096000 512-byte logical blocks (2.10
>>>> GB/1.95 GiB)
>>>>
>>>>
>>>> Below the working diff (on top of current series):
>>>>
>>>> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
>>>> index da9c7ff..6586152 100644
>>>> --- a/drivers/xen/grant-dma-ops.c
>>>> +++ b/drivers/xen/grant-dma-ops.c
>>>> @@ -272,17 +272,24 @@ static const struct dma_map_ops xen_grant_dma_ops = {
>>>>
>>>> bool xen_is_grant_dma_device(struct device *dev)
>>>> {
>>>> + struct device_node *iommu_np;
>>>> + bool has_iommu;
>>>> +
>>>> /* XXX Handle only DT devices for now */
>>>> if (!dev->of_node)
>>>> return false;
>>>>
>>>> - return of_property_read_bool(dev->of_node, "xen,backend-domid");
>>>> + iommu_np = of_parse_phandle(dev->of_node, "iommus", 0);
>>>> + has_iommu = iommu_np && of_device_is_compatible(iommu_np,
>>>> "xen,grant-dma");
>>>> + of_node_put(iommu_np);
>>>> +
>>>> + return has_iommu;
>>>> }
>>>>
>>>> void xen_grant_setup_dma_ops(struct device *dev)
>>>> {
>>>> struct xen_grant_dma_data *data;
>>>> - uint32_t domid;
>>>> + struct of_phandle_args iommu_spec;
>>>>
>>>> data = find_xen_grant_dma_data(dev);
>>>> if (data) {
>>>> @@ -294,16 +301,30 @@ void xen_grant_setup_dma_ops(struct device *dev)
>>>> if (!dev->of_node)
>>>> goto err;
>>>>
>>>> - if (of_property_read_u32(dev->of_node, "xen,backend-domid", &domid)) {
>>>> - dev_err(dev, "xen,backend-domid property is not present\n");
>>>> + if (of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells",
>>>> + 0, &iommu_spec)) {
>>>> + dev_err(dev, "Cannot parse iommus property\n");
>>>> + goto err;
>>>> + }
>>>> +
>>>> + if (!of_device_is_compatible(iommu_spec.np, "xen,grant-dma") ||
>>>> + iommu_spec.args_count != 1) {
>>>> + dev_err(dev, "Incompatible IOMMU node\n");
>>>> + of_node_put(iommu_spec.np);
>>>> goto err;
>>>> }
>>>>
>>>> + of_node_put(iommu_spec.np);
>>>> +
>>>> data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>>>> if (!data)
>>>> goto err;
>>>>
>>>> - data->backend_domid = domid;
>>>> + /*
>>>> + * The endpoint ID here means the ID of the domain where the
>>>> corresponding
>>>> + * backend is running
>>>> + */
>>>> + data->backend_domid = iommu_spec.args[0];
>>>>
>>>> if (xa_err(xa_store(&xen_grant_dma_devices, (unsigned long)dev, data,
>>>> GFP_KERNEL))) {
>>>> (END)
>>>>
>>>>
>>>>
>>>> Below, the nodes generated by Xen toolstack:
>>>>
>>>> xen_grant_dma {
>> Nit: iommu {
>>
>>>> compatible = "xen,grant-dma";
>>>> #iommu-cells = <0x01>;
>>>> phandle = <0xfde9>;
>>>> };
>>>>
>>>> virtio@2000000 {
>>>> compatible = "virtio,mmio";
>>>> reg = <0x00 0x2000000 0x00 0x200>;
>>>> interrupts = <0x00 0x01 0xf01>;
>>>> interrupt-parent = <0xfde8>;
>>>> dma-coherent;
>>>> iommus = <0xfde9 0x01>;
>>>> };
>>> Not bad! I like it.
>>>
>>>
>>>> I am wondering, would be the proper solution to eliminate deferred probe
>>>> timeout issue in our particular case (without introducing an extra IOMMU
>>>> driver)?
>>> In reality I don't think there is a way to do that. I would create an
>>> empty skelethon IOMMU driver for xen,grant-dma.
>> Does it have to be an empty driver? Originally, IOMMU 'drivers' were not
>> drivers, but they've been getting converted. Can that be done here?
>>
>> Short of that, I think we could have some sort of skip probe list for
>> deferred probe. Not sure if that would be easiest as IOMMU specific or
>> global.
> Hi Oleksandr,
>
> If you do fw_devlink.strict=1, you'll notice that the consumers of
> this "iommu" won't probe at all or will delay the boot by some number
> of seconds. The eventual goal is to go towards fw_devlink.strict=1
> being the default.

ok, I got it.

Let's me please explain our particular case in details, sorry I may
repeat some information which I have already mentioned elsewhere, but it
maybe better to keep the whole context here.

We have Xen grant DMA-mapping layer added by previous commit [1]. For it
to operate properly we need a way to communicate some per-device
information using device-tree,
and this information is Xen specific. This is what the current commit is
doing by introducing new binding to describe that. The next commit [2]
will use that new binding to retrieve required information. There was a
suggestion to consider reusing generic device-tree IOMMU bindings to
communicate this specific information instead of introducing a custom
property.

Although it requires more effort for the Xen toolstack (instead of
adding a code to insert a single "xen,backend-domid" property, we need
to generate fake IOMMU node, reserve phandle for it, etc), from the
device tree PoV it looks indeed good (we reuse endpoint ID to pass the
ID of the domain where the corresponding backend is running), and
resulting code to retrieve this information in our DMA-mapping layer
also looks simple enough [3].

Using generic device-tree IOMMU bindings:

         iommu {
                 compatible = "xen,grant-dma";
                 #iommu-cells = <0x01>;
                 phandle = <0xfde9>;
         };
         virtio@2000000 {
                 compatible = "virtio,mmio";
                 reg = <0x00 0x2000000 0x00 0x200>;
                 interrupts = <0x00 0x01 0xf01>;
                 interrupt-parent = <0xfde8>;
                 dma-coherent;
                 iommus = <0xfde9 0x01>;
         };

Using Xen specific property:

         virtio@2000000 {
                 compatible = "virtio,mmio";
                 reg = <0x00 0x2000000 0x00 0x200>;
                 interrupts = <0x00 0x01 0xf01>;
                 interrupt-parent = <0xfde8>;
                 dma-coherent;
                 xen,backend-domid = <0x01>;
         };


The main problem is that idea doesn't quite fit into how Linux currently
behaves for the "iommus" property. Of course, just reusing IOMMU
bindings (without having a corresponding driver) leads to the deferred
probe timeout issue afterwards, because the IOMMU device never becomes
available. From my understanding, our DMA-mapping layer we are consider
to reuse IOMMU bindings for, *cannot* be converted into the proper IOMMU
driver.

Sure, we will need to find a way how to deal with it, if we really want
to reuse the IOMMU bindings. And yes, one idea was to just implement
stub IOMMU driver for that purpose, I have rechecked, it works fine with
that stub driver [4].

>
> From a fw_devlik perspective, please implement a driver. Ideally a
> real one, but at least an empty one. The empty one doesn't need to be
> an IOMMU driver, but at least just do a return 0 in the probe
> function.


If I got things right, I am afraid, for the "of_iommu" case the empty
driver is not enough. The driver should at least register iommu_ops, but
the "of_xlate" callback should be *not* implemented.
In that case, we will get NO_IOMMU (>0 : there is no IOMMU, or one was
unavailable for non-fatal reasons) which is also a success condition, so
-EPROBE_DEFER won't be returned.

https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L32

Otherwise, of_iommu_xlate() will call driver_deferred_probe_check_state().

https://elixir.bootlin.com/linux/v5.18/source/drivers/iommu/of_iommu.c#L43

> Also, if it's not a device, why even have a "compatible"
> property (removing it won't necessarily remove the deferred probe
> timeout issue you see)? Will any code be using "xen,grant-dma" to look
> up the node?

Yes


> If so, that driver could be the one that probes this
> device. At least from a fw_devlink perspective, it just needs to have
> a driver that binds to this device.

Agree


>
> Also, if we aren't going to implement a driver and have the supplier
> ("xen,grant-dma") behave like a device (as in, have a driver that
> probes), I'd rather that the iommu binding not be used at all as this
> would be an exception to how every other iommu device behaves.

Agree


Saravana, thank you for the explanation.


To summarize, as I understand, we have three options (the first two are
clear enough, the third is unclear yet):

1. Do not try to reuse IOMMU bindings for current xen-virtio enabling
work, use "xen,backend-domid" property.
2. Reuse IOMMU bindings, for that purpose introduce stub IOMMU driver.
It is a standalone entity in my example, but it can be a part of
grant-dma-ops.c which
actually uses "xen,grant-dma" compatible to look up a node.
3. Try to find other options how to reuse IOMMU bindings but *without*
introducing stub IOMMU driver, such as skip list for deferred probe, etc.


What do the maintainers think regarding the option to go forward?


[1]
https://lore.kernel.org/xen-devel/[email protected]/

[2]
https://lore.kernel.org/xen-devel/[email protected]/

[3]
https://lore.kernel.org/xen-devel/[email protected]/

[4]
https://lore.kernel.org/xen-devel/[email protected]/

>
> -Saravana

--
Regards,

Oleksandr Tyshchenko