From: Oleksandr Andrushchenko <[email protected]>
This work is in response to my previous attempt to introduce a Xen/DRM
zero-copy driver [1] to enable the Linux dma-buf API [2] for Xen based
frontends/backends. There is also an existing hyper_dmabuf approach
available [3] which, if reworked to utilize the proposed solution,
could also benefit greatly.
An RFC for this series was published and discussed [9]; the comments have been addressed.
The original rationale behind this work was to enable zero-copying
use-cases while working with Xen para-virtual display driver [4]:
when using the Xen PV DRM frontend driver, the backend side must copy
the contents of the display buffers (filled by the frontend's
user-space) into buffers allocated on the backend side.
Given the size of the display buffers and the frame rate, this may
result in substantial, unnecessary data bus occupation and
performance loss.
The helper driver [4] enables zero-copy use-cases for the Xen
para-virtualized frontend display driver by providing a DRM/KMS helper
driver that runs on the backend side.
It utilizes the PRIME buffer API (implemented on top of Linux dma-buf)
to share the frontend's buffers with physical device drivers on the
backend side:
- a dumb buffer created on the backend side can be shared
with the Xen PV frontend driver, so the frontend writes directly
into the backend domain's memory (into the buffer exported by the
DRM/KMS driver of a physical display device)
- a dumb buffer allocated by the frontend can be imported
into the physical device's DRM/KMS driver, likewise
avoiding any copying
Finally, it was discussed and decided ([1], [5]) that it is worth
implementing such use-cases via an extension of the existing Xen gntdev
driver instead of introducing a new DRM-specific driver.
Please note that dma-buf support is Linux-only, as dma-buf itself is
Linux-specific.
Now to the proposed solution. The changes to the existing Xen drivers
in the Linux kernel fall into 2 categories:
1. DMA-able memory buffer allocation and increasing/decreasing the memory
reservation of the pages of such a buffer.
This is required when sharing a dma-buf with hardware that requires
buffers allocated with the dma_alloc_xxx API.
(It is still possible to back a dma-buf with any system memory,
e.g. system pages.)
2. Extension of the gntdev driver to enable it to import/export dma-bufs.
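To make the second category concrete, the sketch below shows how a userspace process might drive the proposed export ioctl. It is illustrative only: the structure layout and ioctl number mirror the UAPI introduced later in this series, while the helper name `export_dmabuf_from_refs` and the error handling are my own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Mirrors struct ioctl_gntdev_dmabuf_exp_from_refs from the proposed UAPI. */
struct ioctl_gntdev_dmabuf_exp_from_refs {
	uint32_t flags;   /* IN: GNTDEV_DMA_FLAG_xxx, or 0 for system memory */
	uint32_t count;   /* IN: number of entries in refs[] */
	uint32_t fd;      /* OUT: file descriptor of the created dma-buf */
	uint32_t domid;   /* IN: domain owning the grant references */
	uint32_t refs[1]; /* IN: variable-length array of grant references */
};

#define IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS \
	_IOC(_IOC_NONE, 'G', 9, sizeof(struct ioctl_gntdev_dmabuf_exp_from_refs))

/* Ask gntdev to create a dma-buf backed by count grant refs from domid;
 * returns the new dma-buf file descriptor on success, -1 on failure. */
int export_dmabuf_from_refs(int gntdev_fd, uint32_t domid,
			    const uint32_t *refs, uint32_t count)
{
	/* refs[] is variable-length: size the request to hold all count
	 * references (one uint32_t is already part of the struct). */
	size_t sz = sizeof(struct ioctl_gntdev_dmabuf_exp_from_refs) +
		    (count - 1) * sizeof(uint32_t);
	struct ioctl_gntdev_dmabuf_exp_from_refs *op = calloc(1, sz);
	int ret;

	if (!op)
		return -1;
	op->flags = 0; /* plain system memory pages */
	op->count = count;
	op->domid = domid;
	memcpy(op->refs, refs, count * sizeof(uint32_t));

	ret = ioctl(gntdev_fd, IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS, op);
	ret = ret ? -1 : (int)op->fd; /* on success, op->fd is the dma-buf fd */
	free(op);
	return ret;
}
```

In a real client, `gntdev_fd` would come from opening /dev/xen/gntdev and the refs array from the foreign domain's frontend.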
The first four patches are in preparation for Xen dma-buf support,
but I consider them usable regardless of the dma-buf use-case;
other frontend/backend kernel modules may also benefit from them
through better code reuse:
0001-xen-grant-table-Make-set-clear-page-private-code-sha.patch
0002-xen-balloon-Move-common-memory-reservation-routines-.patch
0003-xen-grant-table-Allow-allocating-buffers-suitable-fo.patch
0004-xen-gntdev-Allow-mappings-for-DMA-buffers.patch
The next three patches are Xen implementation of dma-buf as part of
the grant device:
0005-xen-gntdev-Add-initial-support-for-dma-buf-UAPI.patch
0006-xen-gntdev-Implement-dma-buf-export-functionality.patch
0007-xen-gntdev-Implement-dma-buf-import-functionality.patch
The last patch enables in-kernel use of the Xen dma-buf API:
0008-xen-gntdev-Expose-gntdev-s-dma-buf-API-for-in-kernel.patch
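For the in-kernel case, a consumer module would roughly follow the pattern below, using the functions declared in include/xen/grant_dev.h later in this series. This is a sketch under the assumption of the API exactly as posted; the function `example_import` is hypothetical and error paths are abbreviated.

```c
#include <linux/err.h>
#include <xen/grant_dev.h>

/* Sketch: import a dma-buf from within a kernel module and fetch the
 * grant references backing it, granted to domain domid. */
int example_import(struct device *dev, int dmabuf_fd, int count, int domid)
{
	struct gntdev_priv *priv;
	struct xen_dmabuf *xen_dmabuf;
	u32 *refs;

	priv = gntdev_alloc_context(dev);
	if (IS_ERR(priv))
		return PTR_ERR(priv);

	xen_dmabuf = gntdev_dmabuf_imp_to_refs(priv, dmabuf_fd, count, domid);
	if (IS_ERR(xen_dmabuf)) {
		gntdev_free_context(priv);
		return PTR_ERR(xen_dmabuf);
	}

	refs = gntdev_dmabuf_imp_get_refs(xen_dmabuf);
	/* ... communicate refs[0..count-1] to the foreign domain ... */

	gntdev_dmabuf_imp_release(priv, dmabuf_fd);
	gntdev_free_context(priv);
	return 0;
}
```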
The corresponding libxengnttab changes are available at [6].
All the above was tested with the display backend [7] and its accompanying
helper library [8] on a Renesas ARM64-based board.
*To all the communities*: I would like to ask you to review the proposed
solution and give feedback on it, so I can improve it and send the final
patches for review (this is still work in progress, but enough to start
discussing the implementation).
Thank you in advance,
Oleksandr Andrushchenko
[1] https://lists.freedesktop.org/archives/dri-devel/2018-April/173163.html
[2] https://elixir.bootlin.com/linux/v4.17-rc5/source/Documentation/driver-api/dma-buf.rst
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-02/msg01202.html
[4] https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/xen
[5] https://patchwork.kernel.org/patch/10279681/
[6] https://github.com/andr2000/xen/tree/xen_dma_buf_v1
[7] https://github.com/andr2000/displ_be/tree/xen_dma_buf_v1
[8] https://github.com/andr2000/libxenbe/tree/xen_dma_buf_v1
[9] https://lkml.org/lkml/2018/5/17/215
Oleksandr Andrushchenko (8):
xen/grant-table: Make set/clear page private code shared
xen/balloon: Move common memory reservation routines to a module
xen/grant-table: Allow allocating buffers suitable for DMA
xen/gntdev: Allow mappings for DMA buffers
xen/gntdev: Add initial support for dma-buf UAPI
xen/gntdev: Implement dma-buf export functionality
xen/gntdev: Implement dma-buf import functionality
xen/gntdev: Expose gntdev's dma-buf API for in-kernel use
drivers/xen/Kconfig | 23 +
drivers/xen/Makefile | 1 +
drivers/xen/balloon.c | 71 +--
drivers/xen/gntdev.c | 1025 ++++++++++++++++++++++++++++++++-
drivers/xen/grant-table.c | 176 +++++-
drivers/xen/mem-reservation.c | 134 +++++
include/uapi/xen/gntdev.h | 106 ++++
include/xen/grant_dev.h | 37 ++
include/xen/grant_table.h | 28 +
include/xen/mem_reservation.h | 29 +
10 files changed, 1527 insertions(+), 103 deletions(-)
create mode 100644 drivers/xen/mem-reservation.c
create mode 100644 include/xen/grant_dev.h
create mode 100644 include/xen/mem_reservation.h
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Allow creating a grant device context for use by kernel modules which
require functionality provided by gntdev. Export symbols for the
dma-buf API provided by the module.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/gntdev.c | 116 ++++++++++++++++++++++++++--------------
include/xen/grant_dev.h | 37 +++++++++++++
2 files changed, 113 insertions(+), 40 deletions(-)
create mode 100644 include/xen/grant_dev.h
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index d8b6168f2cd9..912056f3e909 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -684,14 +684,33 @@ static const struct mmu_notifier_ops gntdev_mmu_ops = {
/* ------------------------------------------------------------------ */
-static int gntdev_open(struct inode *inode, struct file *flip)
+void gntdev_free_context(struct gntdev_priv *priv)
+{
+ struct grant_map *map;
+
+ pr_debug("priv %p\n", priv);
+
+ mutex_lock(&priv->lock);
+ while (!list_empty(&priv->maps)) {
+ map = list_entry(priv->maps.next, struct grant_map, next);
+ list_del(&map->next);
+ gntdev_put_map(NULL /* already removed */, map);
+ }
+ WARN_ON(!list_empty(&priv->freeable_maps));
+
+ mutex_unlock(&priv->lock);
+
+ kfree(priv);
+}
+EXPORT_SYMBOL(gntdev_free_context);
+
+struct gntdev_priv *gntdev_alloc_context(struct device *dev)
{
struct gntdev_priv *priv;
- int ret = 0;
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&priv->maps);
INIT_LIST_HEAD(&priv->freeable_maps);
@@ -704,6 +723,32 @@ static int gntdev_open(struct inode *inode, struct file *flip)
INIT_LIST_HEAD(&priv->dmabuf_imp_list);
#endif
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ priv->dma_dev = dev;
+
+ /*
+	 * The device is not spawned from a device tree, so
+	 * arch_setup_dma_ops is not called, thus leaving the device with
+	 * dummy DMA ops. Fix this by calling of_dma_configure() with a
+	 * NULL node to set default DMA ops.
+ */
+ of_dma_configure(priv->dma_dev, NULL);
+#endif
+ pr_debug("priv %p\n", priv);
+
+ return priv;
+}
+EXPORT_SYMBOL(gntdev_alloc_context);
+
+static int gntdev_open(struct inode *inode, struct file *flip)
+{
+ struct gntdev_priv *priv;
+ int ret = 0;
+
+ priv = gntdev_alloc_context(gntdev_miscdev.this_device);
+ if (IS_ERR(priv))
+ return PTR_ERR(priv);
+
if (use_ptemod) {
priv->mm = get_task_mm(current);
if (!priv->mm) {
@@ -716,23 +761,11 @@ static int gntdev_open(struct inode *inode, struct file *flip)
}
if (ret) {
- kfree(priv);
+ gntdev_free_context(priv);
return ret;
}
flip->private_data = priv;
-#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
- priv->dma_dev = gntdev_miscdev.this_device;
-
- /*
- * The device is not spawn from a device tree, so arch_setup_dma_ops
- * is not called, thus leaving the device with dummy DMA ops.
- * Fix this call of_dma_configure() with a NULL node to set
- * default DMA ops.
- */
- of_dma_configure(priv->dma_dev, NULL);
-#endif
- pr_debug("priv %p\n", priv);
return 0;
}
@@ -740,22 +773,11 @@ static int gntdev_open(struct inode *inode, struct file *flip)
static int gntdev_release(struct inode *inode, struct file *flip)
{
struct gntdev_priv *priv = flip->private_data;
- struct grant_map *map;
-
- pr_debug("priv %p\n", priv);
-
- mutex_lock(&priv->lock);
- while (!list_empty(&priv->maps)) {
- map = list_entry(priv->maps.next, struct grant_map, next);
- list_del(&map->next);
- gntdev_put_map(NULL /* already removed */, map);
- }
- WARN_ON(!list_empty(&priv->freeable_maps));
- mutex_unlock(&priv->lock);
if (use_ptemod)
mmu_notifier_unregister(&priv->mn, priv->mm);
- kfree(priv);
+
+ gntdev_free_context(priv);
return 0;
}
@@ -1210,7 +1232,7 @@ dmabuf_exp_wait_obj_get_by_fd(struct gntdev_priv *priv, int fd)
return ret;
}
-static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
+int gntdev_dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
int wait_to_ms)
{
struct xen_dmabuf *xen_dmabuf;
@@ -1242,6 +1264,7 @@ static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
dmabuf_exp_wait_obj_free(priv, obj);
return ret;
}
+EXPORT_SYMBOL(gntdev_dmabuf_exp_wait_released);
/* ------------------------------------------------------------------ */
/* DMA buffer export support. */
@@ -1511,7 +1534,7 @@ dmabuf_exp_alloc_backing_storage(struct gntdev_priv *priv, int dmabuf_flags,
return map;
}
-static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
+int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
int count, u32 domid, u32 *refs, u32 *fd)
{
struct grant_map *map;
@@ -1557,6 +1580,7 @@ static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
gntdev_remove_map(priv, map);
return ret;
}
+EXPORT_SYMBOL(gntdev_dmabuf_exp_from_refs);
/* ------------------------------------------------------------------ */
/* DMA buffer import support. */
@@ -1646,8 +1670,9 @@ static struct xen_dmabuf *dmabuf_imp_alloc_storage(int count)
return ERR_PTR(-ENOMEM);
}
-static struct xen_dmabuf *
-dmabuf_imp_to_refs(struct gntdev_priv *priv, int fd, int count, int domid)
+struct xen_dmabuf *
+gntdev_dmabuf_imp_to_refs(struct gntdev_priv *priv, int fd,
+ int count, int domid)
{
struct xen_dmabuf *xen_dmabuf, *ret;
struct dma_buf *dma_buf;
@@ -1736,6 +1761,16 @@ dmabuf_imp_to_refs(struct gntdev_priv *priv, int fd, int count, int domid)
dma_buf_put(dma_buf);
return ret;
}
+EXPORT_SYMBOL(gntdev_dmabuf_imp_to_refs);
+
+u32 *gntdev_dmabuf_imp_get_refs(struct xen_dmabuf *xen_dmabuf)
+{
+ if (xen_dmabuf)
+ return xen_dmabuf->u.imp.refs;
+
+ return NULL;
+}
+EXPORT_SYMBOL(gntdev_dmabuf_imp_get_refs);
/*
* Find the hyper dma-buf by its file descriptor and remove
@@ -1759,7 +1794,7 @@ dmabuf_imp_find_unlink(struct gntdev_priv *priv, int fd)
return ret;
}
-static int dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
+int gntdev_dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
{
struct xen_dmabuf *xen_dmabuf;
struct dma_buf_attachment *attach;
@@ -1785,6 +1820,7 @@ static int dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
dmabuf_imp_free_storage(xen_dmabuf);
return 0;
}
+EXPORT_SYMBOL(gntdev_dmabuf_imp_release);
/* ------------------------------------------------------------------ */
/* DMA buffer IOCTL support. */
@@ -1810,8 +1846,8 @@ gntdev_ioctl_dmabuf_exp_from_refs(struct gntdev_priv *priv,
goto out;
}
- ret = dmabuf_exp_from_refs(priv, op.flags, op.count,
- op.domid, refs, &op.fd);
+ ret = gntdev_dmabuf_exp_from_refs(priv, op.flags, op.count,
+ op.domid, refs, &op.fd);
if (ret)
goto out;
@@ -1832,7 +1868,7 @@ gntdev_ioctl_dmabuf_exp_wait_released(struct gntdev_priv *priv,
if (copy_from_user(&op, u, sizeof(op)) != 0)
return -EFAULT;
- return dmabuf_exp_wait_released(priv, op.fd, op.wait_to_ms);
+ return gntdev_dmabuf_exp_wait_released(priv, op.fd, op.wait_to_ms);
}
static long
@@ -1846,7 +1882,7 @@ gntdev_ioctl_dmabuf_imp_to_refs(struct gntdev_priv *priv,
if (copy_from_user(&op, u, sizeof(op)) != 0)
return -EFAULT;
- xen_dmabuf = dmabuf_imp_to_refs(priv, op.fd, op.count, op.domid);
+ xen_dmabuf = gntdev_dmabuf_imp_to_refs(priv, op.fd, op.count, op.domid);
if (IS_ERR(xen_dmabuf))
return PTR_ERR(xen_dmabuf);
@@ -1858,7 +1894,7 @@ gntdev_ioctl_dmabuf_imp_to_refs(struct gntdev_priv *priv,
return 0;
out_release:
- dmabuf_imp_release(priv, op.fd);
+ gntdev_dmabuf_imp_release(priv, op.fd);
return ret;
}
@@ -1871,7 +1907,7 @@ gntdev_ioctl_dmabuf_imp_release(struct gntdev_priv *priv,
if (copy_from_user(&op, u, sizeof(op)) != 0)
return -EFAULT;
- return dmabuf_imp_release(priv, op.fd);
+ return gntdev_dmabuf_imp_release(priv, op.fd);
}
#endif
diff --git a/include/xen/grant_dev.h b/include/xen/grant_dev.h
new file mode 100644
index 000000000000..faccc9170174
--- /dev/null
+++ b/include/xen/grant_dev.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+/*
+ * Grant device kernel API
+ *
+ * Copyright (C) 2018 EPAM Systems Inc.
+ *
+ * Author: Oleksandr Andrushchenko <[email protected]>
+ */
+
+#ifndef _GRANT_DEV_H
+#define _GRANT_DEV_H
+
+#include <linux/types.h>
+
+struct device;
+struct gntdev_priv;
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+struct xen_dmabuf;
+#endif
+
+struct gntdev_priv *gntdev_alloc_context(struct device *dev);
+void gntdev_free_context(struct gntdev_priv *priv);
+
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
+ int count, u32 domid, u32 *refs, u32 *fd);
+int gntdev_dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
+ int wait_to_ms);
+
+struct xen_dmabuf *gntdev_dmabuf_imp_to_refs(struct gntdev_priv *priv,
+ int fd, int count, int domid);
+u32 *gntdev_dmabuf_imp_get_refs(struct xen_dmabuf *xen_dmabuf);
+int gntdev_dmabuf_imp_release(struct gntdev_priv *priv, u32 fd);
+#endif
+
+#endif
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Add UAPI and IOCTLs for dma-buf grant device driver extension:
the extension allows userspace processes and kernel modules to
use Xen backed dma-buf implementation. With this extension grant
references to the pages of an imported dma-buf can be exported
for other domain use and grant references coming from a foreign
domain can be converted into a local dma-buf for local export.
Implement basic initialization and stubs for Xen DMA buffer
support.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/Kconfig | 10 +++
drivers/xen/gntdev.c | 148 ++++++++++++++++++++++++++++++++++++++
include/uapi/xen/gntdev.h | 91 +++++++++++++++++++++++
3 files changed, 249 insertions(+)
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 3431fe210624..eaf63a2c7ae6 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -152,6 +152,16 @@ config XEN_GNTDEV
help
Allows userspace processes to use grants.
+config XEN_GNTDEV_DMABUF
+ bool "Add support for dma-buf grant access device driver extension"
+ depends on XEN_GNTDEV && XEN_GRANT_DMA_ALLOC
+ help
+ Allows userspace processes and kernel modules to use Xen backed
+ dma-buf implementation. With this extension grant references to
+ the pages of an imported dma-buf can be exported for other domain
+ use and grant references coming from a foreign domain can be
+ converted into a local dma-buf for local export.
+
config XEN_GRANT_DEV_ALLOC
tristate "User-space grant reference allocator driver"
depends on XEN
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 640a579f42ea..9e450622af1a 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -6,6 +6,7 @@
*
* Copyright (c) 2006-2007, D G Murray.
* (c) 2009 Gerd Hoffmann <[email protected]>
+ * (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
@@ -122,6 +123,17 @@ struct grant_map {
#endif
};
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+struct xen_dmabuf {
+ union {
+ struct {
+ /* Granted references of the imported buffer. */
+ grant_ref_t *refs;
+ } imp;
+ } u;
+};
+#endif
+
static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
static struct miscdevice gntdev_miscdev;
@@ -1036,6 +1048,128 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
return ret;
}
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+/* ------------------------------------------------------------------ */
+/* DMA buffer export support. */
+/* ------------------------------------------------------------------ */
+
+static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
+ int wait_to_ms)
+{
+ return -ETIMEDOUT;
+}
+
+static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
+ int count, u32 domid, u32 *refs, u32 *fd)
+{
+ *fd = -1;
+ return -EINVAL;
+}
+
+/* ------------------------------------------------------------------ */
+/* DMA buffer import support. */
+/* ------------------------------------------------------------------ */
+
+static int dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
+{
+ return 0;
+}
+
+static struct xen_dmabuf *
+dmabuf_imp_to_refs(struct gntdev_priv *priv, int fd, int count, int domid)
+{
+ return ERR_PTR(-ENOMEM);
+}
+
+/* ------------------------------------------------------------------ */
+/* DMA buffer IOCTL support. */
+/* ------------------------------------------------------------------ */
+
+static long
+gntdev_ioctl_dmabuf_exp_from_refs(struct gntdev_priv *priv,
+ struct ioctl_gntdev_dmabuf_exp_from_refs __user *u)
+{
+ struct ioctl_gntdev_dmabuf_exp_from_refs op;
+ u32 *refs;
+ long ret;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+
+ refs = kcalloc(op.count, sizeof(*refs), GFP_KERNEL);
+ if (!refs)
+ return -ENOMEM;
+
+ if (copy_from_user(refs, u->refs, sizeof(*refs) * op.count) != 0) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ ret = dmabuf_exp_from_refs(priv, op.flags, op.count,
+ op.domid, refs, &op.fd);
+ if (ret)
+ goto out;
+
+ if (copy_to_user(u, &op, sizeof(op)) != 0)
+ ret = -EFAULT;
+
+out:
+ kfree(refs);
+ return ret;
+}
+
+static long
+gntdev_ioctl_dmabuf_exp_wait_released(struct gntdev_priv *priv,
+ struct ioctl_gntdev_dmabuf_exp_wait_released __user *u)
+{
+ struct ioctl_gntdev_dmabuf_exp_wait_released op;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+
+ return dmabuf_exp_wait_released(priv, op.fd, op.wait_to_ms);
+}
+
+static long
+gntdev_ioctl_dmabuf_imp_to_refs(struct gntdev_priv *priv,
+ struct ioctl_gntdev_dmabuf_imp_to_refs __user *u)
+{
+ struct ioctl_gntdev_dmabuf_imp_to_refs op;
+ struct xen_dmabuf *xen_dmabuf;
+ long ret;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+
+ xen_dmabuf = dmabuf_imp_to_refs(priv, op.fd, op.count, op.domid);
+ if (IS_ERR(xen_dmabuf))
+ return PTR_ERR(xen_dmabuf);
+
+ if (copy_to_user(u->refs, xen_dmabuf->u.imp.refs,
+ sizeof(*u->refs) * op.count) != 0) {
+ ret = -EFAULT;
+ goto out_release;
+ }
+ return 0;
+
+out_release:
+ dmabuf_imp_release(priv, op.fd);
+ return ret;
+}
+
+static long
+gntdev_ioctl_dmabuf_imp_release(struct gntdev_priv *priv,
+ struct ioctl_gntdev_dmabuf_imp_release __user *u)
+{
+ struct ioctl_gntdev_dmabuf_imp_release op;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+
+ return dmabuf_imp_release(priv, op.fd);
+}
+#endif
+
static long gntdev_ioctl(struct file *flip,
unsigned int cmd, unsigned long arg)
{
@@ -1058,6 +1192,20 @@ static long gntdev_ioctl(struct file *flip,
case IOCTL_GNTDEV_GRANT_COPY:
return gntdev_ioctl_grant_copy(priv, ptr);
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+ case IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS:
+ return gntdev_ioctl_dmabuf_exp_from_refs(priv, ptr);
+
+ case IOCTL_GNTDEV_DMABUF_EXP_WAIT_RELEASED:
+ return gntdev_ioctl_dmabuf_exp_wait_released(priv, ptr);
+
+ case IOCTL_GNTDEV_DMABUF_IMP_TO_REFS:
+ return gntdev_ioctl_dmabuf_imp_to_refs(priv, ptr);
+
+ case IOCTL_GNTDEV_DMABUF_IMP_RELEASE:
+ return gntdev_ioctl_dmabuf_imp_release(priv, ptr);
+#endif
+
default:
pr_debug("priv %p, unknown cmd %x\n", priv, cmd);
return -ENOIOCTLCMD;
diff --git a/include/uapi/xen/gntdev.h b/include/uapi/xen/gntdev.h
index 2d5a4672f07c..568d5cb522e9 100644
--- a/include/uapi/xen/gntdev.h
+++ b/include/uapi/xen/gntdev.h
@@ -5,6 +5,7 @@
* Interface to /dev/xen/gntdev.
*
* Copyright (c) 2007, D G Murray
+ * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License version 2
@@ -215,4 +216,94 @@ struct ioctl_gntdev_grant_copy {
*/
#define GNTDEV_DMA_FLAG_COHERENT (1 << 2)
+/*
+ * Create a dma-buf [1] from grant references @refs of count @count provided
+ * by the foreign domain @domid with flags @flags.
+ *
+ * By default dma-buf is backed by system memory pages, but by providing
+ * one of the GNTDEV_DMA_FLAG_XXX flags it can also be created as
+ * a DMA write-combine or coherent buffer, e.g. allocated with dma_alloc_wc/
+ * dma_alloc_coherent.
+ *
+ * Returns 0 if dma-buf was successfully created and the corresponding
+ * dma-buf's file descriptor is returned in @fd.
+ *
+ * [1] https://elixir.bootlin.com/linux/latest/source/Documentation/driver-api/dma-buf.rst
+ */
+
+#define IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS \
+ _IOC(_IOC_NONE, 'G', 9, \
+ sizeof(struct ioctl_gntdev_dmabuf_exp_from_refs))
+struct ioctl_gntdev_dmabuf_exp_from_refs {
+ /* IN parameters. */
+ /* Specific options for this dma-buf: see GNTDEV_DMA_FLAG_XXX. */
+ __u32 flags;
+ /* Number of grant references in @refs array. */
+ __u32 count;
+ /* OUT parameters. */
+ /* File descriptor of the dma-buf. */
+ __u32 fd;
+ /* The domain ID of the grant references to be mapped. */
+ __u32 domid;
+ /* Variable IN parameter. */
+ /* Array of grant references of size @count. */
+ __u32 refs[1];
+};
+
+/*
+ * This will block until the dma-buf with the file descriptor @fd is
+ * released. This is only valid for buffers created with
+ * IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS.
+ *
+ * If within @wait_to_ms milliseconds the buffer is not released
+ * then -ETIMEDOUT error is returned.
+ * If the buffer with the file descriptor @fd does not exist or has already
+ * been released, then -ENOENT is returned. For valid file descriptors
+ * this must not be treated as error.
+ */
+#define IOCTL_GNTDEV_DMABUF_EXP_WAIT_RELEASED \
+ _IOC(_IOC_NONE, 'G', 10, \
+ sizeof(struct ioctl_gntdev_dmabuf_exp_wait_released))
+struct ioctl_gntdev_dmabuf_exp_wait_released {
+ /* IN parameters */
+ __u32 fd;
+ __u32 wait_to_ms;
+};
+
+/*
+ * Import a dma-buf with file descriptor @fd and export granted references
+ * to the pages of that dma-buf into array @refs of size @count.
+ */
+#define IOCTL_GNTDEV_DMABUF_IMP_TO_REFS \
+ _IOC(_IOC_NONE, 'G', 11, \
+ sizeof(struct ioctl_gntdev_dmabuf_imp_to_refs))
+struct ioctl_gntdev_dmabuf_imp_to_refs {
+ /* IN parameters. */
+ /* File descriptor of the dma-buf. */
+ __u32 fd;
+ /* Number of grant references in @refs array. */
+ __u32 count;
+ /* The domain ID for which references to be granted. */
+ __u32 domid;
+ /* Reserved - must be zero. */
+ __u32 reserved;
+ /* OUT parameters. */
+ /* Array of grant references of size @count. */
+ __u32 refs[1];
+};
+
+/*
+ * This will close all references to the imported buffer with file descriptor
+ * @fd, so it can be released by the owner. This is only valid for buffers
+ * created with IOCTL_GNTDEV_DMABUF_IMP_TO_REFS.
+ */
+#define IOCTL_GNTDEV_DMABUF_IMP_RELEASE \
+ _IOC(_IOC_NONE, 'G', 12, \
+ sizeof(struct ioctl_gntdev_dmabuf_imp_release))
+struct ioctl_gntdev_dmabuf_imp_release {
+ /* IN parameters */
+ __u32 fd;
+ __u32 reserved;
+};
+
#endif /* __LINUX_PUBLIC_GNTDEV_H__ */
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
1. Import a dma-buf with the file descriptor provided and export
granted references to the pages of that dma-buf into the array
of grant references.
2. Add API to close all references to an imported buffer, so it can be
released by the owner. This is only valid for buffers created with
IOCTL_GNTDEV_DMABUF_IMP_TO_REFS.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/gntdev.c | 237 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 234 insertions(+), 3 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 52abc6cd5846..d8b6168f2cd9 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -71,6 +71,17 @@ static atomic_t pages_mapped = ATOMIC_INIT(0);
static int use_ptemod;
#define populate_freeable_maps use_ptemod
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+#ifndef GRANT_INVALID_REF
+/*
+ * Note on usage of grant reference 0 as invalid grant reference:
+ * grant reference 0 is valid, but never exposed to a driver,
+ * because of the fact it is already in use/reserved by the PV console.
+ */
+#define GRANT_INVALID_REF 0
+#endif
+#endif
+
struct gntdev_priv {
/* maps with visible offsets in the file descriptor */
struct list_head maps;
@@ -94,6 +105,8 @@ struct gntdev_priv {
struct list_head dmabuf_exp_list;
/* List of wait objects. */
struct list_head dmabuf_exp_wait_list;
+ /* List of imported DMA buffers. */
+ struct list_head dmabuf_imp_list;
/* This is the lock which protects dma_buf_xxx lists. */
struct mutex dmabuf_lock;
#endif
@@ -155,6 +168,10 @@ struct xen_dmabuf {
struct {
/* Granted references of the imported buffer. */
grant_ref_t *refs;
+ /* Scatter-gather table of the imported buffer. */
+ struct sg_table *sgt;
+ /* dma-buf attachment of the imported buffer. */
+ struct dma_buf_attachment *attach;
} imp;
} u;
@@ -684,6 +701,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
mutex_init(&priv->dmabuf_lock);
INIT_LIST_HEAD(&priv->dmabuf_exp_list);
INIT_LIST_HEAD(&priv->dmabuf_exp_wait_list);
+ INIT_LIST_HEAD(&priv->dmabuf_imp_list);
#endif
if (use_ptemod) {
@@ -1544,15 +1562,228 @@ static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
/* DMA buffer import support. */
/* ------------------------------------------------------------------ */
-static int dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
+static int
+dmabuf_imp_grant_foreign_access(struct page **pages, u32 *refs,
+ int count, int domid)
{
- return 0;
+ grant_ref_t priv_gref_head;
+ int i, ret;
+
+ ret = gnttab_alloc_grant_references(count, &priv_gref_head);
+ if (ret < 0) {
+ pr_err("Cannot allocate grant references, ret %d\n", ret);
+ return ret;
+ }
+
+ for (i = 0; i < count; i++) {
+ int cur_ref;
+
+ cur_ref = gnttab_claim_grant_reference(&priv_gref_head);
+ if (cur_ref < 0) {
+ ret = cur_ref;
+ pr_err("Cannot claim grant reference, ret %d\n", ret);
+ goto out;
+ }
+
+ gnttab_grant_foreign_access_ref(cur_ref, domid,
+ xen_page_to_gfn(pages[i]), 0);
+ refs[i] = cur_ref;
+ }
+
+ ret = 0;
+
+out:
+ gnttab_free_grant_references(priv_gref_head);
+ return ret;
+}
+
+static void dmabuf_imp_end_foreign_access(u32 *refs, int count)
+{
+ int i;
+
+ for (i = 0; i < count; i++)
+ if (refs[i] != GRANT_INVALID_REF)
+ gnttab_end_foreign_access(refs[i], 0, 0UL);
+}
+
+static void dmabuf_imp_free_storage(struct xen_dmabuf *xen_dmabuf)
+{
+ kfree(xen_dmabuf->pages);
+ kfree(xen_dmabuf->u.imp.refs);
+ kfree(xen_dmabuf);
+}
+
+static struct xen_dmabuf *dmabuf_imp_alloc_storage(int count)
+{
+ struct xen_dmabuf *xen_dmabuf;
+ int i;
+
+ xen_dmabuf = kzalloc(sizeof(*xen_dmabuf), GFP_KERNEL);
+ if (!xen_dmabuf)
+ goto fail;
+
+ xen_dmabuf->u.imp.refs = kcalloc(count,
+ sizeof(xen_dmabuf->u.imp.refs[0]),
+ GFP_KERNEL);
+ if (!xen_dmabuf->u.imp.refs)
+ goto fail;
+
+ xen_dmabuf->pages = kcalloc(count,
+ sizeof(xen_dmabuf->pages[0]),
+ GFP_KERNEL);
+ if (!xen_dmabuf->pages)
+ goto fail;
+
+ xen_dmabuf->nr_pages = count;
+
+ for (i = 0; i < count; i++)
+ xen_dmabuf->u.imp.refs[i] = GRANT_INVALID_REF;
+
+ return xen_dmabuf;
+
+fail:
+ dmabuf_imp_free_storage(xen_dmabuf);
+ return ERR_PTR(-ENOMEM);
}
static struct xen_dmabuf *
dmabuf_imp_to_refs(struct gntdev_priv *priv, int fd, int count, int domid)
{
- return ERR_PTR(-ENOMEM);
+ struct xen_dmabuf *xen_dmabuf, *ret;
+ struct dma_buf *dma_buf;
+ struct dma_buf_attachment *attach;
+ struct sg_table *sgt;
+ struct sg_page_iter sg_iter;
+ int i;
+
+ dma_buf = dma_buf_get(fd);
+ if (IS_ERR(dma_buf))
+ return ERR_CAST(dma_buf);
+
+ xen_dmabuf = dmabuf_imp_alloc_storage(count);
+ if (IS_ERR(xen_dmabuf)) {
+ ret = xen_dmabuf;
+ goto fail_put;
+ }
+
+ xen_dmabuf->priv = priv;
+ xen_dmabuf->fd = fd;
+
+ attach = dma_buf_attach(dma_buf, priv->dma_dev);
+ if (IS_ERR(attach)) {
+ ret = ERR_CAST(attach);
+ goto fail_free_obj;
+ }
+
+ xen_dmabuf->u.imp.attach = attach;
+
+ sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+ if (IS_ERR(sgt)) {
+ ret = ERR_CAST(sgt);
+ goto fail_detach;
+ }
+
+ /* Check number of pages that imported buffer has. */
+ if (attach->dmabuf->size != xen_dmabuf->nr_pages << PAGE_SHIFT) {
+ ret = ERR_PTR(-EINVAL);
+		pr_err("DMA buffer has size %zu, user-space expects %d pages\n",
+			attach->dmabuf->size, xen_dmabuf->nr_pages);
+ goto fail_unmap;
+ }
+
+ xen_dmabuf->u.imp.sgt = sgt;
+
+ /* Now convert sgt to array of pages and check for page validity. */
+ i = 0;
+ for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0) {
+ struct page *page = sg_page_iter_page(&sg_iter);
+ /*
+ * Check if page is valid: this can happen if we are given
+ * a page from VRAM or other resources which are not backed
+ * by a struct page.
+ */
+ if (!pfn_valid(page_to_pfn(page))) {
+ ret = ERR_PTR(-EINVAL);
+ goto fail_unmap;
+ }
+
+ xen_dmabuf->pages[i++] = page;
+ }
+
+ ret = ERR_PTR(dmabuf_imp_grant_foreign_access(xen_dmabuf->pages,
+ xen_dmabuf->u.imp.refs,
+ count, domid));
+ if (IS_ERR(ret))
+ goto fail_end_access;
+
+ pr_debug("Imported DMA buffer with fd %d\n", fd);
+
+ mutex_lock(&priv->dmabuf_lock);
+ list_add(&xen_dmabuf->next, &priv->dmabuf_imp_list);
+ mutex_unlock(&priv->dmabuf_lock);
+
+ return xen_dmabuf;
+
+fail_end_access:
+ dmabuf_imp_end_foreign_access(xen_dmabuf->u.imp.refs, count);
+fail_unmap:
+ dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
+fail_detach:
+ dma_buf_detach(dma_buf, attach);
+fail_free_obj:
+ dmabuf_imp_free_storage(xen_dmabuf);
+fail_put:
+ dma_buf_put(dma_buf);
+ return ret;
+}
+
+/*
+ * Find the hyper dma-buf by its file descriptor and remove
+ * it from the buffer's list.
+ */
+static struct xen_dmabuf *
+dmabuf_imp_find_unlink(struct gntdev_priv *priv, int fd)
+{
+ struct xen_dmabuf *q, *xen_dmabuf, *ret = ERR_PTR(-ENOENT);
+
+ mutex_lock(&priv->dmabuf_lock);
+ list_for_each_entry_safe(xen_dmabuf, q, &priv->dmabuf_imp_list, next) {
+ if (xen_dmabuf->fd == fd) {
+ pr_debug("Found xen_dmabuf in the import list\n");
+ ret = xen_dmabuf;
+ list_del(&xen_dmabuf->next);
+ break;
+ }
+ }
+ mutex_unlock(&priv->dmabuf_lock);
+ return ret;
+}
+
+static int dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
+{
+ struct xen_dmabuf *xen_dmabuf;
+ struct dma_buf_attachment *attach;
+ struct dma_buf *dma_buf;
+
+ xen_dmabuf = dmabuf_imp_find_unlink(priv, fd);
+ if (IS_ERR(xen_dmabuf))
+ return PTR_ERR(xen_dmabuf);
+
+ pr_debug("Releasing DMA buffer with fd %d\n", fd);
+
+ attach = xen_dmabuf->u.imp.attach;
+
+ if (xen_dmabuf->u.imp.sgt)
+ dma_buf_unmap_attachment(attach, xen_dmabuf->u.imp.sgt,
+ DMA_BIDIRECTIONAL);
+ dma_buf = attach->dmabuf;
+ dma_buf_detach(attach->dmabuf, attach);
+ dma_buf_put(dma_buf);
+
+ dmabuf_imp_end_foreign_access(xen_dmabuf->u.imp.refs,
+ xen_dmabuf->nr_pages);
+ dmabuf_imp_free_storage(xen_dmabuf);
+ return 0;
}
/* ------------------------------------------------------------------ */
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
The memory {increase|decrease}_reservation and VA mapping update/reset
code used in the balloon driver can be made common, so other drivers can
re-use the same functionality without open-coding it.
Create a dedicated module for the shared code and export the
corresponding symbols for other kernel modules.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/Makefile | 1 +
drivers/xen/balloon.c | 71 ++----------------
drivers/xen/mem-reservation.c | 134 ++++++++++++++++++++++++++++++++++
include/xen/mem_reservation.h | 29 ++++++++
4 files changed, 170 insertions(+), 65 deletions(-)
create mode 100644 drivers/xen/mem-reservation.c
create mode 100644 include/xen/mem_reservation.h
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 451e833f5931..3c87b0c3aca6 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -2,6 +2,7 @@
obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
obj-$(CONFIG_X86) += fallback.o
obj-y += grant-table.o features.o balloon.o manage.o preempt.o time.o
+obj-y += mem-reservation.o
obj-y += events/
obj-y += xenbus/
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 065f0b607373..57b482d67a3a 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -71,6 +71,7 @@
#include <xen/balloon.h>
#include <xen/features.h>
#include <xen/page.h>
+#include <xen/mem_reservation.h>
static int xen_hotplug_unpopulated;
@@ -157,13 +158,6 @@ static DECLARE_DELAYED_WORK(balloon_worker, balloon_process);
#define GFP_BALLOON \
(GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
-static void scrub_page(struct page *page)
-{
-#ifdef CONFIG_XEN_SCRUB_PAGES
- clear_highpage(page);
-#endif
-}
-
/* balloon_append: add the given page to the balloon. */
static void __balloon_append(struct page *page)
{
@@ -463,11 +457,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
int rc;
unsigned long i;
struct page *page;
- struct xen_memory_reservation reservation = {
- .address_bits = 0,
- .extent_order = EXTENT_ORDER,
- .domid = DOMID_SELF
- };
if (nr_pages > ARRAY_SIZE(frame_list))
nr_pages = ARRAY_SIZE(frame_list);
@@ -486,9 +475,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
page = balloon_next_page(page);
}
- set_xen_guest_handle(reservation.extent_start, frame_list);
- reservation.nr_extents = nr_pages;
- rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+ rc = xenmem_reservation_increase(nr_pages, frame_list);
if (rc <= 0)
return BP_EAGAIN;
@@ -496,29 +483,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
page = balloon_retrieve(false);
BUG_ON(page == NULL);
-#ifdef CONFIG_XEN_HAVE_PVMMU
- /*
- * We don't support PV MMU when Linux and Xen is using
- * different page granularity.
- */
- BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
-
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- unsigned long pfn = page_to_pfn(page);
-
- set_phys_to_machine(pfn, frame_list[i]);
-
- /* Link back into the page tables if not highmem. */
- if (!PageHighMem(page)) {
- int ret;
- ret = HYPERVISOR_update_va_mapping(
- (unsigned long)__va(pfn << PAGE_SHIFT),
- mfn_pte(frame_list[i], PAGE_KERNEL),
- 0);
- BUG_ON(ret);
- }
- }
-#endif
+ xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
/* Relinquish the page back to the allocator. */
free_reserved_page(page);
@@ -535,11 +500,6 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
unsigned long i;
struct page *page, *tmp;
int ret;
- struct xen_memory_reservation reservation = {
- .address_bits = 0,
- .extent_order = EXTENT_ORDER,
- .domid = DOMID_SELF
- };
LIST_HEAD(pages);
if (nr_pages > ARRAY_SIZE(frame_list))
@@ -553,7 +513,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
break;
}
adjust_managed_page_count(page, -1);
- scrub_page(page);
+ xenmem_reservation_scrub_page(page);
list_add(&page->lru, &pages);
}
@@ -575,25 +535,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
/* XENMEM_decrease_reservation requires a GFN */
frame_list[i++] = xen_page_to_gfn(page);
-#ifdef CONFIG_XEN_HAVE_PVMMU
- /*
- * We don't support PV MMU when Linux and Xen is using
- * different page granularity.
- */
- BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
-
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- unsigned long pfn = page_to_pfn(page);
+ xenmem_reservation_va_mapping_reset(1, &page);
- if (!PageHighMem(page)) {
- ret = HYPERVISOR_update_va_mapping(
- (unsigned long)__va(pfn << PAGE_SHIFT),
- __pte_ma(0), 0);
- BUG_ON(ret);
- }
- __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
- }
-#endif
list_del(&page->lru);
balloon_append(page);
@@ -601,9 +544,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
flush_tlb_all();
- set_xen_guest_handle(reservation.extent_start, frame_list);
- reservation.nr_extents = nr_pages;
- ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
+ ret = xenmem_reservation_decrease(nr_pages, frame_list);
BUG_ON(ret != nr_pages);
balloon_stats.current_pages -= nr_pages;
diff --git a/drivers/xen/mem-reservation.c b/drivers/xen/mem-reservation.c
new file mode 100644
index 000000000000..29882e4324f5
--- /dev/null
+++ b/drivers/xen/mem-reservation.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+
+/******************************************************************************
+ * Xen memory reservation utilities.
+ *
+ * Copyright (c) 2003, B Dragovic
+ * Copyright (c) 2003-2004, M Williamson, K Fraser
+ * Copyright (c) 2005 Dan M. Smith, IBM Corporation
+ * Copyright (c) 2010 Daniel Kiper
+ * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include <asm/tlb.h>
+#include <asm/xen/hypercall.h>
+
+#include <xen/interface/memory.h>
+#include <xen/page.h>
+
+/*
+ * Use one extent per PAGE_SIZE to avoid breaking down the page into
+ * multiple frames.
+ */
+#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
+
+void xenmem_reservation_scrub_page(struct page *page)
+{
+#ifdef CONFIG_XEN_SCRUB_PAGES
+ clear_highpage(page);
+#endif
+}
+EXPORT_SYMBOL(xenmem_reservation_scrub_page);
+
+void xenmem_reservation_va_mapping_update(unsigned long count,
+ struct page **pages,
+ xen_pfn_t *frames)
+{
+#ifdef CONFIG_XEN_HAVE_PVMMU
+ int i;
+
+ for (i = 0; i < count; i++) {
+ struct page *page;
+
+ page = pages[i];
+ BUG_ON(page == NULL);
+
+ /*
+ * We don't support PV MMU when Linux and Xen are using
+ * different page granularity.
+ */
+ BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
+ if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ unsigned long pfn = page_to_pfn(page);
+
+ set_phys_to_machine(pfn, frames[i]);
+
+ /* Link back into the page tables if not highmem. */
+ if (!PageHighMem(page)) {
+ int ret;
+
+ ret = HYPERVISOR_update_va_mapping(
+ (unsigned long)__va(pfn << PAGE_SHIFT),
+ mfn_pte(frames[i], PAGE_KERNEL),
+ 0);
+ BUG_ON(ret);
+ }
+ }
+ }
+#endif
+}
+EXPORT_SYMBOL(xenmem_reservation_va_mapping_update);
+
+void xenmem_reservation_va_mapping_reset(unsigned long count,
+ struct page **pages)
+{
+#ifdef CONFIG_XEN_HAVE_PVMMU
+ int i;
+
+ for (i = 0; i < count; i++) {
+ /*
+ * We don't support PV MMU when Linux and Xen are using
+ * different page granularity.
+ */
+ BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
+ if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ struct page *page = pages[i];
+ unsigned long pfn = page_to_pfn(page);
+
+ if (!PageHighMem(page)) {
+ int ret;
+
+ ret = HYPERVISOR_update_va_mapping(
+ (unsigned long)__va(pfn << PAGE_SHIFT),
+ __pte_ma(0), 0);
+ BUG_ON(ret);
+ }
+ __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+ }
+ }
+#endif
+}
+EXPORT_SYMBOL(xenmem_reservation_va_mapping_reset);
+
+int xenmem_reservation_increase(int count, xen_pfn_t *frames)
+{
+ struct xen_memory_reservation reservation = {
+ .address_bits = 0,
+ .extent_order = EXTENT_ORDER,
+ .domid = DOMID_SELF
+ };
+
+ set_xen_guest_handle(reservation.extent_start, frames);
+ reservation.nr_extents = count;
+ return HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+}
+EXPORT_SYMBOL(xenmem_reservation_increase);
+
+int xenmem_reservation_decrease(int count, xen_pfn_t *frames)
+{
+ struct xen_memory_reservation reservation = {
+ .address_bits = 0,
+ .extent_order = EXTENT_ORDER,
+ .domid = DOMID_SELF
+ };
+
+ set_xen_guest_handle(reservation.extent_start, frames);
+ reservation.nr_extents = count;
+ return HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
+}
+EXPORT_SYMBOL(xenmem_reservation_decrease);
diff --git a/include/xen/mem_reservation.h b/include/xen/mem_reservation.h
new file mode 100644
index 000000000000..9306d9b8743c
--- /dev/null
+++ b/include/xen/mem_reservation.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+/*
+ * Xen memory reservation utilities.
+ *
+ * Copyright (c) 2003, B Dragovic
+ * Copyright (c) 2003-2004, M Williamson, K Fraser
+ * Copyright (c) 2005 Dan M. Smith, IBM Corporation
+ * Copyright (c) 2010 Daniel Kiper
+ * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
+ */
+
+#ifndef _XENMEM_RESERVATION_H
+#define _XENMEM_RESERVATION_H
+
+void xenmem_reservation_scrub_page(struct page *page);
+
+void xenmem_reservation_va_mapping_update(unsigned long count,
+ struct page **pages,
+ xen_pfn_t *frames);
+
+void xenmem_reservation_va_mapping_reset(unsigned long count,
+ struct page **pages);
+
+int xenmem_reservation_increase(int count, xen_pfn_t *frames);
+
+int xenmem_reservation_decrease(int count, xen_pfn_t *frames);
+
+#endif
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Allow mappings for DMA-backed buffers if the grant table module
supports them: this extends the grant device to map not only buffers
made of ballooned pages, but also buffers allocated with
dma_alloc_xxx.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/gntdev.c | 100 +++++++++++++++++++++++++++++++++++++-
include/uapi/xen/gntdev.h | 15 ++++++
2 files changed, 113 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index bd56653b9bbc..640a579f42ea 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -37,6 +37,9 @@
#include <linux/slab.h>
#include <linux/highmem.h>
#include <linux/refcount.h>
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+#include <linux/of_device.h>
+#endif
#include <xen/xen.h>
#include <xen/grant_table.h>
@@ -72,6 +75,11 @@ struct gntdev_priv {
struct mutex lock;
struct mm_struct *mm;
struct mmu_notifier mn;
+
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ /* Device for which DMA memory is allocated. */
+ struct device *dma_dev;
+#endif
};
struct unmap_notify {
@@ -96,10 +104,28 @@ struct grant_map {
struct gnttab_unmap_grant_ref *kunmap_ops;
struct page **pages;
unsigned long pages_vm_start;
+
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ /*
+ * If dma_vaddr is not NULL then this mapping is backed by DMA
+ * capable memory.
+ */
+
+ /* Device for which DMA memory is allocated. */
+ struct device *dma_dev;
+ /* Flags used to create this DMA buffer: GNTDEV_DMA_FLAG_XXX. */
+ int dma_flags;
+ /* Virtual/CPU address of the DMA buffer. */
+ void *dma_vaddr;
+ /* Bus address of the DMA buffer. */
+ dma_addr_t dma_bus_addr;
+#endif
};
static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
+static struct miscdevice gntdev_miscdev;
+
/* ------------------------------------------------------------------ */
static void gntdev_print_maps(struct gntdev_priv *priv,
@@ -121,8 +147,26 @@ static void gntdev_free_map(struct grant_map *map)
if (map == NULL)
return;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ if (map->dma_vaddr) {
+ struct gnttab_dma_alloc_args args;
+
+ args.dev = map->dma_dev;
+ args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
+ args.nr_pages = map->count;
+ args.pages = map->pages;
+ args.vaddr = map->dma_vaddr;
+ args.dev_bus_addr = map->dma_bus_addr;
+
+ gnttab_dma_free_pages(&args);
+ } else if (map->pages) {
+ gnttab_free_pages(map->count, map->pages);
+ }
+#else
if (map->pages)
gnttab_free_pages(map->count, map->pages);
+#endif
+
kfree(map->pages);
kfree(map->grants);
kfree(map->map_ops);
@@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map *map)
kfree(map);
}
-static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
+static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
+ int dma_flags)
{
struct grant_map *add;
int i;
@@ -155,8 +200,37 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
NULL == add->pages)
goto err;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ add->dma_flags = dma_flags;
+
+ /*
+ * Check if this mapping is requested to be backed
+ * by a DMA buffer.
+ */
+ if (dma_flags & (GNTDEV_DMA_FLAG_WC | GNTDEV_DMA_FLAG_COHERENT)) {
+ struct gnttab_dma_alloc_args args;
+
+ /* Remember the device, so we can free DMA memory. */
+ add->dma_dev = priv->dma_dev;
+
+ args.dev = priv->dma_dev;
+ args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
+ args.nr_pages = count;
+ args.pages = add->pages;
+
+ if (gnttab_dma_alloc_pages(&args))
+ goto err;
+
+ add->dma_vaddr = args.vaddr;
+ add->dma_bus_addr = args.dev_bus_addr;
+ } else {
+ if (gnttab_alloc_pages(count, add->pages))
+ goto err;
+ }
+#else
if (gnttab_alloc_pages(count, add->pages))
goto err;
+#endif
for (i = 0; i < count; i++) {
add->map_ops[i].handle = -1;
@@ -323,8 +397,19 @@ static int map_grant_pages(struct grant_map *map)
}
map->unmap_ops[i].handle = map->map_ops[i].handle;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ if (use_ptemod) {
+ map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
+ } else if (map->dma_vaddr) {
+ unsigned long mfn;
+
+ mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
+ map->unmap_ops[i].dev_bus_addr = __pfn_to_phys(mfn);
+ }
+#else
if (use_ptemod)
map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
+#endif
}
return err;
}
@@ -548,6 +633,17 @@ static int gntdev_open(struct inode *inode, struct file *flip)
}
flip->private_data = priv;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ priv->dma_dev = gntdev_miscdev.this_device;
+
+ /*
+ * The device is not spawned from a device tree, so arch_setup_dma_ops
+ * is not called, thus leaving the device with dummy DMA ops.
+ * Fix this by calling of_dma_configure() with a NULL node to set
+ * default DMA ops.
+ */
+ of_dma_configure(priv->dma_dev, NULL);
+#endif
pr_debug("priv %p\n", priv);
return 0;
@@ -589,7 +685,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
return -EINVAL;
err = -ENOMEM;
- map = gntdev_alloc_map(priv, op.count);
+ map = gntdev_alloc_map(priv, op.count, 0 /* This is not a dma-buf. */);
if (!map)
return err;
diff --git a/include/uapi/xen/gntdev.h b/include/uapi/xen/gntdev.h
index 6d1163456c03..2d5a4672f07c 100644
--- a/include/uapi/xen/gntdev.h
+++ b/include/uapi/xen/gntdev.h
@@ -200,4 +200,19 @@ struct ioctl_gntdev_grant_copy {
/* Send an interrupt on the indicated event channel */
#define UNMAP_NOTIFY_SEND_EVENT 0x2
+/*
+ * Flags to be used while requesting memory mapping's backing storage
+ * to be allocated with DMA API.
+ */
+
+/*
+ * The buffer is backed with memory allocated with dma_alloc_wc.
+ */
+#define GNTDEV_DMA_FLAG_WC (1 << 1)
+
+/*
+ * The buffer is backed with memory allocated with dma_alloc_coherent.
+ */
+#define GNTDEV_DMA_FLAG_COHERENT (1 << 2)
+
#endif /* __LINUX_PUBLIC_GNTDEV_H__ */
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
1. Create a dma-buf from grant references provided by the foreign
domain. By default the dma-buf is backed by system memory pages, but
by providing GNTDEV_DMA_FLAG_XXX flags it can also be created
as a DMA write-combine or coherent buffer, e.g. allocated with the
corresponding dma_alloc_xxx API.
Export the resulting buffer as a new dma-buf.
2. Implement waiting for the dma-buf to be released: block until the
dma-buf with the file descriptor provided is released.
If the buffer is not released within the time-out provided, the
-ETIMEDOUT error is returned. If the buffer with the file descriptor
does not exist or has already been released, then -ENOENT is returned.
For valid file descriptors this must not be treated as an error.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/gntdev.c | 478 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 476 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 9e450622af1a..52abc6cd5846 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -4,6 +4,8 @@
* Device for accessing (in user-space) pages that have been granted by other
* domains.
*
+ * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
+ *
* Copyright (c) 2006-2007, D G Murray.
* (c) 2009 Gerd Hoffmann <[email protected]>
* (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
@@ -41,6 +43,9 @@
#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
#include <linux/of_device.h>
#endif
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+#include <linux/dma-buf.h>
+#endif
#include <xen/xen.h>
#include <xen/grant_table.h>
@@ -81,6 +86,17 @@ struct gntdev_priv {
/* Device for which DMA memory is allocated. */
struct device *dma_dev;
#endif
+
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+ /* Private data of the hyper DMA buffers. */
+
+ /* List of exported DMA buffers. */
+ struct list_head dmabuf_exp_list;
+ /* List of wait objects. */
+ struct list_head dmabuf_exp_wait_list;
+ /* This is the lock which protects dma_buf_xxx lists. */
+ struct mutex dmabuf_lock;
+#endif
};
struct unmap_notify {
@@ -125,12 +141,38 @@ struct grant_map {
#ifdef CONFIG_XEN_GNTDEV_DMABUF
struct xen_dmabuf {
+ struct gntdev_priv *priv;
+ struct dma_buf *dmabuf;
+ struct list_head next;
+ int fd;
+
union {
+ struct {
+ /* Exported buffers are reference counted. */
+ struct kref refcount;
+ struct grant_map *map;
+ } exp;
struct {
/* Granted references of the imported buffer. */
grant_ref_t *refs;
} imp;
} u;
+
+ /* Number of pages this buffer has. */
+ int nr_pages;
+ /* Pages of this buffer. */
+ struct page **pages;
+};
+
+struct xen_dmabuf_wait_obj {
+ struct list_head next;
+ struct xen_dmabuf *xen_dmabuf;
+ struct completion completion;
+};
+
+struct xen_dmabuf_attachment {
+ struct sg_table *sgt;
+ enum dma_data_direction dir;
};
#endif
@@ -320,6 +362,16 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
gntdev_free_map(map);
}
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+static void gntdev_remove_map(struct gntdev_priv *priv, struct grant_map *map)
+{
+ mutex_lock(&priv->lock);
+ list_del(&map->next);
+ gntdev_put_map(NULL /* already removed */, map);
+ mutex_unlock(&priv->lock);
+}
+#endif
+
/* ------------------------------------------------------------------ */
static int find_grant_ptes(pte_t *pte, pgtable_t token,
@@ -628,6 +680,12 @@ static int gntdev_open(struct inode *inode, struct file *flip)
INIT_LIST_HEAD(&priv->freeable_maps);
mutex_init(&priv->lock);
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+ mutex_init(&priv->dmabuf_lock);
+ INIT_LIST_HEAD(&priv->dmabuf_exp_list);
+ INIT_LIST_HEAD(&priv->dmabuf_exp_wait_list);
+#endif
+
if (use_ptemod) {
priv->mm = get_task_mm(current);
if (!priv->mm) {
@@ -1053,17 +1111,433 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
/* DMA buffer export support. */
/* ------------------------------------------------------------------ */
+/* ------------------------------------------------------------------ */
+/* Implementation of wait for exported DMA buffer to be released. */
+/* ------------------------------------------------------------------ */
+
+static void dmabuf_exp_release(struct kref *kref);
+
+static struct xen_dmabuf_wait_obj *
+dmabuf_exp_wait_obj_new(struct gntdev_priv *priv,
+ struct xen_dmabuf *xen_dmabuf)
+{
+ struct xen_dmabuf_wait_obj *obj;
+
+ obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+ if (!obj)
+ return ERR_PTR(-ENOMEM);
+
+ init_completion(&obj->completion);
+ obj->xen_dmabuf = xen_dmabuf;
+
+ mutex_lock(&priv->dmabuf_lock);
+ list_add(&obj->next, &priv->dmabuf_exp_wait_list);
+ /* Put our reference and wait for xen_dmabuf's release to fire. */
+ kref_put(&xen_dmabuf->u.exp.refcount, dmabuf_exp_release);
+ mutex_unlock(&priv->dmabuf_lock);
+ return obj;
+}
+
+static void dmabuf_exp_wait_obj_free(struct gntdev_priv *priv,
+ struct xen_dmabuf_wait_obj *obj)
+{
+ struct xen_dmabuf_wait_obj *cur_obj, *q;
+
+ mutex_lock(&priv->dmabuf_lock);
+ list_for_each_entry_safe(cur_obj, q, &priv->dmabuf_exp_wait_list, next)
+ if (cur_obj == obj) {
+ list_del(&obj->next);
+ kfree(obj);
+ break;
+ }
+ mutex_unlock(&priv->dmabuf_lock);
+}
+
+static int dmabuf_exp_wait_obj_wait(struct xen_dmabuf_wait_obj *obj,
+ u32 wait_to_ms)
+{
+ if (wait_for_completion_timeout(&obj->completion,
+ msecs_to_jiffies(wait_to_ms)) <= 0)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static void dmabuf_exp_wait_obj_signal(struct gntdev_priv *priv,
+ struct xen_dmabuf *xen_dmabuf)
+{
+ struct xen_dmabuf_wait_obj *obj, *q;
+
+ list_for_each_entry_safe(obj, q, &priv->dmabuf_exp_wait_list, next)
+ if (obj->xen_dmabuf == xen_dmabuf) {
+ pr_debug("Found xen_dmabuf in the wait list, wake\n");
+ complete_all(&obj->completion);
+ }
+}
+
+static struct xen_dmabuf *
+dmabuf_exp_wait_obj_get_by_fd(struct gntdev_priv *priv, int fd)
+{
+ struct xen_dmabuf *q, *xen_dmabuf, *ret = ERR_PTR(-ENOENT);
+
+ mutex_lock(&priv->dmabuf_lock);
+ list_for_each_entry_safe(xen_dmabuf, q, &priv->dmabuf_exp_list, next)
+ if (xen_dmabuf->fd == fd) {
+ pr_debug("Found xen_dmabuf in the wait list\n");
+ kref_get(&xen_dmabuf->u.exp.refcount);
+ ret = xen_dmabuf;
+ break;
+ }
+ mutex_unlock(&priv->dmabuf_lock);
+ return ret;
+}
+
static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
int wait_to_ms)
{
- return -ETIMEDOUT;
+ struct xen_dmabuf *xen_dmabuf;
+ struct xen_dmabuf_wait_obj *obj;
+ int ret;
+
+ pr_debug("Will wait for dma-buf with fd %d\n", fd);
+ /*
+ * Try to find the DMA buffer: if it is not found then either
+ * the buffer has already been released or the file descriptor
+ * provided is wrong.
+ */
+ xen_dmabuf = dmabuf_exp_wait_obj_get_by_fd(priv, fd);
+ if (IS_ERR(xen_dmabuf))
+ return PTR_ERR(xen_dmabuf);
+
+ /*
+ * xen_dmabuf still exists and is reference count locked by us now,
+ * so prepare to wait: allocate wait object and add it to the wait list,
+ * so we can find it on release.
+ */
+ obj = dmabuf_exp_wait_obj_new(priv, xen_dmabuf);
+ if (IS_ERR(obj)) {
+ pr_err("Failed to setup wait object, ret %ld\n", PTR_ERR(obj));
+ return PTR_ERR(obj);
+ }
+
+ ret = dmabuf_exp_wait_obj_wait(obj, wait_to_ms);
+ dmabuf_exp_wait_obj_free(priv, obj);
+ return ret;
+}
+
+/* ------------------------------------------------------------------ */
+/* DMA buffer export support. */
+/* ------------------------------------------------------------------ */
+
+static struct sg_table *
+dmabuf_pages_to_sgt(struct page **pages, unsigned int nr_pages)
+{
+ struct sg_table *sgt;
+ int ret;
+
+ sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+ if (!sgt) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
+ nr_pages << PAGE_SHIFT,
+ GFP_KERNEL);
+ if (ret)
+ goto out;
+
+ return sgt;
+
+out:
+ kfree(sgt);
+ return ERR_PTR(ret);
+}
+
+static int dmabuf_exp_ops_attach(struct dma_buf *dma_buf,
+ struct device *target_dev,
+ struct dma_buf_attachment *attach)
+{
+ struct xen_dmabuf_attachment *xen_dmabuf_attach;
+
+ xen_dmabuf_attach = kzalloc(sizeof(*xen_dmabuf_attach), GFP_KERNEL);
+ if (!xen_dmabuf_attach)
+ return -ENOMEM;
+
+ xen_dmabuf_attach->dir = DMA_NONE;
+ attach->priv = xen_dmabuf_attach;
+ /* Might need to pin the pages of the buffer now. */
+ return 0;
+}
+
+static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
+ struct dma_buf_attachment *attach)
+{
+ struct xen_dmabuf_attachment *xen_dmabuf_attach = attach->priv;
+
+ if (xen_dmabuf_attach) {
+ struct sg_table *sgt = xen_dmabuf_attach->sgt;
+
+ if (sgt) {
+ if (xen_dmabuf_attach->dir != DMA_NONE)
+ dma_unmap_sg_attrs(attach->dev, sgt->sgl,
+ sgt->nents,
+ xen_dmabuf_attach->dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ sg_free_table(sgt);
+ }
+
+ kfree(sgt);
+ kfree(xen_dmabuf_attach);
+ attach->priv = NULL;
+ }
+ /* Might need to unpin the pages of the buffer now. */
+}
+
+static struct sg_table *
+dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
+ enum dma_data_direction dir)
+{
+ struct xen_dmabuf_attachment *xen_dmabuf_attach = attach->priv;
+ struct xen_dmabuf *xen_dmabuf = attach->dmabuf->priv;
+ struct sg_table *sgt;
+
+ pr_debug("Mapping %d pages for dev %p\n", xen_dmabuf->nr_pages,
+ attach->dev);
+
+ if (WARN_ON(dir == DMA_NONE || !xen_dmabuf_attach))
+ return ERR_PTR(-EINVAL);
+
+ /* Return the cached mapping when possible. */
+ if (xen_dmabuf_attach->dir == dir)
+ return xen_dmabuf_attach->sgt;
+
+ /*
+ * Two mappings with different directions for the same attachment are
+ * not allowed.
+ */
+ if (WARN_ON(xen_dmabuf_attach->dir != DMA_NONE))
+ return ERR_PTR(-EBUSY);
+
+ sgt = dmabuf_pages_to_sgt(xen_dmabuf->pages, xen_dmabuf->nr_pages);
+ if (!IS_ERR(sgt)) {
+ if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
+ DMA_ATTR_SKIP_CPU_SYNC)) {
+ sg_free_table(sgt);
+ kfree(sgt);
+ sgt = ERR_PTR(-ENOMEM);
+ } else {
+ xen_dmabuf_attach->sgt = sgt;
+ xen_dmabuf_attach->dir = dir;
+ }
+ }
+ if (IS_ERR(sgt))
+ pr_err("Failed to map sg table for dev %p\n", attach->dev);
+ return sgt;
+}
+
+static void dmabuf_exp_ops_unmap_dma_buf(struct dma_buf_attachment *attach,
+ struct sg_table *sgt,
+ enum dma_data_direction dir)
+{
+ /* Not implemented. The unmap is done in dmabuf_exp_ops_detach(). */
+}
+
+static void dmabuf_exp_release(struct kref *kref)
+{
+ struct xen_dmabuf *xen_dmabuf =
+ container_of(kref, struct xen_dmabuf, u.exp.refcount);
+
+ dmabuf_exp_wait_obj_signal(xen_dmabuf->priv, xen_dmabuf);
+ list_del(&xen_dmabuf->next);
+ kfree(xen_dmabuf);
+}
+
+static void dmabuf_exp_ops_release(struct dma_buf *dma_buf)
+{
+ struct xen_dmabuf *xen_dmabuf = dma_buf->priv;
+ struct gntdev_priv *priv = xen_dmabuf->priv;
+
+ gntdev_remove_map(priv, xen_dmabuf->u.exp.map);
+ mutex_lock(&priv->dmabuf_lock);
+ kref_put(&xen_dmabuf->u.exp.refcount, dmabuf_exp_release);
+ mutex_unlock(&priv->dmabuf_lock);
+}
+
+static void *dmabuf_exp_ops_kmap_atomic(struct dma_buf *dma_buf,
+ unsigned long page_num)
+{
+ /* Not implemented. */
+ return NULL;
+}
+
+static void dmabuf_exp_ops_kunmap_atomic(struct dma_buf *dma_buf,
+ unsigned long page_num, void *addr)
+{
+ /* Not implemented. */
+}
+
+static void *dmabuf_exp_ops_kmap(struct dma_buf *dma_buf,
+ unsigned long page_num)
+{
+ /* Not implemented. */
+ return NULL;
+}
+
+static void dmabuf_exp_ops_kunmap(struct dma_buf *dma_buf,
+ unsigned long page_num, void *addr)
+{
+ /* Not implemented. */
+}
+
+static int dmabuf_exp_ops_mmap(struct dma_buf *dma_buf,
+ struct vm_area_struct *vma)
+{
+ /* Not implemented. */
+ return 0;
+}
+
+static const struct dma_buf_ops dmabuf_exp_ops = {
+ .attach = dmabuf_exp_ops_attach,
+ .detach = dmabuf_exp_ops_detach,
+ .map_dma_buf = dmabuf_exp_ops_map_dma_buf,
+ .unmap_dma_buf = dmabuf_exp_ops_unmap_dma_buf,
+ .release = dmabuf_exp_ops_release,
+ .map = dmabuf_exp_ops_kmap,
+ .map_atomic = dmabuf_exp_ops_kmap_atomic,
+ .unmap = dmabuf_exp_ops_kunmap,
+ .unmap_atomic = dmabuf_exp_ops_kunmap_atomic,
+ .mmap = dmabuf_exp_ops_mmap,
+};
+
+static int dmabuf_export(struct gntdev_priv *priv, struct grant_map *map,
+ int *fd)
+{
+ DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+ struct xen_dmabuf *xen_dmabuf;
+ int ret = 0;
+
+ xen_dmabuf = kzalloc(sizeof(*xen_dmabuf), GFP_KERNEL);
+ if (!xen_dmabuf)
+ return -ENOMEM;
+
+ kref_init(&xen_dmabuf->u.exp.refcount);
+
+ xen_dmabuf->priv = priv;
+ xen_dmabuf->nr_pages = map->count;
+ xen_dmabuf->pages = map->pages;
+ xen_dmabuf->u.exp.map = map;
+
+ exp_info.exp_name = KBUILD_MODNAME;
+ if (map->dma_dev->driver && map->dma_dev->driver->owner)
+ exp_info.owner = map->dma_dev->driver->owner;
+ else
+ exp_info.owner = THIS_MODULE;
+ exp_info.ops = &dmabuf_exp_ops;
+ exp_info.size = map->count << PAGE_SHIFT;
+ exp_info.flags = O_RDWR;
+ exp_info.priv = xen_dmabuf;
+
+ xen_dmabuf->dmabuf = dma_buf_export(&exp_info);
+ if (IS_ERR(xen_dmabuf->dmabuf)) {
+ ret = PTR_ERR(xen_dmabuf->dmabuf);
+ xen_dmabuf->dmabuf = NULL;
+ goto fail;
+ }
+
+ ret = dma_buf_fd(xen_dmabuf->dmabuf, O_CLOEXEC);
+ if (ret < 0)
+ goto fail;
+
+ xen_dmabuf->fd = ret;
+ *fd = ret;
+
+ pr_debug("Exporting DMA buffer with fd %d\n", ret);
+
+ mutex_lock(&priv->dmabuf_lock);
+ list_add(&xen_dmabuf->next, &priv->dmabuf_exp_list);
+ mutex_unlock(&priv->dmabuf_lock);
+ return 0;
+
+fail:
+ if (xen_dmabuf->dmabuf)
+ dma_buf_put(xen_dmabuf->dmabuf);
+ kfree(xen_dmabuf);
+ return ret;
+}
+
+static struct grant_map *
+dmabuf_exp_alloc_backing_storage(struct gntdev_priv *priv, int dmabuf_flags,
+ int count)
+{
+ struct grant_map *map;
+
+ if (unlikely(count <= 0))
+ return ERR_PTR(-EINVAL);
+
+ if ((dmabuf_flags & GNTDEV_DMA_FLAG_WC) &&
+ (dmabuf_flags & GNTDEV_DMA_FLAG_COHERENT)) {
+ pr_err("Wrong dma-buf flags: either WC or coherent, not both\n");
+ return ERR_PTR(-EINVAL);
+ }
+
+ map = gntdev_alloc_map(priv, count, dmabuf_flags);
+ if (!map)
+ return ERR_PTR(-ENOMEM);
+
+ if (unlikely(atomic_add_return(count, &pages_mapped) > limit)) {
+ pr_err("can't map: over limit\n");
+ gntdev_put_map(NULL, map);
+ return ERR_PTR(-ENOMEM);
+ }
+ return map;
}
static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
int count, u32 domid, u32 *refs, u32 *fd)
{
+ struct grant_map *map;
+ int i, ret;
+
*fd = -1;
- return -EINVAL;
+
+ if (use_ptemod) {
+ pr_err("Cannot provide dma-buf: use_ptemod %d\n",
+ use_ptemod);
+ return -EINVAL;
+ }
+
+ map = dmabuf_exp_alloc_backing_storage(priv, flags, count);
+ if (IS_ERR(map))
+ return PTR_ERR(map);
+
+ for (i = 0; i < count; i++) {
+ map->grants[i].domid = domid;
+ map->grants[i].ref = refs[i];
+ }
+
+ mutex_lock(&priv->lock);
+ gntdev_add_map(priv, map);
+ mutex_unlock(&priv->lock);
+
+ map->flags |= GNTMAP_host_map;
+#if defined(CONFIG_X86)
+ map->flags |= GNTMAP_device_map;
+#endif
+
+ ret = map_grant_pages(map);
+ if (ret < 0)
+ goto out;
+
+ ret = dmabuf_export(priv, map, fd);
+ if (ret < 0)
+ goto out;
+
+ return 0;
+
+out:
+ gntdev_remove_map(priv, map);
+ return ret;
}
/* ------------------------------------------------------------------ */
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Extend grant table module API to allow allocating buffers that can
be used for DMA operations and mapping foreign grant references
on top of those.
The resulting buffer is similar to one allocated by the balloon
driver in that proper memory reservation is made
({increase|decrease}_reservation and VA mappings updated if needed).
This is useful for sharing foreign buffers with HW drivers which
cannot work with scattered buffers provided by the balloon driver,
but require DMA-able memory instead.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/Kconfig | 13 ++++
drivers/xen/grant-table.c | 124 ++++++++++++++++++++++++++++++++++++++
include/xen/grant_table.h | 25 ++++++++
3 files changed, 162 insertions(+)
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index e5d0c28372ea..3431fe210624 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -161,6 +161,19 @@ config XEN_GRANT_DEV_ALLOC
to other domains. This can be used to implement frontend drivers
or as part of an inter-domain shared memory channel.
+config XEN_GRANT_DMA_ALLOC
+ bool "Allow allocating DMA capable buffers with grant reference module"
+ depends on XEN
+ help
+ Extends grant table module API to allow allocating DMA capable
+ buffers and mapping foreign grant references on top of it.
+ The resulting buffer is similar to one allocated by the balloon
+ driver in that proper memory reservation is made
+ ({increase|decrease}_reservation and VA mappings updated if needed).
+ This is useful for sharing foreign buffers with HW drivers which
+ cannot work with scattered buffers provided by the balloon driver,
+ but require DMAable memory instead.
+
config SWIOTLB_XEN
def_bool y
select SWIOTLB
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index d7488226e1f2..06fe6e7f639c 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -45,6 +45,9 @@
#include <linux/workqueue.h>
#include <linux/ratelimit.h>
#include <linux/moduleparam.h>
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+#include <linux/dma-mapping.h>
+#endif
#include <xen/xen.h>
#include <xen/interface/xen.h>
@@ -57,6 +60,7 @@
#ifdef CONFIG_X86
#include <asm/xen/cpuid.h>
#endif
+#include <xen/mem_reservation.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/interface.h>
@@ -811,6 +815,82 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
}
EXPORT_SYMBOL(gnttab_alloc_pages);
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+/**
+ * gnttab_dma_alloc_pages - alloc DMAable pages suitable for grant mapping into
+ * @args: arguments to the function
+ */
+int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
+{
+ unsigned long pfn, start_pfn;
+ xen_pfn_t *frames;
+ size_t size;
+ int i, ret;
+
+ frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
+ if (!frames)
+ return -ENOMEM;
+
+ size = args->nr_pages << PAGE_SHIFT;
+ if (args->coherent)
+ args->vaddr = dma_alloc_coherent(args->dev, size,
+ &args->dev_bus_addr,
+ GFP_KERNEL | __GFP_NOWARN);
+ else
+ args->vaddr = dma_alloc_wc(args->dev, size,
+ &args->dev_bus_addr,
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!args->vaddr) {
+ pr_err("Failed to allocate DMA buffer of size %zu\n", size);
+ ret = -ENOMEM;
+ goto fail_free_frames;
+ }
+
+ start_pfn = __phys_to_pfn(args->dev_bus_addr);
+ for (pfn = start_pfn, i = 0; pfn < start_pfn + args->nr_pages;
+ pfn++, i++) {
+ struct page *page = pfn_to_page(pfn);
+
+ args->pages[i] = page;
+ frames[i] = xen_page_to_gfn(page);
+ xenmem_reservation_scrub_page(page);
+ }
+
+ xenmem_reservation_va_mapping_reset(args->nr_pages, args->pages);
+
+ ret = xenmem_reservation_decrease(args->nr_pages, frames);
+ if (ret != args->nr_pages) {
+ pr_err("Failed to decrease reservation for DMA buffer\n");
+ xenmem_reservation_increase(ret, frames);
+ ret = -EFAULT;
+ goto fail_free_dma;
+ }
+
+ ret = gnttab_pages_set_private(args->nr_pages, args->pages);
+ if (ret < 0)
+ goto fail_clear_private;
+
+ kfree(frames);
+ return 0;
+
+fail_clear_private:
+ gnttab_pages_clear_private(args->nr_pages, args->pages);
+fail_free_dma:
+ xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
+ frames);
+ if (args->coherent)
+ dma_free_coherent(args->dev, size,
+ args->vaddr, args->dev_bus_addr);
+ else
+ dma_free_wc(args->dev, size,
+ args->vaddr, args->dev_bus_addr);
+fail_free_frames:
+ kfree(frames);
+ return ret;
+}
+EXPORT_SYMBOL(gnttab_dma_alloc_pages);
+#endif
+
void gnttab_pages_clear_private(int nr_pages, struct page **pages)
{
int i;
@@ -838,6 +918,50 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
}
EXPORT_SYMBOL(gnttab_free_pages);
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+/**
+ * gnttab_dma_free_pages - free DMAable pages
+ * @args: arguments to the function
+ */
+int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
+{
+ xen_pfn_t *frames;
+ size_t size;
+ int i, ret;
+
+ gnttab_pages_clear_private(args->nr_pages, args->pages);
+
+ frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
+ if (!frames)
+ return -ENOMEM;
+
+ for (i = 0; i < args->nr_pages; i++)
+ frames[i] = page_to_xen_pfn(args->pages[i]);
+
+ ret = xenmem_reservation_increase(args->nr_pages, frames);
+ if (ret != args->nr_pages) {
+ pr_err("Failed to increase reservation for DMA buffer\n");
+ ret = -EFAULT;
+ } else {
+ ret = 0;
+ }
+
+ xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
+ frames);
+
+ size = args->nr_pages << PAGE_SHIFT;
+ if (args->coherent)
+ dma_free_coherent(args->dev, size,
+ args->vaddr, args->dev_bus_addr);
+ else
+ dma_free_wc(args->dev, size,
+ args->vaddr, args->dev_bus_addr);
+ kfree(frames);
+ return ret;
+}
+EXPORT_SYMBOL(gnttab_dma_free_pages);
+#endif
+
/* Handling of paged out grant targets (GNTST_eagain) */
#define MAX_DELAY 256
static inline void
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index de03f2542bb7..982e34242b9c 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -198,6 +198,31 @@ void gnttab_free_auto_xlat_frames(void);
int gnttab_alloc_pages(int nr_pages, struct page **pages);
void gnttab_free_pages(int nr_pages, struct page **pages);
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+struct gnttab_dma_alloc_args {
+ /* Device for which DMA memory will be/was allocated. */
+ struct device *dev;
+ /*
+ * If set, the DMA buffer is allocated coherent; otherwise it is write-combine.
+ */
+ bool coherent;
+ /*
+ * Number of entries in the @pages array, defines the size
+ * of the DMA buffer.
+ */
+ int nr_pages;
+ /* Array of pages @pages filled with pages of the DMA buffer. */
+ struct page **pages;
+ /* Virtual/CPU address of the DMA buffer. */
+ void *vaddr;
+ /* Bus address of the DMA buffer. */
+ dma_addr_t dev_bus_addr;
+};
+
+int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args);
+int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args);
+#endif
+
int gnttab_pages_set_private(int nr_pages, struct page **pages);
void gnttab_pages_clear_private(int nr_pages, struct page **pages);
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Make the set/clear page private code shared and accessible to
other kernel modules, which can re-use it instead of open-coding it.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/grant-table.c | 54 +++++++++++++++++++++++++--------------
include/xen/grant_table.h | 3 +++
2 files changed, 38 insertions(+), 19 deletions(-)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 27be107d6480..d7488226e1f2 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -769,29 +769,18 @@ void gnttab_free_auto_xlat_frames(void)
}
EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
-/**
- * gnttab_alloc_pages - alloc pages suitable for grant mapping into
- * @nr_pages: number of pages to alloc
- * @pages: returns the pages
- */
-int gnttab_alloc_pages(int nr_pages, struct page **pages)
+int gnttab_pages_set_private(int nr_pages, struct page **pages)
{
int i;
- int ret;
-
- ret = alloc_xenballooned_pages(nr_pages, pages);
- if (ret < 0)
- return ret;
for (i = 0; i < nr_pages; i++) {
#if BITS_PER_LONG < 64
struct xen_page_foreign *foreign;
foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
- if (!foreign) {
- gnttab_free_pages(nr_pages, pages);
+ if (!foreign)
return -ENOMEM;
- }
+
set_page_private(pages[i], (unsigned long)foreign);
#endif
SetPagePrivate(pages[i]);
@@ -799,14 +788,30 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
return 0;
}
-EXPORT_SYMBOL(gnttab_alloc_pages);
+EXPORT_SYMBOL(gnttab_pages_set_private);
/**
- * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
- * @nr_pages; number of pages to free
- * @pages: the pages
+ * gnttab_alloc_pages - alloc pages suitable for grant mapping into
+ * @nr_pages: number of pages to alloc
+ * @pages: returns the pages
*/
-void gnttab_free_pages(int nr_pages, struct page **pages)
+int gnttab_alloc_pages(int nr_pages, struct page **pages)
+{
+ int ret;
+
+ ret = alloc_xenballooned_pages(nr_pages, pages);
+ if (ret < 0)
+ return ret;
+
+ ret = gnttab_pages_set_private(nr_pages, pages);
+ if (ret < 0)
+ gnttab_free_pages(nr_pages, pages);
+
+ return ret;
+}
+EXPORT_SYMBOL(gnttab_alloc_pages);
+
+void gnttab_pages_clear_private(int nr_pages, struct page **pages)
{
int i;
@@ -818,6 +823,17 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
ClearPagePrivate(pages[i]);
}
}
+}
+EXPORT_SYMBOL(gnttab_pages_clear_private);
+
+/**
+ * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
+ * @nr_pages: number of pages to free
+ * @pages: the pages
+ */
+void gnttab_free_pages(int nr_pages, struct page **pages)
+{
+ gnttab_pages_clear_private(nr_pages, pages);
free_xenballooned_pages(nr_pages, pages);
}
EXPORT_SYMBOL(gnttab_free_pages);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 2e37741f6b8d..de03f2542bb7 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -198,6 +198,9 @@ void gnttab_free_auto_xlat_frames(void);
int gnttab_alloc_pages(int nr_pages, struct page **pages);
void gnttab_free_pages(int nr_pages, struct page **pages);
+int gnttab_pages_set_private(int nr_pages, struct page **pages);
+void gnttab_pages_clear_private(int nr_pages, struct page **pages);
+
int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count);
--
2.17.0
On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Make set/clear page private code shared and accessible to
> other kernel modules which can re-use these instead of open-coding.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Memory {increase|decrease}_reservation and VA mappings update/reset
> code used in balloon driver can be made common, so other drivers can
> also re-use the same functionality without open-coding.
> Create a dedicated module
IIUIC this is not really a module, it's a common file.
> for the shared code and export corresponding
> symbols for other kernel modules.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/Makefile | 1 +
> drivers/xen/balloon.c | 71 ++----------------
> drivers/xen/mem-reservation.c | 134 ++++++++++++++++++++++++++++++++++
> include/xen/mem_reservation.h | 29 ++++++++
> 4 files changed, 170 insertions(+), 65 deletions(-)
> create mode 100644 drivers/xen/mem-reservation.c
> create mode 100644 include/xen/mem_reservation.h
>
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 451e833f5931..3c87b0c3aca6 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -2,6 +2,7 @@
> obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
> obj-$(CONFIG_X86) += fallback.o
> obj-y += grant-table.o features.o balloon.o manage.o preempt.o time.o
> +obj-y += mem-reservation.o
> obj-y += events/
> obj-y += xenbus/
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 065f0b607373..57b482d67a3a 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -71,6 +71,7 @@
> #include <xen/balloon.h>
> #include <xen/features.h>
> #include <xen/page.h>
> +#include <xen/mem_reservation.h>
>
> static int xen_hotplug_unpopulated;
>
> @@ -157,13 +158,6 @@ static DECLARE_DELAYED_WORK(balloon_worker, balloon_process);
> #define GFP_BALLOON \
> (GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
>
> -static void scrub_page(struct page *page)
> -{
> -#ifdef CONFIG_XEN_SCRUB_PAGES
> - clear_highpage(page);
> -#endif
> -}
> -
> /* balloon_append: add the given page to the balloon. */
> static void __balloon_append(struct page *page)
> {
> @@ -463,11 +457,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> int rc;
> unsigned long i;
> struct page *page;
> - struct xen_memory_reservation reservation = {
> - .address_bits = 0,
> - .extent_order = EXTENT_ORDER,
> - .domid = DOMID_SELF
> - };
>
> if (nr_pages > ARRAY_SIZE(frame_list))
> nr_pages = ARRAY_SIZE(frame_list);
> @@ -486,9 +475,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> page = balloon_next_page(page);
> }
>
> - set_xen_guest_handle(reservation.extent_start, frame_list);
> - reservation.nr_extents = nr_pages;
> - rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> + rc = xenmem_reservation_increase(nr_pages, frame_list);
> if (rc <= 0)
> return BP_EAGAIN;
>
> @@ -496,29 +483,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> page = balloon_retrieve(false);
> BUG_ON(page == NULL);
>
> -#ifdef CONFIG_XEN_HAVE_PVMMU
> - /*
> - * We don't support PV MMU when Linux and Xen is using
> - * different page granularity.
> - */
> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> -
> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> - unsigned long pfn = page_to_pfn(page);
> -
> - set_phys_to_machine(pfn, frame_list[i]);
> -
> - /* Link back into the page tables if not highmem. */
> - if (!PageHighMem(page)) {
> - int ret;
> - ret = HYPERVISOR_update_va_mapping(
> - (unsigned long)__va(pfn << PAGE_SHIFT),
> - mfn_pte(frame_list[i], PAGE_KERNEL),
> - 0);
> - BUG_ON(ret);
> - }
> - }
> -#endif
> + xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
Can you make a single call to xenmem_reservation_va_mapping_update(rc,
...)? You need to keep track of pages but presumably they can be put
into an array (or a list). In fact, perhaps we can have
balloon_retrieve() return a set of pages.
>
> /* Relinquish the page back to the allocator. */
> free_reserved_page(page);
> @@ -535,11 +500,6 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
> unsigned long i;
> struct page *page, *tmp;
> int ret;
> - struct xen_memory_reservation reservation = {
> - .address_bits = 0,
> - .extent_order = EXTENT_ORDER,
> - .domid = DOMID_SELF
> - };
> LIST_HEAD(pages);
>
> if (nr_pages > ARRAY_SIZE(frame_list))
> @@ -553,7 +513,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
> break;
> }
> adjust_managed_page_count(page, -1);
> - scrub_page(page);
> + xenmem_reservation_scrub_page(page);
> list_add(&page->lru, &pages);
> }
>
> @@ -575,25 +535,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
> /* XENMEM_decrease_reservation requires a GFN */
> frame_list[i++] = xen_page_to_gfn(page);
>
> -#ifdef CONFIG_XEN_HAVE_PVMMU
> - /*
> - * We don't support PV MMU when Linux and Xen is using
> - * different page granularity.
> - */
> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> -
> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> - unsigned long pfn = page_to_pfn(page);
> + xenmem_reservation_va_mapping_reset(1, &page);
and here too.
>
> - if (!PageHighMem(page)) {
> - ret = HYPERVISOR_update_va_mapping(
> - (unsigned long)__va(pfn << PAGE_SHIFT),
> - __pte_ma(0), 0);
> - BUG_ON(ret);
> - }
> - __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> - }
> -#endif
> list_del(&page->lru);
>
> balloon_append(page);
> @@ -601,9 +544,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>
> flush_tlb_all();
>
> - set_xen_guest_handle(reservation.extent_start, frame_list);
> - reservation.nr_extents = nr_pages;
> - ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
> + ret = xenmem_reservation_decrease(nr_pages, frame_list);
> BUG_ON(ret != nr_pages);
>
> balloon_stats.current_pages -= nr_pages;
> diff --git a/drivers/xen/mem-reservation.c b/drivers/xen/mem-reservation.c
> new file mode 100644
> index 000000000000..29882e4324f5
> --- /dev/null
> +++ b/drivers/xen/mem-reservation.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0 OR MIT
Why is this "OR MIT"? The original file was licensed GPLv2 only.
> +
> +/******************************************************************************
> + * Xen memory reservation utilities.
> + *
> + * Copyright (c) 2003, B Dragovic
> + * Copyright (c) 2003-2004, M Williamson, K Fraser
> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
> + * Copyright (c) 2010 Daniel Kiper
> + * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +
> +#include <asm/tlb.h>
> +#include <asm/xen/hypercall.h>
> +
> +#include <xen/interface/memory.h>
> +#include <xen/page.h>
> +
> +/*
> + * Use one extent per PAGE_SIZE to avoid breaking down the page into
> + * multiple frames.
> + */
> +#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
> +
> +void xenmem_reservation_scrub_page(struct page *page)
> +{
> +#ifdef CONFIG_XEN_SCRUB_PAGES
> + clear_highpage(page);
> +#endif
> +}
> +EXPORT_SYMBOL(xenmem_reservation_scrub_page);
> +
> +void xenmem_reservation_va_mapping_update(unsigned long count,
> + struct page **pages,
> + xen_pfn_t *frames)
> +{
> +#ifdef CONFIG_XEN_HAVE_PVMMU
> + int i;
> +
> + for (i = 0; i < count; i++) {
> + struct page *page;
> +
> + page = pages[i];
> + BUG_ON(page == NULL);
> +
> + /*
> + * We don't support PV MMU when Linux and Xen is using
> + * different page granularity.
> + */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
> + if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> + unsigned long pfn = page_to_pfn(page);
> +
> + set_phys_to_machine(pfn, frames[i]);
> +
> + /* Link back into the page tables if not highmem. */
> + if (!PageHighMem(page)) {
> + int ret;
> +
> + ret = HYPERVISOR_update_va_mapping(
> + (unsigned long)__va(pfn << PAGE_SHIFT),
> + mfn_pte(frames[i], PAGE_KERNEL),
> + 0);
> + BUG_ON(ret);
> + }
> + }
> + }
> +#endif
> +}
> +EXPORT_SYMBOL(xenmem_reservation_va_mapping_update);
> +
> +void xenmem_reservation_va_mapping_reset(unsigned long count,
> + struct page **pages)
> +{
> +#ifdef CONFIG_XEN_HAVE_PVMMU
> + int i;
> +
> + for (i = 0; i < count; i++) {
> + /*
> + * We don't support PV MMU when Linux and Xen is using
> + * different page granularity.
> + */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
> + if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> + struct page *page = pages[i];
> + unsigned long pfn = page_to_pfn(page);
> +
> + if (!PageHighMem(page)) {
> + int ret;
> +
> + ret = HYPERVISOR_update_va_mapping(
> + (unsigned long)__va(pfn << PAGE_SHIFT),
> + __pte_ma(0), 0);
> + BUG_ON(ret);
> + }
> + __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> + }
> + }
> +#endif
> +}
> +EXPORT_SYMBOL(xenmem_reservation_va_mapping_reset);
> +
> +int xenmem_reservation_increase(int count, xen_pfn_t *frames)
> +{
> + struct xen_memory_reservation reservation = {
> + .address_bits = 0,
> + .extent_order = EXTENT_ORDER,
> + .domid = DOMID_SELF
> + };
> +
> + set_xen_guest_handle(reservation.extent_start, frames);
> + reservation.nr_extents = count;
> + return HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> +}
> +EXPORT_SYMBOL(xenmem_reservation_increase);
> +
> +int xenmem_reservation_decrease(int count, xen_pfn_t *frames)
> +{
> + struct xen_memory_reservation reservation = {
> + .address_bits = 0,
> + .extent_order = EXTENT_ORDER,
> + .domid = DOMID_SELF
> + };
> +
> + set_xen_guest_handle(reservation.extent_start, frames);
> + reservation.nr_extents = count;
> + return HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
> +}
> +EXPORT_SYMBOL(xenmem_reservation_decrease);
> diff --git a/include/xen/mem_reservation.h b/include/xen/mem_reservation.h
> new file mode 100644
> index 000000000000..9306d9b8743c
> --- /dev/null
> +++ b/include/xen/mem_reservation.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
and here too.
-boris
> +
> +/*
> + * Xen memory reservation utilities.
> + *
> + * Copyright (c) 2003, B Dragovic
> + * Copyright (c) 2003-2004, M Williamson, K Fraser
> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
> + * Copyright (c) 2010 Daniel Kiper
> + * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
> + */
> +
> +#ifndef _XENMEM_RESERVATION_H
> +#define _XENMEM_RESERVATION_H
> +
> +void xenmem_reservation_scrub_page(struct page *page);
> +
> +void xenmem_reservation_va_mapping_update(unsigned long count,
> + struct page **pages,
> + xen_pfn_t *frames);
> +
> +void xenmem_reservation_va_mapping_reset(unsigned long count,
> + struct page **pages);
> +
> +int xenmem_reservation_increase(int count, xen_pfn_t *frames);
> +
> +int xenmem_reservation_decrease(int count, xen_pfn_t *frames);
> +
> +#endif
On 05/29/2018 09:04 PM, Boris Ostrovsky wrote:
> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> Memory {increase|decrease}_reservation and VA mappings update/reset
>> code used in balloon driver can be made common, so other drivers can
>> also re-use the same functionality without open-coding.
>> Create a dedicated module
> IIUIC this is not really a module, it's a common file.
Sure, will put "file" here
>
>> for the shared code and export corresponding
>> symbols for other kernel modules.
>>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/Makefile | 1 +
>> drivers/xen/balloon.c | 71 ++----------------
>> drivers/xen/mem-reservation.c | 134 ++++++++++++++++++++++++++++++++++
>> include/xen/mem_reservation.h | 29 ++++++++
>> 4 files changed, 170 insertions(+), 65 deletions(-)
>> create mode 100644 drivers/xen/mem-reservation.c
>> create mode 100644 include/xen/mem_reservation.h
>>
>> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
>> index 451e833f5931..3c87b0c3aca6 100644
>> --- a/drivers/xen/Makefile
>> +++ b/drivers/xen/Makefile
>> @@ -2,6 +2,7 @@
>> obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
>> obj-$(CONFIG_X86) += fallback.o
>> obj-y += grant-table.o features.o balloon.o manage.o preempt.o time.o
>> +obj-y += mem-reservation.o
>> obj-y += events/
>> obj-y += xenbus/
>>
>> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
>> index 065f0b607373..57b482d67a3a 100644
>> --- a/drivers/xen/balloon.c
>> +++ b/drivers/xen/balloon.c
>> @@ -71,6 +71,7 @@
>> #include <xen/balloon.h>
>> #include <xen/features.h>
>> #include <xen/page.h>
>> +#include <xen/mem_reservation.h>
>>
>> static int xen_hotplug_unpopulated;
>>
>> @@ -157,13 +158,6 @@ static DECLARE_DELAYED_WORK(balloon_worker, balloon_process);
>> #define GFP_BALLOON \
>> (GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
>>
>> -static void scrub_page(struct page *page)
>> -{
>> -#ifdef CONFIG_XEN_SCRUB_PAGES
>> - clear_highpage(page);
>> -#endif
>> -}
>> -
>> /* balloon_append: add the given page to the balloon. */
>> static void __balloon_append(struct page *page)
>> {
>> @@ -463,11 +457,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>> int rc;
>> unsigned long i;
>> struct page *page;
>> - struct xen_memory_reservation reservation = {
>> - .address_bits = 0,
>> - .extent_order = EXTENT_ORDER,
>> - .domid = DOMID_SELF
>> - };
>>
>> if (nr_pages > ARRAY_SIZE(frame_list))
>> nr_pages = ARRAY_SIZE(frame_list);
>> @@ -486,9 +475,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>> page = balloon_next_page(page);
>> }
>>
>> - set_xen_guest_handle(reservation.extent_start, frame_list);
>> - reservation.nr_extents = nr_pages;
>> - rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
>> + rc = xenmem_reservation_increase(nr_pages, frame_list);
>> if (rc <= 0)
>> return BP_EAGAIN;
>>
>> @@ -496,29 +483,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>> page = balloon_retrieve(false);
>> BUG_ON(page == NULL);
>>
>> -#ifdef CONFIG_XEN_HAVE_PVMMU
>> - /*
>> - * We don't support PV MMU when Linux and Xen is using
>> - * different page granularity.
>> - */
>> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>> -
>> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> - unsigned long pfn = page_to_pfn(page);
>> -
>> - set_phys_to_machine(pfn, frame_list[i]);
>> -
>> - /* Link back into the page tables if not highmem. */
>> - if (!PageHighMem(page)) {
>> - int ret;
>> - ret = HYPERVISOR_update_va_mapping(
>> - (unsigned long)__va(pfn << PAGE_SHIFT),
>> - mfn_pte(frame_list[i], PAGE_KERNEL),
>> - 0);
>> - BUG_ON(ret);
>> - }
>> - }
>> -#endif
>> + xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
>
> Can you make a single call to xenmem_reservation_va_mapping_update(rc,
> ...)? You need to keep track of pages but presumably they can be put
> into an array (or a list). In fact, perhaps we can have
> balloon_retrieve() return a set of pages.
This is actually how it is used later on for dma-buf, but I just didn't
want to alter the original balloon code too much. It can be done, though;
in order of simplicity:
1. Similar to frame_list, e.g. a static array of struct page * of size
ARRAY_SIZE(frame_list): more static memory is used, but no allocations.
2. Allocated at run-time with kcalloc: the allocation can fail.
3. Make balloon_retrieve() return a set of pages: this requires
list/array allocation and handling, the allocation may fail, and the
balloon_retrieve() prototype changes.
Could you please tell me which of the above fits best?
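For reference, option 1 might look roughly like the following inside increase_reservation(). This is a sketch only: page_list is a hypothetical name, and the hypercall and error handling around it are elided.

```c
/* Hypothetical sketch of option 1: a static array sized like
 * frame_list, so the VA-mapping update becomes one batched call. */
static struct page *page_list[ARRAY_SIZE(frame_list)];

	/* inside increase_reservation(), after the hypercall returned rc: */
	for (i = 0; i < rc; i++) {
		page = balloon_retrieve(false);
		BUG_ON(page == NULL);
		page_list[i] = page;
	}

	/* one call instead of rc calls with count == 1 */
	xenmem_reservation_va_mapping_update(rc, page_list, frame_list);

	/* Relinquish the pages back to the allocator. */
	for (i = 0; i < rc; i++)
		free_reserved_page(page_list[i]);
```

The trade-off is the one named above: ARRAY_SIZE(frame_list) extra pointers of static memory in exchange for no run-time allocation and no failure path.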
>
>
>
>>
>> /* Relinquish the page back to the allocator. */
>> free_reserved_page(page);
>> @@ -535,11 +500,6 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>> unsigned long i;
>> struct page *page, *tmp;
>> int ret;
>> - struct xen_memory_reservation reservation = {
>> - .address_bits = 0,
>> - .extent_order = EXTENT_ORDER,
>> - .domid = DOMID_SELF
>> - };
>> LIST_HEAD(pages);
>>
>> if (nr_pages > ARRAY_SIZE(frame_list))
>> @@ -553,7 +513,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>> break;
>> }
>> adjust_managed_page_count(page, -1);
>> - scrub_page(page);
>> + xenmem_reservation_scrub_page(page);
>> list_add(&page->lru, &pages);
>> }
>>
>> @@ -575,25 +535,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>> /* XENMEM_decrease_reservation requires a GFN */
>> frame_list[i++] = xen_page_to_gfn(page);
>>
>> -#ifdef CONFIG_XEN_HAVE_PVMMU
>> - /*
>> - * We don't support PV MMU when Linux and Xen is using
>> - * different page granularity.
>> - */
>> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>> -
>> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> - unsigned long pfn = page_to_pfn(page);
>> + xenmem_reservation_va_mapping_reset(1, &page);
>
> and here too.
see above
>
>>
>> - if (!PageHighMem(page)) {
>> - ret = HYPERVISOR_update_va_mapping(
>> - (unsigned long)__va(pfn << PAGE_SHIFT),
>> - __pte_ma(0), 0);
>> - BUG_ON(ret);
>> - }
>> - __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>> - }
>> -#endif
>> list_del(&page->lru);
>>
>> balloon_append(page);
>> @@ -601,9 +544,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>>
>> flush_tlb_all();
>>
>> - set_xen_guest_handle(reservation.extent_start, frame_list);
>> - reservation.nr_extents = nr_pages;
>> - ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
>> + ret = xenmem_reservation_decrease(nr_pages, frame_list);
>> BUG_ON(ret != nr_pages);
>>
>> balloon_stats.current_pages -= nr_pages;
>> diff --git a/drivers/xen/mem-reservation.c b/drivers/xen/mem-reservation.c
>> new file mode 100644
>> index 000000000000..29882e4324f5
>> --- /dev/null
>> +++ b/drivers/xen/mem-reservation.c
>> @@ -0,0 +1,134 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR MIT
>
> Why is this "OR MIT"? The original file was licensed GPLv2 only.
Will fix - I was not sure about the license to be used here, thanks
>
>> +
>> +/******************************************************************************
>> + * Xen memory reservation utilities.
>> + *
>> + * Copyright (c) 2003, B Dragovic
>> + * Copyright (c) 2003-2004, M Williamson, K Fraser
>> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
>> + * Copyright (c) 2010 Daniel Kiper
>> + * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
>> + */
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/slab.h>
>> +
>> +#include <asm/tlb.h>
>> +#include <asm/xen/hypercall.h>
>> +
>> +#include <xen/interface/memory.h>
>> +#include <xen/page.h>
>> +
>> +/*
>> + * Use one extent per PAGE_SIZE to avoid breaking down the page into
>> + * multiple frames.
>> + */
>> +#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
>> +
>> +void xenmem_reservation_scrub_page(struct page *page)
>> +{
>> +#ifdef CONFIG_XEN_SCRUB_PAGES
>> + clear_highpage(page);
>> +#endif
>> +}
>> +EXPORT_SYMBOL(xenmem_reservation_scrub_page);
>> +
>> +void xenmem_reservation_va_mapping_update(unsigned long count,
>> + struct page **pages,
>> + xen_pfn_t *frames)
>> +{
>> +#ifdef CONFIG_XEN_HAVE_PVMMU
>> + int i;
>> +
>> + for (i = 0; i < count; i++) {
>> + struct page *page;
>> +
>> + page = pages[i];
>> + BUG_ON(page == NULL);
>> +
>> + /*
>> + * We don't support PV MMU when Linux and Xen is using
>> + * different page granularity.
>> + */
>> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>> +
>> + if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> + unsigned long pfn = page_to_pfn(page);
>> +
>> + set_phys_to_machine(pfn, frames[i]);
>> +
>> + /* Link back into the page tables if not highmem. */
>> + if (!PageHighMem(page)) {
>> + int ret;
>> +
>> + ret = HYPERVISOR_update_va_mapping(
>> + (unsigned long)__va(pfn << PAGE_SHIFT),
>> + mfn_pte(frames[i], PAGE_KERNEL),
>> + 0);
>> + BUG_ON(ret);
>> + }
>> + }
>> + }
>> +#endif
>> +}
>> +EXPORT_SYMBOL(xenmem_reservation_va_mapping_update);
>> +
>> +void xenmem_reservation_va_mapping_reset(unsigned long count,
>> + struct page **pages)
>> +{
>> +#ifdef CONFIG_XEN_HAVE_PVMMU
>> + int i;
>> +
>> + for (i = 0; i < count; i++) {
>> + /*
>> + * We don't support PV MMU when Linux and Xen is using
>> + * different page granularity.
>> + */
>> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>> +
>> + if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> + struct page *page = pages[i];
>> + unsigned long pfn = page_to_pfn(page);
>> +
>> + if (!PageHighMem(page)) {
>> + int ret;
>> +
>> + ret = HYPERVISOR_update_va_mapping(
>> + (unsigned long)__va(pfn << PAGE_SHIFT),
>> + __pte_ma(0), 0);
>> + BUG_ON(ret);
>> + }
>> + __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>> + }
>> + }
>> +#endif
>> +}
>> +EXPORT_SYMBOL(xenmem_reservation_va_mapping_reset);
>> +
>> +int xenmem_reservation_increase(int count, xen_pfn_t *frames)
>> +{
>> + struct xen_memory_reservation reservation = {
>> + .address_bits = 0,
>> + .extent_order = EXTENT_ORDER,
>> + .domid = DOMID_SELF
>> + };
>> +
>> + set_xen_guest_handle(reservation.extent_start, frames);
>> + reservation.nr_extents = count;
>> + return HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
>> +}
>> +EXPORT_SYMBOL(xenmem_reservation_increase);
>> +
>> +int xenmem_reservation_decrease(int count, xen_pfn_t *frames)
>> +{
>> + struct xen_memory_reservation reservation = {
>> + .address_bits = 0,
>> + .extent_order = EXTENT_ORDER,
>> + .domid = DOMID_SELF
>> + };
>> +
>> + set_xen_guest_handle(reservation.extent_start, frames);
>> + reservation.nr_extents = count;
>> + return HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
>> +}
>> +EXPORT_SYMBOL(xenmem_reservation_decrease);
>> diff --git a/include/xen/mem_reservation.h b/include/xen/mem_reservation.h
>> new file mode 100644
>> index 000000000000..9306d9b8743c
>> --- /dev/null
>> +++ b/include/xen/mem_reservation.h
>> @@ -0,0 +1,29 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> and here too.
will fix
>
> -boris
>
Thank you,
Oleksandr
>> +
>> +/*
>> + * Xen memory reservation utilities.
>> + *
>> + * Copyright (c) 2003, B Dragovic
>> + * Copyright (c) 2003-2004, M Williamson, K Fraser
>> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
>> + * Copyright (c) 2010 Daniel Kiper
>> + * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
>> + */
>> +
>> +#ifndef _XENMEM_RESERVATION_H
>> +#define _XENMEM_RESERVATION_H
>> +
>> +void xenmem_reservation_scrub_page(struct page *page);
>> +
>> +void xenmem_reservation_va_mapping_update(unsigned long count,
>> + struct page **pages,
>> + xen_pfn_t *frames);
>> +
>> +void xenmem_reservation_va_mapping_reset(unsigned long count,
>> + struct page **pages);
>> +
>> +int xenmem_reservation_increase(int count, xen_pfn_t *frames);
>> +
>> +int xenmem_reservation_decrease(int count, xen_pfn_t *frames);
>> +
>> +#endif
On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
> +
> +void xenmem_reservation_va_mapping_update(unsigned long count,
> + struct page **pages,
> + xen_pfn_t *frames)
> +{
> +#ifdef CONFIG_XEN_HAVE_PVMMU
> + int i;
> +
> + for (i = 0; i < count; i++) {
> + struct page *page;
> +
> + page = pages[i];
> + BUG_ON(page == NULL);
> +
> + /*
> + * We don't support PV MMU when Linux and Xen is using
> + * different page granularity.
> + */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
> + if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> + unsigned long pfn = page_to_pfn(page);
> +
> + set_phys_to_machine(pfn, frames[i]);
> +
> + /* Link back into the page tables if not highmem. */
> + if (!PageHighMem(page)) {
> + int ret;
> +
> + ret = HYPERVISOR_update_va_mapping(
> + (unsigned long)__va(pfn << PAGE_SHIFT),
> + mfn_pte(frames[i], PAGE_KERNEL),
> + 0);
> + BUG_ON(ret);
> + }
> + }
> + }
> +#endif
> +}
> +EXPORT_SYMBOL(xenmem_reservation_va_mapping_update);
> +
> +void xenmem_reservation_va_mapping_reset(unsigned long count,
> + struct page **pages)
> +{
> +#ifdef CONFIG_XEN_HAVE_PVMMU
> + int i;
> +
> + for (i = 0; i < count; i++) {
> + /*
> + * We don't support PV MMU when Linux and Xen is using
> + * different page granularity.
> + */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
> + if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> + struct page *page = pages[i];
> + unsigned long pfn = page_to_pfn(page);
> +
> + if (!PageHighMem(page)) {
> + int ret;
> +
> + ret = HYPERVISOR_update_va_mapping(
> + (unsigned long)__va(pfn << PAGE_SHIFT),
> + __pte_ma(0), 0);
> + BUG_ON(ret);
> + }
> + __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> + }
> + }
> +#endif
> +}
> +EXPORT_SYMBOL(xenmem_reservation_va_mapping_reset);
One other thing I noticed --- both of these can be declared as NOPs in
the header file if !CONFIG_XEN_HAVE_PVMMU.
-boris
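Boris's suggestion above follows the usual kernel stub-in-header pattern: when the config option is absent, the header provides a static inline no-op so call sites need no #ifdef. A minimal userspace sketch of the idiom (function name hypothetical; the real declarations live in the mem-reservation header):

```c
#include <assert.h>

/* Uncomment to emulate CONFIG_XEN_HAVE_PVMMU being enabled. */
/* #define CONFIG_XEN_HAVE_PVMMU 1 */

#ifdef CONFIG_XEN_HAVE_PVMMU
/* Real implementation would update the P2M and kernel page tables. */
int va_mapping_update(unsigned long count)
{
	return (int)count;
}
#else
/* Stub: a static inline in the header, so !PVMMU builds pay no cost
 * and callers stay free of #ifdefs. */
static inline int va_mapping_update(unsigned long count)
{
	(void)count;
	return 0;
}
#endif
```

With the option unset, the stub compiles away entirely at any optimization level.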
On 05/29/2018 09:24 PM, Boris Ostrovsky wrote:
> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>> +
>> +void xenmem_reservation_va_mapping_update(unsigned long count,
>> + struct page **pages,
>> + xen_pfn_t *frames)
>> +{
>> +#ifdef CONFIG_XEN_HAVE_PVMMU
>> + int i;
>> +
>> + for (i = 0; i < count; i++) {
>> + struct page *page;
>> +
>> + page = pages[i];
>> + BUG_ON(page == NULL);
>> +
>> + /*
>> + * We don't support PV MMU when Linux and Xen is using
>> + * different page granularity.
>> + */
>> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>> +
>> + if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> + unsigned long pfn = page_to_pfn(page);
>> +
>> + set_phys_to_machine(pfn, frames[i]);
>> +
>> + /* Link back into the page tables if not highmem. */
>> + if (!PageHighMem(page)) {
>> + int ret;
>> +
>> + ret = HYPERVISOR_update_va_mapping(
>> + (unsigned long)__va(pfn << PAGE_SHIFT),
>> + mfn_pte(frames[i], PAGE_KERNEL),
>> + 0);
>> + BUG_ON(ret);
>> + }
>> + }
>> + }
>> +#endif
>> +}
>> +EXPORT_SYMBOL(xenmem_reservation_va_mapping_update);
>> +
>> +void xenmem_reservation_va_mapping_reset(unsigned long count,
>> + struct page **pages)
>> +{
>> +#ifdef CONFIG_XEN_HAVE_PVMMU
>> + int i;
>> +
>> + for (i = 0; i < count; i++) {
>> + /*
>> + * We don't support PV MMU when Linux and Xen is using
>> + * different page granularity.
>> + */
>> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>> +
>> + if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> + struct page *page = pages[i];
>> + unsigned long pfn = page_to_pfn(page);
>> +
>> + if (!PageHighMem(page)) {
>> + int ret;
>> +
>> + ret = HYPERVISOR_update_va_mapping(
>> + (unsigned long)__va(pfn << PAGE_SHIFT),
>> + __pte_ma(0), 0);
>> + BUG_ON(ret);
>> + }
>> + __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>> + }
>> + }
>> +#endif
>> +}
>> +EXPORT_SYMBOL(xenmem_reservation_va_mapping_reset);
> One other thing I noticed --- both of these can be declared as NOPs in
> the header file if !CONFIG_XEN_HAVE_PVMMU.
Will rework it to be a NOP for !CONFIG_XEN_HAVE_PVMMU
> -boris
On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Extend grant table module API to allow allocating buffers that can
> be used for DMA operations and mapping foreign grant references
> on top of those.
> The resulting buffer is similar to the one allocated by the balloon
> driver in terms that proper memory reservation is made
> ({increase|decrease}_reservation and VA mappings updated if needed).
> This is useful for sharing foreign buffers with HW drivers which
> cannot work with scattered buffers provided by the balloon driver,
> but require DMAable memory instead.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/Kconfig | 13 ++++
> drivers/xen/grant-table.c | 124 ++++++++++++++++++++++++++++++++++++++
> include/xen/grant_table.h | 25 ++++++++
> 3 files changed, 162 insertions(+)
>
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index e5d0c28372ea..3431fe210624 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -161,6 +161,19 @@ config XEN_GRANT_DEV_ALLOC
> to other domains. This can be used to implement frontend drivers
> or as part of an inter-domain shared memory channel.
>
> +config XEN_GRANT_DMA_ALLOC
> + bool "Allow allocating DMA capable buffers with grant reference module"
> + depends on XEN
Should it depend on anything from DMA? CONFIG_HAS_DMA for example?
> + help
> + Extends grant table module API to allow allocating DMA capable
> + buffers and mapping foreign grant references on top of it.
> + The resulting buffer is similar to one allocated by the balloon
> + driver in terms that proper memory reservation is made
> + ({increase|decrease}_reservation and VA mappings updated if needed).
> + This is useful for sharing foreign buffers with HW drivers which
> + cannot work with scattered buffers provided by the balloon driver,
> + but require DMAable memory instead.
> +
> config SWIOTLB_XEN
> def_bool y
> select SWIOTLB
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index d7488226e1f2..06fe6e7f639c 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -45,6 +45,9 @@
> #include <linux/workqueue.h>
> #include <linux/ratelimit.h>
> #include <linux/moduleparam.h>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> +#include <linux/dma-mapping.h>
> +#endif
>
> #include <xen/xen.h>
> #include <xen/interface/xen.h>
> @@ -57,6 +60,7 @@
> #ifdef CONFIG_X86
> #include <asm/xen/cpuid.h>
> #endif
> +#include <xen/mem_reservation.h>
> #include <asm/xen/hypercall.h>
> #include <asm/xen/interface.h>
>
> @@ -811,6 +815,82 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
> }
> EXPORT_SYMBOL(gnttab_alloc_pages);
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> +/**
> + * gnttab_dma_alloc_pages - alloc DMAable pages suitable for grant mapping into
> + * @args: arguments to the function
> + */
> +int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
> +{
> + unsigned long pfn, start_pfn;
> + xen_pfn_t *frames;
> + size_t size;
> + int i, ret;
> +
> + frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
> + if (!frames)
> + return -ENOMEM;
> +
> + size = args->nr_pages << PAGE_SHIFT;
> + if (args->coherent)
> + args->vaddr = dma_alloc_coherent(args->dev, size,
> + &args->dev_bus_addr,
> + GFP_KERNEL | __GFP_NOWARN);
> + else
> + args->vaddr = dma_alloc_wc(args->dev, size,
> + &args->dev_bus_addr,
> + GFP_KERNEL | __GFP_NOWARN);
> + if (!args->vaddr) {
> + pr_err("Failed to allocate DMA buffer of size %zu\n", size);
> + ret = -ENOMEM;
> + goto fail_free_frames;
> + }
> +
> + start_pfn = __phys_to_pfn(args->dev_bus_addr);
> + for (pfn = start_pfn, i = 0; pfn < start_pfn + args->nr_pages;
> + pfn++, i++) {
> + struct page *page = pfn_to_page(pfn);
> +
> + args->pages[i] = page;
> + frames[i] = xen_page_to_gfn(page);
> + xenmem_reservation_scrub_page(page);
> + }
> +
> + xenmem_reservation_va_mapping_reset(args->nr_pages, args->pages);
> +
> + ret = xenmem_reservation_decrease(args->nr_pages, frames);
> + if (ret != args->nr_pages) {
> + pr_err("Failed to decrease reservation for DMA buffer\n");
> + xenmem_reservation_increase(ret, frames);
> + ret = -EFAULT;
> + goto fail_free_dma;
> + }
> +
> + ret = gnttab_pages_set_private(args->nr_pages, args->pages);
> + if (ret < 0)
> + goto fail_clear_private;
> +
> + kfree(frames);
> + return 0;
> +
> +fail_clear_private:
> + gnttab_pages_clear_private(args->nr_pages, args->pages);
> +fail_free_dma:
Do you need to xenmem_reservation_increase()?
> + xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
> + frames);
> + if (args->coherent)
> + dma_free_coherent(args->dev, size,
> + args->vaddr, args->dev_bus_addr);
> + else
> + dma_free_wc(args->dev, size,
> + args->vaddr, args->dev_bus_addr);
> +fail_free_frames:
> + kfree(frames);
> + return ret;
> +}
> +EXPORT_SYMBOL(gnttab_dma_alloc_pages);
> +#endif
> +
> void gnttab_pages_clear_private(int nr_pages, struct page **pages)
> {
> int i;
> @@ -838,6 +918,50 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
> }
> EXPORT_SYMBOL(gnttab_free_pages);
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> +/**
> + * gnttab_dma_free_pages - free DMAable pages
> + * @args: arguments to the function
> + */
> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
> +{
> + xen_pfn_t *frames;
> + size_t size;
> + int i, ret;
> +
> + gnttab_pages_clear_private(args->nr_pages, args->pages);
> +
> + frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
Any way you can do it without allocating memory? One possibility is to
keep the frames array allocated in gnttab_dma_alloc_pages(). (Not sure I
like that either, but it's the only thing I can think of.)
> + if (!frames)
> + return -ENOMEM;
> +
> + for (i = 0; i < args->nr_pages; i++)
> + frames[i] = page_to_xen_pfn(args->pages[i]);
Not xen_page_to_gfn()?
> +
> + ret = xenmem_reservation_increase(args->nr_pages, frames);
> + if (ret != args->nr_pages) {
> + pr_err("Failed to decrease reservation for DMA buffer\n");
> + ret = -EFAULT;
> + } else {
> + ret = 0;
> + }
> +
> + xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
> + frames);
> +
> + size = args->nr_pages << PAGE_SHIFT;
> + if (args->coherent)
> + dma_free_coherent(args->dev, size,
> + args->vaddr, args->dev_bus_addr);
> + else
> + dma_free_wc(args->dev, size,
> + args->vaddr, args->dev_bus_addr);
> + kfree(frames);
> + return ret;
> +}
> +EXPORT_SYMBOL(gnttab_dma_free_pages);
> +#endif
> +
> /* Handling of paged out grant targets (GNTST_eagain) */
> #define MAX_DELAY 256
> static inline void
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index de03f2542bb7..982e34242b9c 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -198,6 +198,31 @@ void gnttab_free_auto_xlat_frames(void);
> int gnttab_alloc_pages(int nr_pages, struct page **pages);
> void gnttab_free_pages(int nr_pages, struct page **pages);
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> +struct gnttab_dma_alloc_args {
> + /* Device for which DMA memory will be/was allocated. */
> + struct device *dev;
> + /*
> + * If set then DMA buffer is coherent and write-combine otherwise.
> + */
Single-line comment
> + bool coherent;
> + /*
> + * Number of entries in the @pages array, defines the size
> + * of the DMA buffer.
> + */
This can be made into a single-line comment as well. However, I am not
sure this comment, as well as those below, is necessary. Field names are
self-describing IMO.
-boris
> + int nr_pages;
> + /* Array of pages @pages filled with pages of the DMA buffer. */
> + struct page **pages;
> + /* Virtual/CPU address of the DMA buffer. */
> + void *vaddr;
> + /* Bus address of the DMA buffer. */
> + dma_addr_t dev_bus_addr;
> +};
> +
> +int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args);
> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args);
> +#endif
> +
> int gnttab_pages_set_private(int nr_pages, struct page **pages);
> void gnttab_pages_clear_private(int nr_pages, struct page **pages);
>
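One way to realize Boris's suggestion of avoiding the kcalloc() in the free path is to let the alloc path own the frames array for the lifetime of the buffer, e.g. via a frames member kept in the args structure. A hypothetical userspace sketch (not part of the posted patch; the struct and fake GFN values are illustrative only):

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long xen_pfn_t;

/* Simplified stand-in for struct gnttab_dma_alloc_args with a
 * hypothetical 'frames' member that survives until free time. */
struct dma_alloc_args {
	int nr_pages;
	xen_pfn_t *frames;	/* allocated once, reused on free */
};

static int dma_alloc_pages(struct dma_alloc_args *args)
{
	args->frames = calloc(args->nr_pages, sizeof(*args->frames));
	if (!args->frames)
		return -1;
	for (int i = 0; i < args->nr_pages; i++)
		args->frames[i] = 0x1000 + i;	/* fake GFNs */
	return 0;
}

static int dma_free_pages(struct dma_alloc_args *args)
{
	/* No allocation here: reuse args->frames filled at alloc time,
	 * so the free/error path cannot fail with -ENOMEM. */
	if (!args->frames)
		return -1;
	free(args->frames);
	args->frames = NULL;
	return 0;
}
```

The trade-off is keeping nr_pages * sizeof(xen_pfn_t) bytes alive for the buffer's whole lifetime instead of only during the two calls.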
On 05/29/2018 02:22 PM, Oleksandr Andrushchenko wrote:
> On 05/29/2018 09:04 PM, Boris Ostrovsky wrote:
>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>> @@ -463,11 +457,6 @@ static enum bp_state
>> increase_reservation(unsigned long nr_pages)
>> int rc;
>> unsigned long i;
>> struct page *page;
>> - struct xen_memory_reservation reservation = {
>> - .address_bits = 0,
>> - .extent_order = EXTENT_ORDER,
>> - .domid = DOMID_SELF
>> - };
>> if (nr_pages > ARRAY_SIZE(frame_list))
>> nr_pages = ARRAY_SIZE(frame_list);
>> @@ -486,9 +475,7 @@ static enum bp_state
>> increase_reservation(unsigned long nr_pages)
>> page = balloon_next_page(page);
>> }
>> - set_xen_guest_handle(reservation.extent_start, frame_list);
>> - reservation.nr_extents = nr_pages;
>> - rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
>> + rc = xenmem_reservation_increase(nr_pages, frame_list);
>> if (rc <= 0)
>> return BP_EAGAIN;
>> @@ -496,29 +483,7 @@ static enum bp_state
>> increase_reservation(unsigned long nr_pages)
>> page = balloon_retrieve(false);
>> BUG_ON(page == NULL);
>> -#ifdef CONFIG_XEN_HAVE_PVMMU
>> - /*
>> - * We don't support PV MMU when Linux and Xen is using
>> - * different page granularity.
>> - */
>> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>> -
>> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> - unsigned long pfn = page_to_pfn(page);
>> -
>> - set_phys_to_machine(pfn, frame_list[i]);
>> -
>> - /* Link back into the page tables if not highmem. */
>> - if (!PageHighMem(page)) {
>> - int ret;
>> - ret = HYPERVISOR_update_va_mapping(
>> - (unsigned long)__va(pfn << PAGE_SHIFT),
>> - mfn_pte(frame_list[i], PAGE_KERNEL),
>> - 0);
>> - BUG_ON(ret);
>> - }
>> - }
>> -#endif
>> + xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
>>
>> Can you make a single call to xenmem_reservation_va_mapping_update(rc,
>> ...)? You need to keep track of pages but presumably they can be put
>> into an array (or a list). In fact, perhaps we can have
>> balloon_retrieve() return a set of pages.
> This is actually how it is used later on for dma-buf, but I just
> didn't want
> to alter original balloon code too much, but this can be done, in
> order of simplicity:
>
> 1. Similar to frame_list, e.g. static array of struct page* of size
> ARRAY_SIZE(frame_list):
> more static memory is used, but no allocations
>
> 2. Allocated at run-time with kcalloc: allocation can fail
If this is called in the DMA buffer freeing code path or in an error
path then we shouldn't do it.
>
> 3. Make balloon_retrieve() return a set of pages: will require
> list/array allocation
> and handling, allocation may fail, balloon_retrieve prototype change
Balloon pages are strung on the lru list. Can we have balloon_retrieve()
return a list of pages from that list?
-boris
>
> Could you please tell which of the above will fit better?
>
>>
>>
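The lru-list idea Boris raises can be sketched with a plain intrusive list: retrieval splices up to n entries off the ballooned-pages list onto a caller-visible list, so no allocation is needed and the prototype change is small. A hypothetical userspace model (the kernel would use struct list_head and list_move() instead):

```c
#include <assert.h>
#include <stddef.h>

struct page_node {
	struct page_node *next;
	int id;
};

/* Move up to n pages from the 'ballooned' list onto 'out';
 * returns how many were actually moved.  Mirrors the idea of
 * balloon_retrieve() handing back a list instead of one page. */
static int balloon_retrieve_list(struct page_node **ballooned,
				 struct page_node **out, int n)
{
	int moved = 0;

	while (moved < n && *ballooned) {
		struct page_node *p = *ballooned;

		*ballooned = p->next;	/* unlink from ballooned list */
		p->next = *out;		/* push onto output list */
		*out = p;
		moved++;
	}
	return moved;
}
```

Since only pointers move, this avoids both the static-array and kcalloc() options discussed above.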
On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>
> struct unmap_notify {
> @@ -96,10 +104,28 @@ struct grant_map {
> struct gnttab_unmap_grant_ref *kunmap_ops;
> struct page **pages;
> unsigned long pages_vm_start;
> +
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + /*
> + * If dmabuf_vaddr is not NULL then this mapping is backed by DMA
> + * capable memory.
> + */
> +
> + /* Device for which DMA memory is allocated. */
> + struct device *dma_dev;
> + /* Flags used to create this DMA buffer: GNTDEV_DMABUF_FLAG_XXX. */
> + bool dma_flags;
Again, I think most of the comments here can be dropped. Except possibly
for the flags.
> + /* Virtual/CPU address of the DMA buffer. */
> + void *dma_vaddr;
> + /* Bus address of the DMA buffer. */
> + dma_addr_t dma_bus_addr;
> +#endif
> };
>
> static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
>
> +static struct miscdevice gntdev_miscdev;
> +
> /* ------------------------------------------------------------------ */
>
> static void gntdev_print_maps(struct gntdev_priv *priv,
> @@ -121,8 +147,26 @@ static void gntdev_free_map(struct grant_map *map)
> if (map == NULL)
> return;
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + if (map->dma_vaddr) {
> + struct gnttab_dma_alloc_args args;
> +
> + args.dev = map->dma_dev;
> + args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
> + args.nr_pages = map->count;
> + args.pages = map->pages;
> + args.vaddr = map->dma_vaddr;
> + args.dev_bus_addr = map->dma_bus_addr;
> +
> + gnttab_dma_free_pages(&args);
> + } else if (map->pages) {
> + gnttab_free_pages(map->count, map->pages);
> + }
> +#else
> if (map->pages)
> gnttab_free_pages(map->count, map->pages);
> +#endif
> +
	} else
#endif
	if (map->pages)
		gnttab_free_pages(map->count, map->pages);

(and elsewhere)
> kfree(map->pages);
> kfree(map->grants);
> kfree(map->map_ops);
>
> diff --git a/include/uapi/xen/gntdev.h b/include/uapi/xen/gntdev.h
> index 6d1163456c03..2d5a4672f07c 100644
> --- a/include/uapi/xen/gntdev.h
> +++ b/include/uapi/xen/gntdev.h
> @@ -200,4 +200,19 @@ struct ioctl_gntdev_grant_copy {
> /* Send an interrupt on the indicated event channel */
> #define UNMAP_NOTIFY_SEND_EVENT 0x2
>
> +/*
> + * Flags to be used while requesting memory mapping's backing storage
> + * to be allocated with DMA API.
> + */
> +
> +/*
> + * The buffer is backed with memory allocated with dma_alloc_wc.
> + */
> +#define GNTDEV_DMA_FLAG_WC (1 << 1)
Is there a reason you are not using bit 0?
-boris
> +
> +/*
> + * The buffer is backed with memory allocated with dma_alloc_coherent.
> + */
> +#define GNTDEV_DMA_FLAG_COHERENT (1 << 2)
> +
> #endif /* __LINUX_PUBLIC_GNTDEV_H__ */
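For reference, the flags as posted occupy bits 1 and 2, leaving bit 0 unused, which is what Boris is asking about. A quick sanity check using the values from the quoted header (the helper function is illustrative only):

```c
#include <assert.h>

/* Values as posted in include/uapi/xen/gntdev.h. */
#define GNTDEV_DMA_FLAG_WC		(1 << 1)
#define GNTDEV_DMA_FLAG_COHERENT	(1 << 2)

/* Returns nonzero if any defined DMA flag occupies bit 0. */
static int uses_bit0(void)
{
	return (GNTDEV_DMA_FLAG_WC | GNTDEV_DMA_FLAG_COHERENT) & 1;
}
```

Starting at bit 1 wastes a bit of the UAPI flag space unless bit 0 is being reserved deliberately, which the patch does not state.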
On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>
> +/*
> + * Create a dma-buf [1] from grant references @refs of count @count provided
> + * by the foreign domain @domid with flags @flags.
> + *
> + * By default dma-buf is backed by system memory pages, but by providing
> + * one of the GNTDEV_DMA_FLAG_XXX flags it can also be created as
> + * a DMA write-combine or coherent buffer, e.g. allocated with dma_alloc_wc/
> + * dma_alloc_coherent.
> + *
> + * Returns 0 if dma-buf was successfully created and the corresponding
> + * dma-buf's file descriptor is returned in @fd.
> + *
> + * [1] https://elixir.bootlin.com/linux/latest/source/Documentation/driver-api/dma-buf.rst
Documentation/driver-api/dma-buf.rst.
-boris
On 25/05/18 17:33, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Make set/clear page private code shared and accessible to
> other kernel modules which can re-use these instead of open-coding.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/grant-table.c | 54 +++++++++++++++++++++++++--------------
> include/xen/grant_table.h | 3 +++
> 2 files changed, 38 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 27be107d6480..d7488226e1f2 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -769,29 +769,18 @@ void gnttab_free_auto_xlat_frames(void)
> }
> EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
>
> -/**
> - * gnttab_alloc_pages - alloc pages suitable for grant mapping into
> - * @nr_pages: number of pages to alloc
> - * @pages: returns the pages
> - */
> -int gnttab_alloc_pages(int nr_pages, struct page **pages)
> +int gnttab_pages_set_private(int nr_pages, struct page **pages)
> {
> int i;
> - int ret;
> -
> - ret = alloc_xenballooned_pages(nr_pages, pages);
> - if (ret < 0)
> - return ret;
>
> for (i = 0; i < nr_pages; i++) {
> #if BITS_PER_LONG < 64
> struct xen_page_foreign *foreign;
>
> foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
> - if (!foreign) {
> - gnttab_free_pages(nr_pages, pages);
> + if (!foreign)
> return -ENOMEM;
> - }
> +
> set_page_private(pages[i], (unsigned long)foreign);
> #endif
> SetPagePrivate(pages[i]);
> @@ -799,14 +788,30 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
>
> return 0;
> }
> -EXPORT_SYMBOL(gnttab_alloc_pages);
> +EXPORT_SYMBOL(gnttab_pages_set_private);
EXPORT_SYMBOL_GPL()
>
> /**
> - * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
> - * @nr_pages; number of pages to free
> - * @pages: the pages
> + * gnttab_alloc_pages - alloc pages suitable for grant mapping into
> + * @nr_pages: number of pages to alloc
> + * @pages: returns the pages
> */
> -void gnttab_free_pages(int nr_pages, struct page **pages)
> +int gnttab_alloc_pages(int nr_pages, struct page **pages)
> +{
> + int ret;
> +
> + ret = alloc_xenballooned_pages(nr_pages, pages);
> + if (ret < 0)
> + return ret;
> +
> + ret = gnttab_pages_set_private(nr_pages, pages);
> + if (ret < 0)
> + gnttab_free_pages(nr_pages, pages);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(gnttab_alloc_pages);
> +
> +void gnttab_pages_clear_private(int nr_pages, struct page **pages)
> {
> int i;
>
> @@ -818,6 +823,17 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
> ClearPagePrivate(pages[i]);
> }
> }
> +}
> +EXPORT_SYMBOL(gnttab_pages_clear_private);
EXPORT_SYMBOL_GPL()
Juergen
On 25/05/18 17:33, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Memory {increase|decrease}_reservation and VA mappings update/reset
> code used in balloon driver can be made common, so other drivers can
> also re-use the same functionality without open-coding.
> Create a dedicated module for the shared code and export corresponding
> symbols for other kernel modules.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/Makefile | 1 +
> drivers/xen/balloon.c | 71 ++----------------
> drivers/xen/mem-reservation.c | 134 ++++++++++++++++++++++++++++++++++
> include/xen/mem_reservation.h | 29 ++++++++
> 4 files changed, 170 insertions(+), 65 deletions(-)
> create mode 100644 drivers/xen/mem-reservation.c
> create mode 100644 include/xen/mem_reservation.h
Can you please name this include/xen/mem-reservation.h?
>
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 451e833f5931..3c87b0c3aca6 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -2,6 +2,7 @@
> obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
> obj-$(CONFIG_X86) += fallback.o
> obj-y += grant-table.o features.o balloon.o manage.o preempt.o time.o
> +obj-y += mem-reservation.o
> obj-y += events/
> obj-y += xenbus/
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 065f0b607373..57b482d67a3a 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -71,6 +71,7 @@
> #include <xen/balloon.h>
> #include <xen/features.h>
> #include <xen/page.h>
> +#include <xen/mem_reservation.h>
>
> static int xen_hotplug_unpopulated;
>
> @@ -157,13 +158,6 @@ static DECLARE_DELAYED_WORK(balloon_worker, balloon_process);
> #define GFP_BALLOON \
> (GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
>
> -static void scrub_page(struct page *page)
> -{
> -#ifdef CONFIG_XEN_SCRUB_PAGES
> - clear_highpage(page);
> -#endif
> -}
> -
> /* balloon_append: add the given page to the balloon. */
> static void __balloon_append(struct page *page)
> {
> @@ -463,11 +457,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> int rc;
> unsigned long i;
> struct page *page;
> - struct xen_memory_reservation reservation = {
> - .address_bits = 0,
> - .extent_order = EXTENT_ORDER,
> - .domid = DOMID_SELF
> - };
>
> if (nr_pages > ARRAY_SIZE(frame_list))
> nr_pages = ARRAY_SIZE(frame_list);
> @@ -486,9 +475,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> page = balloon_next_page(page);
> }
>
> - set_xen_guest_handle(reservation.extent_start, frame_list);
> - reservation.nr_extents = nr_pages;
> - rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> + rc = xenmem_reservation_increase(nr_pages, frame_list);
> if (rc <= 0)
> return BP_EAGAIN;
>
> @@ -496,29 +483,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> page = balloon_retrieve(false);
> BUG_ON(page == NULL);
>
> -#ifdef CONFIG_XEN_HAVE_PVMMU
> - /*
> - * We don't support PV MMU when Linux and Xen is using
> - * different page granularity.
> - */
> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> -
> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> - unsigned long pfn = page_to_pfn(page);
> -
> - set_phys_to_machine(pfn, frame_list[i]);
> -
> - /* Link back into the page tables if not highmem. */
> - if (!PageHighMem(page)) {
> - int ret;
> - ret = HYPERVISOR_update_va_mapping(
> - (unsigned long)__va(pfn << PAGE_SHIFT),
> - mfn_pte(frame_list[i], PAGE_KERNEL),
> - 0);
> - BUG_ON(ret);
> - }
> - }
> -#endif
> + xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
>
> /* Relinquish the page back to the allocator. */
> free_reserved_page(page);
> @@ -535,11 +500,6 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
> unsigned long i;
> struct page *page, *tmp;
> int ret;
> - struct xen_memory_reservation reservation = {
> - .address_bits = 0,
> - .extent_order = EXTENT_ORDER,
> - .domid = DOMID_SELF
> - };
> LIST_HEAD(pages);
>
> if (nr_pages > ARRAY_SIZE(frame_list))
> @@ -553,7 +513,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
> break;
> }
> adjust_managed_page_count(page, -1);
> - scrub_page(page);
> + xenmem_reservation_scrub_page(page);
> list_add(&page->lru, &pages);
> }
>
> @@ -575,25 +535,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
> /* XENMEM_decrease_reservation requires a GFN */
> frame_list[i++] = xen_page_to_gfn(page);
>
> -#ifdef CONFIG_XEN_HAVE_PVMMU
> - /*
> - * We don't support PV MMU when Linux and Xen is using
> - * different page granularity.
> - */
> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> -
> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> - unsigned long pfn = page_to_pfn(page);
> + xenmem_reservation_va_mapping_reset(1, &page);
>
> - if (!PageHighMem(page)) {
> - ret = HYPERVISOR_update_va_mapping(
> - (unsigned long)__va(pfn << PAGE_SHIFT),
> - __pte_ma(0), 0);
> - BUG_ON(ret);
> - }
> - __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> - }
> -#endif
> list_del(&page->lru);
>
> balloon_append(page);
> @@ -601,9 +544,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>
> flush_tlb_all();
>
> - set_xen_guest_handle(reservation.extent_start, frame_list);
> - reservation.nr_extents = nr_pages;
> - ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
> + ret = xenmem_reservation_decrease(nr_pages, frame_list);
> BUG_ON(ret != nr_pages);
>
> balloon_stats.current_pages -= nr_pages;
> diff --git a/drivers/xen/mem-reservation.c b/drivers/xen/mem-reservation.c
> new file mode 100644
> index 000000000000..29882e4324f5
> --- /dev/null
> +++ b/drivers/xen/mem-reservation.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0 OR MIT
> +
> +/******************************************************************************
> + * Xen memory reservation utilities.
> + *
> + * Copyright (c) 2003, B Dragovic
> + * Copyright (c) 2003-2004, M Williamson, K Fraser
> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
> + * Copyright (c) 2010 Daniel Kiper
> + * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +
> +#include <asm/tlb.h>
> +#include <asm/xen/hypercall.h>
> +
> +#include <xen/interface/memory.h>
> +#include <xen/page.h>
> +
> +/*
> + * Use one extent per PAGE_SIZE to avoid to break down the page into
> + * multiple frame.
> + */
> +#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
> +
> +void xenmem_reservation_scrub_page(struct page *page)
> +{
> +#ifdef CONFIG_XEN_SCRUB_PAGES
> + clear_highpage(page);
> +#endif
> +}
> +EXPORT_SYMBOL(xenmem_reservation_scrub_page);
EXPORT_SYMBOL_GPL()
Multiple times below, too.
As a general rule of thumb: new exports should all be
EXPORT_SYMBOL_GPL() if you can't give a reason why they shouldn't be.
Juergen
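The EXTENT_ORDER definition in the quoted mem-reservation.c derives the extent order from the number of Xen page frames per Linux page, so one extent always covers a whole PAGE_SIZE page. A userspace sketch of the same arithmetic, assuming a local fls() stand-in (on 4 KiB Xen / 4 KiB Linux pages XEN_PFN_PER_PAGE is 1, giving order 0):

```c
#include <assert.h>

/* Minimal fls(): 1-based index of the highest set bit, 0 for 0.
 * Stands in for the kernel helper used by the EXTENT_ORDER macro. */
static int fls_sketch(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* EXTENT_ORDER = fls(XEN_PFN_PER_PAGE) - 1: use one extent per
 * PAGE_SIZE so the page is never broken into multiple frames. */
static int extent_order(unsigned int xen_pfn_per_page)
{
	return fls_sketch(xen_pfn_per_page) - 1;
}
```

On a 64 KiB-page arm64 kernel with 4 KiB Xen pages, XEN_PFN_PER_PAGE is 16 and the order comes out as 4.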
On 05/30/2018 07:24 AM, Juergen Gross wrote:
> On 25/05/18 17:33, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> Make set/clear page private code shared and accessible to
>> other kernel modules which can re-use these instead of open-coding.
>>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/grant-table.c | 54 +++++++++++++++++++++++++--------------
>> include/xen/grant_table.h | 3 +++
>> 2 files changed, 38 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
>> index 27be107d6480..d7488226e1f2 100644
>> --- a/drivers/xen/grant-table.c
>> +++ b/drivers/xen/grant-table.c
>> @@ -769,29 +769,18 @@ void gnttab_free_auto_xlat_frames(void)
>> }
>> EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
>>
>> -/**
>> - * gnttab_alloc_pages - alloc pages suitable for grant mapping into
>> - * @nr_pages: number of pages to alloc
>> - * @pages: returns the pages
>> - */
>> -int gnttab_alloc_pages(int nr_pages, struct page **pages)
>> +int gnttab_pages_set_private(int nr_pages, struct page **pages)
>> {
>> int i;
>> - int ret;
>> -
>> - ret = alloc_xenballooned_pages(nr_pages, pages);
>> - if (ret < 0)
>> - return ret;
>>
>> for (i = 0; i < nr_pages; i++) {
>> #if BITS_PER_LONG < 64
>> struct xen_page_foreign *foreign;
>>
>> foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
>> - if (!foreign) {
>> - gnttab_free_pages(nr_pages, pages);
>> + if (!foreign)
>> return -ENOMEM;
>> - }
>> +
>> set_page_private(pages[i], (unsigned long)foreign);
>> #endif
>> SetPagePrivate(pages[i]);
>> @@ -799,14 +788,30 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
>>
>> return 0;
>> }
>> -EXPORT_SYMBOL(gnttab_alloc_pages);
>> +EXPORT_SYMBOL(gnttab_pages_set_private);
> EXPORT_SYMBOL_GPL()
Sure, I was confused by the fact that there are only 2 functions in the file
which are exported as:
- EXPORT_SYMBOL(gnttab_alloc_pages);
- EXPORT_SYMBOL(gnttab_free_pages);
and those were the base for the new
gnttab_pages_set_private/gnttab_pages_clear_private
This made me think I had to retain the same EXPORT_SYMBOL for them.
Do you want me to add one more patch into this series and change
gnttab_alloc_pages/gnttab_free_pages to GPL as well?
>>
>> /**
>> - * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
>> - * @nr_pages; number of pages to free
>> - * @pages: the pages
>> + * gnttab_alloc_pages - alloc pages suitable for grant mapping into
>> + * @nr_pages: number of pages to alloc
>> + * @pages: returns the pages
>> */
>> -void gnttab_free_pages(int nr_pages, struct page **pages)
>> +int gnttab_alloc_pages(int nr_pages, struct page **pages)
>> +{
>> + int ret;
>> +
>> + ret = alloc_xenballooned_pages(nr_pages, pages);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = gnttab_pages_set_private(nr_pages, pages);
>> + if (ret < 0)
>> + gnttab_free_pages(nr_pages, pages);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(gnttab_alloc_pages);
>> +
>> +void gnttab_pages_clear_private(int nr_pages, struct page **pages)
>> {
>> int i;
>>
>> @@ -818,6 +823,17 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
>> ClearPagePrivate(pages[i]);
>> }
>> }
>> +}
>> +EXPORT_SYMBOL(gnttab_pages_clear_private);
> EXPORT_SYMBOL_GPL()
Will change
>
> Juergen
Thank you,
Oleksandr
On 05/30/2018 07:32 AM, Juergen Gross wrote:
> On 25/05/18 17:33, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> Memory {increase|decrease}_reservation and VA mappings update/reset
>> code used in balloon driver can be made common, so other drivers can
>> also re-use the same functionality without open-coding.
>> Create a dedicated module for the shared code and export corresponding
>> symbols for other kernel modules.
>>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/Makefile | 1 +
>> drivers/xen/balloon.c | 71 ++----------------
>> drivers/xen/mem-reservation.c | 134 ++++++++++++++++++++++++++++++++++
>> include/xen/mem_reservation.h | 29 ++++++++
>> 4 files changed, 170 insertions(+), 65 deletions(-)
>> create mode 100644 drivers/xen/mem-reservation.c
>> create mode 100644 include/xen/mem_reservation.h
> Can you please name this include/xen/mem-reservation.h ?
>
Will rename
>> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
>> index 451e833f5931..3c87b0c3aca6 100644
>> --- a/drivers/xen/Makefile
>> +++ b/drivers/xen/Makefile
>> @@ -2,6 +2,7 @@
>> obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
>> obj-$(CONFIG_X86) += fallback.o
>> obj-y += grant-table.o features.o balloon.o manage.o preempt.o time.o
>> +obj-y += mem-reservation.o
>> obj-y += events/
>> obj-y += xenbus/
>>
>> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
>> index 065f0b607373..57b482d67a3a 100644
>> --- a/drivers/xen/balloon.c
>> +++ b/drivers/xen/balloon.c
>> @@ -71,6 +71,7 @@
>> #include <xen/balloon.h>
>> #include <xen/features.h>
>> #include <xen/page.h>
>> +#include <xen/mem_reservation.h>
>>
>> static int xen_hotplug_unpopulated;
>>
>> @@ -157,13 +158,6 @@ static DECLARE_DELAYED_WORK(balloon_worker, balloon_process);
>> #define GFP_BALLOON \
>> (GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
>>
>> -static void scrub_page(struct page *page)
>> -{
>> -#ifdef CONFIG_XEN_SCRUB_PAGES
>> - clear_highpage(page);
>> -#endif
>> -}
>> -
>> /* balloon_append: add the given page to the balloon. */
>> static void __balloon_append(struct page *page)
>> {
>> @@ -463,11 +457,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>> int rc;
>> unsigned long i;
>> struct page *page;
>> - struct xen_memory_reservation reservation = {
>> - .address_bits = 0,
>> - .extent_order = EXTENT_ORDER,
>> - .domid = DOMID_SELF
>> - };
>>
>> if (nr_pages > ARRAY_SIZE(frame_list))
>> nr_pages = ARRAY_SIZE(frame_list);
>> @@ -486,9 +475,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>> page = balloon_next_page(page);
>> }
>>
>> - set_xen_guest_handle(reservation.extent_start, frame_list);
>> - reservation.nr_extents = nr_pages;
>> - rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
>> + rc = xenmem_reservation_increase(nr_pages, frame_list);
>> if (rc <= 0)
>> return BP_EAGAIN;
>>
>> @@ -496,29 +483,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>> page = balloon_retrieve(false);
>> BUG_ON(page == NULL);
>>
>> -#ifdef CONFIG_XEN_HAVE_PVMMU
>> - /*
>> - * We don't support PV MMU when Linux and Xen is using
>> - * different page granularity.
>> - */
>> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>> -
>> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> - unsigned long pfn = page_to_pfn(page);
>> -
>> - set_phys_to_machine(pfn, frame_list[i]);
>> -
>> - /* Link back into the page tables if not highmem. */
>> - if (!PageHighMem(page)) {
>> - int ret;
>> - ret = HYPERVISOR_update_va_mapping(
>> - (unsigned long)__va(pfn << PAGE_SHIFT),
>> - mfn_pte(frame_list[i], PAGE_KERNEL),
>> - 0);
>> - BUG_ON(ret);
>> - }
>> - }
>> -#endif
>> + xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
>>
>> /* Relinquish the page back to the allocator. */
>> free_reserved_page(page);
>> @@ -535,11 +500,6 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>> unsigned long i;
>> struct page *page, *tmp;
>> int ret;
>> - struct xen_memory_reservation reservation = {
>> - .address_bits = 0,
>> - .extent_order = EXTENT_ORDER,
>> - .domid = DOMID_SELF
>> - };
>> LIST_HEAD(pages);
>>
>> if (nr_pages > ARRAY_SIZE(frame_list))
>> @@ -553,7 +513,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>> break;
>> }
>> adjust_managed_page_count(page, -1);
>> - scrub_page(page);
>> + xenmem_reservation_scrub_page(page);
>> list_add(&page->lru, &pages);
>> }
>>
>> @@ -575,25 +535,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>> /* XENMEM_decrease_reservation requires a GFN */
>> frame_list[i++] = xen_page_to_gfn(page);
>>
>> -#ifdef CONFIG_XEN_HAVE_PVMMU
>> - /*
>> - * We don't support PV MMU when Linux and Xen is using
>> - * different page granularity.
>> - */
>> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>> -
>> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> - unsigned long pfn = page_to_pfn(page);
>> + xenmem_reservation_va_mapping_reset(1, &page);
>>
>> - if (!PageHighMem(page)) {
>> - ret = HYPERVISOR_update_va_mapping(
>> - (unsigned long)__va(pfn << PAGE_SHIFT),
>> - __pte_ma(0), 0);
>> - BUG_ON(ret);
>> - }
>> - __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>> - }
>> -#endif
>> list_del(&page->lru);
>>
>> balloon_append(page);
>> @@ -601,9 +544,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>>
>> flush_tlb_all();
>>
>> - set_xen_guest_handle(reservation.extent_start, frame_list);
>> - reservation.nr_extents = nr_pages;
>> - ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
>> + ret = xenmem_reservation_decrease(nr_pages, frame_list);
>> BUG_ON(ret != nr_pages);
>>
>> balloon_stats.current_pages -= nr_pages;
>> diff --git a/drivers/xen/mem-reservation.c b/drivers/xen/mem-reservation.c
>> new file mode 100644
>> index 000000000000..29882e4324f5
>> --- /dev/null
>> +++ b/drivers/xen/mem-reservation.c
>> @@ -0,0 +1,134 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR MIT
>> +
>> +/******************************************************************************
>> + * Xen memory reservation utilities.
>> + *
>> + * Copyright (c) 2003, B Dragovic
>> + * Copyright (c) 2003-2004, M Williamson, K Fraser
>> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
>> + * Copyright (c) 2010 Daniel Kiper
>> + * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
>> + */
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/slab.h>
>> +
>> +#include <asm/tlb.h>
>> +#include <asm/xen/hypercall.h>
>> +
>> +#include <xen/interface/memory.h>
>> +#include <xen/page.h>
>> +
>> +/*
>> + * Use one extent per PAGE_SIZE to avoid to break down the page into
>> + * multiple frame.
>> + */
>> +#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
>> +
>> +void xenmem_reservation_scrub_page(struct page *page)
>> +{
>> +#ifdef CONFIG_XEN_SCRUB_PAGES
>> + clear_highpage(page);
>> +#endif
>> +}
>> +EXPORT_SYMBOL(xenmem_reservation_scrub_page);
> EXPORT_SYMBOL_GPL()
>
> Multiple times below, too.
Ok, will change to _GPL
> As a general rule of thumb: new exports should all be
> EXPORT_SYMBOL_GPL() if you can't give a reason why they shouldn't be.
>
>
> Juergen
Thank you,
Oleksandr
On 05/29/2018 10:10 PM, Boris Ostrovsky wrote:
> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> Extend grant table module API to allow allocating buffers that can
>> be used for DMA operations and mapping foreign grant references
>> on top of those.
>> The resulting buffer is similar to the one allocated by the balloon
>> driver in terms that proper memory reservation is made
>> ({increase|decrease}_reservation and VA mappings updated if needed).
>> This is useful for sharing foreign buffers with HW drivers which
>> cannot work with scattered buffers provided by the balloon driver,
>> but require DMAable memory instead.
>>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/Kconfig | 13 ++++
>> drivers/xen/grant-table.c | 124 ++++++++++++++++++++++++++++++++++++++
>> include/xen/grant_table.h | 25 ++++++++
>> 3 files changed, 162 insertions(+)
>>
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index e5d0c28372ea..3431fe210624 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -161,6 +161,19 @@ config XEN_GRANT_DEV_ALLOC
>> to other domains. This can be used to implement frontend drivers
>> or as part of an inter-domain shared memory channel.
>>
>> +config XEN_GRANT_DMA_ALLOC
>> + bool "Allow allocating DMA capable buffers with grant reference module"
>> + depends on XEN
>
> Should it depend on anything from DMA? CONFIG_HAS_DMA for example?
Yes, it must be "depends on XEN && HAS_DMA",
thank you
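With that change, the Kconfig stanza would read (a sketch; only the depends line changes from the posted patch):

```
config XEN_GRANT_DMA_ALLOC
	bool "Allow allocating DMA capable buffers with grant reference module"
	depends on XEN && HAS_DMA
```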
>
>> + help
>> + Extends grant table module API to allow allocating DMA capable
>> + buffers and mapping foreign grant references on top of it.
>> + The resulting buffer is similar to one allocated by the balloon
>> + driver in terms that proper memory reservation is made
>> + ({increase|decrease}_reservation and VA mappings updated if needed).
>> + This is useful for sharing foreign buffers with HW drivers which
>> + cannot work with scattered buffers provided by the balloon driver,
>> + but require DMAable memory instead.
>> +
>> config SWIOTLB_XEN
>> def_bool y
>> select SWIOTLB
>> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
>> index d7488226e1f2..06fe6e7f639c 100644
>> --- a/drivers/xen/grant-table.c
>> +++ b/drivers/xen/grant-table.c
>> @@ -45,6 +45,9 @@
>> #include <linux/workqueue.h>
>> #include <linux/ratelimit.h>
>> #include <linux/moduleparam.h>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> +#include <linux/dma-mapping.h>
>> +#endif
>>
>> #include <xen/xen.h>
>> #include <xen/interface/xen.h>
>> @@ -57,6 +60,7 @@
>> #ifdef CONFIG_X86
>> #include <asm/xen/cpuid.h>
>> #endif
>> +#include <xen/mem_reservation.h>
>> #include <asm/xen/hypercall.h>
>> #include <asm/xen/interface.h>
>>
>> @@ -811,6 +815,82 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
>> }
>> EXPORT_SYMBOL(gnttab_alloc_pages);
>>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> +/**
>> + * gnttab_dma_alloc_pages - alloc DMAable pages suitable for grant mapping into
>> + * @args: arguments to the function
>> + */
>> +int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
>> +{
>> + unsigned long pfn, start_pfn;
>> + xen_pfn_t *frames;
>> + size_t size;
>> + int i, ret;
>> +
>> + frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
>> + if (!frames)
>> + return -ENOMEM;
>> +
>> + size = args->nr_pages << PAGE_SHIFT;
>> + if (args->coherent)
>> + args->vaddr = dma_alloc_coherent(args->dev, size,
>> + &args->dev_bus_addr,
>> + GFP_KERNEL | __GFP_NOWARN);
>> + else
>> + args->vaddr = dma_alloc_wc(args->dev, size,
>> + &args->dev_bus_addr,
>> + GFP_KERNEL | __GFP_NOWARN);
>> + if (!args->vaddr) {
>> + pr_err("Failed to allocate DMA buffer of size %zu\n", size);
>> + ret = -ENOMEM;
>> + goto fail_free_frames;
>> + }
>> +
>> + start_pfn = __phys_to_pfn(args->dev_bus_addr);
>> + for (pfn = start_pfn, i = 0; pfn < start_pfn + args->nr_pages;
>> + pfn++, i++) {
>> + struct page *page = pfn_to_page(pfn);
>> +
>> + args->pages[i] = page;
>> + frames[i] = xen_page_to_gfn(page);
>> + xenmem_reservation_scrub_page(page);
>> + }
>> +
>> + xenmem_reservation_va_mapping_reset(args->nr_pages, args->pages);
>> +
>> + ret = xenmem_reservation_decrease(args->nr_pages, frames);
>> + if (ret != args->nr_pages) {
>> + pr_err("Failed to decrease reservation for DMA buffer\n");
>> + xenmem_reservation_increase(ret, frames);
>> + ret = -EFAULT;
>> + goto fail_free_dma;
>> + }
>> +
>> + ret = gnttab_pages_set_private(args->nr_pages, args->pages);
>> + if (ret < 0)
>> + goto fail_clear_private;
>> +
>> + kfree(frames);
>> + return 0;
>> +
>> +fail_clear_private:
>> + gnttab_pages_clear_private(args->nr_pages, args->pages);
>> +fail_free_dma:
>
> Do you need to xenmem_reservation_increase()?
Yes, missed that on fail_clear_private error path, will fix
>
>> + xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
>> + frames);
>> + if (args->coherent)
>> + dma_free_coherent(args->dev, size,
>> + args->vaddr, args->dev_bus_addr);
>> + else
>> + dma_free_wc(args->dev, size,
>> + args->vaddr, args->dev_bus_addr);
>> +fail_free_frames:
>> + kfree(frames);
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(gnttab_dma_alloc_pages);
>> +#endif
>> +
>> void gnttab_pages_clear_private(int nr_pages, struct page **pages)
>> {
>> int i;
>> @@ -838,6 +918,50 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
>> }
>> EXPORT_SYMBOL(gnttab_free_pages);
>>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> +/**
>> + * gnttab_dma_free_pages - free DMAable pages
>> + * @args: arguments to the function
>> + */
>> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>> +{
>> + xen_pfn_t *frames;
>> + size_t size;
>> + int i, ret;
>> +
>> + gnttab_pages_clear_private(args->nr_pages, args->pages);
>> +
>> + frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
>
> Any way you can do it without allocating memory? One possibility is to
> keep allocated frames from gnttab_dma_alloc_pages(). (Not sure I like
> that either but it's the only thing I can think of).
Yes, I also considered keeping the frames array allocated by
gnttab_dma_alloc_pages(), but that seemed less clear: the caller of
gnttab_dma_alloc_pages() would have to store those frames in some
context only so they can be passed back on free. Since the caller
doesn't actually need the frames, that might be confusing, so I
decided to allocate them on the fly instead.
But I can still rework that to store the frames if you insist: please
let me know.
>
>
>> + if (!frames)
>> + return -ENOMEM;
>> +
>> + for (i = 0; i < args->nr_pages; i++)
>> + frames[i] = page_to_xen_pfn(args->pages[i]);
>
> Not xen_page_to_gfn()?
Well, according to [1] it should be:
/* XENMEM_populate_physmap requires a PFN based on Xen
* granularity.
*/
frame_list[i] = page_to_xen_pfn(page);
>
>> +
>> + ret = xenmem_reservation_increase(args->nr_pages, frames);
>> + if (ret != args->nr_pages) {
>> + pr_err("Failed to decrease reservation for DMA buffer\n");
>> + ret = -EFAULT;
>> + } else {
>> + ret = 0;
>> + }
>> +
>> + xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
>> + frames);
>> +
>> + size = args->nr_pages << PAGE_SHIFT;
>> + if (args->coherent)
>> + dma_free_coherent(args->dev, size,
>> + args->vaddr, args->dev_bus_addr);
>> + else
>> + dma_free_wc(args->dev, size,
>> + args->vaddr, args->dev_bus_addr);
>> + kfree(frames);
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(gnttab_dma_free_pages);
>> +#endif
>> +
>> /* Handling of paged out grant targets (GNTST_eagain) */
>> #define MAX_DELAY 256
>> static inline void
>> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
>> index de03f2542bb7..982e34242b9c 100644
>> --- a/include/xen/grant_table.h
>> +++ b/include/xen/grant_table.h
>> @@ -198,6 +198,31 @@ void gnttab_free_auto_xlat_frames(void);
>> int gnttab_alloc_pages(int nr_pages, struct page **pages);
>> void gnttab_free_pages(int nr_pages, struct page **pages);
>>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> +struct gnttab_dma_alloc_args {
>> + /* Device for which DMA memory will be/was allocated. */
>> + struct device *dev;
>> + /*
>> + * If set then DMA buffer is coherent and write-combine otherwise.
>> + */
> Single-line comment
will fix
>
>> + bool coherent;
>> + /*
>> + * Number of entries in the @pages array, defines the size
>> + * of the DMA buffer.
>> + */
> This can be made into single line comment as well. However, I am not
> sure this comment as well as those below are necessary. Fields names are
> self-describing IMO.
will remove
>
> -boris
Thank you,
Oleksandr
>> + int nr_pages;
>> + /* Array of pages @pages filled with pages of the DMA buffer. */
>> + struct page **pages;
>> + /* Virtual/CPU address of the DMA buffer. */
>> + void *vaddr;
>> + /* Bus address of the DMA buffer. */
>> + dma_addr_t dev_bus_addr;
>> +};
>> +
>> +int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args);
>> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args);
>> +#endif
>> +
>> int gnttab_pages_set_private(int nr_pages, struct page **pages);
>> void gnttab_pages_clear_private(int nr_pages, struct page **pages);
>>
[1]
https://elixir.bootlin.com/linux/v4.17-rc7/source/drivers/xen/balloon.c#L485
On 05/30/2018 12:52 AM, Boris Ostrovsky wrote:
> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>
>> struct unmap_notify {
>> @@ -96,10 +104,28 @@ struct grant_map {
>> struct gnttab_unmap_grant_ref *kunmap_ops;
>> struct page **pages;
>> unsigned long pages_vm_start;
>> +
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + /*
>> + * If dmabuf_vaddr is not NULL then this mapping is backed by DMA
>> + * capable memory.
>> + */
>> +
>> + /* Device for which DMA memory is allocated. */
>> + struct device *dma_dev;
>> + /* Flags used to create this DMA buffer: GNTDEV_DMABUF_FLAG_XXX. */
>> + bool dma_flags;
> Again, I think most of the comments here can be dropped. Except possibly
> for the flags.
will drop most of those
>> + /* Virtual/CPU address of the DMA buffer. */
>> + void *dma_vaddr;
>> + /* Bus address of the DMA buffer. */
>> + dma_addr_t dma_bus_addr;
>> +#endif
>> };
>>
>> static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
>>
>> +static struct miscdevice gntdev_miscdev;
>> +
>> /* ------------------------------------------------------------------ */
>>
>> static void gntdev_print_maps(struct gntdev_priv *priv,
>> @@ -121,8 +147,26 @@ static void gntdev_free_map(struct grant_map *map)
>> if (map == NULL)
>> return;
>>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + if (map->dma_vaddr) {
>> + struct gnttab_dma_alloc_args args;
>> +
>> + args.dev = map->dma_dev;
>> + args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>> + args.nr_pages = map->count;
>> + args.pages = map->pages;
>> + args.vaddr = map->dma_vaddr;
>> + args.dev_bus_addr = map->dma_bus_addr;
>> +
>> + gnttab_dma_free_pages(&args);
>> + } else if (map->pages) {
>> + gnttab_free_pages(map->count, map->pages);
>> + }
>> +#else
>> if (map->pages)
>> gnttab_free_pages(map->count, map->pages);
>> +#endif
>> +
> } else
> #endif
> if (map->pages)
> gnttab_free_pages(map->count, map->pages);
>
>
> (and elsewhere)
Ok, I just wasn't sure if it was ok to deviate from the kernel coding style here ;)
>> kfree(map->pages);
>> kfree(map->grants);
>> kfree(map->map_ops);
>
>
>>
>> diff --git a/include/uapi/xen/gntdev.h b/include/uapi/xen/gntdev.h
>> index 6d1163456c03..2d5a4672f07c 100644
>> --- a/include/uapi/xen/gntdev.h
>> +++ b/include/uapi/xen/gntdev.h
>> @@ -200,4 +200,19 @@ struct ioctl_gntdev_grant_copy {
>> /* Send an interrupt on the indicated event channel */
>> #define UNMAP_NOTIFY_SEND_EVENT 0x2
>>
>> +/*
>> + * Flags to be used while requesting memory mapping's backing storage
>> + * to be allocated with DMA API.
>> + */
>> +
>> +/*
>> + * The buffer is backed with memory allocated with dma_alloc_wc.
>> + */
>> +#define GNTDEV_DMA_FLAG_WC (1 << 1)
>
> Is there a reason you are not using bit 0?
No reason for that, will start from 0
>
> -boris
Thank you,
Oleksandr
>> +
>> +/*
>> + * The buffer is backed with memory allocated with dma_alloc_coherent.
>> + */
>> +#define GNTDEV_DMA_FLAG_COHERENT (1 << 2)
>> +
>> #endif /* __LINUX_PUBLIC_GNTDEV_H__ */
On 05/25/2018 06:33 PM, Oleksandr Andrushchenko wrote:
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 3431fe210624..eaf63a2c7ae6 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -152,6 +152,16 @@ config XEN_GNTDEV
> help
> Allows userspace processes to use grants.
>
> +config XEN_GNTDEV_DMABUF
> + bool "Add support for dma-buf grant access device driver extension"
> + depends on XEN_GNTDEV && XEN_GRANT_DMA_ALLOC
This must be "depends on XEN_GNTDEV && XEN_GRANT_DMA_ALLOC &&
*DMA_SHARED_BUFFER*"
> + help
> + Allows userspace processes and kernel modules to use Xen backed
> + dma-buf implementation. With this extension grant references to
> + the pages of an imported dma-buf can be exported for other domain
> + use and grant references coming from a foreign domain can be
> + converted into a local dma-buf for local export.
> +
>
On 05/30/2018 01:34 AM, Boris Ostrovsky wrote:
> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>
>>
>> +/*
>> + * Create a dma-buf [1] from grant references @refs of count @count provided
>> + * by the foreign domain @domid with flags @flags.
>> + *
>> + * By default dma-buf is backed by system memory pages, but by providing
>> + * one of the GNTDEV_DMA_FLAG_XXX flags it can also be created as
>> + * a DMA write-combine or coherent buffer, e.g. allocated with dma_alloc_wc/
>> + * dma_alloc_coherent.
>> + *
>> + * Returns 0 if dma-buf was successfully created and the corresponding
>> + * dma-buf's file descriptor is returned in @fd.
>> + *
>> + * [1] https://elixir.bootlin.com/linux/latest/source/Documentation/driver-api/dma-buf.rst
>
> Documentation/driver-api/dma-buf.rst.
>
Indeed ;)
> -boris
Thank you,
Oleksandr
On 05/29/2018 11:03 PM, Boris Ostrovsky wrote:
> On 05/29/2018 02:22 PM, Oleksandr Andrushchenko wrote:
>> On 05/29/2018 09:04 PM, Boris Ostrovsky wrote:
>>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>> @@ -463,11 +457,6 @@ static enum bp_state
>>> increase_reservation(unsigned long nr_pages)
>>> int rc;
>>> unsigned long i;
>>> struct page *page;
>>> - struct xen_memory_reservation reservation = {
>>> - .address_bits = 0,
>>> - .extent_order = EXTENT_ORDER,
>>> - .domid = DOMID_SELF
>>> - };
>>> if (nr_pages > ARRAY_SIZE(frame_list))
>>> nr_pages = ARRAY_SIZE(frame_list);
>>> @@ -486,9 +475,7 @@ static enum bp_state
>>> increase_reservation(unsigned long nr_pages)
>>> page = balloon_next_page(page);
>>> }
>>> - set_xen_guest_handle(reservation.extent_start, frame_list);
>>> - reservation.nr_extents = nr_pages;
>>> - rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
>>> + rc = xenmem_reservation_increase(nr_pages, frame_list);
>>> if (rc <= 0)
>>> return BP_EAGAIN;
>>> @@ -496,29 +483,7 @@ static enum bp_state
>>> increase_reservation(unsigned long nr_pages)
>>> page = balloon_retrieve(false);
>>> BUG_ON(page == NULL);
>>> -#ifdef CONFIG_XEN_HAVE_PVMMU
>>> - /*
>>> - * We don't support PV MMU when Linux and Xen is using
>>> - * different page granularity.
>>> - */
>>> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>>> -
>>> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>>> - unsigned long pfn = page_to_pfn(page);
>>> -
>>> - set_phys_to_machine(pfn, frame_list[i]);
>>> -
>>> - /* Link back into the page tables if not highmem. */
>>> - if (!PageHighMem(page)) {
>>> - int ret;
>>> - ret = HYPERVISOR_update_va_mapping(
>>> - (unsigned long)__va(pfn << PAGE_SHIFT),
>>> - mfn_pte(frame_list[i], PAGE_KERNEL),
>>> - 0);
>>> - BUG_ON(ret);
>>> - }
>>> - }
>>> -#endif
>>> + xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
>>>
>>> Can you make a single call to xenmem_reservation_va_mapping_update(rc,
>>> ...)? You need to keep track of pages but presumable they can be put
>>> into an array (or a list). In fact, perhaps we can have
>>> balloon_retrieve() return a set of pages.
>> This is actually how it is used later on for dma-buf, but I just
>> didn't want
>> to alter original balloon code too much, but this can be done, in
>> order of simplicity:
>>
>> 1. Similar to frame_list, e.g. static array of struct page* of size
>> ARRAY_SIZE(frame_list):
>> more static memory is used, but no allocations
>>
>> 2. Allocated at run-time with kcalloc: allocation can fail
>
> If this is called in freeing DMA buffer code path or in error path then
> we shouldn't do it.
>
>
>> 3. Make balloon_retrieve() return a set of pages: will require
>> list/array allocation
>> and handling, allocation may fail, balloon_retrieve prototype change
>
> balloon pages are strung on the lru list. Can we have
> balloon_retrieve return a list of pages on that list?
First of all, before we go deep into the details, let me highlight
the goal of the requested change: in the balloon driver we call
xenmem_reservation_va_mapping_update(*1*, &page, &frame_list[i]);
from increase_reservation
and
xenmem_reservation_va_mapping_reset(*1*, &page);
from decrease_reservation, which is not elegant because only one
page/frame is passed at a time while multiple pages/frames could be
passed at once.
In the balloon driver the producer of pages for increase_reservation
is balloon_retrieve(false) and for decrease_reservation it is
alloc_page(gfp).
In case of decrease_reservation the page is added on the list:
LIST_HEAD(pages);
[...]
list_add(&page->lru, &pages);
and in case of increase_reservation it is retrieved page by page
and can be put on a list as well with the same code from
decrease_reservation, e.g.
LIST_HEAD(pages);
[...]
list_add(&page->lru, &pages);
Thus, both decrease_reservation and increase_reservation may hold
their pages on a list before calling
xenmem_reservation_va_mapping_{update|reset}.
For that we need a prototype change:
xenmem_reservation_va_mapping_reset(<nr_pages>, <list of pages>);
But for xenmem_reservation_va_mapping_update it will look like:
xenmem_reservation_va_mapping_update(<nr_pages>, <list of pages>, <array
of frames>)
which seems inconsistent. Converting entries of the static frame_list
array into a corresponding list doesn't seem elegant either.
For the dma-buf use-case arrays are preferable, as dma-buf constructs
scatter-gather tables from an array of pages, and if a page list were
passed it would need to be converted into a page array anyway.
So, we can:
1. Keep the prototypes as is, i.e. accept an array of pages and use
nr_pages == 1 in the balloon driver (existing code)
2. Statically allocate a struct page* array in the balloon driver and
fill it with pages as those pages are retrieved:
static struct page *page_list[ARRAY_SIZE(frame_list)];
which will take an additional 8 KiB on a 64-bit platform, but
simplifies things a lot.
3. Allocate struct page *page_list[ARRAY_SIZE(frame_list)] dynamically
As to Boris' suggestion "balloon pages are strung on the lru list. Can
we have balloon_retrieve return a list of pages on that list?":
because of alloc_xenballooned_pages' retry logic for page retrieval, e.g.
while (pgno < nr_pages) {
page = balloon_retrieve(true);
if (page) {
[...]
} else {
ret = add_ballooned_pages(nr_pages - pgno);
[...]
}
I wouldn't change things that much.
IMO, we can keep the one-page-based API, with the only overhead for
the balloon driver being a function call to
xenmem_reservation_va_mapping_{update|reset} for each page.
> -boris
Thank you,
Oleksandr
>
>> Could you please tell which of the above will fit better?
>>
>>>
On 05/25/2018 06:33 PM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Allow creating grant device context for use by kernel modules which
> require functionality, provided by gntdev. Export symbols for dma-buf
> API provided by the module.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/gntdev.c | 116 ++++++++++++++++++++++++++--------------
> include/xen/grant_dev.h | 37 +++++++++++++
> 2 files changed, 113 insertions(+), 40 deletions(-)
> create mode 100644 include/xen/grant_dev.h
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index d8b6168f2cd9..912056f3e909 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -684,14 +684,33 @@ static const struct mmu_notifier_ops gntdev_mmu_ops = {
>
> /* ------------------------------------------------------------------ */
>
> -static int gntdev_open(struct inode *inode, struct file *flip)
> +void gntdev_free_context(struct gntdev_priv *priv)
> +{
> + struct grant_map *map;
> +
> + pr_debug("priv %p\n", priv);
> +
> + mutex_lock(&priv->lock);
> + while (!list_empty(&priv->maps)) {
> + map = list_entry(priv->maps.next, struct grant_map, next);
> + list_del(&map->next);
> + gntdev_put_map(NULL /* already removed */, map);
> + }
> + WARN_ON(!list_empty(&priv->freeable_maps));
> +
> + mutex_unlock(&priv->lock);
> +
> + kfree(priv);
> +}
> +EXPORT_SYMBOL(gntdev_free_context);
I will convert EXPORT_SYMBOL to EXPORT_SYMBOL_GPL here and below
> +
> +struct gntdev_priv *gntdev_alloc_context(struct device *dev)
> {
> struct gntdev_priv *priv;
> - int ret = 0;
>
> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> if (!priv)
> - return -ENOMEM;
> + return ERR_PTR(-ENOMEM);
>
> INIT_LIST_HEAD(&priv->maps);
> INIT_LIST_HEAD(&priv->freeable_maps);
> @@ -704,6 +723,32 @@ static int gntdev_open(struct inode *inode, struct file *flip)
> INIT_LIST_HEAD(&priv->dmabuf_imp_list);
> #endif
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + priv->dma_dev = dev;
> +
> + /*
> + * The device is not spawn from a device tree, so arch_setup_dma_ops
> + * is not called, thus leaving the device with dummy DMA ops.
> + * Fix this call of_dma_configure() with a NULL node to set
> + * default DMA ops.
> + */
> + of_dma_configure(priv->dma_dev, NULL);
> +#endif
> + pr_debug("priv %p\n", priv);
> +
> + return priv;
> +}
> +EXPORT_SYMBOL(gntdev_alloc_context);
> +
> +static int gntdev_open(struct inode *inode, struct file *flip)
> +{
> + struct gntdev_priv *priv;
> + int ret = 0;
> +
> + priv = gntdev_alloc_context(gntdev_miscdev.this_device);
> + if (IS_ERR(priv))
> + return PTR_ERR(priv);
> +
> if (use_ptemod) {
> priv->mm = get_task_mm(current);
> if (!priv->mm) {
> @@ -716,23 +761,11 @@ static int gntdev_open(struct inode *inode, struct file *flip)
> }
>
> if (ret) {
> - kfree(priv);
> + gntdev_free_context(priv);
> return ret;
> }
>
> flip->private_data = priv;
> -#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> - priv->dma_dev = gntdev_miscdev.this_device;
> -
> - /*
> - * The device is not spawn from a device tree, so arch_setup_dma_ops
> - * is not called, thus leaving the device with dummy DMA ops.
> - * Fix this call of_dma_configure() with a NULL node to set
> - * default DMA ops.
> - */
> - of_dma_configure(priv->dma_dev, NULL);
> -#endif
> - pr_debug("priv %p\n", priv);
>
> return 0;
> }
> @@ -740,22 +773,11 @@ static int gntdev_open(struct inode *inode, struct file *flip)
> static int gntdev_release(struct inode *inode, struct file *flip)
> {
> struct gntdev_priv *priv = flip->private_data;
> - struct grant_map *map;
> -
> - pr_debug("priv %p\n", priv);
> -
> - mutex_lock(&priv->lock);
> - while (!list_empty(&priv->maps)) {
> - map = list_entry(priv->maps.next, struct grant_map, next);
> - list_del(&map->next);
> - gntdev_put_map(NULL /* already removed */, map);
> - }
> - WARN_ON(!list_empty(&priv->freeable_maps));
> - mutex_unlock(&priv->lock);
>
> if (use_ptemod)
> mmu_notifier_unregister(&priv->mn, priv->mm);
> - kfree(priv);
> +
> + gntdev_free_context(priv);
> return 0;
> }
>
> @@ -1210,7 +1232,7 @@ dmabuf_exp_wait_obj_get_by_fd(struct gntdev_priv *priv, int fd)
> return ret;
> }
>
> -static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
> +int gntdev_dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
> int wait_to_ms)
> {
> struct xen_dmabuf *xen_dmabuf;
> @@ -1242,6 +1264,7 @@ static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
> dmabuf_exp_wait_obj_free(priv, obj);
> return ret;
> }
> +EXPORT_SYMBOL(gntdev_dmabuf_exp_wait_released);
>
> /* ------------------------------------------------------------------ */
> /* DMA buffer export support. */
> @@ -1511,7 +1534,7 @@ dmabuf_exp_alloc_backing_storage(struct gntdev_priv *priv, int dmabuf_flags,
> return map;
> }
>
> -static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
> +int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
> int count, u32 domid, u32 *refs, u32 *fd)
> {
> struct grant_map *map;
> @@ -1557,6 +1580,7 @@ static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
> gntdev_remove_map(priv, map);
> return ret;
> }
> +EXPORT_SYMBOL(gntdev_dmabuf_exp_from_refs);
>
> /* ------------------------------------------------------------------ */
> /* DMA buffer import support. */
> @@ -1646,8 +1670,9 @@ static struct xen_dmabuf *dmabuf_imp_alloc_storage(int count)
> return ERR_PTR(-ENOMEM);
> }
>
> -static struct xen_dmabuf *
> -dmabuf_imp_to_refs(struct gntdev_priv *priv, int fd, int count, int domid)
> +struct xen_dmabuf *
> +gntdev_dmabuf_imp_to_refs(struct gntdev_priv *priv, int fd,
> + int count, int domid)
> {
> struct xen_dmabuf *xen_dmabuf, *ret;
> struct dma_buf *dma_buf;
> @@ -1736,6 +1761,16 @@ dmabuf_imp_to_refs(struct gntdev_priv *priv, int fd, int count, int domid)
> dma_buf_put(dma_buf);
> return ret;
> }
> +EXPORT_SYMBOL(gntdev_dmabuf_imp_to_refs);
> +
> +u32 *gntdev_dmabuf_imp_get_refs(struct xen_dmabuf *xen_dmabuf)
> +{
> + if (xen_dmabuf)
> + return xen_dmabuf->u.imp.refs;
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL(gntdev_dmabuf_imp_get_refs);
>
> /*
> * Find the hyper dma-buf by its file descriptor and remove
> @@ -1759,7 +1794,7 @@ dmabuf_imp_find_unlink(struct gntdev_priv *priv, int fd)
> return ret;
> }
>
> -static int dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
> +int gntdev_dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
> {
> struct xen_dmabuf *xen_dmabuf;
> struct dma_buf_attachment *attach;
> @@ -1785,6 +1820,7 @@ static int dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
> dmabuf_imp_free_storage(xen_dmabuf);
> return 0;
> }
> +EXPORT_SYMBOL(gntdev_dmabuf_imp_release);
>
> /* ------------------------------------------------------------------ */
> /* DMA buffer IOCTL support. */
> @@ -1810,8 +1846,8 @@ gntdev_ioctl_dmabuf_exp_from_refs(struct gntdev_priv *priv,
> goto out;
> }
>
> - ret = dmabuf_exp_from_refs(priv, op.flags, op.count,
> - op.domid, refs, &op.fd);
> + ret = gntdev_dmabuf_exp_from_refs(priv, op.flags, op.count,
> + op.domid, refs, &op.fd);
> if (ret)
> goto out;
>
> @@ -1832,7 +1868,7 @@ gntdev_ioctl_dmabuf_exp_wait_released(struct gntdev_priv *priv,
> if (copy_from_user(&op, u, sizeof(op)) != 0)
> return -EFAULT;
>
> - return dmabuf_exp_wait_released(priv, op.fd, op.wait_to_ms);
> + return gntdev_dmabuf_exp_wait_released(priv, op.fd, op.wait_to_ms);
> }
>
> static long
> @@ -1846,7 +1882,7 @@ gntdev_ioctl_dmabuf_imp_to_refs(struct gntdev_priv *priv,
> if (copy_from_user(&op, u, sizeof(op)) != 0)
> return -EFAULT;
>
> - xen_dmabuf = dmabuf_imp_to_refs(priv, op.fd, op.count, op.domid);
> + xen_dmabuf = gntdev_dmabuf_imp_to_refs(priv, op.fd, op.count, op.domid);
> if (IS_ERR(xen_dmabuf))
> return PTR_ERR(xen_dmabuf);
>
> @@ -1858,7 +1894,7 @@ gntdev_ioctl_dmabuf_imp_to_refs(struct gntdev_priv *priv,
> return 0;
>
> out_release:
> - dmabuf_imp_release(priv, op.fd);
> + gntdev_dmabuf_imp_release(priv, op.fd);
> return ret;
> }
>
> @@ -1871,7 +1907,7 @@ gntdev_ioctl_dmabuf_imp_release(struct gntdev_priv *priv,
> if (copy_from_user(&op, u, sizeof(op)) != 0)
> return -EFAULT;
>
> - return dmabuf_imp_release(priv, op.fd);
> + return gntdev_dmabuf_imp_release(priv, op.fd);
> }
> #endif
>
> diff --git a/include/xen/grant_dev.h b/include/xen/grant_dev.h
> new file mode 100644
> index 000000000000..faccc9170174
> --- /dev/null
> +++ b/include/xen/grant_dev.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +/*
> + * Grant device kernel API
> + *
> + * Copyright (C) 2018 EPAM Systems Inc.
> + *
> + * Author: Oleksandr Andrushchenko <[email protected]>
> + */
> +
> +#ifndef _GRANT_DEV_H
> +#define _GRANT_DEV_H
> +
> +#include <linux/types.h>
> +
> +struct device;
> +struct gntdev_priv;
> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
> +struct xen_dmabuf;
> +#endif
> +
> +struct gntdev_priv *gntdev_alloc_context(struct device *dev);
> +void gntdev_free_context(struct gntdev_priv *priv);
> +
> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
> +int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
> + int count, u32 domid, u32 *refs, u32 *fd);
> +int gntdev_dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
> + int wait_to_ms);
> +
> +struct xen_dmabuf *gntdev_dmabuf_imp_to_refs(struct gntdev_priv *priv,
> + int fd, int count, int domid);
> +u32 *gntdev_dmabuf_imp_get_refs(struct xen_dmabuf *xen_dmabuf);
> +int gntdev_dmabuf_imp_release(struct gntdev_priv *priv, u32 fd);
> +#endif
> +
> +#endif
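For readers following the new header above, here is a rough sketch of how an in-kernel user (e.g. a display backend, or the hyper_dmabuf driver mentioned in the cover letter) might consume the API declared in grant_dev.h. The helper name my_backend_import() and the ERR_PTR return convention of gntdev_alloc_context() are assumptions for illustration, not part of the posted patch:

```
/* Illustrative sketch only -- not part of the patch. */
#include <linux/err.h>
#include <xen/grant_dev.h>

static int my_backend_import(struct device *dev, int fd,
			     int count, int domid)
{
	struct gntdev_priv *priv;
	struct xen_dmabuf *xen_dmabuf;
	u32 *refs;
	int ret = 0;

	/* Assumed to return ERR_PTR() on failure. */
	priv = gntdev_alloc_context(dev);
	if (IS_ERR(priv))
		return PTR_ERR(priv);

	/* Convert the dma-buf behind @fd into grant references. */
	xen_dmabuf = gntdev_dmabuf_imp_to_refs(priv, fd, count, domid);
	if (IS_ERR(xen_dmabuf)) {
		ret = PTR_ERR(xen_dmabuf);
		goto out_free;
	}

	/* Hand refs[0..count-1] to the frontend via XenStore/ring. */
	refs = gntdev_dmabuf_imp_get_refs(xen_dmabuf);

	/* ... use the refs ... */

	gntdev_dmabuf_imp_release(priv, fd);
out_free:
	gntdev_free_context(priv);
	return ret;
}
```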
On 05/30/2018 02:34 AM, Oleksandr Andrushchenko wrote:
> On 05/29/2018 10:10 PM, Boris Ostrovsky wrote:
>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>> +/**
>> + * gnttab_dma_free_pages - free DMAable pages
>> + * @args: arguments to the function
>> + */
>> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>> +{
>> + xen_pfn_t *frames;
>> + size_t size;
>> + int i, ret;
>> +
>> + gnttab_pages_clear_private(args->nr_pages, args->pages);
>> +
>> + frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
>>
>> Any way you can do it without allocating memory? One possibility is to
>> keep allocated frames from gnttab_dma_alloc_pages(). (Not sure I like
>> that either but it's the only thing I can think of).
> Yes, I was also thinking about storing the allocated frames array from
> gnttab_dma_alloc_pages(), but that did not seem clear enough, as the
> caller of gnttab_dma_alloc_pages() would need to store those frames in
> some context so we can pass them on free. But the caller doesn't really
> need the frames, which might be confusing, so I decided to make those
> allocations on the fly.
> But I can still rework that to store the frames if you insist: please
> let me know.
I would prefer not to allocate anything in the release path. Yes, I
realize that dragging frames array around is not necessary but IMO it's
better than potentially failing an allocation during a teardown. A
comment in the struct definition could explain the reason for having
this field.
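One way to read this suggestion is to keep the frames array inside the allocation arguments. The field layout of struct gnttab_dma_alloc_args is not shown in this thread, so every member below is an assumption for illustration only:

```
/* Illustrative sketch of the suggestion -- not from the patch. */
struct gnttab_dma_alloc_args {
	struct device *dev;	/* device used for DMA allocations */
	bool coherent;		/* coherent vs write-combine memory */
	int nr_pages;
	struct page **pages;
	/*
	 * Filled in at gnttab_dma_alloc_pages() time and kept here only
	 * so that gnttab_dma_free_pages() does not have to allocate
	 * memory in the (possibly failing) teardown path.
	 */
	xen_pfn_t *frames;
	void *vaddr;
	dma_addr_t dev_bus_addr;
};
```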
>>
>>
>>> + if (!frames)
>>> + return -ENOMEM;
>>> +
>>> + for (i = 0; i < args->nr_pages; i++)
>>> + frames[i] = page_to_xen_pfn(args->pages[i]);
>>
>> Not xen_page_to_gfn()?
> Well, according to [1] it should be :
> /* XENMEM_populate_physmap requires a PFN based on Xen
> * granularity.
> */
> frame_list[i] = page_to_xen_pfn(page);
Ah, yes. I was looking at decrease_reservation and automatically assumed
the same parameter type.
-boris
On 05/30/2018 04:29 AM, Oleksandr Andrushchenko wrote:
> On 05/29/2018 11:03 PM, Boris Ostrovsky wrote:
>> On 05/29/2018 02:22 PM, Oleksandr Andrushchenko wrote:
>>> On 05/29/2018 09:04 PM, Boris Ostrovsky wrote:
>>>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>>> @@ -463,11 +457,6 @@ static enum bp_state
>>>> increase_reservation(unsigned long nr_pages)
>>>> int rc;
>>>> unsigned long i;
>>>> struct page *page;
>>>> - struct xen_memory_reservation reservation = {
>>>> - .address_bits = 0,
>>>> - .extent_order = EXTENT_ORDER,
>>>> - .domid = DOMID_SELF
>>>> - };
>>>> if (nr_pages > ARRAY_SIZE(frame_list))
>>>> nr_pages = ARRAY_SIZE(frame_list);
>>>> @@ -486,9 +475,7 @@ static enum bp_state
>>>> increase_reservation(unsigned long nr_pages)
>>>> page = balloon_next_page(page);
>>>> }
>>>> - set_xen_guest_handle(reservation.extent_start, frame_list);
>>>> - reservation.nr_extents = nr_pages;
>>>> - rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
>>>> + rc = xenmem_reservation_increase(nr_pages, frame_list);
>>>> if (rc <= 0)
>>>> return BP_EAGAIN;
>>>> @@ -496,29 +483,7 @@ static enum bp_state
>>>> increase_reservation(unsigned long nr_pages)
>>>> page = balloon_retrieve(false);
>>>> BUG_ON(page == NULL);
>>>> -#ifdef CONFIG_XEN_HAVE_PVMMU
>>>> - /*
>>>> - * We don't support PV MMU when Linux and Xen is using
>>>> - * different page granularity.
>>>> - */
>>>> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>>>> -
>>>> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>>>> - unsigned long pfn = page_to_pfn(page);
>>>> -
>>>> - set_phys_to_machine(pfn, frame_list[i]);
>>>> -
>>>> - /* Link back into the page tables if not highmem. */
>>>> - if (!PageHighMem(page)) {
>>>> - int ret;
>>>> - ret = HYPERVISOR_update_va_mapping(
>>>> - (unsigned long)__va(pfn << PAGE_SHIFT),
>>>> - mfn_pte(frame_list[i], PAGE_KERNEL),
>>>> - 0);
>>>> - BUG_ON(ret);
>>>> - }
>>>> - }
>>>> -#endif
>>>> + xenmem_reservation_va_mapping_update(1, &page,
>>>> &frame_list[i]);
>>>>
>>>> Can you make a single call to xenmem_reservation_va_mapping_update(rc,
>>>> ...? You need to keep track of pages but presumably they can be put
>>>> into an array (or a list). In fact, perhaps we can have
>>>> balloon_retrieve() return a set of pages.
>>> This is actually how it is used later on for dma-buf, but I just
>>> didn't want
>>> to alter original balloon code too much, but this can be done, in
>>> order of simplicity:
>>>
>>> 1. Similar to frame_list, e.g. static array of struct page* of size
>>> ARRAY_SIZE(frame_list):
>>> more static memory is used, but no allocations
>>>
>>> 2. Allocated at run-time with kcalloc: allocation can fail
>>
>> If this is called in freeing DMA buffer code path or in error path then
>> we shouldn't do it.
>>
>>
>>> 3. Make balloon_retrieve() return a set of pages: will require
>>> list/array allocation
>>> and handling, allocation may fail, balloon_retrieve prototype change
>>
>> balloon pages are strung on the lru list. Can we have
>> balloon_retrieve return a list of pages from that list?
> First of all, before we go deep into details, I will highlight
> the goal of the requested change: for the balloon driver we call
> xenmem_reservation_va_mapping_update(*1*, &page, &frame_list[i]);
> from increase_reservation
> and
> xenmem_reservation_va_mapping_reset(*1*, &page);
> from decrease_reservation, which seems inelegant because only one
> page/frame is passed at a time while we might pass multiple
> pages/frames at once.
>
> In the balloon driver the producer of pages for increase_reservation
> is balloon_retrieve(false) and for decrease_reservation it is
> alloc_page(gfp).
> In case of decrease_reservation the page is added on the list:
> LIST_HEAD(pages);
> [...]
> list_add(&page->lru, &pages);
>
> and in case of increase_reservation it is retrieved page by page
> and can be put on a list as well with the same code from
> decrease_reservation, e.g.
> LIST_HEAD(pages);
> [...]
> list_add(&page->lru, &pages);
>
> Thus, both decrease_reservation and increase_reservation may hold
> their pages on a list before calling
> xenmem_reservation_va_mapping_{update|reset}.
>
> For that we need a prototype change:
> xenmem_reservation_va_mapping_reset(<nr_pages>, <list of pages>);
> But for xenmem_reservation_va_mapping_update it will look like:
> xenmem_reservation_va_mapping_update(<nr_pages>, <list of pages>,
> <array of frames>)
> which seems inconsistent. Converting entries of the static frame_list
> array into a corresponding list doesn't look elegant either.
>
> For dma-buf use-case arrays are more preferable as dma-buf constructs
> scatter-gather
> tables from array of pages etc. and if page list is passed then it
> needs to be
> converted into page array anyways.
>
> So, we can:
> 1. Keep the prototypes as is, e.g. accept array of pages and use
> nr_pages == 1 in
> case of balloon driver (existing code)
> 2. Statically allocate struct page* array in the balloon driver and
> fill it with pages
> when those pages are retrieved:
> static struct page *page_list[ARRAY_SIZE(frame_list)];
> which will take additional 8KiB of space on 64-bit platform, but
> simplify things a lot.
> 3. Allocate struct page *page_list[ARRAY_SIZE(frame_list)] dynamically
>
> As to Boris' suggestion "balloon pages are strung on the lru list. Can
> we have balloon_retrieve return a list of pages from that list?"
> Because of alloc_xenballooned_pages' retry logic for page retrieval, e.g.
> while (pgno < nr_pages) {
> page = balloon_retrieve(true);
> if (page) {
> [...]
> } else {
> ret = add_ballooned_pages(nr_pages - pgno);
> [...]
> }
> I wouldn't change things that much.
>
> IMO, we can keep 1 page based API with the only overhead for balloon
> driver of
> function calls to xenmem_reservation_va_mapping_{update|reset} for
> each page.
I still think what I suggested is doable but we can come back to it
later and keep your per-page implementation for now.
BTW, I also think you can further simplify
xenmem_reservation_va_mapping_* routines by bailing out right away if
xen_feature(XENFEAT_auto_translated_physmap). In fact, you might even
make them inlines, along the lines of
inline void xenmem_reservation_va_mapping_reset(unsigned long count,
struct page **pages)
{
#ifdef CONFIG_XEN_HAVE_PVMMU
if (!xen_feature(XENFEAT_auto_translated_physmap))
__xenmem_reservation_va_mapping_reset(...)
#endif
}
Or some such.
-boris
-boris
On 05/30/2018 06:54 PM, Boris Ostrovsky wrote:
> On 05/30/2018 04:29 AM, Oleksandr Andrushchenko wrote:
>> On 05/29/2018 11:03 PM, Boris Ostrovsky wrote:
>>> On 05/29/2018 02:22 PM, Oleksandr Andrushchenko wrote:
>>>> On 05/29/2018 09:04 PM, Boris Ostrovsky wrote:
>>>>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>>>> @@ -463,11 +457,6 @@ static enum bp_state
>>>>> increase_reservation(unsigned long nr_pages)
>>>>> int rc;
>>>>> unsigned long i;
>>>>> struct page *page;
>>>>> - struct xen_memory_reservation reservation = {
>>>>> - .address_bits = 0,
>>>>> - .extent_order = EXTENT_ORDER,
>>>>> - .domid = DOMID_SELF
>>>>> - };
>>>>> if (nr_pages > ARRAY_SIZE(frame_list))
>>>>> nr_pages = ARRAY_SIZE(frame_list);
>>>>> @@ -486,9 +475,7 @@ static enum bp_state
>>>>> increase_reservation(unsigned long nr_pages)
>>>>> page = balloon_next_page(page);
>>>>> }
>>>>> - set_xen_guest_handle(reservation.extent_start, frame_list);
>>>>> - reservation.nr_extents = nr_pages;
>>>>> - rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
>>>>> + rc = xenmem_reservation_increase(nr_pages, frame_list);
>>>>> if (rc <= 0)
>>>>> return BP_EAGAIN;
>>>>> @@ -496,29 +483,7 @@ static enum bp_state
>>>>> increase_reservation(unsigned long nr_pages)
>>>>> page = balloon_retrieve(false);
>>>>> BUG_ON(page == NULL);
>>>>> -#ifdef CONFIG_XEN_HAVE_PVMMU
>>>>> - /*
>>>>> - * We don't support PV MMU when Linux and Xen is using
>>>>> - * different page granularity.
>>>>> - */
>>>>> - BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>>>>> -
>>>>> - if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>>>>> - unsigned long pfn = page_to_pfn(page);
>>>>> -
>>>>> - set_phys_to_machine(pfn, frame_list[i]);
>>>>> -
>>>>> - /* Link back into the page tables if not highmem. */
>>>>> - if (!PageHighMem(page)) {
>>>>> - int ret;
>>>>> - ret = HYPERVISOR_update_va_mapping(
>>>>> - (unsigned long)__va(pfn << PAGE_SHIFT),
>>>>> - mfn_pte(frame_list[i], PAGE_KERNEL),
>>>>> - 0);
>>>>> - BUG_ON(ret);
>>>>> - }
>>>>> - }
>>>>> -#endif
>>>>> + xenmem_reservation_va_mapping_update(1, &page,
>>>>> &frame_list[i]);
>>>>>
>>>>> Can you make a single call to xenmem_reservation_va_mapping_update(rc,
>>>>> ...? You need to keep track of pages but presumably they can be put
>>>>> into an array (or a list). In fact, perhaps we can have
>>>>> balloon_retrieve() return a set of pages.
>>>> This is actually how it is used later on for dma-buf, but I just
>>>> didn't want
>>>> to alter original balloon code too much, but this can be done, in
>>>> order of simplicity:
>>>>
>>>> 1. Similar to frame_list, e.g. static array of struct page* of size
>>>> ARRAY_SIZE(frame_list):
>>>> more static memory is used, but no allocations
>>>>
>>>> 2. Allocated at run-time with kcalloc: allocation can fail
>>> If this is called in freeing DMA buffer code path or in error path then
>>> we shouldn't do it.
>>>
>>>
>>>> 3. Make balloon_retrieve() return a set of pages: will require
>>>> list/array allocation
>>>> and handling, allocation may fail, balloon_retrieve prototype change
>>> balloon pages are strung on the lru list. Can we have
>>> balloon_retrieve return a list of pages from that list?
>> First of all, before we go deep into details, I will highlight
>> the goal of the requested change: for the balloon driver we call
>> xenmem_reservation_va_mapping_update(*1*, &page, &frame_list[i]);
>> from increase_reservation
>> and
>> xenmem_reservation_va_mapping_reset(*1*, &page);
>> from decrease_reservation, which seems inelegant because only one
>> page/frame is passed at a time while we might pass multiple
>> pages/frames at once.
>>
>> In the balloon driver the producer of pages for increase_reservation
>> is balloon_retrieve(false) and for decrease_reservation it is
>> alloc_page(gfp).
>> In case of decrease_reservation the page is added on the list:
>> LIST_HEAD(pages);
>> [...]
>> list_add(&page->lru, &pages);
>>
>> and in case of increase_reservation it is retrieved page by page
>> and can be put on a list as well with the same code from
>> decrease_reservation, e.g.
>> LIST_HEAD(pages);
>> [...]
>> list_add(&page->lru, &pages);
>>
>> Thus, both decrease_reservation and increase_reservation may hold
>> their pages on a list before calling
>> xenmem_reservation_va_mapping_{update|reset}.
>>
>> For that we need a prototype change:
>> xenmem_reservation_va_mapping_reset(<nr_pages>, <list of pages>);
>> But for xenmem_reservation_va_mapping_update it will look like:
>> xenmem_reservation_va_mapping_update(<nr_pages>, <list of pages>,
>> <array of frames>)
>> which seems inconsistent. Converting entries of the static frame_list
>> array into a corresponding list doesn't look elegant either.
>>
>> For dma-buf use-case arrays are more preferable as dma-buf constructs
>> scatter-gather
>> tables from array of pages etc. and if page list is passed then it
>> needs to be
>> converted into page array anyways.
>>
>> So, we can:
>> 1. Keep the prototypes as is, e.g. accept array of pages and use
>> nr_pages == 1 in
>> case of balloon driver (existing code)
>> 2. Statically allocate struct page* array in the balloon driver and
>> fill it with pages
>> when those pages are retrieved:
>> static struct page *page_list[ARRAY_SIZE(frame_list)];
>> which will take additional 8KiB of space on 64-bit platform, but
>> simplify things a lot.
>> 3. Allocate struct page *page_list[ARRAY_SIZE(frame_list)] dynamically
>>
>> As to Boris' suggestion "balloon pages are strung on the lru list. Can
>> we have balloon_retrieve return a list of pages from that list?"
>> Because of alloc_xenballooned_pages' retry logic for page retrieval, e.g.
>> while (pgno < nr_pages) {
>> page = balloon_retrieve(true);
>> if (page) {
>> [...]
>> } else {
>> ret = add_ballooned_pages(nr_pages - pgno);
>> [...]
>> }
>> I wouldn't change things that much.
>>
>> IMO, we can keep 1 page based API with the only overhead for balloon
>> driver of
>> function calls to xenmem_reservation_va_mapping_{update|reset} for
>> each page.
>
>
> I still think what I suggested is doable but we can come back to it
> later and keep your per-page implementation for now.
>
> BTW, I also think you can further simplify
> xenmem_reservation_va_mapping_* routines by bailing out right away if
> xen_feature(XENFEAT_auto_translated_physmap). In fact, you might even
> make them inlines, along the lines of
>
> inline void xenmem_reservation_va_mapping_reset(unsigned long count,
> struct page **pages)
> {
> #ifdef CONFIG_XEN_HAVE_PVMMU
> if (!xen_feature(XENFEAT_auto_translated_physmap))
> __xenmem_reservation_va_mapping_reset(...)
> #endif
> }
How about:
#ifdef CONFIG_XEN_HAVE_PVMMU
static inline void __xenmem_reservation_va_mapping_reset(struct page *page)
{
[...]
}
#endif
and
void xenmem_reservation_va_mapping_reset(unsigned long count,
struct page **pages)
{
#ifdef CONFIG_XEN_HAVE_PVMMU
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
int i;
for (i = 0; i < count; i++)
__xenmem_reservation_va_mapping_reset(pages[i]);
}
#endif
}
This way I can use __xenmem_reservation_va_mapping_reset(page);
instead of xenmem_reservation_va_mapping_reset(1, &page);
> Or some such.
>
> -boris
>
> -boris
>
>
On 05/30/2018 06:20 PM, Boris Ostrovsky wrote:
> On 05/30/2018 02:34 AM, Oleksandr Andrushchenko wrote:
>> On 05/29/2018 10:10 PM, Boris Ostrovsky wrote:
>>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>> +/**
>>> + * gnttab_dma_free_pages - free DMAable pages
>>> + * @args: arguments to the function
>>> + */
>>> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>>> +{
>>> + xen_pfn_t *frames;
>>> + size_t size;
>>> + int i, ret;
>>> +
>>> + gnttab_pages_clear_private(args->nr_pages, args->pages);
>>> +
>>> + frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
>>>
>>> Any way you can do it without allocating memory? One possibility is to
>>> keep allocated frames from gnttab_dma_alloc_pages(). (Not sure I like
>>> that either but it's the only thing I can think of).
>> Yes, I was also thinking about storing the allocated frames array from
>> gnttab_dma_alloc_pages(), but that did not seem clear enough, as the
>> caller of gnttab_dma_alloc_pages() would need to store those frames in
>> some context so we can pass them on free. But the caller doesn't really
>> need the frames, which might be confusing, so I decided to make those
>> allocations on the fly.
>> But I can still rework that to store the frames if you insist: please
>> let me know.
>
> I would prefer not to allocate anything in the release path. Yes, I
> realize that dragging frames array around is not necessary but IMO it's
> better than potentially failing an allocation during a teardown. A
> comment in the struct definition could explain the reason for having
> this field.
Then I would suggest we do it this way: the current API requires that the
struct page **pages array be allocated outside. So, let's allocate the
frames outside as well. This way the caller is responsible for both the
pages and frames arrays and the API looks consistent.
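A caller-side sketch of that agreement (illustrative only; the member names are assumed to mirror the nr_pages/pages fields used elsewhere in the patch, with a frames member added):

```
/* Illustrative sketch: caller owns both arrays, so neither the
 * allocation nor the release path has to allocate memory itself. */
struct gnttab_dma_alloc_args args = { .dev = dma_dev, .nr_pages = n };

args.pages = kcalloc(n, sizeof(args.pages[0]), GFP_KERNEL);
args.frames = kcalloc(n, sizeof(args.frames[0]), GFP_KERNEL);
if (!args.pages || !args.frames)
	goto fail;

ret = gnttab_dma_alloc_pages(&args);
if (ret)
	goto fail;

/* ... use args.pages / args.frames ... */

gnttab_dma_free_pages(&args);	/* no allocation in the teardown path */
fail:
	kfree(args.pages);
	kfree(args.frames);
```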
>
>>>
>>>> + if (!frames)
>>>> + return -ENOMEM;
>>>> +
>>>> + for (i = 0; i < args->nr_pages; i++)
>>>> + frames[i] = page_to_xen_pfn(args->pages[i]);
>>> Not xen_page_to_gfn()?
>> Well, according to [1] it should be :
>> /* XENMEM_populate_physmap requires a PFN based on Xen
>> * granularity.
>> */
>> frame_list[i] = page_to_xen_pfn(page);
>
> Ah, yes. I was looking at decrease_reservation and automatically assumed
> the same parameter type.
Good, then this one is resolved
>
> -boris
>
>
Thank you,
Oleksandr
On 05/30/2018 01:46 PM, Oleksandr Andrushchenko wrote:
> On 05/30/2018 06:54 PM, Boris Ostrovsky wrote:
>>
>>
>> BTW, I also think you can further simplify
>> xenmem_reservation_va_mapping_* routines by bailing out right away if
>> xen_feature(XENFEAT_auto_translated_physmap). In fact, you might even
>> make them inlines, along the lines of
>>
>> inline void xenmem_reservation_va_mapping_reset(unsigned long count,
>> struct page **pages)
>> {
>> #ifdef CONFIG_XEN_HAVE_PVMMU
>> if (!xen_feature(XENFEAT_auto_translated_physmap))
>> __xenmem_reservation_va_mapping_reset(...)
>> #endif
>> }
> How about:
>
> #ifdef CONFIG_XEN_HAVE_PVMMU
> static inline void __xenmem_reservation_va_mapping_reset(struct page *page)
> {
> [...]
> }
> #endif
>
> and
>
> void xenmem_reservation_va_mapping_reset(unsigned long count,
> struct page **pages)
> {
> #ifdef CONFIG_XEN_HAVE_PVMMU
> if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> int i;
>
> for (i = 0; i < count; i++)
> __xenmem_reservation_va_mapping_reset(pages[i]);
> }
> #endif
> }
>
> This way I can use __xenmem_reservation_va_mapping_reset(page);
> instead of xenmem_reservation_va_mapping_reset(1, &page);
Sure, this also works.
-boris
On 05/30/2018 01:49 PM, Oleksandr Andrushchenko wrote:
> On 05/30/2018 06:20 PM, Boris Ostrovsky wrote:
>> On 05/30/2018 02:34 AM, Oleksandr Andrushchenko wrote:
>>> On 05/29/2018 10:10 PM, Boris Ostrovsky wrote:
>>>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>>> +/**
>>>> + * gnttab_dma_free_pages - free DMAable pages
>>>> + * @args: arguments to the function
>>>> + */
>>>> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>>>> +{
>>>> + xen_pfn_t *frames;
>>>> + size_t size;
>>>> + int i, ret;
>>>> +
>>>> + gnttab_pages_clear_private(args->nr_pages, args->pages);
>>>> +
>>>> + frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
>>>>
>>>> Any way you can do it without allocating memory? One possibility is to
>>>> keep allocated frames from gnttab_dma_alloc_pages(). (Not sure I like
>>>> that either but it's the only thing I can think of).
>>> Yes, I was also thinking about storing the allocated frames array from
>>> gnttab_dma_alloc_pages(), but that did not seem clear enough, as the
>>> caller of gnttab_dma_alloc_pages() would need to store those frames in
>>> some context so we can pass them on free. But the caller doesn't really
>>> need the frames, which might be confusing, so I decided to make those
>>> allocations on the fly.
>>> But I can still rework that to store the frames if you insist: please
>>> let me know.
>>
>> I would prefer not to allocate anything in the release path. Yes, I
>> realize that dragging frames array around is not necessary but IMO it's
>> better than potentially failing an allocation during a teardown. A
>> comment in the struct definition could explain the reason for having
>> this field.
> Then I would suggest we do it this way: the current API requires that the
> struct page **pages array be allocated outside. So, let's allocate the
> frames outside as well. This way the caller is responsible for both the
> pages and frames arrays and the API looks consistent.
Yes, that works too.
-boris
On Fri, May 25, 2018 at 06:33:24PM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Make set/clear page private code shared and accessible to
> other kernel modules which can re-use these instead of open-coding.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/grant-table.c | 54 +++++++++++++++++++++++++--------------
> include/xen/grant_table.h | 3 +++
> 2 files changed, 38 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 27be107d6480..d7488226e1f2 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -769,29 +769,18 @@ void gnttab_free_auto_xlat_frames(void)
> }
> EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
>
> -/**
> - * gnttab_alloc_pages - alloc pages suitable for grant mapping into
> - * @nr_pages: number of pages to alloc
> - * @pages: returns the pages
> - */
> -int gnttab_alloc_pages(int nr_pages, struct page **pages)
> +int gnttab_pages_set_private(int nr_pages, struct page **pages)
> {
> int i;
> - int ret;
> -
> - ret = alloc_xenballooned_pages(nr_pages, pages);
> - if (ret < 0)
> - return ret;
>
> for (i = 0; i < nr_pages; i++) {
> #if BITS_PER_LONG < 64
> struct xen_page_foreign *foreign;
>
> foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
> - if (!foreign) {
> - gnttab_free_pages(nr_pages, pages);
> + if (!foreign)
Don't we have to free previously allocated "foreign"(s) if it fails in the middle
(e.g. 0 < i && i < nr_pages - 1) before returning?
> return -ENOMEM;
> - }
> +
> set_page_private(pages[i], (unsigned long)foreign);
> #endif
> SetPagePrivate(pages[i]);
> @@ -799,14 +788,30 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
>
> return 0;
> }
> -EXPORT_SYMBOL(gnttab_alloc_pages);
> +EXPORT_SYMBOL(gnttab_pages_set_private);
>
> /**
> - * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
> - * @nr_pages; number of pages to free
> - * @pages: the pages
> + * gnttab_alloc_pages - alloc pages suitable for grant mapping into
> + * @nr_pages: number of pages to alloc
> + * @pages: returns the pages
> */
> -void gnttab_free_pages(int nr_pages, struct page **pages)
> +int gnttab_alloc_pages(int nr_pages, struct page **pages)
> +{
> + int ret;
> +
> + ret = alloc_xenballooned_pages(nr_pages, pages);
> + if (ret < 0)
> + return ret;
> +
> + ret = gnttab_pages_set_private(nr_pages, pages);
> + if (ret < 0)
> + gnttab_free_pages(nr_pages, pages);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(gnttab_alloc_pages);
> +
> +void gnttab_pages_clear_private(int nr_pages, struct page **pages)
> {
> int i;
>
> @@ -818,6 +823,17 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
> ClearPagePrivate(pages[i]);
> }
> }
> +}
> +EXPORT_SYMBOL(gnttab_pages_clear_private);
> +
> +/**
> + * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
> + * @nr_pages; number of pages to free
> + * @pages: the pages
> + */
> +void gnttab_free_pages(int nr_pages, struct page **pages)
> +{
> + gnttab_pages_clear_private(nr_pages, pages);
> free_xenballooned_pages(nr_pages, pages);
> }
> EXPORT_SYMBOL(gnttab_free_pages);
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 2e37741f6b8d..de03f2542bb7 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -198,6 +198,9 @@ void gnttab_free_auto_xlat_frames(void);
> int gnttab_alloc_pages(int nr_pages, struct page **pages);
> void gnttab_free_pages(int nr_pages, struct page **pages);
>
> +int gnttab_pages_set_private(int nr_pages, struct page **pages);
> +void gnttab_pages_clear_private(int nr_pages, struct page **pages);
> +
> int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> struct gnttab_map_grant_ref *kmap_ops,
> struct page **pages, unsigned int count);
> --
> 2.17.0
>
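The mid-loop allocation failure the review comment above points out could be unwound without a separate free loop by reusing gnttab_pages_clear_private() for the pages processed so far. A sketch (assuming gnttab_pages_clear_private() kfree()s the attached xen_page_foreign, as the clear path in the patch suggests):

```
/* Illustrative fix for the leak on mid-loop -ENOMEM. */
int gnttab_pages_set_private(int nr_pages, struct page **pages)
{
	int i;

	for (i = 0; i < nr_pages; i++) {
#if BITS_PER_LONG < 64
		struct xen_page_foreign *foreign;

		foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
		if (!foreign) {
			/* Undo pages[0..i-1]: frees their "foreign"
			 * structs and clears PagePrivate. */
			gnttab_pages_clear_private(i, pages);
			return -ENOMEM;
		}
		set_page_private(pages[i], (unsigned long)foreign);
#endif
		SetPagePrivate(pages[i]);
	}

	return 0;
}
```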
On Fri, May 25, 2018 at 06:33:29PM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> 1. Create a dma-buf from grant references provided by the foreign
> domain. By default dma-buf is backed by system memory pages, but
> by providing GNTDEV_DMA_FLAG_XXX flags it can also be created
> as a DMA write-combine/coherent buffer, e.g. allocated with
> corresponding dma_alloc_xxx API.
> Export the resulting buffer as a new dma-buf.
>
> 2. Implement waiting for the dma-buf to be released: block until the
> dma-buf with the file descriptor provided is released.
> If the buffer is not released within the provided time-out then an
> -ETIMEDOUT error is returned. If the buffer with the file descriptor
> does not exist or has already been released, then -ENOENT is returned.
> For valid file descriptors this must not be treated as an error.
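From user space, the wait semantics described in the commit message would look roughly like this. The ioctl and structure names follow the uapi proposed in this series and should be treated as assumptions here, as the uapi patch is not quoted in this part of the thread:

```
/* Illustrative user-space sketch; uapi names assumed from the series. */
#include <errno.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <xen/gntdev.h>	/* uapi header path may vary */

static int wait_buf_released(int gntdev_fd, int dmabuf_fd, int to_ms)
{
	struct ioctl_gntdev_dmabuf_exp_wait_released op = {
		.fd = dmabuf_fd,
		.wait_to_ms = to_ms,
	};
	int ret = ioctl(gntdev_fd,
			IOCTL_GNTDEV_DMABUF_EXP_WAIT_RELEASED, &op);

	if (ret < 0 && errno == ENOENT)
		return 0;	/* already released: not an error */
	if (ret < 0 && errno == ETIMEDOUT)
		fprintf(stderr, "buffer still in use after %d ms\n", to_ms);
	return ret;
}
```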
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/gntdev.c | 478 ++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 476 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 9e450622af1a..52abc6cd5846 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -4,6 +4,8 @@
> * Device for accessing (in user-space) pages that have been granted by other
> * domains.
> *
> + * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
> + *
> * Copyright (c) 2006-2007, D G Murray.
> * (c) 2009 Gerd Hoffmann <[email protected]>
> * (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
> @@ -41,6 +43,9 @@
> #ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> #include <linux/of_device.h>
> #endif
> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
> +#include <linux/dma-buf.h>
> +#endif
>
> #include <xen/xen.h>
> #include <xen/grant_table.h>
> @@ -81,6 +86,17 @@ struct gntdev_priv {
> /* Device for which DMA memory is allocated. */
> struct device *dma_dev;
> #endif
> +
> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
> + /* Private data of the hyper DMA buffers. */
> +
> + /* List of exported DMA buffers. */
> + struct list_head dmabuf_exp_list;
> + /* List of wait objects. */
> + struct list_head dmabuf_exp_wait_list;
> + /* This is the lock which protects dma_buf_xxx lists. */
> + struct mutex dmabuf_lock;
> +#endif
> };
>
> struct unmap_notify {
> @@ -125,12 +141,38 @@ struct grant_map {
>
> #ifdef CONFIG_XEN_GNTDEV_DMABUF
> struct xen_dmabuf {
> + struct gntdev_priv *priv;
> + struct dma_buf *dmabuf;
> + struct list_head next;
> + int fd;
> +
> union {
> + struct {
> + /* Exported buffers are reference counted. */
> + struct kref refcount;
> + struct grant_map *map;
> + } exp;
> struct {
> /* Granted references of the imported buffer. */
> grant_ref_t *refs;
> } imp;
> } u;
> +
> + /* Number of pages this buffer has. */
> + int nr_pages;
> + /* Pages of this buffer. */
> + struct page **pages;
> +};
> +
> +struct xen_dmabuf_wait_obj {
> + struct list_head next;
> + struct xen_dmabuf *xen_dmabuf;
> + struct completion completion;
> +};
> +
> +struct xen_dmabuf_attachment {
> + struct sg_table *sgt;
> + enum dma_data_direction dir;
> };
> #endif
>
> @@ -320,6 +362,16 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
> gntdev_free_map(map);
> }
>
> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
> +static void gntdev_remove_map(struct gntdev_priv *priv, struct grant_map *map)
> +{
> + mutex_lock(&priv->lock);
> + list_del(&map->next);
> + gntdev_put_map(NULL /* already removed */, map);
> + mutex_unlock(&priv->lock);
> +}
> +#endif
> +
> /* ------------------------------------------------------------------ */
>
> static int find_grant_ptes(pte_t *pte, pgtable_t token,
> @@ -628,6 +680,12 @@ static int gntdev_open(struct inode *inode, struct file *flip)
> INIT_LIST_HEAD(&priv->freeable_maps);
> mutex_init(&priv->lock);
>
> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
> + mutex_init(&priv->dmabuf_lock);
> + INIT_LIST_HEAD(&priv->dmabuf_exp_list);
> + INIT_LIST_HEAD(&priv->dmabuf_exp_wait_list);
> +#endif
> +
> if (use_ptemod) {
> priv->mm = get_task_mm(current);
> if (!priv->mm) {
> @@ -1053,17 +1111,433 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
> /* DMA buffer export support. */
> /* ------------------------------------------------------------------ */
>
> +/* ------------------------------------------------------------------ */
> +/* Implementation of wait for exported DMA buffer to be released. */
> +/* ------------------------------------------------------------------ */
> +
> +static void dmabuf_exp_release(struct kref *kref);
> +
> +static struct xen_dmabuf_wait_obj *
> +dmabuf_exp_wait_obj_new(struct gntdev_priv *priv,
> + struct xen_dmabuf *xen_dmabuf)
> +{
> + struct xen_dmabuf_wait_obj *obj;
> +
> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> + if (!obj)
> + return ERR_PTR(-ENOMEM);
> +
> + init_completion(&obj->completion);
> + obj->xen_dmabuf = xen_dmabuf;
> +
> + mutex_lock(&priv->dmabuf_lock);
> + list_add(&obj->next, &priv->dmabuf_exp_wait_list);
> + /* Put our reference and wait for xen_dmabuf's release to fire. */
> + kref_put(&xen_dmabuf->u.exp.refcount, dmabuf_exp_release);
> + mutex_unlock(&priv->dmabuf_lock);
> + return obj;
> +}
> +
> +static void dmabuf_exp_wait_obj_free(struct gntdev_priv *priv,
> + struct xen_dmabuf_wait_obj *obj)
> +{
> + struct xen_dmabuf_wait_obj *cur_obj, *q;
> +
> + mutex_lock(&priv->dmabuf_lock);
> + list_for_each_entry_safe(cur_obj, q, &priv->dmabuf_exp_wait_list, next)
> + if (cur_obj == obj) {
> + list_del(&obj->next);
> + kfree(obj);
> + break;
> + }
> + mutex_unlock(&priv->dmabuf_lock);
> +}
> +
> +static int dmabuf_exp_wait_obj_wait(struct xen_dmabuf_wait_obj *obj,
> + u32 wait_to_ms)
> +{
> + if (wait_for_completion_timeout(&obj->completion,
> + msecs_to_jiffies(wait_to_ms)) <= 0)
> + return -ETIMEDOUT;
> +
> + return 0;
> +}
> +
> +static void dmabuf_exp_wait_obj_signal(struct gntdev_priv *priv,
> + struct xen_dmabuf *xen_dmabuf)
> +{
> + struct xen_dmabuf_wait_obj *obj, *q;
> +
> + list_for_each_entry_safe(obj, q, &priv->dmabuf_exp_wait_list, next)
> + if (obj->xen_dmabuf == xen_dmabuf) {
> + pr_debug("Found xen_dmabuf in the wait list, wake\n");
> + complete_all(&obj->completion);
> + }
> +}
> +
> +static struct xen_dmabuf *
> +dmabuf_exp_wait_obj_get_by_fd(struct gntdev_priv *priv, int fd)
> +{
> + struct xen_dmabuf *q, *xen_dmabuf, *ret = ERR_PTR(-ENOENT);
> +
> + mutex_lock(&priv->dmabuf_lock);
> + list_for_each_entry_safe(xen_dmabuf, q, &priv->dmabuf_exp_list, next)
> + if (xen_dmabuf->fd == fd) {
> + pr_debug("Found xen_dmabuf in the wait list\n");
> + kref_get(&xen_dmabuf->u.exp.refcount);
> + ret = xen_dmabuf;
> + break;
> + }
> + mutex_unlock(&priv->dmabuf_lock);
> + return ret;
> +}
> +
> static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
> int wait_to_ms)
> {
> - return -ETIMEDOUT;
> + struct xen_dmabuf *xen_dmabuf;
> + struct xen_dmabuf_wait_obj *obj;
> + int ret;
> +
> + pr_debug("Will wait for dma-buf with fd %d\n", fd);
> + /*
> +	 * Try to find the DMA buffer: if it is not found then either the
> +	 * buffer has already been released or the file descriptor provided
> +	 * is wrong.
> + */
> + xen_dmabuf = dmabuf_exp_wait_obj_get_by_fd(priv, fd);
> + if (IS_ERR(xen_dmabuf))
> + return PTR_ERR(xen_dmabuf);
> +
> + /*
> + * xen_dmabuf still exists and is reference count locked by us now,
> + * so prepare to wait: allocate wait object and add it to the wait list,
> + * so we can find it on release.
> + */
> + obj = dmabuf_exp_wait_obj_new(priv, xen_dmabuf);
> + if (IS_ERR(obj)) {
> + pr_err("Failed to setup wait object, ret %ld\n", PTR_ERR(obj));
> + return PTR_ERR(obj);
> + }
> +
> + ret = dmabuf_exp_wait_obj_wait(obj, wait_to_ms);
> + dmabuf_exp_wait_obj_free(priv, obj);
> + return ret;
> +}
> +
> +/* ------------------------------------------------------------------ */
> +/* DMA buffer export support. */
> +/* ------------------------------------------------------------------ */
> +
> +static struct sg_table *
> +dmabuf_pages_to_sgt(struct page **pages, unsigned int nr_pages)
> +{
> + struct sg_table *sgt;
> + int ret;
> +
> + sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
> + if (!sgt) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
> + nr_pages << PAGE_SHIFT,
> + GFP_KERNEL);
> + if (ret)
> + goto out;
> +
> + return sgt;
> +
> +out:
> + kfree(sgt);
> + return ERR_PTR(ret);
> +}
> +
> +static int dmabuf_exp_ops_attach(struct dma_buf *dma_buf,
> + struct device *target_dev,
> + struct dma_buf_attachment *attach)
> +{
> + struct xen_dmabuf_attachment *xen_dmabuf_attach;
> +
> + xen_dmabuf_attach = kzalloc(sizeof(*xen_dmabuf_attach), GFP_KERNEL);
> + if (!xen_dmabuf_attach)
> + return -ENOMEM;
> +
> + xen_dmabuf_attach->dir = DMA_NONE;
> + attach->priv = xen_dmabuf_attach;
> + /* Might need to pin the pages of the buffer now. */
> + return 0;
> +}
> +
> +static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
> + struct dma_buf_attachment *attach)
> +{
> + struct xen_dmabuf_attachment *xen_dmabuf_attach = attach->priv;
> +
> + if (xen_dmabuf_attach) {
> + struct sg_table *sgt = xen_dmabuf_attach->sgt;
> +
> + if (sgt) {
> + if (xen_dmabuf_attach->dir != DMA_NONE)
> + dma_unmap_sg_attrs(attach->dev, sgt->sgl,
> + sgt->nents,
> + xen_dmabuf_attach->dir,
> + DMA_ATTR_SKIP_CPU_SYNC);
> + sg_free_table(sgt);
> + }
> +
> + kfree(sgt);
> + kfree(xen_dmabuf_attach);
> + attach->priv = NULL;
> + }
> + /* Might need to unpin the pages of the buffer now. */
> +}
> +
> +static struct sg_table *
> +dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
> + enum dma_data_direction dir)
> +{
> + struct xen_dmabuf_attachment *xen_dmabuf_attach = attach->priv;
> + struct xen_dmabuf *xen_dmabuf = attach->dmabuf->priv;
> + struct sg_table *sgt;
> +
> + pr_debug("Mapping %d pages for dev %p\n", xen_dmabuf->nr_pages,
> + attach->dev);
> +
> + if (WARN_ON(dir == DMA_NONE || !xen_dmabuf_attach))
> + return ERR_PTR(-EINVAL);
> +
> + /* Return the cached mapping when possible. */
> + if (xen_dmabuf_attach->dir == dir)
> + return xen_dmabuf_attach->sgt;
Don't we also need to check xen_dmabuf_attach->sgt == NULL here (i.e. the
first-time mapping)? Also, I am not sure whether this mechanism of reusing a
previously generated sgt for other mappings is acceptable for all use-cases;
I don't know if the specification allows it.
> +
> + /*
> + * Two mappings with different directions for the same attachment are
> + * not allowed.
> + */
> + if (WARN_ON(xen_dmabuf_attach->dir != DMA_NONE))
> + return ERR_PTR(-EBUSY);
> +
> + sgt = dmabuf_pages_to_sgt(xen_dmabuf->pages, xen_dmabuf->nr_pages);
> + if (!IS_ERR(sgt)) {
> + if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
> + DMA_ATTR_SKIP_CPU_SYNC)) {
> + sg_free_table(sgt);
> + kfree(sgt);
> + sgt = ERR_PTR(-ENOMEM);
> + } else {
> + xen_dmabuf_attach->sgt = sgt;
> + xen_dmabuf_attach->dir = dir;
> + }
> + }
> + if (IS_ERR(sgt))
> + pr_err("Failed to map sg table for dev %p\n", attach->dev);
> + return sgt;
> +}
> +
> +static void dmabuf_exp_ops_unmap_dma_buf(struct dma_buf_attachment *attach,
> + struct sg_table *sgt,
> + enum dma_data_direction dir)
> +{
> + /* Not implemented. The unmap is done at dmabuf_exp_ops_detach(). */
I am not sure it is OK to do nothing here, because the spec says this callback
is mandatory and should unmap and "release" the &sg_table associated with it:
/**
* @unmap_dma_buf:
*
* This is called by dma_buf_unmap_attachment() and should unmap and
* release the &sg_table allocated in @map_dma_buf, and it is mandatory.
* It should also unpin the backing storage if this is the last mapping
* of the DMA buffer, if the exporter supports backing storage
* migration.
*/
> +}
> +
> +static void dmabuf_exp_release(struct kref *kref)
> +{
> + struct xen_dmabuf *xen_dmabuf =
> + container_of(kref, struct xen_dmabuf, u.exp.refcount);
> +
> + dmabuf_exp_wait_obj_signal(xen_dmabuf->priv, xen_dmabuf);
> + list_del(&xen_dmabuf->next);
> + kfree(xen_dmabuf);
> +}
> +
> +static void dmabuf_exp_ops_release(struct dma_buf *dma_buf)
> +{
> + struct xen_dmabuf *xen_dmabuf = dma_buf->priv;
> + struct gntdev_priv *priv = xen_dmabuf->priv;
> +
> + gntdev_remove_map(priv, xen_dmabuf->u.exp.map);
> + mutex_lock(&priv->dmabuf_lock);
> + kref_put(&xen_dmabuf->u.exp.refcount, dmabuf_exp_release);
> + mutex_unlock(&priv->dmabuf_lock);
> +}
> +
> +static void *dmabuf_exp_ops_kmap_atomic(struct dma_buf *dma_buf,
> + unsigned long page_num)
> +{
> + /* Not implemented. */
> + return NULL;
> +}
> +
> +static void dmabuf_exp_ops_kunmap_atomic(struct dma_buf *dma_buf,
> + unsigned long page_num, void *addr)
> +{
> + /* Not implemented. */
> +}
> +
> +static void *dmabuf_exp_ops_kmap(struct dma_buf *dma_buf,
> + unsigned long page_num)
> +{
> + /* Not implemented. */
> + return NULL;
> +}
> +
> +static void dmabuf_exp_ops_kunmap(struct dma_buf *dma_buf,
> + unsigned long page_num, void *addr)
> +{
> + /* Not implemented. */
> +}
> +
> +static int dmabuf_exp_ops_mmap(struct dma_buf *dma_buf,
> + struct vm_area_struct *vma)
> +{
> + /* Not implemented. */
> + return 0;
> +}
> +
> +static const struct dma_buf_ops dmabuf_exp_ops = {
> + .attach = dmabuf_exp_ops_attach,
> + .detach = dmabuf_exp_ops_detach,
> + .map_dma_buf = dmabuf_exp_ops_map_dma_buf,
> + .unmap_dma_buf = dmabuf_exp_ops_unmap_dma_buf,
> + .release = dmabuf_exp_ops_release,
> + .map = dmabuf_exp_ops_kmap,
> + .map_atomic = dmabuf_exp_ops_kmap_atomic,
> + .unmap = dmabuf_exp_ops_kunmap,
> + .unmap_atomic = dmabuf_exp_ops_kunmap_atomic,
> + .mmap = dmabuf_exp_ops_mmap,
> +};
> +
> +static int dmabuf_export(struct gntdev_priv *priv, struct grant_map *map,
> + int *fd)
> +{
> + DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> + struct xen_dmabuf *xen_dmabuf;
> + int ret = 0;
> +
> + xen_dmabuf = kzalloc(sizeof(*xen_dmabuf), GFP_KERNEL);
> + if (!xen_dmabuf)
> + return -ENOMEM;
> +
> + kref_init(&xen_dmabuf->u.exp.refcount);
> +
> + xen_dmabuf->priv = priv;
> + xen_dmabuf->nr_pages = map->count;
> + xen_dmabuf->pages = map->pages;
> + xen_dmabuf->u.exp.map = map;
> +
> + exp_info.exp_name = KBUILD_MODNAME;
> + if (map->dma_dev->driver && map->dma_dev->driver->owner)
> + exp_info.owner = map->dma_dev->driver->owner;
> + else
> + exp_info.owner = THIS_MODULE;
> + exp_info.ops = &dmabuf_exp_ops;
> + exp_info.size = map->count << PAGE_SHIFT;
> + exp_info.flags = O_RDWR;
> + exp_info.priv = xen_dmabuf;
> +
> + xen_dmabuf->dmabuf = dma_buf_export(&exp_info);
> + if (IS_ERR(xen_dmabuf->dmabuf)) {
> + ret = PTR_ERR(xen_dmabuf->dmabuf);
> + xen_dmabuf->dmabuf = NULL;
> + goto fail;
> + }
> +
> + ret = dma_buf_fd(xen_dmabuf->dmabuf, O_CLOEXEC);
> + if (ret < 0)
> + goto fail;
> +
> + xen_dmabuf->fd = ret;
> + *fd = ret;
> +
> + pr_debug("Exporting DMA buffer with fd %d\n", ret);
> +
> + mutex_lock(&priv->dmabuf_lock);
> + list_add(&xen_dmabuf->next, &priv->dmabuf_exp_list);
> + mutex_unlock(&priv->dmabuf_lock);
> + return 0;
> +
> +fail:
> + if (xen_dmabuf->dmabuf)
> + dma_buf_put(xen_dmabuf->dmabuf);
> + kfree(xen_dmabuf);
> + return ret;
> +}
> +
> +static struct grant_map *
> +dmabuf_exp_alloc_backing_storage(struct gntdev_priv *priv, int dmabuf_flags,
> + int count)
> +{
> + struct grant_map *map;
> +
> + if (unlikely(count <= 0))
> + return ERR_PTR(-EINVAL);
> +
> + if ((dmabuf_flags & GNTDEV_DMA_FLAG_WC) &&
> + (dmabuf_flags & GNTDEV_DMA_FLAG_COHERENT)) {
> + pr_err("Wrong dma-buf flags: either WC or coherent, not both\n");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + map = gntdev_alloc_map(priv, count, dmabuf_flags);
> + if (!map)
> + return ERR_PTR(-ENOMEM);
> +
> + if (unlikely(atomic_add_return(count, &pages_mapped) > limit)) {
> + pr_err("can't map: over limit\n");
> + gntdev_put_map(NULL, map);
> + return ERR_PTR(-ENOMEM);
> + }
> + return map;
> }
When and how would this allocation be freed? I don't see any ioctl for freeing up
shared pages.
>
> static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
> int count, u32 domid, u32 *refs, u32 *fd)
> {
> + struct grant_map *map;
> + int i, ret;
> +
> *fd = -1;
> - return -EINVAL;
> +
> + if (use_ptemod) {
> + pr_err("Cannot provide dma-buf: use_ptemod %d\n",
> + use_ptemod);
> + return -EINVAL;
> + }
> +
> + map = dmabuf_exp_alloc_backing_storage(priv, flags, count);
> + if (IS_ERR(map))
> + return PTR_ERR(map);
> +
> + for (i = 0; i < count; i++) {
> + map->grants[i].domid = domid;
> + map->grants[i].ref = refs[i];
> + }
> +
> + mutex_lock(&priv->lock);
> + gntdev_add_map(priv, map);
> + mutex_unlock(&priv->lock);
> +
> + map->flags |= GNTMAP_host_map;
> +#if defined(CONFIG_X86)
> + map->flags |= GNTMAP_device_map;
> +#endif
> +
> + ret = map_grant_pages(map);
> + if (ret < 0)
> + goto out;
> +
> + ret = dmabuf_export(priv, map, fd);
> + if (ret < 0)
> + goto out;
> +
> + return 0;
> +
> +out:
> + gntdev_remove_map(priv, map);
> + return ret;
> }
>
> /* ------------------------------------------------------------------ */
> --
> 2.17.0
>
On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>
> Oleksandr Andrushchenko (8):
> xen/grant-table: Make set/clear page private code shared
> xen/balloon: Move common memory reservation routines to a module
> xen/grant-table: Allow allocating buffers suitable for DMA
> xen/gntdev: Allow mappings for DMA buffers
> xen/gntdev: Add initial support for dma-buf UAPI
> xen/gntdev: Implement dma-buf export functionality
> xen/gntdev: Implement dma-buf import functionality
> xen/gntdev: Expose gntdev's dma-buf API for in-kernel use
>
> drivers/xen/Kconfig | 23 +
> drivers/xen/Makefile | 1 +
> drivers/xen/balloon.c | 71 +--
> drivers/xen/gntdev.c | 1025 ++++++++++++++++++++++++++++++++-
I think this calls for gntdev_dma.c. I only had a quick look over the gntdev
changes, but they are very much concentrated in dma-specific routines.
You essentially only share the file_operations entry points with the original
gntdev code, right?
-boris
> drivers/xen/grant-table.c | 176 +++++-
> drivers/xen/mem-reservation.c | 134 +++++
> include/uapi/xen/gntdev.h | 106 ++++
> include/xen/grant_dev.h | 37 ++
> include/xen/grant_table.h | 28 +
> include/xen/mem_reservation.h | 29 +
> 10 files changed, 1527 insertions(+), 103 deletions(-)
> create mode 100644 drivers/xen/mem-reservation.c
> create mode 100644 include/xen/grant_dev.h
> create mode 100644 include/xen/mem_reservation.h
>
On 05/31/2018 12:34 AM, Dongwon Kim wrote:
> On Fri, May 25, 2018 at 06:33:24PM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> Make set/clear page private code shared and accessible to
>> other kernel modules which can re-use these instead of open-coding.
>>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/grant-table.c | 54 +++++++++++++++++++++++++--------------
>> include/xen/grant_table.h | 3 +++
>> 2 files changed, 38 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
>> index 27be107d6480..d7488226e1f2 100644
>> --- a/drivers/xen/grant-table.c
>> +++ b/drivers/xen/grant-table.c
>> @@ -769,29 +769,18 @@ void gnttab_free_auto_xlat_frames(void)
>> }
>> EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
>>
>> -/**
>> - * gnttab_alloc_pages - alloc pages suitable for grant mapping into
>> - * @nr_pages: number of pages to alloc
>> - * @pages: returns the pages
>> - */
>> -int gnttab_alloc_pages(int nr_pages, struct page **pages)
>> +int gnttab_pages_set_private(int nr_pages, struct page **pages)
>> {
>> int i;
>> - int ret;
>> -
>> - ret = alloc_xenballooned_pages(nr_pages, pages);
>> - if (ret < 0)
>> - return ret;
>>
>> for (i = 0; i < nr_pages; i++) {
>> #if BITS_PER_LONG < 64
>> struct xen_page_foreign *foreign;
>>
>> foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
>> - if (!foreign) {
>> - gnttab_free_pages(nr_pages, pages);
>> + if (!foreign)
> Don't we have to free previously allocated "foreign"(s) if it fails in the middle
> (e.g. 0 < i && i < nr_pages - 1) before returning?
gnttab_free_pages(nr_pages, pages) will take care of that when called from the
outer error path, see below. It can also handle partial allocations, so there
is no problem here.
>> return -ENOMEM;
>> - }
>> +
>> set_page_private(pages[i], (unsigned long)foreign);
>> #endif
>> SetPagePrivate(pages[i]);
>> @@ -799,14 +788,30 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
>>
>> return 0;
>> }
>> -EXPORT_SYMBOL(gnttab_alloc_pages);
>> +EXPORT_SYMBOL(gnttab_pages_set_private);
>>
>> /**
>> - * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
>> - * @nr_pages; number of pages to free
>> - * @pages: the pages
>> + * gnttab_alloc_pages - alloc pages suitable for grant mapping into
>> + * @nr_pages: number of pages to alloc
>> + * @pages: returns the pages
>> */
>> -void gnttab_free_pages(int nr_pages, struct page **pages)
>> +int gnttab_alloc_pages(int nr_pages, struct page **pages)
>> +{
>> + int ret;
>> +
>> + ret = alloc_xenballooned_pages(nr_pages, pages);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = gnttab_pages_set_private(nr_pages, pages);
>> + if (ret < 0)
>> + gnttab_free_pages(nr_pages, pages);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(gnttab_alloc_pages);
>> +
>> +void gnttab_pages_clear_private(int nr_pages, struct page **pages)
>> {
>> int i;
>>
>> @@ -818,6 +823,17 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
>> ClearPagePrivate(pages[i]);
>> }
>> }
>> +}
>> +EXPORT_SYMBOL(gnttab_pages_clear_private);
>> +
>> +/**
>> + * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
>> + * @nr_pages; number of pages to free
>> + * @pages: the pages
>> + */
>> +void gnttab_free_pages(int nr_pages, struct page **pages)
>> +{
>> + gnttab_pages_clear_private(nr_pages, pages);
>> free_xenballooned_pages(nr_pages, pages);
>> }
>> EXPORT_SYMBOL(gnttab_free_pages);
>> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
>> index 2e37741f6b8d..de03f2542bb7 100644
>> --- a/include/xen/grant_table.h
>> +++ b/include/xen/grant_table.h
>> @@ -198,6 +198,9 @@ void gnttab_free_auto_xlat_frames(void);
>> int gnttab_alloc_pages(int nr_pages, struct page **pages);
>> void gnttab_free_pages(int nr_pages, struct page **pages);
>>
>> +int gnttab_pages_set_private(int nr_pages, struct page **pages);
>> +void gnttab_pages_clear_private(int nr_pages, struct page **pages);
>> +
>> int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>> struct gnttab_map_grant_ref *kmap_ops,
>> struct page **pages, unsigned int count);
>> --
>> 2.17.0
>>
On 05/31/2018 04:46 AM, Boris Ostrovsky wrote:
>
>
> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>
>>
>> Oleksandr Andrushchenko (8):
>> xen/grant-table: Make set/clear page private code shared
>> xen/balloon: Move common memory reservation routines to a module
>> xen/grant-table: Allow allocating buffers suitable for DMA
>> xen/gntdev: Allow mappings for DMA buffers
>> xen/gntdev: Add initial support for dma-buf UAPI
>> xen/gntdev: Implement dma-buf export functionality
>> xen/gntdev: Implement dma-buf import functionality
>> xen/gntdev: Expose gntdev's dma-buf API for in-kernel use
>>
>> drivers/xen/Kconfig | 23 +
>> drivers/xen/Makefile | 1 +
>> drivers/xen/balloon.c | 71 +--
>> drivers/xen/gntdev.c | 1025 ++++++++++++++++++++++++++++++++-
>
>
> I think this calls for gntdev_dma.c.
I assume you mean as a separate file (part of gntdev driver)?
> I only had a quick look over the gntdev changes, but they are very much
> concentrated in dma-specific routines.
>
I tried to do that, but there are some dependencies between gntdev.c and
gntdev_dma.c, so in the end I decided to keep it all in one file.
> You essentially only share file_operations entry points with original
> gntdev code, right?
>
The fops, plus the mappings done by gntdev (struct grant_map): I need to
release the map in the dma-buf .release callback, which creates
cross-dependencies between the modules that did not seem clean (gntdev keeps
all of its structs and functions internal, so I cannot easily access those
without helpers).
But I'll try one more time and move all DMA specific stuff into gntdev_dma.c
> -boris
>
Thank you,
Oleksandr
>
>> drivers/xen/grant-table.c | 176 +++++-
>> drivers/xen/mem-reservation.c | 134 +++++
>> include/uapi/xen/gntdev.h | 106 ++++
>> include/xen/grant_dev.h | 37 ++
>> include/xen/grant_table.h | 28 +
>> include/xen/mem_reservation.h | 29 +
>> 10 files changed, 1527 insertions(+), 103 deletions(-)
>> create mode 100644 drivers/xen/mem-reservation.c
>> create mode 100644 include/xen/grant_dev.h
>> create mode 100644 include/xen/mem_reservation.h
>>
On 05/31/2018 02:10 AM, Dongwon Kim wrote:
> On Fri, May 25, 2018 at 06:33:29PM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> 1. Create a dma-buf from grant references provided by the foreign
>> domain. By default dma-buf is backed by system memory pages, but
>> by providing GNTDEV_DMA_FLAG_XXX flags it can also be created
>> as a DMA write-combine/coherent buffer, e.g. allocated with
>> corresponding dma_alloc_xxx API.
>> Export the resulting buffer as a new dma-buf.
>>
>> 2. Implement waiting for the dma-buf to be released: block until the
>> dma-buf with the file descriptor provided is released.
>> If within the time-out provided the buffer is not released then
>> -ETIMEDOUT error is returned. If the buffer with the file descriptor
>> does not exist or has already been released, then -ENOENT is returned.
>> For valid file descriptors this must not be treated as error.
>>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/gntdev.c | 478 ++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 476 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index 9e450622af1a..52abc6cd5846 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -4,6 +4,8 @@
>> * Device for accessing (in user-space) pages that have been granted by other
>> * domains.
>> *
>> + * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
>> + *
>> * Copyright (c) 2006-2007, D G Murray.
>> * (c) 2009 Gerd Hoffmann <[email protected]>
>> * (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
>> @@ -41,6 +43,9 @@
>> #ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> #include <linux/of_device.h>
>> #endif
>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>> +#include <linux/dma-buf.h>
>> +#endif
>>
>> #include <xen/xen.h>
>> #include <xen/grant_table.h>
>> @@ -81,6 +86,17 @@ struct gntdev_priv {
>> /* Device for which DMA memory is allocated. */
>> struct device *dma_dev;
>> #endif
>> +
>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>> + /* Private data of the hyper DMA buffers. */
>> +
>> + /* List of exported DMA buffers. */
>> + struct list_head dmabuf_exp_list;
>> + /* List of wait objects. */
>> + struct list_head dmabuf_exp_wait_list;
>> + /* This is the lock which protects dma_buf_xxx lists. */
>> + struct mutex dmabuf_lock;
>> +#endif
>> };
>>
>> struct unmap_notify {
>> @@ -125,12 +141,38 @@ struct grant_map {
>>
>> #ifdef CONFIG_XEN_GNTDEV_DMABUF
>> struct xen_dmabuf {
>> + struct gntdev_priv *priv;
>> + struct dma_buf *dmabuf;
>> + struct list_head next;
>> + int fd;
>> +
>> union {
>> + struct {
>> + /* Exported buffers are reference counted. */
>> + struct kref refcount;
>> + struct grant_map *map;
>> + } exp;
>> struct {
>> /* Granted references of the imported buffer. */
>> grant_ref_t *refs;
>> } imp;
>> } u;
>> +
>> + /* Number of pages this buffer has. */
>> + int nr_pages;
>> + /* Pages of this buffer. */
>> + struct page **pages;
>> +};
>> +
>> +struct xen_dmabuf_wait_obj {
>> + struct list_head next;
>> + struct xen_dmabuf *xen_dmabuf;
>> + struct completion completion;
>> +};
>> +
>> +struct xen_dmabuf_attachment {
>> + struct sg_table *sgt;
>> + enum dma_data_direction dir;
>> };
>> #endif
>>
>> @@ -320,6 +362,16 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
>> gntdev_free_map(map);
>> }
>>
>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>> +static void gntdev_remove_map(struct gntdev_priv *priv, struct grant_map *map)
>> +{
>> + mutex_lock(&priv->lock);
>> + list_del(&map->next);
>> + gntdev_put_map(NULL /* already removed */, map);
>> + mutex_unlock(&priv->lock);
>> +}
>> +#endif
>> +
>> /* ------------------------------------------------------------------ */
>>
>> static int find_grant_ptes(pte_t *pte, pgtable_t token,
>> @@ -628,6 +680,12 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>> INIT_LIST_HEAD(&priv->freeable_maps);
>> mutex_init(&priv->lock);
>>
>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>> + mutex_init(&priv->dmabuf_lock);
>> + INIT_LIST_HEAD(&priv->dmabuf_exp_list);
>> + INIT_LIST_HEAD(&priv->dmabuf_exp_wait_list);
>> +#endif
>> +
>> if (use_ptemod) {
>> priv->mm = get_task_mm(current);
>> if (!priv->mm) {
>> @@ -1053,17 +1111,433 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>> /* DMA buffer export support. */
>> /* ------------------------------------------------------------------ */
>>
>> +/* ------------------------------------------------------------------ */
>> +/* Implementation of wait for exported DMA buffer to be released. */
>> +/* ------------------------------------------------------------------ */
>> +
>> +static void dmabuf_exp_release(struct kref *kref);
>> +
>> +static struct xen_dmabuf_wait_obj *
>> +dmabuf_exp_wait_obj_new(struct gntdev_priv *priv,
>> + struct xen_dmabuf *xen_dmabuf)
>> +{
>> + struct xen_dmabuf_wait_obj *obj;
>> +
>> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
>> + if (!obj)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + init_completion(&obj->completion);
>> + obj->xen_dmabuf = xen_dmabuf;
>> +
>> + mutex_lock(&priv->dmabuf_lock);
>> + list_add(&obj->next, &priv->dmabuf_exp_wait_list);
>> + /* Put our reference and wait for xen_dmabuf's release to fire. */
>> + kref_put(&xen_dmabuf->u.exp.refcount, dmabuf_exp_release);
>> + mutex_unlock(&priv->dmabuf_lock);
>> + return obj;
>> +}
>> +
>> +static void dmabuf_exp_wait_obj_free(struct gntdev_priv *priv,
>> + struct xen_dmabuf_wait_obj *obj)
>> +{
>> + struct xen_dmabuf_wait_obj *cur_obj, *q;
>> +
>> + mutex_lock(&priv->dmabuf_lock);
>> + list_for_each_entry_safe(cur_obj, q, &priv->dmabuf_exp_wait_list, next)
>> + if (cur_obj == obj) {
>> + list_del(&obj->next);
>> + kfree(obj);
>> + break;
>> + }
>> + mutex_unlock(&priv->dmabuf_lock);
>> +}
>> +
>> +static int dmabuf_exp_wait_obj_wait(struct xen_dmabuf_wait_obj *obj,
>> + u32 wait_to_ms)
>> +{
>> + if (wait_for_completion_timeout(&obj->completion,
>> + msecs_to_jiffies(wait_to_ms)) <= 0)
>> + return -ETIMEDOUT;
>> +
>> + return 0;
>> +}
>> +
>> +static void dmabuf_exp_wait_obj_signal(struct gntdev_priv *priv,
>> + struct xen_dmabuf *xen_dmabuf)
>> +{
>> + struct xen_dmabuf_wait_obj *obj, *q;
>> +
>> + list_for_each_entry_safe(obj, q, &priv->dmabuf_exp_wait_list, next)
>> + if (obj->xen_dmabuf == xen_dmabuf) {
>> + pr_debug("Found xen_dmabuf in the wait list, wake\n");
>> + complete_all(&obj->completion);
>> + }
>> +}
>> +
>> +static struct xen_dmabuf *
>> +dmabuf_exp_wait_obj_get_by_fd(struct gntdev_priv *priv, int fd)
>> +{
>> + struct xen_dmabuf *q, *xen_dmabuf, *ret = ERR_PTR(-ENOENT);
>> +
>> + mutex_lock(&priv->dmabuf_lock);
>> + list_for_each_entry_safe(xen_dmabuf, q, &priv->dmabuf_exp_list, next)
>> + if (xen_dmabuf->fd == fd) {
>> + pr_debug("Found xen_dmabuf in the wait list\n");
>> + kref_get(&xen_dmabuf->u.exp.refcount);
>> + ret = xen_dmabuf;
>> + break;
>> + }
>> + mutex_unlock(&priv->dmabuf_lock);
>> + return ret;
>> +}
>> +
>> static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
>> int wait_to_ms)
>> {
>> - return -ETIMEDOUT;
>> + struct xen_dmabuf *xen_dmabuf;
>> + struct xen_dmabuf_wait_obj *obj;
>> + int ret;
>> +
>> + pr_debug("Will wait for dma-buf with fd %d\n", fd);
>> + /*
>> +	 * Try to find the DMA buffer: if it is not found then either the
>> +	 * buffer has already been released or the file descriptor provided
>> +	 * is wrong.
>> + */
>> + xen_dmabuf = dmabuf_exp_wait_obj_get_by_fd(priv, fd);
>> + if (IS_ERR(xen_dmabuf))
>> + return PTR_ERR(xen_dmabuf);
>> +
>> + /*
>> + * xen_dmabuf still exists and is reference count locked by us now,
>> + * so prepare to wait: allocate wait object and add it to the wait list,
>> + * so we can find it on release.
>> + */
>> + obj = dmabuf_exp_wait_obj_new(priv, xen_dmabuf);
>> + if (IS_ERR(obj)) {
>> + pr_err("Failed to setup wait object, ret %ld\n", PTR_ERR(obj));
>> + return PTR_ERR(obj);
>> + }
>> +
>> + ret = dmabuf_exp_wait_obj_wait(obj, wait_to_ms);
>> + dmabuf_exp_wait_obj_free(priv, obj);
>> + return ret;
>> +}
>> +
>> +/* ------------------------------------------------------------------ */
>> +/* DMA buffer export support. */
>> +/* ------------------------------------------------------------------ */
>> +
>> +static struct sg_table *
>> +dmabuf_pages_to_sgt(struct page **pages, unsigned int nr_pages)
>> +{
>> + struct sg_table *sgt;
>> + int ret;
>> +
>> + sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
>> + if (!sgt) {
>> + ret = -ENOMEM;
>> + goto out;
>> + }
>> +
>> + ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
>> + nr_pages << PAGE_SHIFT,
>> + GFP_KERNEL);
>> + if (ret)
>> + goto out;
>> +
>> + return sgt;
>> +
>> +out:
>> + kfree(sgt);
>> + return ERR_PTR(ret);
>> +}
>> +
>> +static int dmabuf_exp_ops_attach(struct dma_buf *dma_buf,
>> + struct device *target_dev,
>> + struct dma_buf_attachment *attach)
>> +{
>> + struct xen_dmabuf_attachment *xen_dmabuf_attach;
>> +
>> + xen_dmabuf_attach = kzalloc(sizeof(*xen_dmabuf_attach), GFP_KERNEL);
>> + if (!xen_dmabuf_attach)
>> + return -ENOMEM;
>> +
>> + xen_dmabuf_attach->dir = DMA_NONE;
>> + attach->priv = xen_dmabuf_attach;
>> + /* Might need to pin the pages of the buffer now. */
>> + return 0;
>> +}
>> +
>> +static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
>> + struct dma_buf_attachment *attach)
>> +{
>> + struct xen_dmabuf_attachment *xen_dmabuf_attach = attach->priv;
>> +
>> + if (xen_dmabuf_attach) {
>> + struct sg_table *sgt = xen_dmabuf_attach->sgt;
>> +
>> + if (sgt) {
>> + if (xen_dmabuf_attach->dir != DMA_NONE)
>> + dma_unmap_sg_attrs(attach->dev, sgt->sgl,
>> + sgt->nents,
>> + xen_dmabuf_attach->dir,
>> + DMA_ATTR_SKIP_CPU_SYNC);
>> + sg_free_table(sgt);
>> + }
>> +
>> + kfree(sgt);
>> + kfree(xen_dmabuf_attach);
>> + attach->priv = NULL;
>> + }
>> + /* Might need to unpin the pages of the buffer now. */
>> +}
>> +
>> +static struct sg_table *
>> +dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
>> + enum dma_data_direction dir)
>> +{
>> + struct xen_dmabuf_attachment *xen_dmabuf_attach = attach->priv;
>> + struct xen_dmabuf *xen_dmabuf = attach->dmabuf->priv;
>> + struct sg_table *sgt;
>> +
>> + pr_debug("Mapping %d pages for dev %p\n", xen_dmabuf->nr_pages,
>> + attach->dev);
>> +
>> + if (WARN_ON(dir == DMA_NONE || !xen_dmabuf_attach))
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* Return the cached mapping when possible. */
>> + if (xen_dmabuf_attach->dir == dir)
>> + return xen_dmabuf_attach->sgt;
> Don't we also need to check xen_dmabuf_attach->sgt == NULL here (i.e. the
> first-time mapping)? Also, I am not sure whether this mechanism of reusing a
> previously generated sgt for other mappings is acceptable for all use-cases;
> I don't know if the specification allows it.
Well, I was not sure about this piece of code either, so I'll probably
allocate a new sgt each time instead of reusing it as the code does now.
>> +
>> + /*
>> + * Two mappings with different directions for the same attachment are
>> + * not allowed.
>> + */
>> + if (WARN_ON(xen_dmabuf_attach->dir != DMA_NONE))
>> + return ERR_PTR(-EBUSY);
>> +
>> + sgt = dmabuf_pages_to_sgt(xen_dmabuf->pages, xen_dmabuf->nr_pages);
>> + if (!IS_ERR(sgt)) {
>> + if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
>> + DMA_ATTR_SKIP_CPU_SYNC)) {
>> + sg_free_table(sgt);
>> + kfree(sgt);
>> + sgt = ERR_PTR(-ENOMEM);
>> + } else {
>> + xen_dmabuf_attach->sgt = sgt;
>> + xen_dmabuf_attach->dir = dir;
>> + }
>> + }
>> + if (IS_ERR(sgt))
>> + pr_err("Failed to map sg table for dev %p\n", attach->dev);
>> + return sgt;
>> +}
>> +
>> +static void dmabuf_exp_ops_unmap_dma_buf(struct dma_buf_attachment *attach,
>> + struct sg_table *sgt,
>> + enum dma_data_direction dir)
>> +{
>> + /* Not implemented. The unmap is done at dmabuf_exp_ops_detach(). */
> Not sure if it's ok to do nothing here because the spec says this function is
> mandatory and it should unmap and "release" &sg_table associated with it.
>
> /**
> * @unmap_dma_buf:
> *
> * This is called by dma_buf_unmap_attachment() and should unmap and
> * release the &sg_table allocated in @map_dma_buf, and it is mandatory.
> * It should also unpin the backing storage if this is the last mapping
> * of the DMA buffer, if the exporter supports backing storage
> * migration.
> */
Yes, as I say at the top of the file, dma-buf handling is DRM PRIME
based, so I follow the same workflow as there.
Do you think we should be stricter and rework this?
Daniel, what do you think?
>> +}
>> +
>> +static void dmabuf_exp_release(struct kref *kref)
>> +{
>> + struct xen_dmabuf *xen_dmabuf =
>> + container_of(kref, struct xen_dmabuf, u.exp.refcount);
>> +
>> + dmabuf_exp_wait_obj_signal(xen_dmabuf->priv, xen_dmabuf);
>> + list_del(&xen_dmabuf->next);
>> + kfree(xen_dmabuf);
>> +}
>> +
>> +static void dmabuf_exp_ops_release(struct dma_buf *dma_buf)
>> +{
>> + struct xen_dmabuf *xen_dmabuf = dma_buf->priv;
>> + struct gntdev_priv *priv = xen_dmabuf->priv;
>> +
>> + gntdev_remove_map(priv, xen_dmabuf->u.exp.map);
>> + mutex_lock(&priv->dmabuf_lock);
>> + kref_put(&xen_dmabuf->u.exp.refcount, dmabuf_exp_release);
>> + mutex_unlock(&priv->dmabuf_lock);
>> +}
>> +
>> +static void *dmabuf_exp_ops_kmap_atomic(struct dma_buf *dma_buf,
>> + unsigned long page_num)
>> +{
>> + /* Not implemented. */
>> + return NULL;
>> +}
>> +
>> +static void dmabuf_exp_ops_kunmap_atomic(struct dma_buf *dma_buf,
>> + unsigned long page_num, void *addr)
>> +{
>> + /* Not implemented. */
>> +}
>> +
>> +static void *dmabuf_exp_ops_kmap(struct dma_buf *dma_buf,
>> + unsigned long page_num)
>> +{
>> + /* Not implemented. */
>> + return NULL;
>> +}
>> +
>> +static void dmabuf_exp_ops_kunmap(struct dma_buf *dma_buf,
>> + unsigned long page_num, void *addr)
>> +{
>> + /* Not implemented. */
>> +}
>> +
>> +static int dmabuf_exp_ops_mmap(struct dma_buf *dma_buf,
>> + struct vm_area_struct *vma)
>> +{
>> + /* Not implemented. */
>> + return 0;
>> +}
>> +
>> +static const struct dma_buf_ops dmabuf_exp_ops = {
>> + .attach = dmabuf_exp_ops_attach,
>> + .detach = dmabuf_exp_ops_detach,
>> + .map_dma_buf = dmabuf_exp_ops_map_dma_buf,
>> + .unmap_dma_buf = dmabuf_exp_ops_unmap_dma_buf,
>> + .release = dmabuf_exp_ops_release,
>> + .map = dmabuf_exp_ops_kmap,
>> + .map_atomic = dmabuf_exp_ops_kmap_atomic,
>> + .unmap = dmabuf_exp_ops_kunmap,
>> + .unmap_atomic = dmabuf_exp_ops_kunmap_atomic,
>> + .mmap = dmabuf_exp_ops_mmap,
>> +};
>> +
>> +static int dmabuf_export(struct gntdev_priv *priv, struct grant_map *map,
>> + int *fd)
>> +{
>> + DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>> + struct xen_dmabuf *xen_dmabuf;
>> + int ret = 0;
>> +
>> + xen_dmabuf = kzalloc(sizeof(*xen_dmabuf), GFP_KERNEL);
>> + if (!xen_dmabuf)
>> + return -ENOMEM;
>> +
>> + kref_init(&xen_dmabuf->u.exp.refcount);
>> +
>> + xen_dmabuf->priv = priv;
>> + xen_dmabuf->nr_pages = map->count;
>> + xen_dmabuf->pages = map->pages;
>> + xen_dmabuf->u.exp.map = map;
>> +
>> + exp_info.exp_name = KBUILD_MODNAME;
>> + if (map->dma_dev->driver && map->dma_dev->driver->owner)
>> + exp_info.owner = map->dma_dev->driver->owner;
>> + else
>> + exp_info.owner = THIS_MODULE;
>> + exp_info.ops = &dmabuf_exp_ops;
>> + exp_info.size = map->count << PAGE_SHIFT;
>> + exp_info.flags = O_RDWR;
>> + exp_info.priv = xen_dmabuf;
>> +
>> + xen_dmabuf->dmabuf = dma_buf_export(&exp_info);
>> + if (IS_ERR(xen_dmabuf->dmabuf)) {
>> + ret = PTR_ERR(xen_dmabuf->dmabuf);
>> + xen_dmabuf->dmabuf = NULL;
>> + goto fail;
>> + }
>> +
>> + ret = dma_buf_fd(xen_dmabuf->dmabuf, O_CLOEXEC);
>> + if (ret < 0)
>> + goto fail;
>> +
>> + xen_dmabuf->fd = ret;
>> + *fd = ret;
>> +
>> + pr_debug("Exporting DMA buffer with fd %d\n", ret);
>> +
>> + mutex_lock(&priv->dmabuf_lock);
>> + list_add(&xen_dmabuf->next, &priv->dmabuf_exp_list);
>> + mutex_unlock(&priv->dmabuf_lock);
>> + return 0;
>> +
>> +fail:
>> + if (xen_dmabuf->dmabuf)
>> + dma_buf_put(xen_dmabuf->dmabuf);
>> + kfree(xen_dmabuf);
>> + return ret;
>> +}
>> +
>> +static struct grant_map *
>> +dmabuf_exp_alloc_backing_storage(struct gntdev_priv *priv, int dmabuf_flags,
>> + int count)
>> +{
>> + struct grant_map *map;
>> +
>> + if (unlikely(count <= 0))
>> + return ERR_PTR(-EINVAL);
>> +
>> + if ((dmabuf_flags & GNTDEV_DMA_FLAG_WC) &&
>> + (dmabuf_flags & GNTDEV_DMA_FLAG_COHERENT)) {
>> + pr_err("Wrong dma-buf flags: either WC or coherent, not both\n");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> + map = gntdev_alloc_map(priv, count, dmabuf_flags);
>> + if (!map)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + if (unlikely(atomic_add_return(count, &pages_mapped) > limit)) {
>> + pr_err("can't map: over limit\n");
>> + gntdev_put_map(NULL, map);
>> + return ERR_PTR(-ENOMEM);
>> + }
>> + return map;
>> }
> When and how would this allocation be freed? I don't see any ioctl for freeing up
> shared pages.
The map is freed from the xen_dmabuf .release callback, which is reference counted.
>>
>> static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
>> int count, u32 domid, u32 *refs, u32 *fd)
>> {
>> + struct grant_map *map;
>> + int i, ret;
>> +
>> *fd = -1;
>> - return -EINVAL;
>> +
>> + if (use_ptemod) {
>> + pr_err("Cannot provide dma-buf: use_ptemod %d\n",
>> + use_ptemod);
>> + return -EINVAL;
>> + }
>> +
>> + map = dmabuf_exp_alloc_backing_storage(priv, flags, count);
>> + if (IS_ERR(map))
>> + return PTR_ERR(map);
>> +
>> + for (i = 0; i < count; i++) {
>> + map->grants[i].domid = domid;
>> + map->grants[i].ref = refs[i];
>> + }
>> +
>> + mutex_lock(&priv->lock);
>> + gntdev_add_map(priv, map);
>> + mutex_unlock(&priv->lock);
>> +
>> + map->flags |= GNTMAP_host_map;
>> +#if defined(CONFIG_X86)
>> + map->flags |= GNTMAP_device_map;
>> +#endif
>> +
>> + ret = map_grant_pages(map);
>> + if (ret < 0)
>> + goto out;
>> +
>> + ret = dmabuf_export(priv, map, fd);
>> + if (ret < 0)
>> + goto out;
>> +
>> + return 0;
>> +
>> +out:
>> + gntdev_remove_map(priv, map);
>> + return ret;
>> }
>>
>> /* ------------------------------------------------------------------ */
>> --
>> 2.17.0
>>
On 05/30/2018 10:24 PM, Boris Ostrovsky wrote:
> On 05/30/2018 01:46 PM, Oleksandr Andrushchenko wrote:
>> On 05/30/2018 06:54 PM, Boris Ostrovsky wrote:
>>>
>>> BTW, I also think you can further simplify
>>> xenmem_reservation_va_mapping_* routines by bailing out right away if
>>> xen_feature(XENFEAT_auto_translated_physmap). In fact, you might even
>>> make them inlines, along the lines of
>>>
>>> inline void xenmem_reservation_va_mapping_reset(unsigned long count,
>>> struct page **pages)
>>> {
>>> #ifdef CONFIG_XEN_HAVE_PVMMU
>>> if (!xen_feature(XENFEAT_auto_translated_physmap))
>>> __xenmem_reservation_va_mapping_reset(...)
>>> #endif
>>> }
>> How about:
>>
>> #ifdef CONFIG_XEN_HAVE_PVMMU
>> static inline void __xenmem_reservation_va_mapping_reset(struct page *page)
>> {
>> [...]
>> }
>> #endif
>>
>> and
>>
>> void xenmem_reservation_va_mapping_reset(unsigned long count,
>> struct page **pages)
>> {
>> #ifdef CONFIG_XEN_HAVE_PVMMU
>> if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> int i;
>>
>> for (i = 0; i < count; i++)
>> __xenmem_reservation_va_mapping_reset(pages[i]);
>> }
>> #endif
>> }
>>
>> This way I can use __xenmem_reservation_va_mapping_reset(page);
>> instead of xenmem_reservation_va_mapping_reset(1, &page);
>
> Sure, this also works.
Could you please take a look at the attached patch and check whether this is what we want?
> -boris
>
Thank you,
Oleksandr
On 05/31/2018 10:51 AM, Oleksandr Andrushchenko wrote:
> On 05/30/2018 10:24 PM, Boris Ostrovsky wrote:
>> On 05/30/2018 01:46 PM, Oleksandr Andrushchenko wrote:
>>> On 05/30/2018 06:54 PM, Boris Ostrovsky wrote:
>>>>
>>>> BTW, I also think you can further simplify
>>>> xenmem_reservation_va_mapping_* routines by bailing out right away if
>>>> xen_feature(XENFEAT_auto_translated_physmap). In fact, you might even
>>>> make them inlines, along the lines of
>>>>
>>>> inline void xenmem_reservation_va_mapping_reset(unsigned long count,
>>>> struct page **pages)
>>>> {
>>>> #ifdef CONFIG_XEN_HAVE_PVMMU
>>>> if (!xen_feature(XENFEAT_auto_translated_physmap))
>>>> __xenmem_reservation_va_mapping_reset(...)
>>>> #endif
>>>> }
>>> How about:
>>>
>>> #ifdef CONFIG_XEN_HAVE_PVMMU
>>> static inline void __xenmem_reservation_va_mapping_reset(struct page *page)
>>> {
>>> [...]
>>> }
>>> #endif
>>>
>>> and
>>>
>>> void xenmem_reservation_va_mapping_reset(unsigned long count,
>>> struct page **pages)
>>> {
>>> #ifdef CONFIG_XEN_HAVE_PVMMU
>>> if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>>> int i;
>>>
>>> for (i = 0; i < count; i++)
>>> __xenmem_reservation_va_mapping_reset(pages[i]);
>>> }
>>> #endif
>>> }
>>>
>>> This way I can use __xenmem_reservation_va_mapping_reset(page);
>>> instead of xenmem_reservation_va_mapping_reset(1, &page);
>>
>> Sure, this also works.
> Could you please take a look at the attached patch and check whether this is what we want?
Please ignore it, it is ugly ;)
I have implemented this as you suggested:
static inline void
xenmem_reservation_va_mapping_update(unsigned long count,
struct page **pages,
xen_pfn_t *frames)
{
#ifdef CONFIG_XEN_HAVE_PVMMU
if (!xen_feature(XENFEAT_auto_translated_physmap))
__xenmem_reservation_va_mapping_update(count, pages, frames);
#endif
}
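The reset counterpart would presumably follow the same pattern; a sketch, assuming __xenmem_reservation_va_mapping_reset ends up taking the same (count, pages) signature as the update variant:

```c
static inline void
xenmem_reservation_va_mapping_reset(unsigned long count,
				    struct page **pages)
{
#ifdef CONFIG_XEN_HAVE_PVMMU
	/* Only PV guests need the VA mappings torn down. */
	if (!xen_feature(XENFEAT_auto_translated_physmap))
		__xenmem_reservation_va_mapping_reset(count, pages);
#endif
}
```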
>> -boris
>>
> Thank you,
> Oleksandr
On 05/31/2018 08:55 AM, Oleksandr Andrushchenko wrote:
> On 05/31/2018 02:10 AM, Dongwon Kim wrote:
>> On Fri, May 25, 2018 at 06:33:29PM +0300, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <[email protected]>
>>>
>>> 1. Create a dma-buf from grant references provided by the foreign
>>> domain. By default dma-buf is backed by system memory pages, but
>>> by providing GNTDEV_DMA_FLAG_XXX flags it can also be created
>>> as a DMA write-combine/coherent buffer, e.g. allocated with
>>> corresponding dma_alloc_xxx API.
>>> Export the resulting buffer as a new dma-buf.
>>>
>>> 2. Implement waiting for the dma-buf to be released: block until the
>>> dma-buf with the file descriptor provided is released.
>>> If within the time-out provided the buffer is not released then
>>> -ETIMEDOUT error is returned. If the buffer with the file
>>> descriptor
>>> does not exist or has already been released, then -ENOENT is
>>> returned.
>>> For valid file descriptors this must not be treated as an error.
>>>
>>> Signed-off-by: Oleksandr Andrushchenko
>>> <[email protected]>
>>> ---
>>> drivers/xen/gntdev.c | 478
>>> ++++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 476 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>> index 9e450622af1a..52abc6cd5846 100644
>>> --- a/drivers/xen/gntdev.c
>>> +++ b/drivers/xen/gntdev.c
>>> @@ -4,6 +4,8 @@
>>> * Device for accessing (in user-space) pages that have been
>>> granted by other
>>> * domains.
>>> *
>>> + * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
>>> + *
>>> * Copyright (c) 2006-2007, D G Murray.
>>> * (c) 2009 Gerd Hoffmann <[email protected]>
>>> * (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
>>> @@ -41,6 +43,9 @@
>>> #ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>> #include <linux/of_device.h>
>>> #endif
>>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>>> +#include <linux/dma-buf.h>
>>> +#endif
>>> #include <xen/xen.h>
>>> #include <xen/grant_table.h>
>>> @@ -81,6 +86,17 @@ struct gntdev_priv {
>>> /* Device for which DMA memory is allocated. */
>>> struct device *dma_dev;
>>> #endif
>>> +
>>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>>> + /* Private data of the hyper DMA buffers. */
>>> +
>>> + /* List of exported DMA buffers. */
>>> + struct list_head dmabuf_exp_list;
>>> + /* List of wait objects. */
>>> + struct list_head dmabuf_exp_wait_list;
>>> + /* This is the lock which protects dma_buf_xxx lists. */
>>> + struct mutex dmabuf_lock;
>>> +#endif
>>> };
>>> struct unmap_notify {
>>> @@ -125,12 +141,38 @@ struct grant_map {
>>> #ifdef CONFIG_XEN_GNTDEV_DMABUF
>>> struct xen_dmabuf {
>>> + struct gntdev_priv *priv;
>>> + struct dma_buf *dmabuf;
>>> + struct list_head next;
>>> + int fd;
>>> +
>>> union {
>>> + struct {
>>> + /* Exported buffers are reference counted. */
>>> + struct kref refcount;
>>> + struct grant_map *map;
>>> + } exp;
>>> struct {
>>> /* Granted references of the imported buffer. */
>>> grant_ref_t *refs;
>>> } imp;
>>> } u;
>>> +
>>> + /* Number of pages this buffer has. */
>>> + int nr_pages;
>>> + /* Pages of this buffer. */
>>> + struct page **pages;
>>> +};
>>> +
>>> +struct xen_dmabuf_wait_obj {
>>> + struct list_head next;
>>> + struct xen_dmabuf *xen_dmabuf;
>>> + struct completion completion;
>>> +};
>>> +
>>> +struct xen_dmabuf_attachment {
>>> + struct sg_table *sgt;
>>> + enum dma_data_direction dir;
>>> };
>>> #endif
>>> @@ -320,6 +362,16 @@ static void gntdev_put_map(struct gntdev_priv
>>> *priv, struct grant_map *map)
>>> gntdev_free_map(map);
>>> }
>>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>>> +static void gntdev_remove_map(struct gntdev_priv *priv, struct
>>> grant_map *map)
>>> +{
>>> + mutex_lock(&priv->lock);
>>> + list_del(&map->next);
>>> + gntdev_put_map(NULL /* already removed */, map);
>>> + mutex_unlock(&priv->lock);
>>> +}
>>> +#endif
>>> +
>>> /*
>>> ------------------------------------------------------------------ */
>>> static int find_grant_ptes(pte_t *pte, pgtable_t token,
>>> @@ -628,6 +680,12 @@ static int gntdev_open(struct inode *inode,
>>> struct file *flip)
>>> INIT_LIST_HEAD(&priv->freeable_maps);
>>> mutex_init(&priv->lock);
>>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>>> + mutex_init(&priv->dmabuf_lock);
>>> + INIT_LIST_HEAD(&priv->dmabuf_exp_list);
>>> + INIT_LIST_HEAD(&priv->dmabuf_exp_wait_list);
>>> +#endif
>>> +
>>> if (use_ptemod) {
>>> priv->mm = get_task_mm(current);
>>> if (!priv->mm) {
>>> @@ -1053,17 +1111,433 @@ static long gntdev_ioctl_grant_copy(struct
>>> gntdev_priv *priv, void __user *u)
>>> /* DMA buffer export
>>> support. */
>>> /*
>>> ------------------------------------------------------------------ */
>>> +/*
>>> ------------------------------------------------------------------ */
>>> +/* Implementation of wait for exported DMA buffer to be
>>> released. */
>>> +/*
>>> ------------------------------------------------------------------ */
>>> +
>>> +static void dmabuf_exp_release(struct kref *kref);
>>> +
>>> +static struct xen_dmabuf_wait_obj *
>>> +dmabuf_exp_wait_obj_new(struct gntdev_priv *priv,
>>> + struct xen_dmabuf *xen_dmabuf)
>>> +{
>>> + struct xen_dmabuf_wait_obj *obj;
>>> +
>>> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
>>> + if (!obj)
>>> + return ERR_PTR(-ENOMEM);
>>> +
>>> + init_completion(&obj->completion);
>>> + obj->xen_dmabuf = xen_dmabuf;
>>> +
>>> + mutex_lock(&priv->dmabuf_lock);
>>> + list_add(&obj->next, &priv->dmabuf_exp_wait_list);
>>> + /* Put our reference and wait for xen_dmabuf's release to fire. */
>>> + kref_put(&xen_dmabuf->u.exp.refcount, dmabuf_exp_release);
>>> + mutex_unlock(&priv->dmabuf_lock);
>>> + return obj;
>>> +}
>>> +
>>> +static void dmabuf_exp_wait_obj_free(struct gntdev_priv *priv,
>>> + struct xen_dmabuf_wait_obj *obj)
>>> +{
>>> + struct xen_dmabuf_wait_obj *cur_obj, *q;
>>> +
>>> + mutex_lock(&priv->dmabuf_lock);
>>> + list_for_each_entry_safe(cur_obj, q,
>>> &priv->dmabuf_exp_wait_list, next)
>>> + if (cur_obj == obj) {
>>> + list_del(&obj->next);
>>> + kfree(obj);
>>> + break;
>>> + }
>>> + mutex_unlock(&priv->dmabuf_lock);
>>> +}
>>> +
>>> +static int dmabuf_exp_wait_obj_wait(struct xen_dmabuf_wait_obj *obj,
>>> + u32 wait_to_ms)
>>> +{
>>> + if (wait_for_completion_timeout(&obj->completion,
>>> + msecs_to_jiffies(wait_to_ms)) <= 0)
>>> + return -ETIMEDOUT;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void dmabuf_exp_wait_obj_signal(struct gntdev_priv *priv,
>>> + struct xen_dmabuf *xen_dmabuf)
>>> +{
>>> + struct xen_dmabuf_wait_obj *obj, *q;
>>> +
>>> + list_for_each_entry_safe(obj, q, &priv->dmabuf_exp_wait_list,
>>> next)
>>> + if (obj->xen_dmabuf == xen_dmabuf) {
>>> + pr_debug("Found xen_dmabuf in the wait list, wake\n");
>>> + complete_all(&obj->completion);
>>> + }
>>> +}
>>> +
>>> +static struct xen_dmabuf *
>>> +dmabuf_exp_wait_obj_get_by_fd(struct gntdev_priv *priv, int fd)
>>> +{
>>> + struct xen_dmabuf *q, *xen_dmabuf, *ret = ERR_PTR(-ENOENT);
>>> +
>>> + mutex_lock(&priv->dmabuf_lock);
>>> + list_for_each_entry_safe(xen_dmabuf, q, &priv->dmabuf_exp_list,
>>> next)
>>> + if (xen_dmabuf->fd == fd) {
>>> + pr_debug("Found xen_dmabuf in the wait list\n");
>>> + kref_get(&xen_dmabuf->u.exp.refcount);
>>> + ret = xen_dmabuf;
>>> + break;
>>> + }
>>> + mutex_unlock(&priv->dmabuf_lock);
>>> + return ret;
>>> +}
>>> +
>>> static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
>>> int wait_to_ms)
>>> {
>>> - return -ETIMEDOUT;
>>> + struct xen_dmabuf *xen_dmabuf;
>>> + struct xen_dmabuf_wait_obj *obj;
>>> + int ret;
>>> +
>>> + pr_debug("Will wait for dma-buf with fd %d\n", fd);
>>> + /*
>>> + * Try to find the DMA buffer: if not found means that
>>> + * either the buffer has already been released or file descriptor
>>> + * provided is wrong.
>>> + */
>>> + xen_dmabuf = dmabuf_exp_wait_obj_get_by_fd(priv, fd);
>>> + if (IS_ERR(xen_dmabuf))
>>> + return PTR_ERR(xen_dmabuf);
>>> +
>>> + /*
>>> + * xen_dmabuf still exists and is reference count locked by us
>>> now,
>>> + * so prepare to wait: allocate wait object and add it to the
>>> wait list,
>>> + * so we can find it on release.
>>> + */
>>> + obj = dmabuf_exp_wait_obj_new(priv, xen_dmabuf);
>>> + if (IS_ERR(obj)) {
>>> + pr_err("Failed to setup wait object, ret %ld\n",
>>> PTR_ERR(obj));
>>> + return PTR_ERR(obj);
>>> + }
>>> +
>>> + ret = dmabuf_exp_wait_obj_wait(obj, wait_to_ms);
>>> + dmabuf_exp_wait_obj_free(priv, obj);
>>> + return ret;
>>> +}
>>> +
>>> +/*
>>> ------------------------------------------------------------------ */
>>> +/* DMA buffer export
>>> support. */
>>> +/*
>>> ------------------------------------------------------------------ */
>>> +
>>> +static struct sg_table *
>>> +dmabuf_pages_to_sgt(struct page **pages, unsigned int nr_pages)
>>> +{
>>> + struct sg_table *sgt;
>>> + int ret;
>>> +
>>> + sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
>>> + if (!sgt) {
>>> + ret = -ENOMEM;
>>> + goto out;
>>> + }
>>> +
>>> + ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
>>> + nr_pages << PAGE_SHIFT,
>>> + GFP_KERNEL);
>>> + if (ret)
>>> + goto out;
>>> +
>>> + return sgt;
>>> +
>>> +out:
>>> + kfree(sgt);
>>> + return ERR_PTR(ret);
>>> +}
>>> +
>>> +static int dmabuf_exp_ops_attach(struct dma_buf *dma_buf,
>>> + struct device *target_dev,
>>> + struct dma_buf_attachment *attach)
>>> +{
>>> + struct xen_dmabuf_attachment *xen_dmabuf_attach;
>>> +
>>> + xen_dmabuf_attach = kzalloc(sizeof(*xen_dmabuf_attach),
>>> GFP_KERNEL);
>>> + if (!xen_dmabuf_attach)
>>> + return -ENOMEM;
>>> +
>>> + xen_dmabuf_attach->dir = DMA_NONE;
>>> + attach->priv = xen_dmabuf_attach;
>>> + /* Might need to pin the pages of the buffer now. */
>>> + return 0;
>>> +}
>>> +
>>> +static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
>>> + struct dma_buf_attachment *attach)
>>> +{
>>> + struct xen_dmabuf_attachment *xen_dmabuf_attach = attach->priv;
>>> +
>>> + if (xen_dmabuf_attach) {
>>> + struct sg_table *sgt = xen_dmabuf_attach->sgt;
>>> +
>>> + if (sgt) {
>>> + if (xen_dmabuf_attach->dir != DMA_NONE)
>>> + dma_unmap_sg_attrs(attach->dev, sgt->sgl,
>>> + sgt->nents,
>>> + xen_dmabuf_attach->dir,
>>> + DMA_ATTR_SKIP_CPU_SYNC);
>>> + sg_free_table(sgt);
>>> + }
>>> +
>>> + kfree(sgt);
>>> + kfree(xen_dmabuf_attach);
>>> + attach->priv = NULL;
>>> + }
>>> + /* Might need to unpin the pages of the buffer now. */
>>> +}
>>> +
>>> +static struct sg_table *
>>> +dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
>>> + enum dma_data_direction dir)
>>> +{
>>> + struct xen_dmabuf_attachment *xen_dmabuf_attach = attach->priv;
>>> + struct xen_dmabuf *xen_dmabuf = attach->dmabuf->priv;
>>> + struct sg_table *sgt;
>>> +
>>> + pr_debug("Mapping %d pages for dev %p\n", xen_dmabuf->nr_pages,
>>> + attach->dev);
>>> +
>>> + if (WARN_ON(dir == DMA_NONE || !xen_dmabuf_attach))
>>> + return ERR_PTR(-EINVAL);
>>> +
>>> + /* Return the cached mapping when possible. */
>>> + if (xen_dmabuf_attach->dir == dir)
>>> + return xen_dmabuf_attach->sgt;
>> may need to check xen_dmabuf_attach->sgt == NULL (i.e. first time
>> mapping)?
>> Also, I am not sure if this mechanism of reusing previously generated
>> sgt
>> for other mappings is universally ok for any use-cases... I don't
>> know if
>> it is acceptable as per the specification.
> Well, I was not sure about this piece of code either,
> so I'll probably allocate a new sgt each time instead of
> reusing it as is done now.
The sgt is returned for the same attachment, so it is ok to return the
cached one, as we also check that the direction has not changed.
So, I'll leave it as is.
>>> +
>>> + /*
>>> + * Two mappings with different directions for the same
>>> attachment are
>>> + * not allowed.
>>> + */
>>> + if (WARN_ON(xen_dmabuf_attach->dir != DMA_NONE))
>>> + return ERR_PTR(-EBUSY);
>>> +
>>> + sgt = dmabuf_pages_to_sgt(xen_dmabuf->pages,
>>> xen_dmabuf->nr_pages);
>>> + if (!IS_ERR(sgt)) {
>>> + if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
>>> + DMA_ATTR_SKIP_CPU_SYNC)) {
>>> + sg_free_table(sgt);
>>> + kfree(sgt);
>>> + sgt = ERR_PTR(-ENOMEM);
>>> + } else {
>>> + xen_dmabuf_attach->sgt = sgt;
>>> + xen_dmabuf_attach->dir = dir;
>>> + }
>>> + }
>>> + if (IS_ERR(sgt))
>>> + pr_err("Failed to map sg table for dev %p\n", attach->dev);
>>> + return sgt;
>>> +}
>>> +
>>> +static void dmabuf_exp_ops_unmap_dma_buf(struct dma_buf_attachment
>>> *attach,
>>> + struct sg_table *sgt,
>>> + enum dma_data_direction dir)
>>> +{
>>> + /* Not implemented. The unmap is done at
>>> dmabuf_exp_ops_detach(). */
>> Not sure if it's ok to do nothing here because the spec says this
>> function is
>> mandatory and it should unmap and "release" &sg_table associated with
>> it.
>>
>> /**
>> * @unmap_dma_buf:
>> *
>> * This is called by dma_buf_unmap_attachment() and should unmap and
>> * release the &sg_table allocated in @map_dma_buf, and it is
>> mandatory.
>> * It should also unpin the backing storage if this is the last
>> mapping
>> * of the DMA buffer, if the exporter supports backing storage
>> * migration.
>> */
> Yes, as I say at the top of the file, dma-buf handling is DRM PRIME
> based, so I follow the same workflow as there.
> Do you think we should be stricter and rework this?
>
> Daniel, what do you think?
I see other drivers in the kernel do the same, and I think the *should*
in the dma-buf documentation does allow that.
So, I'll leave it as is for now.
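For the record, a stricter variant that follows the @unmap_dma_buf wording to the letter would move the cleanup out of detach, roughly as sketched below. This is only an illustration of the rework being discussed, not code from the posted patch:

```c
/*
 * Hypothetical stricter @unmap_dma_buf: unmap and release the sg_table
 * here, as the dma-buf documentation suggests, instead of deferring the
 * cleanup to dmabuf_exp_ops_detach(). The bookkeeping below mirrors
 * struct xen_dmabuf_attachment from the patch.
 */
static void dmabuf_exp_ops_unmap_dma_buf(struct dma_buf_attachment *attach,
					 struct sg_table *sgt,
					 enum dma_data_direction dir)
{
	struct xen_dmabuf_attachment *xen_dmabuf_attach = attach->priv;

	dma_unmap_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
			   DMA_ATTR_SKIP_CPU_SYNC);
	sg_free_table(sgt);
	kfree(sgt);

	if (xen_dmabuf_attach) {
		/* Allow a later map_dma_buf() with a new direction. */
		xen_dmabuf_attach->sgt = NULL;
		xen_dmabuf_attach->dir = DMA_NONE;
	}
}
```

With this variant, dmabuf_exp_ops_detach() would only need to free the attachment itself.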
>>> +}
>>> +
>>> +static void dmabuf_exp_release(struct kref *kref)
>>> +{
>>> + struct xen_dmabuf *xen_dmabuf =
>>> + container_of(kref, struct xen_dmabuf, u.exp.refcount);
>>> +
>>> + dmabuf_exp_wait_obj_signal(xen_dmabuf->priv, xen_dmabuf);
>>> + list_del(&xen_dmabuf->next);
>>> + kfree(xen_dmabuf);
>>> +}
>>> +
>>> +static void dmabuf_exp_ops_release(struct dma_buf *dma_buf)
>>> +{
>>> + struct xen_dmabuf *xen_dmabuf = dma_buf->priv;
>>> + struct gntdev_priv *priv = xen_dmabuf->priv;
>>> +
>>> + gntdev_remove_map(priv, xen_dmabuf->u.exp.map);
>>> + mutex_lock(&priv->dmabuf_lock);
>>> + kref_put(&xen_dmabuf->u.exp.refcount, dmabuf_exp_release);
>>> + mutex_unlock(&priv->dmabuf_lock);
>>> +}
>>> +
>>> +static void *dmabuf_exp_ops_kmap_atomic(struct dma_buf *dma_buf,
>>> + unsigned long page_num)
>>> +{
>>> + /* Not implemented. */
>>> + return NULL;
>>> +}
>>> +
>>> +static void dmabuf_exp_ops_kunmap_atomic(struct dma_buf *dma_buf,
>>> + unsigned long page_num, void *addr)
>>> +{
>>> + /* Not implemented. */
>>> +}
>>> +
>>> +static void *dmabuf_exp_ops_kmap(struct dma_buf *dma_buf,
>>> + unsigned long page_num)
>>> +{
>>> + /* Not implemented. */
>>> + return NULL;
>>> +}
>>> +
>>> +static void dmabuf_exp_ops_kunmap(struct dma_buf *dma_buf,
>>> + unsigned long page_num, void *addr)
>>> +{
>>> + /* Not implemented. */
>>> +}
>>> +
>>> +static int dmabuf_exp_ops_mmap(struct dma_buf *dma_buf,
>>> + struct vm_area_struct *vma)
>>> +{
>>> + /* Not implemented. */
>>> + return 0;
>>> +}
>>> +
>>> +static const struct dma_buf_ops dmabuf_exp_ops = {
>>> + .attach = dmabuf_exp_ops_attach,
>>> + .detach = dmabuf_exp_ops_detach,
>>> + .map_dma_buf = dmabuf_exp_ops_map_dma_buf,
>>> + .unmap_dma_buf = dmabuf_exp_ops_unmap_dma_buf,
>>> + .release = dmabuf_exp_ops_release,
>>> + .map = dmabuf_exp_ops_kmap,
>>> + .map_atomic = dmabuf_exp_ops_kmap_atomic,
>>> + .unmap = dmabuf_exp_ops_kunmap,
>>> + .unmap_atomic = dmabuf_exp_ops_kunmap_atomic,
>>> + .mmap = dmabuf_exp_ops_mmap,
>>> +};
>>> +
>>> +static int dmabuf_export(struct gntdev_priv *priv, struct grant_map
>>> *map,
>>> + int *fd)
>>> +{
>>> + DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>>> + struct xen_dmabuf *xen_dmabuf;
>>> + int ret = 0;
>>> +
>>> + xen_dmabuf = kzalloc(sizeof(*xen_dmabuf), GFP_KERNEL);
>>> + if (!xen_dmabuf)
>>> + return -ENOMEM;
>>> +
>>> + kref_init(&xen_dmabuf->u.exp.refcount);
>>> +
>>> + xen_dmabuf->priv = priv;
>>> + xen_dmabuf->nr_pages = map->count;
>>> + xen_dmabuf->pages = map->pages;
>>> + xen_dmabuf->u.exp.map = map;
>>> +
>>> + exp_info.exp_name = KBUILD_MODNAME;
>>> + if (map->dma_dev->driver && map->dma_dev->driver->owner)
>>> + exp_info.owner = map->dma_dev->driver->owner;
>>> + else
>>> + exp_info.owner = THIS_MODULE;
>>> + exp_info.ops = &dmabuf_exp_ops;
>>> + exp_info.size = map->count << PAGE_SHIFT;
>>> + exp_info.flags = O_RDWR;
>>> + exp_info.priv = xen_dmabuf;
>>> +
>>> + xen_dmabuf->dmabuf = dma_buf_export(&exp_info);
>>> + if (IS_ERR(xen_dmabuf->dmabuf)) {
>>> + ret = PTR_ERR(xen_dmabuf->dmabuf);
>>> + xen_dmabuf->dmabuf = NULL;
>>> + goto fail;
>>> + }
>>> +
>>> + ret = dma_buf_fd(xen_dmabuf->dmabuf, O_CLOEXEC);
>>> + if (ret < 0)
>>> + goto fail;
>>> +
>>> + xen_dmabuf->fd = ret;
>>> + *fd = ret;
>>> +
>>> + pr_debug("Exporting DMA buffer with fd %d\n", ret);
>>> +
>>> + mutex_lock(&priv->dmabuf_lock);
>>> + list_add(&xen_dmabuf->next, &priv->dmabuf_exp_list);
>>> + mutex_unlock(&priv->dmabuf_lock);
>>> + return 0;
>>> +
>>> +fail:
>>> + if (xen_dmabuf->dmabuf)
>>> + dma_buf_put(xen_dmabuf->dmabuf);
>>> + kfree(xen_dmabuf);
>>> + return ret;
>>> +}
>>> +
>>> +static struct grant_map *
>>> +dmabuf_exp_alloc_backing_storage(struct gntdev_priv *priv, int
>>> dmabuf_flags,
>>> + int count)
>>> +{
>>> + struct grant_map *map;
>>> +
>>> + if (unlikely(count <= 0))
>>> + return ERR_PTR(-EINVAL);
>>> +
>>> + if ((dmabuf_flags & GNTDEV_DMA_FLAG_WC) &&
>>> + (dmabuf_flags & GNTDEV_DMA_FLAG_COHERENT)) {
>>> + pr_err("Wrong dma-buf flags: either WC or coherent, not
>>> both\n");
>>> + return ERR_PTR(-EINVAL);
>>> + }
>>> +
>>> + map = gntdev_alloc_map(priv, count, dmabuf_flags);
>>> + if (!map)
>>> + return ERR_PTR(-ENOMEM);
>>> +
>>> + if (unlikely(atomic_add_return(count, &pages_mapped) > limit)) {
>>> + pr_err("can't map: over limit\n");
>>> + gntdev_put_map(NULL, map);
>>> + return ERR_PTR(-ENOMEM);
>>> + }
>>> + return map;
>>> }
>> When and how would this allocation be freed? I don't see any ioctl
>> for freeing up
>> shared pages.
> The map is freed from the xen_dmabuf .release callback, which is reference counted.
>>> static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int
>>> flags,
>>> int count, u32 domid, u32 *refs, u32 *fd)
>>> {
>>> + struct grant_map *map;
>>> + int i, ret;
>>> +
>>> *fd = -1;
>>> - return -EINVAL;
>>> +
>>> + if (use_ptemod) {
>>> + pr_err("Cannot provide dma-buf: use_ptemod %d\n",
>>> + use_ptemod);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + map = dmabuf_exp_alloc_backing_storage(priv, flags, count);
>>> + if (IS_ERR(map))
>>> + return PTR_ERR(map);
>>> +
>>> + for (i = 0; i < count; i++) {
>>> + map->grants[i].domid = domid;
>>> + map->grants[i].ref = refs[i];
>>> + }
>>> +
>>> + mutex_lock(&priv->lock);
>>> + gntdev_add_map(priv, map);
>>> + mutex_unlock(&priv->lock);
>>> +
>>> + map->flags |= GNTMAP_host_map;
>>> +#if defined(CONFIG_X86)
>>> + map->flags |= GNTMAP_device_map;
>>> +#endif
>>> +
>>> + ret = map_grant_pages(map);
>>> + if (ret < 0)
>>> + goto out;
>>> +
>>> + ret = dmabuf_export(priv, map, fd);
>>> + if (ret < 0)
>>> + goto out;
>>> +
>>> + return 0;
>>> +
>>> +out:
>>> + gntdev_remove_map(priv, map);
>>> + return ret;
>>> }
>>> /*
>>> ------------------------------------------------------------------ */
>>> --
>>> 2.17.0
>>>
>
On 05/31/2018 08:51 AM, Oleksandr Andrushchenko wrote:
> On 05/31/2018 04:46 AM, Boris Ostrovsky wrote:
>>
>>
>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>
>>>
>>> Oleksandr Andrushchenko (8):
>>> xen/grant-table: Make set/clear page private code shared
>>> xen/balloon: Move common memory reservation routines to a module
>>> xen/grant-table: Allow allocating buffers suitable for DMA
>>> xen/gntdev: Allow mappings for DMA buffers
>>> xen/gntdev: Add initial support for dma-buf UAPI
>>> xen/gntdev: Implement dma-buf export functionality
>>> xen/gntdev: Implement dma-buf import functionality
>>> xen/gntdev: Expose gntdev's dma-buf API for in-kernel use
>>>
>>> drivers/xen/Kconfig | 23 +
>>> drivers/xen/Makefile | 1 +
>>> drivers/xen/balloon.c | 71 +--
>>> drivers/xen/gntdev.c | 1025
>>> ++++++++++++++++++++++++++++++++-
>>
>>
>> I think this calls for gntdev_dma.c.
> I assume you mean as a separate file (part of gntdev driver)?
>> I only had a quick look over gntdev changes but they very much are
>> concentrated in dma-specific routines.
>>
> I tried to do that, but there are some dependencies between
> gntdev.c and gntdev_dma.c,
> so finally I decided to put it all together.
>> You essentially only share file_operations entry points with original
>> gntdev code, right?
>>
> fops + the mappings done by gntdev (struct grant_map), and I need to
> release the map from the dma_buf .release callback, which creates
> cross-dependencies between the modules that did not seem clean
> (gntdev keeps all its structs and functions internal, so I cannot
> easily access those without helpers).
>
> But I'll try one more time and move all DMA specific stuff into
> gntdev_dma.c
Could you please take a quick look at the way I re-structured the
sources here [1] and check whether this is what you meant?
Thank you,
Oleksandr
>> -boris
>>
> Thank you,
> Oleksandr
>>
>>> drivers/xen/grant-table.c | 176 +++++-
>>> drivers/xen/mem-reservation.c | 134 +++++
>>> include/uapi/xen/gntdev.h | 106 ++++
>>> include/xen/grant_dev.h | 37 ++
>>> include/xen/grant_table.h | 28 +
>>> include/xen/mem_reservation.h | 29 +
>>> 10 files changed, 1527 insertions(+), 103 deletions(-)
>>> create mode 100644 drivers/xen/mem-reservation.c
>>> create mode 100644 include/xen/grant_dev.h
>>> create mode 100644 include/xen/mem_reservation.h
>>>
>
[1]
https://github.com/andr2000/linux/commits/xen_tip_linux_next_xen_dma_buf_v2
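For what it's worth, the split being discussed mostly comes down to which gntdev internals have to become visible to the dma-buf code. A hypothetical internal header (all names illustrative, loosely mirroring the static functions in the patch) could look like:

```c
/*
 * Hypothetical gntdev_common.h: the minimal internal interface gntdev.c
 * would need to expose so the dma-buf support can live in gntdev_dma.c.
 * These declarations are only a sketch of the split, not the final API.
 */
#ifndef _GNTDEV_COMMON_H
#define _GNTDEV_COMMON_H

struct gntdev_priv;
struct grant_map;

struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
				   int dma_flags);
void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *map);
void gntdev_remove_map(struct gntdev_priv *priv, struct grant_map *map);
void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map);
int gntdev_map_grant_pages(struct grant_map *map);

#endif /* _GNTDEV_COMMON_H */
```

gntdev_dma.c would then only consume these helpers plus the shared file_operations glue, keeping the grant_map internals private to gntdev.c.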
On 05/31/2018 01:51 AM, Oleksandr Andrushchenko wrote:
> On 05/31/2018 04:46 AM, Boris Ostrovsky wrote:
>>
>>
>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>
>>>
>>> Oleksandr Andrushchenko (8):
>>> xen/grant-table: Make set/clear page private code shared
>>> xen/balloon: Move common memory reservation routines to a module
>>> xen/grant-table: Allow allocating buffers suitable for DMA
>>> xen/gntdev: Allow mappings for DMA buffers
>>> xen/gntdev: Add initial support for dma-buf UAPI
>>> xen/gntdev: Implement dma-buf export functionality
>>> xen/gntdev: Implement dma-buf import functionality
>>> xen/gntdev: Expose gntdev's dma-buf API for in-kernel use
>>>
>>> drivers/xen/Kconfig | 23 +
>>> drivers/xen/Makefile | 1 +
>>> drivers/xen/balloon.c | 71 +--
>>> drivers/xen/gntdev.c | 1025
>>> ++++++++++++++++++++++++++++++++-
>>
>>
>> I think this calls for gntdev_dma.c.
> I assume you mean as a separate file (part of the gntdev driver)?
Yes, source only. The driver stays the same.
>> I only had a quick look over the gntdev changes, but they are very
>> much concentrated in DMA-specific routines.
>>
> I tried to do that, but there are some dependencies between gntdev.c
> and gntdev_dma.c, so in the end I decided to keep it all together.
>> You essentially only share file_operations entry points with original
>> gntdev code, right?
>>
> fops + the mappings done by gntdev (struct grant_map); I also need to
> release the map in the dma_buf .release callback, which creates
> cross-dependencies between the modules that did not look clean
> (gntdev keeps all its structs and functions internal, so I cannot
> easily access them without helpers).
>
> But I'll try one more time and move all the DMA-specific code into
> gntdev_dma.c.
Yes, please try it. Maybe even have gntdev_common.c, gntdev_mem.c (??)
and gntdev_dma.c.
-boris
>> -boris
>>
> Thank you,
> Oleksandr
>>
>>> drivers/xen/grant-table.c | 176 +++++-
>>> drivers/xen/mem-reservation.c | 134 +++++
>>> include/uapi/xen/gntdev.h | 106 ++++
>>> include/xen/grant_dev.h | 37 ++
>>> include/xen/grant_table.h | 28 +
>>> include/xen/mem_reservation.h | 29 +
>>> 10 files changed, 1527 insertions(+), 103 deletions(-)
>>> create mode 100644 drivers/xen/mem-reservation.c
>>> create mode 100644 include/xen/grant_dev.h
>>> create mode 100644 include/xen/mem_reservation.h
>>>
>
On 05/31/2018 10:41 AM, Oleksandr Andrushchenko wrote:
> On 05/31/2018 08:51 AM, Oleksandr Andrushchenko wrote:
>> On 05/31/2018 04:46 AM, Boris Ostrovsky wrote:
>>>
>>>
>>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>>
>>>>
>>>> Oleksandr Andrushchenko (8):
>>>> xen/grant-table: Make set/clear page private code shared
>>>> xen/balloon: Move common memory reservation routines to a module
>>>> xen/grant-table: Allow allocating buffers suitable for DMA
>>>> xen/gntdev: Allow mappings for DMA buffers
>>>> xen/gntdev: Add initial support for dma-buf UAPI
>>>> xen/gntdev: Implement dma-buf export functionality
>>>> xen/gntdev: Implement dma-buf import functionality
>>>> xen/gntdev: Expose gntdev's dma-buf API for in-kernel use
>>>>
>>>> drivers/xen/Kconfig | 23 +
>>>> drivers/xen/Makefile | 1 +
>>>> drivers/xen/balloon.c | 71 +--
>>>> drivers/xen/gntdev.c | 1025
>>>> ++++++++++++++++++++++++++++++++-
>>>
>>>
>>> I think this calls for gntdev_dma.c.
>> I assume you mean as a separate file (part of the gntdev driver)?
>>> I only had a quick look over the gntdev changes, but they are very
>>> much concentrated in DMA-specific routines.
>>>
>> I tried to do that, but there are some dependencies between gntdev.c
>> and gntdev_dma.c, so in the end I decided to keep it all together.
>>> You essentially only share file_operations entry points with
>>> original gntdev code, right?
>>>
>> fops + the mappings done by gntdev (struct grant_map); I also need to
>> release the map in the dma_buf .release callback, which creates
>> cross-dependencies between the modules that did not look clean
>> (gntdev keeps all its structs and functions internal, so I cannot
>> easily access them without helpers).
>>
>> But I'll try one more time and move all the DMA-specific code into
>> gntdev_dma.c.
> Could you please take a quick look at the way I re-structured the
> sources here [1] and check whether this is what you meant?
I looked at the final gntdev.c code and I think at least one of the
chunks there ("DMA buffer export support.") can also be moved out. It
still has a few too many ifdefs, but it looks better to my eye than
jamming everything into a single file (and I think more code can be
moved out, but we can talk about that when you post the patches so
that we can see the context).
BTW, I believe it won't build with !CONFIG_XEN_GNTDEV_DMABUF:
gntdev_remove_map() is defined under this option but is referenced
later without it.
-boris
>
> Thank you,
> Oleksandr
>>> -boris
>>>
>> Thank you,
>> Oleksandr
>>>
>>>> drivers/xen/grant-table.c | 176 +++++-
>>>> drivers/xen/mem-reservation.c | 134 +++++
>>>> include/uapi/xen/gntdev.h | 106 ++++
>>>> include/xen/grant_dev.h | 37 ++
>>>> include/xen/grant_table.h | 28 +
>>>> include/xen/mem_reservation.h | 29 +
>>>> 10 files changed, 1527 insertions(+), 103 deletions(-)
>>>> create mode 100644 drivers/xen/mem-reservation.c
>>>> create mode 100644 include/xen/grant_dev.h
>>>> create mode 100644 include/xen/mem_reservation.h
>>>>
>>
> [1]
> https://github.com/andr2000/linux/commits/xen_tip_linux_next_xen_dma_buf_v2
On 05/31/2018 11:25 PM, Boris Ostrovsky wrote:
> On 05/31/2018 10:41 AM, Oleksandr Andrushchenko wrote:
>> On 05/31/2018 08:51 AM, Oleksandr Andrushchenko wrote:
>>> On 05/31/2018 04:46 AM, Boris Ostrovsky wrote:
>>>>
>>>> On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
>>>>
>>>>> Oleksandr Andrushchenko (8):
>>>>> xen/grant-table: Make set/clear page private code shared
>>>>> xen/balloon: Move common memory reservation routines to a module
>>>>> xen/grant-table: Allow allocating buffers suitable for DMA
>>>>> xen/gntdev: Allow mappings for DMA buffers
>>>>> xen/gntdev: Add initial support for dma-buf UAPI
>>>>> xen/gntdev: Implement dma-buf export functionality
>>>>> xen/gntdev: Implement dma-buf import functionality
>>>>> xen/gntdev: Expose gntdev's dma-buf API for in-kernel use
>>>>>
>>>>> drivers/xen/Kconfig | 23 +
>>>>> drivers/xen/Makefile | 1 +
>>>>> drivers/xen/balloon.c | 71 +--
>>>>> drivers/xen/gntdev.c | 1025
>>>>> ++++++++++++++++++++++++++++++++-
>>>>
>>>> I think this calls for gntdev_dma.c.
>>> I assume you mean as a separate file (part of the gntdev driver)?
>>>> I only had a quick look over the gntdev changes, but they are very
>>>> much concentrated in DMA-specific routines.
>>>>
>>> I tried to do that, but there are some dependencies between gntdev.c
>>> and gntdev_dma.c, so in the end I decided to keep it all together.
>>>> You essentially only share file_operations entry points with
>>>> original gntdev code, right?
>>>>
>>> fops + the mappings done by gntdev (struct grant_map); I also need
>>> to release the map in the dma_buf .release callback, which creates
>>> cross-dependencies between the modules that did not look clean
>>> (gntdev keeps all its structs and functions internal, so I cannot
>>> easily access them without helpers).
>>>
>>> But I'll try one more time and move all the DMA-specific code into
>>> gntdev_dma.c.
>> Could you please take a quick look at the way I re-structured the
>> sources here [1] and check whether this is what you meant?
>
> I looked at the final gntdev.c code and I think at least one of the
> chunks there ("DMA buffer export support.") can also be moved out. It
> still has a few too many ifdefs, but it looks better to my eye than
> jamming everything into a single file (and I think more code can be
> moved out, but we can talk about that when you post the patches so
> that we can see the context).
Sure, I will send v2 after I re-check all the patches and run some
smoke tests again.
>
> BTW, I believe it won't build with !CONFIG_XEN_GNTDEV_DMABUF:
> gntdev_remove_map() is defined under this option but is referenced
> later without it.
Will check, thank you.
>
> -boris
>
>
>> Thank you,
>> Oleksandr
>>>> -boris
>>>>
>>> Thank you,
>>> Oleksandr
>>>>> drivers/xen/grant-table.c | 176 +++++-
>>>>> drivers/xen/mem-reservation.c | 134 +++++
>>>>> include/uapi/xen/gntdev.h | 106 ++++
>>>>> include/xen/grant_dev.h | 37 ++
>>>>> include/xen/grant_table.h | 28 +
>>>>> include/xen/mem_reservation.h | 29 +
>>>>> 10 files changed, 1527 insertions(+), 103 deletions(-)
>>>>> create mode 100644 drivers/xen/mem-reservation.c
>>>>> create mode 100644 include/xen/grant_dev.h
>>>>> create mode 100644 include/xen/mem_reservation.h
>>>>>
>> [1]
>> https://github.com/andr2000/linux/commits/xen_tip_linux_next_xen_dma_buf_v2