From: Oleksandr Andrushchenko <[email protected]>
This work is a follow-up to my previous attempt to introduce a Xen/DRM
zero-copy driver [1] enabling the Linux dma-buf API [2] for Xen based
frontends/backends. There is also an existing hyper_dmabuf approach
available [3] which, if reworked to utilize the proposed solution,
could benefit greatly as well.
An RFC for this series was published and discussed [9]; the comments have been addressed.
The original rationale behind this work was to enable zero-copy
use-cases when working with the Xen para-virtual display driver [4]:
when the Xen PV DRM frontend driver is used, the backend must copy
the contents of display buffers (filled by the frontend's user-space)
into buffers allocated on the backend side.
Given the size of display buffers and the frame rate, this may cause
substantial unnecessary data bus traffic and performance loss.
The helper driver [4] enables zero-copy use-cases for the Xen
para-virtualized frontend display driver by providing a DRM/KMS helper
driver that runs on the backend side.
It utilizes the PRIME buffers API (implemented on top of Linux dma-buf)
to share the frontend's buffers with physical device drivers on the
backend side:
- a dumb buffer created on the backend side can be shared
with the Xen PV frontend driver, so the frontend writes directly
into the backend domain's memory (into the buffer exported from
the DRM/KMS driver of a physical display device)
- a dumb buffer allocated by the frontend can be imported
into the physical device's DRM/KMS driver, likewise avoiding
any copying
Finally, it was discussed and decided ([1], [5]) that these use-cases
are worth implementing as an extension of the existing Xen gntdev
driver rather than as a new DRM-specific driver.
Please note that dma-buf support is Linux only,
as dma-buf itself is a Linux-only concept.
Now to the proposed solution. The changes to the existing Xen drivers
in the Linux kernel fall into 2 categories:
1. DMA-able memory buffer allocation and increasing/decreasing the
memory reservation of the pages of such a buffer.
This is required if the dma-buf is to be shared with hardware that
requires buffers allocated with the dma_alloc_xxx API.
(It is still possible to back a dma-buf with any system memory,
e.g. system pages.)
2. Extension of the gntdev driver to enable it to import/export dma-bufs.
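For illustration, here is a rough userspace sketch (not part of this series)
of how a backend might turn grant references received from a frontend into a
local dma-buf via the proposed gntdev extension. The struct and ioctl number
mirror the proposed include/uapi/xen/gntdev.h; the gntdev file descriptor is
assumed to be already open and error handling is reduced to return codes:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Mirrors struct ioctl_gntdev_dmabuf_exp_from_refs from the proposed UAPI. */
struct ioctl_gntdev_dmabuf_exp_from_refs {
	uint32_t flags;   /* GNTDEV_DMA_FLAG_XXX, or 0 for system memory */
	uint32_t count;   /* number of entries in refs[] */
	uint32_t fd;      /* OUT: file descriptor of the new dma-buf */
	uint32_t domid;   /* domain that granted the references */
	uint32_t refs[1]; /* variable-sized array of grant references */
};

#define IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS \
	_IOC(_IOC_NONE, 'G', 9, sizeof(struct ioctl_gntdev_dmabuf_exp_from_refs))

/* Size of the ioctl argument holding @count grant references. */
static size_t dmabuf_exp_size(int count)
{
	return sizeof(struct ioctl_gntdev_dmabuf_exp_from_refs) +
	       (count - 1) * sizeof(uint32_t);
}

/* Returns a dma-buf fd on success, -1 on failure. */
static int export_refs_as_dmabuf(int gntdev_fd, int domid,
				 const uint32_t *refs, int count)
{
	struct ioctl_gntdev_dmabuf_exp_from_refs *op;
	int ret = -1;

	op = calloc(1, dmabuf_exp_size(count));
	if (!op)
		return -1;

	op->flags = 0; /* back the dma-buf with plain system pages */
	op->count = count;
	op->domid = domid;
	memcpy(op->refs, refs, count * sizeof(uint32_t));

	if (ioctl(gntdev_fd, IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS, op) == 0)
		ret = op->fd; /* hand this fd to DRM/KMS via PRIME import */

	free(op);
	return ret;
}
```

The returned dma-buf file descriptor can then be imported into the physical
display device's DRM/KMS driver as a PRIME buffer, so the frontend-filled
pages are scanned out without an intermediate copy.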
The first five patches prepare for Xen dma-buf support,
but I consider them useful regardless of the dma-buf use-case;
other frontend/backend kernel modules may also benefit from them
for better code reuse:
0001-xen-grant-table-Export-gnttab_-alloc-free-_pages-as-.patch
0002-xen-grant-table-Make-set-clear-page-private-code-sha.patch
0003-xen-balloon-Share-common-memory-reservation-routines.patch
0004-xen-grant-table-Allow-allocating-buffers-suitable-fo.patch
0005-xen-gntdev-Allow-mappings-for-DMA-buffers.patch
The next three patches are Xen implementation of dma-buf as part of
the grant device:
0006-xen-gntdev-Add-initial-support-for-dma-buf-UAPI.patch
0007-xen-gntdev-Implement-dma-buf-export-functionality.patch
0008-xen-gntdev-Implement-dma-buf-import-functionality.patch
The last patch makes the Xen dma-buf API available for in-kernel use:
0009-xen-gntdev-Expose-gntdev-s-dma-buf-API-for-in-kernel.patch
The corresponding libxengnttab changes are available at [6].
All of the above was tested with the display backend [7] and its
accompanying helper library [8] on a Renesas ARM64 based board.
Basic balloon tests were run on x86.
*To all the communities*: I would like to ask you to review the proposed
solution and give feedback on it, so I can improve it and send the final
patches for review (this is still work in progress, but complete enough
to start discussing the implementation).
Thank you in advance,
Oleksandr Andrushchenko
[1] https://lists.freedesktop.org/archives/dri-devel/2018-April/173163.html
[2] https://elixir.bootlin.com/linux/v4.17-rc5/source/Documentation/driver-api/dma-buf.rst
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-02/msg01202.html
[4] https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/xen
[5] https://patchwork.kernel.org/patch/10279681/
[6] https://github.com/andr2000/xen/tree/xen_dma_buf_v1
[7] https://github.com/andr2000/displ_be/tree/xen_dma_buf_v1
[8] https://github.com/andr2000/libxenbe/tree/xen_dma_buf_v1
[9] https://lkml.org/lkml/2018/5/17/215
Changes since v1:
*****************
- Define GNTDEV_DMA_FLAG_XXX starting from bit 0
- Rename mem_reservation.h to mem-reservation.h
- Remove useless comments
- Change licenses from GPLv2 OR MIT to GPLv2 only
- Make xenmem_reservation_va_mapping_{update|clear} inline
- Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL for new functions
- Make gnttab_dma_{alloc|free}_pages require the frames array
to be allocated by the caller
- Fix the gnttab_dma_alloc_pages failure path (added xenmem_reservation_increase)
- Move most of dma-buf from gntdev.c to gntdev-dmabuf.c
- Add required dependencies to Kconfig
- Rework "#ifdef CONFIG_XEN_XXX" for if/else
- Export gnttab_{alloc|free}_pages as GPL symbols (patch 1)
Oleksandr Andrushchenko (9):
xen/grant-table: Export gnttab_{alloc|free}_pages as GPL
xen/grant-table: Make set/clear page private code shared
xen/balloon: Share common memory reservation routines
xen/grant-table: Allow allocating buffers suitable for DMA
xen/gntdev: Allow mappings for DMA buffers
xen/gntdev: Add initial support for dma-buf UAPI
xen/gntdev: Implement dma-buf export functionality
xen/gntdev: Implement dma-buf import functionality
xen/gntdev: Expose gntdev's dma-buf API for in-kernel use
drivers/xen/Kconfig | 23 ++
drivers/xen/Makefile | 2 +
drivers/xen/balloon.c | 71 +---
drivers/xen/gntdev-dmabuf.c | 707 ++++++++++++++++++++++++++++++++++
drivers/xen/gntdev-dmabuf.h | 48 +++
drivers/xen/gntdev.c | 387 ++++++++++++++++++-
drivers/xen/grant-table.c | 165 +++++++-
drivers/xen/mem-reservation.c | 120 ++++++
include/uapi/xen/gntdev.h | 106 +++++
include/xen/grant_dev.h | 37 ++
include/xen/grant_table.h | 21 +
include/xen/mem-reservation.h | 65 ++++
12 files changed, 1647 insertions(+), 105 deletions(-)
create mode 100644 drivers/xen/gntdev-dmabuf.c
create mode 100644 drivers/xen/gntdev-dmabuf.h
create mode 100644 drivers/xen/mem-reservation.c
create mode 100644 include/xen/grant_dev.h
create mode 100644 include/xen/mem-reservation.h
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Make the code that sets/clears a page's private flag shared and
accessible to other kernel modules, which can reuse it instead of
open-coding it.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/grant-table.c | 54 +++++++++++++++++++++++++--------------
include/xen/grant_table.h | 3 +++
2 files changed, 38 insertions(+), 19 deletions(-)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index ba36ff3e4903..dbb48a89e987 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -769,29 +769,18 @@ void gnttab_free_auto_xlat_frames(void)
}
EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
-/**
- * gnttab_alloc_pages - alloc pages suitable for grant mapping into
- * @nr_pages: number of pages to alloc
- * @pages: returns the pages
- */
-int gnttab_alloc_pages(int nr_pages, struct page **pages)
+int gnttab_pages_set_private(int nr_pages, struct page **pages)
{
int i;
- int ret;
-
- ret = alloc_xenballooned_pages(nr_pages, pages);
- if (ret < 0)
- return ret;
for (i = 0; i < nr_pages; i++) {
#if BITS_PER_LONG < 64
struct xen_page_foreign *foreign;
foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
- if (!foreign) {
- gnttab_free_pages(nr_pages, pages);
+ if (!foreign)
return -ENOMEM;
- }
+
set_page_private(pages[i], (unsigned long)foreign);
#endif
SetPagePrivate(pages[i]);
@@ -799,14 +788,30 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
return 0;
}
-EXPORT_SYMBOL_GPL(gnttab_alloc_pages);
+EXPORT_SYMBOL_GPL(gnttab_pages_set_private);
/**
- * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
- * @nr_pages; number of pages to free
- * @pages: the pages
+ * gnttab_alloc_pages - alloc pages suitable for grant mapping into
+ * @nr_pages: number of pages to alloc
+ * @pages: returns the pages
*/
-void gnttab_free_pages(int nr_pages, struct page **pages)
+int gnttab_alloc_pages(int nr_pages, struct page **pages)
+{
+ int ret;
+
+ ret = alloc_xenballooned_pages(nr_pages, pages);
+ if (ret < 0)
+ return ret;
+
+ ret = gnttab_pages_set_private(nr_pages, pages);
+ if (ret < 0)
+ gnttab_free_pages(nr_pages, pages);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gnttab_alloc_pages);
+
+void gnttab_pages_clear_private(int nr_pages, struct page **pages)
{
int i;
@@ -818,6 +823,17 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
ClearPagePrivate(pages[i]);
}
}
+}
+EXPORT_SYMBOL_GPL(gnttab_pages_clear_private);
+
+/**
+ * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
+ * @nr_pages: number of pages to free
+ * @pages: the pages
+ */
+void gnttab_free_pages(int nr_pages, struct page **pages)
+{
+ gnttab_pages_clear_private(nr_pages, pages);
free_xenballooned_pages(nr_pages, pages);
}
EXPORT_SYMBOL_GPL(gnttab_free_pages);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 2e37741f6b8d..de03f2542bb7 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -198,6 +198,9 @@ void gnttab_free_auto_xlat_frames(void);
int gnttab_alloc_pages(int nr_pages, struct page **pages);
void gnttab_free_pages(int nr_pages, struct page **pages);
+int gnttab_pages_set_private(int nr_pages, struct page **pages);
+void gnttab_pages_clear_private(int nr_pages, struct page **pages);
+
int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count);
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Add UAPI and IOCTLs for the dma-buf grant device driver extension:
the extension allows userspace processes and kernel modules to
use a Xen backed dma-buf implementation. With this extension, grant
references to the pages of an imported dma-buf can be exported for
use by another domain, and grant references coming from a foreign
domain can be converted into a local dma-buf for local export.
Implement basic initialization and stubs for Xen DMA buffer
support.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/Kconfig | 10 +++
drivers/xen/Makefile | 1 +
drivers/xen/gntdev-dmabuf.c | 75 +++++++++++++++++++
drivers/xen/gntdev-dmabuf.h | 41 +++++++++++
drivers/xen/gntdev.c | 142 ++++++++++++++++++++++++++++++++++++
include/uapi/xen/gntdev.h | 91 +++++++++++++++++++++++
6 files changed, 360 insertions(+)
create mode 100644 drivers/xen/gntdev-dmabuf.c
create mode 100644 drivers/xen/gntdev-dmabuf.h
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 39536ddfbce4..52d64e4b6b81 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -152,6 +152,16 @@ config XEN_GNTDEV
help
Allows userspace processes to use grants.
+config XEN_GNTDEV_DMABUF
+ bool "Add support for dma-buf grant access device driver extension"
+ depends on XEN_GNTDEV && XEN_GRANT_DMA_ALLOC && DMA_SHARED_BUFFER
+ help
+ Allows userspace processes and kernel modules to use Xen backed
+ dma-buf implementation. With this extension grant references to
+ the pages of an imported dma-buf can be exported for other domain
+ use and grant references coming from a foreign domain can be
+ converted into a local dma-buf for local export.
+
config XEN_GRANT_DEV_ALLOC
tristate "User-space grant reference allocator driver"
depends on XEN
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 3c87b0c3aca6..33afb7b2b227 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -41,5 +41,6 @@ obj-$(CONFIG_XEN_PVCALLS_BACKEND) += pvcalls-back.o
obj-$(CONFIG_XEN_PVCALLS_FRONTEND) += pvcalls-front.o
xen-evtchn-y := evtchn.o
xen-gntdev-y := gntdev.o
+xen-gntdev-$(CONFIG_XEN_GNTDEV_DMABUF) += gntdev-dmabuf.o
xen-gntalloc-y := gntalloc.o
xen-privcmd-y := privcmd.o
diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
new file mode 100644
index 000000000000..6bedd1387bd9
--- /dev/null
+++ b/drivers/xen/gntdev-dmabuf.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Xen dma-buf functionality for gntdev.
+ *
+ * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
+ */
+
+#include <linux/slab.h>
+
+#include "gntdev-dmabuf.h"
+
+struct gntdev_dmabuf_priv {
+ int dummy;
+};
+
+/* ------------------------------------------------------------------ */
+/* DMA buffer export support. */
+/* ------------------------------------------------------------------ */
+
+/* ------------------------------------------------------------------ */
+/* Implementation of wait for exported DMA buffer to be released. */
+/* ------------------------------------------------------------------ */
+
+int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
+ int wait_to_ms)
+{
+ return -EINVAL;
+}
+
+/* ------------------------------------------------------------------ */
+/* DMA buffer export support. */
+/* ------------------------------------------------------------------ */
+
+int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args)
+{
+ return -EINVAL;
+}
+
+/* ------------------------------------------------------------------ */
+/* DMA buffer import support. */
+/* ------------------------------------------------------------------ */
+
+struct gntdev_dmabuf *
+gntdev_dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct device *dev,
+ int fd, int count, int domid)
+{
+ return ERR_PTR(-ENOMEM);
+}
+
+u32 *gntdev_dmabuf_imp_get_refs(struct gntdev_dmabuf *gntdev_dmabuf)
+{
+ return NULL;
+}
+
+int gntdev_dmabuf_imp_release(struct gntdev_dmabuf_priv *priv, u32 fd)
+{
+ return -EINVAL;
+}
+
+struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
+{
+ struct gntdev_dmabuf_priv *priv;
+
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return ERR_PTR(-ENOMEM);
+
+ return priv;
+}
+
+void gntdev_dmabuf_fini(struct gntdev_dmabuf_priv *priv)
+{
+ kfree(priv);
+}
diff --git a/drivers/xen/gntdev-dmabuf.h b/drivers/xen/gntdev-dmabuf.h
new file mode 100644
index 000000000000..040b2de904ac
--- /dev/null
+++ b/drivers/xen/gntdev-dmabuf.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Xen dma-buf functionality for gntdev.
+ *
+ * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
+ */
+
+#ifndef _GNTDEV_DMABUF_H
+#define _GNTDEV_DMABUF_H
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+
+struct gntdev_dmabuf_priv;
+struct gntdev_dmabuf;
+struct device;
+
+struct gntdev_dmabuf_export_args {
+ int dummy;
+};
+
+struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void);
+
+void gntdev_dmabuf_fini(struct gntdev_dmabuf_priv *priv);
+
+int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args);
+
+int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
+ int wait_to_ms);
+
+struct gntdev_dmabuf *
+gntdev_dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct device *dev,
+ int fd, int count, int domid);
+
+u32 *gntdev_dmabuf_imp_get_refs(struct gntdev_dmabuf *gntdev_dmabuf);
+
+int gntdev_dmabuf_imp_release(struct gntdev_dmabuf_priv *priv, u32 fd);
+
+#endif
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 9813fc440c70..7d58dfb3e5e8 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -6,6 +6,7 @@
*
* Copyright (c) 2006-2007, D G Murray.
* (c) 2009 Gerd Hoffmann <[email protected]>
+ * (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
@@ -50,6 +51,10 @@
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+#include "gntdev-dmabuf.h"
+#endif
+
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Derek G. Murray <[email protected]>, "
"Gerd Hoffmann <[email protected]>");
@@ -80,6 +85,10 @@ struct gntdev_priv {
/* Device for which DMA memory is allocated. */
struct device *dma_dev;
#endif
+
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+ struct gntdev_dmabuf_priv *dmabuf_priv;
+#endif
};
struct unmap_notify {
@@ -615,6 +624,15 @@ static int gntdev_open(struct inode *inode, struct file *flip)
INIT_LIST_HEAD(&priv->freeable_maps);
mutex_init(&priv->lock);
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+ priv->dmabuf_priv = gntdev_dmabuf_init();
+ if (IS_ERR(priv->dmabuf_priv)) {
+ ret = PTR_ERR(priv->dmabuf_priv);
+ kfree(priv);
+ return ret;
+ }
+#endif
+
if (use_ptemod) {
priv->mm = get_task_mm(current);
if (!priv->mm) {
@@ -664,8 +682,13 @@ static int gntdev_release(struct inode *inode, struct file *flip)
WARN_ON(!list_empty(&priv->freeable_maps));
mutex_unlock(&priv->lock);
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+ gntdev_dmabuf_fini(priv->dmabuf_priv);
+#endif
+
if (use_ptemod)
mmu_notifier_unregister(&priv->mn, priv->mm);
+
kfree(priv);
return 0;
}
@@ -1035,6 +1058,111 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
return ret;
}
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+/* ------------------------------------------------------------------ */
+/* DMA buffer export support. */
+/* ------------------------------------------------------------------ */
+
+int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
+ int count, u32 domid, u32 *refs, u32 *fd)
+{
+ /* XXX: this will need to work with gntdev's map, so leave it here. */
+ *fd = -1;
+ return -EINVAL;
+}
+
+/* ------------------------------------------------------------------ */
+/* DMA buffer IOCTL support. */
+/* ------------------------------------------------------------------ */
+
+static long
+gntdev_ioctl_dmabuf_exp_from_refs(struct gntdev_priv *priv,
+ struct ioctl_gntdev_dmabuf_exp_from_refs __user *u)
+{
+ struct ioctl_gntdev_dmabuf_exp_from_refs op;
+ u32 *refs;
+ long ret;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+
+ refs = kcalloc(op.count, sizeof(*refs), GFP_KERNEL);
+ if (!refs)
+ return -ENOMEM;
+
+ if (copy_from_user(refs, u->refs, sizeof(*refs) * op.count) != 0) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ ret = gntdev_dmabuf_exp_from_refs(priv, op.flags, op.count,
+ op.domid, refs, &op.fd);
+ if (ret)
+ goto out;
+
+ if (copy_to_user(u, &op, sizeof(op)) != 0)
+ ret = -EFAULT;
+
+out:
+ kfree(refs);
+ return ret;
+}
+
+static long
+gntdev_ioctl_dmabuf_exp_wait_released(struct gntdev_priv *priv,
+ struct ioctl_gntdev_dmabuf_exp_wait_released __user *u)
+{
+ struct ioctl_gntdev_dmabuf_exp_wait_released op;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+
+ return gntdev_dmabuf_exp_wait_released(priv->dmabuf_priv, op.fd,
+ op.wait_to_ms);
+}
+
+static long
+gntdev_ioctl_dmabuf_imp_to_refs(struct gntdev_priv *priv,
+ struct ioctl_gntdev_dmabuf_imp_to_refs __user *u)
+{
+ struct ioctl_gntdev_dmabuf_imp_to_refs op;
+ struct gntdev_dmabuf *gntdev_dmabuf;
+ long ret;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+
+ gntdev_dmabuf = gntdev_dmabuf_imp_to_refs(priv->dmabuf_priv,
+ priv->dma_dev, op.fd,
+ op.count, op.domid);
+ if (IS_ERR(gntdev_dmabuf))
+ return PTR_ERR(gntdev_dmabuf);
+
+ if (copy_to_user(u->refs, gntdev_dmabuf_imp_get_refs(gntdev_dmabuf),
+ sizeof(*u->refs) * op.count) != 0) {
+ ret = -EFAULT;
+ goto out_release;
+ }
+ return 0;
+
+out_release:
+ gntdev_dmabuf_imp_release(priv->dmabuf_priv, op.fd);
+ return ret;
+}
+
+static long
+gntdev_ioctl_dmabuf_imp_release(struct gntdev_priv *priv,
+ struct ioctl_gntdev_dmabuf_imp_release __user *u)
+{
+ struct ioctl_gntdev_dmabuf_imp_release op;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+
+ return gntdev_dmabuf_imp_release(priv->dmabuf_priv, op.fd);
+}
+#endif
+
static long gntdev_ioctl(struct file *flip,
unsigned int cmd, unsigned long arg)
{
@@ -1057,6 +1185,20 @@ static long gntdev_ioctl(struct file *flip,
case IOCTL_GNTDEV_GRANT_COPY:
return gntdev_ioctl_grant_copy(priv, ptr);
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+ case IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS:
+ return gntdev_ioctl_dmabuf_exp_from_refs(priv, ptr);
+
+ case IOCTL_GNTDEV_DMABUF_EXP_WAIT_RELEASED:
+ return gntdev_ioctl_dmabuf_exp_wait_released(priv, ptr);
+
+ case IOCTL_GNTDEV_DMABUF_IMP_TO_REFS:
+ return gntdev_ioctl_dmabuf_imp_to_refs(priv, ptr);
+
+ case IOCTL_GNTDEV_DMABUF_IMP_RELEASE:
+ return gntdev_ioctl_dmabuf_imp_release(priv, ptr);
+#endif
+
default:
pr_debug("priv %p, unknown cmd %x\n", priv, cmd);
return -ENOIOCTLCMD;
diff --git a/include/uapi/xen/gntdev.h b/include/uapi/xen/gntdev.h
index 4b9d498a31d4..fe4423e518c6 100644
--- a/include/uapi/xen/gntdev.h
+++ b/include/uapi/xen/gntdev.h
@@ -5,6 +5,7 @@
* Interface to /dev/xen/gntdev.
*
* Copyright (c) 2007, D G Murray
+ * Copyright (c) 2018, Oleksandr Andrushchenko, EPAM Systems Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License version 2
@@ -215,4 +216,94 @@ struct ioctl_gntdev_grant_copy {
*/
#define GNTDEV_DMA_FLAG_COHERENT (1 << 1)
+/*
+ * Create a dma-buf [1] from grant references @refs of count @count provided
+ * by the foreign domain @domid with flags @flags.
+ *
+ * By default dma-buf is backed by system memory pages, but by providing
+ * one of the GNTDEV_DMA_FLAG_XXX flags it can also be created as
+ * a DMA write-combine or coherent buffer, e.g. allocated with dma_alloc_wc/
+ * dma_alloc_coherent.
+ *
+ * Returns 0 if dma-buf was successfully created and the corresponding
+ * dma-buf's file descriptor is returned in @fd.
+ *
+ * [1] Documentation/driver-api/dma-buf.rst
+ */
+
+#define IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS \
+ _IOC(_IOC_NONE, 'G', 9, \
+ sizeof(struct ioctl_gntdev_dmabuf_exp_from_refs))
+struct ioctl_gntdev_dmabuf_exp_from_refs {
+ /* IN parameters. */
+ /* Specific options for this dma-buf: see GNTDEV_DMA_FLAG_XXX. */
+ __u32 flags;
+ /* Number of grant references in @refs array. */
+ __u32 count;
+ /* OUT parameters. */
+ /* File descriptor of the dma-buf. */
+ __u32 fd;
+ /* The domain ID of the grant references to be mapped. */
+ __u32 domid;
+ /* Variable IN parameter. */
+ /* Array of grant references of size @count. */
+ __u32 refs[1];
+};
+
+/*
+ * This will block until the dma-buf with the file descriptor @fd is
+ * released. This is only valid for buffers created with
+ * IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS.
+ *
+ * If within @wait_to_ms milliseconds the buffer is not released
+ * then -ETIMEDOUT error is returned.
+ * If the buffer with the file descriptor @fd does not exist or has already
+ * been released, then -ENOENT is returned. For valid file descriptors
+ * this must not be treated as an error.
+ */
+#define IOCTL_GNTDEV_DMABUF_EXP_WAIT_RELEASED \
+ _IOC(_IOC_NONE, 'G', 10, \
+ sizeof(struct ioctl_gntdev_dmabuf_exp_wait_released))
+struct ioctl_gntdev_dmabuf_exp_wait_released {
+ /* IN parameters */
+ __u32 fd;
+ __u32 wait_to_ms;
+};
+
+/*
+ * Import a dma-buf with file descriptor @fd and export granted references
+ * to the pages of that dma-buf into array @refs of size @count.
+ */
+#define IOCTL_GNTDEV_DMABUF_IMP_TO_REFS \
+ _IOC(_IOC_NONE, 'G', 11, \
+ sizeof(struct ioctl_gntdev_dmabuf_imp_to_refs))
+struct ioctl_gntdev_dmabuf_imp_to_refs {
+ /* IN parameters. */
+ /* File descriptor of the dma-buf. */
+ __u32 fd;
+ /* Number of grant references in @refs array. */
+ __u32 count;
+ /* The domain ID for which the references are granted. */
+ __u32 domid;
+ /* Reserved - must be zero. */
+ __u32 reserved;
+ /* OUT parameters. */
+ /* Array of grant references of size @count. */
+ __u32 refs[1];
+};
+
+/*
+ * This will close all references to the imported buffer with file descriptor
+ * @fd, so it can be released by the owner. This is only valid for buffers
+ * created with IOCTL_GNTDEV_DMABUF_IMP_TO_REFS.
+ */
+#define IOCTL_GNTDEV_DMABUF_IMP_RELEASE \
+ _IOC(_IOC_NONE, 'G', 12, \
+ sizeof(struct ioctl_gntdev_dmabuf_imp_release))
+struct ioctl_gntdev_dmabuf_imp_release {
+ /* IN parameters */
+ __u32 fd;
+ __u32 reserved;
+};
+
#endif /* __LINUX_PUBLIC_GNTDEV_H__ */
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
1. Import a dma-buf given its file descriptor and export grant
references to the pages of that dma-buf into an array of grant
references.
2. Add an API to close all references to an imported buffer, so it can
be released by the owner. This is only valid for buffers created with
IOCTL_GNTDEV_DMABUF_IMP_TO_REFS.
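The intended userspace flow can be sketched as follows (a hedged
illustration, not part of the patch: the structs and ioctl numbers mirror
the proposed include/uapi/xen/gntdev.h, and error handling is reduced to
return codes):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Mirror the proposed UAPI structs. */
struct ioctl_gntdev_dmabuf_imp_to_refs {
	uint32_t fd;       /* dma-buf to import */
	uint32_t count;    /* number of pages/grant references expected */
	uint32_t domid;    /* domain the references are granted to */
	uint32_t reserved; /* must be zero */
	uint32_t refs[1];  /* OUT: variable-sized array of grant references */
};

struct ioctl_gntdev_dmabuf_imp_release {
	uint32_t fd;
	uint32_t reserved;
};

#define IOCTL_GNTDEV_DMABUF_IMP_TO_REFS \
	_IOC(_IOC_NONE, 'G', 11, sizeof(struct ioctl_gntdev_dmabuf_imp_to_refs))
#define IOCTL_GNTDEV_DMABUF_IMP_RELEASE \
	_IOC(_IOC_NONE, 'G', 12, sizeof(struct ioctl_gntdev_dmabuf_imp_release))

/* Size of the import ioctl argument for @count grant references. */
static size_t imp_to_refs_size(int count)
{
	return sizeof(struct ioctl_gntdev_dmabuf_imp_to_refs) +
	       (count - 1) * sizeof(uint32_t);
}

/* Import @dmabuf_fd and fill @refs[count]; returns 0 or -1. */
static int import_dmabuf_to_refs(int gntdev_fd, int dmabuf_fd, int domid,
				 uint32_t *refs, int count)
{
	struct ioctl_gntdev_dmabuf_imp_to_refs *op;
	int ret = -1;

	op = calloc(1, imp_to_refs_size(count));
	if (!op)
		return -1;

	op->fd = dmabuf_fd;
	op->count = count;
	op->domid = domid;

	if (ioctl(gntdev_fd, IOCTL_GNTDEV_DMABUF_IMP_TO_REFS, op) == 0) {
		memcpy(refs, op->refs, count * sizeof(uint32_t));
		ret = 0;
	}
	free(op);
	return ret;
}

/* Drop all references so the owner can release the buffer. */
static int release_imported_dmabuf(int gntdev_fd, int dmabuf_fd)
{
	struct ioctl_gntdev_dmabuf_imp_release op = { .fd = dmabuf_fd };

	return ioctl(gntdev_fd, IOCTL_GNTDEV_DMABUF_IMP_RELEASE, &op);
}
```

The grant references filled in by the import ioctl can then be passed to a
foreign domain (e.g. via XenStore or a PV protocol), which maps them and
writes directly into the imported buffer's pages.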
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/gntdev-dmabuf.c | 243 +++++++++++++++++++++++++++++++++++-
1 file changed, 241 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
index f612468879b4..b5569a220f03 100644
--- a/drivers/xen/gntdev-dmabuf.c
+++ b/drivers/xen/gntdev-dmabuf.c
@@ -11,8 +11,20 @@
#include <linux/dma-buf.h>
#include <linux/slab.h>
+#include <xen/xen.h>
+#include <xen/grant_table.h>
+
#include "gntdev-dmabuf.h"
+#ifndef GRANT_INVALID_REF
+/*
+ * Note on usage of grant reference 0 as invalid grant reference:
+ * grant reference 0 is valid, but never exposed to a driver,
+ * because of the fact it is already in use/reserved by the PV console.
+ */
+#define GRANT_INVALID_REF 0
+#endif
+
struct gntdev_dmabuf {
struct gntdev_dmabuf_priv *priv;
struct dma_buf *dmabuf;
@@ -29,6 +41,14 @@ struct gntdev_dmabuf {
void (*release)(struct gntdev_priv *priv,
struct grant_map *map);
} exp;
+ struct {
+ /* Granted references of the imported buffer. */
+ grant_ref_t *refs;
+ /* Scatter-gather table of the imported buffer. */
+ struct sg_table *sgt;
+ /* dma-buf attachment of the imported buffer. */
+ struct dma_buf_attachment *attach;
+ } imp;
} u;
/* Number of pages this buffer has. */
@@ -53,6 +73,8 @@ struct gntdev_dmabuf_priv {
struct list_head exp_list;
/* List of wait objects. */
struct list_head exp_wait_list;
+ /* List of imported DMA buffers. */
+ struct list_head imp_list;
/* This is the lock which protects dma_buf_xxx lists. */
struct mutex lock;
};
@@ -424,21 +446,237 @@ int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args)
/* DMA buffer import support. */
/* ------------------------------------------------------------------ */
+static int
+dmabuf_imp_grant_foreign_access(struct page **pages, u32 *refs,
+ int count, int domid)
+{
+ grant_ref_t priv_gref_head;
+ int i, ret;
+
+ ret = gnttab_alloc_grant_references(count, &priv_gref_head);
+ if (ret < 0) {
+ pr_err("Cannot allocate grant references, ret %d\n", ret);
+ return ret;
+ }
+
+ for (i = 0; i < count; i++) {
+ int cur_ref;
+
+ cur_ref = gnttab_claim_grant_reference(&priv_gref_head);
+ if (cur_ref < 0) {
+ ret = cur_ref;
+ pr_err("Cannot claim grant reference, ret %d\n", ret);
+ goto out;
+ }
+
+ gnttab_grant_foreign_access_ref(cur_ref, domid,
+ xen_page_to_gfn(pages[i]), 0);
+ refs[i] = cur_ref;
+ }
+
+ ret = 0;
+
+out:
+ gnttab_free_grant_references(priv_gref_head);
+ return ret;
+}
+
+static void dmabuf_imp_end_foreign_access(u32 *refs, int count)
+{
+ int i;
+
+ for (i = 0; i < count; i++)
+ if (refs[i] != GRANT_INVALID_REF)
+ gnttab_end_foreign_access(refs[i], 0, 0UL);
+}
+
+static void dmabuf_imp_free_storage(struct gntdev_dmabuf *gntdev_dmabuf)
+{
+ kfree(gntdev_dmabuf->pages);
+ kfree(gntdev_dmabuf->u.imp.refs);
+ kfree(gntdev_dmabuf);
+}
+
+static struct gntdev_dmabuf *dmabuf_imp_alloc_storage(int count)
+{
+ struct gntdev_dmabuf *gntdev_dmabuf;
+ int i;
+
+ gntdev_dmabuf = kzalloc(sizeof(*gntdev_dmabuf), GFP_KERNEL);
+ if (!gntdev_dmabuf)
+ goto fail;
+
+ gntdev_dmabuf->u.imp.refs = kcalloc(count,
+ sizeof(gntdev_dmabuf->u.imp.refs[0]),
+ GFP_KERNEL);
+ if (!gntdev_dmabuf->u.imp.refs)
+ goto fail;
+
+ gntdev_dmabuf->pages = kcalloc(count,
+ sizeof(gntdev_dmabuf->pages[0]),
+ GFP_KERNEL);
+ if (!gntdev_dmabuf->pages)
+ goto fail;
+
+ gntdev_dmabuf->nr_pages = count;
+
+ for (i = 0; i < count; i++)
+ gntdev_dmabuf->u.imp.refs[i] = GRANT_INVALID_REF;
+
+ return gntdev_dmabuf;
+
+fail:
+ dmabuf_imp_free_storage(gntdev_dmabuf);
+ return ERR_PTR(-ENOMEM);
+}
+
struct gntdev_dmabuf *
gntdev_dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct device *dev,
int fd, int count, int domid)
{
- return ERR_PTR(-ENOMEM);
+ struct gntdev_dmabuf *gntdev_dmabuf, *ret;
+ struct dma_buf *dma_buf;
+ struct dma_buf_attachment *attach;
+ struct sg_table *sgt;
+ struct sg_page_iter sg_iter;
+ int i;
+
+ dma_buf = dma_buf_get(fd);
+ if (IS_ERR(dma_buf))
+ return ERR_CAST(dma_buf);
+
+ gntdev_dmabuf = dmabuf_imp_alloc_storage(count);
+ if (IS_ERR(gntdev_dmabuf)) {
+ ret = gntdev_dmabuf;
+ goto fail_put;
+ }
+
+ gntdev_dmabuf->priv = priv;
+ gntdev_dmabuf->fd = fd;
+
+ attach = dma_buf_attach(dma_buf, dev);
+ if (IS_ERR(attach)) {
+ ret = ERR_CAST(attach);
+ goto fail_free_obj;
+ }
+
+ gntdev_dmabuf->u.imp.attach = attach;
+
+ sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+ if (IS_ERR(sgt)) {
+ ret = ERR_CAST(sgt);
+ goto fail_detach;
+ }
+
+ /* Check number of pages that imported buffer has. */
+ if (attach->dmabuf->size != gntdev_dmabuf->nr_pages << PAGE_SHIFT) {
+ ret = ERR_PTR(-EINVAL);
+ pr_err("DMA buffer has %zu bytes, user-space expects %d pages\n",
+ attach->dmabuf->size, gntdev_dmabuf->nr_pages);
+ goto fail_unmap;
+ }
+
+ gntdev_dmabuf->u.imp.sgt = sgt;
+
+ /* Now convert sgt to array of pages and check for page validity. */
+ i = 0;
+ for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0) {
+ struct page *page = sg_page_iter_page(&sg_iter);
+ /*
+ * Check if page is valid: this can happen if we are given
+ * a page from VRAM or other resources which are not backed
+ * by a struct page.
+ */
+ if (!pfn_valid(page_to_pfn(page))) {
+ ret = ERR_PTR(-EINVAL);
+ goto fail_unmap;
+ }
+
+ gntdev_dmabuf->pages[i++] = page;
+ }
+
+ ret = ERR_PTR(dmabuf_imp_grant_foreign_access(gntdev_dmabuf->pages,
+ gntdev_dmabuf->u.imp.refs,
+ count, domid));
+ if (IS_ERR(ret))
+ goto fail_end_access;
+
+ pr_debug("Imported DMA buffer with fd %d\n", fd);
+
+ mutex_lock(&priv->lock);
+ list_add(&gntdev_dmabuf->next, &priv->imp_list);
+ mutex_unlock(&priv->lock);
+
+ return gntdev_dmabuf;
+
+fail_end_access:
+ dmabuf_imp_end_foreign_access(gntdev_dmabuf->u.imp.refs, count);
+fail_unmap:
+ dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
+fail_detach:
+ dma_buf_detach(dma_buf, attach);
+fail_free_obj:
+ dmabuf_imp_free_storage(gntdev_dmabuf);
+fail_put:
+ dma_buf_put(dma_buf);
+ return ret;
}
u32 *gntdev_dmabuf_imp_get_refs(struct gntdev_dmabuf *gntdev_dmabuf)
{
+ if (gntdev_dmabuf)
+ return gntdev_dmabuf->u.imp.refs;
+
return NULL;
}
+/*
+ * Find the imported dma-buf by its file descriptor and remove
+ * it from the import list.
+ */
+static struct gntdev_dmabuf *
+dmabuf_imp_find_unlink(struct gntdev_dmabuf_priv *priv, int fd)
+{
+ struct gntdev_dmabuf *q, *gntdev_dmabuf, *ret = ERR_PTR(-ENOENT);
+
+ mutex_lock(&priv->lock);
+ list_for_each_entry_safe(gntdev_dmabuf, q, &priv->imp_list, next) {
+ if (gntdev_dmabuf->fd == fd) {
+ pr_debug("Found gntdev_dmabuf in the import list\n");
+ ret = gntdev_dmabuf;
+ list_del(&gntdev_dmabuf->next);
+ break;
+ }
+ }
+ mutex_unlock(&priv->lock);
+ return ret;
+}
+
int gntdev_dmabuf_imp_release(struct gntdev_dmabuf_priv *priv, u32 fd)
{
- return -EINVAL;
+ struct gntdev_dmabuf *gntdev_dmabuf;
+ struct dma_buf_attachment *attach;
+ struct dma_buf *dma_buf;
+
+ gntdev_dmabuf = dmabuf_imp_find_unlink(priv, fd);
+ if (IS_ERR(gntdev_dmabuf))
+ return PTR_ERR(gntdev_dmabuf);
+
+ pr_debug("Releasing DMA buffer with fd %d\n", fd);
+
+ attach = gntdev_dmabuf->u.imp.attach;
+
+ if (gntdev_dmabuf->u.imp.sgt)
+ dma_buf_unmap_attachment(attach, gntdev_dmabuf->u.imp.sgt,
+ DMA_BIDIRECTIONAL);
+ dma_buf = attach->dmabuf;
+ dma_buf_detach(attach->dmabuf, attach);
+ dma_buf_put(dma_buf);
+
+ dmabuf_imp_end_foreign_access(gntdev_dmabuf->u.imp.refs,
+ gntdev_dmabuf->nr_pages);
+ dmabuf_imp_free_storage(gntdev_dmabuf);
+ return 0;
}
struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
@@ -452,6 +690,7 @@ struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
mutex_init(&priv->lock);
INIT_LIST_HEAD(&priv->exp_list);
INIT_LIST_HEAD(&priv->exp_wait_list);
+ INIT_LIST_HEAD(&priv->imp_list);
return priv;
}
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Allow creating a grant device context for use by kernel modules which
require the functionality provided by gntdev. Export symbols for the
dma-buf API provided by the module.
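The in-kernel API introduced here reports failures through the kernel's ERR_PTR convention rather than returning NULL (gntdev_alloc_context() returns ERR_PTR(-ENOMEM) on allocation failure). Below is a minimal user-space sketch of that convention, re-implemented purely for illustration; the demo_* names are invented and not part of gntdev:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* User-space re-implementation of the kernel's ERR_PTR helpers,
 * for illustration only. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct demo_ctx { int dummy; };

/* Mirrors the shape of gntdev_alloc_context(): a valid pointer on
 * success, ERR_PTR(-ENOMEM) on allocation failure. */
static struct demo_ctx *demo_alloc_context(int simulate_failure)
{
	struct demo_ctx *ctx;

	if (simulate_failure)
		return ERR_PTR(-ENOMEM);

	ctx = calloc(1, sizeof(*ctx));
	if (!ctx)
		return ERR_PTR(-ENOMEM);
	return ctx;
}
```

Callers therefore check IS_ERR()/PTR_ERR() instead of comparing against NULL, as gntdev_open() does after this refactoring.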
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/gntdev-dmabuf.c | 6 +++
drivers/xen/gntdev.c | 92 +++++++++++++++++++++++--------------
include/xen/grant_dev.h | 37 +++++++++++++++
3 files changed, 101 insertions(+), 34 deletions(-)
create mode 100644 include/xen/grant_dev.h
diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
index b5569a220f03..3890ac9dfab6 100644
--- a/drivers/xen/gntdev-dmabuf.c
+++ b/drivers/xen/gntdev-dmabuf.c
@@ -196,6 +196,7 @@ int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
dmabuf_exp_wait_obj_free(priv, obj);
return ret;
}
+EXPORT_SYMBOL_GPL(gntdev_dmabuf_exp_wait_released);
/* ------------------------------------------------------------------ */
/* DMA buffer export support. */
@@ -621,6 +622,7 @@ gntdev_dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct device *dev,
dma_buf_put(dma_buf);
return ret;
}
+EXPORT_SYMBOL_GPL(gntdev_dmabuf_imp_to_refs);
u32 *gntdev_dmabuf_imp_get_refs(struct gntdev_dmabuf *gntdev_dmabuf)
{
@@ -629,6 +631,7 @@ u32 *gntdev_dmabuf_imp_get_refs(struct gntdev_dmabuf *gntdev_dmabuf)
return NULL;
}
+EXPORT_SYMBOL_GPL(gntdev_dmabuf_imp_get_refs);
/*
 * Find the imported dma-buf by its file descriptor and remove
@@ -678,6 +681,7 @@ int gntdev_dmabuf_imp_release(struct gntdev_dmabuf_priv *priv, u32 fd)
dmabuf_imp_free_storage(gntdev_dmabuf);
return 0;
}
+EXPORT_SYMBOL_GPL(gntdev_dmabuf_imp_release);
struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
{
@@ -694,8 +698,10 @@ struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
return priv;
}
+EXPORT_SYMBOL_GPL(gntdev_dmabuf_init);
void gntdev_dmabuf_fini(struct gntdev_dmabuf_priv *priv)
{
kfree(priv);
}
+EXPORT_SYMBOL_GPL(gntdev_dmabuf_fini);
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index cf255d45f20f..63902f5298c9 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -621,14 +621,37 @@ static const struct mmu_notifier_ops gntdev_mmu_ops = {
/* ------------------------------------------------------------------ */
-static int gntdev_open(struct inode *inode, struct file *flip)
+void gntdev_free_context(struct gntdev_priv *priv)
+{
+ struct grant_map *map;
+
+ pr_debug("priv %p\n", priv);
+
+ mutex_lock(&priv->lock);
+ while (!list_empty(&priv->maps)) {
+ map = list_entry(priv->maps.next, struct grant_map, next);
+ list_del(&map->next);
+ gntdev_put_map(NULL /* already removed */, map);
+ }
+ WARN_ON(!list_empty(&priv->freeable_maps));
+
+ mutex_unlock(&priv->lock);
+
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+ gntdev_dmabuf_fini(priv->dmabuf_priv);
+#endif
+
+ kfree(priv);
+}
+EXPORT_SYMBOL_GPL(gntdev_free_context);
+
+struct gntdev_priv *gntdev_alloc_context(struct device *dev)
{
struct gntdev_priv *priv;
- int ret = 0;
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&priv->maps);
INIT_LIST_HEAD(&priv->freeable_maps);
@@ -637,12 +660,40 @@ static int gntdev_open(struct inode *inode, struct file *flip)
#ifdef CONFIG_XEN_GNTDEV_DMABUF
priv->dmabuf_priv = gntdev_dmabuf_init();
if (IS_ERR(priv->dmabuf_priv)) {
- ret = PTR_ERR(priv->dmabuf_priv);
+ struct gntdev_priv *ret;
+
+ ret = ERR_CAST(priv->dmabuf_priv);
kfree(priv);
return ret;
}
#endif
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ priv->dma_dev = dev;
+
+ /*
+	 * The device is not spawned from a device tree, so arch_setup_dma_ops
+	 * is not called, thus leaving the device with dummy DMA ops.
+	 * Fix this by calling of_dma_configure() with a NULL node to set
+ * default DMA ops.
+ */
+ of_dma_configure(priv->dma_dev, NULL);
+#endif
+ pr_debug("priv %p\n", priv);
+
+ return priv;
+}
+EXPORT_SYMBOL_GPL(gntdev_alloc_context);
+
+static int gntdev_open(struct inode *inode, struct file *flip)
+{
+ struct gntdev_priv *priv;
+ int ret = 0;
+
+ priv = gntdev_alloc_context(gntdev_miscdev.this_device);
+ if (IS_ERR(priv))
+ return PTR_ERR(priv);
+
if (use_ptemod) {
priv->mm = get_task_mm(current);
if (!priv->mm) {
@@ -655,23 +706,11 @@ static int gntdev_open(struct inode *inode, struct file *flip)
}
if (ret) {
- kfree(priv);
+ gntdev_free_context(priv);
return ret;
}
flip->private_data = priv;
-#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
- priv->dma_dev = gntdev_miscdev.this_device;
-
- /*
-	 * The device is not spawned from a device tree, so arch_setup_dma_ops
-	 * is not called, thus leaving the device with dummy DMA ops.
-	 * Fix this by calling of_dma_configure() with a NULL node to set
- * default DMA ops.
- */
- of_dma_configure(priv->dma_dev, NULL);
-#endif
- pr_debug("priv %p\n", priv);
return 0;
}
@@ -679,27 +718,11 @@ static int gntdev_open(struct inode *inode, struct file *flip)
static int gntdev_release(struct inode *inode, struct file *flip)
{
struct gntdev_priv *priv = flip->private_data;
- struct grant_map *map;
-
- pr_debug("priv %p\n", priv);
-
- mutex_lock(&priv->lock);
- while (!list_empty(&priv->maps)) {
- map = list_entry(priv->maps.next, struct grant_map, next);
- list_del(&map->next);
- gntdev_put_map(NULL /* already removed */, map);
- }
- WARN_ON(!list_empty(&priv->freeable_maps));
- mutex_unlock(&priv->lock);
-
-#ifdef CONFIG_XEN_GNTDEV_DMABUF
- gntdev_dmabuf_fini(priv->dmabuf_priv);
-#endif
if (use_ptemod)
mmu_notifier_unregister(&priv->mn, priv->mm);
- kfree(priv);
+ gntdev_free_context(priv);
return 0;
}
@@ -1156,6 +1179,7 @@ int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
gntdev_remove_map(priv, map);
return ret;
}
+EXPORT_SYMBOL_GPL(gntdev_dmabuf_exp_from_refs);
/* ------------------------------------------------------------------ */
/* DMA buffer IOCTL support. */
diff --git a/include/xen/grant_dev.h b/include/xen/grant_dev.h
new file mode 100644
index 000000000000..b7d0abd1ab16
--- /dev/null
+++ b/include/xen/grant_dev.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Grant device kernel API
+ *
+ * Copyright (C) 2018 EPAM Systems Inc.
+ *
+ * Author: Oleksandr Andrushchenko <[email protected]>
+ */
+
+#ifndef _GRANT_DEV_H
+#define _GRANT_DEV_H
+
+#include <linux/types.h>
+
+struct device;
+struct gntdev_priv;
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+struct xen_dmabuf;
+#endif
+
+struct gntdev_priv *gntdev_alloc_context(struct device *dev);
+void gntdev_free_context(struct gntdev_priv *priv);
+
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
+ int count, u32 domid, u32 *refs, u32 *fd);
+int gntdev_dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
+ int wait_to_ms);
+
+struct xen_dmabuf *gntdev_dmabuf_imp_to_refs(struct gntdev_priv *priv,
+ int fd, int count, int domid);
+u32 *gntdev_dmabuf_imp_get_refs(struct xen_dmabuf *xen_dmabuf);
+int gntdev_dmabuf_imp_release(struct gntdev_priv *priv, u32 fd);
+#endif
+
+#endif
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
1. Create a dma-buf from grant references provided by the foreign
domain. By default the dma-buf is backed by system memory pages, but
by providing GNTDEV_DMA_FLAG_XXX flags it can also be created
as a DMA write-combine or coherent buffer, e.g. allocated with
the corresponding dma_alloc_xxx API.
Export the resulting buffer as a new dma-buf.
2. Implement waiting for the dma-buf to be released: block until the
dma-buf with the file descriptor provided is released.
If the buffer is not released within the provided time-out then
-ETIMEDOUT is returned. If the buffer with the given file descriptor
does not exist or has already been released, then -ENOENT is
returned. For valid file descriptors this must not be treated as
an error.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/gntdev-dmabuf.c | 393 +++++++++++++++++++++++++++++++++++-
drivers/xen/gntdev-dmabuf.h | 9 +-
drivers/xen/gntdev.c | 90 ++++++++-
3 files changed, 486 insertions(+), 6 deletions(-)
diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
index 6bedd1387bd9..f612468879b4 100644
--- a/drivers/xen/gntdev-dmabuf.c
+++ b/drivers/xen/gntdev-dmabuf.c
@@ -3,15 +3,58 @@
/*
* Xen dma-buf functionality for gntdev.
*
+ * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
+ *
* Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
*/
+#include <linux/dma-buf.h>
#include <linux/slab.h>
#include "gntdev-dmabuf.h"
+struct gntdev_dmabuf {
+ struct gntdev_dmabuf_priv *priv;
+ struct dma_buf *dmabuf;
+ struct list_head next;
+ int fd;
+
+ union {
+ struct {
+ /* Exported buffers are reference counted. */
+ struct kref refcount;
+
+ struct gntdev_priv *priv;
+ struct grant_map *map;
+ void (*release)(struct gntdev_priv *priv,
+ struct grant_map *map);
+ } exp;
+ } u;
+
+ /* Number of pages this buffer has. */
+ int nr_pages;
+ /* Pages of this buffer. */
+ struct page **pages;
+};
+
+struct gntdev_dmabuf_wait_obj {
+ struct list_head next;
+ struct gntdev_dmabuf *gntdev_dmabuf;
+ struct completion completion;
+};
+
+struct gntdev_dmabuf_attachment {
+ struct sg_table *sgt;
+ enum dma_data_direction dir;
+};
+
struct gntdev_dmabuf_priv {
- int dummy;
+ /* List of exported DMA buffers. */
+ struct list_head exp_list;
+ /* List of wait objects. */
+ struct list_head exp_wait_list;
+ /* This is the lock which protects dma_buf_xxx lists. */
+ struct mutex lock;
};
/* ------------------------------------------------------------------ */
@@ -22,19 +65,359 @@ struct gntdev_dmabuf_priv {
/* Implementation of wait for exported DMA buffer to be released. */
/* ------------------------------------------------------------------ */
+static void dmabuf_exp_release(struct kref *kref);
+
+static struct gntdev_dmabuf_wait_obj *
+dmabuf_exp_wait_obj_new(struct gntdev_dmabuf_priv *priv,
+ struct gntdev_dmabuf *gntdev_dmabuf)
+{
+ struct gntdev_dmabuf_wait_obj *obj;
+
+ obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+ if (!obj)
+ return ERR_PTR(-ENOMEM);
+
+ init_completion(&obj->completion);
+ obj->gntdev_dmabuf = gntdev_dmabuf;
+
+ mutex_lock(&priv->lock);
+ list_add(&obj->next, &priv->exp_wait_list);
+ /* Put our reference and wait for gntdev_dmabuf's release to fire. */
+ kref_put(&gntdev_dmabuf->u.exp.refcount, dmabuf_exp_release);
+ mutex_unlock(&priv->lock);
+ return obj;
+}
+
+static void dmabuf_exp_wait_obj_free(struct gntdev_dmabuf_priv *priv,
+ struct gntdev_dmabuf_wait_obj *obj)
+{
+ struct gntdev_dmabuf_wait_obj *cur_obj, *q;
+
+ mutex_lock(&priv->lock);
+ list_for_each_entry_safe(cur_obj, q, &priv->exp_wait_list, next)
+ if (cur_obj == obj) {
+ list_del(&obj->next);
+ kfree(obj);
+ break;
+ }
+ mutex_unlock(&priv->lock);
+}
+
+static int dmabuf_exp_wait_obj_wait(struct gntdev_dmabuf_wait_obj *obj,
+ u32 wait_to_ms)
+{
+ if (wait_for_completion_timeout(&obj->completion,
+ msecs_to_jiffies(wait_to_ms)) <= 0)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static void dmabuf_exp_wait_obj_signal(struct gntdev_dmabuf_priv *priv,
+ struct gntdev_dmabuf *gntdev_dmabuf)
+{
+ struct gntdev_dmabuf_wait_obj *obj, *q;
+
+ list_for_each_entry_safe(obj, q, &priv->exp_wait_list, next)
+ if (obj->gntdev_dmabuf == gntdev_dmabuf) {
+ pr_debug("Found gntdev_dmabuf in the wait list, wake\n");
+ complete_all(&obj->completion);
+ }
+}
+
+static struct gntdev_dmabuf *
+dmabuf_exp_wait_obj_get_by_fd(struct gntdev_dmabuf_priv *priv, int fd)
+{
+ struct gntdev_dmabuf *q, *gntdev_dmabuf, *ret = ERR_PTR(-ENOENT);
+
+ mutex_lock(&priv->lock);
+ list_for_each_entry_safe(gntdev_dmabuf, q, &priv->exp_list, next)
+ if (gntdev_dmabuf->fd == fd) {
+			pr_debug("Found gntdev_dmabuf in the export list\n");
+ kref_get(&gntdev_dmabuf->u.exp.refcount);
+ ret = gntdev_dmabuf;
+ break;
+ }
+ mutex_unlock(&priv->lock);
+ return ret;
+}
+
int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
int wait_to_ms)
{
- return -EINVAL;
+ struct gntdev_dmabuf *gntdev_dmabuf;
+ struct gntdev_dmabuf_wait_obj *obj;
+ int ret;
+
+ pr_debug("Will wait for dma-buf with fd %d\n", fd);
+ /*
+	 * Try to find the DMA buffer: if it is not found then either the
+	 * buffer has already been released or the file descriptor
+	 * provided is wrong.
+ */
+ gntdev_dmabuf = dmabuf_exp_wait_obj_get_by_fd(priv, fd);
+ if (IS_ERR(gntdev_dmabuf))
+ return PTR_ERR(gntdev_dmabuf);
+
+ /*
+ * gntdev_dmabuf still exists and is reference count locked by us now,
+ * so prepare to wait: allocate wait object and add it to the wait list,
+ * so we can find it on release.
+ */
+ obj = dmabuf_exp_wait_obj_new(priv, gntdev_dmabuf);
+ if (IS_ERR(obj)) {
+ pr_err("Failed to setup wait object, ret %ld\n", PTR_ERR(obj));
+ return PTR_ERR(obj);
+	}
+
+ ret = dmabuf_exp_wait_obj_wait(obj, wait_to_ms);
+ dmabuf_exp_wait_obj_free(priv, obj);
+ return ret;
}
/* ------------------------------------------------------------------ */
/* DMA buffer export support. */
/* ------------------------------------------------------------------ */
+static struct sg_table *
+dmabuf_pages_to_sgt(struct page **pages, unsigned int nr_pages)
+{
+ struct sg_table *sgt;
+ int ret;
+
+ sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+ if (!sgt) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
+ nr_pages << PAGE_SHIFT,
+ GFP_KERNEL);
+ if (ret)
+ goto out;
+
+ return sgt;
+
+out:
+ kfree(sgt);
+ return ERR_PTR(ret);
+}
+
+static int dmabuf_exp_ops_attach(struct dma_buf *dma_buf,
+ struct device *target_dev,
+ struct dma_buf_attachment *attach)
+{
+ struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach;
+
+ gntdev_dmabuf_attach = kzalloc(sizeof(*gntdev_dmabuf_attach),
+ GFP_KERNEL);
+ if (!gntdev_dmabuf_attach)
+ return -ENOMEM;
+
+ gntdev_dmabuf_attach->dir = DMA_NONE;
+ attach->priv = gntdev_dmabuf_attach;
+ /* Might need to pin the pages of the buffer now. */
+ return 0;
+}
+
+static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
+ struct dma_buf_attachment *attach)
+{
+ struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach = attach->priv;
+
+ if (gntdev_dmabuf_attach) {
+ struct sg_table *sgt = gntdev_dmabuf_attach->sgt;
+
+ if (sgt) {
+ if (gntdev_dmabuf_attach->dir != DMA_NONE)
+ dma_unmap_sg_attrs(attach->dev, sgt->sgl,
+ sgt->nents,
+ gntdev_dmabuf_attach->dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ sg_free_table(sgt);
+ }
+
+ kfree(sgt);
+ kfree(gntdev_dmabuf_attach);
+ attach->priv = NULL;
+ }
+ /* Might need to unpin the pages of the buffer now. */
+}
+
+static struct sg_table *
+dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
+ enum dma_data_direction dir)
+{
+ struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach = attach->priv;
+ struct gntdev_dmabuf *gntdev_dmabuf = attach->dmabuf->priv;
+ struct sg_table *sgt;
+
+ pr_debug("Mapping %d pages for dev %p\n", gntdev_dmabuf->nr_pages,
+ attach->dev);
+
+ if (WARN_ON(dir == DMA_NONE || !gntdev_dmabuf_attach))
+ return ERR_PTR(-EINVAL);
+
+ /* Return the cached mapping when possible. */
+ if (gntdev_dmabuf_attach->dir == dir)
+ return gntdev_dmabuf_attach->sgt;
+
+ /*
+ * Two mappings with different directions for the same attachment are
+ * not allowed.
+ */
+ if (WARN_ON(gntdev_dmabuf_attach->dir != DMA_NONE))
+ return ERR_PTR(-EBUSY);
+
+ sgt = dmabuf_pages_to_sgt(gntdev_dmabuf->pages,
+ gntdev_dmabuf->nr_pages);
+ if (!IS_ERR(sgt)) {
+ if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
+ DMA_ATTR_SKIP_CPU_SYNC)) {
+ sg_free_table(sgt);
+ kfree(sgt);
+ sgt = ERR_PTR(-ENOMEM);
+ } else {
+ gntdev_dmabuf_attach->sgt = sgt;
+ gntdev_dmabuf_attach->dir = dir;
+ }
+ }
+ if (IS_ERR(sgt))
+ pr_err("Failed to map sg table for dev %p\n", attach->dev);
+ return sgt;
+}
+
+static void dmabuf_exp_ops_unmap_dma_buf(struct dma_buf_attachment *attach,
+ struct sg_table *sgt,
+ enum dma_data_direction dir)
+{
+ /* Not implemented. The unmap is done at dmabuf_exp_ops_detach(). */
+}
+
+static void dmabuf_exp_release(struct kref *kref)
+{
+ struct gntdev_dmabuf *gntdev_dmabuf =
+ container_of(kref, struct gntdev_dmabuf, u.exp.refcount);
+
+ dmabuf_exp_wait_obj_signal(gntdev_dmabuf->priv, gntdev_dmabuf);
+ list_del(&gntdev_dmabuf->next);
+ kfree(gntdev_dmabuf);
+}
+
+static void dmabuf_exp_ops_release(struct dma_buf *dma_buf)
+{
+ struct gntdev_dmabuf *gntdev_dmabuf = dma_buf->priv;
+ struct gntdev_dmabuf_priv *priv = gntdev_dmabuf->priv;
+
+ gntdev_dmabuf->u.exp.release(gntdev_dmabuf->u.exp.priv,
+ gntdev_dmabuf->u.exp.map);
+ mutex_lock(&priv->lock);
+ kref_put(&gntdev_dmabuf->u.exp.refcount, dmabuf_exp_release);
+ mutex_unlock(&priv->lock);
+}
+
+static void *dmabuf_exp_ops_kmap_atomic(struct dma_buf *dma_buf,
+ unsigned long page_num)
+{
+ /* Not implemented. */
+ return NULL;
+}
+
+static void dmabuf_exp_ops_kunmap_atomic(struct dma_buf *dma_buf,
+ unsigned long page_num, void *addr)
+{
+ /* Not implemented. */
+}
+
+static void *dmabuf_exp_ops_kmap(struct dma_buf *dma_buf,
+ unsigned long page_num)
+{
+ /* Not implemented. */
+ return NULL;
+}
+
+static void dmabuf_exp_ops_kunmap(struct dma_buf *dma_buf,
+ unsigned long page_num, void *addr)
+{
+ /* Not implemented. */
+}
+
+static int dmabuf_exp_ops_mmap(struct dma_buf *dma_buf,
+ struct vm_area_struct *vma)
+{
+ /* Not implemented. */
+ return 0;
+}
+
+static const struct dma_buf_ops dmabuf_exp_ops = {
+ .attach = dmabuf_exp_ops_attach,
+ .detach = dmabuf_exp_ops_detach,
+ .map_dma_buf = dmabuf_exp_ops_map_dma_buf,
+ .unmap_dma_buf = dmabuf_exp_ops_unmap_dma_buf,
+ .release = dmabuf_exp_ops_release,
+ .map = dmabuf_exp_ops_kmap,
+ .map_atomic = dmabuf_exp_ops_kmap_atomic,
+ .unmap = dmabuf_exp_ops_kunmap,
+ .unmap_atomic = dmabuf_exp_ops_kunmap_atomic,
+ .mmap = dmabuf_exp_ops_mmap,
+};
+
int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args)
{
- return -EINVAL;
+ DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+ struct gntdev_dmabuf *gntdev_dmabuf;
+ int ret = 0;
+
+ gntdev_dmabuf = kzalloc(sizeof(*gntdev_dmabuf), GFP_KERNEL);
+ if (!gntdev_dmabuf)
+ return -ENOMEM;
+
+ kref_init(&gntdev_dmabuf->u.exp.refcount);
+
+ gntdev_dmabuf->priv = args->dmabuf_priv;
+ gntdev_dmabuf->nr_pages = args->count;
+ gntdev_dmabuf->pages = args->pages;
+ gntdev_dmabuf->u.exp.priv = args->priv;
+ gntdev_dmabuf->u.exp.map = args->map;
+ gntdev_dmabuf->u.exp.release = args->release;
+
+ exp_info.exp_name = KBUILD_MODNAME;
+ if (args->dev->driver && args->dev->driver->owner)
+ exp_info.owner = args->dev->driver->owner;
+ else
+ exp_info.owner = THIS_MODULE;
+ exp_info.ops = &dmabuf_exp_ops;
+ exp_info.size = args->count << PAGE_SHIFT;
+ exp_info.flags = O_RDWR;
+ exp_info.priv = gntdev_dmabuf;
+
+ gntdev_dmabuf->dmabuf = dma_buf_export(&exp_info);
+ if (IS_ERR(gntdev_dmabuf->dmabuf)) {
+ ret = PTR_ERR(gntdev_dmabuf->dmabuf);
+ gntdev_dmabuf->dmabuf = NULL;
+ goto fail;
+ }
+
+ ret = dma_buf_fd(gntdev_dmabuf->dmabuf, O_CLOEXEC);
+ if (ret < 0)
+ goto fail;
+
+ gntdev_dmabuf->fd = ret;
+ args->fd = ret;
+
+ pr_debug("Exporting DMA buffer with fd %d\n", ret);
+
+ mutex_lock(&args->dmabuf_priv->lock);
+ list_add(&gntdev_dmabuf->next, &args->dmabuf_priv->exp_list);
+ mutex_unlock(&args->dmabuf_priv->lock);
+ return 0;
+
+fail:
+ if (gntdev_dmabuf->dmabuf)
+ dma_buf_put(gntdev_dmabuf->dmabuf);
+ kfree(gntdev_dmabuf);
+ return ret;
}
/* ------------------------------------------------------------------ */
@@ -66,6 +449,10 @@ struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
if (!priv)
return ERR_PTR(-ENOMEM);
+ mutex_init(&priv->lock);
+ INIT_LIST_HEAD(&priv->exp_list);
+ INIT_LIST_HEAD(&priv->exp_wait_list);
+
return priv;
}
diff --git a/drivers/xen/gntdev-dmabuf.h b/drivers/xen/gntdev-dmabuf.h
index 040b2de904ac..95c23a24f640 100644
--- a/drivers/xen/gntdev-dmabuf.h
+++ b/drivers/xen/gntdev-dmabuf.h
@@ -18,7 +18,14 @@ struct gntdev_dmabuf;
struct device;
struct gntdev_dmabuf_export_args {
- int dummy;
+ struct gntdev_priv *priv;
+ struct grant_map *map;
+ void (*release)(struct gntdev_priv *priv, struct grant_map *map);
+ struct gntdev_dmabuf_priv *dmabuf_priv;
+ struct device *dev;
+ int count;
+ struct page **pages;
+ u32 fd;
};
struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void);
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 7d58dfb3e5e8..cf255d45f20f 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -319,6 +319,16 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
gntdev_free_map(map);
}
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+static void gntdev_remove_map(struct gntdev_priv *priv, struct grant_map *map)
+{
+ mutex_lock(&priv->lock);
+ list_del(&map->next);
+ gntdev_put_map(NULL /* already removed */, map);
+ mutex_unlock(&priv->lock);
+}
+#endif
+
/* ------------------------------------------------------------------ */
static int find_grant_ptes(pte_t *pte, pgtable_t token,
@@ -1063,12 +1073,88 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
/* DMA buffer export support. */
/* ------------------------------------------------------------------ */
+static struct grant_map *
+dmabuf_exp_alloc_backing_storage(struct gntdev_priv *priv, int dmabuf_flags,
+ int count)
+{
+ struct grant_map *map;
+
+ if (unlikely(count <= 0))
+ return ERR_PTR(-EINVAL);
+
+ if ((dmabuf_flags & GNTDEV_DMA_FLAG_WC) &&
+ (dmabuf_flags & GNTDEV_DMA_FLAG_COHERENT)) {
+ pr_err("Wrong dma-buf flags: either WC or coherent, not both\n");
+ return ERR_PTR(-EINVAL);
+ }
+
+ map = gntdev_alloc_map(priv, count, dmabuf_flags);
+ if (!map)
+ return ERR_PTR(-ENOMEM);
+
+ if (unlikely(atomic_add_return(count, &pages_mapped) > limit)) {
+ pr_err("can't map: over limit\n");
+ gntdev_put_map(NULL, map);
+ return ERR_PTR(-ENOMEM);
+ }
+ return map;
+}
+
int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
int count, u32 domid, u32 *refs, u32 *fd)
{
- /* XXX: this will need to work with gntdev's map, so leave it here. */
+ struct grant_map *map;
+ struct gntdev_dmabuf_export_args args;
+ int i, ret;
+
*fd = -1;
- return -EINVAL;
+
+ if (use_ptemod) {
+		pr_err("Cannot provide dma-buf: use_ptemod %d\n",
+ use_ptemod);
+ return -EINVAL;
+ }
+
+ map = dmabuf_exp_alloc_backing_storage(priv, flags, count);
+ if (IS_ERR(map))
+ return PTR_ERR(map);
+
+ for (i = 0; i < count; i++) {
+ map->grants[i].domid = domid;
+ map->grants[i].ref = refs[i];
+ }
+
+ mutex_lock(&priv->lock);
+ gntdev_add_map(priv, map);
+ mutex_unlock(&priv->lock);
+
+ map->flags |= GNTMAP_host_map;
+#if defined(CONFIG_X86)
+ map->flags |= GNTMAP_device_map;
+#endif
+
+ ret = map_grant_pages(map);
+ if (ret < 0)
+ goto out;
+
+ args.priv = priv;
+ args.map = map;
+ args.release = gntdev_remove_map;
+ args.dev = priv->dma_dev;
+ args.dmabuf_priv = priv->dmabuf_priv;
+ args.count = map->count;
+ args.pages = map->pages;
+
+ ret = gntdev_dmabuf_exp_from_pages(&args);
+ if (ret < 0)
+ goto out;
+
+ *fd = args.fd;
+ return 0;
+
+out:
+ gntdev_remove_map(priv, map);
+ return ret;
}
/* ------------------------------------------------------------------ */
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Allow mappings for DMA-backed buffers if the grant table module
supports them: this extends the grant device to map not only buffers
made of balloon pages, but also buffers allocated with dma_alloc_xxx.
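The GNTDEV_DMA_FLAG_WC and GNTDEV_DMA_FLAG_COHERENT flags added by this patch are mutually exclusive, as enforced by the export path in a later patch. A small stand-alone sketch of that validation rule follows; demo_validate_dma_flags() is an invented name, not part of the driver:

```c
#include <assert.h>
#include <errno.h>

/* Flag values as defined in the uapi header in this patch. */
#define GNTDEV_DMA_FLAG_WC       (1 << 0)
#define GNTDEV_DMA_FLAG_COHERENT (1 << 1)

/* Mirrors the checks done before allocating backing storage: a buffer
 * may be write-combined or coherent, never both, and the page count
 * must be positive. */
static int demo_validate_dma_flags(int dma_flags, int count)
{
	if (count <= 0)
		return -EINVAL;
	if ((dma_flags & GNTDEV_DMA_FLAG_WC) &&
	    (dma_flags & GNTDEV_DMA_FLAG_COHERENT))
		return -EINVAL;
	return 0;
}
```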
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/gntdev.c | 99 ++++++++++++++++++++++++++++++++++++++-
include/uapi/xen/gntdev.h | 15 ++++++
2 files changed, 112 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index bd56653b9bbc..9813fc440c70 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -37,6 +37,9 @@
#include <linux/slab.h>
#include <linux/highmem.h>
#include <linux/refcount.h>
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+#include <linux/of_device.h>
+#endif
#include <xen/xen.h>
#include <xen/grant_table.h>
@@ -72,6 +75,11 @@ struct gntdev_priv {
struct mutex lock;
struct mm_struct *mm;
struct mmu_notifier mn;
+
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ /* Device for which DMA memory is allocated. */
+ struct device *dma_dev;
+#endif
};
struct unmap_notify {
@@ -96,10 +104,27 @@ struct grant_map {
struct gnttab_unmap_grant_ref *kunmap_ops;
struct page **pages;
unsigned long pages_vm_start;
+
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ /*
+	 * If dma_vaddr is not NULL then this mapping is backed by DMA
+ * capable memory.
+ */
+
+ struct device *dma_dev;
+ /* Flags used to create this DMA buffer: GNTDEV_DMA_FLAG_XXX. */
+ int dma_flags;
+ void *dma_vaddr;
+ dma_addr_t dma_bus_addr;
+ /* This is required for gnttab_dma_{alloc|free}_pages. */
+ xen_pfn_t *frames;
+#endif
};
static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
+static struct miscdevice gntdev_miscdev;
+
/* ------------------------------------------------------------------ */
static void gntdev_print_maps(struct gntdev_priv *priv,
@@ -121,8 +146,27 @@ static void gntdev_free_map(struct grant_map *map)
if (map == NULL)
return;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ if (map->dma_vaddr) {
+ struct gnttab_dma_alloc_args args;
+
+ args.dev = map->dma_dev;
+ args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
+ args.nr_pages = map->count;
+ args.pages = map->pages;
+ args.frames = map->frames;
+ args.vaddr = map->dma_vaddr;
+ args.dev_bus_addr = map->dma_bus_addr;
+
+ gnttab_dma_free_pages(&args);
+ } else
+#endif
if (map->pages)
gnttab_free_pages(map->count, map->pages);
+
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ kfree(map->frames);
+#endif
kfree(map->pages);
kfree(map->grants);
kfree(map->map_ops);
@@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map *map)
kfree(map);
}
-static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
+static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
+ int dma_flags)
{
struct grant_map *add;
int i;
@@ -155,6 +200,37 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
NULL == add->pages)
goto err;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ add->dma_flags = dma_flags;
+
+ /*
+ * Check if this mapping is requested to be backed
+ * by a DMA buffer.
+ */
+ if (dma_flags & (GNTDEV_DMA_FLAG_WC | GNTDEV_DMA_FLAG_COHERENT)) {
+ struct gnttab_dma_alloc_args args;
+
+ add->frames = kcalloc(count, sizeof(add->frames[0]),
+ GFP_KERNEL);
+ if (!add->frames)
+ goto err;
+
+ /* Remember the device, so we can free DMA memory. */
+ add->dma_dev = priv->dma_dev;
+
+ args.dev = priv->dma_dev;
+ args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
+ args.nr_pages = count;
+ args.pages = add->pages;
+ args.frames = add->frames;
+
+ if (gnttab_dma_alloc_pages(&args))
+ goto err;
+
+ add->dma_vaddr = args.vaddr;
+ add->dma_bus_addr = args.dev_bus_addr;
+ } else
+#endif
if (gnttab_alloc_pages(count, add->pages))
goto err;
@@ -325,6 +401,14 @@ static int map_grant_pages(struct grant_map *map)
map->unmap_ops[i].handle = map->map_ops[i].handle;
if (use_ptemod)
map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ else if (map->dma_vaddr) {
+ unsigned long mfn;
+
+ mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
+ map->unmap_ops[i].dev_bus_addr = __pfn_to_phys(mfn);
+ }
+#endif
}
return err;
}
@@ -548,6 +632,17 @@ static int gntdev_open(struct inode *inode, struct file *flip)
}
flip->private_data = priv;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+ priv->dma_dev = gntdev_miscdev.this_device;
+
+ /*
+	 * The device is not spawned from a device tree, so arch_setup_dma_ops
+	 * is not called, thus leaving the device with dummy DMA ops.
+	 * Fix this by calling of_dma_configure() with a NULL node to set
+ * default DMA ops.
+ */
+ of_dma_configure(priv->dma_dev, NULL);
+#endif
pr_debug("priv %p\n", priv);
return 0;
@@ -589,7 +684,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
return -EINVAL;
err = -ENOMEM;
- map = gntdev_alloc_map(priv, op.count);
+ map = gntdev_alloc_map(priv, op.count, 0 /* This is not a dma-buf. */);
if (!map)
return err;
diff --git a/include/uapi/xen/gntdev.h b/include/uapi/xen/gntdev.h
index 6d1163456c03..4b9d498a31d4 100644
--- a/include/uapi/xen/gntdev.h
+++ b/include/uapi/xen/gntdev.h
@@ -200,4 +200,19 @@ struct ioctl_gntdev_grant_copy {
/* Send an interrupt on the indicated event channel */
#define UNMAP_NOTIFY_SEND_EVENT 0x2
+/*
+ * Flags to be used while requesting memory mapping's backing storage
+ * to be allocated with DMA API.
+ */
+
+/*
+ * The buffer is backed with memory allocated with dma_alloc_wc.
+ */
+#define GNTDEV_DMA_FLAG_WC (1 << 0)
+
+/*
+ * The buffer is backed with memory allocated with dma_alloc_coherent.
+ */
+#define GNTDEV_DMA_FLAG_COHERENT (1 << 1)
+
#endif /* __LINUX_PUBLIC_GNTDEV_H__ */
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Extend grant table module API to allow allocating buffers that can
be used for DMA operations and mapping foreign grant references
on top of those.
The resulting buffer is similar to the one allocated by the balloon
driver in that proper memory reservation is made
({increase|decrease}_reservation and VA mappings are updated if needed).
This is useful for sharing foreign buffers with HW drivers which
cannot work with the scattered buffers provided by the balloon driver,
but require DMAable memory instead.
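gnttab_dma_alloc_pages() below derives the allocation size and the per-page frame numbers from simple shifts on nr_pages and the bus address returned by the DMA API. A minimal user-space sketch of that arithmetic, assuming 4 KiB pages; the demo_* helpers are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Total size in bytes of an allocation of nr_pages pages,
 * mirroring "size = args->nr_pages << PAGE_SHIFT". */
static size_t demo_dma_buf_size(unsigned int nr_pages)
{
	return (size_t)nr_pages << DEMO_PAGE_SHIFT;
}

/* Page frame number of the i-th page of a physically contiguous
 * buffer starting at bus_addr, mirroring the start_pfn loop in
 * gnttab_dma_alloc_pages(). */
static unsigned long demo_pfn_of_page(uint64_t bus_addr, unsigned int i)
{
	return (unsigned long)(bus_addr >> DEMO_PAGE_SHIFT) + i;
}
```

Because dma_alloc_coherent()/dma_alloc_wc() return one physically contiguous region, consecutive pages map to consecutive pfns, which is what lets the driver fill args->pages and args->frames in a single loop.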
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/Kconfig | 13 +++++
drivers/xen/grant-table.c | 109 ++++++++++++++++++++++++++++++++++++++
include/xen/grant_table.h | 18 +++++++
3 files changed, 140 insertions(+)
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index e5d0c28372ea..39536ddfbce4 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -161,6 +161,19 @@ config XEN_GRANT_DEV_ALLOC
to other domains. This can be used to implement frontend drivers
or as part of an inter-domain shared memory channel.
+config XEN_GRANT_DMA_ALLOC
+ bool "Allow allocating DMA capable buffers with grant reference module"
+ depends on XEN && HAS_DMA
+ help
+	  Extends the grant table module API to allow allocating DMA capable
+	  buffers and mapping foreign grant references on top of them.
+	  The resulting buffer is similar to one allocated by the balloon
+	  driver in that proper memory reservation is made
+	  ({increase|decrease}_reservation and VA mappings are updated if needed).
+ This is useful for sharing foreign buffers with HW drivers which
+ cannot work with scattered buffers provided by the balloon driver,
+ but require DMAable memory instead.
+
config SWIOTLB_XEN
def_bool y
select SWIOTLB
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index dbb48a89e987..5658e58d9cc6 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -45,6 +45,9 @@
#include <linux/workqueue.h>
#include <linux/ratelimit.h>
#include <linux/moduleparam.h>
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+#include <linux/dma-mapping.h>
+#endif
#include <xen/xen.h>
#include <xen/interface/xen.h>
@@ -57,6 +60,7 @@
#ifdef CONFIG_X86
#include <asm/xen/cpuid.h>
#endif
+#include <xen/mem-reservation.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/interface.h>
@@ -811,6 +815,73 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
}
EXPORT_SYMBOL_GPL(gnttab_alloc_pages);
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+/**
+ * gnttab_dma_alloc_pages - allocate DMAable pages suitable for grant mapping
+ * @args: arguments to the function
+ */
+int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
+{
+ unsigned long pfn, start_pfn;
+ size_t size;
+ int i, ret;
+
+ size = args->nr_pages << PAGE_SHIFT;
+ if (args->coherent)
+ args->vaddr = dma_alloc_coherent(args->dev, size,
+ &args->dev_bus_addr,
+ GFP_KERNEL | __GFP_NOWARN);
+ else
+ args->vaddr = dma_alloc_wc(args->dev, size,
+ &args->dev_bus_addr,
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!args->vaddr) {
+ pr_err("Failed to allocate DMA buffer of size %zu\n", size);
+ return -ENOMEM;
+ }
+
+ start_pfn = __phys_to_pfn(args->dev_bus_addr);
+ for (pfn = start_pfn, i = 0; pfn < start_pfn + args->nr_pages;
+ pfn++, i++) {
+ struct page *page = pfn_to_page(pfn);
+
+ args->pages[i] = page;
+ args->frames[i] = xen_page_to_gfn(page);
+ xenmem_reservation_scrub_page(page);
+ }
+
+ xenmem_reservation_va_mapping_reset(args->nr_pages, args->pages);
+
+ ret = xenmem_reservation_decrease(args->nr_pages, args->frames);
+ if (ret != args->nr_pages) {
+ pr_err("Failed to decrease reservation for DMA buffer\n");
+ ret = -EFAULT;
+ goto fail_free_dma;
+ }
+
+ ret = gnttab_pages_set_private(args->nr_pages, args->pages);
+ if (ret < 0)
+ goto fail_clear_private;
+
+ return 0;
+
+fail_clear_private:
+ gnttab_pages_clear_private(args->nr_pages, args->pages);
+fail_free_dma:
+ xenmem_reservation_increase(args->nr_pages, args->frames);
+ xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
+ args->frames);
+ if (args->coherent)
+ dma_free_coherent(args->dev, size,
+ args->vaddr, args->dev_bus_addr);
+ else
+ dma_free_wc(args->dev, size,
+ args->vaddr, args->dev_bus_addr);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
+#endif
+
void gnttab_pages_clear_private(int nr_pages, struct page **pages)
{
int i;
@@ -838,6 +909,44 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
}
EXPORT_SYMBOL_GPL(gnttab_free_pages);
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+/**
+ * gnttab_dma_free_pages - free DMAable pages
+ * @args: arguments to the function
+ */
+int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
+{
+ size_t size;
+ int i, ret;
+
+ gnttab_pages_clear_private(args->nr_pages, args->pages);
+
+ for (i = 0; i < args->nr_pages; i++)
+ args->frames[i] = page_to_xen_pfn(args->pages[i]);
+
+ ret = xenmem_reservation_increase(args->nr_pages, args->frames);
+ if (ret != args->nr_pages) {
+ pr_err("Failed to increase reservation for DMA buffer\n");
+ ret = -EFAULT;
+ } else {
+ ret = 0;
+ }
+
+ xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
+ args->frames);
+
+ size = args->nr_pages << PAGE_SHIFT;
+ if (args->coherent)
+ dma_free_coherent(args->dev, size,
+ args->vaddr, args->dev_bus_addr);
+ else
+ dma_free_wc(args->dev, size,
+ args->vaddr, args->dev_bus_addr);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gnttab_dma_free_pages);
+#endif
+
/* Handling of paged out grant targets (GNTST_eagain) */
#define MAX_DELAY 256
static inline void
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index de03f2542bb7..9bc5bc07d4d3 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -198,6 +198,24 @@ void gnttab_free_auto_xlat_frames(void);
int gnttab_alloc_pages(int nr_pages, struct page **pages);
void gnttab_free_pages(int nr_pages, struct page **pages);
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+struct gnttab_dma_alloc_args {
+ /* Device for which DMA memory will be/was allocated. */
+ struct device *dev;
+ /* If set, the allocated buffer is coherent; otherwise write-combine. */
+ bool coherent;
+
+ int nr_pages;
+ struct page **pages;
+ xen_pfn_t *frames;
+ void *vaddr;
+ dma_addr_t dev_bus_addr;
+};
+
+int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args);
+int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args);
+#endif
+
int gnttab_pages_set_private(int nr_pages, struct page **pages);
void gnttab_pages_clear_private(int nr_pages, struct page **pages);
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Memory {increase|decrease}_reservation and VA mappings update/reset
code used in the balloon driver can be made common, so other drivers
can re-use the same functionality without open-coding.
Create a dedicated file for the shared code and export corresponding
symbols for other kernel modules.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/Makefile | 1 +
drivers/xen/balloon.c | 71 ++------------------
drivers/xen/mem-reservation.c | 120 ++++++++++++++++++++++++++++++++++
include/xen/mem-reservation.h | 65 ++++++++++++++++++
4 files changed, 192 insertions(+), 65 deletions(-)
create mode 100644 drivers/xen/mem-reservation.c
create mode 100644 include/xen/mem-reservation.h
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 451e833f5931..3c87b0c3aca6 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -2,6 +2,7 @@
obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
obj-$(CONFIG_X86) += fallback.o
obj-y += grant-table.o features.o balloon.o manage.o preempt.o time.o
+obj-y += mem-reservation.o
obj-y += events/
obj-y += xenbus/
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 065f0b607373..bdbce4257b65 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -71,6 +71,7 @@
#include <xen/balloon.h>
#include <xen/features.h>
#include <xen/page.h>
+#include <xen/mem-reservation.h>
static int xen_hotplug_unpopulated;
@@ -157,13 +158,6 @@ static DECLARE_DELAYED_WORK(balloon_worker, balloon_process);
#define GFP_BALLOON \
(GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
-static void scrub_page(struct page *page)
-{
-#ifdef CONFIG_XEN_SCRUB_PAGES
- clear_highpage(page);
-#endif
-}
-
/* balloon_append: add the given page to the balloon. */
static void __balloon_append(struct page *page)
{
@@ -463,11 +457,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
int rc;
unsigned long i;
struct page *page;
- struct xen_memory_reservation reservation = {
- .address_bits = 0,
- .extent_order = EXTENT_ORDER,
- .domid = DOMID_SELF
- };
if (nr_pages > ARRAY_SIZE(frame_list))
nr_pages = ARRAY_SIZE(frame_list);
@@ -486,9 +475,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
page = balloon_next_page(page);
}
- set_xen_guest_handle(reservation.extent_start, frame_list);
- reservation.nr_extents = nr_pages;
- rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+ rc = xenmem_reservation_increase(nr_pages, frame_list);
if (rc <= 0)
return BP_EAGAIN;
@@ -496,29 +483,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
page = balloon_retrieve(false);
BUG_ON(page == NULL);
-#ifdef CONFIG_XEN_HAVE_PVMMU
- /*
- * We don't support PV MMU when Linux and Xen is using
- * different page granularity.
- */
- BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
-
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- unsigned long pfn = page_to_pfn(page);
-
- set_phys_to_machine(pfn, frame_list[i]);
-
- /* Link back into the page tables if not highmem. */
- if (!PageHighMem(page)) {
- int ret;
- ret = HYPERVISOR_update_va_mapping(
- (unsigned long)__va(pfn << PAGE_SHIFT),
- mfn_pte(frame_list[i], PAGE_KERNEL),
- 0);
- BUG_ON(ret);
- }
- }
-#endif
+ xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
/* Relinquish the page back to the allocator. */
free_reserved_page(page);
@@ -535,11 +500,6 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
unsigned long i;
struct page *page, *tmp;
int ret;
- struct xen_memory_reservation reservation = {
- .address_bits = 0,
- .extent_order = EXTENT_ORDER,
- .domid = DOMID_SELF
- };
LIST_HEAD(pages);
if (nr_pages > ARRAY_SIZE(frame_list))
@@ -553,7 +513,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
break;
}
adjust_managed_page_count(page, -1);
- scrub_page(page);
+ xenmem_reservation_scrub_page(page);
list_add(&page->lru, &pages);
}
@@ -575,25 +535,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
/* XENMEM_decrease_reservation requires a GFN */
frame_list[i++] = xen_page_to_gfn(page);
-#ifdef CONFIG_XEN_HAVE_PVMMU
- /*
- * We don't support PV MMU when Linux and Xen is using
- * different page granularity.
- */
- BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
-
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- unsigned long pfn = page_to_pfn(page);
+ xenmem_reservation_va_mapping_reset(1, &page);
- if (!PageHighMem(page)) {
- ret = HYPERVISOR_update_va_mapping(
- (unsigned long)__va(pfn << PAGE_SHIFT),
- __pte_ma(0), 0);
- BUG_ON(ret);
- }
- __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
- }
-#endif
list_del(&page->lru);
balloon_append(page);
@@ -601,9 +544,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
flush_tlb_all();
- set_xen_guest_handle(reservation.extent_start, frame_list);
- reservation.nr_extents = nr_pages;
- ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
+ ret = xenmem_reservation_decrease(nr_pages, frame_list);
BUG_ON(ret != nr_pages);
balloon_stats.current_pages -= nr_pages;
diff --git a/drivers/xen/mem-reservation.c b/drivers/xen/mem-reservation.c
new file mode 100644
index 000000000000..5388df852a21
--- /dev/null
+++ b/drivers/xen/mem-reservation.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/******************************************************************************
+ * Xen memory reservation utilities.
+ *
+ * Copyright (c) 2003, B Dragovic
+ * Copyright (c) 2003-2004, M Williamson, K Fraser
+ * Copyright (c) 2005 Dan M. Smith, IBM Corporation
+ * Copyright (c) 2010 Daniel Kiper
+ * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
+ */
+
+#include <xen/mem-reservation.h>
+
+/*
+ * Use one extent per PAGE_SIZE to avoid breaking down the page into
+ * multiple frames.
+ */
+#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
+
+#ifdef CONFIG_XEN_SCRUB_PAGES
+void xenmem_reservation_scrub_page(struct page *page)
+{
+ clear_highpage(page);
+}
+EXPORT_SYMBOL_GPL(xenmem_reservation_scrub_page);
+#endif
+
+#ifdef CONFIG_XEN_HAVE_PVMMU
+void __xenmem_reservation_va_mapping_update(unsigned long count,
+ struct page **pages,
+ xen_pfn_t *frames)
+{
+ int i;
+
+ for (i = 0; i < count; i++) {
+ struct page *page = pages[i];
+ unsigned long pfn;
+
+ BUG_ON(page == NULL);
+ pfn = page_to_pfn(page);
+
+ /*
+ * We don't support PV MMU when Linux and Xen are using
+ * different page granularity.
+ */
+ BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
+ set_phys_to_machine(pfn, frames[i]);
+
+ /* Link back into the page tables if not highmem. */
+ if (!PageHighMem(page)) {
+ int ret;
+
+ ret = HYPERVISOR_update_va_mapping(
+ (unsigned long)__va(pfn << PAGE_SHIFT),
+ mfn_pte(frames[i], PAGE_KERNEL),
+ 0);
+ BUG_ON(ret);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(__xenmem_reservation_va_mapping_update);
+
+void __xenmem_reservation_va_mapping_reset(unsigned long count,
+ struct page **pages)
+{
+ int i;
+
+ for (i = 0; i < count; i++) {
+ struct page *page = pages[i];
+ unsigned long pfn = page_to_pfn(page);
+
+ /*
+ * We don't support PV MMU when Linux and Xen are using
+ * different page granularity.
+ */
+ BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
+ if (!PageHighMem(page)) {
+ int ret;
+
+ ret = HYPERVISOR_update_va_mapping(
+ (unsigned long)__va(pfn << PAGE_SHIFT),
+ __pte_ma(0), 0);
+ BUG_ON(ret);
+ }
+ __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+ }
+}
+EXPORT_SYMBOL_GPL(__xenmem_reservation_va_mapping_reset);
+#endif /* CONFIG_XEN_HAVE_PVMMU */
+
+int xenmem_reservation_increase(int count, xen_pfn_t *frames)
+{
+ struct xen_memory_reservation reservation = {
+ .address_bits = 0,
+ .extent_order = EXTENT_ORDER,
+ .domid = DOMID_SELF
+ };
+
+ set_xen_guest_handle(reservation.extent_start, frames);
+ reservation.nr_extents = count;
+ return HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+}
+EXPORT_SYMBOL_GPL(xenmem_reservation_increase);
+
+int xenmem_reservation_decrease(int count, xen_pfn_t *frames)
+{
+ struct xen_memory_reservation reservation = {
+ .address_bits = 0,
+ .extent_order = EXTENT_ORDER,
+ .domid = DOMID_SELF
+ };
+
+ set_xen_guest_handle(reservation.extent_start, frames);
+ reservation.nr_extents = count;
+ return HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
+}
+EXPORT_SYMBOL_GPL(xenmem_reservation_decrease);
diff --git a/include/xen/mem-reservation.h b/include/xen/mem-reservation.h
new file mode 100644
index 000000000000..a727d65a1e61
--- /dev/null
+++ b/include/xen/mem-reservation.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Xen memory reservation utilities.
+ *
+ * Copyright (c) 2003, B Dragovic
+ * Copyright (c) 2003-2004, M Williamson, K Fraser
+ * Copyright (c) 2005 Dan M. Smith, IBM Corporation
+ * Copyright (c) 2010 Daniel Kiper
+ * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
+ */
+
+#ifndef _XENMEM_RESERVATION_H
+#define _XENMEM_RESERVATION_H
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/tlb.h>
+
+#include <xen/interface/memory.h>
+#include <xen/page.h>
+
+#ifdef CONFIG_XEN_SCRUB_PAGES
+void xenmem_reservation_scrub_page(struct page *page);
+#else
+static inline void xenmem_reservation_scrub_page(struct page *page)
+{
+}
+#endif
+
+#ifdef CONFIG_XEN_HAVE_PVMMU
+void __xenmem_reservation_va_mapping_update(unsigned long count,
+ struct page **pages,
+ xen_pfn_t *frames);
+
+void __xenmem_reservation_va_mapping_reset(unsigned long count,
+ struct page **pages);
+#endif
+
+static inline void xenmem_reservation_va_mapping_update(unsigned long count,
+ struct page **pages,
+ xen_pfn_t *frames)
+{
+#ifdef CONFIG_XEN_HAVE_PVMMU
+ if (!xen_feature(XENFEAT_auto_translated_physmap))
+ __xenmem_reservation_va_mapping_update(count, pages, frames);
+#endif
+}
+
+static inline void xenmem_reservation_va_mapping_reset(unsigned long count,
+ struct page **pages)
+{
+#ifdef CONFIG_XEN_HAVE_PVMMU
+ if (!xen_feature(XENFEAT_auto_translated_physmap))
+ __xenmem_reservation_va_mapping_reset(count, pages);
+#endif
+}
+
+int xenmem_reservation_increase(int count, xen_pfn_t *frames);
+
+int xenmem_reservation_decrease(int count, xen_pfn_t *frames);
+
+#endif
--
2.17.0
From: Oleksandr Andrushchenko <[email protected]>
Only gnttab_{alloc|free}_pages are exported as EXPORT_SYMBOL
while all the rest are exported as EXPORT_SYMBOL_GPL, thus
effectively making it impossible for non-GPL driver modules
to use the grant table module. Export gnttab_{alloc|free}_pages
as EXPORT_SYMBOL_GPL so all the exports are aligned.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
---
drivers/xen/grant-table.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 27be107d6480..ba36ff3e4903 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -799,7 +799,7 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
return 0;
}
-EXPORT_SYMBOL(gnttab_alloc_pages);
+EXPORT_SYMBOL_GPL(gnttab_alloc_pages);
/**
* gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
@@ -820,7 +820,7 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
}
free_xenballooned_pages(nr_pages, pages);
}
-EXPORT_SYMBOL(gnttab_free_pages);
+EXPORT_SYMBOL_GPL(gnttab_free_pages);
/* Handling of paged out grant targets (GNTST_eagain) */
#define MAX_DELAY 256
--
2.17.0
Boris, I dropped your r-b for this patch since I changed
EXPORT_SYMBOL to EXPORT_SYMBOL_GPL, as Juergen requested.
On 06/01/2018 02:41 PM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Make set/clear page private code shared and accessible to
> other kernel modules which can re-use these instead of open-coding.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/grant-table.c | 54 +++++++++++++++++++++++++--------------
> include/xen/grant_table.h | 3 +++
> 2 files changed, 38 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index ba36ff3e4903..dbb48a89e987 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -769,29 +769,18 @@ void gnttab_free_auto_xlat_frames(void)
> }
> EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
>
> -/**
> - * gnttab_alloc_pages - alloc pages suitable for grant mapping into
> - * @nr_pages: number of pages to alloc
> - * @pages: returns the pages
> - */
> -int gnttab_alloc_pages(int nr_pages, struct page **pages)
> +int gnttab_pages_set_private(int nr_pages, struct page **pages)
> {
> int i;
> - int ret;
> -
> - ret = alloc_xenballooned_pages(nr_pages, pages);
> - if (ret < 0)
> - return ret;
>
> for (i = 0; i < nr_pages; i++) {
> #if BITS_PER_LONG < 64
> struct xen_page_foreign *foreign;
>
> foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
> - if (!foreign) {
> - gnttab_free_pages(nr_pages, pages);
> + if (!foreign)
> return -ENOMEM;
> - }
> +
> set_page_private(pages[i], (unsigned long)foreign);
> #endif
> SetPagePrivate(pages[i]);
> @@ -799,14 +788,30 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
>
> return 0;
> }
> -EXPORT_SYMBOL_GPL(gnttab_alloc_pages);
> +EXPORT_SYMBOL_GPL(gnttab_pages_set_private);
>
> /**
> - * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
> - * @nr_pages; number of pages to free
> - * @pages: the pages
> + * gnttab_alloc_pages - alloc pages suitable for grant mapping into
> + * @nr_pages: number of pages to alloc
> + * @pages: returns the pages
> */
> -void gnttab_free_pages(int nr_pages, struct page **pages)
> +int gnttab_alloc_pages(int nr_pages, struct page **pages)
> +{
> + int ret;
> +
> + ret = alloc_xenballooned_pages(nr_pages, pages);
> + if (ret < 0)
> + return ret;
> +
> + ret = gnttab_pages_set_private(nr_pages, pages);
> + if (ret < 0)
> + gnttab_free_pages(nr_pages, pages);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_alloc_pages);
> +
> +void gnttab_pages_clear_private(int nr_pages, struct page **pages)
> {
> int i;
>
> @@ -818,6 +823,17 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
> ClearPagePrivate(pages[i]);
> }
> }
> +}
> +EXPORT_SYMBOL_GPL(gnttab_pages_clear_private);
> +
> +/**
> + * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
> + * @nr_pages; number of pages to free
> + * @pages: the pages
> + */
> +void gnttab_free_pages(int nr_pages, struct page **pages)
> +{
> + gnttab_pages_clear_private(nr_pages, pages);
> free_xenballooned_pages(nr_pages, pages);
> }
> EXPORT_SYMBOL_GPL(gnttab_free_pages);
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 2e37741f6b8d..de03f2542bb7 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -198,6 +198,9 @@ void gnttab_free_auto_xlat_frames(void);
> int gnttab_alloc_pages(int nr_pages, struct page **pages);
> void gnttab_free_pages(int nr_pages, struct page **pages);
>
> +int gnttab_pages_set_private(int nr_pages, struct page **pages);
> +void gnttab_pages_clear_private(int nr_pages, struct page **pages);
> +
> int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> struct gnttab_map_grant_ref *kmap_ops,
> struct page **pages, unsigned int count);
On 01/06/18 13:41, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Only gnttab_{alloc|free}_pages are exported as EXPORT_SYMBOL
> while all the rest are exported as EXPORT_SYMBOL_GPL, thus
> effectively making it not possible for non-GPL driver modules
> to use grant table module. Export gnttab_{alloc|free}_pages as
> EXPORT_SYMBOL_GPL so all the exports are aligned.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Juergen
On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Make set/clear page private code shared and accessible to
> other kernel modules which can re-use these instead of open-coding.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> diff --git a/include/xen/mem-reservation.h b/include/xen/mem-reservation.h
> new file mode 100644
> index 000000000000..a727d65a1e61
> --- /dev/null
> +++ b/include/xen/mem-reservation.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Xen memory reservation utilities.
> + *
> + * Copyright (c) 2003, B Dragovic
> + * Copyright (c) 2003-2004, M Williamson, K Fraser
> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
> + * Copyright (c) 2010 Daniel Kiper
> + * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
> + */
> +
> +#ifndef _XENMEM_RESERVATION_H
> +#define _XENMEM_RESERVATION_H
> +
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +
> +#include <asm/xen/hypercall.h>
> +#include <asm/tlb.h>
> +
> +#include <xen/interface/memory.h>
> +#include <xen/page.h>
> +
> +#ifdef CONFIG_XEN_SCRUB_PAGES
> +void xenmem_reservation_scrub_page(struct page *page);
> +#else
> +static inline void xenmem_reservation_scrub_page(struct page *page)
> +{
> +}
> +#endif
Given that this is a wrapper around a single call I'd prefer
inline void xenmem_reservation_scrub_page(struct page *page)
{
#ifdef CONFIG_XEN_SCRUB_PAGES
clear_highpage(page);
#endif
}
-boris
On 06/04/2018 07:37 PM, Boris Ostrovsky wrote:
> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> diff --git a/include/xen/mem-reservation.h b/include/xen/mem-reservation.h
>> new file mode 100644
>> index 000000000000..a727d65a1e61
>> --- /dev/null
>> +++ b/include/xen/mem-reservation.h
>> @@ -0,0 +1,65 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +/*
>> + * Xen memory reservation utilities.
>> + *
>> + * Copyright (c) 2003, B Dragovic
>> + * Copyright (c) 2003-2004, M Williamson, K Fraser
>> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
>> + * Copyright (c) 2010 Daniel Kiper
>> + * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
>> + */
>> +
>> +#ifndef _XENMEM_RESERVATION_H
>> +#define _XENMEM_RESERVATION_H
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/slab.h>
>> +
>> +#include <asm/xen/hypercall.h>
>> +#include <asm/tlb.h>
>> +
>> +#include <xen/interface/memory.h>
>> +#include <xen/page.h>
>> +
>> +#ifdef CONFIG_XEN_SCRUB_PAGES
>> +void xenmem_reservation_scrub_page(struct page *page);
>> +#else
>> +static inline void xenmem_reservation_scrub_page(struct page *page)
>> +{
>> +}
>> +#endif
>
> Given that this is a wrapper around a single call I'd prefer
>
> inline void xenmem_reservation_scrub_page(struct page *page)
> {
> #ifdef CONFIG_XEN_SCRUB_PAGES
> clear_highpage(page);
> #endif
> }
>
Ok, will change
>
> -boris
>
>
Thank you,
Oleksandr
On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Extend grant table module API to allow allocating buffers that can
> be used for DMA operations and mapping foreign grant references
> on top of those.
> The resulting buffer is similar to the one allocated by the balloon
> driver in terms that proper memory reservation is made
> ({increase|decrease}_reservation and VA mappings updated if needed).
> This is useful for sharing foreign buffers with HW drivers which
> cannot work with scattered buffers provided by the balloon driver,
> but require DMAable memory instead.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/Kconfig | 13 +++++
> drivers/xen/grant-table.c | 109 ++++++++++++++++++++++++++++++++++++++
> include/xen/grant_table.h | 18 +++++++
> 3 files changed, 140 insertions(+)
>
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index e5d0c28372ea..39536ddfbce4 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -161,6 +161,19 @@ config XEN_GRANT_DEV_ALLOC
> to other domains. This can be used to implement frontend drivers
> or as part of an inter-domain shared memory channel.
>
> +config XEN_GRANT_DMA_ALLOC
> + bool "Allow allocating DMA capable buffers with grant reference module"
> + depends on XEN && HAS_DMA
> + help
> + Extends grant table module API to allow allocating DMA capable
> + buffers and mapping foreign grant references on top of it.
> + The resulting buffer is similar to one allocated by the balloon
> + driver in terms that proper memory reservation is made
> + ({increase|decrease}_reservation and VA mappings updated if needed).
> + This is useful for sharing foreign buffers with HW drivers which
> + cannot work with scattered buffers provided by the balloon driver,
> + but require DMAable memory instead.
> +
> config SWIOTLB_XEN
> def_bool y
> select SWIOTLB
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index dbb48a89e987..5658e58d9cc6 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -45,6 +45,9 @@
> #include <linux/workqueue.h>
> #include <linux/ratelimit.h>
> #include <linux/moduleparam.h>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> +#include <linux/dma-mapping.h>
> +#endif
>
> #include <xen/xen.h>
> #include <xen/interface/xen.h>
> @@ -57,6 +60,7 @@
> #ifdef CONFIG_X86
> #include <asm/xen/cpuid.h>
> #endif
> +#include <xen/mem-reservation.h>
> #include <asm/xen/hypercall.h>
> #include <asm/xen/interface.h>
>
> @@ -811,6 +815,73 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
> }
> EXPORT_SYMBOL_GPL(gnttab_alloc_pages);
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> +/**
> + * gnttab_dma_alloc_pages - alloc DMAable pages suitable for grant mapping into
> + * @args: arguments to the function
> + */
> +int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
> +{
> + unsigned long pfn, start_pfn;
> + size_t size;
> + int i, ret;
> +
> + size = args->nr_pages << PAGE_SHIFT;
> + if (args->coherent)
> + args->vaddr = dma_alloc_coherent(args->dev, size,
> + &args->dev_bus_addr,
> + GFP_KERNEL | __GFP_NOWARN);
> + else
> + args->vaddr = dma_alloc_wc(args->dev, size,
> + &args->dev_bus_addr,
> + GFP_KERNEL | __GFP_NOWARN);
> + if (!args->vaddr) {
> + pr_err("Failed to allocate DMA buffer of size %zu\n", size);
> + return -ENOMEM;
> + }
> +
> + start_pfn = __phys_to_pfn(args->dev_bus_addr);
> + for (pfn = start_pfn, i = 0; pfn < start_pfn + args->nr_pages;
> + pfn++, i++) {
> + struct page *page = pfn_to_page(pfn);
> +
> + args->pages[i] = page;
> + args->frames[i] = xen_page_to_gfn(page);
> + xenmem_reservation_scrub_page(page);
> + }
> +
> + xenmem_reservation_va_mapping_reset(args->nr_pages, args->pages);
> +
> + ret = xenmem_reservation_decrease(args->nr_pages, args->frames);
> + if (ret != args->nr_pages) {
> + pr_err("Failed to decrease reservation for DMA buffer\n");
> + ret = -EFAULT;
> + goto fail_free_dma;
> + }
> +
> + ret = gnttab_pages_set_private(args->nr_pages, args->pages);
> + if (ret < 0)
> + goto fail_clear_private;
> +
> + return 0;
> +
> +fail_clear_private:
> + gnttab_pages_clear_private(args->nr_pages, args->pages);
> +fail_free_dma:
> + xenmem_reservation_increase(args->nr_pages, args->frames);
> + xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
> + args->frames);
> + if (args->coherent)
> + dma_free_coherent(args->dev, size,
> + args->vaddr, args->dev_bus_addr);
> + else
> + dma_free_wc(args->dev, size,
> + args->vaddr, args->dev_bus_addr);
> + return ret;
> +}
Would it be possible to call gnttab_dma_free_pages() here?
> +EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
> +#endif
> +
> void gnttab_pages_clear_private(int nr_pages, struct page **pages)
> {
> int i;
> @@ -838,6 +909,44 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
> }
> EXPORT_SYMBOL_GPL(gnttab_free_pages);
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
I'd move this after (or before) gnttab_dma_alloc_page() to keep both
inside a single ifdef block.
-boris
> +/**
> + * gnttab_dma_free_pages - free DMAable pages
> + * @args: arguments to the function
> + */
> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
> +{
> + size_t size;
> + int i, ret;
> +
> + gnttab_pages_clear_private(args->nr_pages, args->pages);
> +
> + for (i = 0; i < args->nr_pages; i++)
> + args->frames[i] = page_to_xen_pfn(args->pages[i]);
> +
> + ret = xenmem_reservation_increase(args->nr_pages, args->frames);
> + if (ret != args->nr_pages) {
> + pr_err("Failed to decrease reservation for DMA buffer\n");
> + ret = -EFAULT;
> + } else {
> + ret = 0;
> + }
> +
> + xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
> + args->frames);
> +
> + size = args->nr_pages << PAGE_SHIFT;
> + if (args->coherent)
> + dma_free_coherent(args->dev, size,
> + args->vaddr, args->dev_bus_addr);
> + else
> + dma_free_wc(args->dev, size,
> + args->vaddr, args->dev_bus_addr);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_dma_free_pages);
> +#endif
> +
> /* Handling of paged out grant targets (GNTST_eagain) */
> #define MAX_DELAY 256
> static inline void
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index de03f2542bb7..9bc5bc07d4d3 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -198,6 +198,24 @@ void gnttab_free_auto_xlat_frames(void);
> int gnttab_alloc_pages(int nr_pages, struct page **pages);
> void gnttab_free_pages(int nr_pages, struct page **pages);
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> +struct gnttab_dma_alloc_args {
> + /* Device for which DMA memory will be/was allocated. */
> + struct device *dev;
> + /* If set then DMA buffer is coherent and write-combine otherwise. */
> + bool coherent;
> +
> + int nr_pages;
> + struct page **pages;
> + xen_pfn_t *frames;
> + void *vaddr;
> + dma_addr_t dev_bus_addr;
> +};
> +
> +int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args);
> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args);
> +#endif
> +
> int gnttab_pages_set_private(int nr_pages, struct page **pages);
> void gnttab_pages_clear_private(int nr_pages, struct page **pages);
>
On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Allow mappings for DMA backed buffers if grant table module
> supports such: this extends grant device to not only map buffers
> made of balloon pages, but also from buffers allocated with
> dma_alloc_xxx.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/gntdev.c | 99 ++++++++++++++++++++++++++++++++++++++-
> include/uapi/xen/gntdev.h | 15 ++++++
> 2 files changed, 112 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index bd56653b9bbc..9813fc440c70 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -37,6 +37,9 @@
> #include <linux/slab.h>
> #include <linux/highmem.h>
> #include <linux/refcount.h>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> +#include <linux/of_device.h>
> +#endif
>
> #include <xen/xen.h>
> #include <xen/grant_table.h>
> @@ -72,6 +75,11 @@ struct gntdev_priv {
> struct mutex lock;
> struct mm_struct *mm;
> struct mmu_notifier mn;
> +
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + /* Device for which DMA memory is allocated. */
> + struct device *dma_dev;
> +#endif
> };
>
> struct unmap_notify {
> @@ -96,10 +104,27 @@ struct grant_map {
> struct gnttab_unmap_grant_ref *kunmap_ops;
> struct page **pages;
> unsigned long pages_vm_start;
> +
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + /*
> + * If dmabuf_vaddr is not NULL then this mapping is backed by DMA
> + * capable memory.
> + */
> +
> + struct device *dma_dev;
> + /* Flags used to create this DMA buffer: GNTDEV_DMA_FLAG_XXX. */
> + int dma_flags;
> + void *dma_vaddr;
> + dma_addr_t dma_bus_addr;
> + /* This is required for gnttab_dma_{alloc|free}_pages. */
How about
/* Needed to avoid allocation in gnttab_dma_free_pages(). */
> + xen_pfn_t *frames;
> +#endif
> };
>
> static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
>
> +static struct miscdevice gntdev_miscdev;
> +
> /* ------------------------------------------------------------------ */
>
> static void gntdev_print_maps(struct gntdev_priv *priv,
> @@ -121,8 +146,27 @@ static void gntdev_free_map(struct grant_map *map)
> if (map == NULL)
> return;
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + if (map->dma_vaddr) {
> + struct gnttab_dma_alloc_args args;
> +
> + args.dev = map->dma_dev;
> + args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
> + args.nr_pages = map->count;
> + args.pages = map->pages;
> + args.frames = map->frames;
> + args.vaddr = map->dma_vaddr;
> + args.dev_bus_addr = map->dma_bus_addr;
> +
> + gnttab_dma_free_pages(&args);
> + } else
> +#endif
> if (map->pages)
> gnttab_free_pages(map->count, map->pages);
> +
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + kfree(map->frames);
> +#endif
Can this be done under if (map->dma_vaddr)? In other words, is it
possible for dma_vaddr to be NULL while the frames pointer is still allocated?
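If not, the kfree could fold into the branch above, e.g. (untested sketch
only, restructuring the two hunks already quoted here):

```
	if (map->dma_vaddr) {
		...
		gnttab_dma_free_pages(&args);
		kfree(map->frames);
	} else if (map->pages)
		gnttab_free_pages(map->count, map->pages);
```

though gntdev_alloc_map() below looks like it can fail inside
gnttab_dma_alloc_pages() after frames was already allocated, which would
leave dma_vaddr NULL with frames non-NULL, so the unconditional kfree may
be needed after all.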
> kfree(map->pages);
> kfree(map->grants);
> kfree(map->map_ops);
> @@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map *map)
> kfree(map);
> }
>
> -static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
> +static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
> + int dma_flags)
> {
> struct grant_map *add;
> int i;
> @@ -155,6 +200,37 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
> NULL == add->pages)
> goto err;
>
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + add->dma_flags = dma_flags;
> +
> + /*
> + * Check if this mapping is requested to be backed
> + * by a DMA buffer.
> + */
> + if (dma_flags & (GNTDEV_DMA_FLAG_WC | GNTDEV_DMA_FLAG_COHERENT)) {
> + struct gnttab_dma_alloc_args args;
> +
> + add->frames = kcalloc(count, sizeof(add->frames[0]),
> + GFP_KERNEL);
> + if (!add->frames)
> + goto err;
> +
> + /* Remember the device, so we can free DMA memory. */
> + add->dma_dev = priv->dma_dev;
> +
> + args.dev = priv->dma_dev;
> + args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
> + args.nr_pages = count;
> + args.pages = add->pages;
> + args.frames = add->frames;
> +
> + if (gnttab_dma_alloc_pages(&args))
> + goto err;
> +
> + add->dma_vaddr = args.vaddr;
> + add->dma_bus_addr = args.dev_bus_addr;
> + } else
> +#endif
> if (gnttab_alloc_pages(count, add->pages))
> goto err;
>
> @@ -325,6 +401,14 @@ static int map_grant_pages(struct grant_map *map)
> map->unmap_ops[i].handle = map->map_ops[i].handle;
> if (use_ptemod)
> map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + else if (map->dma_vaddr) {
> + unsigned long mfn;
> +
> + mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
Not pfn_to_mfn()?
-boris
> + map->unmap_ops[i].dev_bus_addr = __pfn_to_phys(mfn);
> + }
> +#endif
> }
> return err;
> }
> @@ -548,6 +632,17 @@ static int gntdev_open(struct inode *inode, struct file *flip)
> }
>
> flip->private_data = priv;
> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> + priv->dma_dev = gntdev_miscdev.this_device;
> +
> + /*
> + * The device is not spawned from a device tree, so arch_setup_dma_ops
> + * is not called, thus leaving the device with dummy DMA ops.
> + * Fix this by calling of_dma_configure() with a NULL node to set
> + * default DMA ops.
> + */
> + of_dma_configure(priv->dma_dev, NULL);
> +#endif
> pr_debug("priv %p\n", priv);
>
> return 0;
> @@ -589,7 +684,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
> return -EINVAL;
>
> err = -ENOMEM;
> - map = gntdev_alloc_map(priv, op.count);
> + map = gntdev_alloc_map(priv, op.count, 0 /* This is not a dma-buf. */);
> if (!map)
> return err;
>
> diff --git a/include/uapi/xen/gntdev.h b/include/uapi/xen/gntdev.h
> index 6d1163456c03..4b9d498a31d4 100644
> --- a/include/uapi/xen/gntdev.h
> +++ b/include/uapi/xen/gntdev.h
> @@ -200,4 +200,19 @@ struct ioctl_gntdev_grant_copy {
> /* Send an interrupt on the indicated event channel */
> #define UNMAP_NOTIFY_SEND_EVENT 0x2
>
> +/*
> + * Flags to be used while requesting memory mapping's backing storage
> + * to be allocated with DMA API.
> + */
> +
> +/*
> + * The buffer is backed with memory allocated with dma_alloc_wc.
> + */
> +#define GNTDEV_DMA_FLAG_WC (1 << 0)
> +
> +/*
> + * The buffer is backed with memory allocated with dma_alloc_coherent.
> + */
> +#define GNTDEV_DMA_FLAG_COHERENT (1 << 1)
> +
> #endif /* __LINUX_PUBLIC_GNTDEV_H__ */
On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Add UAPI and IOCTLs for dma-buf grant device driver extension:
> the extension allows userspace processes and kernel modules to
> use Xen backed dma-buf implementation. With this extension grant
> references to the pages of an imported dma-buf can be exported
> for other domain use and grant references coming from a foreign
> domain can be converted into a local dma-buf for local export.
> Implement basic initialization and stubs for Xen DMA buffers'
> support.
It would be very helpful if people advocating for this interface
reviewed it as well.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/Kconfig | 10 +++
> drivers/xen/Makefile | 1 +
> drivers/xen/gntdev-dmabuf.c | 75 +++++++++++++++++++
> drivers/xen/gntdev-dmabuf.h | 41 +++++++++++
> drivers/xen/gntdev.c | 142 ++++++++++++++++++++++++++++++++++++
> include/uapi/xen/gntdev.h | 91 +++++++++++++++++++++++
> 6 files changed, 360 insertions(+)
> create mode 100644 drivers/xen/gntdev-dmabuf.c
> create mode 100644 drivers/xen/gntdev-dmabuf.h
>
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 39536ddfbce4..52d64e4b6b81 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -152,6 +152,16 @@ config XEN_GNTDEV
> help
> Allows userspace processes to use grants.
>
> +config XEN_GNTDEV_DMABUF
> + bool "Add support for dma-buf grant access device driver extension"
> + depends on XEN_GNTDEV && XEN_GRANT_DMA_ALLOC && DMA_SHARED_BUFFER
Is there a reason to have XEN_GRANT_DMA_ALLOC without XEN_GNTDEV_DMABUF?
> + help
> + Allows userspace processes and kernel modules to use Xen backed
> + dma-buf implementation. With this extension grant references to
> + the pages of an imported dma-buf can be exported for other domain
> + use and grant references coming from a foreign domain can be
> + converted into a local dma-buf for local export.
> +
> config XEN_GRANT_DEV_ALLOC
> tristate "User-space grant reference allocator driver"
> depends on XEN
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 3c87b0c3aca6..33afb7b2b227 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -41,5 +41,6 @@ obj-$(CONFIG_XEN_PVCALLS_BACKEND) += pvcalls-back.o
> obj-$(CONFIG_XEN_PVCALLS_FRONTEND) += pvcalls-front.o
> xen-evtchn-y := evtchn.o
> xen-gntdev-y := gntdev.o
> +xen-gntdev-$(CONFIG_XEN_GNTDEV_DMABUF) += gntdev-dmabuf.o
> xen-gntalloc-y := gntalloc.o
> xen-privcmd-y := privcmd.o
> diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
> new file mode 100644
> index 000000000000..6bedd1387bd9
> --- /dev/null
> +++ b/drivers/xen/gntdev-dmabuf.c
> @@ -0,0 +1,75 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Xen dma-buf functionality for gntdev.
> + *
> + * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
> + */
> +
> +#include <linux/slab.h>
> +
> +#include "gntdev-dmabuf.h"
> +
> +struct gntdev_dmabuf_priv {
> + int dummy;
> +};
> +
> +/* ------------------------------------------------------------------ */
> +/* DMA buffer export support. */
> +/* ------------------------------------------------------------------ */
> +
> +/* ------------------------------------------------------------------ */
> +/* Implementation of wait for exported DMA buffer to be released. */
> +/* ------------------------------------------------------------------ */
Why this comment style?
> +
> +int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
> + int wait_to_ms)
> +{
> + return -EINVAL;
> +}
> +
> +/* ------------------------------------------------------------------ */
> +/* DMA buffer export support. */
> +/* ------------------------------------------------------------------ */
> +
> +int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args)
> +{
> + return -EINVAL;
> +}
> +
> +/* ------------------------------------------------------------------ */
> +/* DMA buffer import support. */
> +/* ------------------------------------------------------------------ */
> +
> +struct gntdev_dmabuf *
> +gntdev_dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct device *dev,
> + int fd, int count, int domid)
> +{
> + return ERR_PTR(-ENOMEM);
> +}
> +
> +u32 *gntdev_dmabuf_imp_get_refs(struct gntdev_dmabuf *gntdev_dmabuf)
> +{
> + return NULL;
> +}
> +
> +int gntdev_dmabuf_imp_release(struct gntdev_dmabuf_priv *priv, u32 fd)
> +{
> + return -EINVAL;
> +}
> +
> +struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
> +{
> + struct gntdev_dmabuf_priv *priv;
> +
> + priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + return ERR_PTR(-ENOMEM);
> +
> + return priv;
> +}
> +
> +void gntdev_dmabuf_fini(struct gntdev_dmabuf_priv *priv)
> +{
> + kfree(priv);
> +}
> diff --git a/drivers/xen/gntdev-dmabuf.h b/drivers/xen/gntdev-dmabuf.h
> new file mode 100644
> index 000000000000..040b2de904ac
> --- /dev/null
> +++ b/drivers/xen/gntdev-dmabuf.h
> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Xen dma-buf functionality for gntdev.
> + *
> + * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
> + */
> +
> +#ifndef _GNTDEV_DMABUF_H
> +#define _GNTDEV_DMABUF_H
> +
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/types.h>
> +
> +struct gntdev_dmabuf_priv;
> +struct gntdev_dmabuf;
> +struct device;
> +
> +struct gntdev_dmabuf_export_args {
> + int dummy;
> +};
Please define the full structure (at least what you have in the next
patch) here instead of the dummy placeholder.
> +
> +struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void);
> +
> +void gntdev_dmabuf_fini(struct gntdev_dmabuf_priv *priv);
> +
> +int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args);
> +
> +int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
> + int wait_to_ms);
> +
> +struct gntdev_dmabuf *
> +gntdev_dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct device *dev,
> + int fd, int count, int domid);
> +
> +u32 *gntdev_dmabuf_imp_get_refs(struct gntdev_dmabuf *gntdev_dmabuf);
> +
> +int gntdev_dmabuf_imp_release(struct gntdev_dmabuf_priv *priv, u32 fd);
> +
> +#endif
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 9813fc440c70..7d58dfb3e5e8 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
...
>
> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
This code belongs in gntdev-dmabuf.c.
> +/* ------------------------------------------------------------------ */
> +/* DMA buffer export support. */
> +/* ------------------------------------------------------------------ */
> +
> +int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
> + int count, u32 domid, u32 *refs, u32 *fd)
> +{
> + /* XXX: this will need to work with gntdev's map, so leave it here. */
This doesn't help with understanding what's going on (at least for me),
and it is removed in the next patch anyway, so there is no need for it.
-boris
> + *fd = -1;
> + return -EINVAL;
> +}
On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> 1. Create a dma-buf from grant references provided by the foreign
> domain. By default dma-buf is backed by system memory pages, but
> by providing GNTDEV_DMA_FLAG_XXX flags it can also be created
> as a DMA write-combine/coherent buffer, e.g. allocated with
> corresponding dma_alloc_xxx API.
> Export the resulting buffer as a new dma-buf.
>
> 2. Implement waiting for the dma-buf to be released: block until the
> dma-buf with the file descriptor provided is released.
> If within the time-out provided the buffer is not released then
> -ETIMEDOUT error is returned. If the buffer with the file descriptor
> does not exist or has already been released, then -ENOENT is
> returned. For valid file descriptors this must not be treated as
> error.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> ---
> drivers/xen/gntdev-dmabuf.c | 393 +++++++++++++++++++++++++++++++++++-
> drivers/xen/gntdev-dmabuf.h | 9 +-
> drivers/xen/gntdev.c | 90 ++++++++-
> 3 files changed, 486 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
> index 6bedd1387bd9..f612468879b4 100644
> --- a/drivers/xen/gntdev-dmabuf.c
> +++ b/drivers/xen/gntdev-dmabuf.c
> @@ -3,15 +3,58 @@
> /*
> * Xen dma-buf functionality for gntdev.
> *
> + * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
> + *
> * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
> */
>
> +#include <linux/dma-buf.h>
> #include <linux/slab.h>
>
> #include "gntdev-dmabuf.h"
>
> +struct gntdev_dmabuf {
> + struct gntdev_dmabuf_priv *priv;
> + struct dma_buf *dmabuf;
> + struct list_head next;
> + int fd;
> +
> + union {
> + struct {
> + /* Exported buffers are reference counted. */
> + struct kref refcount;
> +
> + struct gntdev_priv *priv;
> + struct grant_map *map;
> + void (*release)(struct gntdev_priv *priv,
> + struct grant_map *map);
> + } exp;
> + } u;
> +
> + /* Number of pages this buffer has. */
> + int nr_pages;
> + /* Pages of this buffer. */
> + struct page **pages;
> +};
> +
> +struct gntdev_dmabuf_wait_obj {
> + struct list_head next;
> + struct gntdev_dmabuf *gntdev_dmabuf;
> + struct completion completion;
> +};
> +
> +struct gntdev_dmabuf_attachment {
> + struct sg_table *sgt;
> + enum dma_data_direction dir;
> +};
> +
> struct gntdev_dmabuf_priv {
> - int dummy;
> + /* List of exported DMA buffers. */
> + struct list_head exp_list;
> + /* List of wait objects. */
> + struct list_head exp_wait_list;
> + /* This is the lock which protects dma_buf_xxx lists. */
> + struct mutex lock;
> };
>
> /* ------------------------------------------------------------------ */
> @@ -22,19 +65,359 @@ struct gntdev_dmabuf_priv {
> /* Implementation of wait for exported DMA buffer to be released. */
> /* ------------------------------------------------------------------ */
>
> +static void dmabuf_exp_release(struct kref *kref);
> +
> +static struct gntdev_dmabuf_wait_obj *
> +dmabuf_exp_wait_obj_new(struct gntdev_dmabuf_priv *priv,
> + struct gntdev_dmabuf *gntdev_dmabuf)
> +{
> + struct gntdev_dmabuf_wait_obj *obj;
> +
> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> + if (!obj)
> + return ERR_PTR(-ENOMEM);
> +
> + init_completion(&obj->completion);
> + obj->gntdev_dmabuf = gntdev_dmabuf;
> +
> + mutex_lock(&priv->lock);
> + list_add(&obj->next, &priv->exp_wait_list);
> + /* Put our reference and wait for gntdev_dmabuf's release to fire. */
> + kref_put(&gntdev_dmabuf->u.exp.refcount, dmabuf_exp_release);
> + mutex_unlock(&priv->lock);
> + return obj;
> +}
> +
> +static void dmabuf_exp_wait_obj_free(struct gntdev_dmabuf_priv *priv,
> + struct gntdev_dmabuf_wait_obj *obj)
> +{
> + struct gntdev_dmabuf_wait_obj *cur_obj, *q;
> +
> + mutex_lock(&priv->lock);
> + list_for_each_entry_safe(cur_obj, q, &priv->exp_wait_list, next)
> + if (cur_obj == obj) {
> + list_del(&obj->next);
> + kfree(obj);
> + break;
> + }
> + mutex_unlock(&priv->lock);
> +}
> +
> +static int dmabuf_exp_wait_obj_wait(struct gntdev_dmabuf_wait_obj *obj,
> + u32 wait_to_ms)
> +{
> + if (wait_for_completion_timeout(&obj->completion,
> + msecs_to_jiffies(wait_to_ms)) <= 0)
> + return -ETIMEDOUT;
> +
> + return 0;
> +}
> +
> +static void dmabuf_exp_wait_obj_signal(struct gntdev_dmabuf_priv *priv,
> + struct gntdev_dmabuf *gntdev_dmabuf)
> +{
> + struct gntdev_dmabuf_wait_obj *obj, *q;
> +
> + list_for_each_entry_safe(obj, q, &priv->exp_wait_list, next)
> + if (obj->gntdev_dmabuf == gntdev_dmabuf) {
> + pr_debug("Found gntdev_dmabuf in the wait list, wake\n");
> + complete_all(&obj->completion);
break?
> + }
> +}
> +
> +static struct gntdev_dmabuf *
> +dmabuf_exp_wait_obj_get_by_fd(struct gntdev_dmabuf_priv *priv, int fd)
> +{
> + struct gntdev_dmabuf *q, *gntdev_dmabuf, *ret = ERR_PTR(-ENOENT);
> +
> + mutex_lock(&priv->lock);
> + list_for_each_entry_safe(gntdev_dmabuf, q, &priv->exp_list, next)
> + if (gntdev_dmabuf->fd == fd) {
> + pr_debug("Found gntdev_dmabuf in the wait list\n");
> + kref_get(&gntdev_dmabuf->u.exp.refcount);
> + ret = gntdev_dmabuf;
> + break;
> + }
> + mutex_unlock(&priv->lock);
> + return ret;
> +}
> +
> int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
> int wait_to_ms)
> {
> - return -EINVAL;
> + struct gntdev_dmabuf *gntdev_dmabuf;
> + struct gntdev_dmabuf_wait_obj *obj;
> + int ret;
> +
> + pr_debug("Will wait for dma-buf with fd %d\n", fd);
> + /*
> + * Try to find the DMA buffer: if not found means that
> + * either the buffer has already been released or file descriptor
> + * provided is wrong.
> + */
> + gntdev_dmabuf = dmabuf_exp_wait_obj_get_by_fd(priv, fd);
> + if (IS_ERR(gntdev_dmabuf))
> + return PTR_ERR(gntdev_dmabuf);
> +
> + /*
> + * gntdev_dmabuf still exists and is reference count locked by us now,
> + * so prepare to wait: allocate wait object and add it to the wait list,
> + * so we can find it on release.
> + */
> + obj = dmabuf_exp_wait_obj_new(priv, gntdev_dmabuf);
> + if (IS_ERR(obj)) {
> + pr_err("Failed to setup wait object, ret %ld\n", PTR_ERR(obj));
No need for pr_err. We are out of memory.
> + return PTR_ERR(obj);
> + }
> +
> + ret = dmabuf_exp_wait_obj_wait(obj, wait_to_ms);
> + dmabuf_exp_wait_obj_free(priv, obj);
> + return ret;
> }
>
> /* ------------------------------------------------------------------ */
> /* DMA buffer export support. */
> /* ------------------------------------------------------------------ */
>
> +static struct sg_table *
> +dmabuf_pages_to_sgt(struct page **pages, unsigned int nr_pages)
> +{
> + struct sg_table *sgt;
> + int ret;
> +
> + sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
> + if (!sgt) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
> + nr_pages << PAGE_SHIFT,
> + GFP_KERNEL);
> + if (ret)
> + goto out;
> +
> + return sgt;
> +
> +out:
> + kfree(sgt);
> + return ERR_PTR(ret);
> +}
> +
> +static int dmabuf_exp_ops_attach(struct dma_buf *dma_buf,
> + struct device *target_dev,
> + struct dma_buf_attachment *attach)
> +{
> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach;
> +
> + gntdev_dmabuf_attach = kzalloc(sizeof(*gntdev_dmabuf_attach),
> + GFP_KERNEL);
> + if (!gntdev_dmabuf_attach)
> + return -ENOMEM;
> +
> + gntdev_dmabuf_attach->dir = DMA_NONE;
> + attach->priv = gntdev_dmabuf_attach;
> + /* Might need to pin the pages of the buffer now. */
Who is supposed to pin the pages? The caller?
> + return 0;
> +}
> +
> +static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
> + struct dma_buf_attachment *attach)
> +{
> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach = attach->priv;
> +
> + if (gntdev_dmabuf_attach) {
> + struct sg_table *sgt = gntdev_dmabuf_attach->sgt;
> +
> + if (sgt) {
> + if (gntdev_dmabuf_attach->dir != DMA_NONE)
> + dma_unmap_sg_attrs(attach->dev, sgt->sgl,
> + sgt->nents,
> + gntdev_dmabuf_attach->dir,
> + DMA_ATTR_SKIP_CPU_SYNC);
> + sg_free_table(sgt);
> + }
> +
> + kfree(sgt);
> + kfree(gntdev_dmabuf_attach);
> + attach->priv = NULL;
> + }
> + /* Might need to unpin the pages of the buffer now. */
Same question.
> +}
> +
> +static struct sg_table *
> +dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
> + enum dma_data_direction dir)
> +{
> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach = attach->priv;
> + struct gntdev_dmabuf *gntdev_dmabuf = attach->dmabuf->priv;
> + struct sg_table *sgt;
> +
> + pr_debug("Mapping %d pages for dev %p\n", gntdev_dmabuf->nr_pages,
> + attach->dev);
> +
> + if (WARN_ON(dir == DMA_NONE || !gntdev_dmabuf_attach))
WARN_ON_ONCE. Here and elsewhere.
> + return ERR_PTR(-EINVAL);
> +
> + /* Return the cached mapping when possible. */
> + if (gntdev_dmabuf_attach->dir == dir)
> + return gntdev_dmabuf_attach->sgt;
> +
> + /*
> + * Two mappings with different directions for the same attachment are
> + * not allowed.
> + */
> + if (WARN_ON(gntdev_dmabuf_attach->dir != DMA_NONE))
> + return ERR_PTR(-EBUSY);
> +
> + sgt = dmabuf_pages_to_sgt(gntdev_dmabuf->pages,
> + gntdev_dmabuf->nr_pages);
> + if (!IS_ERR(sgt)) {
> + if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
> + DMA_ATTR_SKIP_CPU_SYNC)) {
> + sg_free_table(sgt);
> + kfree(sgt);
> + sgt = ERR_PTR(-ENOMEM);
> + } else {
> + gntdev_dmabuf_attach->sgt = sgt;
> + gntdev_dmabuf_attach->dir = dir;
> + }
> + }
> + if (IS_ERR(sgt))
> + pr_err("Failed to map sg table for dev %p\n", attach->dev);
> + return sgt;
> +}
> +
> +static void dmabuf_exp_ops_unmap_dma_buf(struct dma_buf_attachment *attach,
> + struct sg_table *sgt,
> + enum dma_data_direction dir)
> +{
> + /* Not implemented. The unmap is done at dmabuf_exp_ops_detach(). */
> +}
> +
> +static void dmabuf_exp_release(struct kref *kref)
> +{
> + struct gntdev_dmabuf *gntdev_dmabuf =
> + container_of(kref, struct gntdev_dmabuf, u.exp.refcount);
> +
> + dmabuf_exp_wait_obj_signal(gntdev_dmabuf->priv, gntdev_dmabuf);
> + list_del(&gntdev_dmabuf->next);
> + kfree(gntdev_dmabuf);
> +}
> +
> +static void dmabuf_exp_ops_release(struct dma_buf *dma_buf)
> +{
> + struct gntdev_dmabuf *gntdev_dmabuf = dma_buf->priv;
> + struct gntdev_dmabuf_priv *priv = gntdev_dmabuf->priv;
> +
> + gntdev_dmabuf->u.exp.release(gntdev_dmabuf->u.exp.priv,
> + gntdev_dmabuf->u.exp.map);
> + mutex_lock(&priv->lock);
> + kref_put(&gntdev_dmabuf->u.exp.refcount, dmabuf_exp_release);
> + mutex_unlock(&priv->lock);
> +}
> +
> +static void *dmabuf_exp_ops_kmap_atomic(struct dma_buf *dma_buf,
> + unsigned long page_num)
> +{
> + /* Not implemented. */
> + return NULL;
> +}
> +
> +static void dmabuf_exp_ops_kunmap_atomic(struct dma_buf *dma_buf,
> + unsigned long page_num, void *addr)
> +{
> + /* Not implemented. */
> +}
> +
> +static void *dmabuf_exp_ops_kmap(struct dma_buf *dma_buf,
> + unsigned long page_num)
> +{
> + /* Not implemented. */
> + return NULL;
> +}
> +
> +static void dmabuf_exp_ops_kunmap(struct dma_buf *dma_buf,
> + unsigned long page_num, void *addr)
> +{
> + /* Not implemented. */
> +}
> +
> +static int dmabuf_exp_ops_mmap(struct dma_buf *dma_buf,
> + struct vm_area_struct *vma)
> +{
> + /* Not implemented. */
> + return 0;
> +}
> +
> +static const struct dma_buf_ops dmabuf_exp_ops = {
> + .attach = dmabuf_exp_ops_attach,
> + .detach = dmabuf_exp_ops_detach,
> + .map_dma_buf = dmabuf_exp_ops_map_dma_buf,
> + .unmap_dma_buf = dmabuf_exp_ops_unmap_dma_buf,
> + .release = dmabuf_exp_ops_release,
> + .map = dmabuf_exp_ops_kmap,
> + .map_atomic = dmabuf_exp_ops_kmap_atomic,
> + .unmap = dmabuf_exp_ops_kunmap,
> + .unmap_atomic = dmabuf_exp_ops_kunmap_atomic,
> + .mmap = dmabuf_exp_ops_mmap,
> +};
> +
> int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args)
> {
> - return -EINVAL;
> + DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> + struct gntdev_dmabuf *gntdev_dmabuf;
> + int ret = 0;
> +
> + gntdev_dmabuf = kzalloc(sizeof(*gntdev_dmabuf), GFP_KERNEL);
> + if (!gntdev_dmabuf)
> + return -ENOMEM;
> +
> + kref_init(&gntdev_dmabuf->u.exp.refcount);
> +
> + gntdev_dmabuf->priv = args->dmabuf_priv;
> + gntdev_dmabuf->nr_pages = args->count;
> + gntdev_dmabuf->pages = args->pages;
> + gntdev_dmabuf->u.exp.priv = args->priv;
> + gntdev_dmabuf->u.exp.map = args->map;
> + gntdev_dmabuf->u.exp.release = args->release;
> +
> + exp_info.exp_name = KBUILD_MODNAME;
> + if (args->dev->driver && args->dev->driver->owner)
> + exp_info.owner = args->dev->driver->owner;
> + else
> + exp_info.owner = THIS_MODULE;
> + exp_info.ops = &dmabuf_exp_ops;
> + exp_info.size = args->count << PAGE_SHIFT;
> + exp_info.flags = O_RDWR;
> + exp_info.priv = gntdev_dmabuf;
> +
> + gntdev_dmabuf->dmabuf = dma_buf_export(&exp_info);
> + if (IS_ERR(gntdev_dmabuf->dmabuf)) {
> + ret = PTR_ERR(gntdev_dmabuf->dmabuf);
> + gntdev_dmabuf->dmabuf = NULL;
> + goto fail;
> + }
> +
> + ret = dma_buf_fd(gntdev_dmabuf->dmabuf, O_CLOEXEC);
> + if (ret < 0)
> + goto fail;
> +
> + gntdev_dmabuf->fd = ret;
> + args->fd = ret;
> +
> + pr_debug("Exporting DMA buffer with fd %d\n", ret);
> +
> + mutex_lock(&args->dmabuf_priv->lock);
> + list_add(&gntdev_dmabuf->next, &args->dmabuf_priv->exp_list);
> + mutex_unlock(&args->dmabuf_priv->lock);
> + return 0;
> +
> +fail:
> + if (gntdev_dmabuf->dmabuf)
> + dma_buf_put(gntdev_dmabuf->dmabuf);
> + kfree(gntdev_dmabuf);
> + return ret;
> }
>
> /* ------------------------------------------------------------------ */
> @@ -66,6 +449,10 @@ struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
> if (!priv)
> return ERR_PTR(-ENOMEM);
>
> + mutex_init(&priv->lock);
> + INIT_LIST_HEAD(&priv->exp_list);
> + INIT_LIST_HEAD(&priv->exp_wait_list);
> +
> return priv;
> }
>
> diff --git a/drivers/xen/gntdev-dmabuf.h b/drivers/xen/gntdev-dmabuf.h
> index 040b2de904ac..95c23a24f640 100644
> --- a/drivers/xen/gntdev-dmabuf.h
> +++ b/drivers/xen/gntdev-dmabuf.h
> @@ -18,7 +18,14 @@ struct gntdev_dmabuf;
> struct device;
>
> struct gntdev_dmabuf_export_args {
> - int dummy;
> + struct gntdev_priv *priv;
> + struct grant_map *map;
> + void (*release)(struct gntdev_priv *priv, struct grant_map *map);
> + struct gntdev_dmabuf_priv *dmabuf_priv;
> + struct device *dev;
> + int count;
> + struct page **pages;
> + u32 fd;
> };
>
> struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void);
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 7d58dfb3e5e8..cf255d45f20f 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -319,6 +319,16 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
> gntdev_free_map(map);
> }
>
> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
> +static void gntdev_remove_map(struct gntdev_priv *priv, struct grant_map *map)
> +{
> + mutex_lock(&priv->lock);
> + list_del(&map->next);
> + gntdev_put_map(NULL /* already removed */, map);
> + mutex_unlock(&priv->lock);
> +}
> +#endif
> +
> /* ------------------------------------------------------------------ */
>
> static int find_grant_ptes(pte_t *pte, pgtable_t token,
> @@ -1063,12 +1073,88 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
> /* DMA buffer export support. */
> /* ------------------------------------------------------------------ */
>
> +static struct grant_map *
> +dmabuf_exp_alloc_backing_storage(struct gntdev_priv *priv, int dmabuf_flags,
> + int count)
> +{
> + struct grant_map *map;
> +
> + if (unlikely(count <= 0))
> + return ERR_PTR(-EINVAL);
> +
> + if ((dmabuf_flags & GNTDEV_DMA_FLAG_WC) &&
> + (dmabuf_flags & GNTDEV_DMA_FLAG_COHERENT)) {
> + pr_err("Wrong dma-buf flags: either WC or coherent, not both\n");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + map = gntdev_alloc_map(priv, count, dmabuf_flags);
> + if (!map)
> + return ERR_PTR(-ENOMEM);
> +
> + if (unlikely(atomic_add_return(count, &pages_mapped) > limit)) {
> + pr_err("can't map: over limit\n");
> + gntdev_put_map(NULL, map);
> + return ERR_PTR(-ENOMEM);
> + }
> + return map;
> +}
> +
> int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
> int count, u32 domid, u32 *refs, u32 *fd)
> {
> - /* XXX: this will need to work with gntdev's map, so leave it here. */
> + struct grant_map *map;
> + struct gntdev_dmabuf_export_args args;
> + int i, ret;
> +
> *fd = -1;
> - return -EINVAL;
> +
> + if (use_ptemod) {
> + pr_err("Cannot provide dma-buf: use_ptemod %d\n",
> + use_ptemod);
No pr_err here please. This can potentially become a DoS vector as it
comes directly from ioctl.
I would, in fact, revisit other uses of pr_err in this file.
> + return -EINVAL;
> + }
> +
> + map = dmabuf_exp_alloc_backing_storage(priv, flags, count);
@count comes from userspace, and dmabuf_exp_alloc_backing_storage() only
checks that it is > 0. Should it also be checked against some sane maximum?
-boris
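Something along these lines, as a user-space sketch; DMABUF_MAX_PAGES is a
made-up bound for illustration (gntdev's existing "limit" module parameter
may be the more natural cap):

```c
#include <errno.h>

/* Hypothetical upper bound on a userspace-supplied page count. */
#define DMABUF_MAX_PAGES 4096

/* Return 0 if the requested page count is sane, -EINVAL otherwise. */
static int dmabuf_check_count(int count)
{
	if (count <= 0 || count > DMABUF_MAX_PAGES)
		return -EINVAL;
	return 0;
}
```
<imports>
#include <assert.h>
</imports>
<test>
int main(void)
{
	assert(dmabuf_check_count(0) == -EINVAL);
	assert(dmabuf_check_count(1) == 0);
	assert(dmabuf_check_count(DMABUF_MAX_PAGES) == 0);
	assert(dmabuf_check_count(DMABUF_MAX_PAGES + 1) == -EINVAL);
	return 0;
}
```
</test>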
> + if (IS_ERR(map))
> + return PTR_ERR(map);
> +
> + for (i = 0; i < count; i++) {
> + map->grants[i].domid = domid;
> + map->grants[i].ref = refs[i];
> + }
> +
> + mutex_lock(&priv->lock);
> + gntdev_add_map(priv, map);
> + mutex_unlock(&priv->lock);
> +
> + map->flags |= GNTMAP_host_map;
> +#if defined(CONFIG_X86)
> + map->flags |= GNTMAP_device_map;
> +#endif
> +
> + ret = map_grant_pages(map);
> + if (ret < 0)
> + goto out;
> +
> + args.priv = priv;
> + args.map = map;
> + args.release = gntdev_remove_map;
> + args.dev = priv->dma_dev;
> + args.dmabuf_priv = priv->dmabuf_priv;
> + args.count = map->count;
> + args.pages = map->pages;
> +
> + ret = gntdev_dmabuf_exp_from_pages(&args);
> + if (ret < 0)
> + goto out;
> +
> + *fd = args.fd;
> + return 0;
> +
> +out:
> + gntdev_remove_map(priv, map);
> + return ret;
> }
>
> /* ------------------------------------------------------------------ */
On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> /* ------------------------------------------------------------------ */
>
> +static int
> +dmabuf_imp_grant_foreign_access(struct page **pages, u32 *refs,
> + int count, int domid)
> +{
> + grant_ref_t priv_gref_head;
> + int i, ret;
> +
> + ret = gnttab_alloc_grant_references(count, &priv_gref_head);
> + if (ret < 0) {
> + pr_err("Cannot allocate grant references, ret %d\n", ret);
> + return ret;
> + }
> +
> + for (i = 0; i < count; i++) {
> + int cur_ref;
> +
> + cur_ref = gnttab_claim_grant_reference(&priv_gref_head);
> + if (cur_ref < 0) {
> + ret = cur_ref;
> + pr_err("Cannot claim grant reference, ret %d\n", ret);
> + goto out;
> + }
> +
> + gnttab_grant_foreign_access_ref(cur_ref, domid,
> + xen_page_to_gfn(pages[i]), 0);
> + refs[i] = cur_ref;
> + }
> +
> + ret = 0;
return 0?
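i.e. return 0 directly on the success path so the out: label only runs on
error. A stand-alone illustration of that control flow (names invented;
whether the success path may legitimately skip
gnttab_free_grant_references() is for the author to confirm):

```c
static int cleanup_ran;

/* Stands in for gnttab_free_grant_references() in this sketch. */
static void release_refs(void)
{
	cleanup_ran = 1;
}

/* Early-return shape: the cleanup label is reached only on failure. */
static int claim_all(int fail_at, int count)
{
	for (int i = 0; i < count; i++) {
		if (i == fail_at)
			goto out;	/* error: release and report */
	}
	return 0;			/* success: skip cleanup entirely */
out:
	release_refs();
	return -1;
}
```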
> +
> +out:
> + gnttab_free_grant_references(priv_gref_head);
> + return ret;
> +}
> +
On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Allow creating grant device context for use by kernel modules which
> require functionality, provided by gntdev. Export symbols for dma-buf
> API provided by the module.
Can you give an example of who'd be using these interfaces?
-boris
On 06/04/2018 07:37 PM, Boris Ostrovsky wrote:
> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> diff --git a/include/xen/mem-reservation.h b/include/xen/mem-reservation.h
>> new file mode 100644
>> index 000000000000..a727d65a1e61
>> --- /dev/null
>> +++ b/include/xen/mem-reservation.h
>> @@ -0,0 +1,65 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +/*
>> + * Xen memory reservation utilities.
>> + *
>> + * Copyright (c) 2003, B Dragovic
>> + * Copyright (c) 2003-2004, M Williamson, K Fraser
>> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
>> + * Copyright (c) 2010 Daniel Kiper
>> + * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
>> + */
>> +
>> +#ifndef _XENMEM_RESERVATION_H
>> +#define _XENMEM_RESERVATION_H
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/slab.h>
>> +
>> +#include <asm/xen/hypercall.h>
>> +#include <asm/tlb.h>
>> +
>> +#include <xen/interface/memory.h>
>> +#include <xen/page.h>
>> +
>> +#ifdef CONFIG_XEN_SCRUB_PAGES
>> +void xenmem_reservation_scrub_page(struct page *page);
>> +#else
>> +static inline void xenmem_reservation_scrub_page(struct page *page)
>> +{
>> +}
>> +#endif
>
> Given that this is a wrapper around a single call I'd prefer
>
> inline void xenmem_reservation_scrub_page(struct page *page)
> {
> #ifdef CONFIG_XEN_SCRUB_PAGES
> clear_highpage(page);
> #endif
> }
Unfortunately this can't be done because of
EXPORT_SYMBOL_GPL(xenmem_reservation_scrub_page);
which obviously cannot be used with static inline functions.
So, I'll keep it as is.
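For the record, the suggested single-function shape does compile and work
for callers; the problem is purely that it leaves no out-of-line symbol
for EXPORT_SYMBOL_GPL() to attach to. A user-space illustration with toy
stand-ins for the kernel pieces:

```c
#include <string.h>

#define CONFIG_XEN_SCRUB_PAGES	/* pretend the option is enabled */

struct page { unsigned char data[16]; };	/* toy stand-in */

/* Stands in for the kernel's clear_highpage(). */
static void clear_highpage(struct page *page)
{
	memset(page->data, 0, sizeof(page->data));
}

/*
 * Boris' suggested form: one static inline whose body compiles away
 * when the config option is off. Callers are fine with this, but a
 * static inline has no symbol that EXPORT_SYMBOL_GPL() could export.
 */
static inline void xenmem_reservation_scrub_page(struct page *page)
{
#ifdef CONFIG_XEN_SCRUB_PAGES
	clear_highpage(page);
#endif
}
```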
>
>
> -boris
>
>
Thank you,
Oleksandr
On 06/04/2018 09:46 PM, Boris Ostrovsky wrote:
> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> Extend grant table module API to allow allocating buffers that can
>> be used for DMA operations and mapping foreign grant references
>> on top of those.
>> The resulting buffer is similar to the one allocated by the balloon
>> driver in terms that proper memory reservation is made
>> ({increase|decrease}_reservation and VA mappings updated if needed).
>> This is useful for sharing foreign buffers with HW drivers which
>> cannot work with scattered buffers provided by the balloon driver,
>> but require DMAable memory instead.
>>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/Kconfig | 13 +++++
>> drivers/xen/grant-table.c | 109 ++++++++++++++++++++++++++++++++++++++
>> include/xen/grant_table.h | 18 +++++++
>> 3 files changed, 140 insertions(+)
>>
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index e5d0c28372ea..39536ddfbce4 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -161,6 +161,19 @@ config XEN_GRANT_DEV_ALLOC
>> to other domains. This can be used to implement frontend drivers
>> or as part of an inter-domain shared memory channel.
>>
>> +config XEN_GRANT_DMA_ALLOC
>> + bool "Allow allocating DMA capable buffers with grant reference module"
>> + depends on XEN && HAS_DMA
>> + help
>> + Extends grant table module API to allow allocating DMA capable
>> + buffers and mapping foreign grant references on top of them.
>> + The resulting buffer is similar to one allocated by the balloon
>> + driver in the sense that proper memory reservation is made
>> + ({increase|decrease}_reservation and VA mappings are updated if needed).
>> + This is useful for sharing foreign buffers with HW drivers which
>> + cannot work with scattered buffers provided by the balloon driver,
>> + but require DMAable memory instead.
>> +
>> config SWIOTLB_XEN
>> def_bool y
>> select SWIOTLB
>> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
>> index dbb48a89e987..5658e58d9cc6 100644
>> --- a/drivers/xen/grant-table.c
>> +++ b/drivers/xen/grant-table.c
>> @@ -45,6 +45,9 @@
>> #include <linux/workqueue.h>
>> #include <linux/ratelimit.h>
>> #include <linux/moduleparam.h>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> +#include <linux/dma-mapping.h>
>> +#endif
>>
>> #include <xen/xen.h>
>> #include <xen/interface/xen.h>
>> @@ -57,6 +60,7 @@
>> #ifdef CONFIG_X86
>> #include <asm/xen/cpuid.h>
>> #endif
>> +#include <xen/mem-reservation.h>
>> #include <asm/xen/hypercall.h>
>> #include <asm/xen/interface.h>
>>
>> @@ -811,6 +815,73 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
>> }
>> EXPORT_SYMBOL_GPL(gnttab_alloc_pages);
>>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> +/**
>> + * gnttab_dma_alloc_pages - alloc DMAable pages suitable for grant mapping into
>> + * @args: arguments to the function
>> + */
>> +int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
>> +{
>> + unsigned long pfn, start_pfn;
>> + size_t size;
>> + int i, ret;
>> +
>> + size = args->nr_pages << PAGE_SHIFT;
>> + if (args->coherent)
>> + args->vaddr = dma_alloc_coherent(args->dev, size,
>> + &args->dev_bus_addr,
>> + GFP_KERNEL | __GFP_NOWARN);
>> + else
>> + args->vaddr = dma_alloc_wc(args->dev, size,
>> + &args->dev_bus_addr,
>> + GFP_KERNEL | __GFP_NOWARN);
>> + if (!args->vaddr) {
>> + pr_err("Failed to allocate DMA buffer of size %zu\n", size);
>> + return -ENOMEM;
>> + }
>> +
>> + start_pfn = __phys_to_pfn(args->dev_bus_addr);
>> + for (pfn = start_pfn, i = 0; pfn < start_pfn + args->nr_pages;
>> + pfn++, i++) {
>> + struct page *page = pfn_to_page(pfn);
>> +
>> + args->pages[i] = page;
>> + args->frames[i] = xen_page_to_gfn(page);
>> + xenmem_reservation_scrub_page(page);
>> + }
>> +
>> + xenmem_reservation_va_mapping_reset(args->nr_pages, args->pages);
>> +
>> + ret = xenmem_reservation_decrease(args->nr_pages, args->frames);
>> + if (ret != args->nr_pages) {
>> + pr_err("Failed to decrease reservation for DMA buffer\n");
>> + ret = -EFAULT;
>> + goto fail_free_dma;
>> + }
>> +
>> + ret = gnttab_pages_set_private(args->nr_pages, args->pages);
>> + if (ret < 0)
>> + goto fail_clear_private;
>> +
>> + return 0;
>> +
>> +fail_clear_private:
>> + gnttab_pages_clear_private(args->nr_pages, args->pages);
>> +fail_free_dma:
>> + xenmem_reservation_increase(args->nr_pages, args->frames);
>> + xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
>> + args->frames);
>> + if (args->coherent)
>> + dma_free_coherent(args->dev, size,
>> + args->vaddr, args->dev_bus_addr);
>> + else
>> + dma_free_wc(args->dev, size,
>> + args->vaddr, args->dev_bus_addr);
>> + return ret;
>> +}
>
> Would it be possible to call gnttab_dma_free_pages() here?
As we moved the frames array outside: yes, I'll call gnttab_dma_free_pages()
on failure then.
>
>> +EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
>> +#endif
>> +
>> void gnttab_pages_clear_private(int nr_pages, struct page **pages)
>> {
>> int i;
>> @@ -838,6 +909,44 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
>> }
>> EXPORT_SYMBOL_GPL(gnttab_free_pages);
>>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> I'd move this after (or before) gnttab_dma_alloc_page() to keep both
> inside a single ifdef block.
Ok, will also move and regroup functions to be implemented
close to each other:
gnttab_dma_{alloc|free}_pages
gnttab_{alloc|free}_pages
gnttab_pages_{set|clear}_private
> -boris
>
>
>> +/**
>> + * gnttab_dma_free_pages - free DMAable pages
>> + * @args: arguments to the function
>> + */
>> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>> +{
>> + size_t size;
>> + int i, ret;
>> +
>> + gnttab_pages_clear_private(args->nr_pages, args->pages);
>> +
>> + for (i = 0; i < args->nr_pages; i++)
>> + args->frames[i] = page_to_xen_pfn(args->pages[i]);
>> +
>> + ret = xenmem_reservation_increase(args->nr_pages, args->frames);
>> + if (ret != args->nr_pages) {
>> + pr_err("Failed to increase reservation for DMA buffer\n");
>> + ret = -EFAULT;
>> + } else {
>> + ret = 0;
>> + }
>> +
>> + xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
>> + args->frames);
>> +
>> + size = args->nr_pages << PAGE_SHIFT;
>> + if (args->coherent)
>> + dma_free_coherent(args->dev, size,
>> + args->vaddr, args->dev_bus_addr);
>> + else
>> + dma_free_wc(args->dev, size,
>> + args->vaddr, args->dev_bus_addr);
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(gnttab_dma_free_pages);
>> +#endif
>> +
>> /* Handling of paged out grant targets (GNTST_eagain) */
>> #define MAX_DELAY 256
>> static inline void
>> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
>> index de03f2542bb7..9bc5bc07d4d3 100644
>> --- a/include/xen/grant_table.h
>> +++ b/include/xen/grant_table.h
>> @@ -198,6 +198,24 @@ void gnttab_free_auto_xlat_frames(void);
>> int gnttab_alloc_pages(int nr_pages, struct page **pages);
>> void gnttab_free_pages(int nr_pages, struct page **pages);
>>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> +struct gnttab_dma_alloc_args {
>> + /* Device for which DMA memory will be/was allocated. */
>> + struct device *dev;
>> + /* If set then DMA buffer is coherent and write-combine otherwise. */
>> + bool coherent;
>> +
>> + int nr_pages;
>> + struct page **pages;
>> + xen_pfn_t *frames;
>> + void *vaddr;
>> + dma_addr_t dev_bus_addr;
>> +};
>> +
>> +int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args);
>> +int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args);
>> +#endif
>> +
>> int gnttab_pages_set_private(int nr_pages, struct page **pages);
>> void gnttab_pages_clear_private(int nr_pages, struct page **pages);
>>
On 06/04/2018 11:12 PM, Boris Ostrovsky wrote:
> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> Allow mappings for DMA backed buffers if grant table module
>> supports such: this extends grant device to not only map buffers
>> made of balloon pages, but also from buffers allocated with
>> dma_alloc_xxx.
>>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/gntdev.c | 99 ++++++++++++++++++++++++++++++++++++++-
>> include/uapi/xen/gntdev.h | 15 ++++++
>> 2 files changed, 112 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index bd56653b9bbc..9813fc440c70 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -37,6 +37,9 @@
>> #include <linux/slab.h>
>> #include <linux/highmem.h>
>> #include <linux/refcount.h>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> +#include <linux/of_device.h>
>> +#endif
>>
>> #include <xen/xen.h>
>> #include <xen/grant_table.h>
>> @@ -72,6 +75,11 @@ struct gntdev_priv {
>> struct mutex lock;
>> struct mm_struct *mm;
>> struct mmu_notifier mn;
>> +
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + /* Device for which DMA memory is allocated. */
>> + struct device *dma_dev;
>> +#endif
>> };
>>
>> struct unmap_notify {
>> @@ -96,10 +104,27 @@ struct grant_map {
>> struct gnttab_unmap_grant_ref *kunmap_ops;
>> struct page **pages;
>> unsigned long pages_vm_start;
>> +
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + /*
>> + * If dmabuf_vaddr is not NULL then this mapping is backed by DMA
>> + * capable memory.
>> + */
>> +
>> + struct device *dma_dev;
>> + /* Flags used to create this DMA buffer: GNTDEV_DMA_FLAG_XXX. */
>> + int dma_flags;
>> + void *dma_vaddr;
>> + dma_addr_t dma_bus_addr;
>> + /* This is required for gnttab_dma_{alloc|free}_pages. */
> How about
>
> /* Needed to avoid allocation in gnttab_dma_free_pages(). */
>
Ok
>> + xen_pfn_t *frames;
>> +#endif
>> };
>>
>> static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
>>
>> +static struct miscdevice gntdev_miscdev;
>> +
>> /* ------------------------------------------------------------------ */
>>
>> static void gntdev_print_maps(struct gntdev_priv *priv,
>> @@ -121,8 +146,27 @@ static void gntdev_free_map(struct grant_map *map)
>> if (map == NULL)
>> return;
>>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + if (map->dma_vaddr) {
>> + struct gnttab_dma_alloc_args args;
>> +
>> + args.dev = map->dma_dev;
>> + args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>> + args.nr_pages = map->count;
>> + args.pages = map->pages;
>> + args.frames = map->frames;
>> + args.vaddr = map->dma_vaddr;
>> + args.dev_bus_addr = map->dma_bus_addr;
>> +
>> + gnttab_dma_free_pages(&args);
>> + } else
>> +#endif
>> if (map->pages)
>> gnttab_free_pages(map->count, map->pages);
>> +
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + kfree(map->frames);
>> +#endif
>
> Can this be done under if (map->dma_vaddr) ?
> In other words, is it
> possible for dma_vaddr to be NULL and still have unallocated frames pointer?
It is possible to have vaddr == NULL and frames != NULL, as we
allocate frames outside of gnttab_dma_alloc_pages, which
may fail. Calling kfree() on a NULL pointer is safe, so
I see no reason to change this code.
>
>> kfree(map->pages);
>> kfree(map->grants);
>> kfree(map->map_ops);
>> @@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map *map)
>> kfree(map);
>> }
>>
>> -static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>> +static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
>> + int dma_flags)
>> {
>> struct grant_map *add;
>> int i;
>> @@ -155,6 +200,37 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>> NULL == add->pages)
>> goto err;
>>
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + add->dma_flags = dma_flags;
>> +
>> + /*
>> + * Check if this mapping is requested to be backed
>> + * by a DMA buffer.
>> + */
>> + if (dma_flags & (GNTDEV_DMA_FLAG_WC | GNTDEV_DMA_FLAG_COHERENT)) {
>> + struct gnttab_dma_alloc_args args;
>> +
>> + add->frames = kcalloc(count, sizeof(add->frames[0]),
>> + GFP_KERNEL);
>> + if (!add->frames)
>> + goto err;
>> +
>> + /* Remember the device, so we can free DMA memory. */
>> + add->dma_dev = priv->dma_dev;
>> +
>> + args.dev = priv->dma_dev;
>> + args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>> + args.nr_pages = count;
>> + args.pages = add->pages;
>> + args.frames = add->frames;
>> +
>> + if (gnttab_dma_alloc_pages(&args))
>> + goto err;
>> +
>> + add->dma_vaddr = args.vaddr;
>> + add->dma_bus_addr = args.dev_bus_addr;
>> + } else
>> +#endif
>> if (gnttab_alloc_pages(count, add->pages))
>> goto err;
>>
>> @@ -325,6 +401,14 @@ static int map_grant_pages(struct grant_map *map)
>> map->unmap_ops[i].handle = map->map_ops[i].handle;
>> if (use_ptemod)
>> map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + else if (map->dma_vaddr) {
>> + unsigned long mfn;
>> +
>> + mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>
> Not pfn_to_mfn()?
I'd love to, but pfn_to_mfn is only defined for x86, not ARM: [1] and [2]
Thus,
drivers/xen/gntdev.c:408:10: error: implicit declaration of function
‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
So, I'll keep __pfn_to_mfn.
>
>
> -boris
Thank you,
Oleksandr
>> + map->unmap_ops[i].dev_bus_addr = __pfn_to_phys(mfn);
>> + }
>> +#endif
>> }
>> return err;
>> }
>> @@ -548,6 +632,17 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>> }
>>
>> flip->private_data = priv;
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + priv->dma_dev = gntdev_miscdev.this_device;
>> +
>> + /*
>> + * The device is not spawned from a device tree, so arch_setup_dma_ops
>> + * is not called, thus leaving the device with dummy DMA ops.
>> + * Fix this by calling of_dma_configure() with a NULL node to set
>> + * default DMA ops.
>> + */
>> + of_dma_configure(priv->dma_dev, NULL);
>> +#endif
>> pr_debug("priv %p\n", priv);
>>
>> return 0;
>> @@ -589,7 +684,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>> return -EINVAL;
>>
>> err = -ENOMEM;
>> - map = gntdev_alloc_map(priv, op.count);
>> + map = gntdev_alloc_map(priv, op.count, 0 /* This is not a dma-buf. */);
>> if (!map)
>> return err;
>>
>> diff --git a/include/uapi/xen/gntdev.h b/include/uapi/xen/gntdev.h
>> index 6d1163456c03..4b9d498a31d4 100644
>> --- a/include/uapi/xen/gntdev.h
>> +++ b/include/uapi/xen/gntdev.h
>> @@ -200,4 +200,19 @@ struct ioctl_gntdev_grant_copy {
>> /* Send an interrupt on the indicated event channel */
>> #define UNMAP_NOTIFY_SEND_EVENT 0x2
>>
>> +/*
>> + * Flags to be used while requesting memory mapping's backing storage
>> + * to be allocated with DMA API.
>> + */
>> +
>> +/*
>> + * The buffer is backed with memory allocated with dma_alloc_wc.
>> + */
>> +#define GNTDEV_DMA_FLAG_WC (1 << 0)
>> +
>> +/*
>> + * The buffer is backed with memory allocated with dma_alloc_coherent.
>> + */
>> +#define GNTDEV_DMA_FLAG_COHERENT (1 << 1)
>> +
>> #endif /* __LINUX_PUBLIC_GNTDEV_H__ */
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> https://lists.xenproject.org/mailman/listinfo/xen-devel
[1] https://elixir.bootlin.com/linux/v4.17/ident/pfn_to_mfn
[2] https://elixir.bootlin.com/linux/v4.17/ident/__pfn_to_mfn
On 06/04/2018 11:49 PM, Boris Ostrovsky wrote:
> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> Add UAPI and IOCTLs for dma-buf grant device driver extension:
>> the extension allows userspace processes and kernel modules to
>> use Xen backed dma-buf implementation. With this extension grant
>> references to the pages of an imported dma-buf can be exported
>> for other domain use and grant references coming from a foreign
>> domain can be converted into a local dma-buf for local export.
>> Implement basic initialization and stubs for Xen DMA buffers'
>> support.
>
> It would be very helpful if people advocating for this interface
> reviewed it as well.
I would also love to see their comments here ;)
>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/Kconfig | 10 +++
>> drivers/xen/Makefile | 1 +
>> drivers/xen/gntdev-dmabuf.c | 75 +++++++++++++++++++
>> drivers/xen/gntdev-dmabuf.h | 41 +++++++++++
>> drivers/xen/gntdev.c | 142 ++++++++++++++++++++++++++++++++++++
>> include/uapi/xen/gntdev.h | 91 +++++++++++++++++++++++
>> 6 files changed, 360 insertions(+)
>> create mode 100644 drivers/xen/gntdev-dmabuf.c
>> create mode 100644 drivers/xen/gntdev-dmabuf.h
>>
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index 39536ddfbce4..52d64e4b6b81 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -152,6 +152,16 @@ config XEN_GNTDEV
>> help
>> Allows userspace processes to use grants.
>>
>> +config XEN_GNTDEV_DMABUF
>> + bool "Add support for dma-buf grant access device driver extension"
>> + depends on XEN_GNTDEV && XEN_GRANT_DMA_ALLOC && DMA_SHARED_BUFFER
>
> Is there a reason to have XEN_GRANT_DMA_ALLOC without XEN_GNTDEV_DMABUF?
One can use the grant table's DMA API without using dma-buf at all,
i.e. dma-buf is a layer of functionality on top of DMA allocated memory.
We have a use-case for a driver domain (a guest domain, in fact)
backed by an IOMMU which still requires allocations to be made as
contiguous/DMA memory, so those buffers can be passed around to
drivers expecting DMA-only buffers.
So, IMO this is a valid use-case "to have XEN_GRANT_DMA_ALLOC
without XEN_GNTDEV_DMABUF"
>
>> + help
>> + Allows userspace processes and kernel modules to use Xen backed
>> + dma-buf implementation. With this extension grant references to
>> + the pages of an imported dma-buf can be exported for other domain
>> + use and grant references coming from a foreign domain can be
>> + converted into a local dma-buf for local export.
>> +
>> config XEN_GRANT_DEV_ALLOC
>> tristate "User-space grant reference allocator driver"
>> depends on XEN
>> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
>> index 3c87b0c3aca6..33afb7b2b227 100644
>> --- a/drivers/xen/Makefile
>> +++ b/drivers/xen/Makefile
>> @@ -41,5 +41,6 @@ obj-$(CONFIG_XEN_PVCALLS_BACKEND) += pvcalls-back.o
>> obj-$(CONFIG_XEN_PVCALLS_FRONTEND) += pvcalls-front.o
>> xen-evtchn-y := evtchn.o
>> xen-gntdev-y := gntdev.o
>> +xen-gntdev-$(CONFIG_XEN_GNTDEV_DMABUF) += gntdev-dmabuf.o
>> xen-gntalloc-y := gntalloc.o
>> xen-privcmd-y := privcmd.o
>> diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
>> new file mode 100644
>> index 000000000000..6bedd1387bd9
>> --- /dev/null
>> +++ b/drivers/xen/gntdev-dmabuf.c
>> @@ -0,0 +1,75 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +/*
>> + * Xen dma-buf functionality for gntdev.
>> + *
>> + * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
>> + */
>> +
>> +#include <linux/slab.h>
>> +
>> +#include "gntdev-dmabuf.h"
>> +
>> +struct gntdev_dmabuf_priv {
>> + int dummy;
>> +};
>> +
>> +/* ------------------------------------------------------------------ */
>> +/* DMA buffer export support. */
>> +/* ------------------------------------------------------------------ */
>> +
>> +/* ------------------------------------------------------------------ */
>> +/* Implementation of wait for exported DMA buffer to be released. */
>> +/* ------------------------------------------------------------------ */
> Why this comment style?
Just a copy-paste from gntdev; will change to the usual /* ... */ style.
>
>> +
>> +int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
>> + int wait_to_ms)
>> +{
>> + return -EINVAL;
>> +}
>> +
>> +/* ------------------------------------------------------------------ */
>> +/* DMA buffer export support. */
>> +/* ------------------------------------------------------------------ */
>> +
>> +int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args)
>> +{
>> + return -EINVAL;
>> +}
>> +
>> +/* ------------------------------------------------------------------ */
>> +/* DMA buffer import support. */
>> +/* ------------------------------------------------------------------ */
>> +
>> +struct gntdev_dmabuf *
>> +gntdev_dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct device *dev,
>> + int fd, int count, int domid)
>> +{
>> + return ERR_PTR(-ENOMEM);
>> +}
>> +
>> +u32 *gntdev_dmabuf_imp_get_refs(struct gntdev_dmabuf *gntdev_dmabuf)
>> +{
>> + return NULL;
>> +}
>> +
>> +int gntdev_dmabuf_imp_release(struct gntdev_dmabuf_priv *priv, u32 fd)
>> +{
>> + return -EINVAL;
>> +}
>> +
>> +struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
>> +{
>> + struct gntdev_dmabuf_priv *priv;
>> +
>> + priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>> + if (!priv)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + return priv;
>> +}
>> +
>> +void gntdev_dmabuf_fini(struct gntdev_dmabuf_priv *priv)
>> +{
>> + kfree(priv);
>> +}
>> diff --git a/drivers/xen/gntdev-dmabuf.h b/drivers/xen/gntdev-dmabuf.h
>> new file mode 100644
>> index 000000000000..040b2de904ac
>> --- /dev/null
>> +++ b/drivers/xen/gntdev-dmabuf.h
>> @@ -0,0 +1,41 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +/*
>> + * Xen dma-buf functionality for gntdev.
>> + *
>> + * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
>> + */
>> +
>> +#ifndef _GNTDEV_DMABUF_H
>> +#define _GNTDEV_DMABUF_H
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/errno.h>
>> +#include <linux/types.h>
>> +
>> +struct gntdev_dmabuf_priv;
>> +struct gntdev_dmabuf;
>> +struct device;
>> +
>> +struct gntdev_dmabuf_export_args {
>> + int dummy;
>> +};
>
> Please define the full structure (at least what you have in the next
> patch) here.
Ok, will define here what I have in the next patch, but won't
initialize anything until then. Will this work for you?
>
>> +
>> +struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void);
>> +
>> +void gntdev_dmabuf_fini(struct gntdev_dmabuf_priv *priv);
>> +
>> +int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args);
>> +
>> +int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
>> + int wait_to_ms);
>> +
>> +struct gntdev_dmabuf *
>> +gntdev_dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct device *dev,
>> + int fd, int count, int domid);
>> +
>> +u32 *gntdev_dmabuf_imp_get_refs(struct gntdev_dmabuf *gntdev_dmabuf);
>> +
>> +int gntdev_dmabuf_imp_release(struct gntdev_dmabuf_priv *priv, u32 fd);
>> +
>> +#endif
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index 9813fc440c70..7d58dfb3e5e8 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
> ...
>
>>
>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
> This code belongs in gntdev-dmabuf.c.
The reason I have this code here is that it is heavily
tied to gntdev's internal functionality, e.g. map/unmap.
I do not want to extend gntdev's API just so that gntdev-dmabuf
can access these internals. What is more, gntdev-dmabuf doesn't need
to know about maps done by gntdev, as there is no use for that
information in gntdev-dmabuf. So, it seems more natural to keep the
dma-buf related map/unmap code where it is: in gntdev.
>
>> +/* ------------------------------------------------------------------ */
>> +/* DMA buffer export support. */
>> +/* ------------------------------------------------------------------ */
>> +
>> +int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
>> + int count, u32 domid, u32 *refs, u32 *fd)
>> +{
>> + /* XXX: this will need to work with gntdev's map, so leave it here. */
> This doesn't help understanding what's going on (at least to me) and is
> removed in the next patch. So no need for this comment.
Will remove the comment
> -boris
>
>> + *fd = -1;
>> + return -EINVAL;
>> +}
>
On 06/05/2018 01:07 AM, Boris Ostrovsky wrote:
> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> 1. Create a dma-buf from grant references provided by the foreign
>> domain. By default dma-buf is backed by system memory pages, but
>> by providing GNTDEV_DMA_FLAG_XXX flags it can also be created
>> as a DMA write-combine/coherent buffer, e.g. allocated with
>> corresponding dma_alloc_xxx API.
>> Export the resulting buffer as a new dma-buf.
>>
>> 2. Implement waiting for the dma-buf to be released: block until the
>> dma-buf with the file descriptor provided is released.
>> If within the time-out provided the buffer is not released then
>> -ETIMEDOUT error is returned. If the buffer with the file descriptor
>> does not exist or has already been released, then -ENOENT is
>> returned. For valid file descriptors this must not be treated as
>> an error.
>>
>> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
>> ---
>> drivers/xen/gntdev-dmabuf.c | 393 +++++++++++++++++++++++++++++++++++-
>> drivers/xen/gntdev-dmabuf.h | 9 +-
>> drivers/xen/gntdev.c | 90 ++++++++-
>> 3 files changed, 486 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
>> index 6bedd1387bd9..f612468879b4 100644
>> --- a/drivers/xen/gntdev-dmabuf.c
>> +++ b/drivers/xen/gntdev-dmabuf.c
>> @@ -3,15 +3,58 @@
>> /*
>> * Xen dma-buf functionality for gntdev.
>> *
>> + * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
>> + *
>> * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
>> */
>>
>> +#include <linux/dma-buf.h>
>> #include <linux/slab.h>
>>
>> #include "gntdev-dmabuf.h"
>>
>> +struct gntdev_dmabuf {
>> + struct gntdev_dmabuf_priv *priv;
>> + struct dma_buf *dmabuf;
>> + struct list_head next;
>> + int fd;
>> +
>> + union {
>> + struct {
>> + /* Exported buffers are reference counted. */
>> + struct kref refcount;
>> +
>> + struct gntdev_priv *priv;
>> + struct grant_map *map;
>> + void (*release)(struct gntdev_priv *priv,
>> + struct grant_map *map);
>> + } exp;
>> + } u;
>> +
>> + /* Number of pages this buffer has. */
>> + int nr_pages;
>> + /* Pages of this buffer. */
>> + struct page **pages;
>> +};
>> +
>> +struct gntdev_dmabuf_wait_obj {
>> + struct list_head next;
>> + struct gntdev_dmabuf *gntdev_dmabuf;
>> + struct completion completion;
>> +};
>> +
>> +struct gntdev_dmabuf_attachment {
>> + struct sg_table *sgt;
>> + enum dma_data_direction dir;
>> +};
>> +
>> struct gntdev_dmabuf_priv {
>> - int dummy;
>> + /* List of exported DMA buffers. */
>> + struct list_head exp_list;
>> + /* List of wait objects. */
>> + struct list_head exp_wait_list;
>> + /* This is the lock which protects dma_buf_xxx lists. */
>> + struct mutex lock;
>> };
>>
>> /* ------------------------------------------------------------------ */
>> @@ -22,19 +65,359 @@ struct gntdev_dmabuf_priv {
>> /* Implementation of wait for exported DMA buffer to be released. */
>> /* ------------------------------------------------------------------ */
>>
>> +static void dmabuf_exp_release(struct kref *kref);
>> +
>> +static struct gntdev_dmabuf_wait_obj *
>> +dmabuf_exp_wait_obj_new(struct gntdev_dmabuf_priv *priv,
>> + struct gntdev_dmabuf *gntdev_dmabuf)
>> +{
>> + struct gntdev_dmabuf_wait_obj *obj;
>> +
>> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
>> + if (!obj)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + init_completion(&obj->completion);
>> + obj->gntdev_dmabuf = gntdev_dmabuf;
>> +
>> + mutex_lock(&priv->lock);
>> + list_add(&obj->next, &priv->exp_wait_list);
>> + /* Put our reference and wait for gntdev_dmabuf's release to fire. */
>> + kref_put(&gntdev_dmabuf->u.exp.refcount, dmabuf_exp_release);
>> + mutex_unlock(&priv->lock);
>> + return obj;
>> +}
>> +
>> +static void dmabuf_exp_wait_obj_free(struct gntdev_dmabuf_priv *priv,
>> + struct gntdev_dmabuf_wait_obj *obj)
>> +{
>> + struct gntdev_dmabuf_wait_obj *cur_obj, *q;
>> +
>> + mutex_lock(&priv->lock);
>> + list_for_each_entry_safe(cur_obj, q, &priv->exp_wait_list, next)
>> + if (cur_obj == obj) {
>> + list_del(&obj->next);
>> + kfree(obj);
>> + break;
>> + }
>> + mutex_unlock(&priv->lock);
>> +}
>> +
>> +static int dmabuf_exp_wait_obj_wait(struct gntdev_dmabuf_wait_obj *obj,
>> + u32 wait_to_ms)
>> +{
>> + if (wait_for_completion_timeout(&obj->completion,
>> + msecs_to_jiffies(wait_to_ms)) <= 0)
>> + return -ETIMEDOUT;
>> +
>> + return 0;
>> +}
>> +
>> +static void dmabuf_exp_wait_obj_signal(struct gntdev_dmabuf_priv *priv,
>> + struct gntdev_dmabuf *gntdev_dmabuf)
>> +{
>> + struct gntdev_dmabuf_wait_obj *obj, *q;
>> +
>> + list_for_each_entry_safe(obj, q, &priv->exp_wait_list, next)
>> + if (obj->gntdev_dmabuf == gntdev_dmabuf) {
>> + pr_debug("Found gntdev_dmabuf in the wait list, wake\n");
>> + complete_all(&obj->completion);
> break ?
sure, thank you
>> + }
>> +}
>> +
>> +static struct gntdev_dmabuf *
>> +dmabuf_exp_wait_obj_get_by_fd(struct gntdev_dmabuf_priv *priv, int fd)
>> +{
>> + struct gntdev_dmabuf *q, *gntdev_dmabuf, *ret = ERR_PTR(-ENOENT);
>> +
>> + mutex_lock(&priv->lock);
>> + list_for_each_entry_safe(gntdev_dmabuf, q, &priv->exp_list, next)
>> + if (gntdev_dmabuf->fd == fd) {
>> + pr_debug("Found gntdev_dmabuf in the wait list\n");
>> + kref_get(&gntdev_dmabuf->u.exp.refcount);
>> + ret = gntdev_dmabuf;
>> + break;
>> + }
>> + mutex_unlock(&priv->lock);
>> + return ret;
>> +}
>> +
>> int gntdev_dmabuf_exp_wait_released(struct gntdev_dmabuf_priv *priv, int fd,
>> int wait_to_ms)
>> {
>> - return -EINVAL;
>> + struct gntdev_dmabuf *gntdev_dmabuf;
>> + struct gntdev_dmabuf_wait_obj *obj;
>> + int ret;
>> +
>> + pr_debug("Will wait for dma-buf with fd %d\n", fd);
>> + /*
>> + * Try to find the DMA buffer: if not found means that
>> + * either the buffer has already been released or file descriptor
>> + * provided is wrong.
>> + */
>> + gntdev_dmabuf = dmabuf_exp_wait_obj_get_by_fd(priv, fd);
>> + if (IS_ERR(gntdev_dmabuf))
>> + return PTR_ERR(gntdev_dmabuf);
>> +
>> + /*
>> + * gntdev_dmabuf still exists and is reference count locked by us now,
>> + * so prepare to wait: allocate wait object and add it to the wait list,
>> + * so we can find it on release.
>> + */
>> + obj = dmabuf_exp_wait_obj_new(priv, gntdev_dmabuf);
>> + if (IS_ERR(obj)) {
>> + pr_err("Failed to setup wait object, ret %ld\n", PTR_ERR(obj));
>
> No need for pr_err. We are out of memory.
Will remove
>
>> + return PTR_ERR(obj);
>> +}
>> +
>> + ret = dmabuf_exp_wait_obj_wait(obj, wait_to_ms);
>> + dmabuf_exp_wait_obj_free(priv, obj);
>> + return ret;
>> }
>>
>> /* ------------------------------------------------------------------ */
>> /* DMA buffer export support. */
>> /* ------------------------------------------------------------------ */
>>
>> +static struct sg_table *
>> +dmabuf_pages_to_sgt(struct page **pages, unsigned int nr_pages)
>> +{
>> + struct sg_table *sgt;
>> + int ret;
>> +
>> + sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
>> + if (!sgt) {
>> + ret = -ENOMEM;
>> + goto out;
>> + }
>> +
>> + ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
>> + nr_pages << PAGE_SHIFT,
>> + GFP_KERNEL);
>> + if (ret)
>> + goto out;
>> +
>> + return sgt;
>> +
>> +out:
>> + kfree(sgt);
>> + return ERR_PTR(ret);
>> +}
>> +
>> +static int dmabuf_exp_ops_attach(struct dma_buf *dma_buf,
>> + struct device *target_dev,
>> + struct dma_buf_attachment *attach)
>> +{
>> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach;
>> +
>> + gntdev_dmabuf_attach = kzalloc(sizeof(*gntdev_dmabuf_attach),
>> + GFP_KERNEL);
>> + if (!gntdev_dmabuf_attach)
>> + return -ENOMEM;
>> +
>> + gntdev_dmabuf_attach->dir = DMA_NONE;
>> + attach->priv = gntdev_dmabuf_attach;
>> + /* Might need to pin the pages of the buffer now. */
>
> Who is supposed to pin the pages? The caller?
Ok, as we do not implement .mmap for Xen dma-buf and there is
no plan to mmap kernel memory (either ballooned or dma_alloc'ed),
we can remove this comment, as there is no need to pin/unpin
pages.
>
>> + return 0;
>> +}
>> +
>> +static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
>> + struct dma_buf_attachment *attach)
>> +{
>> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach = attach->priv;
>> +
>> + if (gntdev_dmabuf_attach) {
>> + struct sg_table *sgt = gntdev_dmabuf_attach->sgt;
>> +
>> + if (sgt) {
>> + if (gntdev_dmabuf_attach->dir != DMA_NONE)
>> + dma_unmap_sg_attrs(attach->dev, sgt->sgl,
>> + sgt->nents,
>> + gntdev_dmabuf_attach->dir,
>> + DMA_ATTR_SKIP_CPU_SYNC);
>> + sg_free_table(sgt);
>> + }
>> +
>> + kfree(sgt);
>> + kfree(gntdev_dmabuf_attach);
>> + attach->priv = NULL;
>> + }
>> + /* Might need to unpin the pages of the buffer now. */
> Same question.
Please see above
>> +}
>> +
>> +static struct sg_table *
>> +dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
>> + enum dma_data_direction dir)
>> +{
>> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach = attach->priv;
>> + struct gntdev_dmabuf *gntdev_dmabuf = attach->dmabuf->priv;
>> + struct sg_table *sgt;
>> +
>> + pr_debug("Mapping %d pages for dev %p\n", gntdev_dmabuf->nr_pages,
>> + attach->dev);
>> +
>> + if (WARN_ON(dir == DMA_NONE || !gntdev_dmabuf_attach))
>
> WARN_ON_ONCE. Here and elsewhere.
Why? The UAPI may be used by different applications, so we might
lose warnings for some of them. Having WARN_ON will show problems
for multiple users, not just for the first one.
Does it make sense to still use WARN_ON then?
>
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* Return the cached mapping when possible. */
>> + if (gntdev_dmabuf_attach->dir == dir)
>> + return gntdev_dmabuf_attach->sgt;
>> +
>> + /*
>> + * Two mappings with different directions for the same attachment are
>> + * not allowed.
>> + */
>> + if (WARN_ON(gntdev_dmabuf_attach->dir != DMA_NONE))
>> + return ERR_PTR(-EBUSY);
>> +
>> + sgt = dmabuf_pages_to_sgt(gntdev_dmabuf->pages,
>> + gntdev_dmabuf->nr_pages);
>> + if (!IS_ERR(sgt)) {
>> + if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
>> + DMA_ATTR_SKIP_CPU_SYNC)) {
>> + sg_free_table(sgt);
>> + kfree(sgt);
>> + sgt = ERR_PTR(-ENOMEM);
>> + } else {
>> + gntdev_dmabuf_attach->sgt = sgt;
>> + gntdev_dmabuf_attach->dir = dir;
>> + }
>> + }
>> + if (IS_ERR(sgt))
>> + pr_err("Failed to map sg table for dev %p\n", attach->dev);
>> + return sgt;
>> +}
>> +
>> +static void dmabuf_exp_ops_unmap_dma_buf(struct dma_buf_attachment *attach,
>> + struct sg_table *sgt,
>> + enum dma_data_direction dir)
>> +{
>> + /* Not implemented. The unmap is done at dmabuf_exp_ops_detach(). */
>> +}
>> +
>> +static void dmabuf_exp_release(struct kref *kref)
>> +{
>> + struct gntdev_dmabuf *gntdev_dmabuf =
>> + container_of(kref, struct gntdev_dmabuf, u.exp.refcount);
>> +
>> + dmabuf_exp_wait_obj_signal(gntdev_dmabuf->priv, gntdev_dmabuf);
>> + list_del(&gntdev_dmabuf->next);
>> + kfree(gntdev_dmabuf);
>> +}
>> +
>> +static void dmabuf_exp_ops_release(struct dma_buf *dma_buf)
>> +{
>> + struct gntdev_dmabuf *gntdev_dmabuf = dma_buf->priv;
>> + struct gntdev_dmabuf_priv *priv = gntdev_dmabuf->priv;
>> +
>> + gntdev_dmabuf->u.exp.release(gntdev_dmabuf->u.exp.priv,
>> + gntdev_dmabuf->u.exp.map);
>> + mutex_lock(&priv->lock);
>> + kref_put(&gntdev_dmabuf->u.exp.refcount, dmabuf_exp_release);
>> + mutex_unlock(&priv->lock);
>> +}
>> +
>> +static void *dmabuf_exp_ops_kmap_atomic(struct dma_buf *dma_buf,
>> + unsigned long page_num)
>> +{
>> + /* Not implemented. */
>> + return NULL;
>> +}
>> +
>> +static void dmabuf_exp_ops_kunmap_atomic(struct dma_buf *dma_buf,
>> + unsigned long page_num, void *addr)
>> +{
>> + /* Not implemented. */
>> +}
>> +
>> +static void *dmabuf_exp_ops_kmap(struct dma_buf *dma_buf,
>> + unsigned long page_num)
>> +{
>> + /* Not implemented. */
>> + return NULL;
>> +}
>> +
>> +static void dmabuf_exp_ops_kunmap(struct dma_buf *dma_buf,
>> + unsigned long page_num, void *addr)
>> +{
>> + /* Not implemented. */
>> +}
>> +
>> +static int dmabuf_exp_ops_mmap(struct dma_buf *dma_buf,
>> + struct vm_area_struct *vma)
>> +{
>> + /* Not implemented. */
>> + return 0;
>> +}
>> +
>> +static const struct dma_buf_ops dmabuf_exp_ops = {
>> + .attach = dmabuf_exp_ops_attach,
>> + .detach = dmabuf_exp_ops_detach,
>> + .map_dma_buf = dmabuf_exp_ops_map_dma_buf,
>> + .unmap_dma_buf = dmabuf_exp_ops_unmap_dma_buf,
>> + .release = dmabuf_exp_ops_release,
>> + .map = dmabuf_exp_ops_kmap,
>> + .map_atomic = dmabuf_exp_ops_kmap_atomic,
>> + .unmap = dmabuf_exp_ops_kunmap,
>> + .unmap_atomic = dmabuf_exp_ops_kunmap_atomic,
>> + .mmap = dmabuf_exp_ops_mmap,
>> +};
>> +
>> int gntdev_dmabuf_exp_from_pages(struct gntdev_dmabuf_export_args *args)
>> {
>> - return -EINVAL;
>> + DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>> + struct gntdev_dmabuf *gntdev_dmabuf;
>> + int ret = 0;
>> +
>> + gntdev_dmabuf = kzalloc(sizeof(*gntdev_dmabuf), GFP_KERNEL);
>> + if (!gntdev_dmabuf)
>> + return -ENOMEM;
>> +
>> + kref_init(&gntdev_dmabuf->u.exp.refcount);
>> +
>> + gntdev_dmabuf->priv = args->dmabuf_priv;
>> + gntdev_dmabuf->nr_pages = args->count;
>> + gntdev_dmabuf->pages = args->pages;
>> + gntdev_dmabuf->u.exp.priv = args->priv;
>> + gntdev_dmabuf->u.exp.map = args->map;
>> + gntdev_dmabuf->u.exp.release = args->release;
>> +
>> + exp_info.exp_name = KBUILD_MODNAME;
>> + if (args->dev->driver && args->dev->driver->owner)
>> + exp_info.owner = args->dev->driver->owner;
>> + else
>> + exp_info.owner = THIS_MODULE;
>> + exp_info.ops = &dmabuf_exp_ops;
>> + exp_info.size = args->count << PAGE_SHIFT;
>> + exp_info.flags = O_RDWR;
>> + exp_info.priv = gntdev_dmabuf;
>> +
>> + gntdev_dmabuf->dmabuf = dma_buf_export(&exp_info);
>> + if (IS_ERR(gntdev_dmabuf->dmabuf)) {
>> + ret = PTR_ERR(gntdev_dmabuf->dmabuf);
>> + gntdev_dmabuf->dmabuf = NULL;
>> + goto fail;
>> + }
>> +
>> + ret = dma_buf_fd(gntdev_dmabuf->dmabuf, O_CLOEXEC);
>> + if (ret < 0)
>> + goto fail;
>> +
>> + gntdev_dmabuf->fd = ret;
>> + args->fd = ret;
>> +
>> + pr_debug("Exporting DMA buffer with fd %d\n", ret);
>> +
>> + mutex_lock(&args->dmabuf_priv->lock);
>> + list_add(&gntdev_dmabuf->next, &args->dmabuf_priv->exp_list);
>> + mutex_unlock(&args->dmabuf_priv->lock);
>> + return 0;
>> +
>> +fail:
>> + if (gntdev_dmabuf->dmabuf)
>> + dma_buf_put(gntdev_dmabuf->dmabuf);
>> + kfree(gntdev_dmabuf);
>> + return ret;
>> }
>>
>> /* ------------------------------------------------------------------ */
>> @@ -66,6 +449,10 @@ struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void)
>> if (!priv)
>> return ERR_PTR(-ENOMEM);
>>
>> + mutex_init(&priv->lock);
>> + INIT_LIST_HEAD(&priv->exp_list);
>> + INIT_LIST_HEAD(&priv->exp_wait_list);
>> +
>> return priv;
>> }
>>
>> diff --git a/drivers/xen/gntdev-dmabuf.h b/drivers/xen/gntdev-dmabuf.h
>> index 040b2de904ac..95c23a24f640 100644
>> --- a/drivers/xen/gntdev-dmabuf.h
>> +++ b/drivers/xen/gntdev-dmabuf.h
>> @@ -18,7 +18,14 @@ struct gntdev_dmabuf;
>> struct device;
>>
>> struct gntdev_dmabuf_export_args {
>> - int dummy;
>> + struct gntdev_priv *priv;
>> + struct grant_map *map;
>> + void (*release)(struct gntdev_priv *priv, struct grant_map *map);
>> + struct gntdev_dmabuf_priv *dmabuf_priv;
>> + struct device *dev;
>> + int count;
>> + struct page **pages;
>> + u32 fd;
>> };
>>
>> struct gntdev_dmabuf_priv *gntdev_dmabuf_init(void);
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index 7d58dfb3e5e8..cf255d45f20f 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -319,6 +319,16 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
>> gntdev_free_map(map);
>> }
>>
>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>> +static void gntdev_remove_map(struct gntdev_priv *priv, struct grant_map *map)
>> +{
>> + mutex_lock(&priv->lock);
>> + list_del(&map->next);
>> + gntdev_put_map(NULL /* already removed */, map);
>> + mutex_unlock(&priv->lock);
>> +}
>> +#endif
>> +
>> /* ------------------------------------------------------------------ */
>>
>> static int find_grant_ptes(pte_t *pte, pgtable_t token,
>> @@ -1063,12 +1073,88 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>> /* DMA buffer export support. */
>> /* ------------------------------------------------------------------ */
>>
>> +static struct grant_map *
>> +dmabuf_exp_alloc_backing_storage(struct gntdev_priv *priv, int dmabuf_flags,
>> + int count)
>> +{
>> + struct grant_map *map;
>> +
>> + if (unlikely(count <= 0))
>> + return ERR_PTR(-EINVAL);
>> +
>> + if ((dmabuf_flags & GNTDEV_DMA_FLAG_WC) &&
>> + (dmabuf_flags & GNTDEV_DMA_FLAG_COHERENT)) {
>> + pr_err("Wrong dma-buf flags: either WC or coherent, not both\n");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> + map = gntdev_alloc_map(priv, count, dmabuf_flags);
>> + if (!map)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + if (unlikely(atomic_add_return(count, &pages_mapped) > limit)) {
>> + pr_err("can't map: over limit\n");
>> + gntdev_put_map(NULL, map);
>> + return ERR_PTR(-ENOMEM);
>> + }
>> + return map;
>> +}
>> +
>> int gntdev_dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
>> int count, u32 domid, u32 *refs, u32 *fd)
>> {
>> - /* XXX: this will need to work with gntdev's map, so leave it here. */
>> + struct grant_map *map;
>> + struct gntdev_dmabuf_export_args args;
>> + int i, ret;
>> +
>> *fd = -1;
>> - return -EINVAL;
>> +
>> + if (use_ptemod) {
>> + pr_err("Cannot provide dma-buf: use_ptemode %d\n",
>> + use_ptemod);
> No pr_err here please. This can potentially become a DoS vector as it
> comes directly from ioctl.
>
> I would, in fact, revisit other uses of pr_err in this file.
Sure, all of pr_err can actually be pr_debug...
>> + return -EINVAL;
>> + }
>> +
>> + map = dmabuf_exp_alloc_backing_storage(priv, flags, count);
>
> @count comes from userspace. dmabuf_exp_alloc_backing_storage only
> checks for it to be >0. Should it be checked for some sane max value?
This is not easy, as it is hard to tell what that max value could be.
For DMA buffers, if count is too big then the allocation will fail, so
there is no need to check for a max here (dma_alloc_{xxx} will filter
out too-big allocations).
For Xen balloon allocations I cannot tell what that max value could be
either. Tough question how to limit.
>
> -boris
Thank you,
Oleksandr
>> + if (IS_ERR(map))
>> + return PTR_ERR(map);
>> +
>> + for (i = 0; i < count; i++) {
>> + map->grants[i].domid = domid;
>> + map->grants[i].ref = refs[i];
>> + }
>> +
>> + mutex_lock(&priv->lock);
>> + gntdev_add_map(priv, map);
>> + mutex_unlock(&priv->lock);
>> +
>> + map->flags |= GNTMAP_host_map;
>> +#if defined(CONFIG_X86)
>> + map->flags |= GNTMAP_device_map;
>> +#endif
>> +
>> + ret = map_grant_pages(map);
>> + if (ret < 0)
>> + goto out;
>> +
>> + args.priv = priv;
>> + args.map = map;
>> + args.release = gntdev_remove_map;
>> + args.dev = priv->dma_dev;
>> + args.dmabuf_priv = priv->dmabuf_priv;
>> + args.count = map->count;
>> + args.pages = map->pages;
>> +
>> + ret = gntdev_dmabuf_exp_from_pages(&args);
>> + if (ret < 0)
>> + goto out;
>> +
>> + *fd = args.fd;
>> + return 0;
>> +
>> +out:
>> + gntdev_remove_map(priv, map);
>> + return ret;
>> }
>>
>> /* ------------------------------------------------------------------ */
On 06/05/2018 01:28 AM, Boris Ostrovsky wrote:
> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> /* ------------------------------------------------------------------ */
>>
>> +static int
>> +dmabuf_imp_grant_foreign_access(struct page **pages, u32 *refs,
>> + int count, int domid)
>> +{
>> + grant_ref_t priv_gref_head;
>> + int i, ret;
>> +
>> + ret = gnttab_alloc_grant_references(count, &priv_gref_head);
>> + if (ret < 0) {
>> + pr_err("Cannot allocate grant references, ret %d\n", ret);
>> + return ret;
>> + }
>> +
>> + for (i = 0; i < count; i++) {
>> + int cur_ref;
>> +
>> + cur_ref = gnttab_claim_grant_reference(&priv_gref_head);
>> + if (cur_ref < 0) {
>> + ret = cur_ref;
>> + pr_err("Cannot claim grant reference, ret %d\n", ret);
>> + goto out;
>> + }
>> +
>> + gnttab_grant_foreign_access_ref(cur_ref, domid,
>> + xen_page_to_gfn(pages[i]), 0);
>> + refs[i] = cur_ref;
>> + }
>> +
>> + ret = 0;
> return 0?
My bad, thank you
>> +
>> +out:
>> + gnttab_free_grant_references(priv_gref_head);
>> + return ret;
>> +}
>> +
On 06/05/2018 01:36 AM, Boris Ostrovsky wrote:
> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <[email protected]>
>>
>> Allow creating grant device context for use by kernel modules which
>> require functionality, provided by gntdev. Export symbols for dma-buf
>> API provided by the module.
> Can you give an example of who'd be using these interfaces?
There is no use-case at the moment I can think of other than hyper
dma-buf [1], [2].
I'll let the Intel folks (CCed) defend this patch, as it was done
primarily for them and I don't use it in any of my use-cases.
So, from this POV it can be dropped, at least from this series.
>
> -boris
>
[1] https://patchwork.freedesktop.org/series/38207/
[2] https://patchwork.freedesktop.org/patch/204447/
On 06/06/2018 03:24 AM, Oleksandr Andrushchenko wrote:
> On 06/04/2018 07:37 PM, Boris Ostrovsky wrote:
>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>> diff --git a/include/xen/mem-reservation.h
>>> b/include/xen/mem-reservation.h
>>> new file mode 100644
>>> index 000000000000..a727d65a1e61
>>> --- /dev/null
>>> +++ b/include/xen/mem-reservation.h
>>> @@ -0,0 +1,65 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +
>>> +/*
>>> + * Xen memory reservation utilities.
>>> + *
>>> + * Copyright (c) 2003, B Dragovic
>>> + * Copyright (c) 2003-2004, M Williamson, K Fraser
>>> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
>>> + * Copyright (c) 2010 Daniel Kiper
>>> + * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
>>> + */
>>> +
>>> +#ifndef _XENMEM_RESERVATION_H
>>> +#define _XENMEM_RESERVATION_H
>>> +
>>> +#include <linux/kernel.h>
>>> +#include <linux/slab.h>
>>> +
>>> +#include <asm/xen/hypercall.h>
>>> +#include <asm/tlb.h>
>>> +
>>> +#include <xen/interface/memory.h>
>>> +#include <xen/page.h>
>>> +
>>> +#ifdef CONFIG_XEN_SCRUB_PAGES
>>> +void xenmem_reservation_scrub_page(struct page *page);
>>> +#else
>>> +static inline void xenmem_reservation_scrub_page(struct page *page)
>>> +{
>>> +}
>>> +#endif
>>
>> Given that this is a wrapper around a single call I'd prefer
>>
>> inline void xenmem_reservation_scrub_page(struct page *page)
>> {
>> #ifdef CONFIG_XEN_SCRUB_PAGES
>> clear_highpage(page);
>> #endif
>> }
Unfortunately this can't be done because of
EXPORT_SYMBOL_GPL(xenmem_reservation_scrub_page);
which obviously cannot be used for static inline functions.
Why do you need to export it? It's an inline defined in the header file.
Just like clear_highpage().
-boris
> So, I'll keep it as is.
>>
>>
>> -boris
>>
>>
> Thank you,
> Oleksandr
On 06/06/2018 04:14 AM, Oleksandr Andrushchenko wrote:
> On 06/04/2018 11:12 PM, Boris Ostrovsky wrote:
>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> @@ -121,8 +146,27 @@ static void gntdev_free_map(struct grant_map *map)
>> if (map == NULL)
>> return;
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + if (map->dma_vaddr) {
>> + struct gnttab_dma_alloc_args args;
>> +
>> + args.dev = map->dma_dev;
>> + args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>> + args.nr_pages = map->count;
>> + args.pages = map->pages;
>> + args.frames = map->frames;
>> + args.vaddr = map->dma_vaddr;
>> + args.dev_bus_addr = map->dma_bus_addr;
>> +
>> + gnttab_dma_free_pages(&args);
>> + } else
>> +#endif
>> if (map->pages)
>> gnttab_free_pages(map->count, map->pages);
>> +
>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> + kfree(map->frames);
>> +#endif
>>
>> Can this be done under if (map->dma_vaddr) ?
>
>> In other words, is it
>> possible for dma_vaddr to be NULL and still have unallocated frames
>> pointer?
> It is possible to have vaddr == NULL and frames != NULL as we
> allocate frames outside of gnttab_dma_alloc_pages which
> may fail. Calling kfree on NULL pointer is safe,
I am not questioning the safety of the code; I would like to avoid another ifdef.
> so
> I see no reason to change this code.
>>
>>> kfree(map->pages);
>>> kfree(map->grants);
>>> kfree(map->map_ops);
>>> @@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map *map)
>>> kfree(map);
>>> }
>>> -static struct grant_map *gntdev_alloc_map(struct gntdev_priv
>>> *priv, int count)
>>> +static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv,
>>> int count,
>>> + int dma_flags)
>>> {
>>> struct grant_map *add;
>>> int i;
>>> @@ -155,6 +200,37 @@ static struct grant_map
>>> *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>>> NULL == add->pages)
>>> goto err;
>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>> + add->dma_flags = dma_flags;
>>> +
>>> + /*
>>> + * Check if this mapping is requested to be backed
>>> + * by a DMA buffer.
>>> + */
>>> + if (dma_flags & (GNTDEV_DMA_FLAG_WC | GNTDEV_DMA_FLAG_COHERENT)) {
>>> + struct gnttab_dma_alloc_args args;
>>> +
>>> + add->frames = kcalloc(count, sizeof(add->frames[0]),
>>> + GFP_KERNEL);
>>> + if (!add->frames)
>>> + goto err;
>>> +
>>> + /* Remember the device, so we can free DMA memory. */
>>> + add->dma_dev = priv->dma_dev;
>>> +
>>> + args.dev = priv->dma_dev;
>>> + args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>>> + args.nr_pages = count;
>>> + args.pages = add->pages;
>>> + args.frames = add->frames;
>>> +
>>> + if (gnttab_dma_alloc_pages(&args))
>>> + goto err;
>>> +
>>> + add->dma_vaddr = args.vaddr;
>>> + add->dma_bus_addr = args.dev_bus_addr;
>>> + } else
>>> +#endif
>>> if (gnttab_alloc_pages(count, add->pages))
>>> goto err;
>>> @@ -325,6 +401,14 @@ static int map_grant_pages(struct grant_map
>>> *map)
>>> map->unmap_ops[i].handle = map->map_ops[i].handle;
>>> if (use_ptemod)
>>> map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>> + else if (map->dma_vaddr) {
>>> + unsigned long mfn;
>>> +
>>> + mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>>
>> Not pfn_to_mfn()?
> I'd love to, but pfn_to_mfn is only defined for x86, not ARM: [1] and [2]
> Thus,
>
> drivers/xen/gntdev.c:408:10: error: implicit declaration of function
> ‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
>
> So, I'll keep __pfn_to_mfn
How will this work on non-PV x86?
-boris
On 06/06/2018 05:06 AM, Oleksandr Andrushchenko wrote:
> On 06/04/2018 11:49 PM, Boris Ostrovsky wrote:
>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> +struct gntdev_dmabuf_export_args {
>> + int dummy;
>> +};
>>
>> Please define the full structure (at least what you have in the next
>> patch) here.
> Ok, will define what I have in the next patch, but won't
> initialize anything until the next patch. Will this work for you?
Sure, I just didn't see the need for the dummy argument that you remove
later.
>>
>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>> index 9813fc440c70..7d58dfb3e5e8 100644
>>> --- a/drivers/xen/gntdev.c
>>> +++ b/drivers/xen/gntdev.c
>> ...
>>
>>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>> This code belongs in gntdev-dmabuf.c.
> The reason I have this code here is that it is heavily
> tied to gntdev's internal functionality, e.g. map/unmap.
> I do not want to extend gntdev's API, so gntdev-dmabuf can
> access these. What is more dma-buf doesn't need to know about
> maps done by gntdev as there is no use of that information
> in gntdev-dmabuf. So, it seems more naturally to have
> dma-buf's related map/unmap code where it is: in gntdev.
Sorry, I don't follow. Why would this require extending the API? It's
just moving routines to a different file that is linked to the same module.
Since this is under CONFIG_XEN_GNTDEV_DMABUF then why shouldn't it be in
gntdev-dmabuf.c? In my view that's the file where all dma-related
"stuff" lives.
-boris
-boris
On 06/06/2018 08:46 AM, Oleksandr Andrushchenko wrote:
> On 06/05/2018 01:36 AM, Boris Ostrovsky wrote:
>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <[email protected]>
>>>
>>> Allow creating grant device context for use by kernel modules which
>>> require functionality, provided by gntdev. Export symbols for dma-buf
>>> API provided by the module.
>> Can you give an example of who'd be using these interfaces?
> There is no use-case at the moment I can think of, but hyper dma-buf
> [1], [2]
> I let Intel folks (CCed) to defend this patch as it was done primarily
> for them
> and I don't use it in any of my use-cases. So, from this POV it can be
> dropped,
> at least from this series.
Yes, let's drop this until someone actually needs it.
-boris
>>
>> -boris
>>
> [1] https://patchwork.freedesktop.org/series/38207/
> [2] https://patchwork.freedesktop.org/patch/204447/
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> https://lists.xenproject.org/mailman/listinfo/xen-devel
On 06/06/2018 08:10 AM, Oleksandr Andrushchenko wrote:
> On 06/05/2018 01:07 AM, Boris Ostrovsky wrote:
>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>> +
>> +static struct sg_table *
>> +dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
>> + enum dma_data_direction dir)
>> +{
>> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach =
>> attach->priv;
>> + struct gntdev_dmabuf *gntdev_dmabuf = attach->dmabuf->priv;
>> + struct sg_table *sgt;
>> +
>> + pr_debug("Mapping %d pages for dev %p\n", gntdev_dmabuf->nr_pages,
>> + attach->dev);
>> +
>> + if (WARN_ON(dir == DMA_NONE || !gntdev_dmabuf_attach))
>>
>> WARN_ON_ONCE. Here and elsewhere.
> Why? The UAPI may be used by different applications, thus we might
> lose warnings for some of them. Having WARN_ON will show problems
> for multiple users, not for the first one.
> Does this make sense to still use WARN_ON?
Just as with pr_err call somewhere else the concern here is that
userland (which I think is where this is eventually called from?) may
intentionally trigger the error, flooding the log.
And even this is not directly called from userland there is still a
possibility of triggering this error multiple times.
>>
>>> +
>>> + if (use_ptemod) {
>>> + pr_err("Cannot provide dma-buf: use_ptemode %d\n",
>>> + use_ptemod);
>> No pr_err here please. This can potentially become a DoS vector as it
>> comes directly from ioctl.
>>
>> I would, in fact, revisit other uses of pr_err in this file.
> Sure, all of pr_err can actually be pr_debug...
I'd check even further and see if any printk is needed. I think I saw a
couple that were not especially useful.
>>> + return -EINVAL;
>>> + }
>>> +
>>> + map = dmabuf_exp_alloc_backing_storage(priv, flags, count);
>>
>> @count comes from userspace. dmabuf_exp_alloc_backing_storage only
>> checks for it to be >0. Should it be checked for some sane max value?
> This is not easy as it is hard to tell what could be that
> max value. For DMA buffers if count is too big then allocation
> will fail, so need to check for max here (dma_alloc_{xxx} will
> filter out too big allocations).
OK, that may be sufficient. BTW, I believe there were other loops with
@count being the control variable. Please see if a user can pass a bogus
value.
> For Xen balloon allocations I cannot tell what could be that
> max value neither. Tough question how to limit.
I think in balloon there is also a guarantee (of sorts) that something
prior to a loop will fail.
-boris
On Wed, Jun 06, 2018 at 05:51:38PM -0400, Boris Ostrovsky wrote:
> On 06/06/2018 08:46 AM, Oleksandr Andrushchenko wrote:
> > On 06/05/2018 01:36 AM, Boris Ostrovsky wrote:
> >> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> >>> From: Oleksandr Andrushchenko <[email protected]>
> >>>
> >>> Allow creating grant device context for use by kernel modules which
> >>> require functionality, provided by gntdev. Export symbols for dma-buf
> >>> API provided by the module.
> >> Can you give an example of who'd be using these interfaces?
> > There is no use-case at the moment I can think of, but hyper dma-buf
> > [1], [2]
> > I let Intel folks (CCed) to defend this patch as it was done primarily
> > for them
> > and I don't use it in any of my use-cases. So, from this POV it can be
> > dropped,
> > at least from this series.
>
>
> Yes, let's drop this until someone actually needs it.
>
> -boris
I agree. We are working on re-architecting hyper_dmabuf. We would use the
zero-copy APIs; however, we are not sure yet whether we are going to do it
from the kernel or from userspace. So please do not expose those for now.
>
>
> >>
> >> -boris
> >>
> > [1] https://patchwork.freedesktop.org/series/38207/
> > [2] https://patchwork.freedesktop.org/patch/204447/
> >
>
On 06/07/2018 12:09 AM, Boris Ostrovsky wrote:
> On 06/06/2018 03:24 AM, Oleksandr Andrushchenko wrote:
>> On 06/04/2018 07:37 PM, Boris Ostrovsky wrote:
>>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>>> diff --git a/include/xen/mem-reservation.h
>>>> b/include/xen/mem-reservation.h
>>>> new file mode 100644
>>>> index 000000000000..a727d65a1e61
>>>> --- /dev/null
>>>> +++ b/include/xen/mem-reservation.h
>>>> @@ -0,0 +1,65 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>> +
>>>> +/*
>>>> + * Xen memory reservation utilities.
>>>> + *
>>>> + * Copyright (c) 2003, B Dragovic
>>>> + * Copyright (c) 2003-2004, M Williamson, K Fraser
>>>> + * Copyright (c) 2005 Dan M. Smith, IBM Corporation
>>>> + * Copyright (c) 2010 Daniel Kiper
>>>> + * Copyright (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
>>>> + */
>>>> +
>>>> +#ifndef _XENMEM_RESERVATION_H
>>>> +#define _XENMEM_RESERVATION_H
>>>> +
>>>> +#include <linux/kernel.h>
>>>> +#include <linux/slab.h>
>>>> +
>>>> +#include <asm/xen/hypercall.h>
>>>> +#include <asm/tlb.h>
>>>> +
>>>> +#include <xen/interface/memory.h>
>>>> +#include <xen/page.h>
>>>> +
>>>> +#ifdef CONFIG_XEN_SCRUB_PAGES
>>>> +void xenmem_reservation_scrub_page(struct page *page);
>>>> +#else
>>>> +static inline void xenmem_reservation_scrub_page(struct page *page)
>>>> +{
>>>> +}
>>>> +#endif
>>> Given that this is a wrapper around a single call I'd prefer
>>>
>>> inline void xenmem_reservation_scrub_page(struct page *page)
>>> {
>>> #ifdef CONFIG_XEN_SCRUB_PAGES
>>> clear_highpage(page);
>>> #endif
>>> }
>> Unfortunately this can't be done because of
>> EXPORT_SYMBOL_GPL(xenmem_reservation_scrub_page);
>> which obviously cannot be used for static inline functions.
>
>
> Why do you need to export it? It's an inline defined in the header file.
> Just like clear_highpage().
You are perfectly right, will change as you suggest
>
> -boris
Thank you,
Oleksandr
>> So, I'll keep it as is.
>>>
>>> -boris
>>>
>>>
>> Thank you,
>> Oleksandr
On 06/07/2018 12:19 AM, Boris Ostrovsky wrote:
> On 06/06/2018 04:14 AM, Oleksandr Andrushchenko wrote:
>> On 06/04/2018 11:12 PM, Boris Ostrovsky wrote:
>>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>> @@ -121,8 +146,27 @@ static void gntdev_free_map(struct grant_map *map)
>>> if (map == NULL)
>>> return;
>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
*Option 1: kfree(map->frames);*
>>> + if (map->dma_vaddr) {
>>> + struct gnttab_dma_alloc_args args;
>>> +
>>> + args.dev = map->dma_dev;
>>> + args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>>> + args.nr_pages = map->count;
>>> + args.pages = map->pages;
>>> + args.frames = map->frames;
>>> + args.vaddr = map->dma_vaddr;
>>> + args.dev_bus_addr = map->dma_bus_addr;
>>> +
>>> + gnttab_dma_free_pages(&args);
*Option 2: kfree(map->frames);*
>>> + } else
>>> +#endif
>>> if (map->pages)
>>> gnttab_free_pages(map->count, map->pages);
>>> +
>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>> + kfree(map->frames);
>>> +#endif
>>>
>>> Can this be done under if (map->dma_vaddr) ?
>>> In other words, is it
>>> possible for dma_vaddr to be NULL and still have unallocated frames
>>> pointer?
>> It is possible to have vaddr == NULL and frames != NULL as we
>> allocate frames outside of gnttab_dma_alloc_pages which
>> may fail. Calling kfree on NULL pointer is safe,
>
> I am not questioning the safety of the code; I would like to avoid another ifdef.
Ah, I now understand, so you are asking if we can have
that kfree(map->frames); in the place *Option 2* I marked above.
Unfortunately no: map->frames is allocated before we try to
allocate DMA memory, i.e. before dma_vaddr is set:
[...]
add->frames = kcalloc(count, sizeof(add->frames[0]),
GFP_KERNEL);
if (!add->frames)
goto err;
[...]
if (gnttab_dma_alloc_pages(&args))
goto err;
add->dma_vaddr = args.vaddr;
[...]
err:
gntdev_free_map(add);
So, it is possible to enter gntdev_free_map with
frames != NULL and dma_vaddr == NULL. Option 1 above cannot be used,
as map->frames is still needed for gnttab_dma_free_pages(&args),
and Option 2 cannot be used, as it sits inside the dma_vaddr branch,
which is skipped when frames != NULL and dma_vaddr == NULL.
Thus, I think that unfortunately we need that #ifdef.
Option 3 below could also be considered, but that does not seem good,
as we would free resources in different places, which looks inconsistent.
Sorry if I'm still missing your point.
>
>> so
>> I see no reason to change this code.
>>>> kfree(map->pages);
>>>> kfree(map->grants);
>>>> kfree(map->map_ops);
>>>> @@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map *map)
>>>> kfree(map);
>>>> }
>>>> -static struct grant_map *gntdev_alloc_map(struct gntdev_priv
>>>> *priv, int count)
>>>> +static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv,
>>>> int count,
>>>> + int dma_flags)
>>>> {
>>>> struct grant_map *add;
>>>> int i;
>>>> @@ -155,6 +200,37 @@ static struct grant_map
>>>> *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>>>> NULL == add->pages)
>>>> goto err;
>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>> + add->dma_flags = dma_flags;
>>>> +
>>>> + /*
>>>> + * Check if this mapping is requested to be backed
>>>> + * by a DMA buffer.
>>>> + */
>>>> + if (dma_flags & (GNTDEV_DMA_FLAG_WC | GNTDEV_DMA_FLAG_COHERENT)) {
>>>> + struct gnttab_dma_alloc_args args;
>>>> +
>>>> + add->frames = kcalloc(count, sizeof(add->frames[0]),
>>>> + GFP_KERNEL);
>>>> + if (!add->frames)
>>>> + goto err;
>>>> +
>>>> + /* Remember the device, so we can free DMA memory. */
>>>> + add->dma_dev = priv->dma_dev;
>>>> +
>>>> + args.dev = priv->dma_dev;
>>>> + args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>>>> + args.nr_pages = count;
>>>> + args.pages = add->pages;
>>>> + args.frames = add->frames;
>>>> +
>>>> + if (gnttab_dma_alloc_pages(&args))
*Option 3: kfree(map->frames);*
>>>> + goto err;
>>>> +
>>>> + add->dma_vaddr = args.vaddr;
>>>> + add->dma_bus_addr = args.dev_bus_addr;
>>>> + } else
>>>> +#endif
>>>> if (gnttab_alloc_pages(count, add->pages))
>>>> goto err;
>>>> @@ -325,6 +401,14 @@ static int map_grant_pages(struct grant_map
>>>> *map)
>>>> map->unmap_ops[i].handle = map->map_ops[i].handle;
>>>> if (use_ptemod)
>>>> map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>> + else if (map->dma_vaddr) {
>>>> + unsigned long mfn;
>>>> +
>>>> + mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>>> Not pfn_to_mfn()?
>> I'd love to, but pfn_to_mfn is only defined for x86, not ARM: [1] and [2]
>> Thus,
>>
>> drivers/xen/gntdev.c:408:10: error: implicit declaration of function
>> ‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
>> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
>>
>> So, I'll keep __pfn_to_mfn
>
> How will this work on non-PV x86?
So, you mean I need:
#ifdef CONFIG_X86
mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
#else
mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
#endif
> -boris
>
>
Thank you,
Oleksandr
On 06/07/2018 12:32 AM, Boris Ostrovsky wrote:
> On 06/06/2018 05:06 AM, Oleksandr Andrushchenko wrote:
>> On 06/04/2018 11:49 PM, Boris Ostrovsky wrote:
>>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>> +struct gntdev_dmabuf_export_args {
>>> + int dummy;
>>> +};
>>>
>>> Please define the full structure (at least what you have in the next
>>> patch) here.
>> Ok, will define what I have in the next patch, but won't
>> initialize anything until the next patch. Will this work for you?
> Sure, I just didn't see the need for the dummy argument that you remove
> later.
Ok
>>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>>> index 9813fc440c70..7d58dfb3e5e8 100644
>>>> --- a/drivers/xen/gntdev.c
>>>> +++ b/drivers/xen/gntdev.c
>>> ...
>>>
>>>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>>> This code belongs in gntdev-dmabuf.c.
>> The reason I have this code here is that it is heavily
>> tied to gntdev's internal functionality, e.g. map/unmap.
>> I do not want to extend gntdev's API, so gntdev-dmabuf can
>> access these. What is more dma-buf doesn't need to know about
>> maps done by gntdev as there is no use of that information
>> in gntdev-dmabuf. So, it seems more naturally to have
>> dma-buf's related map/unmap code where it is: in gntdev.
> Sorry, I don't follow. Why would this require extending the API? It's
> just moving routines to a different file that is linked to the same module.
I do understand your intention here and tried to avoid dma-buf
related code in gntdev.c as much as possible. So, let me explain
my decision in more detail.
We have 2 use-cases: dma-buf import and export.
While importing a dma-buf, all the dma-buf related functionality can
easily be kept inside gntdev-dmabuf.c without any issue, as all we need
from gntdev.c are dev, dma_buf_fd, count and domid.
But in case of dma-buf export we need to:
1. struct grant_map *map = gntdev_alloc_map(priv, count, dmabuf_flags);
2. gntdev_add_map(priv, map);
3. Set map->flags
4. ret = map_grant_pages(map);
5. And only now we are all set to export the new dma-buf from *map->pages*
So, until 5) we use private gntdev.c's API not exported to the outside world:
a. struct grant_map
b. static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv,
int count,
int dma_flags)
c. static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map
*add)
d. static int map_grant_pages(struct grant_map *map)
Thus, none of the above can be accessed from gntdev-dmabuf.c.
This is why I say that gntdev.c's API will need to be extended to
provide a-d above if we want all dma-buf export code to live in
gntdev-dmabuf.c.
But that doesn't seem good to me, and what is more, a-d are really
gntdev.c's functionality, not dma-buf's, which only needs pages and
doesn't really care where they come from.
That was the reason I partitioned export into 2 chunks: gntdev +
gntdev-dmabuf.
You might also ask why the importing side does Xen-related things
(granting references etc.) in gntdev-dmabuf rather than in gntdev,
which would be consistent with the dma-buf exporter.
This is because the importer uses the grant-table API, which does not
seem natural for gntdev.c, so it can live in gntdev-dmabuf.c, which
has a use-case for it, while gntdev remains the same.
> Since this is under CONFIG_XEN_GNTDEV_DMABUF then why shouldn't it be in
> gntdev-dmabuf.c? In my view that's the file where all dma-related
> "stuff" lives.
Agreed, but IMO the grant_map stuff for the dma-buf importer is right in
its place in gntdev.c, and all the rest of the dma-buf specifics live in
gntdev-dmabuf.c, as they should
>
> -boris
>
>
> -boris
>
Thank you,
Oleksandr
On 06/07/2018 01:05 AM, Dongwon Kim wrote:
> On Wed, Jun 06, 2018 at 05:51:38PM -0400, Boris Ostrovsky wrote:
>> On 06/06/2018 08:46 AM, Oleksandr Andrushchenko wrote:
>>> On 06/05/2018 01:36 AM, Boris Ostrovsky wrote:
>>>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <[email protected]>
>>>>>
>>>>> Allow creating grant device context for use by kernel modules which
>>>>> require functionality, provided by gntdev. Export symbols for dma-buf
>>>>> API provided by the module.
>>>> Can you give an example of who'd be using these interfaces?
>>> There is no use-case at the moment I can think of, other than hyper
>>> dma-buf [1], [2].
>>> I'll let the Intel folks (CCed) defend this patch, as it was done
>>> primarily for them and I don't use it in any of my use-cases. So,
>>> from this POV it can be dropped, at least from this series.
>>
>> Yes, let's drop this until someone actually needs it.
>>
>> -boris
> I agree. We are working on re-architecting hyper_dmabuf. We would use the
> zero-copy APIs; however, we are not sure whether we will do it from the
> kernel or from userspace. So please do not expose those for now.
Ok, as we are all on the same page for that then I'll drop this patch
for now
>>
>>>> -boris
>>>>
>>> [1] https://patchwork.freedesktop.org/series/38207/
>>> [2] https://patchwork.freedesktop.org/patch/204447/
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> [email protected]
>>> https://lists.xenproject.org/mailman/listinfo/xen-devel
On 06/07/2018 12:48 AM, Boris Ostrovsky wrote:
> On 06/06/2018 08:10 AM, Oleksandr Andrushchenko wrote:
>> On 06/05/2018 01:07 AM, Boris Ostrovsky wrote:
>>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>
>>> +
>>> +static struct sg_table *
>>> +dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
>>> + enum dma_data_direction dir)
>>> +{
>>> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach =
>>> attach->priv;
>>> + struct gntdev_dmabuf *gntdev_dmabuf = attach->dmabuf->priv;
>>> + struct sg_table *sgt;
>>> +
>>> + pr_debug("Mapping %d pages for dev %p\n", gntdev_dmabuf->nr_pages,
>>> + attach->dev);
>>> +
>>> + if (WARN_ON(dir == DMA_NONE || !gntdev_dmabuf_attach))
>>>
>>> WARN_ON_ONCE. Here and elsewhere.
>> Why? The UAPI may be used by different applications, thus we might
>> lose warnings for some of them. Having WARN_ON will show problems
>> for multiple users, not for the first one.
>> Does it still make sense to use WARN_ON?
>
> Just as with pr_err call somewhere else the concern here is that
> userland (which I think is where this is eventually called from?) may
> intentionally trigger the error, flooding the log.
>
> And even if this is not directly called from userland, there is still a
> possibility of triggering this error multiple times.
Ok, will use WARN_ON_ONCE
>
>>>> +
>>>> + if (use_ptemod) {
>>>> + pr_err("Cannot provide dma-buf: use_ptemode %d\n",
>>>> + use_ptemod);
>>> No pr_err here please. This can potentially become a DoS vector as it
>>> comes directly from ioctl.
>>>
>>> I would, in fact, revisit other uses of pr_err in this file.
>> Sure, all of pr_err can actually be pr_debug...
> I'd check even further and see if any prink is needed. I think I saw a
> couple that were not especially useful.
All those were useful while debugging the code and use-cases,
so I would prefer to have them all still available, but as pr_debug
instead of pr_err
If hyper_dmabuf uses this Xen dma-buf solution, then I believe
those will help as well
>
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + map = dmabuf_exp_alloc_backing_storage(priv, flags, count);
>>> @count comes from userspace. dmabuf_exp_alloc_backing_storage only
>>> checks for it to be >0. Should it be checked for some sane max value?
>> This is not easy, as it is hard to tell what that max value could be.
>> For DMA buffers, if count is too big then the allocation will fail, so
>> there is no need to check for a max value here (dma_alloc_{xxx} will
>> filter out too-big allocations).
> OK, that may be sufficient. BTW, I believe there were other loops with
> @count being the control variable. Please see if a user can pass a bogus
> value.
Will check for op.count in IOCTLs
>> For Xen balloon allocations I cannot tell what the max value could be
>> either. It is a tough question how to limit it.
> I think in balloon there is also a guarantee (of sorts) that something
> prior to a loop will fail.
>
>
> -boris
Thank you,
Oleksandr
(Stefano, question for you at the end)
On 06/07/2018 02:39 AM, Oleksandr Andrushchenko wrote:
> On 06/07/2018 12:19 AM, Boris Ostrovsky wrote:
>> On 06/06/2018 04:14 AM, Oleksandr Andrushchenko wrote:
>>> On 06/04/2018 11:12 PM, Boris Ostrovsky wrote:
>>>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>>> @@ -121,8 +146,27 @@ static void gntdev_free_map(struct grant_map
>>>> *map)
>>>> if (map == NULL)
>>>> return;
>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> *Option 1: kfree(map->frames);*
>>>> + if (map->dma_vaddr) {
>>>> + struct gnttab_dma_alloc_args args;
>>>> +
>>>> + args.dev = map->dma_dev;
>>>> + args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>>>> + args.nr_pages = map->count;
>>>> + args.pages = map->pages;
>>>> + args.frames = map->frames;
>>>> + args.vaddr = map->dma_vaddr;
>>>> + args.dev_bus_addr = map->dma_bus_addr;
>>>> +
>>>> + gnttab_dma_free_pages(&args);
> *Option 2: kfree(map->frames);*
>>>> + } else
>>>> +#endif
>>>> if (map->pages)
>>>> gnttab_free_pages(map->count, map->pages);
>>>> +
>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>> + kfree(map->frames);
>>>> +#endif
>>>>
>>>> Can this be done under if (map->dma_vaddr) ?
>>>> In other words, is it
>>>> possible for dma_vaddr to be NULL and still have unallocated frames
>>>> pointer?
>>> It is possible to have vaddr == NULL and frames != NULL as we
>>> allocate frames outside of gnttab_dma_alloc_pages which
>>> may fail. Calling kfree on a NULL pointer is safe,
>>
>> I am not questioning the safety of the code, I would like to avoid
>> another ifdef.
> Ah, I now understand, so you are asking if we can have
> that kfree(map->frames); in the place *Option 2* I marked above.
> Unfortunately no: map->frames is allocated before we try to
> allocate DMA memory, e.g. before dma_vaddr is set:
> [...]
> add->frames = kcalloc(count, sizeof(add->frames[0]),
> GFP_KERNEL);
> if (!add->frames)
> goto err;
>
> [...]
> if (gnttab_dma_alloc_pages(&args))
> goto err;
>
> add->dma_vaddr = args.vaddr;
> [...]
> err:
> gntdev_free_map(add);
>
> So, it is possible to enter gntdev_free_map with
> frames != NULL and dma_vaddr == NULL. Option 1 above cannot be used
> as map->frames is needed for gnttab_dma_free_pages(&args);
> and Option 2 cannot be used as frames != NULL and dma_vaddr == NULL.
> Thus, I think that unfortunately we need that #ifdef.
> Option 3 below can also be considered, but that does not seem good,
> as we would free resources in different places, which looks inconsistent.
I was only thinking of option 2. But if it is possible to have frames !=
NULL and dma_vaddr == NULL then perhaps we indeed will have to live with
the extra ifdef.
>
> Sorry if I'm still missing your point.
>>
>>> so
>>> I see no reason to change this code.
>>>>> kfree(map->pages);
>>>>> kfree(map->grants);
>>>>> kfree(map->map_ops);
>>>>> @@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map
>>>>> *map)
>>>>> kfree(map);
>>>>> }
>>>>> -static struct grant_map *gntdev_alloc_map(struct gntdev_priv
>>>>> *priv, int count)
>>>>> +static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv,
>>>>> int count,
>>>>> + int dma_flags)
>>>>> {
>>>>> struct grant_map *add;
>>>>> int i;
>>>>> @@ -155,6 +200,37 @@ static struct grant_map
>>>>> *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>>>>> NULL == add->pages)
>>>>> goto err;
>>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>>> + add->dma_flags = dma_flags;
>>>>> +
>>>>> + /*
>>>>> + * Check if this mapping is requested to be backed
>>>>> + * by a DMA buffer.
>>>>> + */
>>>>> + if (dma_flags & (GNTDEV_DMA_FLAG_WC |
>>>>> GNTDEV_DMA_FLAG_COHERENT)) {
>>>>> + struct gnttab_dma_alloc_args args;
>>>>> +
>>>>> + add->frames = kcalloc(count, sizeof(add->frames[0]),
>>>>> + GFP_KERNEL);
>>>>> + if (!add->frames)
>>>>> + goto err;
>>>>> +
>>>>> + /* Remember the device, so we can free DMA memory. */
>>>>> + add->dma_dev = priv->dma_dev;
>>>>> +
>>>>> + args.dev = priv->dma_dev;
>>>>> + args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>>>>> + args.nr_pages = count;
>>>>> + args.pages = add->pages;
>>>>> + args.frames = add->frames;
>>>>> +
>>>>> + if (gnttab_dma_alloc_pages(&args))
> *Option 3: kfree(map->frames);*
>>>>> + goto err;
>>>>> +
>>>>> + add->dma_vaddr = args.vaddr;
>>>>> + add->dma_bus_addr = args.dev_bus_addr;
>>>>> + } else
>>>>> +#endif
>>>>> if (gnttab_alloc_pages(count, add->pages))
>>>>> goto err;
>>>>> @@ -325,6 +401,14 @@ static int map_grant_pages(struct grant_map
>>>>> *map)
>>>>> map->unmap_ops[i].handle = map->map_ops[i].handle;
>>>>> if (use_ptemod)
>>>>> map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>>> + else if (map->dma_vaddr) {
>>>>> + unsigned long mfn;
>>>>> +
>>>>> + mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>> Not pfn_to_mfn()?
>>> I'd love to, but pfn_to_mfn is only defined for x86, not ARM: [1]
>>> and [2]
>>> Thus,
>>>
>>> drivers/xen/gntdev.c:408:10: error: implicit declaration of function
>>> ‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
>>> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>
>>> So, I'll keep __pfn_to_mfn
>>
>> How will this work on non-PV x86?
> So, you mean I need:
> #ifdef CONFIG_X86
> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
> #else
> mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
> #endif
>
I'd rather fix it in the ARM code. Stefano, why does ARM use the
underscored version?
-boris
On 06/07/2018 03:17 AM, Oleksandr Andrushchenko wrote:
> On 06/07/2018 12:32 AM, Boris Ostrovsky wrote:
>> On 06/06/2018 05:06 AM, Oleksandr Andrushchenko wrote:
>>> On 06/04/2018 11:49 PM, Boris Ostrovsky wrote:
>>>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>>>> index 9813fc440c70..7d58dfb3e5e8 100644
>>>>> --- a/drivers/xen/gntdev.c
>>>>> +++ b/drivers/xen/gntdev.c
>>>> ...
>>>>
>>>>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>>>> This code belongs in gntdev-dmabuf.c.
>>> The reason I have this code here is that it is heavily
>>> tied to gntdev's internal functionality, e.g. map/unmap.
>>> I do not want to extend gntdev's API, so gntdev-dmabuf can
>>> access these. What is more dma-buf doesn't need to know about
>>> maps done by gntdev as there is no use of that information
>>> in gntdev-dmabuf. So, it seems more natural to have
>>> dma-buf's related map/unmap code where it is: in gntdev.
>> Sorry, I don't follow. Why would this require extending the API? It's
>> just moving routines to a different file that is linked to the same
>> module.
> I do understand your intention here and tried to avoid dma-buf
> related code in gntdev.c as much as possible. So, let me explain
> my decision in more detail.
>
> There are 2 use-cases we have: dma-buf import and export.
>
> While importing a dma-buf all the dma-buf related functionality can
> easily be kept inside gntdev-dmabuf.c w/o any issue as all we need
> from gntdev.c is dev, dma_buf_fd, count and domid for that.
>
> But in case of dma-buf export we need to:
> 1. struct grant_map *map = gntdev_alloc_map(priv, count, dmabuf_flags);
> 2. gntdev_add_map(priv, map);
> 3. Set map->flags
> 4. ret = map_grant_pages(map);
> 5. And only now we are all set to export the new dma-buf from
> *map->pages*
>
> So, until 5) we use private gntdev.c's API not exported to the outside world:
> a. struct grant_map
> b. static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv,
> int count,
> int dma_flags)
> c. static void gntdev_add_map(struct gntdev_priv *priv, struct
> grant_map *add)
> d. static int map_grant_pages(struct grant_map *map)
>
> Thus, all the above cannot be accessed from gntdev-dmabuf.c
> This is why I say that gntdev.c's API will need to be extended to
> provide the above
> a-d if we want all dma-buf export code to live in gntdev-dmabuf.c.
I still don't understand why you feel this would be extending the API.
These routines and the struct can be declared in a local header file, and
this header file will not be visible to anyone but gntdev.c and
gntdev-dmabuf.c. You can, for example, put this into gntdev-dmabuf.h
(and then rename it to something else, like gntdev-common.h).
> But that doesn't seem good to me and what is more a-d are really
> gntdev.c's
> functionality, not dma-buf's which only needs pages and doesn't really
> care from
> where those come.
> That was the reason I partitioned export into 2 chunks: gntdev +
> gntdev-dmabuf.
>
> You might also ask why importing side does Xen related things
> (granting references+)
> in gntdev-dmabuf, not gntdev so it is consistent with the dma-buf
> exporter?
> This is because importer uses grant-table's API which seems to be not
> natural for gntdev.c,
> so it can live in gntdev-dmabuf.c which has a use-case for that,
> while gntdev
> remains the same.
Yet another reason why this code should be moved: importing and
exporting functionalities logically belong together. The fact that they
are implemented using different methods is not relevant IMO.
If you have code which is under #ifdef CONFIG_XEN_GNTDEV_DMABUF and you
have a file called gntdev-dmabuf.c, it sort of implies that this code
should live in that file (unless that code is intertwined with other
code, which is not the case here).
-boris
>> Since this is under CONFIG_XEN_GNTDEV_DMABUF then why shouldn't it be in
>> gntdev-dmabuf.c? In my view that's the file where all dma-related
>> "stuff" lives.
> Agree, but IMO grant_map stuff for dma-buf importer is right in its
> place in gntdev.c
> and all the rest of dma-buf specifics live in gntdev-dmabuf.c as they
> should
>>
>> -boris
>>
>>
>> -boris
>>
> Thank you,
> Oleksandr
On 06/07/2018 04:44 AM, Oleksandr Andrushchenko wrote:
> On 06/07/2018 12:48 AM, Boris Ostrovsky wrote:
>> On 06/06/2018 08:10 AM, Oleksandr Andrushchenko wrote:
>>> On 06/05/2018 01:07 AM, Boris Ostrovsky wrote:
>>>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>
>>>> +
>>>> +static struct sg_table *
>>>> +dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
>>>> + enum dma_data_direction dir)
>>>> +{
>>>> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach =
>>>> attach->priv;
>>>> + struct gntdev_dmabuf *gntdev_dmabuf = attach->dmabuf->priv;
>>>> + struct sg_table *sgt;
>>>> +
>>>> + pr_debug("Mapping %d pages for dev %p\n",
>>>> gntdev_dmabuf->nr_pages,
>>>> + attach->dev);
>>>> +
>>>> + if (WARN_ON(dir == DMA_NONE || !gntdev_dmabuf_attach))
>>>>
>>>> WARN_ON_ONCE. Here and elsewhere.
>>> Why? The UAPI may be used by different applications, thus we might
>>> lose warnings for some of them. Having WARN_ON will show problems
>>> for multiple users, not for the first one.
>>> Does this make sense to still use WARN_ON?
>>
>> Just as with pr_err call somewhere else the concern here is that
>> userland (which I think is where this is eventually called from?) may
>> intentionally trigger the error, flooding the log.
>>
>> And even if this is not directly called from userland, there is still a
>> possibility of triggering this error multiple times.
> Ok, will use WARN_ON_ONCE
In fact, is there a reason to use WARN at all? Does this condition
indicate some sort of internal inconsistency/error?
-boris
On 06/08/2018 01:30 AM, Boris Ostrovsky wrote:
> On 06/07/2018 04:44 AM, Oleksandr Andrushchenko wrote:
>> On 06/07/2018 12:48 AM, Boris Ostrovsky wrote:
>>> On 06/06/2018 08:10 AM, Oleksandr Andrushchenko wrote:
>>>> On 06/05/2018 01:07 AM, Boris Ostrovsky wrote:
>>>>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>>>> +
>>>>> +static struct sg_table *
>>>>> +dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment *attach,
>>>>> + enum dma_data_direction dir)
>>>>> +{
>>>>> + struct gntdev_dmabuf_attachment *gntdev_dmabuf_attach =
>>>>> attach->priv;
>>>>> + struct gntdev_dmabuf *gntdev_dmabuf = attach->dmabuf->priv;
>>>>> + struct sg_table *sgt;
>>>>> +
>>>>> + pr_debug("Mapping %d pages for dev %p\n",
>>>>> gntdev_dmabuf->nr_pages,
>>>>> + attach->dev);
>>>>> +
>>>>> + if (WARN_ON(dir == DMA_NONE || !gntdev_dmabuf_attach))
>>>>>
>>>>> WARN_ON_ONCE. Here and elsewhere.
>>>> Why? The UAPI may be used by different applications, thus we might
>>>> lose warnings for some of them. Having WARN_ON will show problems
>>>> for multiple users, not for the first one.
>>>> Does this make sense to still use WARN_ON?
>>> Just as with pr_err call somewhere else the concern here is that
>>> userland (which I think is where this is eventually called from?) may
>>> intentionally trigger the error, flooding the log.
>>>
>>> And even if this is not directly called from userland, there is still a
>>> possibility of triggering this error multiple times.
>> Ok, will use WARN_ON_ONCE
>
> In fact, is there a reason to use WARN at all? Does this condition
> indicate some sort of internal inconsistency/error?
Well, the corresponding errors are handled anyway, so I will remove the WARN
> -boris
>
>
>
On 06/08/2018 12:46 AM, Boris Ostrovsky wrote:
> (Stefano, question for you at the end)
>
> On 06/07/2018 02:39 AM, Oleksandr Andrushchenko wrote:
>> On 06/07/2018 12:19 AM, Boris Ostrovsky wrote:
>>> On 06/06/2018 04:14 AM, Oleksandr Andrushchenko wrote:
>>>> On 06/04/2018 11:12 PM, Boris Ostrovsky wrote:
>>>>> On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
>>>>> @@ -121,8 +146,27 @@ static void gntdev_free_map(struct grant_map
>>>>> *map)
>>>>> if (map == NULL)
>>>>> return;
>>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>> *Option 1: kfree(map->frames);*
>>>>> + if (map->dma_vaddr) {
>>>>> + struct gnttab_dma_alloc_args args;
>>>>> +
>>>>> + args.dev = map->dma_dev;
>>>>> + args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>>>>> + args.nr_pages = map->count;
>>>>> + args.pages = map->pages;
>>>>> + args.frames = map->frames;
>>>>> + args.vaddr = map->dma_vaddr;
>>>>> + args.dev_bus_addr = map->dma_bus_addr;
>>>>> +
>>>>> + gnttab_dma_free_pages(&args);
>> *Option 2: kfree(map->frames);*
>>>>> + } else
>>>>> +#endif
>>>>> if (map->pages)
>>>>> gnttab_free_pages(map->count, map->pages);
>>>>> +
>>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>>> + kfree(map->frames);
>>>>> +#endif
>>>>>
>>>>> Can this be done under if (map->dma_vaddr) ?
>>>>> In other words, is it
>>>>> possible for dma_vaddr to be NULL and still have unallocated frames
>>>>> pointer?
>>>> It is possible to have vaddr == NULL and frames != NULL as we
>>>> allocate frames outside of gnttab_dma_alloc_pages which
>>>> may fail. Calling kfree on NULL pointer is safe,
>>> I am not questioning the safety of the code, I would like to avoid
>>> another ifdef.
>> Ah, I now understand, so you are asking if we can have
>> that kfree(map->frames); in the place *Option 2* I marked above.
>> Unfortunately no: map->frames is allocated before we try to
>> allocate DMA memory, e.g. before dma_vaddr is set:
>> [...]
>> add->frames = kcalloc(count, sizeof(add->frames[0]),
>> GFP_KERNEL);
>> if (!add->frames)
>> goto err;
>>
>> [...]
>> if (gnttab_dma_alloc_pages(&args))
>> goto err;
>>
>> add->dma_vaddr = args.vaddr;
>> [...]
>> err:
>> gntdev_free_map(add);
>>
>> So, it is possible to enter gntdev_free_map with
>> frames != NULL and dma_vaddr == NULL. Option 1 above cannot be used
>> as map->frames is needed for gnttab_dma_free_pages(&args);
>> and Option 2 cannot be used as frames != NULL and dma_vaddr == NULL.
>> Thus, I think that unfortunately we need that #ifdef.
>> Option 3 below can also be considered, but that seems to be not good
>> as we free resources in different places which looks inconsistent.
>
> I was only thinking of option 2. But if it is possible to have frames !=
> NULL and dma_vaddr == NULL then perhaps we indeed will have to live with
> the extra ifdef.
ok
>
>> Sorry if I'm still missing your point.
>>>> so
>>>> I see no reason to change this code.
>>>>>> kfree(map->pages);
>>>>>> kfree(map->grants);
>>>>>> kfree(map->map_ops);
>>>>>> @@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map
>>>>>> *map)
>>>>>> kfree(map);
>>>>>> }
>>>>>> -static struct grant_map *gntdev_alloc_map(struct gntdev_priv
>>>>>> *priv, int count)
>>>>>> +static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv,
>>>>>> int count,
>>>>>> + int dma_flags)
>>>>>> {
>>>>>> struct grant_map *add;
>>>>>> int i;
>>>>>> @@ -155,6 +200,37 @@ static struct grant_map
>>>>>> *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>>>>>> NULL == add->pages)
>>>>>> goto err;
>>>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>>>> + add->dma_flags = dma_flags;
>>>>>> +
>>>>>> + /*
>>>>>> + * Check if this mapping is requested to be backed
>>>>>> + * by a DMA buffer.
>>>>>> + */
>>>>>> + if (dma_flags & (GNTDEV_DMA_FLAG_WC |
>>>>>> GNTDEV_DMA_FLAG_COHERENT)) {
>>>>>> + struct gnttab_dma_alloc_args args;
>>>>>> +
>>>>>> + add->frames = kcalloc(count, sizeof(add->frames[0]),
>>>>>> + GFP_KERNEL);
>>>>>> + if (!add->frames)
>>>>>> + goto err;
>>>>>> +
>>>>>> + /* Remember the device, so we can free DMA memory. */
>>>>>> + add->dma_dev = priv->dma_dev;
>>>>>> +
>>>>>> + args.dev = priv->dma_dev;
>>>>>> + args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
>>>>>> + args.nr_pages = count;
>>>>>> + args.pages = add->pages;
>>>>>> + args.frames = add->frames;
>>>>>> +
>>>>>> + if (gnttab_dma_alloc_pages(&args))
>> *Option 3: kfree(map->frames);*
>>>>>> + goto err;
>>>>>> +
>>>>>> + add->dma_vaddr = args.vaddr;
>>>>>> + add->dma_bus_addr = args.dev_bus_addr;
>>>>>> + } else
>>>>>> +#endif
>>>>>> if (gnttab_alloc_pages(count, add->pages))
>>>>>> goto err;
>>>>>> @@ -325,6 +401,14 @@ static int map_grant_pages(struct grant_map
>>>>>> *map)
>>>>>> map->unmap_ops[i].handle = map->map_ops[i].handle;
>>>>>> if (use_ptemod)
>>>>>> map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>>>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>>>> + else if (map->dma_vaddr) {
>>>>>> + unsigned long mfn;
>>>>>> +
>>>>>> + mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>> Not pfn_to_mfn()?
>>>> I'd love to, but pfn_to_mfn is only defined for x86, not ARM: [1]
>>>> and [2]
>>>> Thus,
>>>>
>>>> drivers/xen/gntdev.c:408:10: error: implicit declaration of function
>>>> ‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
>>>> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>
>>>> So, I'll keep __pfn_to_mfn
>>> How will this work on non-PV x86?
>> So, you mean I need:
>> #ifdef CONFIG_X86
>> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
>> #else
>> mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>> #endif
>>
> I'd rather fix it in the ARM code. Stefano, why does ARM use the
> underscored version?
Do you want me to add one more patch to wrap __pfn_to_mfn
with a static inline for ARM? e.g.
static inline unsigned long pfn_to_mfn(unsigned long pfn)
{
	return __pfn_to_mfn(pfn);
}
>
> -boris
>
Thank you,
Oleksandr
On 06/08/2018 01:26 AM, Boris Ostrovsky wrote:
> On 06/07/2018 03:17 AM, Oleksandr Andrushchenko wrote:
>> On 06/07/2018 12:32 AM, Boris Ostrovsky wrote:
>>> On 06/06/2018 05:06 AM, Oleksandr Andrushchenko wrote:
>>>> On 06/04/2018 11:49 PM, Boris Ostrovsky wrote:
>>>>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>>>>> index 9813fc440c70..7d58dfb3e5e8 100644
>>>>>> --- a/drivers/xen/gntdev.c
>>>>>> +++ b/drivers/xen/gntdev.c
>>>>> ...
>>>>>
>>>>>> +#ifdef CONFIG_XEN_GNTDEV_DMABUF
>>>>> This code belongs in gntdev-dmabuf.c.
>>>> The reason I have this code here is that it is heavily
>>>> tied to gntdev's internal functionality, e.g. map/unmap.
>>>> I do not want to extend gntdev's API, so gntdev-dmabuf can
>>>> access these. What is more dma-buf doesn't need to know about
>>>> maps done by gntdev as there is no use of that information
>>>> in gntdev-dmabuf. So, it seems more natural to have
>>>> dma-buf's related map/unmap code where it is: in gntdev.
>>> Sorry, I don't follow. Why would this require extending the API? It's
>>> just moving routines to a different file that is linked to the same
>>> module.
>> I do understand your intention here and tried to avoid dma-buf
>> related code in gntdev.c as much as possible. So, let me explain
>> my decision in more detail.
>>
>> There are 2 use-cases we have: dma-buf import and export.
>>
>> While importing a dma-buf all the dma-buf related functionality can
>> easily be kept inside gntdev-dmabuf.c w/o any issue as all we need
>> from gntdev.c is dev, dma_buf_fd, count and domid for that.
>>
>> But in case of dma-buf export we need to:
>> 1. struct grant_map *map = gntdev_alloc_map(priv, count, dmabuf_flags);
>> 2. gntdev_add_map(priv, map);
>> 3. Set map->flags
>> 4. ret = map_grant_pages(map);
>> 5. And only now we are all set to export the new dma-buf from
>> *map->pages*
>>
>> So, until 5) we use private gntdev.c's API not exported to the outside world:
>> a. struct grant_map
>> b. static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv,
>> int count,
>> int dma_flags)
>> c. static void gntdev_add_map(struct gntdev_priv *priv, struct
>> grant_map *add)
>> d. static int map_grant_pages(struct grant_map *map)
>>
>> Thus, all the above cannot be accessed from gntdev-dmabuf.c
>> This is why I say that gntdev.c's API will need to be extended to
>> provide the above
>> a-d if we want all dma-buf export code to live in gntdev-dmabuf.c.
>
>
> I still don't understand why you feel this would be extending the API.
> These routines and the struct can be declared in a local header file, and
> this header file will not be visible to anyone but gntdev.c and
> gntdev-dmabuf.c.
Ok, this is what I meant: I will need to move private structures
and some function prototypes from gntdev.c into a header file,
thus extending its API: before the header, nothing was exposed.
Sorry for not being clear here.
> You can, for example, put this into gntdev-dmabuf.h
> (and then rename it to something else, like gntdev-common.h).
Sure, I will move all I need into that shared header
>
>
>> But that doesn't seem good to me and what is more a-d are really
>> gntdev.c's
>> functionality, not dma-buf's which only needs pages and doesn't really
>> care from
>> where those come.
>> That was the reason I partitioned export into 2 chunks: gntdev +
>> gntdev-dmabuf.
>>
>> You might also ask why importing side does Xen related things
>> (granting references+)
>> in gntdev-dmabuf, not gntdev so it is consistent with the dma-buf
>> exporter?
>> This is because importer uses grant-table's API which seems to be not
>> natural for gntdev.c,
>> so it can live in gntdev-dmabuf.c which has a use-case for that,
>> while gntdev
>> remains the same.
>
> Yet another reason why this code should be moved: importing and
> exporting functionalities logically belong together. The fact that they
> are implemented using different methods is not relevant IMO.
>
> If you have code which is under #ifdef CONFIG_XEN_GNTDEV_DMABUF and you
> have a file called gntdev-dmabuf.c, it sort of implies that this code
> should live in that file (unless that code is intertwined with other
> code, which is not the case here).
Ok, will move as discussed above
>
> -boris
Thank you,
Oleksandr
>
>
>>> Since this is under CONFIG_XEN_GNTDEV_DMABUF then why shouldn't it be in
>>> gntdev-dmabuf.c? In my view that's the file where all dma-related
>>> "stuff" lives.
>> Agree, but IMO grant_map stuff for dma-buf importer is right in its
>> place in gntdev.c
>> and all the rest of dma-buf specifics live in gntdev-dmabuf.c as they
>> should
>>> -boris
>>>
>>>
>>> -boris
>>>
>> Thank you,
>> Oleksandr
On Fri, 8 Jun 2018, Oleksandr Andrushchenko wrote:
> On 06/08/2018 12:46 AM, Boris Ostrovsky wrote:
> > (Stefano, question for you at the end)
> >
> > On 06/07/2018 02:39 AM, Oleksandr Andrushchenko wrote:
> > > On 06/07/2018 12:19 AM, Boris Ostrovsky wrote:
> > > > On 06/06/2018 04:14 AM, Oleksandr Andrushchenko wrote:
> > > > > On 06/04/2018 11:12 PM, Boris Ostrovsky wrote:
> > > > > > On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> > > > > > @@ -121,8 +146,27 @@ static void gntdev_free_map(struct grant_map
> > > > > > *map)
> > > > > > if (map == NULL)
> > > > > > return;
> > > > > > +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> > > *Option 1: kfree(map->frames);*
> > > > > > + if (map->dma_vaddr) {
> > > > > > + struct gnttab_dma_alloc_args args;
> > > > > > +
> > > > > > + args.dev = map->dma_dev;
> > > > > > + args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
> > > > > > + args.nr_pages = map->count;
> > > > > > + args.pages = map->pages;
> > > > > > + args.frames = map->frames;
> > > > > > + args.vaddr = map->dma_vaddr;
> > > > > > + args.dev_bus_addr = map->dma_bus_addr;
> > > > > > +
> > > > > > + gnttab_dma_free_pages(&args);
> > > *Option 2: kfree(map->frames);*
> > > > > > + } else
> > > > > > +#endif
> > > > > > if (map->pages)
> > > > > > gnttab_free_pages(map->count, map->pages);
> > > > > > +
> > > > > > +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> > > > > > + kfree(map->frames);
> > > > > > +#endif
> > > > > >
> > > > > > Can this be done under if (map->dma_vaddr) ?
> > > > > > In other words, is it
> > > > > > possible for dma_vaddr to be NULL and still have unallocated frames
> > > > > > pointer?
> > > > > It is possible to have vaddr == NULL and frames != NULL as we
> > > > > allocate frames outside of gnttab_dma_alloc_pages which
> > > > > may fail. Calling kfree on NULL pointer is safe,
> > > > I am not questioning the safety of the code, I would like to avoid
> > > > another ifdef.
> > > Ah, I now understand, so you are asking if we can have
> > > that kfree(map->frames); in the place *Option 2* I marked above.
> > > Unfortunately no: map->frames is allocated before we try to
> > > allocate DMA memory, e.g. before dma_vaddr is set:
> > > [...]
> > > add->frames = kcalloc(count, sizeof(add->frames[0]),
> > > GFP_KERNEL);
> > > if (!add->frames)
> > > goto err;
> > >
> > > [...]
> > > if (gnttab_dma_alloc_pages(&args))
> > > goto err;
> > >
> > > add->dma_vaddr = args.vaddr;
> > > [...]
> > > err:
> > > gntdev_free_map(add);
> > >
> > > So, it is possible to enter gntdev_free_map with
> > > frames != NULL and dma_vaddr == NULL. Option 1 above cannot be used
> > > as map->frames is needed for gnttab_dma_free_pages(&args);
> > > and Option 2 cannot be used as frames != NULL and dma_vaddr == NULL.
> > > Thus, I think that unfortunately we need that #ifdef.
> > > Option 3 below can also be considered, but that does not seem good,
> > > as we would free resources in different places, which looks inconsistent.
> >
> > I was only thinking of option 2. But if it is possible to have frames !=
> > NULL and dma_vaddr == NULL then perhaps we indeed will have to live with
> > the extra ifdef.
> ok
> >
> > > Sorry if I'm still missing your point.
> > > > > so
> > > > > I see no reason to change this code.
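[Editor's note] The allocation/free ordering discussed above can be sketched
as a small userspace model. All names prefixed model_/stub_ are hypothetical
stand-ins, not gntdev code: frames is allocated before the DMA allocation
runs, so a failure leaves frames != NULL while dma_vaddr == NULL, and the
free path must free frames unconditionally (which is why Options 1 and 2
do not work).

```c
/* Userspace model (hypothetical stubs, not kernel code) of the
 * gntdev_alloc_map()/gntdev_free_map() ordering discussed above. */
#include <assert.h>
#include <stdlib.h>

struct model_grant_map {
    unsigned long *frames;
    void *dma_vaddr;
};

/* Stub standing in for gnttab_dma_alloc_pages(); forced to fail here. */
static int stub_dma_alloc_pages(void **vaddr)
{
    (void)vaddr;
    return -1; /* simulate allocation failure */
}

static void model_free_map(struct model_grant_map *map)
{
    if (!map)
        return;
    if (map->dma_vaddr) {
        /* would call gnttab_dma_free_pages(), which needs map->frames */
    }
    /* frames must be freed unconditionally: it may be non-NULL even
     * when dma_vaddr is NULL, so it cannot live inside the if above. */
    free(map->frames);
    free(map);
}

static struct model_grant_map *model_alloc_map(int count, int *saw_partial)
{
    struct model_grant_map *map = calloc(1, sizeof(*map));

    if (!map)
        return NULL;
    map->frames = calloc(count, sizeof(*map->frames));
    if (!map->frames)
        goto err;
    if (stub_dma_alloc_pages(&map->dma_vaddr))
        goto err; /* fails: frames is set, dma_vaddr is still NULL */
    return map;
err:
    *saw_partial = (map->frames != NULL && map->dma_vaddr == NULL);
    model_free_map(map);
    return NULL;
}
```

Running the model confirms the error path reaches the free function in
exactly the partially-initialized state the thread describes.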
> > > > > > > kfree(map->pages);
> > > > > > > kfree(map->grants);
> > > > > > > kfree(map->map_ops);
> > > > > > > @@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map
> > > > > > > *map)
> > > > > > > kfree(map);
> > > > > > > }
> > > > > > > -static struct grant_map *gntdev_alloc_map(struct gntdev_priv
> > > > > > > *priv, int count)
> > > > > > > +static struct grant_map *gntdev_alloc_map(struct gntdev_priv
> > > > > > > *priv,
> > > > > > > int count,
> > > > > > > + int dma_flags)
> > > > > > > {
> > > > > > > struct grant_map *add;
> > > > > > > int i;
> > > > > > > @@ -155,6 +200,37 @@ static struct grant_map
> > > > > > > *gntdev_alloc_map(struct gntdev_priv *priv, int count)
> > > > > > > NULL == add->pages)
> > > > > > > goto err;
> > > > > > > +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> > > > > > > + add->dma_flags = dma_flags;
> > > > > > > +
> > > > > > > + /*
> > > > > > > + * Check if this mapping is requested to be backed
> > > > > > > + * by a DMA buffer.
> > > > > > > + */
> > > > > > > + if (dma_flags & (GNTDEV_DMA_FLAG_WC |
> > > > > > > GNTDEV_DMA_FLAG_COHERENT)) {
> > > > > > > + struct gnttab_dma_alloc_args args;
> > > > > > > +
> > > > > > > + add->frames = kcalloc(count, sizeof(add->frames[0]),
> > > > > > > + GFP_KERNEL);
> > > > > > > + if (!add->frames)
> > > > > > > + goto err;
> > > > > > > +
> > > > > > > + /* Remember the device, so we can free DMA memory. */
> > > > > > > + add->dma_dev = priv->dma_dev;
> > > > > > > +
> > > > > > > + args.dev = priv->dma_dev;
> > > > > > > + args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
> > > > > > > + args.nr_pages = count;
> > > > > > > + args.pages = add->pages;
> > > > > > > + args.frames = add->frames;
> > > > > > > +
> > > > > > > + if (gnttab_dma_alloc_pages(&args))
> > > *Option 3: kfree(map->frames);*
> > > > > > > + goto err;
> > > > > > > +
> > > > > > > + add->dma_vaddr = args.vaddr;
> > > > > > > + add->dma_bus_addr = args.dev_bus_addr;
> > > > > > > + } else
> > > > > > > +#endif
> > > > > > > if (gnttab_alloc_pages(count, add->pages))
> > > > > > > goto err;
> > > > > > > @@ -325,6 +401,14 @@ static int map_grant_pages(struct
> > > > > > > grant_map
> > > > > > > *map)
> > > > > > > map->unmap_ops[i].handle = map->map_ops[i].handle;
> > > > > > > if (use_ptemod)
> > > > > > > map->kunmap_ops[i].handle =
> > > > > > > map->kmap_ops[i].handle;
> > > > > > > +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> > > > > > > + else if (map->dma_vaddr) {
> > > > > > > + unsigned long mfn;
> > > > > > > +
> > > > > > > + mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
> > > > > > Not pfn_to_mfn()?
> > > > > I'd love to, but pfn_to_mfn is only defined for x86, not ARM: [1]
> > > > > and [2]
> > > > > Thus,
> > > > >
> > > > > drivers/xen/gntdev.c:408:10: error: implicit declaration of function
> > > > > ‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
> > > > > mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
> > > > >
> > > > > So, I'll keep __pfn_to_mfn
> > > > How will this work on non-PV x86?
> > > So, you mean I need:
> > > #ifdef CONFIG_X86
> > > mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
> > > #else
> > > mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
> > > #endif
> > >
> > I'd rather fix it in ARM code. Stefano, why does ARM use the
> > underscored version?
> Do you want me to add one more patch for ARM to wrap __pfn_to_mfn
> with static inline for ARM? e.g.
> static inline ...pfn_to_mfn(...)
> {
> __pfn_to_mfn();
> }
A Xen on ARM guest doesn't actually know the mfns behind its own
pseudo-physical pages. This is why we stopped using pfn_to_mfn and
started using pfn_to_bfn instead, which will generally return "pfn",
unless the page is a foreign grant. See include/xen/arm/page.h.
pfn_to_bfn was also introduced on x86. For example, see the usage of
pfn_to_bfn in drivers/xen/swiotlb-xen.c. Otherwise, if you don't care
about other mapped grants, you can just use pfn_to_gfn, which always
returns the pfn.
Also, for your information, we support different page granularities in
Linux as a Xen guest, see the comment at include/xen/arm/page.h:
/*
* The pseudo-physical frame (pfn) used in all the helpers is always based
* on Xen page granularity (i.e 4KB).
*
* A Linux page may be split across multiple non-contiguous Xen page so we
* have to keep track with frame based on 4KB page granularity.
*
* PV drivers should never make a direct usage of those helpers (particularly
* pfn_to_gfn and gfn_to_pfn).
*/
A Linux page could be 64K, but a Xen page is always 4K. A granted page
is also 4K. We have helpers to take into account the offsets to map
multiple Xen grants in a single Linux page, see for example
drivers/xen/grant-table.c:gnttab_foreach_grant. Most PV drivers have
been converted to be able to work with 64K pages correctly, but if I
remember correctly gntdev.c is the only remaining driver that doesn't
support 64K pages yet, so you don't have to deal with it if you don't
want to.
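[Editor's note] The pfn_to_bfn() semantics Stefano describes can be modeled
in a few lines of userspace C. The model_/foreign_bfn names are hypothetical,
not the kernel implementation: the bus frame defaults to the pseudo-physical
frame, unless the frame backs a mapped foreign grant.

```c
/* Userspace model (hypothetical, not include/xen/arm/page.h) of the
 * pfn_to_bfn() lookup semantics described above. */
#include <assert.h>

#define MODEL_MAX_FRAMES 16

/* 0 means "no foreign mapping recorded for this pfn" in this model. */
static unsigned long foreign_bfn[MODEL_MAX_FRAMES];

static void model_record_grant(unsigned long pfn, unsigned long bfn)
{
    foreign_bfn[pfn] = bfn;
}

static unsigned long model_pfn_to_bfn(unsigned long pfn)
{
    if (pfn < MODEL_MAX_FRAMES && foreign_bfn[pfn])
        return foreign_bfn[pfn]; /* page backs a foreign grant */
    return pfn;                  /* common case: identity */
}
```

This is why pfn_to_bfn is safe on both x86 and ARM: for ordinary pages it
degenerates to the identity mapping that pfn_to_gfn would give.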
On 06/08/2018 01:59 PM, Stefano Stabellini wrote:
I think then this code needs to use pfn_to_bfn().
I believe somewhere in this series there is a test for PAGE_SIZE vs.
XEN_PAGE_SIZE. Right, Oleksandr?
Thanks for the explanation.
-boris
On 06/08/2018 10:21 PM, Boris Ostrovsky wrote:
> On 06/08/2018 01:59 PM, Stefano Stabellini wrote:
> I think then this code needs to use pfn_to_bfn().
Ok
>
> I believe somewhere in this series there is a test for PAGE_SIZE vs.
> XEN_PAGE_SIZE. Right, Oleksandr?
Not in gntdev. You might have seen this in xen-drmfront/xen-sndfront,
but I didn't touch gntdev for that. Do you want me to add yet another patch
in the series to check for that?
Thank you,
Oleksandr
On Mon, 11 Jun 2018, Oleksandr Andrushchenko wrote:
> Not in gntdev. You might have seen this in xen-drmfront/xen-sndfront,
> but I didn't touch gntdev for that. Do you want me to add yet another patch
> in the series to check for that?
gntdev.c is already not capable of handling PAGE_SIZE != XEN_PAGE_SIZE,
so you are not going to break anything that is not already broken :-) If
your new gntdev.c code relies on PAGE_SIZE == XEN_PAGE_SIZE, it might be
good to add an in-code comment about it, just to make it easier to fix
the whole of gntdev.c in the future.
On 06/11/2018 07:51 PM, Stefano Stabellini wrote:
> gntdev.c is already not capable of handling PAGE_SIZE != XEN_PAGE_SIZE,
> so you are not going to break anything that is not already broken :-) If
> your new gntdev.c code relies on PAGE_SIZE == XEN_PAGE_SIZE, it might be
> good to add an in-code comment about it, just to make it easier to fix
> the whole of gntdev.c in the future.
>
Yes, I just mean I can add something like [1] as a separate patch to the
series, so we are on the safe side here.
[1]
https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/xen/xen_drm_front.c#n813
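[Editor's note] A guard like the one referenced in [1] can be sketched in
self-contained C. MODEL_PAGE_SIZE and model_check_page_size are assumed
names for illustration; the in-kernel version would be a runtime check
returning an error (the xen_drm_front code at [1] returns -EOPNOTSUPP),
or a BUILD_BUG_ON at compile time.

```c
/* Minimal sketch of a PAGE_SIZE == XEN_PAGE_SIZE guard, modeled in
 * userspace with hypothetical constants. */
#include <assert.h>

#define XEN_PAGE_SHIFT  12
#define XEN_PAGE_SIZE   (1UL << XEN_PAGE_SHIFT) /* Xen grants are 4K */
#define MODEL_PAGE_SIZE 4096UL                  /* assumed CPU page size */

/* Fail the build if the assumption baked into the DMA path breaks. */
_Static_assert(MODEL_PAGE_SIZE == XEN_PAGE_SIZE,
               "gntdev DMA support assumes PAGE_SIZE == XEN_PAGE_SIZE");

/* Runtime analogue: 0 on a 4K-page kernel, nonzero (-EOPNOTSUPP-like)
 * on a 64K-page kernel. */
static int model_check_page_size(unsigned long page_size)
{
    return page_size == XEN_PAGE_SIZE ? 0 : -1;
}
```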
Hi,
On 06/11/2018 06:49 PM, Oleksandr Andrushchenko wrote:
> On 06/11/2018 08:46 PM, Julien Grall wrote:
>> Hi,
>>
>> On 06/11/2018 06:16 PM, Oleksandr Andrushchenko wrote:
>>> Yes, I just mean I can add something like [1] as a separate patch to
>>> the series,
>>> so we are on the safe side here
>>
>> See my comment on Stefano's e-mail. I believe gntdev is able to handle
>> PAGE_SIZE != XEN_PAGE_SIZE. So I would rather keep the behavior we
>> have today for such case.
>>
> Sure, with a note that we waste most of a 64KiB page ;)
That's the second definition of "64KB page" ;). In the case of grants,
it is actually quite hard to merge them into a single page. So quite a few
places still allocate 64KB but only map the first 4KB.
You would need to rework most of the grant framework (not only gntdev) to
avoid that waste. Patches are welcome.
Cheers,
--
Julien Grall
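[Editor's note] The waste Julien describes is easy to quantify with a
back-of-envelope model (model_wasted_bytes is an illustrative name, not a
kernel helper): mapping a single 4K grant into one 64K Linux page leaves
60K of the page unused, i.e. 93.75% of it.

```c
/* Userspace model of the per-page waste when only the first grants_per_page
 * 4K grants of a larger Linux page are actually mapped. */
#include <assert.h>

static unsigned long model_wasted_bytes(unsigned long linux_page_size,
                                        unsigned long xen_page_size,
                                        unsigned long grants_per_page)
{
    return linux_page_size - grants_per_page * xen_page_size;
}
```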
Hi Stefano,
On 06/11/2018 05:51 PM, Stefano Stabellini wrote:
>
> gntdev.c is already not capable of handling PAGE_SIZE != XEN_PAGE_SIZE,
> so you are not going to break anything that is not already broken :-) If
> your new gntdev.c code relies on PAGE_SIZE == XEN_PAGE_SIZE, it might be
> good to add an in-code comment about it, just to make it easier to fix
> the whole of gntdev.c in the future.
Well, I think gntdev is capable of handling PAGE_SIZE != XEN_PAGE_SIZE.
Let's imagine Linux is built with 64K pages: gntdev will map each grant
at a 64K alignment. Although I am not sure the patches for QEMU ever
made it upstream (I think they are in CentOS).
Cheers,
--
Julien Grall
Hi,
On 06/11/2018 06:16 PM, Oleksandr Andrushchenko wrote:
> On 06/11/2018 07:51 PM, Stefano Stabellini wrote:
>> On Mon, 11 Jun 2018, Oleksandr Andrushchenko wrote:
>>> On 06/08/2018 10:21 PM, Boris Ostrovsky wrote:
>>>> On 06/08/2018 01:59 PM, Stefano Stabellini wrote:
>>>>>>>>>>>> @@ -325,6 +401,14 @@ static int map_grant_pages(struct
>>>>>>>>>>>> grant_map
>>>>>>>>>>>> *map)
>>>>>>>>>>>> map->unmap_ops[i].handle =
>>>>>>>>>>>> map->map_ops[i].handle;
>>>>>>>>>>>> if (use_ptemod)
>>>>>>>>>>>> map->kunmap_ops[i].handle =
>>>>>>>>>>>> map->kmap_ops[i].handle;
>>>>>>>>>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>>>>>>>>>> + else if (map->dma_vaddr) {
>>>>>>>>>>>> + unsigned long mfn;
>>>>>>>>>>>> +
>>>>>>>>>>>> + mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>>>>>>>> Not pfn_to_mfn()?
>>>>>>>>>> I'd love to, but pfn_to_mfn is only defined for x86, not ARM:
>>>>>>>>>> [1]
>>>>>>>>>> and [2]
>>>>>>>>>> Thus,
>>>>>>>>>>
>>>>>>>>>> drivers/xen/gntdev.c:408:10: error: implicit declaration of
>>>>>>>>>> function
>>>>>>>>>> ‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
>>>>>>>>>> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>>>>>>>
>>>>>>>>>> So, I'll keep __pfn_to_mfn
>>>>>>>>> How will this work on non-PV x86?
>>>>>>>> So, you mean I need:
>>>>>>>> #ifdef CONFIG_X86
>>>>>>>> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>>>>> #else
>>>>>>>> mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>>>>> #endif
>>>>>>>>
>>>>>>> I'd rather fix it in ARM code. Stefano, why does ARM uses the
>>>>>>> underscored version?
>>>>>> Do you want me to add one more patch for ARM to wrap __pfn_to_mfn
>>>>>> with static inline for ARM? e.g.
>>>>>> static inline ...pfn_to_mfn(...)
>>>>>> {
>>>>>> __pfn_to_mfn();
>>>>>> }
>>>>> A Xen on ARM guest doesn't actually know the mfns behind its own
>>>>> pseudo-physical pages. This is why we stopped using pfn_to_mfn and
>>>>> started using pfn_to_bfn instead, which will generally return "pfn",
>>>>> unless the page is a foreign grant. See include/xen/arm/page.h.
>>>>> pfn_to_bfn was also introduced on x86. For example, see the usage of
>>>>> pfn_to_bfn in drivers/xen/swiotlb-xen.c. Otherwise, if you don't care
>>>>> about other mapped grants, you can just use pfn_to_gfn, that always
>>>>> returns pfn.
>>>> I think then this code needs to use pfn_to_bfn().
>>> Ok
>>>>
>>>>> Also, for your information, we support different page granularities in
>>>>> Linux as a Xen guest, see the comment at include/xen/arm/page.h:
>>>>>
>>>>> /*
>>>>>  * The pseudo-physical frame (pfn) used in all the helpers is always
>>>>>  * based on Xen page granularity (i.e 4KB).
>>>>>  *
>>>>>  * A Linux page may be split across multiple non-contiguous Xen page
>>>>>  * so we have to keep track with frame based on 4KB page granularity.
>>>>>  *
>>>>>  * PV drivers should never make a direct usage of those helpers
>>>>>  * (particularly pfn_to_gfn and gfn_to_pfn).
>>>>>  */
>>>>>
>>>>> A Linux page could be 64K, but a Xen page is always 4K. A granted page
>>>>> is also 4K. We have helpers to take into account the offsets to map
>>>>> multiple Xen grants in a single Linux page, see for example
>>>>> drivers/xen/grant-table.c:gnttab_foreach_grant. Most PV drivers have
>>>>> been converted to be able to work with 64K pages correctly, but if I
>>>>> remember correctly gntdev.c is the only remaining driver that doesn't
>>>>> support 64K pages yet, so you don't have to deal with it if you don't
>>>>> want to.
>>>> I believe somewhere in this series there is a test for PAGE_SIZE vs.
>>>> XEN_PAGE_SIZE. Right, Oleksandr?
>>> Not in gntdev. You might have seen this in xen-drmfront/xen-sndfront,
>>> but I didn't touch gntdev for that. Do you want me to add yet another
>>> patch
>>> in the series to check for that?
>> gntdev.c is already not capable of handling PAGE_SIZE != XEN_PAGE_SIZE,
>> so you are not going to break anything that is not already broken :-) If
>> your new gntdev.c code relies on PAGE_SIZE == XEN_PAGE_SIZE, it might be
>> good to add an in-code comment about it, just to make it easier to fix
>> the whole of gntdev.c in the future.
>>
> Yes, I just mean I can add something like [1] as a separate patch to the
> series,
> so we are on the safe side here
See my comment on Stefano's e-mail. I believe gntdev is able to handle
PAGE_SIZE != XEN_PAGE_SIZE. So I would rather keep the behavior we have
today for such a case.
Cheers,
--
Julien Grall
On 06/11/2018 08:46 PM, Julien Grall wrote:
> Hi,
>
> On 06/11/2018 06:16 PM, Oleksandr Andrushchenko wrote:
>> On 06/11/2018 07:51 PM, Stefano Stabellini wrote:
>>> On Mon, 11 Jun 2018, Oleksandr Andrushchenko wrote:
>>>> On 06/08/2018 10:21 PM, Boris Ostrovsky wrote:
>>>>> On 06/08/2018 01:59 PM, Stefano Stabellini wrote:
>>>>>>>>>>>>> @@ -325,6 +401,14 @@ static int map_grant_pages(struct
>>>>>>>>>>>>> grant_map
>>>>>>>>>>>>> *map)
>>>>>>>>>>>>> map->unmap_ops[i].handle =
>>>>>>>>>>>>> map->map_ops[i].handle;
>>>>>>>>>>>>> if (use_ptemod)
>>>>>>>>>>>>> map->kunmap_ops[i].handle =
>>>>>>>>>>>>> map->kmap_ops[i].handle;
>>>>>>>>>>>>> +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>>>>>>>>>>>>> + else if (map->dma_vaddr) {
>>>>>>>>>>>>> + unsigned long mfn;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> + mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>>>>>>>>> Not pfn_to_mfn()?
>>>>>>>>>>> I'd love to, but pfn_to_mfn is only defined for x86, not ARM:
>>>>>>>>>>> [1]
>>>>>>>>>>> and [2]
>>>>>>>>>>> Thus,
>>>>>>>>>>>
>>>>>>>>>>> drivers/xen/gntdev.c:408:10: error: implicit declaration of
>>>>>>>>>>> function
>>>>>>>>>>> ‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
>>>>>>>>>>> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>>>>>>>>
>>>>>>>>>>> So, I'll keep __pfn_to_mfn
>>>>>>>>>> How will this work on non-PV x86?
>>>>>>>>> So, you mean I need:
>>>>>>>>> #ifdef CONFIG_X86
>>>>>>>>> mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>>>>>> #else
>>>>>>>>> mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
>>>>>>>>> #endif
>>>>>>>>>
>>>>>>>> I'd rather fix it in ARM code. Stefano, why does ARM uses the
>>>>>>>> underscored version?
>>>>>>> Do you want me to add one more patch for ARM to wrap __pfn_to_mfn
>>>>>>> with static inline for ARM? e.g.
>>>>>>> static inline ...pfn_to_mfn(...)
>>>>>>> {
>>>>>>> __pfn_to_mfn();
>>>>>>> }
>>>>>> A Xen on ARM guest doesn't actually know the mfns behind its own
>>>>>> pseudo-physical pages. This is why we stopped using pfn_to_mfn and
>>>>>> started using pfn_to_bfn instead, which will generally return "pfn",
>>>>>> unless the page is a foreign grant. See include/xen/arm/page.h.
>>>>>> pfn_to_bfn was also introduced on x86. For example, see the usage of
>>>>>> pfn_to_bfn in drivers/xen/swiotlb-xen.c. Otherwise, if you don't
>>>>>> care
>>>>>> about other mapped grants, you can just use pfn_to_gfn, that always
>>>>>> returns pfn.
>>>>> I think then this code needs to use pfn_to_bfn().
>>>> Ok
>>>>>
>>>>>> Also, for your information, we support different page
>>>>>> granularities in
>>>>>> Linux as a Xen guest, see the comment at include/xen/arm/page.h:
>>>>>>
>>>>>> /*
>>>>>> * The pseudo-physical frame (pfn) used in all the helpers is
>>>>>> always
>>>>>> based
>>>>>> * on Xen page granularity (i.e 4KB).
>>>>>> *
>>>>>> * A Linux page may be split across multiple non-contiguous
>>>>>> Xen page so
>>>>>> we
>>>>>> * have to keep track with frame based on 4KB page granularity.
>>>>>> *
>>>>>> * PV drivers should never make a direct usage of those helpers
>>>>>> (particularly
>>>>>> * pfn_to_gfn and gfn_to_pfn).
>>>>>> */
>>>>>>
>>>>>> A Linux page could be 64K, but a Xen page is always 4K. A granted
>>>>>> page
>>>>>> is also 4K. We have helpers to take into account the offsets to map
>>>>>> multiple Xen grants in a single Linux page, see for example
>>>>>> drivers/xen/grant-table.c:gnttab_foreach_grant. Most PV drivers have
>>>>>> been converted to be able to work with 64K pages correctly, but if I
>>>>>> remember correctly gntdev.c is the only remaining driver that
>>>>>> doesn't
>>>>>> support 64K pages yet, so you don't have to deal with it if you
>>>>>> don't
>>>>>> want to.
>>>>> I believe somewhere in this series there is a test for PAGE_SIZE vs.
>>>>> XEN_PAGE_SIZE. Right, Oleksandr?
>>>> Not in gntdev. You might have seen this in xen-drmfront/xen-sndfront,
>>>> but I didn't touch gntdev for that. Do you want me to add yet
>>>> another patch
>>>> in the series to check for that?
>>> gntdev.c is already not capable of handling PAGE_SIZE != XEN_PAGE_SIZE,
>>> so you are not going to break anything that is not already broken
>>> :-) If
>>> your new gntdev.c code relies on PAGE_SIZE == XEN_PAGE_SIZE, it
>>> might be
>>> good to add an in-code comment about it, just to make it easier to fix
>>> the whole of gntdev.c in the future.
>>>
>> Yes, I just mean I can add something like [1] as a separate patch to
>> the series,
>> so we are on the safe side here
>
> See my comment on Stefano's e-mail. I believe gntdev is able to handle
> PAGE_SIZE != XEN_PAGE_SIZE. So I would rather keep the behavior we
> have today for such a case.
>
Sure, with a note that we waste most of a 64KiB page ;)
> Cheers,
>
On 01/06/18 13:41, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <[email protected]>
>
> Only gnttab_{alloc|free}_pages are exported as EXPORT_SYMBOL
> while all the rest are exported as EXPORT_SYMBOL_GPL, thus
> effectively making it not possible for non-GPL driver modules
> to use the grant table module. Export gnttab_{alloc|free}_pages as
> EXPORT_SYMBOL_GPL so all the exports are aligned.
>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
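The change the commit message describes amounts to switching the export macro on the two remaining non-GPL exports in drivers/xen/grant-table.c; a sketch of the kind of hunk implied (the surrounding context lines may differ in the actual patch):

```diff
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
-EXPORT_SYMBOL(gnttab_alloc_pages);
+EXPORT_SYMBOL_GPL(gnttab_alloc_pages);

-EXPORT_SYMBOL(gnttab_free_pages);
+EXPORT_SYMBOL_GPL(gnttab_free_pages);
```

After this, every export in the grant table module is EXPORT_SYMBOL_GPL, so the module's licensing requirements are consistent rather than accidentally mixed.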
Pushed to xen/tip.git for-linus-4.18
Juergen