Hello,
This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
During OOM, the shrinker will release BOs that are marked as "not needed"
by userspace using the new madvise IOCTL. The userspace in this case is
the Mesa VirGL driver, it will mark the cached BOs as "not needed",
allowing kernel driver to release memory of the cached shmem BOs on lowmem
situations, preventing OOM kills.
This patchset includes couple fixes for problems I found while was working
on the shrinker, it also includes prerequisite DMA API usage improvement
needed by the shrinker.
The Mesa and IGT patches will be kept on hold until this kernel series
will be approved and applied.
This patchset was tested using Qemu and crosvm, including both cases of
IOMMU off/on.
Mesa: https://gitlab.freedesktop.org/digetx/mesa/-/commits/virgl-madvise
IGT: https://gitlab.freedesktop.org/digetx/igt-gpu-tools/-/tree/virtio-madvise
Dmitry Osipenko (5):
drm/virtio: Correct drm_gem_shmem_get_sg_table() error handling
drm/virtio: Check whether transferred 2D BO is shmem
drm/virtio: Unlock GEM reservations in error code path
drm/virtio: Improve DMA API usage for shmem BOs
drm/virtio: Add memory shrinker
drivers/gpu/drm/virtio/Makefile | 3 +-
drivers/gpu/drm/virtio/virtgpu_drv.c | 22 +++-
drivers/gpu/drm/virtio/virtgpu_drv.h | 31 ++++-
drivers/gpu/drm/virtio/virtgpu_gem.c | 84 ++++++++++++
drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c | 124 ++++++++++++++++++
drivers/gpu/drm/virtio/virtgpu_ioctl.c | 37 ++++++
drivers/gpu/drm/virtio/virtgpu_kms.c | 17 ++-
drivers/gpu/drm/virtio/virtgpu_object.c | 63 +++------
drivers/gpu/drm/virtio/virtgpu_plane.c | 17 ++-
drivers/gpu/drm/virtio/virtgpu_vq.c | 30 +++--
include/uapi/drm/virtgpu_drm.h | 14 ++
11 files changed, 373 insertions(+), 69 deletions(-)
create mode 100644 drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c
--
2.35.1
DRM API requires the DRM's driver to be backed with the device that can
be used for generic DMA operations. The VirtIO-GPU device can't perform
DMA operations if it uses PCI transport because PCI device driver creates
a virtual VirtIO-GPU device that isn't associated with the PCI. Use PCI's
GPU device for the DRM's device instead of the VirtIO-GPU device and drop
DMA-related hacks from the VirtIO-GPU driver.
Signed-off-by: Dmitry Osipenko <[email protected]>
---
drivers/gpu/drm/virtio/virtgpu_drv.c | 22 +++++++---
drivers/gpu/drm/virtio/virtgpu_drv.h | 5 +--
drivers/gpu/drm/virtio/virtgpu_kms.c | 7 ++--
drivers/gpu/drm/virtio/virtgpu_object.c | 56 +++++--------------------
drivers/gpu/drm/virtio/virtgpu_vq.c | 13 +++---
5 files changed, 37 insertions(+), 66 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 5f25a8d15464..8449dad3e65c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -46,9 +46,9 @@ static int virtio_gpu_modeset = -1;
MODULE_PARM_DESC(modeset, "Disable/Enable modesetting");
module_param_named(modeset, virtio_gpu_modeset, int, 0400);
-static int virtio_gpu_pci_quirk(struct drm_device *dev, struct virtio_device *vdev)
+static int virtio_gpu_pci_quirk(struct drm_device *dev)
{
- struct pci_dev *pdev = to_pci_dev(vdev->dev.parent);
+ struct pci_dev *pdev = to_pci_dev(dev->dev);
const char *pname = dev_name(&pdev->dev);
bool vga = (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
char unique[20];
@@ -101,6 +101,7 @@ static int virtio_gpu_pci_quirk(struct drm_device *dev, struct virtio_device *vd
static int virtio_gpu_probe(struct virtio_device *vdev)
{
struct drm_device *dev;
+ struct device *dma_dev;
int ret;
if (drm_firmware_drivers_only() && virtio_gpu_modeset == -1)
@@ -109,18 +110,29 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
if (virtio_gpu_modeset == 0)
return -EINVAL;
- dev = drm_dev_alloc(&driver, &vdev->dev);
+ /*
+ * If GPU's parent is a PCI device, then we will use this PCI device
+ * for the DRM's driver device because GPU won't have PCI's IOMMU DMA
+ * ops in this case since GPU device is sitting on a separate (from PCI)
+ * virtio-bus.
+ */
+ if (!strcmp(vdev->dev.parent->bus->name, "pci"))
+ dma_dev = vdev->dev.parent;
+ else
+ dma_dev = &vdev->dev;
+
+ dev = drm_dev_alloc(&driver, dma_dev);
if (IS_ERR(dev))
return PTR_ERR(dev);
vdev->priv = dev;
if (!strcmp(vdev->dev.parent->bus->name, "pci")) {
- ret = virtio_gpu_pci_quirk(dev, vdev);
+ ret = virtio_gpu_pci_quirk(dev);
if (ret)
goto err_free;
}
- ret = virtio_gpu_init(dev);
+ ret = virtio_gpu_init(vdev, dev);
if (ret)
goto err_free;
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 0a194aaad419..b2d93cb12ebf 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -100,8 +100,6 @@ struct virtio_gpu_object {
struct virtio_gpu_object_shmem {
struct virtio_gpu_object base;
- struct sg_table *pages;
- uint32_t mapped;
};
struct virtio_gpu_object_vram {
@@ -214,7 +212,6 @@ struct virtio_gpu_drv_cap_cache {
};
struct virtio_gpu_device {
- struct device *dev;
struct drm_device *ddev;
struct virtio_device *vdev;
@@ -282,7 +279,7 @@ extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file);
/* virtgpu_kms.c */
-int virtio_gpu_init(struct drm_device *dev);
+int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev);
void virtio_gpu_deinit(struct drm_device *dev);
void virtio_gpu_release(struct drm_device *dev);
int virtio_gpu_driver_open(struct drm_device *dev, struct drm_file *file);
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c b/drivers/gpu/drm/virtio/virtgpu_kms.c
index 3313b92db531..0d1e3eb61bee 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -110,7 +110,7 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device *vgdev,
vgdev->num_capsets = num_capsets;
}
-int virtio_gpu_init(struct drm_device *dev)
+int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev)
{
static vq_callback_t *callbacks[] = {
virtio_gpu_ctrl_ack, virtio_gpu_cursor_ack
@@ -123,7 +123,7 @@ int virtio_gpu_init(struct drm_device *dev)
u32 num_scanouts, num_capsets;
int ret = 0;
- if (!virtio_has_feature(dev_to_virtio(dev->dev), VIRTIO_F_VERSION_1))
+ if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
return -ENODEV;
vgdev = kzalloc(sizeof(struct virtio_gpu_device), GFP_KERNEL);
@@ -132,8 +132,7 @@ int virtio_gpu_init(struct drm_device *dev)
vgdev->ddev = dev;
dev->dev_private = vgdev;
- vgdev->vdev = dev_to_virtio(dev->dev);
- vgdev->dev = dev->dev;
+ vgdev->vdev = vdev;
spin_lock_init(&vgdev->display_info_lock);
spin_lock_init(&vgdev->resource_export_lock);
diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
index 0b8cbb87f8d8..1964c0d8b51f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -67,21 +67,6 @@ void virtio_gpu_cleanup_object(struct virtio_gpu_object *bo)
virtio_gpu_resource_id_put(vgdev, bo->hw_res_handle);
if (virtio_gpu_is_shmem(bo)) {
- struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo);
-
- if (shmem->pages) {
- if (shmem->mapped) {
- dma_unmap_sgtable(vgdev->vdev->dev.parent,
- shmem->pages, DMA_TO_DEVICE, 0);
- shmem->mapped = 0;
- }
-
- sg_free_table(shmem->pages);
- kfree(shmem->pages);
- shmem->pages = NULL;
- drm_gem_shmem_unpin(&bo->base);
- }
-
drm_gem_shmem_free(&bo->base);
} else if (virtio_gpu_is_vram(bo)) {
struct virtio_gpu_object_vram *vram = to_virtio_gpu_vram(bo);
@@ -153,37 +138,18 @@ static int virtio_gpu_object_shmem_init(struct virtio_gpu_device *vgdev,
unsigned int *nents)
{
bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev);
- struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo);
struct scatterlist *sg;
- int si, ret;
+ struct sg_table *pages;
+ int si;
- ret = drm_gem_shmem_pin(&bo->base);
- if (ret < 0)
- return -EINVAL;
-
- /*
- * virtio_gpu uses drm_gem_shmem_get_sg_table instead of
- * drm_gem_shmem_get_pages_sgt because virtio has it's own set of
- * dma-ops. This is discouraged for other drivers, but should be fine
- * since virtio_gpu doesn't support dma-buf import from other devices.
- */
- shmem->pages = drm_gem_shmem_get_sg_table(&bo->base);
- ret = PTR_ERR(shmem->pages);
- if (ret) {
- drm_gem_shmem_unpin(&bo->base);
- shmem->pages = NULL;
- return ret;
- }
+ pages = drm_gem_shmem_get_pages_sgt(&bo->base);
+ if (IS_ERR(pages))
+ return PTR_ERR(pages);
- if (use_dma_api) {
- ret = dma_map_sgtable(vgdev->vdev->dev.parent,
- shmem->pages, DMA_TO_DEVICE, 0);
- if (ret)
- return ret;
- *nents = shmem->mapped = shmem->pages->nents;
- } else {
- *nents = shmem->pages->orig_nents;
- }
+ if (use_dma_api)
+ *nents = pages->nents;
+ else
+ *nents = pages->orig_nents;
*ents = kvmalloc_array(*nents,
sizeof(struct virtio_gpu_mem_entry),
@@ -194,13 +160,13 @@ static int virtio_gpu_object_shmem_init(struct virtio_gpu_device *vgdev,
}
if (use_dma_api) {
- for_each_sgtable_dma_sg(shmem->pages, sg, si) {
+ for_each_sgtable_dma_sg(pages, sg, si) {
(*ents)[si].addr = cpu_to_le64(sg_dma_address(sg));
(*ents)[si].length = cpu_to_le32(sg_dma_len(sg));
(*ents)[si].padding = 0;
}
} else {
- for_each_sgtable_sg(shmem->pages, sg, si) {
+ for_each_sgtable_sg(pages, sg, si) {
(*ents)[si].addr = cpu_to_le64(sg_phys(sg));
(*ents)[si].length = cpu_to_le32(sg->length);
(*ents)[si].padding = 0;
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 2edf31806b74..06566e44307d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -593,11 +593,10 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
struct virtio_gpu_transfer_to_host_2d *cmd_p;
struct virtio_gpu_vbuffer *vbuf;
bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev);
- struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo);
if (virtio_gpu_is_shmem(bo) && use_dma_api)
- dma_sync_sgtable_for_device(vgdev->vdev->dev.parent,
- shmem->pages, DMA_TO_DEVICE);
+ dma_sync_sgtable_for_device(&vgdev->vdev->dev,
+ bo->base.sgt, DMA_TO_DEVICE);
cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
memset(cmd_p, 0, sizeof(*cmd_p));
@@ -1017,11 +1016,9 @@ void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
struct virtio_gpu_vbuffer *vbuf;
bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev);
- if (virtio_gpu_is_shmem(bo) && use_dma_api) {
- struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo);
- dma_sync_sgtable_for_device(vgdev->vdev->dev.parent,
- shmem->pages, DMA_TO_DEVICE);
- }
+ if (virtio_gpu_is_shmem(bo) && use_dma_api)
+ dma_sync_sgtable_for_device(&vgdev->vdev->dev,
+ bo->base.sgt, DMA_TO_DEVICE);
cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
memset(cmd_p, 0, sizeof(*cmd_p));
--
2.35.1
Add memory shrinker and new madvise IOCTL to the VirtIO-GPU driver.
Userspace (BO cache manager of Mesa driver) will mark BOs as "don't need"
using the new IOCTL to let shrinker purge the marked BOs on OOM, thus
shrinker will lower memory pressure and prevent OOM kills.
Signed-off-by: Daniel Almeida <[email protected]>
Signed-off-by: Dmitry Osipenko <[email protected]>
---
drivers/gpu/drm/virtio/Makefile | 3 +-
drivers/gpu/drm/virtio/virtgpu_drv.h | 26 +++-
drivers/gpu/drm/virtio/virtgpu_gem.c | 84 ++++++++++++
drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c | 124 ++++++++++++++++++
drivers/gpu/drm/virtio/virtgpu_ioctl.c | 37 ++++++
drivers/gpu/drm/virtio/virtgpu_kms.c | 10 ++
drivers/gpu/drm/virtio/virtgpu_object.c | 7 +
drivers/gpu/drm/virtio/virtgpu_plane.c | 17 ++-
drivers/gpu/drm/virtio/virtgpu_vq.c | 15 +++
include/uapi/drm/virtgpu_drm.h | 14 ++
10 files changed, 333 insertions(+), 4 deletions(-)
create mode 100644 drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c
diff --git a/drivers/gpu/drm/virtio/Makefile b/drivers/gpu/drm/virtio/Makefile
index b99fa4a73b68..e1715ae02a2d 100644
--- a/drivers/gpu/drm/virtio/Makefile
+++ b/drivers/gpu/drm/virtio/Makefile
@@ -6,6 +6,7 @@
virtio-gpu-y := virtgpu_drv.o virtgpu_kms.o virtgpu_gem.o virtgpu_vram.o \
virtgpu_display.o virtgpu_vq.o \
virtgpu_fence.o virtgpu_object.o virtgpu_debugfs.o virtgpu_plane.o \
- virtgpu_ioctl.o virtgpu_prime.o virtgpu_trace_points.o
+ virtgpu_ioctl.o virtgpu_prime.o virtgpu_trace_points.o \
+ virtgpu_gem_shrinker.o
obj-$(CONFIG_DRM_VIRTIO_GPU) += virtio-gpu.o
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index b2d93cb12ebf..f907fcd81a24 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -94,6 +94,8 @@ struct virtio_gpu_object {
int uuid_state;
uuid_t uuid;
+
+ refcount_t pin_count;
};
#define gem_to_virtio_gpu_obj(gobj) \
container_of((gobj), struct virtio_gpu_object, base.base)
@@ -211,6 +213,11 @@ struct virtio_gpu_drv_cap_cache {
atomic_t is_valid;
};
+struct virtio_gpu_shrinker {
+ struct shrinker shrinker;
+ struct list_head list;
+};
+
struct virtio_gpu_device {
struct drm_device *ddev;
@@ -261,6 +268,11 @@ struct virtio_gpu_device {
spinlock_t resource_export_lock;
/* protects map state and host_visible_mm */
spinlock_t host_visible_lock;
+
+ struct virtio_gpu_shrinker vgshrinker;
+
+ /* protects all memory management operations */
+ struct mutex mm_lock;
};
struct virtio_gpu_fpriv {
@@ -274,7 +286,7 @@ struct virtio_gpu_fpriv {
};
/* virtgpu_ioctl.c */
-#define DRM_VIRTIO_NUM_IOCTLS 12
+#define DRM_VIRTIO_NUM_IOCTLS 13
extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file);
@@ -310,6 +322,12 @@ void virtio_gpu_array_put_free(struct virtio_gpu_object_array *objs);
void virtio_gpu_array_put_free_delayed(struct virtio_gpu_device *vgdev,
struct virtio_gpu_object_array *objs);
void virtio_gpu_array_put_free_work(struct work_struct *work);
+int virtio_gpu_array_validate(struct virtio_gpu_device *vgdev,
+ struct virtio_gpu_object_array *objs);
+bool virtio_gpu_gem_madvise(struct virtio_gpu_object *obj, int madv);
+int virtio_gpu_gem_host_mem_release(struct virtio_gpu_object *bo);
+int virtio_gpu_gem_pin(struct virtio_gpu_object *bo);
+void virtio_gpu_gem_unpin(struct virtio_gpu_object *bo);
/* virtgpu_vq.c */
int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev);
@@ -321,6 +339,8 @@ void virtio_gpu_cmd_create_resource(struct virtio_gpu_device *vgdev,
struct virtio_gpu_fence *fence);
void virtio_gpu_cmd_unref_resource(struct virtio_gpu_device *vgdev,
struct virtio_gpu_object *bo);
+int virtio_gpu_cmd_release_resource(struct virtio_gpu_device *vgdev,
+ struct virtio_gpu_object *bo);
void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
uint64_t offset,
uint32_t width, uint32_t height,
@@ -483,4 +503,8 @@ void virtio_gpu_vram_unmap_dma_buf(struct device *dev,
struct sg_table *sgt,
enum dma_data_direction dir);
+/* virtgpu_gem_shrinker.c */
+int virtio_gpu_gem_shrinker_init(struct virtio_gpu_device *vgdev);
+void virtio_gpu_gem_shrinker_fini(struct virtio_gpu_device *vgdev);
+
#endif
diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c
index 48d3c9955f0d..dbe5b8fa8285 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -282,3 +282,87 @@ void virtio_gpu_array_put_free_work(struct work_struct *work)
}
spin_unlock(&vgdev->obj_free_lock);
}
+
+int virtio_gpu_array_validate(struct virtio_gpu_device *vgdev,
+ struct virtio_gpu_object_array *objs)
+{
+ struct drm_gem_shmem_object *shmem;
+ int ret = 0;
+ u32 i;
+
+ mutex_lock(&vgdev->mm_lock);
+
+ for (i = 0; i < objs->nents; i++) {
+ shmem = to_drm_gem_shmem_obj(objs->objs[i]);
+ if (shmem->madv) {
+ ret = -ENOMEM;
+ break;
+ }
+ }
+
+ mutex_unlock(&vgdev->mm_lock);
+
+ return ret;
+}
+
+bool virtio_gpu_gem_madvise(struct virtio_gpu_object *bo, int madv)
+{
+ struct virtio_gpu_device *vgdev = bo->base.base.dev->dev_private;
+ bool retained;
+
+ /* we care only about purging BOs that are backed by guest's memory */
+ if (!virtio_gpu_is_shmem(bo))
+ return true;
+
+ mutex_lock(&vgdev->mm_lock);
+
+ retained = drm_gem_shmem_madvise(&bo->base, madv);
+
+ if (retained && madv == VIRTGPU_MADV_DONTNEED)
+ list_move_tail(&bo->base.madv_list, &vgdev->vgshrinker.list);
+ else
+ list_del_init(&bo->base.madv_list);
+
+ mutex_unlock(&vgdev->mm_lock);
+
+ return retained;
+}
+
+int virtio_gpu_gem_host_mem_release(struct virtio_gpu_object *bo)
+{
+ struct virtio_gpu_device *vgdev = bo->base.base.dev->dev_private;
+ int err;
+
+ if (bo->created) {
+ err = virtio_gpu_cmd_release_resource(vgdev, bo);
+ if (err)
+ return err;
+
+ virtio_gpu_notify(vgdev);
+ bo->created = false;
+ }
+
+ return 0;
+}
+
+int virtio_gpu_gem_pin(struct virtio_gpu_object *bo)
+{
+ struct virtio_gpu_device *vgdev = bo->base.base.dev->dev_private;
+ int ret = 0;
+
+ mutex_lock(&vgdev->mm_lock);
+
+ if (bo->base.madv == VIRTGPU_MADV_WILLNEED)
+ refcount_inc(&bo->pin_count);
+ else
+ ret = -ENOMEM;
+
+ mutex_unlock(&vgdev->mm_lock);
+
+ return ret;
+}
+
+void virtio_gpu_gem_unpin(struct virtio_gpu_object *bo)
+{
+ refcount_dec(&bo->pin_count);
+}
diff --git a/drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c b/drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c
new file mode 100644
index 000000000000..39eb9a3e7e4a
--- /dev/null
+++ b/drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 Collabora Ltd.
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/shmem_fs.h>
+
+#include "virtgpu_drv.h"
+
+static unsigned long
+virtio_gpu_gem_shrinker_count_objects(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+ struct drm_gem_shmem_object *shmem;
+ struct virtio_gpu_device *vgdev;
+ unsigned long count = 0;
+ bool empty = true;
+
+ vgdev = container_of(shrinker, struct virtio_gpu_device,
+ vgshrinker.shrinker);
+
+ if (!mutex_trylock(&vgdev->mm_lock))
+ return 0;
+
+ list_for_each_entry(shmem, &vgdev->vgshrinker.list, madv_list) {
+ empty = false;
+
+ if (!mutex_trylock(&shmem->pages_lock))
+ continue;
+
+ if (drm_gem_shmem_is_purgeable(shmem))
+ count += shmem->base.size >> PAGE_SHIFT;
+
+ mutex_unlock(&shmem->pages_lock);
+ }
+
+ mutex_unlock(&vgdev->mm_lock);
+
+ return empty ? SHRINK_EMPTY : count;
+}
+
+static bool virtio_gpu_gem_shrinker_purge(struct virtio_gpu_device *vgdev,
+ struct drm_gem_object *obj)
+{
+ struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
+ struct drm_gem_shmem_object *shmem = &bo->base;
+ int err;
+
+ if (!dma_resv_test_signaled(obj->resv, true) ||
+ !drm_gem_shmem_is_purgeable(shmem) ||
+ refcount_read(&bo->pin_count))
+ return false;
+
+ /*
+ * Release host's memory before guest's memory is gone to ensure that
+ * host won't touch released memory of the guest.
+ */
+ err = virtio_gpu_gem_host_mem_release(bo);
+ if (err)
+ return false;
+
+ list_del_init(&shmem->madv_list);
+ drm_gem_shmem_purge_locked(shmem);
+
+ return true;
+}
+
+static unsigned long
+virtio_gpu_gem_shrinker_scan_objects(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+ struct drm_gem_shmem_object *shmem, *tmp;
+ struct virtio_gpu_device *vgdev;
+ unsigned long freed = 0;
+
+ vgdev = container_of(shrinker, struct virtio_gpu_device,
+ vgshrinker.shrinker);
+
+ if (!mutex_trylock(&vgdev->mm_lock))
+ return SHRINK_STOP;
+
+ list_for_each_entry_safe(shmem, tmp, &vgdev->vgshrinker.list, madv_list) {
+ if (freed >= sc->nr_to_scan)
+ break;
+
+ if (!dma_resv_trylock(shmem->base.resv))
+ continue;
+
+ if (!mutex_trylock(&shmem->pages_lock))
+ goto resv_unlock;
+
+ if (virtio_gpu_gem_shrinker_purge(vgdev, &shmem->base))
+ freed += shmem->base.size >> PAGE_SHIFT;
+
+ mutex_unlock(&shmem->pages_lock);
+resv_unlock:
+ dma_resv_unlock(shmem->base.resv);
+ }
+
+ mutex_unlock(&vgdev->mm_lock);
+
+ return freed;
+}
+
+int virtio_gpu_gem_shrinker_init(struct virtio_gpu_device *vgdev)
+{
+ struct shrinker *shrinker = &vgdev->vgshrinker.shrinker;
+
+ shrinker->count_objects = virtio_gpu_gem_shrinker_count_objects;
+ shrinker->scan_objects = virtio_gpu_gem_shrinker_scan_objects;
+ shrinker->seeks = DEFAULT_SEEKS;
+
+ INIT_LIST_HEAD(&vgdev->vgshrinker.list);
+
+ return register_shrinker(shrinker);
+}
+
+void virtio_gpu_gem_shrinker_fini(struct virtio_gpu_device *vgdev)
+{
+ struct shrinker *shrinker = &vgdev->vgshrinker.shrinker;
+
+ unregister_shrinker(shrinker);
+}
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index c708bab555c6..bb5369eee425 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -217,6 +217,10 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data,
ret = virtio_gpu_array_lock_resv(buflist);
if (ret)
goto out_memdup;
+
+ ret = virtio_gpu_array_validate(vgdev, buflist);
+ if (ret)
+ goto out_unresv;
}
out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
@@ -423,6 +427,10 @@ static int virtio_gpu_transfer_from_host_ioctl(struct drm_device *dev,
if (ret != 0)
goto err_put_free;
+ ret = virtio_gpu_array_validate(vgdev, objs);
+ if (ret)
+ goto err_unlock;
+
fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if (!fence) {
ret = -ENOMEM;
@@ -482,6 +490,10 @@ static int virtio_gpu_transfer_to_host_ioctl(struct drm_device *dev, void *data,
if (ret != 0)
goto err_put_free;
+ ret = virtio_gpu_array_validate(vgdev, objs);
+ if (ret)
+ goto err_unlock;
+
ret = -ENOMEM;
fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context,
0);
@@ -836,6 +848,28 @@ static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
return ret;
}
+static int virtio_gpu_madvise_ioctl(struct drm_device *dev,
+ void *data,
+ struct drm_file *file)
+{
+ struct drm_virtgpu_madvise *args = data;
+ struct virtio_gpu_object *bo;
+ struct drm_gem_object *obj;
+
+ if (args->madv > VIRTGPU_MADV_DONTNEED)
+ return -EOPNOTSUPP;
+
+ obj = drm_gem_object_lookup(file, args->bo_handle);
+ if (!obj)
+ return -ENOENT;
+
+ bo = gem_to_virtio_gpu_obj(obj);
+ args->retained = virtio_gpu_gem_madvise(bo, args->madv);
+ drm_gem_object_put(obj);
+
+ return 0;
+}
+
struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS] = {
DRM_IOCTL_DEF_DRV(VIRTGPU_MAP, virtio_gpu_map_ioctl,
DRM_RENDER_ALLOW),
@@ -875,4 +909,7 @@ struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS] = {
DRM_IOCTL_DEF_DRV(VIRTGPU_CONTEXT_INIT, virtio_gpu_context_init_ioctl,
DRM_RENDER_ALLOW),
+
+ DRM_IOCTL_DEF_DRV(VIRTGPU_MADVISE, virtio_gpu_madvise_ioctl,
+ DRM_RENDER_ALLOW),
};
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c b/drivers/gpu/drm/virtio/virtgpu_kms.c
index 0d1e3eb61bee..0a8c7cc64da9 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -134,6 +134,7 @@ int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev)
dev->dev_private = vgdev;
vgdev->vdev = vdev;
+ mutex_init(&vgdev->mm_lock);
spin_lock_init(&vgdev->display_info_lock);
spin_lock_init(&vgdev->resource_export_lock);
spin_lock_init(&vgdev->host_visible_lock);
@@ -238,6 +239,12 @@ int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev)
goto err_scanouts;
}
+ ret = virtio_gpu_gem_shrinker_init(vgdev);
+ if (ret) {
+ DRM_ERROR("shrinker init failed\n");
+ goto err_modeset;
+ }
+
virtio_device_ready(vgdev->vdev);
if (num_capsets)
@@ -250,6 +257,8 @@ int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev)
5 * HZ);
return 0;
+err_modeset:
+ virtio_gpu_modeset_fini(vgdev);
err_scanouts:
virtio_gpu_free_vbufs(vgdev);
err_vbufs:
@@ -289,6 +298,7 @@ void virtio_gpu_release(struct drm_device *dev)
if (!vgdev)
return;
+ virtio_gpu_gem_shrinker_fini(vgdev);
virtio_gpu_modeset_fini(vgdev);
virtio_gpu_free_vbufs(vgdev);
virtio_gpu_cleanup_cap_cache(vgdev);
diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
index 1964c0d8b51f..696bb31fc626 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -88,6 +88,13 @@ static void virtio_gpu_free_object(struct drm_gem_object *obj)
struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
struct virtio_gpu_device *vgdev = bo->base.base.dev->dev_private;
+ /*
+ * Prevent racing with the memory shrinker during BO's cleanup
+ * by taking out BO from the shrinker's purge list if BO is on
+ * the list, otherwise this is a no-op if BO is already purged.
+ */
+ virtio_gpu_gem_madvise(bo, VIRTGPU_MADV_WILLNEED);
+
if (bo->created) {
virtio_gpu_cmd_unref_resource(vgdev, bo);
virtio_gpu_notify(vgdev);
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index 6d3cc9e238a4..597ef1645bf2 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -246,20 +246,28 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
struct virtio_gpu_device *vgdev = dev->dev_private;
struct virtio_gpu_framebuffer *vgfb;
struct virtio_gpu_object *bo;
+ int err;
if (!new_state->fb)
return 0;
vgfb = to_virtio_gpu_framebuffer(new_state->fb);
bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
- if (!bo || (plane->type == DRM_PLANE_TYPE_PRIMARY && !bo->guest_blob))
+
+ err = virtio_gpu_gem_pin(bo);
+ if (err)
+ return err;
+
+ if (plane->type == DRM_PLANE_TYPE_PRIMARY && !bo->guest_blob)
return 0;
if (bo->dumb && (plane->state->fb != new_state->fb)) {
vgfb->fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context,
0);
- if (!vgfb->fence)
+ if (!vgfb->fence) {
+ virtio_gpu_gem_unpin(bo);
return -ENOMEM;
+ }
}
return 0;
@@ -269,15 +277,20 @@ static void virtio_gpu_plane_cleanup_fb(struct drm_plane *plane,
struct drm_plane_state *old_state)
{
struct virtio_gpu_framebuffer *vgfb;
+ struct virtio_gpu_object *bo;
if (!plane->state->fb)
return;
vgfb = to_virtio_gpu_framebuffer(plane->state->fb);
+ bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
+
if (vgfb->fence) {
dma_fence_put(&vgfb->fence->f);
vgfb->fence = NULL;
}
+
+ virtio_gpu_gem_unpin(bo);
}
static void virtio_gpu_cursor_plane_update(struct drm_plane *plane,
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 06566e44307d..c55c2fc8ecc0 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -536,6 +536,21 @@ void virtio_gpu_cmd_unref_resource(struct virtio_gpu_device *vgdev,
virtio_gpu_cleanup_object(bo);
}
+int virtio_gpu_cmd_release_resource(struct virtio_gpu_device *vgdev,
+ struct virtio_gpu_object *bo)
+{
+ struct virtio_gpu_resource_unref *cmd_p;
+ struct virtio_gpu_vbuffer *vbuf;
+
+ cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
+ memset(cmd_p, 0, sizeof(*cmd_p));
+
+ cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_RESOURCE_UNREF);
+ cmd_p->resource_id = cpu_to_le32(bo->hw_res_handle);
+
+ return virtio_gpu_queue_ctrl_buffer(vgdev, vbuf);
+}
+
void virtio_gpu_cmd_set_scanout(struct virtio_gpu_device *vgdev,
uint32_t scanout_id, uint32_t resource_id,
uint32_t width, uint32_t height,
diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index 0512fde5e697..12197d8e9759 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -48,6 +48,7 @@ extern "C" {
#define DRM_VIRTGPU_GET_CAPS 0x09
#define DRM_VIRTGPU_RESOURCE_CREATE_BLOB 0x0a
#define DRM_VIRTGPU_CONTEXT_INIT 0x0b
+#define DRM_VIRTGPU_MADVISE 0x0c
#define VIRTGPU_EXECBUF_FENCE_FD_IN 0x01
#define VIRTGPU_EXECBUF_FENCE_FD_OUT 0x02
@@ -196,6 +197,15 @@ struct drm_virtgpu_context_init {
__u64 ctx_set_params;
};
+#define VIRTGPU_MADV_WILLNEED 0
+#define VIRTGPU_MADV_DONTNEED 1
+struct drm_virtgpu_madvise {
+ __u32 bo_handle;
+ __u32 retained; /* out, non-zero if BO can be used */
+ __u32 madv;
+ __u32 pad;
+};
+
/*
* Event code that's given when VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is in
* effect. The event size is sizeof(drm_event), since there is no additional
@@ -246,6 +256,10 @@ struct drm_virtgpu_context_init {
DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_CONTEXT_INIT, \
struct drm_virtgpu_context_init)
+#define DRM_IOCTL_VIRTGPU_MADVISE \
+ DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_MADVISE, \
+ struct drm_virtgpu_madvise)
+
#if defined(__cplusplus)
}
#endif
--
2.35.1
drm_gem_shmem_get_sg_table() never ever returned NULL on error. Correct
the error handling to avoid crash on OOM.
Cc: [email protected]
Signed-off-by: Dmitry Osipenko <[email protected]>
---
drivers/gpu/drm/virtio/virtgpu_object.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
index f293e6ad52da..bea7806a3ae3 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -168,9 +168,11 @@ static int virtio_gpu_object_shmem_init(struct virtio_gpu_device *vgdev,
* since virtio_gpu doesn't support dma-buf import from other devices.
*/
shmem->pages = drm_gem_shmem_get_sg_table(&bo->base);
- if (!shmem->pages) {
+ ret = PTR_ERR(shmem->pages);
+ if (ret) {
drm_gem_shmem_unpin(&bo->base);
- return -EINVAL;
+ shmem->pages = NULL;
+ return ret;
}
if (use_dma_api) {
--
2.35.1
On 3/8/22 19:29, Rob Clark wrote:
> On Tue, Mar 8, 2022 at 5:17 AM Dmitry Osipenko
> <[email protected]> wrote:
>>
>> Hello,
>>
>> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
>> During OOM, the shrinker will release BOs that are marked as "not needed"
>> by userspace using the new madvise IOCTL. The userspace in this case is
>> the Mesa VirGL driver, it will mark the cached BOs as "not needed",
>> allowing kernel driver to release memory of the cached shmem BOs on lowmem
>> situations, preventing OOM kills.
>
> Will host memory pressure already trigger shrinker in guest?
The host memory pressure won't trigger shrinker in guest here. This
series will help only with the memory pressure within the guest using a
usual "virgl context".
Having a host shrinker in a case of "virgl contexts" should be a
difficult problem to solve.
> This is
> something I'm quite interested in for "virtgpu native contexts" (ie.
> native guest driver with new context type sitting on top of virtgpu),
In a case of "native contexts" it should be doable, at least I can't see
any obvious problems. The madvise invocations could be passed to the
host using a new virtio-gpu command by the guest's madvise IOCTL
handler, instead-of/in-addition-to handling madvise in the guest's
kernel, and that's it.
> since that isn't using host storage
s/host/guest ?
On 3/8/22 16:17, Dmitry Osipenko wrote:
> @@ -246,20 +246,28 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
> struct virtio_gpu_device *vgdev = dev->dev_private;
> struct virtio_gpu_framebuffer *vgfb;
> struct virtio_gpu_object *bo;
> + int err;
>
> if (!new_state->fb)
> return 0;
>
> vgfb = to_virtio_gpu_framebuffer(new_state->fb);
> bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
> - if (!bo || (plane->type == DRM_PLANE_TYPE_PRIMARY && !bo->guest_blob))
> +
> + err = virtio_gpu_gem_pin(bo);
> + if (err)
> + return err;
I just noticed that this produces a refcount debug warning because I
missed to initialize the refcount when BO is created. That warning splat
was hidden by a huge lockdep splat produced by
drm_aperture_remove_conflicting_pci_framebuffers(), which probably
should be fixed. I'll correct it in v2.
On Tue, Mar 8, 2022 at 5:17 AM Dmitry Osipenko
<[email protected]> wrote:
>
> Hello,
>
> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
> During OOM, the shrinker will release BOs that are marked as "not needed"
> by userspace using the new madvise IOCTL. The userspace in this case is
> the Mesa VirGL driver, it will mark the cached BOs as "not needed",
> allowing kernel driver to release memory of the cached shmem BOs on lowmem
> situations, preventing OOM kills.
Will host memory pressure already trigger shrinker in guest? This is
something I'm quite interested in for "virtgpu native contexts" (ie.
native guest driver with new context type sitting on top of virtgpu),
since that isn't using host storage
BR,
-R
> This patchset includes couple fixes for problems I found while was working
> on the shrinker, it also includes prerequisite DMA API usage improvement
> needed by the shrinker.
>
> The Mesa and IGT patches will be kept on hold until this kernel series
> will be approved and applied.
>
> This patchset was tested using Qemu and crosvm, including both cases of
> IOMMU off/on.
>
> Mesa: https://gitlab.freedesktop.org/digetx/mesa/-/commits/virgl-madvise
> IGT: https://gitlab.freedesktop.org/digetx/igt-gpu-tools/-/tree/virtio-madvise
>
> Dmitry Osipenko (5):
> drm/virtio: Correct drm_gem_shmem_get_sg_table() error handling
> drm/virtio: Check whether transferred 2D BO is shmem
> drm/virtio: Unlock GEM reservations in error code path
> drm/virtio: Improve DMA API usage for shmem BOs
> drm/virtio: Add memory shrinker
>
> drivers/gpu/drm/virtio/Makefile | 3 +-
> drivers/gpu/drm/virtio/virtgpu_drv.c | 22 +++-
> drivers/gpu/drm/virtio/virtgpu_drv.h | 31 ++++-
> drivers/gpu/drm/virtio/virtgpu_gem.c | 84 ++++++++++++
> drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c | 124 ++++++++++++++++++
> drivers/gpu/drm/virtio/virtgpu_ioctl.c | 37 ++++++
> drivers/gpu/drm/virtio/virtgpu_kms.c | 17 ++-
> drivers/gpu/drm/virtio/virtgpu_object.c | 63 +++------
> drivers/gpu/drm/virtio/virtgpu_plane.c | 17 ++-
> drivers/gpu/drm/virtio/virtgpu_vq.c | 30 +++--
> include/uapi/drm/virtgpu_drm.h | 14 ++
> 11 files changed, 373 insertions(+), 69 deletions(-)
> create mode 100644 drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c
>
> --
> 2.35.1
>
Unlock reservations in the error code path of virtio_gpu_object_create()
to silence debug warning splat produced by ww_mutex_destroy(&obj->lock)
when GEM is released with the held lock.
Signed-off-by: Dmitry Osipenko <[email protected]>
---
drivers/gpu/drm/virtio/virtgpu_object.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
index bea7806a3ae3..0b8cbb87f8d8 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -250,6 +250,8 @@ int virtio_gpu_object_create(struct virtio_gpu_device *vgdev,
ret = virtio_gpu_object_shmem_init(vgdev, bo, &ents, &nents);
if (ret != 0) {
+ if (fence)
+ virtio_gpu_array_unlock_resv(objs);
virtio_gpu_array_put_free(objs);
virtio_gpu_free_object(&shmem_obj->base);
return ret;
--
2.35.1
On Tue, Mar 8, 2022 at 11:28 AM Dmitry Osipenko
<[email protected]> wrote:
>
> On 3/8/22 19:29, Rob Clark wrote:
> > On Tue, Mar 8, 2022 at 5:17 AM Dmitry Osipenko
> > <[email protected]> wrote:
> >>
> >> Hello,
> >>
> >> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
> >> During OOM, the shrinker will release BOs that are marked as "not needed"
> >> by userspace using the new madvise IOCTL. The userspace in this case is
> >> the Mesa VirGL driver, it will mark the cached BOs as "not needed",
> >> allowing kernel driver to release memory of the cached shmem BOs on lowmem
> >> situations, preventing OOM kills.
> >
> > Will host memory pressure already trigger shrinker in guest?
>
> The host memory pressure won't trigger shrinker in guest here. This
> series will help only with the memory pressure within the guest using a
> usual "virgl context".
>
> Having a host shrinker in a case of "virgl contexts" should be a
> difficult problem to solve.
Hmm, I think we just need the balloon driver to trigger the shrinker
in the guest kernel? I suppose a driver like drm/virtio might want to
differentiate between host and guest pressure (ie. consider only
objects that have host vs guest storage), but even without that,
freeing up memory in the guest when host is under memory pressure
seems worthwhile. Maybe I'm over-simplifying?
> > This is
> > something I'm quite interested in for "virtgpu native contexts" (ie.
> > native guest driver with new context type sitting on top of virtgpu),
>
> In a case of "native contexts" it should be doable, at least I can't see
> any obvious problems. The madvise invocations could be passed to the
> host using a new virtio-gpu command by the guest's madvise IOCTL
> handler, instead-of/in-addition-to handling madvise in the guest's
> kernel, and that's it.
I think we don't want to do that, because MADV:WILLNEED would be by
far the most frequent guest<->host synchronous round trip. So from
that perspective tracking madvise state in guest kernel seems quite
attractive.
If we really can't track madvise state in the guest for dealing with
host memory pressure, I think the better option is to introduce
MADV:WILLNEED_REPLACE, ie. something to tell the host kernel that the
buffer is needed but the previous contents are not (as long as the GPU
VA remains the same). With this the host could allocate new pages if
needed, and the guest would not need to wait for a reply from host.
> > since that isn't using host storage
>
> s/host/guest ?
Yes, sorry, I meant that it is not using guest storage.
BR,
-R
Transferred 2D BO always must be a shmem BO. Add check for that to prevent
NULL dereference if userspace passes a VRAM BO.
Signed-off-by: Dmitry Osipenko <[email protected]>
---
drivers/gpu/drm/virtio/virtgpu_vq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 7c052efe8836..2edf31806b74 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -595,7 +595,7 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev);
struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo);
- if (use_dma_api)
+ if (virtio_gpu_is_shmem(bo) && use_dma_api)
dma_sync_sgtable_for_device(vgdev->vdev->dev.parent,
shmem->pages, DMA_TO_DEVICE);
--
2.35.1
On Tue, Mar 8, 2022 at 5:17 AM Dmitry Osipenko
<[email protected]> wrote:
>
> Add memory shrinker and new madvise IOCTL to the VirtIO-GPU driver.
> Userspace (BO cache manager of Mesa driver) will mark BOs as "don't need"
> using the new IOCTL to let shrinker purge the marked BOs on OOM, thus
> shrinker will lower memory pressure and prevent OOM kills.
>
> Signed-off-by: Daniel Almeida <[email protected]>
> Signed-off-by: Dmitry Osipenko <[email protected]>
> ---
> drivers/gpu/drm/virtio/Makefile | 3 +-
> drivers/gpu/drm/virtio/virtgpu_drv.h | 26 +++-
> drivers/gpu/drm/virtio/virtgpu_gem.c | 84 ++++++++++++
> drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c | 124 ++++++++++++++++++
> drivers/gpu/drm/virtio/virtgpu_ioctl.c | 37 ++++++
> drivers/gpu/drm/virtio/virtgpu_kms.c | 10 ++
> drivers/gpu/drm/virtio/virtgpu_object.c | 7 +
> drivers/gpu/drm/virtio/virtgpu_plane.c | 17 ++-
> drivers/gpu/drm/virtio/virtgpu_vq.c | 15 +++
> include/uapi/drm/virtgpu_drm.h | 14 ++
> 10 files changed, 333 insertions(+), 4 deletions(-)
> create mode 100644 drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c
>
[snip]
> diff --git a/drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c b/drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c
> new file mode 100644
> index 000000000000..39eb9a3e7e4a
> --- /dev/null
> +++ b/drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c
> @@ -0,0 +1,124 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2022 Collabora Ltd.
> + */
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/shmem_fs.h>
> +
> +#include "virtgpu_drv.h"
> +
> +static unsigned long
> +virtio_gpu_gem_shrinker_count_objects(struct shrinker *shrinker,
> + struct shrink_control *sc)
> +{
> + struct drm_gem_shmem_object *shmem;
> + struct virtio_gpu_device *vgdev;
> + unsigned long count = 0;
> + bool empty = true;
> +
> + vgdev = container_of(shrinker, struct virtio_gpu_device,
> + vgshrinker.shrinker);
> +
> + if (!mutex_trylock(&vgdev->mm_lock))
> + return 0;
One bit of advice from previously dealing with shrinker and heavy
memory pressure situations (turns out 4GB chromebooks can be pretty
much under *constant* memory pressure):
You *really* want to make shrinker->count_objects lockless.. and
minimize the lock contention on shrinker->scan_objects (ie. The
problem is you can end up with shrinking going on on all CPU cores in
parallel, you want to not funnel that thru one lock as much as
possible.
See in particular:
25ed38b3ed26 ("drm/msm: Drop mm_lock in scan loop")
cc8a4d5a1bd8 ("drm/msm: Avoid mutex in shrinker_count()")
BR,
-R
> + list_for_each_entry(shmem, &vgdev->vgshrinker.list, madv_list) {
> + empty = false;
> +
> + if (!mutex_trylock(&shmem->pages_lock))
> + continue;
> +
> + if (drm_gem_shmem_is_purgeable(shmem))
> + count += shmem->base.size >> PAGE_SHIFT;
> +
> + mutex_unlock(&shmem->pages_lock);
> + }
> +
> + mutex_unlock(&vgdev->mm_lock);
> +
> + return empty ? SHRINK_EMPTY : count;
> +}
> +
> +static bool virtio_gpu_gem_shrinker_purge(struct virtio_gpu_device *vgdev,
> + struct drm_gem_object *obj)
> +{
> + struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
> + struct drm_gem_shmem_object *shmem = &bo->base;
> + int err;
> +
> + if (!dma_resv_test_signaled(obj->resv, true) ||
> + !drm_gem_shmem_is_purgeable(shmem) ||
> + refcount_read(&bo->pin_count))
> + return false;
> +
> + /*
> + * Release host's memory before guest's memory is gone to ensure that
> + * host won't touch released memory of the guest.
> + */
> + err = virtio_gpu_gem_host_mem_release(bo);
> + if (err)
> + return false;
> +
> + list_del_init(&shmem->madv_list);
> + drm_gem_shmem_purge_locked(shmem);
> +
> + return true;
> +}
> +
> +static unsigned long
> +virtio_gpu_gem_shrinker_scan_objects(struct shrinker *shrinker,
> + struct shrink_control *sc)
> +{
> + struct drm_gem_shmem_object *shmem, *tmp;
> + struct virtio_gpu_device *vgdev;
> + unsigned long freed = 0;
> +
> + vgdev = container_of(shrinker, struct virtio_gpu_device,
> + vgshrinker.shrinker);
> +
> + if (!mutex_trylock(&vgdev->mm_lock))
> + return SHRINK_STOP;
> +
> + list_for_each_entry_safe(shmem, tmp, &vgdev->vgshrinker.list, madv_list) {
> + if (freed >= sc->nr_to_scan)
> + break;
> +
> + if (!dma_resv_trylock(shmem->base.resv))
> + continue;
> +
> + if (!mutex_trylock(&shmem->pages_lock))
> + goto resv_unlock;
> +
> + if (virtio_gpu_gem_shrinker_purge(vgdev, &shmem->base))
> + freed += shmem->base.size >> PAGE_SHIFT;
> +
> + mutex_unlock(&shmem->pages_lock);
> +resv_unlock:
> + dma_resv_unlock(shmem->base.resv);
> + }
> +
> + mutex_unlock(&vgdev->mm_lock);
> +
> + return freed;
> +}
> +
> +int virtio_gpu_gem_shrinker_init(struct virtio_gpu_device *vgdev)
> +{
> + struct shrinker *shrinker = &vgdev->vgshrinker.shrinker;
> +
> + shrinker->count_objects = virtio_gpu_gem_shrinker_count_objects;
> + shrinker->scan_objects = virtio_gpu_gem_shrinker_scan_objects;
> + shrinker->seeks = DEFAULT_SEEKS;
> +
> + INIT_LIST_HEAD(&vgdev->vgshrinker.list);
> +
> + return register_shrinker(shrinker);
> +}
> +
> +void virtio_gpu_gem_shrinker_fini(struct virtio_gpu_device *vgdev)
> +{
> + struct shrinker *shrinker = &vgdev->vgshrinker.shrinker;
> +
> + unregister_shrinker(shrinker);
> +}
On Tue, Mar 8, 2022 at 3:36 PM Dmitry Osipenko
<[email protected]> wrote:
>
> On 3/9/22 01:24, Rob Clark wrote:
> > On Tue, Mar 8, 2022 at 11:28 AM Dmitry Osipenko
> > <[email protected]> wrote:
> >>
> >> On 3/8/22 19:29, Rob Clark wrote:
> >>> On Tue, Mar 8, 2022 at 5:17 AM Dmitry Osipenko
> >>> <[email protected]> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
> >>>> During OOM, the shrinker will release BOs that are marked as "not needed"
> >>>> by userspace using the new madvise IOCTL. The userspace in this case is
> >>>> the Mesa VirGL driver, it will mark the cached BOs as "not needed",
> >>>> allowing kernel driver to release memory of the cached shmem BOs on lowmem
> >>>> situations, preventing OOM kills.
> >>>
> >>> Will host memory pressure already trigger shrinker in guest?
> >>
> >> The host memory pressure won't trigger shrinker in guest here. This
> >> series will help only with the memory pressure within the guest using a
> >> usual "virgl context".
> >>
> >> Having a host shrinker in a case of "virgl contexts" should be a
> >> difficult problem to solve.
> >
> > Hmm, I think we just need the balloon driver to trigger the shrinker
> > in the guest kernel? I suppose a driver like drm/virtio might want to
> > differentiate between host and guest pressure (ie. consider only
> > objects that have host vs guest storage), but even without that,
> > freeing up memory in the guest when host is under memory pressure
> > seems worthwhile. Maybe I'm over-simplifying?
>
> Might be the opposite, i.e. me over-complicating :) The variant with
> memory ballooning actually could be good and will work for all kinds of
> virtio contexts universally. There will be some back-n-forth between
> host and guest, but perhaps it will work okay. Thank you for the suggestion.
>
> >>> This is
> >>> something I'm quite interested in for "virtgpu native contexts" (ie.
> >>> native guest driver with new context type sitting on top of virtgpu),
> >>
> >> In a case of "native contexts" it should be doable, at least I can't see
> >> any obvious problems. The madvise invocations could be passed to the
> >> host using a new virtio-gpu command by the guest's madvise IOCTL
> >> handler, instead-of/in-addition-to handling madvise in the guest's
> >> kernel, and that's it.
> >
> > I think we don't want to do that, because MADV:WILLNEED would be by
> > far the most frequent guest<->host synchronous round trip. So from
> > that perspective tracking madvise state in guest kernel seems quite
> > attractive.
>
> This is a valid concern. I'd assume that the overhead should be
> tolerable, but I don't have any actual perf numbers.
jfwiw, MADV:WILLNEED is a *very* hot path for gl drivers, based on
some measurements I did a while back with various apps/benchmarks..
easily more than 10x the next most frequent ioctl (for MADV:WONTNEED
and MADV:WILLNEED each, so more than 20x combined.. but MADV:WONTNEED
can be async).
But if the balloon triggering shrinker approach works out, that would
be pretty great.. it seems like the easy option and doesn't require
adding new host kernel uabi :-)
BR,
-R
> > If we really can't track madvise state in the guest for dealing with
> > host memory pressure, I think the better option is to introduce
> > MADV:WILLNEED_REPLACE, ie. something to tell the host kernel that the
> > buffer is needed but the previous contents are not (as long as the GPU
> > VA remains the same). With this the host could allocate new pages if
> > needed, and the guest would not need to wait for a reply from host.
>
> If variant with the memory ballooning will work, then it will be
> possible to track the state within guest-only. Let's consider the
> simplest variant for now.
>
> I'll try to implement the balloon driver support in the v2 and will get
> back to you.
>
> >>> since that isn't using host storage
> >>
> >> s/host/guest ?
> >
> > Yes, sorry, I meant that it is not using guest storage.
>
> Thank you for the clarification.
On 3/9/22 01:24, Rob Clark wrote:
> On Tue, Mar 8, 2022 at 11:28 AM Dmitry Osipenko
> <[email protected]> wrote:
>>
>> On 3/8/22 19:29, Rob Clark wrote:
>>> On Tue, Mar 8, 2022 at 5:17 AM Dmitry Osipenko
>>> <[email protected]> wrote:
>>>>
>>>> Hello,
>>>>
>>>> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
>>>> During OOM, the shrinker will release BOs that are marked as "not needed"
>>>> by userspace using the new madvise IOCTL. The userspace in this case is
>>>> the Mesa VirGL driver, it will mark the cached BOs as "not needed",
>>>> allowing kernel driver to release memory of the cached shmem BOs on lowmem
>>>> situations, preventing OOM kills.
>>>
>>> Will host memory pressure already trigger shrinker in guest?
>>
>> The host memory pressure won't trigger shrinker in guest here. This
>> series will help only with the memory pressure within the guest using a
>> usual "virgl context".
>>
>> Having a host shrinker in a case of "virgl contexts" should be a
>> difficult problem to solve.
>
> Hmm, I think we just need the balloon driver to trigger the shrinker
> in the guest kernel? I suppose a driver like drm/virtio might want to
> differentiate between host and guest pressure (ie. consider only
> objects that have host vs guest storage), but even without that,
> freeing up memory in the guest when host is under memory pressure
> seems worthwhile. Maybe I'm over-simplifying?
Might be the opposite, i.e. me over-complicating :) The variant with
memory ballooning actually could be good and will work for all kinds of
virtio contexts universally. There will be some back-n-forth between
host and guest, but perhaps it will work okay. Thank you for the suggestion.
>>> This is
>>> something I'm quite interested in for "virtgpu native contexts" (ie.
>>> native guest driver with new context type sitting on top of virtgpu),
>>
>> In a case of "native contexts" it should be doable, at least I can't see
>> any obvious problems. The madvise invocations could be passed to the
>> host using a new virtio-gpu command by the guest's madvise IOCTL
>> handler, instead-of/in-addition-to handling madvise in the guest's
>> kernel, and that's it.
>
> I think we don't want to do that, because MADV:WILLNEED would be by
> far the most frequent guest<->host synchronous round trip. So from
> that perspective tracking madvise state in guest kernel seems quite
> attractive.
This is a valid concern. I'd assume that the overhead should be
tolerable, but I don't have any actual perf numbers.
> If we really can't track madvise state in the guest for dealing with
> host memory pressure, I think the better option is to introduce
> MADV:WILLNEED_REPLACE, ie. something to tell the host kernel that the
> buffer is needed but the previous contents are not (as long as the GPU
> VA remains the same). With this the host could allocate new pages if
> needed, and the guest would not need to wait for a reply from host.
If variant with the memory ballooning will work, then it will be
possible to track the state within guest-only. Let's consider the
simplest variant for now.
I'll try to implement the balloon driver support in the v2 and will get
back to you.
>>> since that isn't using host storage
>>
>> s/host/guest ?
>
> Yes, sorry, I meant that it is not using guest storage.
Thank you for the clarification.
Hi
Am 08.03.22 um 14:17 schrieb Dmitry Osipenko:
> Hello,
>
> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
> During OOM, the shrinker will release BOs that are marked as "not needed"
> by userspace using the new madvise IOCTL. The userspace in this case is
> the Mesa VirGL driver, it will mark the cached BOs as "not needed",
> allowing kernel driver to release memory of the cached shmem BOs on lowmem
> situations, preventing OOM kills.
Virtio-gpu is build on top of GEM shmem helpers. I have a prototype
patchset that adds a shrinker to these helpers. If you want to go
further, you could implement something like that instead. Panfrost and
lima also have their own shrinker and could certainly be converted to
the gem-shmem shrinker.
Best regards
Thomas
>
> This patchset includes couple fixes for problems I found while was working
> on the shrinker, it also includes prerequisite DMA API usage improvement
> needed by the shrinker.
>
> The Mesa and IGT patches will be kept on hold until this kernel series
> will be approved and applied.
>
> This patchset was tested using Qemu and crosvm, including both cases of
> IOMMU off/on.
>
> Mesa: https://gitlab.freedesktop.org/digetx/mesa/-/commits/virgl-madvise
> IGT: https://gitlab.freedesktop.org/digetx/igt-gpu-tools/-/tree/virtio-madvise
>
> Dmitry Osipenko (5):
> drm/virtio: Correct drm_gem_shmem_get_sg_table() error handling
> drm/virtio: Check whether transferred 2D BO is shmem
> drm/virtio: Unlock GEM reservations in error code path
> drm/virtio: Improve DMA API usage for shmem BOs
> drm/virtio: Add memory shrinker
>
> drivers/gpu/drm/virtio/Makefile | 3 +-
> drivers/gpu/drm/virtio/virtgpu_drv.c | 22 +++-
> drivers/gpu/drm/virtio/virtgpu_drv.h | 31 ++++-
> drivers/gpu/drm/virtio/virtgpu_gem.c | 84 ++++++++++++
> drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c | 124 ++++++++++++++++++
> drivers/gpu/drm/virtio/virtgpu_ioctl.c | 37 ++++++
> drivers/gpu/drm/virtio/virtgpu_kms.c | 17 ++-
> drivers/gpu/drm/virtio/virtgpu_object.c | 63 +++------
> drivers/gpu/drm/virtio/virtgpu_plane.c | 17 ++-
> drivers/gpu/drm/virtio/virtgpu_vq.c | 30 +++--
> include/uapi/drm/virtgpu_drm.h | 14 ++
> 11 files changed, 373 insertions(+), 69 deletions(-)
> create mode 100644 drivers/gpu/drm/virtio/virtgpu_gem_shrinker.c
>
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
On 3/9/22 04:12, Rob Clark wrote:
>> +static unsigned long
>> +virtio_gpu_gem_shrinker_count_objects(struct shrinker *shrinker,
>> + struct shrink_control *sc)
>> +{
>> + struct drm_gem_shmem_object *shmem;
>> + struct virtio_gpu_device *vgdev;
>> + unsigned long count = 0;
>> + bool empty = true;
>> +
>> + vgdev = container_of(shrinker, struct virtio_gpu_device,
>> + vgshrinker.shrinker);
>> +
>> + if (!mutex_trylock(&vgdev->mm_lock))
>> + return 0;
> One bit of advice from previously dealing with shrinker and heavy
> memory pressure situations (turns out 4GB chromebooks can be pretty
> much under *constant* memory pressure):
>
> You *really* want to make shrinker->count_objects lockless.. and
> minimize the lock contention on shrinker->scan_objects (ie. The
> problem is you can end up with shrinking going on on all CPU cores in
> parallel, you want to not funnel that thru one lock as much as
> possible.
>
> See in particular:
>
> 25ed38b3ed26 ("drm/msm: Drop mm_lock in scan loop")
> cc8a4d5a1bd8 ("drm/msm: Avoid mutex in shrinker_count()")
Thank you, I'll take that into account for v2.
Hello,
On 3/9/22 11:59, Thomas Zimmermann wrote:
> Hi
>
> Am 08.03.22 um 14:17 schrieb Dmitry Osipenko:
>> Hello,
>>
>> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
>> During OOM, the shrinker will release BOs that are marked as "not needed"
>> by userspace using the new madvise IOCTL. The userspace in this case is
>> the Mesa VirGL driver, it will mark the cached BOs as "not needed",
>> allowing kernel driver to release memory of the cached shmem BOs on
>> lowmem
>> situations, preventing OOM kills.
>
> Virtio-gpu is build on top of GEM shmem helpers. I have a prototype
> patchset that adds a shrinker to these helpers. If you want to go
> further, you could implement something like that instead. Panfrost and
> lima also have their own shrinker and could certainly be converted to
> the gem-shmem shrinker.
I had a thought that it could be possible to unify shrinkers into a
common DRM framework. Could you please give me a link to yours prototype
patchset?
On 3/9/22 22:28, Thomas Zimmermann wrote:
> Hi
>
> Am 09.03.22 um 12:55 schrieb Dmitry Osipenko:
>> Hello,
>>
>> On 3/9/22 11:59, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> Am 08.03.22 um 14:17 schrieb Dmitry Osipenko:
>>>> Hello,
>>>>
>>>> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
>>>> During OOM, the shrinker will release BOs that are marked as "not
>>>> needed"
>>>> by userspace using the new madvise IOCTL. The userspace in this case is
>>>> the Mesa VirGL driver, it will mark the cached BOs as "not needed",
>>>> allowing kernel driver to release memory of the cached shmem BOs on
>>>> lowmem
>>>> situations, preventing OOM kills.
>>>
>>> Virtio-gpu is build on top of GEM shmem helpers. I have a prototype
>>> patchset that adds a shrinker to these helpers. If you want to go
>>> further, you could implement something like that instead. Panfrost and
>>> lima also have their own shrinker and could certainly be converted to
>>> the gem-shmem shrinker.
>>
>> I had a thought that it could be possible to unify shrinkers into a
>> common DRM framework. Could you please give me a link to yours prototype
>> patchset?
>
> I uploaded the patches to
>
>
> https://gitlab.freedesktop.org/tzimmermann/linux/-/commits/gem-shmem-cached-mappings
>
>
> it's incomplete and un-debugged, but it shows what needs to be done. It
> has the infrastructure, but lacks the changes to the GEM shmem code.
>
> The reason for this work is to keep GEM shmem pages mapped and allocated
> even while the BO is neither mapped nor pinned. As it is now, GEM SHMEM
> creates and releases pages on each pin and unpin, and maps and unmaps
> memory ranges on each vmap and vunmap. It's all wasteful. Only the
> first pin and vmap calls should establish pages and mappings and only
> the purge and free functions should release them.
Hm, aren't maps and pins already refcounted?
> The patchset adds new helpers for BO purging to struct
> drm_gem_object_funcs. With this, I think it might be possible to have
> one global DRM shrinker and let it handle all BOs; independent of each
> BO's memory manager.
Thank you, I'll give it a try.
Hi
Am 09.03.22 um 12:55 schrieb Dmitry Osipenko:
> Hello,
>
> On 3/9/22 11:59, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 08.03.22 um 14:17 schrieb Dmitry Osipenko:
>>> Hello,
>>>
>>> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver.
>>> During OOM, the shrinker will release BOs that are marked as "not needed"
>>> by userspace using the new madvise IOCTL. The userspace in this case is
>>> the Mesa VirGL driver, it will mark the cached BOs as "not needed",
>>> allowing kernel driver to release memory of the cached shmem BOs on
>>> lowmem
>>> situations, preventing OOM kills.
>>
>> Virtio-gpu is build on top of GEM shmem helpers. I have a prototype
>> patchset that adds a shrinker to these helpers. If you want to go
>> further, you could implement something like that instead. Panfrost and
>> lima also have their own shrinker and could certainly be converted to
>> the gem-shmem shrinker.
>
> I had a thought that it could be possible to unify shrinkers into a
> common DRM framework. Could you please give me a link to yours prototype
> patchset?
I uploaded the patches to
https://gitlab.freedesktop.org/tzimmermann/linux/-/commits/gem-shmem-cached-mappings
it's incomplete and un-debugged, but it shows what needs to be done. It
has the infrastructure, but lacks the changes to the GEM shmem code.
The reason for this work is to keep GEM shmem pages mapped and allocated
even while the BO is neither mapped nor pinned. As it is now, GEM SHMEM
creates and releases pages on each pin and unpin, and maps and unmaps
memory ranges on each vmap and vunmap. It's all wasteful. Only the
first pin and vmap calls should establish pages and mappings and only
the purge and free functions should release them.
The patchset adds new helpers for BO purging to struct
drm_gem_object_funcs. With this, I think it might be possible to have
one global DRM shrinker and let it handle all BOs; independent of each
BO's memory manager.
Best regards
Thomas
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
On 3/10/22 00:51, Rob Clark wrote:
> On Wed, Mar 9, 2022 at 12:06 PM Dmitry Osipenko
> <[email protected]> wrote:
>>
>> On 3/9/22 03:56, Rob Clark wrote:
>>>> If we really can't track madvise state in the guest for dealing with
>>>> host memory pressure, I think the better option is to introduce
>>>> MADV:WILLNEED_REPLACE, ie. something to tell the host kernel that the
>>>> buffer is needed but the previous contents are not (as long as the GPU
>>>> VA remains the same). With this the host could allocate new pages if
>>>> needed, and the guest would not need to wait for a reply from host.
>>> If variant with the memory ballooning will work, then it will be
>>> possible to track the state within guest-only. Let's consider the
>>> simplest variant for now.
>>>
>>> I'll try to implement the balloon driver support in the v2 and will get
>>> back to you.
>>>
>>
>> I looked at the generic balloon driver and looks like this not what we
>> want because:
>>
>> 1. Memory ballooning is primarily about handling memory overcommit
>> situations. I.e. when there are multiple VMs consuming more memory than
>> available in the system. Ballooning allows host to ask guest to give
>> unused pages back to host and host could give pages to other VMs.
>>
>> 2. Memory ballooning operates with guest memory pages only. I.e. each
>> ballooned page is reported to/from host in a form of page's DMA address.
>>
>> 3. There is no direct connection between host's OOM events and the
>> balloon manager. I guess host could watch system's memory pressure and
>> inflate VMs' balloons on low memory, releasing the guest's memory to the
>> system, but apparently this use-case not supported by anyone today, at
>> least I don't see Qemu supporting it.
>>
>
> hmm, on CrOS I do see balloon getting used to balance host vs guest
> memory.. but admittedly I've not yet looked closely at how that works,
> and it does seem like we have some things that are not yet upstream
> all over the place (not to mention crosvm vs qemu)
That's interesting, I missed that CrOS uses a customized ballooning.
>> So the virtio-balloon driver isn't very useful for us as-is.
>>
>> One possible solution could be to create something like a new
>> virtio-shrinker device or add shrinker functionality to the virtio-gpu
>> device, allowing host to ask guests to drop shared caches. Host then
>> should become a PSI handler. I think this should be doable in a case of
>> crosvm. In a case of GNU world, it could take a lot of effort to get
>> everything to upstreamable state, at first there is a need to
>> demonstrate real problem being solved by this solution.
>
> I guess with 4GB chromebooks running one or more VMs in addition to
> lots of browser tabs in the host, it shouldn't be too hard to
> demonstrate a problem ;-)
But then guest also should have a significant amount of BOs in cache to
purge, which potentially could be solved using a timer approach.
> (but also, however we end up solving that, certainly shouldn't block
> this series)
Sure, there is no problem with extending shrinker functionality with
more features later on, so the host's shrinker isn't a blocker.
>> The other minor issue is that only integrated GPUs may use system's
>> memory and even then they could use a dedicated memory carveout, i.e.
>> releasing VRAM BOs may not help with host's OOM. In case of virgl
>> context we have no clue about where buffers are physically located. On
>> the other hand, in the worst case dropping host caches just won't help
>> with OOM.
>
> Userspace should know whether the BO has CPU storage, so I don't think
> this should be a problem virtio_gpu needs to worry about
>
>> It's now unclear how we should proceed with the host-side shrinker
>> support. Thoughts?
>>
>> We may start easy and instead of thinking about host-side shrinker, we
>> could make VirtIO-GPU driver to expire cached BOs after a certain
>> timeout. Mesa already uses timeout-based BO caching, but it doesn't have
>> an alarm timer and simply checks expiration when BO is allocated. Should
>> be too much trouble to handle timers within Mesa since it's executed in
>> application context, easier to do it in kernel, like VC4 driver does it
>> for example. This is not good as a proper memory shrinker, but could be
>> good enough in practice.
>
> I think that, given virgl uses host storage, guest shrinker should be
> still useful.. so I think continue with this series.
Guest shrinker indeed will be useful for virgl today. I was already
questioning why virgl needs both host and guest storages, maybe it will
move to a host-only storage approach in the future.
I think the variant with the timer expiration actually could be
interesting to try because it should allow host to purge its VM BOs by
itself, preventing host OOMs.
> For host shrinker, I'll have to look more at when crosvm triggers
> balloon inflation. I could still go the MADV:WILLNEED_REPLACE
> approach instead, which does have the advantage of host kernel not
> relying on host userspace or vm having a chance to run in order to
> release pages. The downside (perhaps?) is it would be more specific
> to virtgpu-native-context and less so to virgl or venus (but I guess
> there doesn't currently exist a way for madvise to be useful for vk
> drivers).
I'll also take a look at what CrOS does, maybe it has some interesting
ideas.
On 3/9/22 03:56, Rob Clark wrote:
>> If we really can't track madvise state in the guest for dealing with
>> host memory pressure, I think the better option is to introduce
>> MADV:WILLNEED_REPLACE, ie. something to tell the host kernel that the
>> buffer is needed but the previous contents are not (as long as the GPU
>> VA remains the same). With this the host could allocate new pages if
>> needed, and the guest would not need to wait for a reply from host.
> If variant with the memory ballooning will work, then it will be
> possible to track the state within guest-only. Let's consider the
> simplest variant for now.
>
> I'll try to implement the balloon driver support in the v2 and will get
> back to you.
>
I looked at the generic balloon driver and looks like this not what we
want because:
1. Memory ballooning is primarily about handling memory overcommit
situations. I.e. when there are multiple VMs consuming more memory than
available in the system. Ballooning allows host to ask guest to give
unused pages back to host and host could give pages to other VMs.
2. Memory ballooning operates with guest memory pages only. I.e. each
ballooned page is reported to/from host in a form of page's DMA address.
3. There is no direct connection between host's OOM events and the
balloon manager. I guess host could watch system's memory pressure and
inflate VMs' balloons on low memory, releasing the guest's memory to the
system, but apparently this use-case not supported by anyone today, at
least I don't see Qemu supporting it.
So the virtio-balloon driver isn't very useful for us as-is.
One possible solution could be to create something like a new
virtio-shrinker device or add shrinker functionality to the virtio-gpu
device, allowing host to ask guests to drop shared caches. Host then
should become a PSI handler. I think this should be doable in a case of
crosvm. In a case of GNU world, it could take a lot of effort to get
everything to upstreamable state, at first there is a need to
demonstrate real problem being solved by this solution.
The other minor issue is that only integrated GPUs may use system's
memory and even then they could use a dedicated memory carveout, i.e.
releasing VRAM BOs may not help with host's OOM. In case of virgl
context we have no clue about where buffers are physically located. On
the other hand, in the worst case dropping host caches just won't help
with OOM.
It's now unclear how we should proceed with the host-side shrinker
support. Thoughts?
We may start easy and instead of thinking about host-side shrinker, we
could make VirtIO-GPU driver to expire cached BOs after a certain
timeout. Mesa already uses timeout-based BO caching, but it doesn't have
an alarm timer and simply checks expiration when BO is allocated. Should
be too much trouble to handle timers within Mesa since it's executed in
application context, easier to do it in kernel, like VC4 driver does it
for example. This is not good as a proper memory shrinker, but could be
good enough in practice.
On Wed, Mar 9, 2022 at 12:06 PM Dmitry Osipenko
<[email protected]> wrote:
>
> On 3/9/22 03:56, Rob Clark wrote:
> >> If we really can't track madvise state in the guest for dealing with
> >> host memory pressure, I think the better option is to introduce
> >> MADV:WILLNEED_REPLACE, ie. something to tell the host kernel that the
> >> buffer is needed but the previous contents are not (as long as the GPU
> >> VA remains the same). With this the host could allocate new pages if
> >> needed, and the guest would not need to wait for a reply from host.
> > If variant with the memory ballooning will work, then it will be
> > possible to track the state within guest-only. Let's consider the
> > simplest variant for now.
> >
> > I'll try to implement the balloon driver support in the v2 and will get
> > back to you.
> >
>
> I looked at the generic balloon driver and looks like this not what we
> want because:
>
> 1. Memory ballooning is primarily about handling memory overcommit
> situations. I.e. when there are multiple VMs consuming more memory than
> available in the system. Ballooning allows host to ask guest to give
> unused pages back to host and host could give pages to other VMs.
>
> 2. Memory ballooning operates with guest memory pages only. I.e. each
> ballooned page is reported to/from host in a form of page's DMA address.
>
> 3. There is no direct connection between host's OOM events and the
> balloon manager. I guess host could watch system's memory pressure and
> inflate VMs' balloons on low memory, releasing the guest's memory to the
> system, but apparently this use-case not supported by anyone today, at
> least I don't see Qemu supporting it.
>
hmm, on CrOS I do see balloon getting used to balance host vs guest
memory.. but admittedly I've not yet looked closely at how that works,
and it does seem like we have some things that are not yet upstream
all over the place (not to mention crosvm vs qemu)
>
> So the virtio-balloon driver isn't very useful for us as-is.
>
> One possible solution could be to create something like a new
> virtio-shrinker device or add shrinker functionality to the virtio-gpu
> device, allowing host to ask guests to drop shared caches. Host then
> should become a PSI handler. I think this should be doable in a case of
> crosvm. In a case of GNU world, it could take a lot of effort to get
> everything to upstreamable state, at first there is a need to
> demonstrate real problem being solved by this solution.
I guess with 4GB chromebooks running one or more VMs in addition to
lots of browser tabs in the host, it shouldn't be too hard to
demonstrate a problem ;-)
(but also, however we end up solving that, certainly shouldn't block
this series)
> The other minor issue is that only integrated GPUs may use system's
> memory and even then they could use a dedicated memory carveout, i.e.
> releasing VRAM BOs may not help with host's OOM. In case of virgl
> context we have no clue about where buffers are physically located. On
> the other hand, in the worst case dropping host caches just won't help
> with OOM.
Userspace should know whether the BO has CPU storage, so I don't think
this should be a problem virtio_gpu needs to worry about
> It's now unclear how we should proceed with the host-side shrinker
> support. Thoughts?
>
> We may start easy and instead of thinking about host-side shrinker, we
> could make VirtIO-GPU driver to expire cached BOs after a certain
> timeout. Mesa already uses timeout-based BO caching, but it doesn't have
> an alarm timer and simply checks expiration when BO is allocated. Should
> be too much trouble to handle timers within Mesa since it's executed in
> application context, easier to do it in kernel, like VC4 driver does it
> for example. This is not good as a proper memory shrinker, but could be
> good enough in practice.
I think that, given virgl uses host storage, guest shrinker should be
still useful.. so I think continue with this series.
For host shrinker, I'll have to look more at when crosvm triggers
balloon inflation. I could still go the MADV:WILLNEED_REPLACE
approach instead, which does have the advantage of host kernel not
relying on host userspace or vm having a chance to run in order to
release pages. The downside (perhaps?) is it would be more specific
to virtgpu-native-context and less so to virgl or venus (but I guess
there doesn't currently exist a way for madvise to be useful for vk
drivers).
BR,
-R
Hi
Am 09.03.22 um 23:25 schrieb Dmitry Osipenko:
>>
>> The reason for this work is to keep GEM shmem pages mapped and allocated
>> even while the BO is neither mapped nor pinned. As it is now, GEM SHMEM
>> creates and releases pages on each pin and unpin, and maps and unmaps
>> memory ranges on each vmap and vunmap. It's all wasteful. Only the
>> first pin and vmap calls should establish pages and mappings and only
>> the purge and free functions should release them.
>
> Hm, aren't maps and pins already refcounted?
They are. But even when the refcounter reaches 0 on deref, there's no
need to remove the mapping or free the memory pages. We can keep them
around for the next ref operation. Only the shrinker's purge or freeing
the object has to do such clean-up operations. Such behavior is
supported by TTM and we already use it in VRAM helpers as well.
Best regards
Thomas
>
>> The patchset adds new helpers for BO purging to struct
>> drm_gem_object_funcs. With this, I think it might be possible to have
>> one global DRM shrinker and let it handle all BOs; independent of each
>> BO's memory manager.
>
> Thank you, I'll give it a try.
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
On 3/10/22 22:02, Thomas Zimmermann wrote:
> Hi
>
> Am 09.03.22 um 23:25 schrieb Dmitry Osipenko:
>>>
>>> The reason for this work is to keep GEM shmem pages mapped and allocated
>>> even while the BO is neither mapped nor pinned. As it is now, GEM SHMEM
>>> creates and releases pages on each pin and unpin, and maps and unmaps
>>> memory ranges on each vmap and vunmap. It's all wasteful. Only the
>>> first pin and vmap calls should establish pages and mappings and only
>>> the purge and free functions should release them.
>>
>> Hm, aren't maps and pins already refcounted?
>
> They are. But even when the refcounter reaches 0 on deref, there's no
> need to remove the mapping or free the memory pages. We can keep them
> around for the next ref operation. Only the shrinker's purge or freeing
> the object has to do such clean-up operations. Such behavior is
> supported by TTM and we already use it in VRAM helpers as well.
When driver won't want to pin BO at the creation time? Looks like all
shmem users do this today, and thus, pages shouldn't be freed until BO
is destroyed or purged.
On 3/10/22 01:43, Dmitry Osipenko wrote:
>> I think that, given virgl uses host storage, guest shrinker should be
>> still useful.. so I think continue with this series.
> Guest shrinker indeed will be useful for virgl today. I was already
> questioning why virgl needs both host and guest storages, maybe it will
> move to a host-only storage approach in the future.
>
> I think the variant with the timer expiration actually could be
> interesting to try because it should allow host to purge its VM BOs by
> itself, preventing host OOMs.
While I was working on shrinker v2, I noticed that host-side allocation
may take a significant time. So I decided to defer implementation of my
idea of using timer-based expiration for host-only BOs. I'll need to
examine it more.
>> For host shrinker, I'll have to look more at when crosvm triggers
>> balloon inflation. I could still go the MADV:WILLNEED_REPLACE
>> approach instead, which does have the advantage of host kernel not
>> relying on host userspace or vm having a chance to run in order to
>> release pages. The downside (perhaps?) is it would be more specific
>> to virtgpu-native-context and less so to virgl or venus (but I guess
>> there doesn't currently exist a way for madvise to be useful for vk
>> drivers).
> I'll also take a look at what CrOS does, maybe it has some interesting
> ideas.
I looked at CrOS kernel and crosvm, and haven't noticed anything special
there in regards to purging GPU's memory of VM on host-side memory
pressure. If you'll find something, then please let me know.
I sent out v2 of the shrinker series, but missed to CC you on it by
accident, please find it here [1].
[1]
https://lore.kernel.org/dri-devel/[email protected]/T/#t