2022-03-01 20:18:57

by Iouri Tarassov

[permalink] [raw]
Subject: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

- Create skeleton and add basic functionality for the
hyper-v compute device driver (dxgkrnl).

- Register for PCI and VM bus driver notifications and
handle initialization of VM bus channels.

- Connect the dxgkrnl module to the drivers/hv/ Makefile and Kconfig

- Create a MAINTAINERS entry

A VM bus channel is a communication interface between the hyper-v guest
and the host. The are two type of VM bus channels, used in the driver:
- the global channel
- per virtual compute device channel

A PCI device is created for each virtual compute device, projected
by the host. The device vendor is PCI_VENDOR_ID_MICROSOFT and device
id is PCI_DEVICE_ID_VIRTUAL_RENDER. dxg_pci_probe_device handles
arrival of such devices. The PCI config space of the virtual compute
device has luid of the corresponding virtual compute device VM
bus channel. This is how the compute device adapter objects are
linked to VM bus channels.

VM bus interface version is exchanged by reading/writing the PCI config
space of the virtual compute device.

The IO space is used to handle CPU accessible compute device
allocations. Hyper-v allocates IO space for the global VM bus channel.

Signed-off-by: Iouri Tarassov <[email protected]>
---
MAINTAINERS | 7 +
drivers/hv/Kconfig | 2 +
drivers/hv/Makefile | 1 +
drivers/hv/dxgkrnl/Kconfig | 26 ++
drivers/hv/dxgkrnl/Makefile | 5 +
drivers/hv/dxgkrnl/dxgkrnl.h | 119 ++++++++
drivers/hv/dxgkrnl/dxgmodule.c | 531 +++++++++++++++++++++++++++++++++
include/uapi/misc/d3dkmthk.h | 23 ++
8 files changed, 714 insertions(+)
create mode 100644 drivers/hv/dxgkrnl/Kconfig
create mode 100644 drivers/hv/dxgkrnl/Makefile
create mode 100644 drivers/hv/dxgkrnl/dxgkrnl.h
create mode 100644 drivers/hv/dxgkrnl/dxgmodule.c
create mode 100644 include/uapi/misc/d3dkmthk.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a2bd991db512..5856e09d834c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8841,6 +8841,13 @@ F: Documentation/devicetree/bindings/mtd/ti,am654-hbmc.yaml
F: drivers/mtd/hyperbus/
F: include/linux/mtd/hyperbus.h

+Hyper-V vGPU DRIVER
+M: Iouri Tarassov <[email protected]>
+L: [email protected]
+S: Supported
+F: drivers/hv/dxgkrnl/
+F: include/uapi/misc/d3dkmthk.h
+
HYPERVISOR VIRTUAL CONSOLE DRIVER
L: [email protected]
S: Odd Fixes
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index dd12af20e467..7006f7b66200 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -29,4 +29,6 @@ config HYPERV_BALLOON
help
Select this option to enable Hyper-V Balloon driver.

+source "drivers/hv/dxgkrnl/Kconfig"
+
endmenu
diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
index d76df5c8c2a9..aa1cbdb5d0d2 100644
--- a/drivers/hv/Makefile
+++ b/drivers/hv/Makefile
@@ -2,6 +2,7 @@
obj-$(CONFIG_HYPERV) += hv_vmbus.o
obj-$(CONFIG_HYPERV_UTILS) += hv_utils.o
obj-$(CONFIG_HYPERV_BALLOON) += hv_balloon.o
+obj-$(CONFIG_DXGKRNL) += dxgkrnl/

CFLAGS_hv_trace.o = -I$(src)
CFLAGS_hv_balloon.o = -I$(src)
diff --git a/drivers/hv/dxgkrnl/Kconfig b/drivers/hv/dxgkrnl/Kconfig
new file mode 100644
index 000000000000..bcd92bbff939
--- /dev/null
+++ b/drivers/hv/dxgkrnl/Kconfig
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: GPL-2.0
+# Configuration for the hyper-v virtual compute driver (dxgkrnl)
+#
+
+config DXGKRNL
+ tristate "Microsoft Paravirtualized GPU support"
+ depends on HYPERV
+ depends on 64BIT || COMPILE_TEST
+ help
+ This driver supports paravirtualized virtual compute devices, exposed
+ by Microsoft Hyper-V when Linux is running inside of a virtual machine
+ hosted by Windows. The virtual machines needs to be configured to use
+ host compute adapters. The driver name is dxgkrnl.
+
+ An example of such virtual machine is a Windows Subsystem for
+ Linux container. When such container is instantiated, the Windows host
+ assigns compatible host GPU adapters to the container. The corresponding
+ virtual GPU devices appear on the PCI bus in the container. These
+ devices are enumerated and accessed by this driver.
+
+ Communications with the driver are done by using the Microsoft libdxcore
+ library, which translates the D3DKMT interface
+ <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/d3dkmthk/>
+ to the driver IOCTLs. The virtual GPU devices are paravirtualized,
+ which means that access to the hardware is done in the host. The driver
+ communicates with the host using Hyper-V VM bus communication channels.
diff --git a/drivers/hv/dxgkrnl/Makefile b/drivers/hv/dxgkrnl/Makefile
new file mode 100644
index 000000000000..052f8bd62ad3
--- /dev/null
+++ b/drivers/hv/dxgkrnl/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for the hyper-v compute device driver (dxgkrnl).
+
+obj-$(CONFIG_DXGKRNL) += dxgkrnl.o
+dxgkrnl-y := dxgmodule.o
diff --git a/drivers/hv/dxgkrnl/dxgkrnl.h b/drivers/hv/dxgkrnl/dxgkrnl.h
new file mode 100644
index 000000000000..6b3e9334bce8
--- /dev/null
+++ b/drivers/hv/dxgkrnl/dxgkrnl.h
@@ -0,0 +1,119 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2019, Microsoft Corporation.
+ *
+ * Author:
+ * Iouri Tarassov <[email protected]>
+ *
+ * Dxgkrnl Graphics Driver
+ * Headers for internal objects
+ *
+ */
+
+#ifndef _DXGKRNL_H
+#define _DXGKRNL_H
+
+#include <linux/uuid.h>
+#include <linux/kernel.h>
+#include <linux/mutex.h>
+#include <linux/semaphore.h>
+#include <linux/refcount.h>
+#include <linux/rwsem.h>
+#include <linux/atomic.h>
+#include <linux/spinlock.h>
+#include <linux/gfp.h>
+#include <linux/miscdevice.h>
+#include <linux/pci.h>
+#include <linux/hyperv.h>
+
+struct dxgadapter;
+
+#include <uapi/misc/d3dkmthk.h>
+
+struct dxgvmbuschannel {
+ struct vmbus_channel *channel;
+ struct hv_device *hdev;
+ spinlock_t packet_list_mutex;
+ struct list_head packet_list_head;
+ struct kmem_cache *packet_cache;
+ atomic64_t packet_request_id;
+};
+
+int dxgvmbuschannel_init(struct dxgvmbuschannel *ch, struct hv_device *hdev);
+void dxgvmbuschannel_destroy(struct dxgvmbuschannel *ch);
+void dxgvmbuschannel_receive(void *ctx);
+
+/*
+ * The structure defines an offered vGPU vm bus channel.
+ */
+struct dxgvgpuchannel {
+ struct list_head vgpu_ch_list_entry;
+ struct winluid adapter_luid;
+ struct hv_device *hdev;
+};
+
+struct dxgglobal {
+ struct dxgvmbuschannel channel;
+ struct delayed_work dwork;
+ struct hv_device *hdev;
+ u32 num_adapters;
+ u32 vmbus_ver; /* Interface version */
+ struct resource *mem;
+ u64 mmiospace_base;
+ u64 mmiospace_size;
+ struct miscdevice dxgdevice;
+ struct mutex device_mutex;
+
+ /* list of created processes */
+ struct list_head plisthead;
+ struct mutex plistmutex;
+
+ /*
+ * List of the vGPU VM bus channels (dxgvgpuchannel)
+ * Protected by device_mutex
+ */
+ struct list_head vgpu_ch_list_head;
+
+ /* protects acces to the global VM bus channel */
+ struct rw_semaphore channel_lock;
+
+ bool dxg_dev_initialized;
+ bool vmbus_registered;
+ bool pci_registered;
+ bool global_channel_initialized;
+ bool async_msg_enabled;
+};
+
+extern struct dxgglobal *dxgglobal;
+
+struct dxgprocess {
+ /* Placeholder */
+};
+
+void init_ioctls(void);
+long dxgk_compat_ioctl(struct file *f, unsigned int p1, unsigned long p2);
+long dxgk_unlocked_ioctl(struct file *f, unsigned int p1, unsigned long p2);
+
+static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
+{
+ *luid = *(struct winluid *)&guid->b[0];
+}
+
+/*
+ * VM bus interface
+ *
+ */
+
+/*
+ * The interface version is used to ensure that the host and the guest use the
+ * same VM bus protocol. It needs to be incremented every time the VM bus
+ * interface changes. DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION is
+ * incremented each time the earlier versions of the interface are no longer
+ * compatible with the current version.
+ */
+#define DXGK_VMBUS_INTERFACE_VERSION_OLD 27
+#define DXGK_VMBUS_INTERFACE_VERSION 40
+#define DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION 16
+
+#endif
diff --git a/drivers/hv/dxgkrnl/dxgmodule.c b/drivers/hv/dxgkrnl/dxgmodule.c
new file mode 100644
index 000000000000..412c37bc592d
--- /dev/null
+++ b/drivers/hv/dxgkrnl/dxgmodule.c
@@ -0,0 +1,531 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2019, Microsoft Corporation.
+ *
+ * Author:
+ * Iouri Tarassov <[email protected]>
+ *
+ * Dxgkrnl Graphics Driver
+ * Interface with Linux kernel, PCI driver and the VM bus driver
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/eventfd.h>
+#include <linux/hyperv.h>
+#include <linux/pci.h>
+
+#include "dxgkrnl.h"
+
+/*
+ * Pointer to the global device data. By design
+ * there is a single vGPU device on the VM bus and a single /dev/dxg device
+ * is created.
+ */
+struct dxgglobal *dxgglobal;
+
+#define DXGKRNL_VERSION 0x2216
+#define PCI_VENDOR_ID_MICROSOFT 0x1414
+#define PCI_DEVICE_ID_VIRTUAL_RENDER 0x008E
+
+#undef pr_fmt
+#define pr_fmt(fmt) "dxgk: " fmt
+
+//
+// Interface from dxgglobal
+//
+
+struct vmbus_channel *dxgglobal_get_vmbus(void)
+{
+ return dxgglobal->channel.channel;
+}
+
+struct dxgvmbuschannel *dxgglobal_get_dxgvmbuschannel(void)
+{
+ return &dxgglobal->channel;
+}
+
+int dxgglobal_acquire_channel_lock(void)
+{
+ down_read(&dxgglobal->channel_lock);
+ if (dxgglobal->channel.channel == NULL) {
+ pr_err("Failed to acquire global channel lock");
+ return -ENODEV;
+ } else {
+ return 0;
+ }
+}
+
+void dxgglobal_release_channel_lock(void)
+{
+ up_read(&dxgglobal->channel_lock);
+}
+
+/*
+ * File operations for the /dev/dxg device
+ */
+
+static int dxgk_open(struct inode *n, struct file *f)
+{
+ return 0;
+}
+
+static int dxgk_release(struct inode *n, struct file *f)
+{
+ return 0;
+}
+
+static ssize_t dxgk_read(struct file *f, char __user *s, size_t len,
+ loff_t *o)
+{
+ pr_debug("file read\n");
+ return 0;
+}
+
+static ssize_t dxgk_write(struct file *f, const char __user *s, size_t len,
+ loff_t *o)
+{
+ pr_debug("file write\n");
+ return len;
+}
+
+const struct file_operations dxgk_fops = {
+ .owner = THIS_MODULE,
+ .open = dxgk_open,
+ .release = dxgk_release,
+ .write = dxgk_write,
+ .read = dxgk_read,
+};
+
+/*
+ * Interface with the PCI driver
+ */
+
+/*
+ * Part of the PCI config space of the vGPU device is used for vGPU
+ * configuration data. Reading/writing of the PCI config space is forwarded
+ * to the host.
+ */
+
+/* vGPU VM bus channel instance ID */
+#define DXGK_VMBUS_CHANNEL_ID_OFFSET 192
+/* DXGK_VMBUS_INTERFACE_VERSION (u32) */
+#define DXGK_VMBUS_VERSION_OFFSET (DXGK_VMBUS_CHANNEL_ID_OFFSET + \
+ sizeof(guid_t))
+/* Luid of the virtual GPU on the host (struct winluid) */
+#define DXGK_VMBUS_VGPU_LUID_OFFSET (DXGK_VMBUS_VERSION_OFFSET + \
+ sizeof(u32))
+/* The guest writes its capavilities to this adderss */
+#define DXGK_VMBUS_GUESTCAPS_OFFSET (DXGK_VMBUS_VERSION_OFFSET + \
+ sizeof(u32))
+
+/* Capabilities of the guest driver, reported to the host */
+struct dxgk_vmbus_guestcaps {
+ union {
+ struct {
+ u32 wsl2 : 1;
+ u32 reserved : 31;
+ };
+ u32 guest_caps;
+ };
+};
+
+/*
+ * A helper function to read PCI config space.
+ */
+static int dxg_pci_read_dwords(struct pci_dev *dev, int offset, int size,
+ void *val)
+{
+ int off = offset;
+ int ret;
+ int i;
+
+ for (i = 0; i < size / sizeof(int); i++) {
+ ret = pci_read_config_dword(dev, off, &((int *)val)[i]);
+ if (ret) {
+ pr_err("Failed to read PCI config: %d", off);
+ return ret;
+ }
+ off += sizeof(int);
+ }
+ return 0;
+}
+
+static int dxg_pci_probe_device(struct pci_dev *dev,
+ const struct pci_device_id *id)
+{
+ int ret;
+ guid_t guid;
+ u32 vmbus_interface_ver = DXGK_VMBUS_INTERFACE_VERSION;
+ struct winluid vgpu_luid = {};
+ struct dxgk_vmbus_guestcaps guest_caps = {.wsl2 = 1};
+
+ mutex_lock(&dxgglobal->device_mutex);
+
+ if (dxgglobal->vmbus_ver == 0) {
+ /* Report capabilities to the host */
+
+ ret = pci_write_config_dword(dev, DXGK_VMBUS_GUESTCAPS_OFFSET,
+ guest_caps.guest_caps);
+ if (ret)
+ goto cleanup;
+
+ /* Negotiate the VM bus version */
+
+ ret = pci_read_config_dword(dev, DXGK_VMBUS_VERSION_OFFSET,
+ &vmbus_interface_ver);
+ if (ret == 0 && vmbus_interface_ver != 0)
+ dxgglobal->vmbus_ver = vmbus_interface_ver;
+ else
+ dxgglobal->vmbus_ver = DXGK_VMBUS_INTERFACE_VERSION_OLD;
+
+ if (dxgglobal->vmbus_ver < DXGK_VMBUS_INTERFACE_VERSION)
+ goto read_channel_id;
+
+ ret = pci_write_config_dword(dev, DXGK_VMBUS_VERSION_OFFSET,
+ DXGK_VMBUS_INTERFACE_VERSION);
+ if (ret)
+ goto cleanup;
+
+ if (dxgglobal->vmbus_ver > DXGK_VMBUS_INTERFACE_VERSION)
+ dxgglobal->vmbus_ver = DXGK_VMBUS_INTERFACE_VERSION;
+ }
+
+read_channel_id:
+
+ /* Get the VM bus channel ID for the virtual GPU */
+ ret = dxg_pci_read_dwords(dev, DXGK_VMBUS_CHANNEL_ID_OFFSET,
+ sizeof(guid), (int *)&guid);
+ if (ret)
+ goto cleanup;
+
+ if (dxgglobal->vmbus_ver >= DXGK_VMBUS_INTERFACE_VERSION) {
+ ret = dxg_pci_read_dwords(dev, DXGK_VMBUS_VGPU_LUID_OFFSET,
+ sizeof(vgpu_luid), &vgpu_luid);
+ if (ret)
+ goto cleanup;
+ }
+
+ pr_debug("Adapter channel: %pUb\n", &guid);
+ pr_debug("Vmbus interface version: %d\n",
+ dxgglobal->vmbus_ver);
+ pr_debug("Host vGPU luid: %x-%x\n",
+ vgpu_luid.b, vgpu_luid.a);
+
+cleanup:
+
+ mutex_unlock(&dxgglobal->device_mutex);
+
+ if (ret)
+ pr_debug("err: %s %d", __func__, ret);
+ return ret;
+}
+
+static void dxg_pci_remove_device(struct pci_dev *dev)
+{
+ /* Placeholder */
+}
+
+static struct pci_device_id dxg_pci_id_table = {
+ .vendor = PCI_VENDOR_ID_MICROSOFT,
+ .device = PCI_DEVICE_ID_VIRTUAL_RENDER,
+ .subvendor = PCI_ANY_ID,
+ .subdevice = PCI_ANY_ID
+};
+
+static struct pci_driver dxg_pci_drv = {
+ .name = KBUILD_MODNAME,
+ .id_table = &dxg_pci_id_table,
+ .probe = dxg_pci_probe_device,
+ .remove = dxg_pci_remove_device
+};
+
+/*
+ * Interface with the VM bus driver
+ */
+
+static int dxgglobal_getiospace(struct dxgglobal *dxgglobal)
+{
+ /* Get mmio space for the global channel */
+ struct hv_device *hdev = dxgglobal->hdev;
+ struct vmbus_channel *channel = hdev->channel;
+ resource_size_t pot_start = 0;
+ resource_size_t pot_end = -1;
+ int ret;
+
+ dxgglobal->mmiospace_size = channel->offermsg.offer.mmio_megabytes;
+ if (dxgglobal->mmiospace_size == 0) {
+ pr_debug("zero mmio space is offered\n");
+ return -ENOMEM;
+ }
+ dxgglobal->mmiospace_size <<= 20;
+ pr_debug("mmio offered: %llx\n",
+ dxgglobal->mmiospace_size);
+
+ ret = vmbus_allocate_mmio(&dxgglobal->mem, hdev, pot_start, pot_end,
+ dxgglobal->mmiospace_size, 0x10000, false);
+ if (ret) {
+ pr_err("Unable to allocate mmio memory: %d\n", ret);
+ return ret;
+ }
+ dxgglobal->mmiospace_size = dxgglobal->mem->end -
+ dxgglobal->mem->start + 1;
+ dxgglobal->mmiospace_base = dxgglobal->mem->start;
+ pr_info("mmio allocated %llx %llx %llx %llx\n",
+ dxgglobal->mmiospace_base,
+ dxgglobal->mmiospace_size,
+ dxgglobal->mem->start, dxgglobal->mem->end);
+
+ return 0;
+}
+
+int dxgglobal_init_global_channel(void)
+{
+ int ret = 0;
+
+ ret = dxgvmbuschannel_init(&dxgglobal->channel, dxgglobal->hdev);
+ if (ret) {
+ pr_err("dxgvmbuschannel_init failed: %d\n", ret);
+ goto error;
+ }
+
+ ret = dxgglobal_getiospace(dxgglobal);
+ if (ret) {
+ pr_err("getiospace failed: %d\n", ret);
+ goto error;
+ }
+
+ hv_set_drvdata(dxgglobal->hdev, dxgglobal);
+
+ dxgglobal->dxgdevice.minor = MISC_DYNAMIC_MINOR;
+ dxgglobal->dxgdevice.name = "dxg";
+ dxgglobal->dxgdevice.fops = &dxgk_fops;
+ dxgglobal->dxgdevice.mode = 0666;
+ ret = misc_register(&dxgglobal->dxgdevice);
+ if (ret) {
+ pr_err("misc_register failed: %d", ret);
+ goto error;
+ }
+ dxgglobal->dxg_dev_initialized = true;
+
+error:
+ return ret;
+}
+
+void dxgglobal_destroy_global_channel(void)
+{
+ down_write(&dxgglobal->channel_lock);
+
+ dxgglobal->global_channel_initialized = false;
+
+ if (dxgglobal->dxg_dev_initialized) {
+ misc_deregister(&dxgglobal->dxgdevice);
+ dxgglobal->dxg_dev_initialized = false;
+ }
+
+ if (dxgglobal->mem) {
+ vmbus_free_mmio(dxgglobal->mmiospace_base,
+ dxgglobal->mmiospace_size);
+ dxgglobal->mem = NULL;
+ }
+
+ dxgvmbuschannel_destroy(&dxgglobal->channel);
+
+ if (dxgglobal->hdev) {
+ hv_set_drvdata(dxgglobal->hdev, NULL);
+ dxgglobal->hdev = NULL;
+ }
+
+ up_write(&dxgglobal->channel_lock);
+}
+
+static const struct hv_vmbus_device_id id_table[] = {
+ /* Per GPU Device GUID */
+ { HV_GPUP_DXGK_VGPU_GUID },
+ /* Global Dxgkgnl channel for the virtual machine */
+ { HV_GPUP_DXGK_GLOBAL_GUID },
+ { }
+};
+
+static int dxg_probe_vmbus(struct hv_device *hdev,
+ const struct hv_vmbus_device_id *dev_id)
+{
+ int ret = 0;
+ struct winluid luid;
+ struct dxgvgpuchannel *vgpuch;
+
+ mutex_lock(&dxgglobal->device_mutex);
+
+ if (uuid_le_cmp(hdev->dev_type, id_table[0].guid) == 0) {
+ /* This is a new virtual GPU channel */
+ guid_to_luid(&hdev->channel->offermsg.offer.if_instance, &luid);
+ pr_debug("vGPU channel: %pUb",
+ &hdev->channel->offermsg.offer.if_instance);
+ vgpuch = vzalloc(sizeof(struct dxgvgpuchannel));
+ if (vgpuch == NULL) {
+ ret = -ENOMEM;
+ goto error;
+ }
+ vgpuch->adapter_luid = luid;
+ vgpuch->hdev = hdev;
+ list_add_tail(&vgpuch->vgpu_ch_list_entry,
+ &dxgglobal->vgpu_ch_list_head);
+ } else if (uuid_le_cmp(hdev->dev_type, id_table[1].guid) == 0) {
+ /* This is the global Dxgkgnl channel */
+ pr_debug("Global channel: %pUb",
+ &hdev->channel->offermsg.offer.if_instance);
+ if (dxgglobal->hdev) {
+ /* This device should appear only once */
+ pr_err("global channel already present\n");
+ ret = -EBADE;
+ goto error;
+ }
+ dxgglobal->hdev = hdev;
+ } else {
+ /* Unknown device type */
+ pr_err("probe: unknown device type\n");
+ ret = -EBADE;
+ goto error;
+ }
+
+error:
+
+ mutex_unlock(&dxgglobal->device_mutex);
+
+ if (ret)
+ pr_debug("err: %s %d", __func__, ret);
+ return ret;
+}
+
+static int dxg_remove_vmbus(struct hv_device *hdev)
+{
+ int ret = 0;
+ struct dxgvgpuchannel *vgpu_channel;
+
+ mutex_lock(&dxgglobal->device_mutex);
+
+ if (uuid_le_cmp(hdev->dev_type, id_table[0].guid) == 0) {
+ pr_debug("Remove virtual GPU channel\n");
+ list_for_each_entry(vgpu_channel,
+ &dxgglobal->vgpu_ch_list_head,
+ vgpu_ch_list_entry) {
+ if (vgpu_channel->hdev == hdev) {
+ list_del(&vgpu_channel->vgpu_ch_list_entry);
+ vfree(vgpu_channel);
+ break;
+ }
+ }
+ } else if (uuid_le_cmp(hdev->dev_type, id_table[1].guid) == 0) {
+ pr_debug("Remove global channel device\n");
+ dxgglobal_destroy_global_channel();
+ } else {
+ /* Unknown device type */
+ pr_err("remove: unknown device type\n");
+ ret = -EBADE;
+ }
+
+ mutex_unlock(&dxgglobal->device_mutex);
+ if (ret)
+ pr_debug("err: %s %d", __func__, ret);
+ return ret;
+}
+
+MODULE_DEVICE_TABLE(vmbus, id_table);
+
+static struct hv_driver dxg_drv = {
+ .name = KBUILD_MODNAME,
+ .id_table = id_table,
+ .probe = dxg_probe_vmbus,
+ .remove = dxg_remove_vmbus,
+ .driver = {
+ .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+ },
+};
+
+/*
+ * Interface with Linux kernel
+ */
+
+static int dxgglobal_create(void)
+{
+ int ret = 0;
+
+ dxgglobal = vzalloc(sizeof(struct dxgglobal));
+ if (!dxgglobal)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&dxgglobal->plisthead);
+ mutex_init(&dxgglobal->plistmutex);
+ mutex_init(&dxgglobal->device_mutex);
+
+ INIT_LIST_HEAD(&dxgglobal->vgpu_ch_list_head);
+
+ init_rwsem(&dxgglobal->channel_lock);
+
+ pr_debug("dxgglobal_init end\n");
+ return ret;
+}
+
+static void dxgglobal_destroy(void)
+{
+ if (dxgglobal) {
+ if (dxgglobal->vmbus_registered)
+ vmbus_driver_unregister(&dxg_drv);
+
+ dxgglobal_destroy_global_channel();
+
+ if (dxgglobal->pci_registered)
+ pci_unregister_driver(&dxg_pci_drv);
+
+ vfree(dxgglobal);
+ dxgglobal = NULL;
+ }
+}
+
+/*
+ * Driver entry points
+ */
+
+static int __init dxg_drv_init(void)
+{
+ int ret;
+
+
+ ret = dxgglobal_create();
+ if (ret) {
+ pr_err("dxgglobal_init failed");
+ return -ENOMEM;
+ }
+
+ ret = vmbus_driver_register(&dxg_drv);
+ if (ret) {
+ pr_err("vmbus_driver_register failed: %d", ret);
+ return ret;
+ }
+ dxgglobal->vmbus_registered = true;
+
+ pr_info("%s Version: %x", __func__, DXGKRNL_VERSION);
+
+ ret = pci_register_driver(&dxg_pci_drv);
+ if (ret) {
+ pr_err("pci_driver_register failed: %d", ret);
+ return ret;
+ }
+ dxgglobal->pci_registered = true;
+
+ init_ioctls();
+
+ return 0;
+}
+
+static void __exit dxg_drv_exit(void)
+{
+ dxgglobal_destroy();
+}
+
+module_init(dxg_drv_init);
+module_exit(dxg_drv_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Microsoft Dxgkrnl virtual GPU Driver");
diff --git a/include/uapi/misc/d3dkmthk.h b/include/uapi/misc/d3dkmthk.h
new file mode 100644
index 000000000000..bdb7bc325d1a
--- /dev/null
+++ b/include/uapi/misc/d3dkmthk.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+/*
+ * Copyright (c) 2019, Microsoft Corporation.
+ *
+ * Author:
+ * Iouri Tarassov <[email protected]>
+ *
+ * Dxgkrnl Graphics Driver
+ * User mode WDDM interface definitions
+ *
+ */
+
+#ifndef _D3DKMTHK_H
+#define _D3DKMTHK_H
+
+/* Matches Windows LUID definition */
+struct winluid {
+ __u32 a;
+ __u32 b;
+};
+
+#endif /* _D3DKMTHK_H */
--
2.35.1


2022-03-01 21:08:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Tue, Mar 01, 2022 at 11:45:49AM -0800, Iouri Tarassov wrote:
> - Create skeleton and add basic functionality for the
> hyper-v compute device driver (dxgkrnl).
>
> - Register for PCI and VM bus driver notifications and
> handle initialization of VM bus channels.
>
> - Connect the dxgkrnl module to the drivers/hv/ Makefile and Kconfig
>
> - Create a MAINTAINERS entry
>
> A VM bus channel is a communication interface between the hyper-v guest
> and the host. The are two type of VM bus channels, used in the driver:
> - the global channel
> - per virtual compute device channel
>
> A PCI device is created for each virtual compute device, projected
> by the host. The device vendor is PCI_VENDOR_ID_MICROSOFT and device
> id is PCI_DEVICE_ID_VIRTUAL_RENDER. dxg_pci_probe_device handles
> arrival of such devices. The PCI config space of the virtual compute
> device has luid of the corresponding virtual compute device VM
> bus channel. This is how the compute device adapter objects are
> linked to VM bus channels.
>
> VM bus interface version is exchanged by reading/writing the PCI config
> space of the virtual compute device.
>
> The IO space is used to handle CPU accessible compute device
> allocations. Hyper-v allocates IO space for the global VM bus channel.
>
> Signed-off-by: Iouri Tarassov <[email protected]>

Please work with internal developers to get reviews from them first,
before requiring the kernel community to point out basic issues. It
will save you a lot of time and stop us from feeling like we are having
our time wasted.

Some simple examples below that your coworkers should have caught:

> --- /dev/null
> +++ b/drivers/hv/dxgkrnl/dxgkrnl.h
> @@ -0,0 +1,119 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Copyright (c) 2019, Microsoft Corporation.

It is now 2022 :)

> +void init_ioctls(void);

That is a horrible global function name you just added to the kernel's
namespace for a single driver :(

> +long dxgk_unlocked_ioctl(struct file *f, unsigned int p1, unsigned long p2);
> +
> +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> +{
> + *luid = *(struct winluid *)&guid->b[0];

Why is the cast needed? Shouldn't you use real types in your
structures?

> +/*
> + * The interface version is used to ensure that the host and the guest use the
> + * same VM bus protocol. It needs to be incremented every time the VM bus
> + * interface changes. DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION is
> + * incremented each time the earlier versions of the interface are no longer
> + * compatible with the current version.
> + */
> +#define DXGK_VMBUS_INTERFACE_VERSION_OLD 27
> +#define DXGK_VMBUS_INTERFACE_VERSION 40
> +#define DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION 16

Where do these numbers come from, the hypervisor specification?

> +/*
> + * Pointer to the global device data. By design
> + * there is a single vGPU device on the VM bus and a single /dev/dxg device
> + * is created.
> + */
> +struct dxgglobal *dxgglobal;

No, make this per-device, NEVER have a single device for your driver.
The Linux driver model makes it harder to do it this way than to do it
correctly. Do it correctly please and have no global structures like
this.

> +#define DXGKRNL_VERSION 0x2216

What does this mean?

> +#define PCI_VENDOR_ID_MICROSOFT 0x1414
> +#define PCI_DEVICE_ID_VIRTUAL_RENDER 0x008E
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt) "dxgk: " fmt

Use the dev_*() print functions, you are a driver, and you always have a
pointer to a struct device. There's no need to ever call pr_*().

> +
> +//
> +// Interface from dxgglobal
> +//

You mix /* */ and // styles for large comment blocks like this. Pick
one and use it consistently.

> +
> +struct vmbus_channel *dxgglobal_get_vmbus(void)
> +{
> + return dxgglobal->channel.channel;
> +}

Again, no globals.

> +/*
> + * File operations for the /dev/dxg device
> + */
> +
> +static int dxgk_open(struct inode *n, struct file *f)
> +{
> + return 0;
> +}
> +
> +static int dxgk_release(struct inode *n, struct file *f)
> +{
> + return 0;
> +}

If you do not need open/release, do not provide them.

> +
> +static ssize_t dxgk_read(struct file *f, char __user *s, size_t len,
> + loff_t *o)
> +{
> + pr_debug("file read\n");

ftrace is your friend, please remove.

> + return 0;
> +}
> +
> +static ssize_t dxgk_write(struct file *f, const char __user *s, size_t len,
> + loff_t *o)
> +{
> + pr_debug("file write\n");
> + return len;
> +}
> +
> +const struct file_operations dxgk_fops = {
> + .owner = THIS_MODULE,
> + .open = dxgk_open,
> + .release = dxgk_release,
> + .write = dxgk_write,
> + .read = dxgk_read,
> +};
> +
> +/*
> + * Interface with the PCI driver
> + */
> +
> +/*
> + * Part of the PCI config space of the vGPU device is used for vGPU
> + * configuration data. Reading/writing of the PCI config space is forwarded
> + * to the host.
> + */
> +
> +/* vGPU VM bus channel instance ID */
> +#define DXGK_VMBUS_CHANNEL_ID_OFFSET 192

No tab? Why not?

> +/* DXGK_VMBUS_INTERFACE_VERSION (u32) */

What is this?

> +#define DXGK_VMBUS_VERSION_OFFSET (DXGK_VMBUS_CHANNEL_ID_OFFSET + \
> + sizeof(guid_t))

offsetof() is your friend, please use it.

> +/* Luid of the virtual GPU on the host (struct winluid) */

What is a "luid"?

> +#define DXGK_VMBUS_VGPU_LUID_OFFSET (DXGK_VMBUS_VERSION_OFFSET + \
> + sizeof(u32))

Again, offsetof() please.

> +/* The guest writes its capavilities to this adderss */

Odd spelling :(

> +#define DXGK_VMBUS_GUESTCAPS_OFFSET (DXGK_VMBUS_VERSION_OFFSET + \
> + sizeof(u32))

offsetof().

> +static int dxg_probe_vmbus(struct hv_device *hdev,
> + const struct hv_vmbus_device_id *dev_id)
> +{
> + int ret = 0;
> + struct winluid luid;
> + struct dxgvgpuchannel *vgpuch;
> +
> + mutex_lock(&dxgglobal->device_mutex);

Lock the device's mutex, not the global.

> +
> + if (uuid_le_cmp(hdev->dev_type, id_table[0].guid) == 0) {
> + /* This is a new virtual GPU channel */
> + guid_to_luid(&hdev->channel->offermsg.offer.if_instance, &luid);
> + pr_debug("vGPU channel: %pUb",
> + &hdev->channel->offermsg.offer.if_instance);
> + vgpuch = vzalloc(sizeof(struct dxgvgpuchannel));
> + if (vgpuch == NULL) {
> + ret = -ENOMEM;
> + goto error;
> + }
> + vgpuch->adapter_luid = luid;
> + vgpuch->hdev = hdev;
> + list_add_tail(&vgpuch->vgpu_ch_list_entry,
> + &dxgglobal->vgpu_ch_list_head);
> + } else if (uuid_le_cmp(hdev->dev_type, id_table[1].guid) == 0) {
> + /* This is the global Dxgkgnl channel */
> + pr_debug("Global channel: %pUb",
> + &hdev->channel->offermsg.offer.if_instance);
> + if (dxgglobal->hdev) {
> + /* This device should appear only once */
> + pr_err("global channel already present\n");
> + ret = -EBADE;
> + goto error;
> + }
> + dxgglobal->hdev = hdev;
> + } else {
> + /* Unknown device type */
> + pr_err("probe: unknown device type\n");

dev_err().

> + ret = -EBADE;

-ENODEV


> + goto error;

No need for this goto.

> + }
> +
> +error:
> +
> + mutex_unlock(&dxgglobal->device_mutex);
> +
> + if (ret)
> + pr_debug("err: %s %d", __func__, ret);

dev_dbg()

Also, pr_debug() and dev_dbg() already provide you with the function
name, you never need to add it again.

> + return ret;
> +}
> +
> +static int dxg_remove_vmbus(struct hv_device *hdev)
> +{
> + int ret = 0;
> + struct dxgvgpuchannel *vgpu_channel;
> +
> + mutex_lock(&dxgglobal->device_mutex);
> +
> + if (uuid_le_cmp(hdev->dev_type, id_table[0].guid) == 0) {
> + pr_debug("Remove virtual GPU channel\n");
> + list_for_each_entry(vgpu_channel,
> + &dxgglobal->vgpu_ch_list_head,
> + vgpu_ch_list_entry) {
> + if (vgpu_channel->hdev == hdev) {
> + list_del(&vgpu_channel->vgpu_ch_list_entry);
> + vfree(vgpu_channel);
> + break;
> + }
> + }
> + } else if (uuid_le_cmp(hdev->dev_type, id_table[1].guid) == 0) {
> + pr_debug("Remove global channel device\n");
> + dxgglobal_destroy_global_channel();
> + } else {
> + /* Unknown device type */
> + pr_err("remove: unknown device type\n");
> + ret = -EBADE;
> + }
> +
> + mutex_unlock(&dxgglobal->device_mutex);
> + if (ret)
> + pr_debug("err: %s %d", __func__, ret);
> + return ret;
> +}
> +
> +MODULE_DEVICE_TABLE(vmbus, id_table);
> +
> +static struct hv_driver dxg_drv = {
> + .name = KBUILD_MODNAME,
> + .id_table = id_table,
> + .probe = dxg_probe_vmbus,
> + .remove = dxg_remove_vmbus,
> + .driver = {
> + .probe_type = PROBE_PREFER_ASYNCHRONOUS,
> + },

Odd indentation :(

> +};
> +
> +/*
> + * Interface with Linux kernel

What was the code above this???

> + */
> +
> +static int dxgglobal_create(void)
> +{
> + int ret = 0;

No need for this variable.

> +
> + dxgglobal = vzalloc(sizeof(struct dxgglobal));
> + if (!dxgglobal)
> + return -ENOMEM;
> +
> + INIT_LIST_HEAD(&dxgglobal->plisthead);
> + mutex_init(&dxgglobal->plistmutex);
> + mutex_init(&dxgglobal->device_mutex);
> +
> + INIT_LIST_HEAD(&dxgglobal->vgpu_ch_list_head);
> +
> + init_rwsem(&dxgglobal->channel_lock);
> +
> + pr_debug("dxgglobal_init end\n");
> + return ret;
> +}
> +
> +static void dxgglobal_destroy(void)
> +{
> + if (dxgglobal) {
> + if (dxgglobal->vmbus_registered)
> + vmbus_driver_unregister(&dxg_drv);
> +
> + dxgglobal_destroy_global_channel();
> +
> + if (dxgglobal->pci_registered)
> + pci_unregister_driver(&dxg_pci_drv);
> +
> + vfree(dxgglobal);
> + dxgglobal = NULL;
> + }
> +}
> +
> +/*
> + * Driver entry points

???? Entry into what?

> + */
> +
> +static int __init dxg_drv_init(void)
> +{
> + int ret;
> +
> +

Too many blank lines.

> + ret = dxgglobal_create();
> + if (ret) {
> + pr_err("dxgglobal_init failed");
> + return -ENOMEM;
> + }
> +
> + ret = vmbus_driver_register(&dxg_drv);
> + if (ret) {
> + pr_err("vmbus_driver_register failed: %d", ret);
> + return ret;
> + }
> + dxgglobal->vmbus_registered = true;
> +
> + pr_info("%s Version: %x", __func__, DXGKRNL_VERSION);

When drivers work, they are quiet.

Also, in-kernel modules do not have a version, they are part of the
kernel tree, and that's the "version" they have. You can drop the
individual version here, it makes no sense anymore.

> +#ifndef _D3DKMTHK_H
> +#define _D3DKMTHK_H
> +
> +/* Matches Windows LUID definition */

That does not describe much :(

> +struct winluid {
> + __u32 a;
> + __u32 b;

No kernel doc documentation? What is "a"? "b"?

I'll stop here in the patch series.

Again, have your coworkers review this first and have them put their
names as part of the signed-off-by chain so that we know they agree with
the submission.

thanks,

greg k-h

2022-03-01 23:58:15

by Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

I will skip things that are pointed out by Greg.

On Tue, Mar 01, 2022 at 11:45:49AM -0800, Iouri Tarassov wrote:
> - Create skeleton and add basic functionality for the
> hyper-v compute device driver (dxgkrnl).
>
> - Register for PCI and VM bus driver notifications and
> handle initialization of VM bus channels.
>
> - Connect the dxgkrnl module to the drivers/hv/ Makefile and Kconfig
>
> - Create a MAINTAINERS entry
>
> A VM bus channel is a communication interface between the hyper-v guest
> and the host. The are two type of VM bus channels, used in the driver:
> - the global channel
> - per virtual compute device channel
>

Same comment regarding the spelling of VMBus and Hyper-V. Please fix
other instances in code and comments.

> A PCI device is created for each virtual compute device, projected
> by the host. The device vendor is PCI_VENDOR_ID_MICROSOFT and device
> id is PCI_DEVICE_ID_VIRTUAL_RENDER. dxg_pci_probe_device handles
> arrival of such devices. The PCI config space of the virtual compute
> device has luid of the corresponding virtual compute device VM
> bus channel. This is how the compute device adapter objects are
> linked to VM bus channels.
>
> VM bus interface version is exchanged by reading/writing the PCI config
> space of the virtual compute device.
>
> The IO space is used to handle CPU accessible compute device
> allocations. Hyper-v allocates IO space for the global VM bus channel.
>
> Signed-off-by: Iouri Tarassov <[email protected]>
> ---
[...]
> +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> +{
> + *luid = *(struct winluid *)&guid->b[0];
> +}

This should be moved to the header where luid is defined -- presumably
this is useful for other things in the future too.

Also, please provide a comment on why this conversion is okay.

[...]
> + * A helper function to read PCI config space.
> + */
> +static int dxg_pci_read_dwords(struct pci_dev *dev, int offset, int size,
> + void *val)
> +{
> + int off = offset;
> + int ret;
> + int i;
> +

I think you should check for alignment here? size has to be a round
number of sizeof(int) otherwise you risk reading cross boundary and
causes unintended side effect?

> + for (i = 0; i < size / sizeof(int); i++) {
> + ret = pci_read_config_dword(dev, off, &((int *)val)[i]);
> + if (ret) {
> + pr_err("Failed to read PCI config: %d", off);
> + return ret;
> + }
> + off += sizeof(int);
> + }
> + return 0;
> +}
> +
> +static int dxg_pci_probe_device(struct pci_dev *dev,
> + const struct pci_device_id *id)
> +{
> + int ret;
> + guid_t guid;
> + u32 vmbus_interface_ver = DXGK_VMBUS_INTERFACE_VERSION;
> + struct winluid vgpu_luid = {};
> + struct dxgk_vmbus_guestcaps guest_caps = {.wsl2 = 1};
> +
> + mutex_lock(&dxgglobal->device_mutex);
> +
> + if (dxgglobal->vmbus_ver == 0) {
> + /* Report capabilities to the host */
> +
> + ret = pci_write_config_dword(dev, DXGK_VMBUS_GUESTCAPS_OFFSET,
> + guest_caps.guest_caps);
> + if (ret)
> + goto cleanup;
> +
> + /* Negotiate the VM bus version */
> +
[...]
> + pr_debug("Adapter channel: %pUb\n", &guid);
> + pr_debug("Vmbus interface version: %d\n",
> + dxgglobal->vmbus_ver);

No need to wrap the line here.

> + pr_debug("Host vGPU luid: %x-%x\n",
> + vgpu_luid.b, vgpu_luid.a);

Ditto.

> +
> +cleanup:
> +
> + mutex_unlock(&dxgglobal->device_mutex);
> +
> + if (ret)
> + pr_debug("err: %s %d", __func__, ret);
> + return ret;
> +}
> +
[...]
> +static void __exit dxg_drv_exit(void)
> +{
> + dxgglobal_destroy();
> +}
> +
> +module_init(dxg_drv_init);
> +module_exit(dxg_drv_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Microsoft Dxgkrnl virtual GPU Driver");

Should be "virtual compute device driver" here? Please be consistent.

Thanks,
Wei.

2022-03-02 04:45:41

by Iouri Tarassov

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On 3/1/2022 2:23 PM, Wei Liu wrote:
> On Tue, Mar 01, 2022 at 09:45:41PM +0100, Greg KH wrote:
> > On Tue, Mar 01, 2022 at 11:45:49AM -0800, Iouri Tarassov wrote:
> > > - Create skeleton and add basic functionality for the
> > > hyper-v compute device driver (dxgkrnl).
> > >
> > > - Register for PCI and VM bus driver notifications and
> > > handle initialization of VM bus channels.
> > >
> > > - Connect the dxgkrnl module to the drivers/hv/ Makefile and Kconfig
> > >
> > > - Create a MAINTAINERS entry
> > >
> > > A VM bus channel is a communication interface between the hyper-v guest
> > > and the host. The are two type of VM bus channels, used in the driver:
> > > - the global channel
> > > - per virtual compute device channel
> > >
> > > A PCI device is created for each virtual compute device, projected
> > > by the host. The device vendor is PCI_VENDOR_ID_MICROSOFT and device
> > > id is PCI_DEVICE_ID_VIRTUAL_RENDER. dxg_pci_probe_device handles
> > > arrival of such devices. The PCI config space of the virtual compute
> > > device has luid of the corresponding virtual compute device VM
> > > bus channel. This is how the compute device adapter objects are
> > > linked to VM bus channels.
> > >
> > > VM bus interface version is exchanged by reading/writing the PCI config
> > > space of the virtual compute device.
> > >
> > > The IO space is used to handle CPU accessible compute device
> > > allocations. Hyper-v allocates IO space for the global VM bus channel.
> > >
> > > Signed-off-by: Iouri Tarassov <[email protected]>
> >
> > Please work with internal developers to get reviews from them first,
> > before requiring the kernel community to point out basic issues. It
> > will save you a lot of time and stop us from feeling like we are having
> > our time wasted.
> >
> > Some simple examples below that your coworkers should have caught:
> >
> > > --- /dev/null
> > > +++ b/drivers/hv/dxgkrnl/dxgkrnl.h
> > > @@ -0,0 +1,119 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +
> > > +/*
> > > + * Copyright (c) 2019, Microsoft Corporation.
> >
> > It is now 2022 :)
> >
> > > +void init_ioctls(void);
> >
> > That is a horrible global function name you just added to the kernel's
> > namespace for a single driver :(
> >
> > > +long dxgk_unlocked_ioctl(struct file *f, unsigned int p1, unsigned long p2);
> > > +
> > > +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> > > +{
> > > + *luid = *(struct winluid *)&guid->b[0];
> >
> > Why is the cast needed? Shouldn't you use real types in your
> > structures?
> >
> > > +/*
> > > + * The interface version is used to ensure that the host and the guest use the
> > > + * same VM bus protocol. It needs to be incremented every time the VM bus
> > > + * interface changes. DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION is
> > > + * incremented each time the earlier versions of the interface are no longer
> > > + * compatible with the current version.
> > > + */
> > > +#define DXGK_VMBUS_INTERFACE_VERSION_OLD 27
> > > +#define DXGK_VMBUS_INTERFACE_VERSION 40
> > > +#define DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION 16
> >
> > Where do these numbers come from, the hypervisor specification?
> >
> > > +/*
> > > + * Pointer to the global device data. By design
> > > + * there is a single vGPU device on the VM bus and a single /dev/dxg device
> > > + * is created.
> > > + */
> > > +struct dxgglobal *dxgglobal;
> >
> > No, make this per-device, NEVER have a single device for your driver.
> > The Linux driver model makes it harder to do it this way than to do it
> > correctly. Do it correctly please and have no global structures like
> > this.
> >
>
> This may not be as big an issue as you thought. The device discovery is
> still done via the normal VMBus probing routine. For all intents and
> purposes the dxgglobal structure can be broken down into per device
> fields and a global structure which contains the protocol versioning
> information -- my understanding is there will always be a global
> structure to hold information related to the backend, regardless of how
> many devices there are.
>
> I definitely think splitting is doable, but I also understand why Iouri
> does not want to do it _now_ given there is no such a model for multiple
> devices yet, so anything we put into the per-device structure could be
> incomplete and it requires further changing when such a model arrives
> later.
>
> Iouri, please correct me if I have the wrong mental model here.
>
> All in all, I hope this is not going to be a deal breaker for the
> acceptance of this driver.
>
> Thanks,
> Wei.

I agree with Wei that there always be global driver data.

The driver reflects what the host offers and also it must provide the same
interface to user mode as the host driver does. This is because we want the
user mode clients to use the same device interface as if they are working on
the host directly.

By design a single global VMBus channel is offered by the host and a single
/dev/dxg device is created. The /dev/dxg device provides interface to enumerate
virtual compute devices via an ioctl.

If we are to change this model, we would need to make changes to user mode
clients, which is a big re-design change, affecting many hardware vendors.

Thanks
Iouri


2022-03-02 05:11:55

by Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Tue, Mar 01, 2022 at 09:45:41PM +0100, Greg KH wrote:
> On Tue, Mar 01, 2022 at 11:45:49AM -0800, Iouri Tarassov wrote:
> > - Create skeleton and add basic functionality for the
> > hyper-v compute device driver (dxgkrnl).
> >
> > - Register for PCI and VM bus driver notifications and
> > handle initialization of VM bus channels.
> >
> > - Connect the dxgkrnl module to the drivers/hv/ Makefile and Kconfig
> >
> > - Create a MAINTAINERS entry
> >
> > A VM bus channel is a communication interface between the hyper-v guest
> > and the host. The are two type of VM bus channels, used in the driver:
> > - the global channel
> > - per virtual compute device channel
> >
> > A PCI device is created for each virtual compute device, projected
> > by the host. The device vendor is PCI_VENDOR_ID_MICROSOFT and device
> > id is PCI_DEVICE_ID_VIRTUAL_RENDER. dxg_pci_probe_device handles
> > arrival of such devices. The PCI config space of the virtual compute
> > device has luid of the corresponding virtual compute device VM
> > bus channel. This is how the compute device adapter objects are
> > linked to VM bus channels.
> >
> > VM bus interface version is exchanged by reading/writing the PCI config
> > space of the virtual compute device.
> >
> > The IO space is used to handle CPU accessible compute device
> > allocations. Hyper-v allocates IO space for the global VM bus channel.
> >
> > Signed-off-by: Iouri Tarassov <[email protected]>
>
> Please work with internal developers to get reviews from them first,
> before requiring the kernel community to point out basic issues. It
> will save you a lot of time and stop us from feeling like we are having
> our time wasted.
>
> Some simple examples below that your coworkers should have caught:
>
> > --- /dev/null
> > +++ b/drivers/hv/dxgkrnl/dxgkrnl.h
> > @@ -0,0 +1,119 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2019, Microsoft Corporation.
>
> It is now 2022 :)
>
> > +void init_ioctls(void);
>
> That is a horrible global function name you just added to the kernel's
> namespace for a single driver :(
>
> > +long dxgk_unlocked_ioctl(struct file *f, unsigned int p1, unsigned long p2);
> > +
> > +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> > +{
> > + *luid = *(struct winluid *)&guid->b[0];
>
> Why is the cast needed? Shouldn't you use real types in your
> structures?
>
> > +/*
> > + * The interface version is used to ensure that the host and the guest use the
> > + * same VM bus protocol. It needs to be incremented every time the VM bus
> > + * interface changes. DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION is
> > + * incremented each time the earlier versions of the interface are no longer
> > + * compatible with the current version.
> > + */
> > +#define DXGK_VMBUS_INTERFACE_VERSION_OLD 27
> > +#define DXGK_VMBUS_INTERFACE_VERSION 40
> > +#define DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION 16
>
> Where do these numbers come from, the hypervisor specification?
>
> > +/*
> > + * Pointer to the global device data. By design
> > + * there is a single vGPU device on the VM bus and a single /dev/dxg device
> > + * is created.
> > + */
> > +struct dxgglobal *dxgglobal;
>
> No, make this per-device, NEVER have a single device for your driver.
> The Linux driver model makes it harder to do it this way than to do it
> correctly. Do it correctly please and have no global structures like
> this.
>

This may not be as big an issue as you thought. The device discovery is
still done via the normal VMBus probing routine. For all intents and
purposes the dxgglobal structure can be broken down into per device
fields and a global structure which contains the protocol versioning
information -- my understanding is there will always be a global
structure to hold information related to the backend, regardless of how
many devices there are.

I definitely think splitting is doable, but I also understand why Iouri
does not want to do it _now_ given there is no such a model for multiple
devices yet, so anything we put into the per-device structure could be
incomplete and it requires further changing when such a model arrives
later.

Iouri, please correct me if I have the wrong mental model here.

All in all, I hope this is not going to be a deal breaker for the
acceptance of this driver.

Thanks,
Wei.

2022-03-02 10:40:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > +struct dxgglobal *dxgglobal;
> >
> > No, make this per-device, NEVER have a single device for your driver.
> > The Linux driver model makes it harder to do it this way than to do it
> > correctly. Do it correctly please and have no global structures like
> > this.
> >
>
> This may not be as big an issue as you thought. The device discovery is
> still done via the normal VMBus probing routine. For all intents and
> purposes the dxgglobal structure can be broken down into per device
> fields and a global structure which contains the protocol versioning
> information -- my understanding is there will always be a global
> structure to hold information related to the backend, regardless of how
> many devices there are.

Then that is wrong and needs to be fixed. Drivers should almost never
have any global data, that is not how Linux drivers work. What happens
when you get a second device in your system for this? Major rework
would have to happen and the code will break. Handle that all now as it
takes less work to make this per-device than it does to have a global
variable.

> I definitely think splitting is doable, but I also understand why Iouri
> does not want to do it _now_ given there is no such a model for multiple
> devices yet, so anything we put into the per-device structure could be
> incomplete and it requires further changing when such a model arrives
> later.
>
> Iouri, please correct me if I have the wrong mental model here.
>
> All in all, I hope this is not going to be a deal breaker for the
> acceptance of this driver.

For my reviews, yes it will be.

Again, it should be easier to keep things in a per-device state than
not as the proper lifetime rules and the like are automatically handled
for you. If you have global data, you have to manage that all on your
own and it is _MUCH_ harder to review that you got it correct.

Please fix, it will make your life easier as well.

thanks,

greg k-h

2022-03-02 14:30:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Tue, Mar 01, 2022 at 02:47:28PM -0800, Iouri Tarassov wrote:
> On 3/1/2022 2:23 PM, Wei Liu wrote:
> > On Tue, Mar 01, 2022 at 09:45:41PM +0100, Greg KH wrote:
> > > On Tue, Mar 01, 2022 at 11:45:49AM -0800, Iouri Tarassov wrote:
> > > > - Create skeleton and add basic functionality for the
> > > > hyper-v compute device driver (dxgkrnl).
> > > > > > - Register for PCI and VM bus driver notifications and
> > > > handle initialization of VM bus channels.
> > > > > > - Connect the dxgkrnl module to the drivers/hv/ Makefile and
> > Kconfig
> > > > > > - Create a MAINTAINERS entry
> > > > > > A VM bus channel is a communication interface between the
> > hyper-v guest
> > > > and the host. The are two type of VM bus channels, used in the driver:
> > > > - the global channel
> > > > - per virtual compute device channel
> > > > > > A PCI device is created for each virtual compute device,
> > projected
> > > > by the host. The device vendor is PCI_VENDOR_ID_MICROSOFT and device
> > > > id is PCI_DEVICE_ID_VIRTUAL_RENDER. dxg_pci_probe_device handles
> > > > arrival of such devices. The PCI config space of the virtual compute
> > > > device has luid of the corresponding virtual compute device VM
> > > > bus channel. This is how the compute device adapter objects are
> > > > linked to VM bus channels.
> > > > > > VM bus interface version is exchanged by reading/writing the PCI
> > config
> > > > space of the virtual compute device.
> > > > > > The IO space is used to handle CPU accessible compute device
> > > > allocations. Hyper-v allocates IO space for the global VM bus channel.
> > > > > > Signed-off-by: Iouri Tarassov <[email protected]>
> > > > Please work with internal developers to get reviews from them first,
> > > before requiring the kernel community to point out basic issues. It
> > > will save you a lot of time and stop us from feeling like we are having
> > > our time wasted.
> > > > Some simple examples below that your coworkers should have caught:
> > > > > --- /dev/null
> > > > +++ b/drivers/hv/dxgkrnl/dxgkrnl.h
> > > > @@ -0,0 +1,119 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +
> > > > +/*
> > > > + * Copyright (c) 2019, Microsoft Corporation.
> > > > It is now 2022 :)
> > > > > +void init_ioctls(void);
> > > > That is a horrible global function name you just added to the
> > kernel's
> > > namespace for a single driver :(
> > > > > +long dxgk_unlocked_ioctl(struct file *f, unsigned int p1,
> > unsigned long p2);
> > > > +
> > > > +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> > > > +{
> > > > + *luid = *(struct winluid *)&guid->b[0];
> > > > Why is the cast needed? Shouldn't you use real types in your
> > > structures?
> > > > > +/*
> > > > + * The interface version is used to ensure that the host and the guest use the
> > > > + * same VM bus protocol. It needs to be incremented every time the VM bus
> > > > + * interface changes. DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION is
> > > > + * incremented each time the earlier versions of the interface are no longer
> > > > + * compatible with the current version.
> > > > + */
> > > > +#define DXGK_VMBUS_INTERFACE_VERSION_OLD 27
> > > > +#define DXGK_VMBUS_INTERFACE_VERSION 40
> > > > +#define DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION 16
> > > > Where do these numbers come from, the hypervisor specification?
> > > > > +/*
> > > > + * Pointer to the global device data. By design
> > > > + * there is a single vGPU device on the VM bus and a single /dev/dxg device
> > > > + * is created.
> > > > + */
> > > > +struct dxgglobal *dxgglobal;
> > > > No, make this per-device, NEVER have a single device for your
> > driver.
> > > The Linux driver model makes it harder to do it this way than to do it
> > > correctly. Do it correctly please and have no global structures like
> > > this.
> > >
> >
> > This may not be as big an issue as you thought. The device discovery is
> > still done via the normal VMBus probing routine. For all intents and
> > purposes the dxgglobal structure can be broken down into per device
> > fields and a global structure which contains the protocol versioning
> > information -- my understanding is there will always be a global
> > structure to hold information related to the backend, regardless of how
> > many devices there are.
> >
> > I definitely think splitting is doable, but I also understand why Iouri
> > does not want to do it _now_ given there is no such a model for multiple
> > devices yet, so anything we put into the per-device structure could be
> > incomplete and it requires further changing when such a model arrives
> > later.
> >
> > Iouri, please correct me if I have the wrong mental model here.
> >
> > All in all, I hope this is not going to be a deal breaker for the
> > acceptance of this driver.
> >
> > Thanks,
> > Wei.
>
> I agree with Wei that there always be global driver data.

Why?

> The driver reflects what the host offers and also it must provide the same
> interface to user mode as the host driver does. This is because we want the
> user mode clients to use the same device interface as if they are working on
> the host directly.

That's fine, put that state in the device interface.

> By design a single global VMBus channel is offered by the host and a single
> /dev/dxg device is created. The /dev/dxg device provides interface to enumerate
> virtual compute devices via an ioctl.

You have device data for each vmbus channel given to the driver, use
that.

> If we are to change this model, we would need to make changes to user mode
> clients, which is a big re-design change, affecting many hardware vendors.

This should not be visible to userspace at all. If so, you are really
doing something odd.

thanks,

greg k-h

2022-03-02 22:40:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Wed, Mar 02, 2022 at 10:49:15AM -0800, Iouri Tarassov wrote:
> On 3/2/2022 3:53 AM, Wei Liu wrote:
> > On Wed, Mar 02, 2022 at 08:53:15AM +0100, Greg KH wrote:
> > > On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > > > > +struct dxgglobal *dxgglobal;
> > > > >
> > > > > No, make this per-device, NEVER have a single device for your driver.
> > > > > The Linux driver model makes it harder to do it this way than to do it
> > > > > correctly. Do it correctly please and have no global structures like
> > > > > this.
> > > > >
> > > >
> > > > This may not be as big an issue as you thought. The device discovery is
> > > > still done via the normal VMBus probing routine. For all intents and
> > > > purposes the dxgglobal structure can be broken down into per device
> > > > fields and a global structure which contains the protocol versioning
> > > > information -- my understanding is there will always be a global
> > > > structure to hold information related to the backend, regardless of how
> > > > many devices there are.
> > >
> > > Then that is wrong and needs to be fixed. Drivers should almost never
> > > have any global data, that is not how Linux drivers work. What happens
> > > when you get a second device in your system for this? Major rework
> > > would have to happen and the code will break. Handle that all now as it
> > > takes less work to make this per-device than it does to have a global
> > > variable.
> > >
> >
> > It is perhaps easier to draw parallel from an existing driver. I feel
> > like we're talking past each other.
> >
> > Let's look at drivers/iommu/intel/iommu.c. There are a bunch of lists
> > like `static LIST_HEAD(dmar_rmrr_units)`. During the probing phase, new
> > units will be added to the list. I this the current code is following
> > this model. dxgglobal fulfills the role of a list.
> >
> > Setting aside the question of whether it makes sense to keep a copy of
> > the per-VM state in each device instance, I can see the code be changed
> > to:
> >
> > struct mutex device_mutex; /* split out from dxgglobal */
> > static LIST_HEAD(dxglist);
> >
> > /* Rename struct dxgglobal to struct dxgstate */
> > struct dxgstate {
> > struct list_head dxglist; /* link for dxglist */
> > /* ... original fields sans device_mutex */
> > }
> >
> > /*
> > * Provide a bunch of helpers manipulate the list. Called in probe /
> > * remove etc.
> > */
> > struct dxgstate *find_dxgstate(...);
> > void remove_dxgstate(...);
> > int add_dxgstate(...);
> >
> > This model is well understood and used in tree. It is just that it
> > doesn't provide much value in doing this now since the list will only
> > contain one element. I hope that you're not saying we cannot even use a
> > per-module pointer to quickly get the data structure we want to use,
> > right?
> >
> > Are you suggesting Iouri use dev_set_drvdata to stash the dxgstate
> > into the device object? I think that can be done too.
> >
> > The code can be changed as:
> >
> > /* Rename struct dxgglobal to dxgstate and remove unneeded fields */
> > struct dxgstate { ... };
> >
> > static int dxg_probe_vmbus(...) {
> >
> > /* probe successfully */
> >
> > struct dxgstate *state = kmalloc(...);
> > /* Fill in dxgstate with information from backend */
> >
> > /* hdev->dev is the device object from the core driver framework */
> > dev_set_drvdata(&hdev->dev, state);
> > }
> >
> > static int dxg_remove_vmbus(...) {
> > /* Normal stuff here ...*/
> >
> > struct dxgstate *state = dev_get_drvdata(...);
> > dev_set_drvdata(..., NULL);
> > kfree(state);
> > }
> >
> > /* In all other functions */
> > void do_things(...) {
> > struct dxgstate *state = dev_get_drvdata(...);
> >
> > /* Use state in place of where dxgglobal was needed */
> >
> > }
> >
> > Iouri, notice this doesn't change anything regarding how userspace is
> > designed. This is about how kernel organises its data.
> >
> > I hope what I wrote above can bring our understanding closer.
> >
> > Thanks,
> > Wei.
>
>
> I can certainly remove dxgglobal and keep the? pointer to the global
> state in the device object.
>
> This will require passing of the global pointer to all functions, which
> need to access it.
>
>
> Maybe my understanding of the Greg's suggestion was not correct. I
> thought the suggestion was
>
> to have multiple /dev/dxgN devices (one per virtual compute device).

You have one device per HV device, as the bus already provides you.
That's all you really need, right? Who would be opening the same device
node multiple times?

> This would change how the user mode
> clients enumerate and communicate with compute devices.

What does userspace have to do here? It should just open the device
node that is present when needed. How will there be multiple userspace
clients for a single HV device?

thanks,

greg k-h

2022-03-02 23:07:53

by Iouri Tarassov

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On 3/2/2022 3:53 AM, Wei Liu wrote:
> On Wed, Mar 02, 2022 at 08:53:15AM +0100, Greg KH wrote:
> > On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > > > +struct dxgglobal *dxgglobal;
> > > >
> > > > No, make this per-device, NEVER have a single device for your driver.
> > > > The Linux driver model makes it harder to do it this way than to do it
> > > > correctly. Do it correctly please and have no global structures like
> > > > this.
> > > >
> > >
> > > This may not be as big an issue as you thought. The device discovery is
> > > still done via the normal VMBus probing routine. For all intents and
> > > purposes the dxgglobal structure can be broken down into per device
> > > fields and a global structure which contains the protocol versioning
> > > information -- my understanding is there will always be a global
> > > structure to hold information related to the backend, regardless of how
> > > many devices there are.
> >
> > Then that is wrong and needs to be fixed. Drivers should almost never
> > have any global data, that is not how Linux drivers work. What happens
> > when you get a second device in your system for this? Major rework
> > would have to happen and the code will break. Handle that all now as it
> > takes less work to make this per-device than it does to have a global
> > variable.
> >
>
> It is perhaps easier to draw parallel from an existing driver. I feel
> like we're talking past each other.
>
> Let's look at drivers/iommu/intel/iommu.c. There are a bunch of lists
> like `static LIST_HEAD(dmar_rmrr_units)`. During the probing phase, new
> units will be added to the list. I this the current code is following
> this model. dxgglobal fulfills the role of a list.
>
> Setting aside the question of whether it makes sense to keep a copy of
> the per-VM state in each device instance, I can see the code be changed
> to:
>
> struct mutex device_mutex; /* split out from dxgglobal */
> static LIST_HEAD(dxglist);
>
> /* Rename struct dxgglobal to struct dxgstate */
> struct dxgstate {
> struct list_head dxglist; /* link for dxglist */
> /* ... original fields sans device_mutex */
> }
>
> /*
> * Provide a bunch of helpers manipulate the list. Called in probe /
> * remove etc.
> */
> struct dxgstate *find_dxgstate(...);
> void remove_dxgstate(...);
> int add_dxgstate(...);
>
> This model is well understood and used in tree. It is just that it
> doesn't provide much value in doing this now since the list will only
> contain one element. I hope that you're not saying we cannot even use a
> per-module pointer to quickly get the data structure we want to use,
> right?
>
> Are you suggesting Iouri use dev_set_drvdata to stash the dxgstate
> into the device object? I think that can be done too.
>
> The code can be changed as:
>
> /* Rename struct dxgglobal to dxgstate and remove unneeded fields */
> struct dxgstate { ... };
>
> static int dxg_probe_vmbus(...) {
>
> /* probe successfully */
>
> struct dxgstate *state = kmalloc(...);
> /* Fill in dxgstate with information from backend */
>
> /* hdev->dev is the device object from the core driver framework */
> dev_set_drvdata(&hdev->dev, state);
> }
>
> static int dxg_remove_vmbus(...) {
> /* Normal stuff here ...*/
>
> struct dxgstate *state = dev_get_drvdata(...);
> dev_set_drvdata(..., NULL);
> kfree(state);
> }
>
> /* In all other functions */
> void do_things(...) {
> struct dxgstate *state = dev_get_drvdata(...);
>
> /* Use state in place of where dxgglobal was needed */
>
> }
>
> Iouri, notice this doesn't change anything regarding how userspace is
> designed. This is about how kernel organises its data.
>
> I hope what I wrote above can bring our understanding closer.
>
> Thanks,
> Wei.


I can certainly remove dxgglobal and keep the  pointer to the global
state in the device object.

This will require passing of the global pointer to all functions, which
need to access it.


Maybe my understanding of the Greg's suggestion was not correct. I
thought the suggestion was

to have multiple /dev/dxgN devices (one per virtual compute device).
This would change how the user mode

clients enumerate and communicate with compute devices.


Thanks

Iouri

2022-03-02 23:20:12

by Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Wed, Mar 02, 2022 at 08:53:15AM +0100, Greg KH wrote:
> On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > > +struct dxgglobal *dxgglobal;
> > >
> > > No, make this per-device, NEVER have a single device for your driver.
> > > The Linux driver model makes it harder to do it this way than to do it
> > > correctly. Do it correctly please and have no global structures like
> > > this.
> > >
> >
> > This may not be as big an issue as you thought. The device discovery is
> > still done via the normal VMBus probing routine. For all intents and
> > purposes the dxgglobal structure can be broken down into per device
> > fields and a global structure which contains the protocol versioning
> > information -- my understanding is there will always be a global
> > structure to hold information related to the backend, regardless of how
> > many devices there are.
>
> Then that is wrong and needs to be fixed. Drivers should almost never
> have any global data, that is not how Linux drivers work. What happens
> when you get a second device in your system for this? Major rework
> would have to happen and the code will break. Handle that all now as it
> takes less work to make this per-device than it does to have a global
> variable.
>

It is perhaps easier to draw parallel from an existing driver. I feel
like we're talking past each other.

Let's look at drivers/iommu/intel/iommu.c. There are a bunch of lists
like `static LIST_HEAD(dmar_rmrr_units)`. During the probing phase, new
units will be added to the list. I this the current code is following
this model. dxgglobal fulfills the role of a list.

Setting aside the question of whether it makes sense to keep a copy of
the per-VM state in each device instance, I can see the code be changed
to:

struct mutex device_mutex; /* split out from dxgglobal */
static LIST_HEAD(dxglist);

/* Rename struct dxgglobal to struct dxgstate */
struct dxgstate {
struct list_head dxglist; /* link for dxglist */
/* ... original fields sans device_mutex */
}

/*
* Provide a bunch of helpers manipulate the list. Called in probe /
* remove etc.
*/
struct dxgstate *find_dxgstate(...);
void remove_dxgstate(...);
int add_dxgstate(...);

This model is well understood and used in tree. It is just that it
doesn't provide much value in doing this now since the list will only
contain one element. I hope that you're not saying we cannot even use a
per-module pointer to quickly get the data structure we want to use,
right?

Are you suggesting Iouri use dev_set_drvdata to stash the dxgstate
into the device object? I think that can be done too.

The code can be changed as:

/* Rename struct dxgglobal to dxgstate and remove unneeded fields */
struct dxgstate { ... };

static int dxg_probe_vmbus(...) {

/* probe successfully */

struct dxgstate *state = kmalloc(...);
/* Fill in dxgstate with information from backend */

/* hdev->dev is the device object from the core driver framework */
dev_set_drvdata(&hdev->dev, state);
}

static int dxg_remove_vmbus(...) {
/* Normal stuff here ...*/

struct dxgstate *state = dev_get_drvdata(...);
dev_set_drvdata(..., NULL);
kfree(state);
}

/* In all other functions */
void do_things(...) {
struct dxgstate *state = dev_get_drvdata(...);

/* Use state in place of where dxgglobal was needed */

}

Iouri, notice this doesn't change anything regarding how userspace is
designed. This is about how kernel organises its data.

I hope what I wrote above can bring our understanding closer.

Thanks,
Wei.

2022-03-03 00:05:51

by Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Wed, Mar 02, 2022 at 10:49:15AM -0800, Iouri Tarassov wrote:
> On 3/2/2022 3:53 AM, Wei Liu wrote:
> > On Wed, Mar 02, 2022 at 08:53:15AM +0100, Greg KH wrote:
> > > On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > > > > +struct dxgglobal *dxgglobal;
> > > > >
> > > > > No, make this per-device, NEVER have a single device for your driver.
> > > > > The Linux driver model makes it harder to do it this way than to do it
> > > > > correctly. Do it correctly please and have no global structures like
> > > > > this.
> > > > >
> > > >
> > > > This may not be as big an issue as you thought. The device discovery is
> > > > still done via the normal VMBus probing routine. For all intents and
> > > > purposes the dxgglobal structure can be broken down into per device
> > > > fields and a global structure which contains the protocol versioning
> > > > information -- my understanding is there will always be a global
> > > > structure to hold information related to the backend, regardless of how
> > > > many devices there are.
> > >
> > > Then that is wrong and needs to be fixed. Drivers should almost never
> > > have any global data, that is not how Linux drivers work. What happens
> > > when you get a second device in your system for this? Major rework
> > > would have to happen and the code will break. Handle that all now as it
> > > takes less work to make this per-device than it does to have a global
> > > variable.
> > >
> >
> > It is perhaps easier to draw parallel from an existing driver. I feel
> > like we're talking past each other.
> >
> > Let's look at drivers/iommu/intel/iommu.c. There are a bunch of lists
> > like `static LIST_HEAD(dmar_rmrr_units)`. During the probing phase, new
> > units will be added to the list. I this the current code is following
> > this model. dxgglobal fulfills the role of a list.
> >
> > Setting aside the question of whether it makes sense to keep a copy of
> > the per-VM state in each device instance, I can see the code be changed
> > to:
> >
> > struct mutex device_mutex; /* split out from dxgglobal */
> > static LIST_HEAD(dxglist);
> >
> > /* Rename struct dxgglobal to struct dxgstate */
> > struct dxgstate {
> > struct list_head dxglist; /* link for dxglist */
> > /* ... original fields sans device_mutex */
> > }
> >
> > /*
> > * Provide a bunch of helpers manipulate the list. Called in probe /
> > * remove etc.
> > */
> > struct dxgstate *find_dxgstate(...);
> > void remove_dxgstate(...);
> > int add_dxgstate(...);
> >
> > This model is well understood and used in tree. It is just that it
> > doesn't provide much value in doing this now since the list will only
> > contain one element. I hope that you're not saying we cannot even use a
> > per-module pointer to quickly get the data structure we want to use,
> > right?
> >
> > Are you suggesting Iouri use dev_set_drvdata to stash the dxgstate
> > into the device object? I think that can be done too.
> >
> > The code can be changed as:
> >
> > /* Rename struct dxgglobal to dxgstate and remove unneeded fields */
> > struct dxgstate { ... };
> >
> > static int dxg_probe_vmbus(...) {
> >
> > /* probe successfully */
> >
> > struct dxgstate *state = kmalloc(...);
> > /* Fill in dxgstate with information from backend */
> >
> > /* hdev->dev is the device object from the core driver framework */
> > dev_set_drvdata(&hdev->dev, state);
> > }
> >
> > static int dxg_remove_vmbus(...) {
> > /* Normal stuff here ...*/
> >
> > struct dxgstate *state = dev_get_drvdata(...);
> > dev_set_drvdata(..., NULL);
> > kfree(state);
> > }
> >
> > /* In all other functions */
> > void do_things(...) {
> > struct dxgstate *state = dev_get_drvdata(...);
> >
> > /* Use state in place of where dxgglobal was needed */
> >
> > }
> >
> > Iouri, notice this doesn't change anything regarding how userspace is
> > designed. This is about how kernel organises its data.
> >
> > I hope what I wrote above can bring our understanding closer.
> >
> > Thanks,
> > Wei.
>
>
> I can certainly remove dxgglobal and keep the? pointer to the global
> state in the device object.
>

No, no more global pointer needed. You just call dev_drv_setdata in the
place that you assign to the global pointer.

> This will require passing of the global pointer to all functions, which
> need to access it.
>

And in the place you need the global pointer, call dev_drv_getdata.

>
> Maybe my understanding of the Greg's suggestion was not correct. I
> thought the suggestion was
>
> to have multiple /dev/dxgN devices (one per virtual compute device).
> This would change how the user mode
>

No. You still have only one /dev/dxg here.

Wei.

2022-03-03 00:33:20

by Iouri Tarassov

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading


On 3/2/2022 12:20 PM, Greg KH wrote:
> On Wed, Mar 02, 2022 at 10:49:15AM -0800, Iouri Tarassov wrote:
> > On 3/2/2022 3:53 AM, Wei Liu wrote:
> > > On Wed, Mar 02, 2022 at 08:53:15AM +0100, Greg KH wrote:
> > > > On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > > > > > +struct dxgglobal *dxgglobal;
> > > > > >
> > > > > > No, make this per-device, NEVER have a single device for your driver.
> > > > > > The Linux driver model makes it harder to do it this way than to do it
> > > > > > correctly. Do it correctly please and have no global structures like
> > > > > > this.
> > > > > >
> > > > >
> > > > > This may not be as big an issue as you thought. The device discovery is
> > > > > still done via the normal VMBus probing routine. For all intents and
> > > > > purposes the dxgglobal structure can be broken down into per device
> > > > > fields and a global structure which contains the protocol versioning
> > > > > information -- my understanding is there will always be a global
> > > > > structure to hold information related to the backend, regardless of how
> > > > > many devices there are.
> > > >
> > > > Then that is wrong and needs to be fixed. Drivers should almost never
> > > > have any global data, that is not how Linux drivers work. What happens
> > > > when you get a second device in your system for this? Major rework
> > > > would have to happen and the code will break. Handle that all now as it
> > > > takes less work to make this per-device than it does to have a global
> > > > variable.
> > > >
> > >
> > > It is perhaps easier to draw parallel from an existing driver. I feel
> > > like we're talking past each other.
> > >
> > > Let's look at drivers/iommu/intel/iommu.c. There are a bunch of lists
> > > like `static LIST_HEAD(dmar_rmrr_units)`. During the probing phase, new
> > > units will be added to the list. I this the current code is following
> > > this model. dxgglobal fulfills the role of a list.
> > >
> > > Setting aside the question of whether it makes sense to keep a copy of
> > > the per-VM state in each device instance, I can see the code be changed
> > > to:
> > >
> > > struct mutex device_mutex; /* split out from dxgglobal */
> > > static LIST_HEAD(dxglist);
> > >
> > > /* Rename struct dxgglobal to struct dxgstate */
> > > struct dxgstate {
> > > struct list_head dxglist; /* link for dxglist */
> > > /* ... original fields sans device_mutex */
> > > }
> > >
> > > /*
> > > * Provide a bunch of helpers manipulate the list. Called in probe /
> > > * remove etc.
> > > */
> > > struct dxgstate *find_dxgstate(...);
> > > void remove_dxgstate(...);
> > > int add_dxgstate(...);
> > >
> > > This model is well understood and used in tree. It is just that it
> > > doesn't provide much value in doing this now since the list will only
> > > contain one element. I hope that you're not saying we cannot even use a
> > > per-module pointer to quickly get the data structure we want to use,
> > > right?
> > >
> > > Are you suggesting Iouri use dev_set_drvdata to stash the dxgstate
> > > into the device object? I think that can be done too.
> > >
> > > The code can be changed as:
> > >
> > > /* Rename struct dxgglobal to dxgstate and remove unneeded fields */
> > > struct dxgstate { ... };
> > >
> > > static int dxg_probe_vmbus(...) {
> > >
> > > /* probe successfully */
> > >
> > > struct dxgstate *state = kmalloc(...);
> > > /* Fill in dxgstate with information from backend */
> > >
> > > /* hdev->dev is the device object from the core driver framework */
> > > dev_set_drvdata(&hdev->dev, state);
> > > }
> > >
> > > static int dxg_remove_vmbus(...) {
> > > /* Normal stuff here ...*/
> > >
> > > struct dxgstate *state = dev_get_drvdata(...);
> > > dev_set_drvdata(..., NULL);
> > > kfree(state);
> > > }
> > >
> > > /* In all other functions */
> > > void do_things(...) {
> > > struct dxgstate *state = dev_get_drvdata(...);
> > >
> > > /* Use state in place of where dxgglobal was needed */
> > >
> > > }
> > >
> > > Iouri, notice this doesn't change anything regarding how userspace is
> > > designed. This is about how kernel organises its data.
> > >
> > > I hope what I wrote above can bring our understanding closer.
> > >
> > > Thanks,
> > > Wei.
> >
> >
> > I can certainly remove dxgglobal and keep the  pointer to the global
> > state in the device object.
> >
> > This will require passing of the global pointer to all functions, which
> > need to access it.
> >
> >
> > Maybe my understanding of the Greg's suggestion was not correct. I
> > thought the suggestion was
> >
> > to have multiple /dev/dxgN devices (one per virtual compute device).
>
> You have one device per HV device, as the bus already provides you.
> That's all you really need, right? Who would be opening the same device
> node multiple times?
> > This would change how the user mode
> > clients enumerate and communicate with compute devices.
>
> What does userspace have to do here? It should just open the device
> node that is present when needed. How will there be multiple userspace
> clients for a single HV device?


Dxgkrnl creates a single user mode visible device node /dev/dxg. It has
nothing to do with a specific hardware compute device on the host. Its
purpose is to provide services (IOCTLs) to enumerate and manage virtual
compute devices, which represent hardware devices on the host. The VMBus
devices are not used directly by user mode clients in the current design.

Virtual compute devices are shared between processes. There could be a
Cuda application, Gimp and a Direct3D12 application working at the same
time.This is what I mean by saying that there are multiple user mode
clients who use the /dev/dxg driver interface. Each of this applications
will open the /dev/dxg device node and enumerate/use virtual compute
devices.

If we change the way how the virtual compute devices are visible to user
mode, the Cuda runtime, Direct3D runtime would need to be changed.

I think we agreed that I will keep the global driver state in the device
object as Wei suggested and remove global variables. There still will be
a single /dev/dxg device node. Correct?


Thanks

Iouri

2022-03-03 00:38:49

by Iouri Tarassov

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading


On 3/1/2022 12:45 PM, Greg KH wrote:
> On Tue, Mar 01, 2022 at 11:45:49AM -0800, Iouri Tarassov wrote:
> > - Create skeleton and add basic functionality for the
> > hyper-v compute device driver (dxgkrnl).
> >
> > +
> > +#undef pr_fmt
> > +#define pr_fmt(fmt) "dxgk: " fmt
>
> Use the dev_*() print functions, you are a driver, and you always have a
> pointer to a struct device. There's no need to ever call pr_*().
>

There is no struct device during module initialization until the
/dev/dxg device is created. Is it ok to use pr_* functions in this case?
Should dev_*(NULL,...) be used? I see other drivers use the pr_*
functions in this case (mips.c as an example).


Thanks

Iouri

> thanks,
>
> greg k-h

2022-03-03 00:52:43

by Iouri Tarassov

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

Hi Greg,

Thanks a lot for reviewing the patches.

I appreciate the very detailed comments. I my reply I omitted the
comments, which I agree with and will address in subsequent patches.

On 3/1/2022 12:45 PM, Greg KH wrote:

>
> > +long dxgk_unlocked_ioctl(struct file *f, unsigned int p1, unsigned long p2);
> > +
> > +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> > +{
> > + *luid = *(struct winluid *)&guid->b[0];
>
> Why is the cast needed? Shouldn't you use real types in your
> structures?


The VMBus channel ID, which is GUID, is present in the PCI config space
of the compute device. The convention is that the lower part of the GUID
is LUID (local unique identifier). This function just converts GUID to
LUID. I am not sure what is the ask here.


> > +/*
> > + * The interface version is used to ensure that the host and the guest use the
> > + * same VM bus protocol. It needs to be incremented every time the VM bus
> > + * interface changes. DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION is
> > + * incremented each time the earlier versions of the interface are no longer
> > + * compatible with the current version.
> > + */
> > +#define DXGK_VMBUS_INTERFACE_VERSION_OLD 27
> > +#define DXGK_VMBUS_INTERFACE_VERSION 40
> > +#define DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION 16
>
> Where do these numbers come from, the hypervisor specification?


The definitions are for the VMBus interface between the virtual compute
device in the guest and the Windows host. The interface version is
updated when the VMBus protocol is changed. These are specific to the
virtual compute device, not to the hypervisor.


> > +/*
> > + * Pointer to the global device data. By design
> > + * there is a single vGPU device on the VM bus and a single /dev/dxg device
> > + * is created.
> > + */
> > +struct dxgglobal *dxgglobal;
>
> No, make this per-device, NEVER have a single device for your driver.
> The Linux driver model makes it harder to do it this way than to do it
> correctly. Do it correctly please and have no global structures like
> this.


This will be addressed in the next version of patches.


> > +#define DXGKRNL_VERSION 0x2216
>
> What does this mean?


This is the driver implementation version. There are many Microsoft
shipped Linux kernels, which include the driver. The version allows to
quickly determine, which level of functionality or bug fixes is
implemented in the current driver.


> > +
> > +/*
> > + * Interface with the PCI driver
> > + */
> > +
> > +/*
> > + * Part of the PCI config space of the vGPU device is used for vGPU
> > + * configuration data. Reading/writing of the PCI config space is forwarded
> > + * to the host.
> > + */
>
> > +/* DXGK_VMBUS_INTERFACE_VERSION (u32) */
>
> What is this?


This comment explains that the value of DXGK_VMBUS_INTERFACE_VERSION is
located at the offset DXGK_VMBUS_CHANNEL_ID_OFFSET in the PCI config space.


> > +#define DXGK_VMBUS_VERSION_OFFSET (DXGK_VMBUS_CHANNEL_ID_OFFSET + \
> > + sizeof(guid_t))
>
> offsetof() is your friend, please use it.


There is no structure definition for the PCI config space of the device.
Therefore, offsets are computed by using data size.


> > +/* Luid of the virtual GPU on the host (struct winluid) */
>
> What is a "luid"?


LUID is "Locally Unique Identifier". It is used in Windows and its value
is guaranteed to be unique until the computer reboots. This is similar
to GUID, but is it not globally unique. Dxgkrnl on the host uses LUIDs
to identify compute adapter objects.
I added a comment to the definition in the header.


> > + ret = vmbus_driver_register(&dxg_drv);
> > + if (ret) {
> > + pr_err("vmbus_driver_register failed: %d", ret);
> > + return ret;
> > + }
> > + dxgglobal->vmbus_registered = true;
> > +
> > + pr_info("%s Version: %x", __func__, DXGKRNL_VERSION);
>
> When drivers work, they are quiet.
>
> Also, in-kernel modules do not have a version, they are part of the
> kernel tree, and that's the "version" they have. You can drop the
> individual version here, it makes no sense anymore.


Microsoft develops many Linux kernels when the dxgkrnl driver is
included (EFLOW, Android, WSL). The kernels are built at different times
and might include a different version of the driver. Printing the
version allows to quickly determine what functionality and bug fixes are
included in the driver.

I see that other in-tree drivers print some information during
init_module (like nfblock.c).

This is done for convenience. So instead of tracking multiple kernel
versions, we can track the dxgkrnl driver version. I am ok to remove it
if it really hurts something. It is only a single line per driver load.


Thanks

Iouri

2022-03-03 01:31:57

by Iouri Tarassov

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading


On 3/1/2022 11:53 PM, Greg KH wrote:
> On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > > +struct dxgglobal *dxgglobal;
> > >
> > > No, make this per-device, NEVER have a single device for your driver.
> > > The Linux driver model makes it harder to do it this way than to do it
> > > correctly. Do it correctly please and have no global structures like
> > > this.
> > >
> >
> > This may not be as big an issue as you thought. The device discovery is
> > still done via the normal VMBus probing routine. For all intents and
> > purposes the dxgglobal structure can be broken down into per device
> > fields and a global structure which contains the protocol versioning
> > information -- my understanding is there will always be a global
> > structure to hold information related to the backend, regardless of how
> > many devices there are.
>
> Then that is wrong and needs to be fixed. Drivers should almost never
> have any global data, that is not how Linux drivers work. What happens
> when you get a second device in your system for this? Major rework
> would have to happen and the code will break. Handle that all now as it
> takes less work to make this per-device than it does to have a global
> variable.
>
> > I definitely think splitting is doable, but I also understand why Iouri
> > does not want to do it _now_ given there is no such a model for multiple
> > devices yet, so anything we put into the per-device structure could be
> > incomplete and it requires further changing when such a model arrives
> > later.
> >
> > Iouri, please correct me if I have the wrong mental model here.
> >
> > All in all, I hope this is not going to be a deal breaker for the
> > acceptance of this driver.
>
> For my reviews, yes it will be.
>
> Again, it should be easier to keep things in a per-device state than
> not as the proper lifetime rules and the like are automatically handled
> for you. If you have global data, you have to manage that all on your
> own and it is _MUCH_ harder to review that you got it correct.

Hi Greg,

I do not really see how the driver be written without the global data. Let's review the design.

Dxgkrnl acts as the aggregator of all virtual compute devices, projected by the host. It needs to do operations, which do not belong to a particular compute device. For example, cross device synchronization and resource sharing.

A PCI device device is created for each virtual compute device. Therefore, there should be a global list of objects and a mutex to synchronize access to the list.

A VMBus channel is offered by the host for each compute device. The list of the VMBus channels should be global.

A global VMBus channel is offered by the host. The channel does not belong to any particular compute device, so it must be global.

IO space is shared by all compute devices, so its parameters should be global.

Dxgkrnl needs to maintain a list of processes, which opened compute device objects. Dxgkrnl maintains private state for each process and when a process opens the /dev/dxg device, Dxgkrnl needs to find if the process state is already created by walking the global process list.

Now, where to keep this global state? It could be kept in the /dev/dxg private device structure. But this structure is not available when, for example, dxg_pci_probe_device() or dxg_probe_vmbus() is called.

Can there be multiple /dev/dxg devices? No. Because the /dev/dxg device represents the driver itself, not a particular compute device.

I am not sure what design model you have in mind when saying there should be no global data. Could you please explain keeping in mind the above requirements?

Thanks
Iouri

2022-03-03 02:02:34

by Iouri Tarassov

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading


On 3/1/2022 2:06 PM, Wei Liu wrote:
> I will skip things that are pointed out by Greg.
>
> On Tue, Mar 01, 2022 at 11:45:49AM -0800, Iouri Tarassov wrote:
> > - Create skeleton and add basic functionality for the
> > hyper-v compute device driver (dxgkrnl).
> >
> > - Register for PCI and VM bus driver notifications and
> > handle initialization of VM bus channels.
> >
> > - Connect the dxgkrnl module to the drivers/hv/ Makefile and Kconfig
> >
> > - Create a MAINTAINERS entry
> >
> > A VM bus channel is a communication interface between the hyper-v guest
> > and the host. The are two type of VM bus channels, used in the driver:
> > - the global channel
> > - per virtual compute device channel
> >
>
> Same comment regarding the spelling of VMBus and Hyper-V. Please fix
> other instances in code and comments.
>
> > A PCI device is created for each virtual compute device, projected
> > by the host. The device vendor is PCI_VENDOR_ID_MICROSOFT and device
> > id is PCI_DEVICE_ID_VIRTUAL_RENDER. dxg_pci_probe_device handles
> > arrival of such devices. The PCI config space of the virtual compute
> > device has luid of the corresponding virtual compute device VM
> > bus channel. This is how the compute device adapter objects are
> > linked to VM bus channels.
> >
> > VM bus interface version is exchanged by reading/writing the PCI config
> > space of the virtual compute device.
> >
> > The IO space is used to handle CPU accessible compute device
> > allocations. Hyper-v allocates IO space for the global VM bus channel.
> >
> > Signed-off-by: Iouri Tarassov <[email protected]>
> > ---
> [...]
> > +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> > +{
> > + *luid = *(struct winluid *)&guid->b[0];
> > +}
>
> This should be moved to the header where luid is defined -- presumably
> this is useful for other things in the future too.
>
> Also, please provide a comment on why this conversion is okay.
>
The definition of the structure is in the public header. I do not think it makes sense to move the function there. It is a detail of the internal implementation. There is no official conversion of GUID to LUID.

The comment will be added.

Thanks
Iouri


2022-03-03 13:59:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Wed, Mar 02, 2022 at 03:27:56PM -0800, Iouri Tarassov wrote:
>
> On 3/1/2022 12:45 PM, Greg KH wrote:
> > On Tue, Mar 01, 2022 at 11:45:49AM -0800, Iouri Tarassov wrote:
> > > - Create skeleton and add basic functionality for the
> > > hyper-v compute device driver (dxgkrnl).
> > >
> > > +
> > > +#undef pr_fmt
> > > +#define pr_fmt(fmt) "dxgk: " fmt
> >
> > Use the dev_*() print functions, you are a driver, and you always have a
> > pointer to a struct device. There's no need to ever call pr_*().
> >
>
> There is no struct device during module initialization until the
> /dev/dxg device is created.

Then you should not have anything to print out.

> Is it ok to use pr_* functions in this case?

Nope.

> Should dev_*(NULL,...) be used?

No, not at all, never!

> I see other drivers use the pr_* functions in this case (mips.c as an
> example).

There are lots of bad examples in the kernel tree, let's make your code
a good one.

thanks,

greg k-h

2022-03-03 16:14:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Wed, Mar 02, 2022 at 02:59:13PM -0800, Iouri Tarassov wrote:
> Hi Greg,
>
> Thanks a lot for reviewing the patches.
>
> I appreciate the very detailed comments. I my reply I omitted the
> comments, which I agree with and will address in subsequent patches.
>
> On 3/1/2022 12:45 PM, Greg KH wrote:
>
> >
> > > +long dxgk_unlocked_ioctl(struct file *f, unsigned int p1, unsigned long p2);
> > > +
> > > +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> > > +{
> > > + *luid = *(struct winluid *)&guid->b[0];
> >
> > Why is the cast needed? Shouldn't you use real types in your
> > structures?
>
>
> The VMBus channel ID, which is GUID, is present in the PCI config space
> of the compute device. The convention is that the lower part of the GUID
> is LUID (local unique identifier). This function just converts GUID to
> LUID. I am not sure what is the ask here.

You need to documen the heck out of what you are doing here as it looks
very very odd.

> > > +/*
> > > + * The interface version is used to ensure that the host and the guest use the
> > > + * same VM bus protocol. It needs to be incremented every time the VM bus
> > > + * interface changes. DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION is
> > > + * incremented each time the earlier versions of the interface are no longer
> > > + * compatible with the current version.
> > > + */
> > > +#define DXGK_VMBUS_INTERFACE_VERSION_OLD 27
> > > +#define DXGK_VMBUS_INTERFACE_VERSION 40
> > > +#define DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION 16
> >
> > Where do these numbers come from, the hypervisor specification?
>
>
> The definitions are for the VMBus interface between the virtual compute
> device in the guest and the Windows host. The interface version is
> updated when the VMBus protocol is changed. These are specific to the
> virtual compute device, not to the hypervisor.

That's not ok, you can not break the user/kernel interface from this
moment going forward forever. You can't just have version numbers for
your protocol, you need to handle it correctly like all of Linux does.

> > > +#define DXGKRNL_VERSION 0x2216
> >
> > What does this mean?
>
>
> This is the driver implementation version. There are many Microsoft
> shipped Linux kernels, which include the driver. The version allows to
> quickly determine, which level of functionality or bug fixes is
> implemented in the current driver.

That number means nothing once the driver is in the kernel tree as your
driver is part of a larger whole and you have no idea what people have,
or have not, backported to your portion of the driver or the rest of the
kernel anymore. Just drop this, it's not going to be used ever again,
and will always be out of date and wrong.

In-kernel drivers do NOT need a version number. I have worked to remove
almost all of them, but have not finished that task, which is why you
see a few remaining. Do not copy bad examples.

> > > +/*
> > > + * Part of the PCI config space of the vGPU device is used for vGPU
> > > + * configuration data. Reading/writing of the PCI config space is forwarded
> > > + * to the host.
> > > + */
> >
> > > +/* DXGK_VMBUS_INTERFACE_VERSION (u32) */
> >
> > What is this?
>
>
> This comment explains that the value of DXGK_VMBUS_INTERFACE_VERSION is
> located at the offset DXGK_VMBUS_CHANNEL_ID_OFFSET in the PCI config space.

Then please explain that.

> > > +#define DXGK_VMBUS_VERSION_OFFSET (DXGK_VMBUS_CHANNEL_ID_OFFSET + \
> > > + sizeof(guid_t))
> >
> > offsetof() is your friend, please use it.
>
>
> There is no structure definition for the PCI config space of the device.
> Therefore, offsets are computed by using data size.

That's odd, why not just use a structure on top of the memory location?
That way you get real structure information if things change and you
don't have to cast anything.

> > > +/* Luid of the virtual GPU on the host (struct winluid) */
> >
> > What is a "luid"?
>
>
> LUID is "Locally Unique Identifier". It is used in Windows and its value
> is guaranteed to be unique until the computer reboots.

What about vm lifespans? VM cloning? Why does the Linux kernel care
about this value?

> > > + ret = vmbus_driver_register(&dxg_drv);
> > > + if (ret) {
> > > + pr_err("vmbus_driver_register failed: %d", ret);
> > > + return ret;
> > > + }
> > > + dxgglobal->vmbus_registered = true;
> > > +
> > > + pr_info("%s Version: %x", __func__, DXGKRNL_VERSION);
> >
> > When drivers work, they are quiet.
> >
> > Also, in-kernel modules do not have a version, they are part of the
> > kernel tree, and that's the "version" they have. You can drop the
> > individual version here, it makes no sense anymore.
>
>
> Microsoft develops many Linux kernels when the dxgkrnl driver is
> included (EFLOW, Android, WSL). The kernels are built at different times
> and might include a different version of the driver. Printing the
> version allows to quickly determine what functionality and bug fixes are
> included in the driver.

Again, see above why your driver version is not needed. Please remove
this line as well.

> I see that other in-tree drivers print some information during
> init_module (like nfblock.c).

Some older ones do, yes. Please be good and follow the proper rules and
be quiet when booting.

> This is done for convenience. So instead of tracking multiple kernel
> versions, we can track the dxgkrnl driver version. I am ok to remove it
> if it really hurts something. It is only a single line per driver load.

The driver version _IS_ the kernel version once it is merged into the
kernel tree, as you rely on the whole of the kernel for your proper
functionality. Drivers do not need individual versions.

Again, work with the experienced Linux developers at your company for
this type of thing. They know this already, you don't need me to find
basic problems like this in your code :)

thanks,

greg k-h

2022-03-03 18:14:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Wed, Mar 02, 2022 at 05:09:21PM -0800, Iouri Tarassov wrote:
>
> On 3/1/2022 11:53 PM, Greg KH wrote:
> > On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > > > +struct dxgglobal *dxgglobal;
> > > >
> > > > No, make this per-device, NEVER have a single device for your driver.
> > > > The Linux driver model makes it harder to do it this way than to do it
> > > > correctly. Do it correctly please and have no global structures like
> > > > this.
> > > >
> > >
> > > This may not be as big an issue as you thought. The device discovery is
> > > still done via the normal VMBus probing routine. For all intents and
> > > purposes the dxgglobal structure can be broken down into per device
> > > fields and a global structure which contains the protocol versioning
> > > information -- my understanding is there will always be a global
> > > structure to hold information related to the backend, regardless of how
> > > many devices there are.
> >
> > Then that is wrong and needs to be fixed. Drivers should almost never
> > have any global data, that is not how Linux drivers work. What happens
> > when you get a second device in your system for this? Major rework
> > would have to happen and the code will break. Handle that all now as it
> > takes less work to make this per-device than it does to have a global
> > variable.
> >
> > > I definitely think splitting is doable, but I also understand why Iouri
> > > does not want to do it _now_ given there is no such a model for multiple
> > > devices yet, so anything we put into the per-device structure could be
> > > incomplete and it requires further changing when such a model arrives
> > > later.
> > >
> > > Iouri, please correct me if I have the wrong mental model here.
> > >
> > > All in all, I hope this is not going to be a deal breaker for the
> > > acceptance of this driver.
> >
> > For my reviews, yes it will be.
> >
> > Again, it should be easier to keep things in a per-device state than
> > not as the proper lifetime rules and the like are automatically handled
> > for you. If you have global data, you have to manage that all on your
> > own and it is _MUCH_ harder to review that you got it correct.
>
> Hi Greg,
>
> I do not really see how the driver be written without the global data. Let's review the design.

I see it the other way around. It's easier to make it without a static
structure, it is more work to keep it as you have done so here. Do it
correctly to start with and you will not have any of these issues going
forward.

> Dxgkrnl acts as the aggregator of all virtual compute devices, projected by the host. It needs to do operations, which do not belong to a particular compute device. For example, cross device synchronization and resource sharing.

Then hang your data off of your device node structure that you created.
Why ignore that?

> A PCI device device is created for each virtual compute device. Therefore, there should be a global list of objects and a mutex to synchronize access to the list.

Woah, what? You create a fake PCI device for each virtual device? If
so, great, then you are now a PCI bus and create the PCI devices
properly so that the PCI core can handle and manage them and then assign
them to your driver. You should NEVER have a global list of these
devices, as that is what the driver model should be managing. Not you!

> A VMBus channel is offered by the host for each compute device. The list of the VMBus channels should be global.

The vmbus channels are already handled by the driver core. Use those
devices that are given to you. You don't need to manage them at all.

> A global VMBus channel is offered by the host. The channel does not belong to any particular compute device, so it must be global.

That channel is attached to your driver, use the device given to your
driver by the bus. It's not "global" in any sense of the word.

And what's up with your lack of line wrapping?

> IO space is shared by all compute devices, so its parameters should be global.

Huh? If that's the case then you have bigger problems. Use the aux bus
for devices that share io space. That is what it was created for, do
not ignore the functionality that Linux already provides you by trying
to go around it and writing your own code. Use the frameworks we have
already debugged and support. This is why your Linux driver should be
at least 1/3 smaller than drivers for other operating systems.

> Dxgkrnl needs to maintain a list of processes, which opened compute device objects. Dxgkrnl maintains private state for each process and when a process opens the /dev/dxg device, Dxgkrnl needs to find if the process state is already created by walking the global process list.

That "list" is handled by the device node structure that was opened.
It's not "global" at all. Again, just like any other device node in
Linux, this isn't a new thing or anything special at all.

> Now, where to keep this global state? It could be kept in the /dev/dxg private device structure. But this structure is not available when, for example, dxg_pci_probe_device() or dxg_probe_vmbus() is called.

Then your design is wrong. It's as simple as that. Fix it.

> Can there be multiple /dev/dxg devices? No. Because the /dev/dxg device represents the driver itself, not a particular compute device.

Then fix this. Make your compute devices store the needed information
when they are created. Again, we have loads of examples in the kernel,
this is nothing new.

> I am not sure what design model you have in mind when saying there should be no global data. Could you please explain keeping in mind the above requirements?

Please see all of my responses above, and please use more \n characters
in the future :)

good luck!

greg k-h

2022-03-03 18:45:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Wed, Mar 02, 2022 at 02:27:05PM -0800, Iouri Tarassov wrote:
>
> On 3/2/2022 12:20 PM, Greg KH wrote:
> > On Wed, Mar 02, 2022 at 10:49:15AM -0800, Iouri Tarassov wrote:
> > > On 3/2/2022 3:53 AM, Wei Liu wrote:
> > > > On Wed, Mar 02, 2022 at 08:53:15AM +0100, Greg KH wrote:
> > > > > On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > > > > > > +struct dxgglobal *dxgglobal;
> > > > > > >
> > > > > > > No, make this per-device, NEVER have a single device for your driver.
> > > > > > > The Linux driver model makes it harder to do it this way than to do it
> > > > > > > correctly. Do it correctly please and have no global structures like
> > > > > > > this.
> > > > > > >
> > > > > >
> > > > > > This may not be as big an issue as you thought. The device discovery is
> > > > > > still done via the normal VMBus probing routine. For all intents and
> > > > > > purposes the dxgglobal structure can be broken down into per device
> > > > > > fields and a global structure which contains the protocol versioning
> > > > > > information -- my understanding is there will always be a global
> > > > > > structure to hold information related to the backend, regardless of how
> > > > > > many devices there are.
> > > > >
> > > > > Then that is wrong and needs to be fixed. Drivers should almost never
> > > > > have any global data, that is not how Linux drivers work. What happens
> > > > > when you get a second device in your system for this? Major rework
> > > > > would have to happen and the code will break. Handle that all now as it
> > > > > takes less work to make this per-device than it does to have a global
> > > > > variable.
> > > > >
> > > >
> > > > It is perhaps easier to draw parallel from an existing driver. I feel
> > > > like we're talking past each other.
> > > >
> > > > Let's look at drivers/iommu/intel/iommu.c. There are a bunch of lists
> > > > like `static LIST_HEAD(dmar_rmrr_units)`. During the probing phase, new
> > > > units will be added to the list. I this the current code is following
> > > > this model. dxgglobal fulfills the role of a list.
> > > >
> > > > Setting aside the question of whether it makes sense to keep a copy of
> > > > the per-VM state in each device instance, I can see the code be changed
> > > > to:
> > > >
> > > > struct mutex device_mutex; /* split out from dxgglobal */
> > > > static LIST_HEAD(dxglist);
> > > >
> > > > /* Rename struct dxgglobal to struct dxgstate */
> > > > struct dxgstate {
> > > > struct list_head dxglist; /* link for dxglist */
> > > > /* ... original fields sans device_mutex */
> > > > }
> > > >
> > > > /*
> > > > * Provide a bunch of helpers manipulate the list. Called in probe /
> > > > * remove etc.
> > > > */
> > > > struct dxgstate *find_dxgstate(...);
> > > > void remove_dxgstate(...);
> > > > int add_dxgstate(...);
> > > >
> > > > This model is well understood and used in tree. It is just that it
> > > > doesn't provide much value in doing this now since the list will only
> > > > contain one element. I hope that you're not saying we cannot even use a
> > > > per-module pointer to quickly get the data structure we want to use,
> > > > right?
> > > >
> > > > Are you suggesting Iouri use dev_set_drvdata to stash the dxgstate
> > > > into the device object? I think that can be done too.
> > > >
> > > > The code can be changed as:
> > > >
> > > > /* Rename struct dxgglobal to dxgstate and remove unneeded fields */
> > > > struct dxgstate { ... };
> > > >
> > > > static int dxg_probe_vmbus(...) {
> > > >
> > > > /* probe successfully */
> > > >
> > > > struct dxgstate *state = kmalloc(...);
> > > > /* Fill in dxgstate with information from backend */
> > > >
> > > > /* hdev->dev is the device object from the core driver framework */
> > > > dev_set_drvdata(&hdev->dev, state);
> > > > }
> > > >
> > > > static int dxg_remove_vmbus(...) {
> > > > /* Normal stuff here ...*/
> > > >
> > > > struct dxgstate *state = dev_get_drvdata(...);
> > > > dev_set_drvdata(..., NULL);
> > > > kfree(state);
> > > > }
> > > >
> > > > /* In all other functions */
> > > > void do_things(...) {
> > > > struct dxgstate *state = dev_get_drvdata(...);
> > > >
> > > > /* Use state in place of where dxgglobal was needed */
> > > >
> > > > }
> > > >
> > > > Iouri, notice this doesn't change anything regarding how userspace is
> > > > designed. This is about how kernel organises its data.
> > > >
> > > > I hope what I wrote above can bring our understanding closer.
> > > >
> > > > Thanks,
> > > > Wei.
> > >
> > >
> > > I can certainly remove dxgglobal and keep the? pointer to the global
> > > state in the device object.
> > >
> > > This will require passing of the global pointer to all functions, which
> > > need to access it.
> > >
> > >
> > > Maybe my understanding of the Greg's suggestion was not correct. I
> > > thought the suggestion was
> > >
> > > to have multiple /dev/dxgN devices (one per virtual compute device).
> >
> > You have one device per HV device, as the bus already provides you.
> > That's all you really need, right? Who would be opening the same device
> > node multiple times?
> > > This would change how the user mode
> > > clients enumerate and communicate with compute devices.
> >
> > What does userspace have to do here? It should just open the device
> > node that is present when needed. How will there be multiple userspace
> > clients for a single HV device?
>
>
> Dxgkrnl creates a single user mode visible device node /dev/dxg.

When you do that, you have a device to put all of your data on. Use
that.

> It has
> nothing to do with a specific hardware compute device on the host. Its
> purpose is to provide services (IOCTLs) to enumerate and manage virtual
> compute devices, which represent hardware devices on the host. The VMBus
> devices are not used directly by user mode clients in the current design.

That's horrid, why not just export the virtual compute devices properly
through individual device nodes instead?

In essence, you are creating a new syscall here to manage and handle
devices for just your driver with the use of this ioctl. That's
generally a bad idea.

> Virtual compute devices are shared between processes. There could be a
> Cuda application, Gimp and a Direct3D12 application working at the same
> time.

Why are all of those applications sharing anything? How are they
sharing anything? If you need to share something across processes, use
the existing inter-process ways of sharing stuff that we have today (12+
different apis and counting). Don't create yet-another-one just for
your tiny little driver here. That's rude to the rest of the kernel.

> This is what I mean by saying that there are multiple user mode
> clients who use the /dev/dxg driver interface. Each of this applications
> will open the /dev/dxg device node and enumerate/use virtual compute
> devices.

That seems like an odd model to follow. How many virtual devices do you
support? Is there a limit? Why not just enumerate them all to start
with? Or if there are too many, why not do it like the raw device and
have a single device node that is used to create the virtual devices you
wish to use?

> If we change the way how the virtual compute devices are visible to user
> mode, the Cuda runtime, Direct3D runtime would need to be changed.

We do not care about existing userspace code at this point in time, you
are trying to get the kernel api correct here. Once you have done that,
then you can go fix up your userspace code to work properly. Anything
that came before today is pointless here, right? :)

> I think we agreed that I will keep the global driver state in the device
> object as Wei suggested and remove global variables. There still will be
> a single /dev/dxg device node. Correct?

I do not know, as I can't figure out your goals here at all, sorry.

Please go work with some experienced Linux developers in your company.
They should be the ones answering these questions for you, not me :)

thanks,

greg k-h

2022-03-04 17:21:10

by Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Thu, Mar 03, 2022 at 02:22:32PM +0100, Greg KH wrote:
> On Wed, Mar 02, 2022 at 05:09:21PM -0800, Iouri Tarassov wrote:
> >
> > On 3/1/2022 11:53 PM, Greg KH wrote:
> > > On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> > > > > > +struct dxgglobal *dxgglobal;
> > > > >
> > > > > No, make this per-device, NEVER have a single device for your driver.
> > > > > The Linux driver model makes it harder to do it this way than to do it
> > > > > correctly. Do it correctly please and have no global structures like
> > > > > this.
> > > > >
> > > >
> > > > This may not be as big an issue as you thought. The device discovery is
> > > > still done via the normal VMBus probing routine. For all intents and
> > > > purposes the dxgglobal structure can be broken down into per device
> > > > fields and a global structure which contains the protocol versioning
> > > > information -- my understanding is there will always be a global
> > > > structure to hold information related to the backend, regardless of how
> > > > many devices there are.
> > >
> > > Then that is wrong and needs to be fixed. Drivers should almost never
> > > have any global data, that is not how Linux drivers work. What happens
> > > when you get a second device in your system for this? Major rework
> > > would have to happen and the code will break. Handle that all now as it
> > > takes less work to make this per-device than it does to have a global
> > > variable.
> > >
> > > > I definitely think splitting is doable, but I also understand why Iouri
> > > > does not want to do it _now_ given there is no such a model for multiple
> > > > devices yet, so anything we put into the per-device structure could be
> > > > incomplete and it requires further changing when such a model arrives
> > > > later.
> > > >
> > > > Iouri, please correct me if I have the wrong mental model here.
> > > >
> > > > All in all, I hope this is not going to be a deal breaker for the
> > > > acceptance of this driver.
> > >
> > > For my reviews, yes it will be.
> > >
> > > Again, it should be easier to keep things in a per-device state than
> > > not as the proper lifetime rules and the like are automatically handled
> > > for you. If you have global data, you have to manage that all on your
> > > own and it is _MUCH_ harder to review that you got it correct.
> >
> > Hi Greg,
> >
> > I do not really see how the driver be written without the global data. Let's review the design.
>
> I see it the other way around. It's easier to make it without a static
> structure, it is more work to keep it as you have done so here. Do it
> correctly to start with and you will not have any of these issues going
> forward.
>

> > Dxgkrnl acts as the aggregator of all virtual compute devices, projected by the host. It needs to do operations, which do not belong to a particular compute device. For example, cross device synchronization and resource sharing.

Hey Iouri, please fix your text wrapping.

Greg, I have to admit I only started paying close attention to this
series a few days ago, so I don't claim I know a lot.

>
> Then hang your data off of your device node structure that you
> created. Why ignore that?
>
> > A PCI device device is created for each virtual compute device.
> > Therefore, there should be a global list of objects and a mutex to
> > synchronize access to the list.
>
> Woah, what? You create a fake PCI device for each virtual device? If
> so, great, then you are now a PCI bus and create the PCI devices
> properly so that the PCI core can handle and manage them and then
> assign them to your driver. You should NEVER have a global list of
> these devices, as that is what the driver model should be managing.
> Not you!
>

No, there is no fake PCI device. The device object is still coming from
the PCI core driver. There is code to match against PCI vendor ID and
device ID, and follow the usual way of managing PCI device.

Iouri understands device specific state should be encapsulated in the
private data field in their respective device. And I believe the code
can perhaps be rewritten to better conform to Linux kernel's model.

That should address the issue ...

> > A VMBus channel is offered by the host for each compute device. The
> > list of the VMBus channels should be global.
>
> The vmbus channels are already handled by the driver core. Use those
> devices that are given to you. You don't need to manage them at all.
>

here ...

> > A global VMBus channel is offered by the host. The channel does not
> > belong to any particular compute device, so it must be global.
>
> That channel is attached to your driver, use the device given to your
> driver by the bus. It's not "global" in any sense of the word.
>

here ...

> And what's up with your lack of line wrapping?
>
> > IO space is shared by all compute devices, so its parameters should
> > be global.
>
> Huh? If that's the case then you have bigger problems. Use the aux
> bus for devices that share io space. That is what it was created for,
> do not ignore the functionality that Linux already provides you by
> trying to go around it and writing your own code. Use the frameworks
> we have already debugged and support. This is why your Linux driver
> should be at least 1/3 smaller than drivers for other operating
> systems.
>

To be fair, auxiliary bus was only added in 5.11, while this series was
written long before that. Unfortunately one only has so much time to
follow Linux kernel development closely. I admit this is the first time
I hear about it. :-)

> > Dxgkrnl needs to maintain a list of processes, which opened compute
> > device objects. Dxgkrnl maintains private state for each process and
> > when a process opens the /dev/dxg device, Dxgkrnl needs to find if
> > the process state is already created by walking the global process
> > list.
>
> That "list" is handled by the device node structure that was opened.
> It's not "global" at all. Again, just like any other device node in
> Linux, this isn't a new thing or anything special at all.
>

Again, the state can be associated with the `private_data` field in
struct file.

> > Now, where to keep this global state? It could be kept in the
> > /dev/dxg private device structure. But this structure is not
> > available when, for example, dxg_pci_probe_device() or
> > dxg_probe_vmbus() is called.
>
> Then your design is wrong. It's as simple as that. Fix it.
>
> > Can there be multiple /dev/dxg devices? No. Because the /dev/dxg
> > device represents the driver itself, not a particular compute
> > device.
>
> Then fix this. Make your compute devices store the needed information
> when they are created. Again, we have loads of examples in the kernel,
> this is nothing new.
>

At this point, I think Iouri and I have settled on more encapsulation is
needed. Yet there is something I don't know how to square yet. That is,
devices (either from vmbus or pci) don't form a clear hierarchy. If
there isn't a linked list or some sort to organize them it would be
difficult to cross-reference. This then goes back to what I wrote
earlier in <20220302115334.wemdkznokszlzcpe@liuwe-devbox-debian-v2>. I
hope you won't be against using that pattern -- it is used in a lot of
places in tree.

Thanks,
Wei.

2022-03-04 20:47:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Fri, Mar 04, 2022 at 04:04:10PM +0000, Wei Liu wrote:
> > > A PCI device device is created for each virtual compute device.
> > > Therefore, there should be a global list of objects and a mutex to
> > > synchronize access to the list.
> >
> > Woah, what? You create a fake PCI device for each virtual device? If
> > so, great, then you are now a PCI bus and create the PCI devices
> > properly so that the PCI core can handle and manage them and then
> > assign them to your driver. You should NEVER have a global list of
> > these devices, as that is what the driver model should be managing.
> > Not you!
> >
>
> No, there is no fake PCI device. The device object is still coming from
> the PCI core driver. There is code to match against PCI vendor ID and
> device ID, and follow the usual way of managing PCI device.

So it is a real PCI device? Why the confusion?

> Iouri understands device specific state should be encapsulated in the
> private data field in their respective device. And I believe the code
> can perhaps be rewritten to better conform to Linux kernel's model.

It has to follow the Linux kernel model, to think otherwise is quite
odd.

> > > IO space is shared by all compute devices, so its parameters should
> > > be global.
> >
> > Huh? If that's the case then you have bigger problems. Use the aux
> > bus for devices that share io space. That is what it was created for,
> > do not ignore the functionality that Linux already provides you by
> > trying to go around it and writing your own code. Use the frameworks
> > we have already debugged and support. This is why your Linux driver
> > should be at least 1/3 smaller than drivers for other operating
> > systems.
> >
>
> To be fair, auxiliary bus was only added in 5.11, while this series was
> written long before that. Unfortunately one only has so much time to
> follow Linux kernel development closely. I admit this is the first time
> I hear about it. :-)

5.11 was over a year ago. We do not take "old" drivers into the kernel
tree, and if this series has not been touched for over a year, um, you
all have bigger problems here and are just wasting our time :(

> > Then fix this. Make your compute devices store the needed information
> > when they are created. Again, we have loads of examples in the kernel,
> > this is nothing new.
> >
>
> At this point, I think Iouri and I have settled on more encapsulation is
> needed. Yet there is something I don't know how to square yet. That is,
> devices (either from vmbus or pci) don't form a clear hierarchy.

That is because devices don't care where they are in the larger kernel
tree of devices, they are independant. To try to claim there is a
needed hierarchy is quite odd and will cause lots of problems when that
heirachy changes by the underlying system.

> If
> there isn't a linked list or some sort to organize them it would be
> difficult to cross-reference.

You should never be touching another device from another one. There are
a few rare exceptions but they are rare and you should not use them at
all. And if you do have to do so, you better get the reference counting
logic correct.

good luck!

greg k-h

2022-03-10 09:05:37

by Steve Pronovost

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading


On 3/3/22 05:16, Greg KH wrote:
> On Wed, Mar 02, 2022 at 02:27:05PM -0800, Iouri Tarassov wrote:
>> On 3/2/2022 12:20 PM, Greg KH wrote:
>>> On Wed, Mar 02, 2022 at 10:49:15AM -0800, Iouri Tarassov wrote:
>>>> On 3/2/2022 3:53 AM, Wei Liu wrote:
>>>>> On Wed, Mar 02, 2022 at 08:53:15AM +0100, Greg KH wrote:
>>>>>> On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
>>>>>>>>> +struct dxgglobal *dxgglobal;
>>>>>>>> No, make this per-device, NEVER have a single device for your driver.
>>>>>>>> The Linux driver model makes it harder to do it this way than to do it
>>>>>>>> correctly. Do it correctly please and have no global structures like
>>>>>>>> this.
>>>>>>>>
>>>>>>> This may not be as big an issue as you thought. The device discovery is
>>>>>>> still done via the normal VMBus probing routine. For all intents and
>>>>>>> purposes the dxgglobal structure can be broken down into per device
>>>>>>> fields and a global structure which contains the protocol versioning
>>>>>>> information -- my understanding is there will always be a global
>>>>>>> structure to hold information related to the backend, regardless of how
>>>>>>> many devices there are.
>>>>>> Then that is wrong and needs to be fixed. Drivers should almost never
>>>>>> have any global data, that is not how Linux drivers work. What happens
>>>>>> when you get a second device in your system for this? Major rework
>>>>>> would have to happen and the code will break. Handle that all now as it
>>>>>> takes less work to make this per-device than it does to have a global
>>>>>> variable.
>>>>>>
>>>>> It is perhaps easier to draw parallel from an existing driver. I feel
>>>>> like we're talking past each other.
>>>>>
>>>>> Let's look at drivers/iommu/intel/iommu.c. There are a bunch of lists
>>>>> like `static LIST_HEAD(dmar_rmrr_units)`. During the probing phase, new
>>>>> units will be added to the list. I this the current code is following
>>>>> this model. dxgglobal fulfills the role of a list.
>>>>>
>>>>> Setting aside the question of whether it makes sense to keep a copy of
>>>>> the per-VM state in each device instance, I can see the code be changed
>>>>> to:
>>>>>
>>>>> struct mutex device_mutex; /* split out from dxgglobal */
>>>>> static LIST_HEAD(dxglist);
>>>>>
>>>>> /* Rename struct dxgglobal to struct dxgstate */
>>>>> struct dxgstate {
>>>>> struct list_head dxglist; /* link for dxglist */
>>>>> /* ... original fields sans device_mutex */
>>>>> }
>>>>>
>>>>> /*
>>>>> * Provide a bunch of helpers manipulate the list. Called in probe /
>>>>> * remove etc.
>>>>> */
>>>>> struct dxgstate *find_dxgstate(...);
>>>>> void remove_dxgstate(...);
>>>>> int add_dxgstate(...);
>>>>>
>>>>> This model is well understood and used in tree. It is just that it
>>>>> doesn't provide much value in doing this now since the list will only
>>>>> contain one element. I hope that you're not saying we cannot even use a
>>>>> per-module pointer to quickly get the data structure we want to use,
>>>>> right?
>>>>>
>>>>> Are you suggesting Iouri use dev_set_drvdata to stash the dxgstate
>>>>> into the device object? I think that can be done too.
>>>>>
>>>>> The code can be changed as:
>>>>>
>>>>> /* Rename struct dxgglobal to dxgstate and remove unneeded fields */
>>>>> struct dxgstate { ... };
>>>>>
>>>>> static int dxg_probe_vmbus(...) {
>>>>>
>>>>> /* probe successfully */
>>>>>
>>>>> struct dxgstate *state = kmalloc(...);
>>>>> /* Fill in dxgstate with information from backend */
>>>>>
>>>>> /* hdev->dev is the device object from the core driver framework */
>>>>> dev_set_drvdata(&hdev->dev, state);
>>>>> }
>>>>>
>>>>> static int dxg_remove_vmbus(...) {
>>>>> /* Normal stuff here ...*/
>>>>>
>>>>> struct dxgstate *state = dev_get_drvdata(...);
>>>>> dev_set_drvdata(..., NULL);
>>>>> kfree(state);
>>>>> }
>>>>>
>>>>> /* In all other functions */
>>>>> void do_things(...) {
>>>>> struct dxgstate *state = dev_get_drvdata(...);
>>>>>
>>>>> /* Use state in place of where dxgglobal was needed */
>>>>>
>>>>> }
>>>>>
>>>>> Iouri, notice this doesn't change anything regarding how userspace is
>>>>> designed. This is about how kernel organises its data.
>>>>>
>>>>> I hope what I wrote above can bring our understanding closer.
>>>>>
>>>>> Thanks,
>>>>> Wei.
>>>>
>>>> I can certainly remove dxgglobal and keep the  pointer to the global
>>>> state in the device object.
>>>>
>>>> This will require passing of the global pointer to all functions, which
>>>> need to access it.
>>>>
>>>>
>>>> Maybe my understanding of the Greg's suggestion was not correct. I
>>>> thought the suggestion was
>>>>
>>>> to have multiple /dev/dxgN devices (one per virtual compute device).
>>> You have one device per HV device, as the bus already provides you.
>>> That's all you really need, right? Who would be opening the same device
>>> node multiple times?
>>>> This would change how the user mode
>>>> clients enumerate and communicate with compute devices.
>>> What does userspace have to do here? It should just open the device
>>> node that is present when needed. How will there be multiple userspace
>>> clients for a single HV device?
>>
>> Dxgkrnl creates a single user mode visible device node /dev/dxg.
> When you do that, you have a device to put all of your data on. Use
> that.
>
>> It has
>> nothing to do with a specific hardware compute device on the host. Its
>> purpose is to provide services (IOCTLs) to enumerate and manage virtual
>> compute devices, which represent hardware devices on the host. The VMBus
>> devices are not used directly by user mode clients in the current design.
> That's horrid, why not just export the virtual compute devices properly
> through individual device nodes instead?
>
> In essence, you are creating a new syscall here to manage and handle
> devices for just your driver with the use of this ioctl. That's
> generally a bad idea.
>
>> Virtual compute devices are shared between processes. There could be a
>> Cuda application, Gimp and a Direct3D12 application working at the same
>> time.
> Why are all of those applications sharing anything? How are they
> sharing anything? If you need to share something across processes, use
> the existing inter-process ways of sharing stuff that we have today (12+
> different apis and counting). Don't create yet-another-one just for
> your tiny little driver here. That's rude to the rest of the kernel.
>
>> This is what I mean by saying that there are multiple user mode
>> clients who use the /dev/dxg driver interface. Each of this applications
>> will open the /dev/dxg device node and enumerate/use virtual compute
>> devices.
> That seems like an odd model to follow. How many virtual devices do you
> support? Is there a limit? Why not just enumerate them all to start
> with? Or if there are too many, why not do it like the raw device and
> have a single device node that is used to create the virtual devices you
> wish to use?
>
>> If we change the way how the virtual compute devices are visible to user
>> mode, the Cuda runtime, Direct3D runtime would need to be changed.
> We do not care about existing userspace code at this point in time, you
> are trying to get the kernel api correct here. Once you have done that,
> then you can go fix up your userspace code to work properly. Anything
> that came before today is pointless here, right? :)
>
>> I think we agreed that I will keep the global driver state in the device
>> object as Wei suggested and remove global variables. There still will be
>> a single /dev/dxg device node. Correct?
> I do not know, as I can't figure out your goals here at all, sorry.
>
> Please go work with some experienced Linux developers in your company.
> They should be the ones answering these questions for you, not me :)
>
> thanks,
>
> greg k-h
Thanks Greg and folks for all the feedback. The userspace API provided  
by this driver is specifically designed to match the Windows compute  
device abstraction in order to light up a large eco-system of graphics  
and compute APIs in our various virtualized Linux environment. This is  
a pretty critical design point as this is the only way we can scale to  
such a vast eco-system of APIs and their related tools. This enables  
APIs and framework which are available natively on Linux, such as  
CUDA, OpenGL, OpenCL, OpenVINO, OneAPI/L0, TensorFlow, Pytorch, soon  
Vulkan and more in the future and allows them to share the host GPU or  
compute device when run under our hypervisor. Most of these APIs sit  
directly on top of our userspace API and changing that abstraction  
would have major implications. Our D3D/DML userspace components are  
only implementation details for a subset of those APIs.
 
We have millions of developers experiencing and enjoying various Linux  
distributions through our Windows Subsystem for Linux (WSL) and this  
gives them a way to get hardware acceleration in a large number of  
important scenarios. WSL makes it trivial for developers to try out  
Linux totally risk free on over a billion devices, without having to  
commit to a native install upfront or understand and manually manage  
VMs. We think this is a pretty valuable proposition to both the  
Windows and the Linux community and one which has definitely been  
quite popular.
 
That being said, we recognize that our virtualization approach makes  
some folks in the community uncomfortable and the resulting userspace  
API exposed by our driver will look a bit different than typical Linux  
driver. Taking into consideration feedback and reception, we decided  
to pause our upstream effort for this driver. We will keep our driver  
open source and available in our downstream kernel tree for folks who  
wants to leverage it in their private kernel.
 
Thanks, Steve

2022-03-10 17:21:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

On Wed, Mar 09, 2022 at 03:14:24PM -0800, Steve Pronovost wrote:
>
> On 3/3/22 05:16, Greg KH wrote:
> > On Wed, Mar 02, 2022 at 02:27:05PM -0800, Iouri Tarassov wrote:
> >> On 3/2/2022 12:20 PM, Greg KH wrote:
> >>> On Wed, Mar 02, 2022 at 10:49:15AM -0800, Iouri Tarassov wrote:
> >>>> On 3/2/2022 3:53 AM, Wei Liu wrote:
> >>>>> On Wed, Mar 02, 2022 at 08:53:15AM +0100, Greg KH wrote:
> >>>>>> On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
> >>>>>>>>> +struct dxgglobal *dxgglobal;
> >>>>>>>> No, make this per-device, NEVER have a single device for your driver.
> >>>>>>>> The Linux driver model makes it harder to do it this way than to do it
> >>>>>>>> correctly. Do it correctly please and have no global structures like
> >>>>>>>> this.
> >>>>>>>>
> >>>>>>> This may not be as big an issue as you thought. The device discovery is
> >>>>>>> still done via the normal VMBus probing routine. For all intents and
> >>>>>>> purposes the dxgglobal structure can be broken down into per device
> >>>>>>> fields and a global structure which contains the protocol versioning
> >>>>>>> information -- my understanding is there will always be a global
> >>>>>>> structure to hold information related to the backend, regardless of how
> >>>>>>> many devices there are.
> >>>>>> Then that is wrong and needs to be fixed. Drivers should almost never
> >>>>>> have any global data, that is not how Linux drivers work. What happens
> >>>>>> when you get a second device in your system for this? Major rework
> >>>>>> would have to happen and the code will break. Handle that all now as it
> >>>>>> takes less work to make this per-device than it does to have a global
> >>>>>> variable.
> >>>>>>
> >>>>> It is perhaps easier to draw parallel from an existing driver. I feel
> >>>>> like we're talking past each other.
> >>>>>
> >>>>> Let's look at drivers/iommu/intel/iommu.c. There are a bunch of lists
> >>>>> like `static LIST_HEAD(dmar_rmrr_units)`. During the probing phase, new
> >>>>> units will be added to the list. I this the current code is following
> >>>>> this model. dxgglobal fulfills the role of a list.
> >>>>>
> >>>>> Setting aside the question of whether it makes sense to keep a copy of
> >>>>> the per-VM state in each device instance, I can see the code be changed
> >>>>> to:
> >>>>>
> >>>>> struct mutex device_mutex; /* split out from dxgglobal */
> >>>>> static LIST_HEAD(dxglist);
> >>>>>
> >>>>> /* Rename struct dxgglobal to struct dxgstate */
> >>>>> struct dxgstate {
> >>>>> struct list_head dxglist; /* link for dxglist */
> >>>>> /* ... original fields sans device_mutex */
> >>>>> }
> >>>>>
> >>>>> /*
> >>>>> * Provide a bunch of helpers manipulate the list. Called in probe /
> >>>>> * remove etc.
> >>>>> */
> >>>>> struct dxgstate *find_dxgstate(...);
> >>>>> void remove_dxgstate(...);
> >>>>> int add_dxgstate(...);
> >>>>>
> >>>>> This model is well understood and used in tree. It is just that it
> >>>>> doesn't provide much value in doing this now since the list will only
> >>>>> contain one element. I hope that you're not saying we cannot even use a
> >>>>> per-module pointer to quickly get the data structure we want to use,
> >>>>> right?
> >>>>>
> >>>>> Are you suggesting Iouri use dev_set_drvdata to stash the dxgstate
> >>>>> into the device object? I think that can be done too.
> >>>>>
> >>>>> The code can be changed as:
> >>>>>
> >>>>> /* Rename struct dxgglobal to dxgstate and remove unneeded fields */
> >>>>> struct dxgstate { ... };
> >>>>>
> >>>>> static int dxg_probe_vmbus(...) {
> >>>>>
> >>>>> /* probe successfully */
> >>>>>
> >>>>> struct dxgstate *state = kmalloc(...);
> >>>>> /* Fill in dxgstate with information from backend */
> >>>>>
> >>>>> /* hdev->dev is the device object from the core driver framework */
> >>>>> dev_set_drvdata(&hdev->dev, state);
> >>>>> }
> >>>>>
> >>>>> static int dxg_remove_vmbus(...) {
> >>>>> /* Normal stuff here ...*/
> >>>>>
> >>>>> struct dxgstate *state = dev_get_drvdata(...);
> >>>>> dev_set_drvdata(..., NULL);
> >>>>> kfree(state);
> >>>>> }
> >>>>>
> >>>>> /* In all other functions */
> >>>>> void do_things(...) {
> >>>>> struct dxgstate *state = dev_get_drvdata(...);
> >>>>>
> >>>>> /* Use state in place of where dxgglobal was needed */
> >>>>>
> >>>>> }
> >>>>>
> >>>>> Iouri, notice this doesn't change anything regarding how userspace is
> >>>>> designed. This is about how kernel organises its data.
> >>>>>
> >>>>> I hope what I wrote above can bring our understanding closer.
> >>>>>
> >>>>> Thanks,
> >>>>> Wei.
> >>>>
> >>>> I can certainly remove dxgglobal and keep the? pointer to the global
> >>>> state in the device object.
> >>>>
> >>>> This will require passing of the global pointer to all functions, which
> >>>> need to access it.
> >>>>
> >>>>
> >>>> Maybe my understanding of the Greg's suggestion was not correct. I
> >>>> thought the suggestion was
> >>>>
> >>>> to have multiple /dev/dxgN devices (one per virtual compute device).
> >>> You have one device per HV device, as the bus already provides you.
> >>> That's all you really need, right? Who would be opening the same device
> >>> node multiple times?
> >>>> This would change how the user mode
> >>>> clients enumerate and communicate with compute devices.
> >>> What does userspace have to do here? It should just open the device
> >>> node that is present when needed. How will there be multiple userspace
> >>> clients for a single HV device?
> >>
> >> Dxgkrnl creates a single user mode visible device node /dev/dxg.
> > When you do that, you have a device to put all of your data on. Use
> > that.
> >
> >> It has
> >> nothing to do with a specific hardware compute device on the host. Its
> >> purpose is to provide services (IOCTLs) to enumerate and manage virtual
> >> compute devices, which represent hardware devices on the host. The VMBus
> >> devices are not used directly by user mode clients in the current design.
> > That's horrid, why not just export the virtual compute devices properly
> > through individual device nodes instead?
> >
> > In essence, you are creating a new syscall here to manage and handle
> > devices for just your driver with the use of this ioctl. That's
> > generally a bad idea.
> >
> >> Virtual compute devices are shared between processes. There could be a
> >> Cuda application, Gimp and a Direct3D12 application working at the same
> >> time.
> > Why are all of those applications sharing anything? How are they
> > sharing anything? If you need to share something across processes, use
> > the existing inter-process ways of sharing stuff that we have today (12+
> > different apis and counting). Don't create yet-another-one just for
> > your tiny little driver here. That's rude to the rest of the kernel.
> >
> >> This is what I mean by saying that there are multiple user mode
> >> clients who use the /dev/dxg driver interface. Each of this applications
> >> will open the /dev/dxg device node and enumerate/use virtual compute
> >> devices.
> > That seems like an odd model to follow. How many virtual devices do you
> > support? Is there a limit? Why not just enumerate them all to start
> > with? Or if there are too many, why not do it like the raw device and
> > have a single device node that is used to create the virtual devices you
> > wish to use?
> >
> >> If we change the way how the virtual compute devices are visible to user
> >> mode, the Cuda runtime, Direct3D runtime would need to be changed.
> > We do not care about existing userspace code at this point in time, you
> > are trying to get the kernel api correct here. Once you have done that,
> > then you can go fix up your userspace code to work properly. Anything
> > that came before today is pointless here, right? :)
> >
> >> I think we agreed that I will keep the global driver state in the device
> >> object as Wei suggested and remove global variables. There still will be
> >> a single /dev/dxg device node. Correct?
> > I do not know, as I can't figure out your goals here at all, sorry.
> >
> > Please go work with some experienced Linux developers in your company.
> > They should be the ones answering these questions for you, not me :)
> >
> > thanks,
> >
> > greg k-h
> Thanks Greg and folks for all the feedback. The userspace API provided ?
> by this driver is specifically designed to match the Windows compute ?
> device abstraction in order to light up a large eco-system of graphics ?
> and compute APIs in our various virtualized Linux environment. This is ?
> a pretty critical design point as this is the only way we can scale to ?
> such a vast eco-system of APIs and their related tools. This enables ?
> APIs and framework which are available natively on Linux, such as ?
> CUDA, OpenGL, OpenCL, OpenVINO, OneAPI/L0, TensorFlow, Pytorch, soon ?
> Vulkan and more in the future and allows them to share the host GPU or ?
> compute device when run under our hypervisor. Most of these APIs sit ?
> directly on top of our userspace API and changing that abstraction ?
> would have major implications. Our D3D/DML userspace components are ?
> only implementation details for a subset of those APIs.

This feels odd to me as I was only discussing the in-kernel apis, and
how this driver does not follow them correctly at all (global device
state, odd device creation, etc.) That should have nothing to do with
the user/kernel api at all. And surely CUDA and the like never have
defined a specific user/kernal api that must be followed, right?

Yes, the existing user/kernel api in this driver is a bit odd, but
really, we haven't even started to review that. For you all to just
cut-and-run at the first sign of a detailed in-kernel code review is
strange, why are you giving up now?

> We have millions of developers experiencing and enjoying various Linux ?
> distributions through our Windows Subsystem for Linux (WSL) and this ?
> gives them a way to get hardware acceleration in a large number of ?
> important scenarios. WSL makes it trivial for developers to try out ?
> Linux totally risk free on over a billion devices, without having to ?
> commit to a native install upfront or understand and manually manage ?
> VMs. We think this is a pretty valuable proposition to both the ?
> Windows and the Linux community and one which has definitely been ?
> quite popular.

This has nothing to do with how this tiny driver uses the in-kernel apis
for creating and managing devices. You are discussing userspace
applications here, which is a nice marketing discussion about how many
developers you have, but means nothing when it comes to how the kernel
code works. So why mention it at all?

> That being said, we recognize that our virtualization approach makes ?
> some folks in the community uncomfortable and the resulting userspace ?
> API exposed by our driver will look a bit different than typical Linux ?
> driver. Taking into consideration feedback and reception, we decided ?
> to pause our upstream effort for this driver. We will keep our driver ?
> open source and available in our downstream kernel tree for folks who ?
> wants to leverage it in their private kernel.

The problems I am pointing out here are how your kernel driver is
interacting with the kernel driver model. It has almost nothing to do
with the user/kernel api at all.

And even if it did concern the user/kernel api, you already have a
userspace library that has to talk to this driver, all you need to do is
modify that to handle the expected changes that will happen when you
actually get experienced kernel developers to review your code and apis.

I don't understand what you all were expecting here. Did you not want
to fit into the kernel tree and take advantage of using the proper
functions and interfaces that have been created already? That should
enable your users to be able to rely on a more stable and secure and
widespread driver for everyone to use.

Instead you are saying that you don't appreciate the time and energy we
spent on trying to help you improve your code for your users, and wish
to just ignore our help?

By doing this, you are telling all community members that they should
not waste their time in reviewing your company's code submissions as
they are not appreciated or wanted unless they are just a rubber-stamp
approval. That's going to ensure that no one ever reviews your
contributions again in the future, which is an odd request, don't you
think?

Oh well, it's easy to add a filter to just ignore any future code
submissions from your email domain. I'll recommend that all other
kernel developers and maintainers do the same as we all have changes to
review from companies and developers that actually want our feedback.

best of luck!

greg k-h

2022-03-10 19:59:12

by Steve Pronovost

[permalink] [raw]
Subject: Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading


On 3/10/22 02:13, Greg KH wrote:
> On Wed, Mar 09, 2022 at 03:14:24PM -0800, Steve Pronovost wrote:
>> On 3/3/22 05:16, Greg KH wrote:
>>> On Wed, Mar 02, 2022 at 02:27:05PM -0800, Iouri Tarassov wrote:
>>>> On 3/2/2022 12:20 PM, Greg KH wrote:
>>>>> On Wed, Mar 02, 2022 at 10:49:15AM -0800, Iouri Tarassov wrote:
>>>>>> On 3/2/2022 3:53 AM, Wei Liu wrote:
>>>>>>> On Wed, Mar 02, 2022 at 08:53:15AM +0100, Greg KH wrote:
>>>>>>>> On Tue, Mar 01, 2022 at 10:23:21PM +0000, Wei Liu wrote:
>>>>>>>>>>> +struct dxgglobal *dxgglobal;
>>>>>>>>>> No, make this per-device, NEVER have a single device for your driver.
>>>>>>>>>> The Linux driver model makes it harder to do it this way than to do it
>>>>>>>>>> correctly. Do it correctly please and have no global structures like
>>>>>>>>>> this.
>>>>>>>>>>
>>>>>>>>> This may not be as big an issue as you thought. The device discovery is
>>>>>>>>> still done via the normal VMBus probing routine. For all intents and
>>>>>>>>> purposes the dxgglobal structure can be broken down into per device
>>>>>>>>> fields and a global structure which contains the protocol versioning
>>>>>>>>> information -- my understanding is there will always be a global
>>>>>>>>> structure to hold information related to the backend, regardless of how
>>>>>>>>> many devices there are.
>>>>>>>> Then that is wrong and needs to be fixed. Drivers should almost never
>>>>>>>> have any global data, that is not how Linux drivers work. What happens
>>>>>>>> when you get a second device in your system for this? Major rework
>>>>>>>> would have to happen and the code will break. Handle that all now as it
>>>>>>>> takes less work to make this per-device than it does to have a global
>>>>>>>> variable.
>>>>>>>>
>>>>>>> It is perhaps easier to draw parallel from an existing driver. I feel
>>>>>>> like we're talking past each other.
>>>>>>>
>>>>>>> Let's look at drivers/iommu/intel/iommu.c. There are a bunch of lists
>>>>>>> like `static LIST_HEAD(dmar_rmrr_units)`. During the probing phase, new
>>>>>>> units will be added to the list. I this the current code is following
>>>>>>> this model. dxgglobal fulfills the role of a list.
>>>>>>>
>>>>>>> Setting aside the question of whether it makes sense to keep a copy of
>>>>>>> the per-VM state in each device instance, I can see the code be changed
>>>>>>> to:
>>>>>>>
>>>>>>> struct mutex device_mutex; /* split out from dxgglobal */
>>>>>>> static LIST_HEAD(dxglist);
>>>>>>>
>>>>>>> /* Rename struct dxgglobal to struct dxgstate */
>>>>>>> struct dxgstate {
>>>>>>> struct list_head dxglist; /* link for dxglist */
>>>>>>> /* ... original fields sans device_mutex */
>>>>>>> }
>>>>>>>
>>>>>>> /*
>>>>>>> * Provide a bunch of helpers manipulate the list. Called in probe /
>>>>>>> * remove etc.
>>>>>>> */
>>>>>>> struct dxgstate *find_dxgstate(...);
>>>>>>> void remove_dxgstate(...);
>>>>>>> int add_dxgstate(...);
>>>>>>>
>>>>>>> This model is well understood and used in tree. It is just that it
>>>>>>> doesn't provide much value in doing this now since the list will only
>>>>>>> contain one element. I hope that you're not saying we cannot even use a
>>>>>>> per-module pointer to quickly get the data structure we want to use,
>>>>>>> right?
>>>>>>>
>>>>>>> Are you suggesting Iouri use dev_set_drvdata to stash the dxgstate
>>>>>>> into the device object? I think that can be done too.
>>>>>>>
>>>>>>> The code can be changed as:
>>>>>>>
>>>>>>> /* Rename struct dxgglobal to dxgstate and remove unneeded fields */
>>>>>>> struct dxgstate { ... };
>>>>>>>
>>>>>>> static int dxg_probe_vmbus(...) {
>>>>>>>
>>>>>>> /* probe successfully */
>>>>>>>
>>>>>>> struct dxgstate *state = kmalloc(...);
>>>>>>> /* Fill in dxgstate with information from backend */
>>>>>>>
>>>>>>> /* hdev->dev is the device object from the core driver framework */
>>>>>>> dev_set_drvdata(&hdev->dev, state);
>>>>>>> }
>>>>>>>
>>>>>>> static int dxg_remove_vmbus(...) {
>>>>>>> /* Normal stuff here ...*/
>>>>>>>
>>>>>>> struct dxgstate *state = dev_get_drvdata(...);
>>>>>>> dev_set_drvdata(..., NULL);
>>>>>>> kfree(state);
>>>>>>> }
>>>>>>>
>>>>>>> /* In all other functions */
>>>>>>> void do_things(...) {
>>>>>>> struct dxgstate *state = dev_get_drvdata(...);
>>>>>>>
>>>>>>> /* Use state in place of where dxgglobal was needed */
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> Iouri, notice this doesn't change anything regarding how userspace is
>>>>>>> designed. This is about how kernel organises its data.
>>>>>>>
>>>>>>> I hope what I wrote above can bring our understanding closer.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Wei.
>>>>>> I can certainly remove dxgglobal and keep the  pointer to the global
>>>>>> state in the device object.
>>>>>>
>>>>>> This will require passing of the global pointer to all functions, which
>>>>>> need to access it.
>>>>>>
>>>>>>
>>>>>> Maybe my understanding of the Greg's suggestion was not correct. I
>>>>>> thought the suggestion was
>>>>>>
>>>>>> to have multiple /dev/dxgN devices (one per virtual compute device).
>>>>> You have one device per HV device, as the bus already provides you.
>>>>> That's all you really need, right? Who would be opening the same device
>>>>> node multiple times?
>>>>>> This would change how the user mode
>>>>>> clients enumerate and communicate with compute devices.
>>>>> What does userspace have to do here? It should just open the device
>>>>> node that is present when needed. How will there be multiple userspace
>>>>> clients for a single HV device?
>>>> Dxgkrnl creates a single user mode visible device node /dev/dxg.
>>> When you do that, you have a device to put all of your data on. Use
>>> that.
>>>
>>>> It has
>>>> nothing to do with a specific hardware compute device on the host. Its
>>>> purpose is to provide services (IOCTLs) to enumerate and manage virtual
>>>> compute devices, which represent hardware devices on the host. The VMBus
>>>> devices are not used directly by user mode clients in the current design.
>>> That's horrid, why not just export the virtual compute devices properly
>>> through individual device nodes instead?
>>>
>>> In essence, you are creating a new syscall here to manage and handle
>>> devices for just your driver with the use of this ioctl. That's
>>> generally a bad idea.
>>>
>>>> Virtual compute devices are shared between processes. There could be a
>>>> Cuda application, Gimp and a Direct3D12 application working at the same
>>>> time.
>>> Why are all of those applications sharing anything? How are they
>>> sharing anything? If you need to share something across processes, use
>>> the existing inter-process ways of sharing stuff that we have today (12+
>>> different apis and counting). Don't create yet-another-one just for
>>> your tiny little driver here. That's rude to the rest of the kernel.
>>>
>>>> This is what I mean by saying that there are multiple user mode
>>>> clients who use the /dev/dxg driver interface. Each of this applications
>>>> will open the /dev/dxg device node and enumerate/use virtual compute
>>>> devices.
>>> That seems like an odd model to follow. How many virtual devices do you
>>> support? Is there a limit? Why not just enumerate them all to start
>>> with? Or if there are too many, why not do it like the raw device and
>>> have a single device node that is used to create the virtual devices you
>>> wish to use?
>>>
>>>> If we change the way how the virtual compute devices are visible to user
>>>> mode, the Cuda runtime, Direct3D runtime would need to be changed.
>>> We do not care about existing userspace code at this point in time, you
>>> are trying to get the kernel api correct here. Once you have done that,
>>> then you can go fix up your userspace code to work properly. Anything
>>> that came before today is pointless here, right? :)
>>>
>>>> I think we agreed that I will keep the global driver state in the device
>>>> object as Wei suggested and remove global variables. There still will be
>>>> a single /dev/dxg device node. Correct?
>>> I do not know, as I can't figure out your goals here at all, sorry.
>>>
>>> Please go work with some experienced Linux developers in your company.
>>> They should be the ones answering these questions for you, not me :)
>>>
>>> thanks,
>>>
>>> greg k-h
>> Thanks Greg and folks for all the feedback. The userspace API provided  
>> by this driver is specifically designed to match the Windows compute  
>> device abstraction in order to light up a large eco-system of graphics  
>> and compute APIs in our various virtualized Linux environment. This is  
>> a pretty critical design point as this is the only way we can scale to  
>> such a vast eco-system of APIs and their related tools. This enables  
>> APIs and framework which are available natively on Linux, such as  
>> CUDA, OpenGL, OpenCL, OpenVINO, OneAPI/L0, TensorFlow, Pytorch, soon  
>> Vulkan and more in the future and allows them to share the host GPU or  
>> compute device when run under our hypervisor. Most of these APIs sit  
>> directly on top of our userspace API and changing that abstraction  
>> would have major implications. Our D3D/DML userspace components are  
>> only implementation details for a subset of those APIs.
> This feels odd to me as I was only discussing the in-kernel apis, and
> how this driver does not follow them correctly at all (global device
> state, odd device creation, etc.) That should have nothing to do with
> the user/kernel api at all. And surely CUDA and the like never have
> defined a specific user/kernal api that must be followed, right?
>
> Yes, the existing user/kernel api in this driver is a bit odd, but
> really, we haven't even started to review that. For you all to just
> cut-and-run at the first sign of a detailed in-kernel code review is
> strange, why are you giving up now?
>
>> We have millions of developers experiencing and enjoying various Linux  
>> distributions through our Windows Subsystem for Linux (WSL) and this  
>> gives them a way to get hardware acceleration in a large number of  
>> important scenarios. WSL makes it trivial for developers to try out  
>> Linux totally risk free on over a billion devices, without having to  
>> commit to a native install upfront or understand and manually manage  
>> VMs. We think this is a pretty valuable proposition to both the  
>> Windows and the Linux community and one which has definitely been  
>> quite popular.
> This has nothing to do with how this tiny driver uses the in-kernel apis
> for creating and managing devices. You are discussing userspace
> applications here, which is a nice marketing discussion about how many
> developers you have, but means nothing when it comes to how the kernel
> code works. So why mention it at all?
>
>> That being said, we recognize that our virtualization approach makes  
>> some folks in the community uncomfortable and the resulting userspace  
>> API exposed by our driver will look a bit different than typical Linux  
>> driver. Taking into consideration feedback and reception, we decided  
>> to pause our upstream effort for this driver. We will keep our driver  
>> open source and available in our downstream kernel tree for folks who  
>> wants to leverage it in their private kernel.
> The problems I am pointing out here are how your kernel driver is
> interacting with the kernel driver model. It has almost nothing to do
> with the user/kernel api at all.
>
> And even if it did concern the user/kernel api, you already have a
> userspace library that has to talk to this driver, all you need to do is
> modify that to handle the expected changes that will happen when you
> actually get experienced kernel developers to review your code and apis.
>
> I don't understand what you all were expecting here. Did you not want
> to fit into the kernel tree and take advantage of using the proper
> functions and interfaces that have been created already? That should
> enable your users to be able to rely on a more stable and secure and
> widespread driver for everyone to use.
>
> Instead you are saying that you don't appreciate the time and energy we
> spent on trying to help you improve your code for your users, and wish
> to just ignore our help?
>
> By doing this, you are telling all community members that they should
> not waste their time in reviewing your company's code submissions as
> they are not appreciated or wanted unless they are just a rubber-stamp
> approval. That's going to ensure that no one ever reviews your
> contributions again in the future, which is an odd request, don't you
> think?
>
> Oh well, it's easy to add a filter to just ignore any future code
> submissions from your email domain. I'll recommend that all other
> kernel developers and maintainers do the same as we all have changes to
> review from companies and developers that actually want our feedback.
>
> best of luck!
>
> greg k-h
Hey Greg,
 
Apologies, I didn’t mean to imply we don’t appreciate the feedback from  
you or the community nor were we looking for a rubber stamp approval.  
You’ve provided great feedback and we’ll continue to evolve our driver  
toward this and we will continue to review internally with more  
experienced Linux kernel developers to get our driver into a better  
shape that can be upstreamed
 
It’s clear our code isn’t ready to be included upstream and we don’t  
want to waste reviewer’s time on it. We take all of the feedback  
seriously. Last year in our rfc, when we got  the feedback that we  
really needed an open source userspace, we went to great length to  
get one. We will continue to integrate all of the feedback we have  
gotten so far.
 
I totally agree that some of the feedback has nothing to do with the  
userspace API and we’ll evolve the driver toward that direction, it’s  
just goodness and will help make our driver simpler and more reliable.  
We will go back and figure out how to address this feedback, but it  
will take some time to do it right. The pause below wasn’t meant to  
imply we’re giving up forever, but that it will take a while before  
we can attempt again with a patch set that is hopefully better aligned  
and more mature in the future.
 
My email below was meant to be a thank you for the feedback, we’ll  
pause iteration on the current patch set since we have a lot of  
internalizing to do and some things that will take quite a while to  
address and we didn’t want to just go silent. but obviously I did a  
very poor job of communicating that. Let me assure you there is no ill  
feeling on our end and we do appreciate the feedback, it may take some  
time but we will be back again :-).
 
Thanks,
Steve