Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp3631348ybl; Sun, 8 Dec 2019 19:52:34 -0800 (PST) X-Google-Smtp-Source: APXvYqxt9/IZscFhrew5MPVAwsZ/sb48XQ7DXWw1weNvPYY7r0ZDOh5pisPQBHnRgIblbB9mMJn2 X-Received: by 2002:a05:6830:194:: with SMTP id q20mr10921264ota.92.1575863554164; Sun, 08 Dec 2019 19:52:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1575863554; cv=none; d=google.com; s=arc-20160816; b=lQUpE0x61Dh957pxwBXNtMWR/Z0HKUVTzoAOOVzeKKdBpltHtbpN8x65ellqtETc3t H7FdDYMMm03GgpUnt+1/YrmOJ498cpUHTp9KmI65AnXhL7Tc3Q8nNQSqgEqbrBJym6wU 12Vo7+1PXmV9z9G9h54nJormT5KZdkTNrH4ans/7CtMaLYkA8BJFSKGrkgMwUG81irSN 9KEyoYGZZkpAdR9/wi4X2G6OyD0F+vbQQ76FQJNZEM5D6gcQjc/i8b9tNMByFSa5RaHM zh5AQAaFDjxdl/H6rEybD9UybnvvXtmOXKvMj8SVsIhojfGhm6Q9UlPCJ5yANysptUpm fILw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:reply-to:message-id :subject:cc:to:from:date; bh=zvhMnM5eTUtQ1nDnoj8oMoNSVV5O+rwS0bpTJwcWusY=; b=vtKicRmPdpqmRYKPz5OJYjtjsfXHg8c7GCP8g7AvHzkGNPAbKmoKVfUqC+GO+hwAxp 8VkdkJDh6hA6k4gMGT1eQhg/Pknb50ur1UimGtx0ycR9+umZEgHTROFbbHOjQm9VjHNq d6+2x9tOhNJxZD3FLZuazRJiEjGLUDJAh2pYaaDbHCL700bmhGtEdkaOyQ6U9JXLIkp+ W7obxKwMd6/ldz6jpIZYy3vVYF2FcH18VufZ9sYSlEafpN5f353NTbHMfhyNLkegIFtV VrpwMTMUmwzETVMRDfpp1oLvi6uj738pXbUnaLEpiguKh7v251dKEwogR+2wO4cFZpY/ BzZw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id q5si1197758otc.56.2019.12.08.19.52.22; Sun, 08 Dec 2019 19:52:34 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727077AbfLIDui (ORCPT + 99 others); Sun, 8 Dec 2019 22:50:38 -0500 Received: from mga04.intel.com ([192.55.52.120]:41836 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726748AbfLIDui (ORCPT ); Sun, 8 Dec 2019 22:50:38 -0500 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2019 19:50:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,294,1571727600"; d="scan'208";a="244308757" Received: from joy-optiplex-7040.sh.intel.com (HELO joy-OptiPlex-7040) ([10.239.13.9]) by fmsmga002.fm.intel.com with ESMTP; 08 Dec 2019 19:50:35 -0800 Date: Sun, 8 Dec 2019 22:42:25 -0500 From: Yan Zhao To: Alex Williamson Cc: "kvm@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "libvir-list@redhat.com" , "qemu-devel@nongnu.org" , "cohuck@redhat.com" , "zhenyuw@linux.intel.com" , "Wang, Zhi A" , "Tian, Kevin" , "He, Shaopeng" Subject: Re: [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops Message-ID: <20191209034225.GK31791@joy-OptiPlex-7040> Reply-To: Yan Zhao References: <20191205032419.29606-1-yan.y.zhao@intel.com> <20191205032536.29653-1-yan.y.zhao@intel.com> <20191205165519.106bd210@x1.home> <20191206075655.GG31791@joy-OptiPlex-7040> <20191206142226.2698a2be@x1.home> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191206142226.2698a2be@x1.home> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Dec 07, 2019 at 05:22:26AM +0800, Alex Williamson wrote: > On Fri, 6 Dec 2019 02:56:55 -0500 > Yan Zhao wrote: > > > On Fri, Dec 06, 2019 at 07:55:19AM +0800, Alex Williamson wrote: > > > On Wed, 4 Dec 2019 22:25:36 -0500 > > > Yan Zhao wrote: > > > > > > > when vfio-pci is bound to a physical device, almost all the hardware > > > > resources are passthroughed. > > > > Sometimes, vendor driver of this physcial device may want to mediate some > > > > hardware resource access for a short period of time, e.g. dirty page > > > > tracking during live migration. > > > > > > > > Here we introduce mediate ops in vfio-pci for this purpose. > > > > > > > > Vendor driver can register a mediate ops to vfio-pci. > > > > But rather than directly bind to the passthroughed device, the > > > > vendor driver is now either a module that does not bind to any device or > > > > a module binds to other device. > > > > E.g. when passing through a VF device that is bound to vfio-pci modules, > > > > PF driver that binds to PF device can register to vfio-pci to mediate > > > > VF's regions, hence supporting VF live migration. > > > > > > > > The sequence goes like this: > > > > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver > > > > > > > > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops > > > > > > > > 3. Whenever vfio-pci opens a device, it searches the list and call > > > > vfio_pci_mediate_ops->open() to check whether a vendor driver supports > > > > mediating this device. > > > > Upon a success return value of from vfio_pci_mediate_ops->open(), > > > > vfio-pci will stop list searching and store a mediate handle to > > > > represent this open into vendor driver. > > > > (so if multiple vendor drivers support mediating a device through > > > > vfio_pci_mediate_ops, only one will win, depending on their registering > > > > sequence) > > > > > > > > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci > > > > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that > > > > vendor driver is able to override a region's default flags and caps, > > > > e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole > > > > region. > > > > > > > > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into > > > > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps(). > > > > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further > > > > passthrough this read/write/mmap to physical device, otherwise it just > > > > returns without touch physical device. > > > > > > > > 6. When vfio-pci closes a device, vfio_pci_release() chains into > > > > vfio_pci_mediate_ops->release() to close the reference in vendor driver. > > > > > > > > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits > > > > > > > > Cc: Kevin Tian > > > > > > > > Signed-off-by: Yan Zhao > > > > --- > > > > drivers/vfio/pci/vfio_pci.c | 146 ++++++++++++++++++++++++++++ > > > > drivers/vfio/pci/vfio_pci_private.h | 2 + > > > > include/linux/vfio.h | 16 +++ > > > > 3 files changed, 164 insertions(+) > > > > > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > > > > index 02206162eaa9..55080ff29495 100644 > > > > --- a/drivers/vfio/pci/vfio_pci.c > > > > +++ b/drivers/vfio/pci/vfio_pci.c > > > > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); > > > > MODULE_PARM_DESC(disable_idle_d3, > > > > "Disable using the PCI D3 low power state for idle, unused devices"); > > > > > > > > +static LIST_HEAD(mediate_ops_list); > > > > +static DEFINE_MUTEX(mediate_ops_list_lock); > > > > +struct vfio_pci_mediate_ops_list_entry { > > > > + struct vfio_pci_mediate_ops *ops; > > > > + int refcnt; > > > > + struct list_head next; > > > > +}; > > > > + > > > > static inline bool vfio_vga_disabled(void) > > > > { > > > > #ifdef CONFIG_VFIO_PCI_VGA > > > > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data) > > > > if (!(--vdev->refcnt)) { > > > > vfio_spapr_pci_eeh_release(vdev->pdev); > > > > vfio_pci_disable(vdev); > > > > + if (vdev->mediate_ops && vdev->mediate_ops->release) { > > > > + vdev->mediate_ops->release(vdev->mediate_handle); > > > > + vdev->mediate_ops = NULL; > > > > + } > > > > } > > > > > > > > mutex_unlock(&vdev->reflck->lock); > > > > @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data) > > > > { > > > > struct vfio_pci_device *vdev = device_data; > > > > int ret = 0; > > > > + struct vfio_pci_mediate_ops_list_entry *mentry; > > > > > > > > if (!try_module_get(THIS_MODULE)) > > > > return -ENODEV; > > > > @@ -495,6 +508,30 @@ static int vfio_pci_open(void *device_data) > > > > goto error; > > > > > > > > vfio_spapr_pci_eeh_open(vdev->pdev); > > > > + mutex_lock(&mediate_ops_list_lock); > > > > + list_for_each_entry(mentry, &mediate_ops_list, next) { > > > > + u64 caps; > > > > + u32 handle; > > > > > > Wouldn't it seem likely that the ops provider might use this handle as > > > a pointer, so we'd want it to be an opaque void*? > > > > > yes, you are right, handle as a pointer is much better. will change it. > > Thanks :) > > > > > > + > > > > + memset(&caps, 0, sizeof(caps)); > > > > > > @caps has no purpose here, add it if/when we do something with it. > > > It's also a standard type, why are we memset'ing it rather than just > > > =0?? > > > > > > > + ret = mentry->ops->open(vdev->pdev, &caps, &handle); > > > > + if (!ret) { > > > > + vdev->mediate_ops = mentry->ops; > > > > + vdev->mediate_handle = handle; > > > > + > > > > + pr_info("vfio pci found mediate_ops %s, caps=%llx, handle=%x for %x:%x\n", > > > > + vdev->mediate_ops->name, caps, > > > > + handle, vdev->pdev->vendor, > > > > + vdev->pdev->device); > > > > > > Generally not advisable to make user accessible printks. > > > > > ok. > > > > > > + /* > > > > + * only find the first matching mediate_ops, > > > > + * and add its refcnt > > > > + */ > > > > + mentry->refcnt++; > > > > + break; > > > > + } > > > > + } > > > > + mutex_unlock(&mediate_ops_list_lock); > > > > } > > > > vdev->refcnt++; > > > > error: > > > > @@ -736,6 +773,14 @@ static long vfio_pci_ioctl(void *device_data, > > > > info.size = pdev->cfg_size; > > > > info.flags = VFIO_REGION_INFO_FLAG_READ | > > > > VFIO_REGION_INFO_FLAG_WRITE; > > > > + > > > > + if (vdev->mediate_ops && > > > > + vdev->mediate_ops->get_region_info) { > > > > + vdev->mediate_ops->get_region_info( > > > > + vdev->mediate_handle, > > > > + &info, &caps, NULL); > > > > + } > > > > > > These would be a lot cleaner if we could just call a helper function: > > > > > > void vfio_pci_region_info_mediation_hook(vdev, info, caps, etc...) > > > { > > > if (vdev->mediate_ops > > > vdev->mediate_ops->get_region_info) > > > vdev->mediate_ops->get_region_info(vdev->mediate_handle, > > > &info, &caps, NULL); > > > } > > > > > > I'm not thrilled with all these hooks, but not open coding every one of > > > them might help. > > > > ok. got it. > > > > > > > + > > > > break; > > > > case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: > > > > info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); > > > > @@ -756,6 +801,13 @@ static long vfio_pci_ioctl(void *device_data, > > > > } > > > > } > > > > > > > > + if (vdev->mediate_ops && > > > > + vdev->mediate_ops->get_region_info) { > > > > + vdev->mediate_ops->get_region_info( > > > > + vdev->mediate_handle, > > > > + &info, &caps, NULL); > > > > + } > > > > + > > > > break; > > > > case VFIO_PCI_ROM_REGION_INDEX: > > > > { > > > > @@ -794,6 +846,14 @@ static long vfio_pci_ioctl(void *device_data, > > > > } > > > > > > > > pci_write_config_word(pdev, PCI_COMMAND, orig_cmd); > > > > + > > > > + if (vdev->mediate_ops && > > > > + vdev->mediate_ops->get_region_info) { > > > > + vdev->mediate_ops->get_region_info( > > > > + vdev->mediate_handle, > > > > + &info, &caps, NULL); > > > > + } > > > > + > > > > break; > > > > } > > > > case VFIO_PCI_VGA_REGION_INDEX: > > > > @@ -805,6 +865,13 @@ static long vfio_pci_ioctl(void *device_data, > > > > info.flags = VFIO_REGION_INFO_FLAG_READ | > > > > VFIO_REGION_INFO_FLAG_WRITE; > > > > > > > > + if (vdev->mediate_ops && > > > > + vdev->mediate_ops->get_region_info) { > > > > + vdev->mediate_ops->get_region_info( > > > > + vdev->mediate_handle, > > > > + &info, &caps, NULL); > > > > + } > > > > + > > > > break; > > > > default: > > > > { > > > > @@ -839,6 +906,13 @@ static long vfio_pci_ioctl(void *device_data, > > > > if (ret) > > > > return ret; > > > > } > > > > + > > > > + if (vdev->mediate_ops && > > > > + vdev->mediate_ops->get_region_info) { > > > > + vdev->mediate_ops->get_region_info( > > > > + vdev->mediate_handle, > > > > + &info, &caps, &cap_type); > > > > + } > > > > } > > > > } > > > > > > > > @@ -1151,6 +1225,16 @@ static ssize_t vfio_pci_rw(void *device_data, char __user *buf, > > > > if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) > > > > return -EINVAL; > > > > > > > > + if (vdev->mediate_ops && vdev->mediate_ops->rw) { > > > > + int ret; > > > > + bool pt = true; > > > > + > > > > + ret = vdev->mediate_ops->rw(vdev->mediate_handle, > > > > + buf, count, ppos, iswrite, &pt); > > > > + if (!pt) > > > > + return ret; > > > > + } > > > > + > > > > switch (index) { > > > > case VFIO_PCI_CONFIG_REGION_INDEX: > > > > return vfio_pci_config_rw(vdev, buf, count, ppos, iswrite); > > > > @@ -1200,6 +1284,15 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) > > > > u64 phys_len, req_len, pgoff, req_start; > > > > int ret; > > > > > > > > + if (vdev->mediate_ops && vdev->mediate_ops->mmap) { > > > > + int ret; > > > > + bool pt = true; > > > > + > > > > + ret = vdev->mediate_ops->mmap(vdev->mediate_handle, vma, &pt); > > > > + if (!pt) > > > > + return ret; > > > > + } > > > > > > There must be a better way to do all these. Do we really want to call > > > into ops for every rw or mmap, have the vendor code decode a region, > > > and maybe or maybe not have it handle it? It's pretty ugly. Do we > > > > do you think below flow is good ? > > 1. in mediate_ops->open(), return > > (1) region[] indexed by region index, if a mediate driver supports mediating > > region[i], region[i].ops->get_region_info, regions[i].ops->rw, or > > regions[i].ops->mmap is not null. > > (2) irq_info[] indexed by irq index, if a mediate driver supports mediating > > irq_info[i], irq_info[i].ops->get_irq_info or irq_info[i].ops->set_irq_info > > is not null. > > > > Then, vfio_pci_rw/vfio_pci_mmap/vfio_pci_ioctl only call into those > > non-null hooks. > > Or would it be better to always call into the hooks and the vendor > driver is allowed to selectively replace the hooks for regions they > want to mediate. For example, region[i].ops->rw could by default point > to vfio_pci_default_rw() and the mediation driver would have a > mechanism to replace that with its own vendorABC_vfio_pci_rw(). We > could export vfio_pci_default_rw() such that the vendor driver would be > responsible for calling it as necessary. > good idea :) > > > need the mediation provider to be able to dynamically setup the ops per > > May I confirm that you are not saying dynamic registering mediate ops > > after vfio-pci already opened a device, right? > > I'm not necessarily excluding or advocating for that. > ok. got it. > > > region and export the default handlers out for them to call? > > > > > could we still keep checking return value of the hooks rather than > > export default handlers? Otherwise at least vfio_pci_default_ioctl(), > > vfio_pci_default_rw(), and vfio_pci_default_mmap() need to be exported. > > The ugliness of vfio-pci having all these vendor branches is what I'm > trying to avoid, so I really am not a fan of the idea or mechanism that > the vfio-pci core code is directly involving a mediation driver and > handling the return for every entry point. > I see :) > > > > + > > > > index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); > > > > > > > > if (vma->vm_end < vma->vm_start) > > > > @@ -1629,8 +1722,17 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) > > > > > > > > static void __exit vfio_pci_cleanup(void) > > > > { > > > > + struct vfio_pci_mediate_ops_list_entry *mentry, *n; > > > > + > > > > pci_unregister_driver(&vfio_pci_driver); > > > > vfio_pci_uninit_perm_bits(); > > > > + > > > > + mutex_lock(&mediate_ops_list_lock); > > > > + list_for_each_entry_safe(mentry, n, &mediate_ops_list, next) { > > > > + list_del(&mentry->next); > > > > + kfree(mentry); > > > > + } > > > > + mutex_unlock(&mediate_ops_list_lock); > > > > > > Is it even possible to unload vfio-pci while there are mediation > > > drivers registered? I don't think the module interactions are well > > > thought out here, ex. do you really want i40e to have build and runtime > > > dependencies on vfio-pci? I don't think so. > > > > > Currently, yes, i40e has build dependency on vfio-pci. > > It's like this, if i40e decides to support SRIOV and compiles in vf > > related code who depends on vfio-pci, it will also have build dependency > > on vfio-pci. isn't it natural? > > No, this is not natural. There are certainly i40e VF use cases that > have no interest in vfio and having dependencies between the two > modules is unacceptable. I think you probably want to modularize the > i40e vfio support code and then perhaps register a table in vfio-pci > that the vfio-pci code can perform a module request when using a > compatible device. Just and idea, there might be better options. I > will not accept a solution that requires unloading the i40e driver in > order to unload the vfio-pci driver. It's inconvenient with just one > NIC driver, imagine how poorly that scales. > what about this way: mediate driver registers a module notifier and every time when vfio_pci is loaded, register to vfio_pci its mediate ops? (Just like in below sample code) This way vfio-pci is free to unload and this registering only gives vfio-pci a name of what module to request. After that, in vfio_pci_open(), vfio-pci requests the mediate driver. (or puts the mediate driver when mediate driver does not support mediating the device) in vfio_pci_release(), vfio-pci puts the mediate driver. static void register_mediate_ops(void) { int (*func)(struct vfio_pci_mediate_ops *ops) = NULL; func = symbol_get(vfio_pci_register_mediate_ops); if (func) { func(&igd_dt_ops); symbol_put(vfio_pci_register_mediate_ops); } } static int igd_module_notify(struct notifier_block *self, unsigned long val, void *data) { struct module *mod = data; int ret = 0; switch (val) { case MODULE_STATE_LIVE: if (!strcmp(mod->name, "vfio_pci")) register_mediate_ops(); break; case MODULE_STATE_GOING: break; default: break; } return ret; } static struct notifier_block igd_module_nb = { .notifier_call = igd_module_notify, .priority = 0, }; static int __init igd_dt_init(void) { ... register_mediate_ops(); register_module_notifier(&igd_module_nb); ... return 0; } > > > > } > > > > > > > > static void __init vfio_pci_fill_ids(void) > > > > @@ -1697,6 +1799,50 @@ static int __init vfio_pci_init(void) > > > > return ret; > > > > } > > > > > > > > +int vfio_pci_register_mediate_ops(struct vfio_pci_mediate_ops *ops) > > > > +{ > > > > + struct vfio_pci_mediate_ops_list_entry *mentry; > > > > + > > > > + mutex_lock(&mediate_ops_list_lock); > > > > + mentry = kzalloc(sizeof(*mentry), GFP_KERNEL); > > > > + if (!mentry) { > > > > + mutex_unlock(&mediate_ops_list_lock); > > > > + return -ENOMEM; > > > > + } > > > > + > > > > + mentry->ops = ops; > > > > + mentry->refcnt = 0; > > > > > > It's kZalloc'd, this is unnecessary. > > > > > right :) > > > > + list_add(&mentry->next, &mediate_ops_list); > > > > > > Check for duplicates? > > > > > ok. will do it. > > > > + > > > > + pr_info("registered dm ops %s\n", ops->name); > > > > + mutex_unlock(&mediate_ops_list_lock); > > > > + > > > > + return 0; > > > > +} > > > > +EXPORT_SYMBOL(vfio_pci_register_mediate_ops); > > > > + > > > > +void vfio_pci_unregister_mediate_ops(struct vfio_pci_mediate_ops *ops) > > > > +{ > > > > + struct vfio_pci_mediate_ops_list_entry *mentry, *n; > > > > + > > > > + mutex_lock(&mediate_ops_list_lock); > > > > + list_for_each_entry_safe(mentry, n, &mediate_ops_list, next) { > > > > + if (mentry->ops != ops) > > > > + continue; > > > > + > > > > + mentry->refcnt--; > > > > > > Whose reference is this removing? > > > > > I intended to prevent mediate driver from calling unregister mediate ops > > while there're still opened devices in it. > > after a successful mediate_ops->open(), mentry->refcnt++. > > after calling mediate_ops->release(). mentry->refcnt--. > > > > (seems in this RFC, I missed a mentry->refcnt-- after calling > > mediate_ops->release()) > > > > > > > > + if (!mentry->refcnt) { > > > > + list_del(&mentry->next); > > > > + kfree(mentry); > > > > + } else > > > > + pr_err("vfio_pci unregister mediate ops %s error\n", > > > > + mentry->ops->name); > > > > > > This is bad, we should hold a reference to the module providing these > > > ops for each use of it such that the module cannot be removed while > > > it's in use. Otherwise we enter a very bad state here and it's > > > trivially accessible by an admin remove the module while in use. > > mediate driver is supposed to ref its own module on a success > > mediate_ops->open(), and deref its own module on mediate_ops->release(). > > so, it can't be accidentally removed. > > Where was that semantic expressed in this series? We should create > interfaces that are hard to use incorrectly. It is far too easy for a > vendor driver to overlook such a requirement, which means fixing the > same bugs repeatedly for each vendor. It needs to be improved. Thanks, right. will improve it. Thanks Yan