From: Colin Xu
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, alex.williamson@redhat.com
Cc: colin.xu@intel.com, zhenyuw@linux.intel.com, hang.yuan@linux.intel.com, swee.yee.fonn@intel.com, fred.gao@intel.com
Subject: [PATCH] vfio/pci: Add OpRegion 2.0 Extended VBT support.
Date: Fri, 13 Aug 2021 10:13:29 +0800
Message-Id: <20210813021329.128543-1-colin.xu@intel.com>
X-Mailer: git-send-email 2.32.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

For historical reasons, some shipped legacy systems do not follow the
OpRegion 2.1 spec and still use OpRegion 2.0, in which the extended VBT
is not physically contiguous with the OpRegion but may reside at any
location, pointed to by RVDA as an absolute address. It is therefore
impossible to map one contiguous range that holds both the OpRegion and
the extended VBT, as is done for 2.1.

The only difference between OpRegion 2.0 and 2.1 is where the extended
VBT is stored: for 2.0, RVDA is the absolute address of the extended
VBT, while for 2.1, RVDA is the address of the extended VBT relative to
the OpRegion base. It is therefore feasible to extend OpRegion support
to these legacy systems (until their system firmware is upgraded) by
kzalloc'ing a range that shadows the OpRegion from the beginning,
stitching the VBT immediately after it, patching the shadow OpRegion
version from 2.0 to 2.1, and patching the shadow RVDA to a relative
address. From the point of view of the vfio igd OpRegion r/w ops, only
OpRegion 2.1 is then exposed, regardless of whether the underlying host
OpRegion is 2.0 or 2.1, if the extended VBT exists.

The vfio igd OpRegion r/w ops return either the shadowed data
(OpRegion 2.0) or data read directly from the physical address
(OpRegion 2.1+), depending on the host OpRegion version and RVDA/RVDS.
The shadow mechanism makes it possible to support legacy systems on the
market.

Cc: Zhenyu Wang
Cc: Hang Yuan
Cc: Swee Yee Fonn
Cc: Fred Gao
Signed-off-by: Colin Xu
---
 drivers/vfio/pci/vfio_pci_igd.c | 117 ++++++++++++++++++++------------
 1 file changed, 75 insertions(+), 42 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 228df565e9bc..22b9436a3044 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -48,7 +48,10 @@ static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
 static void vfio_pci_igd_release(struct vfio_pci_device *vdev,
 				 struct vfio_pci_region *region)
 {
-	memunmap(region->data);
+	if (is_ioremap_addr(region->data))
+		memunmap(region->data);
+	else
+		kfree(region->data);
 }
 
 static const struct vfio_pci_regops vfio_pci_igd_regops = {
@@ -59,10 +62,11 @@ static const struct vfio_pci_regops vfio_pci_igd_regops = {
 static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 {
 	__le32 *dwordp = (__le32 *)(vdev->vconfig + OPREGION_PCI_ADDR);
-	u32 addr, size;
-	void *base;
+	u32 addr, size, rvds = 0;
+	void *base, *opregionvbt;
 	int ret;
 	u16 version;
+	u64 rvda = 0;
 
 	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
 	if (ret)
@@ -89,66 +93,95 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 	size *= 1024; /* In KB */
 
 	/*
-	 * Support opregion v2.1+
-	 * When VBT data exceeds 6KB size and cannot be within mailbox #4, then
-	 * the Extended VBT region next to opregion is used to hold the VBT data.
-	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
-	 * (Raw VBT Data Size) from opregion structure member are used to hold the
-	 * address from region base and size of VBT data. RVDA/RVDS are not
-	 * defined before opregion 2.0.
+	 * OpRegion and VBT:
+	 * When VBT data doesn't exceed 6KB, it's stored in Mailbox #4.
+	 * When VBT data exceeds 6KB size, Mailbox #4 is no longer large enough
+	 * to hold the VBT data, the Extended VBT region is introduced since
+	 * OpRegion 2.0 to hold the VBT data. Since OpRegion 2.0, RVDA/RVDS are
+	 * introduced to define the extended VBT data location and size.
+	 * OpRegion 2.0: RVDA defines the absolute physical address of the
+	 *   extended VBT data, RVDS defines the VBT data size.
+	 * OpRegion 2.1 and above: RVDA defines the relative address of the
+	 *   extended VBT data to OpRegion base, RVDS defines the VBT data size.
 	 *
-	 * opregion 2.1+: RVDA is unsigned, relative offset from
-	 * opregion base, and should point to the end of opregion.
-	 * otherwise, exposing to userspace to allow read access to everything between
-	 * the OpRegion and VBT is not safe.
-	 * RVDS is defined as size in bytes.
-	 *
-	 * opregion 2.0: rvda is the physical VBT address.
-	 * Since rvda is HPA it cannot be directly used in guest.
-	 * And it should not be practically available for end user,so it is not supported.
+	 * Due to the RVDA difference in OpRegion VBT (also the only diff between
+	 * 2.0 and 2.1), while for OpRegion 2.1 and above it's possible to map
+	 * a contigious memory to expose OpRegion and VBT r/w via the vfio
+	 * region, for OpRegion 2.0 shadow and amendment mechanism is used to
+	 * expose OpRegion and VBT r/w properly. So that from r/w ops view, only
+	 * OpRegion 2.1 is exposed regardless underneath Region is 2.0 or 2.1.
 	 */
 	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
-	if (version >= 0x0200) {
-		u64 rvda;
-		u32 rvds;
 
+	if (version >= 0x0200) {
 		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
 		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
+
+		/* The extended VBT is valid only when RVDA/RVDS are non-zero. */
 		if (rvda && rvds) {
-			/* no support for opregion v2.0 with physical VBT address */
-			if (version == 0x0200) {
+			size += rvds;
+		}
+
+		/* The extended VBT must follows OpRegion for OpRegion 2.1+ */
+		if (rvda != size && version > 0x0200) {
+			memunmap(base);
+			pci_err(vdev->pdev,
+				"Extended VBT does not follow opregion on version 0x%04x\n",
+				version);
+			return -EINVAL;
+		}
+	}
+
+	if (size != OPREGION_SIZE) {
+		/* Allocate memory for OpRegion and extended VBT for 2.0 */
+		if (rvda && rvds && version == 0x0200) {
+			void *vbt_base;
+
+			vbt_base = memremap(rvda, rvds, MEMREMAP_WB);
+			if (!vbt_base) {
 				memunmap(base);
-				pci_err(vdev->pdev,
-					"IGD assignment does not support opregion v2.0 with an extended VBT region\n");
-				return -EINVAL;
+				return -ENOMEM;
 			}
 
-			if (rvda != size) {
+			opregionvbt = kzalloc(size, GFP_KERNEL);
+			if (!opregionvbt) {
 				memunmap(base);
-				pci_err(vdev->pdev,
-					"Extended VBT does not follow opregion on version 0x%04x\n",
-					version);
-				return -EINVAL;
+				memunmap(vbt_base);
+				return -ENOMEM;
 			}
 
-			/* region size for opregion v2.0+: opregion and VBT size. */
-			size += rvds;
+			/* Stitch VBT after OpRegion noncontigious */
+			memcpy(opregionvbt, base, OPREGION_SIZE);
+			memcpy(opregionvbt + OPREGION_SIZE, vbt_base, rvds);
+
+			/* Patch OpRegion 2.0 to 2.1 */
+			*(__le16 *)(opregionvbt + OPREGION_VERSION) = 0x0201;
+			/* Patch RVDA to relative address after OpRegion */
+			*(__le64 *)(opregionvbt + OPREGION_RVDA) = OPREGION_SIZE;
+
+			memunmap(vbt_base);
+			memunmap(base);
+
+			/* Register shadow instead of map as vfio_region */
+			base = opregionvbt;
+		/* Remap OpRegion + extended VBT for 2.1+ */
+		} else {
+			memunmap(base);
+			base = memremap(addr, size, MEMREMAP_WB);
+			if (!base)
+				return -ENOMEM;
 		}
 	}
 
-	if (size != OPREGION_SIZE) {
-		memunmap(base);
-		base = memremap(addr, size, MEMREMAP_WB);
-		if (!base)
-			return -ENOMEM;
-	}
-
 	ret = vfio_pci_register_dev_region(vdev,
 		PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
 		VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION,
 		&vfio_pci_igd_regops, size, VFIO_REGION_INFO_FLAG_READ, base);
 	if (ret) {
-		memunmap(base);
+		if (is_ioremap_addr(base))
+			memunmap(base);
+		else
+			kfree(base);
 		return ret;
 	}
 
-- 
2.32.0
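
Illustration only, not part of the patch: the stitch-and-patch step for
OpRegion 2.0 can be sketched in plain userspace C as below. This is a
minimal sketch under stated assumptions. shadow_opregion() and the
put_le*/get_le* helpers are hypothetical names, calloc() stands in for
kzalloc(), and the offsets are assumed to mirror the constants defined in
drivers/vfio/pci/vfio_pci_igd.c. The sketch stores the patched fields
byte by byte in little-endian order, so the result does not depend on
host endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to mirror the constants in drivers/vfio/pci/vfio_pci_igd.c. */
#define OPREGION_SIZE    (8 * 1024)
#define OPREGION_VERSION 0x16
#define OPREGION_RVDA    0x3ba

/* Hypothetical helpers: endian-independent little-endian accessors. */
static inline void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

static inline void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

static inline uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static inline uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/*
 * Hypothetical helper: build a 2.1-style shadow from a 2.0 OpRegion
 * (OPREGION_SIZE bytes) and its extended VBT (rvds bytes).
 * Returns a buffer of OPREGION_SIZE + rvds bytes that the caller must
 * free(), or NULL on allocation failure.
 */
static inline uint8_t *shadow_opregion(const uint8_t *opregion,
				       const uint8_t *vbt, uint32_t rvds)
{
	uint8_t *shadow;

	assert(opregion && vbt && rvds);

	shadow = calloc(1, (size_t)OPREGION_SIZE + rvds);
	if (!shadow)
		return NULL;

	/* Stitch the VBT directly after the OpRegion copy. */
	memcpy(shadow, opregion, OPREGION_SIZE);
	memcpy(shadow + OPREGION_SIZE, vbt, rvds);

	/* Present the shadow as OpRegion 2.1 with a relative RVDA. */
	put_le16(shadow + OPREGION_VERSION, 0x0201);
	put_le64(shadow + OPREGION_RVDA, OPREGION_SIZE);

	return shadow;
}
```

A caller would pass the mapped 2.0 OpRegion and the separately mapped
VBT, then expose the returned buffer in place of a direct mapping. The
size and original RVDS field are left untouched because only the version
and RVDA differ between the two layouts.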