Date: Fri, 27 Aug 2021 09:36:36 +0800 (CST)
From: Colin Xu
To: Colin Xu
Cc: Alex Williamson, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    zhenyuw@linux.intel.com, hang.yuan@linux.intel.com,
    swee.yee.fonn@intel.com, fred.gao@intel.com
Subject: Re: [PATCH] vfio/pci: Add OpRegion 2.0 Extended VBT support.

Hi Alex,

In addition to the background that devices on the market may still need
OpRegion 2.0 support in vfio-pci, do you have any other comments on the
patch body?

On Tue, 17 Aug 2021, Colin Xu wrote:

> On Mon, 16 Aug 2021, Alex Williamson wrote:
>
>> On Fri, 13 Aug 2021 10:13:29 +0800
>> Colin Xu wrote:
>>
>>> Due to historical reasons, some legacy shipped systems don't follow the
>>> OpRegion 2.1 spec and still stick to OpRegion 2.0, in which the extended
>>> VBT is not contiguous with OpRegion in physical address, but sits at
>>> whatever location RVDA points to as an absolute address. Thus it's
>>> impossible to map one contiguous range to hold both OpRegion and the
>>> extended VBT as for 2.1.
>>>
>>> Since the only difference between OpRegion 2.0 and 2.1 is where the
>>> extended VBT is stored (for 2.0, RVDA is the absolute address of the
>>> extended VBT, while for 2.1, RVDA is the address of the extended VBT
>>> relative to the OpRegion base), it's feasible to amend OpRegion support
>>> for these legacy systems (until their system firmware is upgraded) by
>>> kzalloc'ing a range to shadow OpRegion from the beginning, stitching the
>>> VBT closely after it, patching the shadow OpRegion version from 2.0 to
>>> 2.1, and patching the shadow RVDA to the relative address. That way,
>>> from the vfio igd OpRegion r/w ops view, only OpRegion 2.1 is exposed
>>> regardless of whether the underlying host OpRegion is 2.0 or 2.1,
>>> whenever the extended VBT exists. The vfio igd OpRegion r/w ops will
>>> return either the shadowed data (OpRegion 2.0) or read directly from the
>>> physical address (OpRegion 2.1+), based on the host OpRegion version and
>>> RVDA/RVDS. The shadow mechanism makes it possible to support legacy
>>> systems on the market.
>>
>> Which systems does this enable? There's a suggestion above that these
>> systems could update their firmware to get OpRegion v2.1 support; why
>> shouldn't we ask users to do that instead? When we added OpRegion v2.1
>> support we were told that v2.0 support was essentially non-existent, so
>> why should we add code to support an old spec with few users for such a
>> niche use case?
>
> Hi Alex, there was some misalignment with the BIOS owner: we were told
> that the 2.0 systems weren't for retail but only for internal
> development. However, in other projects we DO see that the retail market
> has such systems, including the NUC NUC6CAYB, some APL industrial PCs
> used in RT systems, and some customized APL motherboards from commercial
> virtualization solutions. We immediately contacted the BIOS owner to ask
> for clarification and they admitted it. These systems won't get a BIOS
> update for the OpRegion but are still under warranty. That's why
> OpRegion 2.0 support is still needed.
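For reference, the RVDA difference that drives all of this can be
summarized in a small standalone sketch (illustrative only, not part of
the patch; the helper name and example values below are made up):

/* Illustrative sketch only -- not part of the patch. */
#include <stdint.h>
#include <stdio.h>

#define OPREGION_SIZE   (8 * 1024)

/* Resolve the extended VBT physical address from RVDA per OpRegion version. */
static uint64_t extended_vbt_phys(uint16_t version, uint64_t opregion_phys,
                                  uint64_t rvda)
{
        if (version == 0x0200)
                return rvda;                  /* 2.0: RVDA is an absolute address */

        return opregion_phys + rvda;          /* 2.1+: RVDA is relative to OpRegion base */
}

int main(void)
{
        uint64_t opregion_phys = 0x7a000000;   /* example host OpRegion address */

        /* 2.0 firmware: the extended VBT may live anywhere RVDA points. */
        printf("v2.0 VBT at 0x%llx\n", (unsigned long long)
               extended_vbt_phys(0x0200, opregion_phys, 0x7b000000));

        /* 2.1+ firmware: RVDA is an offset and should equal OPREGION_SIZE. */
        printf("v2.1 VBT at 0x%llx\n", (unsigned long long)
               extended_vbt_phys(0x0201, opregion_phys, OPREGION_SIZE));

        return 0;
}

The patch normalizes the 2.0 case to the 2.1 layout by copying OpRegion
plus the extended VBT into one kzalloc'd shadow buffer and rewriting the
version and RVDA fields, so the vfio region r/w path only ever sees the
2.1 layout.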
>
>>
>>> Cc: Zhenyu Wang
>>> Cc: Hang Yuan
>>> Cc: Swee Yee Fonn
>>> Cc: Fred Gao
>>> Signed-off-by: Colin Xu
>>> ---
>>>  drivers/vfio/pci/vfio_pci_igd.c | 117 ++++++++++++++++++++------------
>>>  1 file changed, 75 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
>>> index 228df565e9bc..22b9436a3044 100644
>>> --- a/drivers/vfio/pci/vfio_pci_igd.c
>>> +++ b/drivers/vfio/pci/vfio_pci_igd.c
>>> @@ -48,7 +48,10 @@ static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
>>>  static void vfio_pci_igd_release(struct vfio_pci_device *vdev,
>>>  				 struct vfio_pci_region *region)
>>>  {
>>> -	memunmap(region->data);
>>> +	if (is_ioremap_addr(region->data))
>>> +		memunmap(region->data);
>>> +	else
>>> +		kfree(region->data);
>>>  }
>>>
>>>  static const struct vfio_pci_regops vfio_pci_igd_regops = {
>>> @@ -59,10 +62,11 @@ static const struct vfio_pci_regops vfio_pci_igd_regops = {
>>>  static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
>>>  {
>>>  	__le32 *dwordp = (__le32 *)(vdev->vconfig + OPREGION_PCI_ADDR);
>>> -	u32 addr, size;
>>> -	void *base;
>>> +	u32 addr, size, rvds = 0;
>>> +	void *base, *opregionvbt;
>>>  	int ret;
>>>  	u16 version;
>>> +	u64 rvda = 0;
>>>
>>>  	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
>>>  	if (ret)
>>> @@ -89,66 +93,95 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
>>>  	size *= 1024; /* In KB */
>>>
>>>  	/*
>>> -	 * Support opregion v2.1+
>>> -	 * When VBT data exceeds 6KB size and cannot be within mailbox #4, then
>>> -	 * the Extended VBT region next to opregion is used to hold the VBT data.
>>> -	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
>>> -	 * (Raw VBT Data Size) from opregion structure member are used to hold the
>>> -	 * address from region base and size of VBT data. RVDA/RVDS are not
>>> -	 * defined before opregion 2.0.
>>> +	 * OpRegion and VBT:
>>> +	 * When VBT data doesn't exceed 6KB, it's stored in Mailbox #4.
>>> +	 * When VBT data exceeds 6KB, Mailbox #4 is no longer large enough to
>>> +	 * hold it, and the Extended VBT region (introduced in OpRegion 2.0)
>>> +	 * holds the VBT data instead. Since OpRegion 2.0, RVDA/RVDS define
>>> +	 * the extended VBT data location and size.
>>> +	 * OpRegion 2.0: RVDA defines the absolute physical address of the
>>> +	 * extended VBT data, RVDS defines the VBT data size.
>>> +	 * OpRegion 2.1 and above: RVDA defines the address of the extended
>>> +	 * VBT data relative to the OpRegion base, RVDS defines the VBT data
>>> +	 * size.
>>>  	 *
>>> -	 * opregion 2.1+: RVDA is unsigned, relative offset from
>>> -	 * opregion base, and should point to the end of opregion.
>>> -	 * otherwise, exposing to userspace to allow read access to everything between
>>> -	 * the OpRegion and VBT is not safe.
>>> -	 * RVDS is defined as size in bytes.
>>> -	 *
>>> -	 * opregion 2.0: rvda is the physical VBT address.
>>> -	 * Since rvda is HPA it cannot be directly used in guest.
>>> -	 * And it should not be practically available for end user,so it is not supported.
>>> +	 * Due to the RVDA difference (also the only difference between 2.0
>>> +	 * and 2.1): for OpRegion 2.1 and above it's possible to map one
>>> +	 * contiguous memory range to expose OpRegion and VBT r/w via the
>>> +	 * vfio region, while for OpRegion 2.0 a shadow-and-amend mechanism
>>> +	 * is used to expose OpRegion and VBT r/w properly.
>>> +	 * That way, from the r/w ops view, only OpRegion 2.1 is exposed,
>>> +	 * regardless of whether the underlying OpRegion is 2.0 or 2.1.
>>>  	 */
>>>  	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
>>> -	if (version >= 0x0200) {
>>> -		u64 rvda;
>>> -		u32 rvds;
>>>
>>> +	if (version >= 0x0200) {
>>>  		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
>>>  		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
>>> +
>>> +		/* The extended VBT is valid only when RVDA/RVDS are non-zero. */
>>>  		if (rvda && rvds) {
>>> -			/* no support for opregion v2.0 with physical VBT address */
>>> -			if (version == 0x0200) {
>>> +			size += rvds;
>>> +		}
>>> +
>>> +		/* The extended VBT must follow OpRegion for OpRegion 2.1+ */
>>> +		if (rvda != size && version > 0x0200) {
>>
>> But we already added rvds to size; this is not compatible with the
>> previous code that required rvda == size BEFORE adding rvds.
>>
>>> +			memunmap(base);
>>> +			pci_err(vdev->pdev,
>>> +				"Extended VBT does not follow opregion on version 0x%04x\n",
>>> +				version);
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>> +
>>> +	if (size != OPREGION_SIZE) {
>>> +		/* Allocate memory for OpRegion and extended VBT for 2.0 */
>>> +		if (rvda && rvds && version == 0x0200) {
>>> +			void *vbt_base;
>>> +
>>> +			vbt_base = memremap(rvda, rvds, MEMREMAP_WB);
>>> +			if (!vbt_base) {
>>>  				memunmap(base);
>>> -				pci_err(vdev->pdev,
>>> -					"IGD assignment does not support opregion v2.0 with an extended VBT region\n");
>>> -				return -EINVAL;
>>> +				return -ENOMEM;
>>>  			}
>>>
>>> -			if (rvda != size) {
>>> +			opregionvbt = kzalloc(size, GFP_KERNEL);
>>> +			if (!opregionvbt) {
>>>  				memunmap(base);
>>> -				pci_err(vdev->pdev,
>>> -					"Extended VBT does not follow opregion on version 0x%04x\n",
>>> -					version);
>>> -				return -EINVAL;
>>> +				memunmap(vbt_base);
>>> +				return -ENOMEM;
>>>  			}
>>>
>>> -			/* region size for opregion v2.0+: opregion and VBT size. */
>>> -			size += rvds;
>>> +			/* Stitch the noncontiguous VBT right after OpRegion */
>>> +			memcpy(opregionvbt, base, OPREGION_SIZE);
>>> +			memcpy(opregionvbt + OPREGION_SIZE, vbt_base, rvds);
>>> +
>>> +			/* Patch OpRegion 2.0 to 2.1 */
>>> +			*(__le16 *)(opregionvbt + OPREGION_VERSION) = 0x0201;
>>> +			/* Patch RVDA to the relative address after OpRegion */
>>> +			*(__le64 *)(opregionvbt + OPREGION_RVDA) = OPREGION_SIZE;
>>
>> AIUI, the OpRegion is a two-way channel between the IGD device/system
>> BIOS and the driver; numerous fields are writable by the driver. Now
>> the driver writes to a shadow copy of the OpRegion table. What
>> completes the write to the real OpRegion table for consumption by the
>> device/BIOS? Likewise, what updates the fields that are written by the
>> device/BIOS for consumption by the driver?
>>
>> If a shadow copy of the OpRegion detached from the physical table is
>> sufficient here, why wouldn't we always shadow the OpRegion and prevent
>> all userspace writes from touching the real version?
>> Thanks,
>>
>> Alex
>>
>>> +
>>> +			memunmap(vbt_base);
>>> +			memunmap(base);
>>> +
>>> +			/* Register shadow instead of map as vfio_region */
>>> +			base = opregionvbt;
>>> +		/* Remap OpRegion + extended VBT for 2.1+ */
>>> +		} else {
>>> +			memunmap(base);
>>> +			base = memremap(addr, size, MEMREMAP_WB);
>>> +			if (!base)
>>> +				return -ENOMEM;
>>>  		}
>>>  	}
>>>
>>> -	if (size != OPREGION_SIZE) {
>>> -		memunmap(base);
>>> -		base = memremap(addr, size, MEMREMAP_WB);
>>> -		if (!base)
>>> -			return -ENOMEM;
>>> -	}
>>> -
>>>  	ret = vfio_pci_register_dev_region(vdev,
>>>  		PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
>>>  		VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION,
>>>  		&vfio_pci_igd_regops, size, VFIO_REGION_INFO_FLAG_READ, base);
>>>  	if (ret) {
>>> -		memunmap(base);
>>> +		if (is_ioremap_addr(base))
>>> +			memunmap(base);
>>> +		else
>>> +			kfree(base);
>>>  		return ret;
>>>  	}
>>>
>>
>>
>
>
> --
> Best Regards,
> Colin Xu
>
>

--
Best Regards,
Colin Xu