Received: by 2002:a05:6a10:1d13:0:0:0:0 with SMTP id pp19csp916473pxb; Thu, 26 Aug 2021 18:49:49 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwbxPCXqnbOG9R08tyk4yhTTRNvuHxI6sr6k5cezEA9bix4B8A2lBNBR6SLG22BpKx6Er0L X-Received: by 2002:a5e:8e04:: with SMTP id a4mr5463285ion.56.1630028989346; Thu, 26 Aug 2021 18:49:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1630028989; cv=none; d=google.com; s=arc-20160816; b=iTt/tolDa+r9Dt2TMLkpI+BV1hmPwAIO6tkiZiz2luatjh7fYwXX9BAbXZ/zZmdQnr q1/89mQa50aaU1ln30mRFt7icH7dUCkd9X09b/MnvEm77L+DMfzGST0t+DpCgVpjcwZX 3G4NfuSHtH5EouNkzWAW83O8/8J13KyF5GbQUJq6gKr4+ePudZTNPJg7SSbUR90TF5uE TDyN32U57ps0Ip6vZ4TCMyDEgGkT8R9IjLboU1TTZCx/vqltauMilPnF7fTU+6uxb7Tt /lhZbl5VB3mXqZ8T3b6/oiSF51EE1zkpbth/OfSJM+J5O5cAdLbBzgUHZgWppbDwZTBN 8A2w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:dkim-signature; bh=DBae8PxX7qj8o1pcDXyyfMAR2sUWBm6T7LaML7kuDgM=; b=UHYBqRySQ+4qtsQ04Bt1OJka9Sbwz+S7J4pLAvQE6cecz9fY+OJ5o6nTw8HmgCBOig f5BI1yLFiFpN7bGLuzGdq30dvjMuHl2Pmnz/cFpdfVC3gKpa0bHO8qD7dyTPWo2NKqvU GKDrGFaQ7+YqbTtlb6q3910QfoBjxhn2MmRxzjp0IExSlazU8kl4WAXN72R93T9UxC0W /vZtqBtadw9DSfFLRW6kYiaJAOGPqpNQOpLEbFX17VzVjofzdiZ9T+zyYHALdKdYAhEc OY05VJM9M3HlBbkZUnjRzD+QwwWQs0Qb6DQb28FsP2F+Gj1BeYS3wmgrXJz9I8Jzn75Y DbQg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hU54uRgC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id b9si4487725ion.34.2021.08.26.18.49.37; Thu, 26 Aug 2021 18:49:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hU54uRgC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243977AbhH0Btm (ORCPT + 99 others); Thu, 26 Aug 2021 21:49:42 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:21715 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243971AbhH0Btl (ORCPT ); Thu, 26 Aug 2021 21:49:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1630028932; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DBae8PxX7qj8o1pcDXyyfMAR2sUWBm6T7LaML7kuDgM=; b=hU54uRgCKOhR00NFAchQRwlUm46ZXFlqSFLHW56vx66rDvjujSFdfCt2m/Y1pL1WiPblY/ K/XZEx8ajLQ0qHCtaOCrj4L4jfOv3mk1i9KPlROFESrtV+HAMwreUVz6ANrnJ2NzKm9v/g 7BIeWTPPAjkxoECJbqmwtaFgtjAeF0Q= Received: from mail-oi1-f198.google.com (mail-oi1-f198.google.com [209.85.167.198]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-442-Ds8BXkRKN2mIervjFK-ekA-1; Thu, 26 Aug 2021 21:48:51 -0400 X-MC-Unique: Ds8BXkRKN2mIervjFK-ekA-1 Received: by mail-oi1-f198.google.com with SMTP id 11-20020aca090b0000b02902684dd1d9a5so1841463oij.4 for ; Thu, 26 Aug 2021 18:48:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:organization:mime-version:content-transfer-encoding; bh=DBae8PxX7qj8o1pcDXyyfMAR2sUWBm6T7LaML7kuDgM=; b=F51OuUJ5gxlCHOd6Njb7GBDFRcmllMaBK0aCKwLOp0wUinTSY6gbk8mKw+1xhx1bzH z7on93he7KNh0ns7ViZ0O6IZnO8zvJ1MBvbork7B8xngmwYsXVt2XAQkBlhI53LyItbI ysYflPWT9Jhoim2cF03nhtk3vqoelXUoWsfPsaFn00GUOU+W5dsRsLXhqdeK43jsXpBz 1VJ6t9MpYxA+EKliVdIhXpIILxuhv7gJtL4nanNXG2EwQfKBDXv2Bs8tlk0owtVcXvul lmgdvzk0Eq/OCCb7dbooZSlAT8LYdmCWO31zRZJI6NklkLTLfeU1CQhgGjvzOedRajus g7aA== X-Gm-Message-State: AOAM533LkLPS9YWv2C5LPcBe2987t1GuxqjYLMmh49TRvXpOCGwe10OU UzyYEZjOCFGQmszN8JLKZDDUiGfrs/Q2G+6VuwevpZXLixwlwvcuAxf2Bx11UtnYDmrAbkxE6L8 OFqY9F2mlpVjlX6aUnnodObxm X-Received: by 2002:a05:6830:913:: with SMTP id v19mr5976667ott.131.1630028930308; Thu, 26 Aug 2021 18:48:50 -0700 (PDT) X-Received: by 2002:a05:6830:913:: with SMTP id v19mr5976653ott.131.1630028930066; Thu, 26 Aug 2021 18:48:50 -0700 (PDT) Received: from redhat.com ([198.99.80.109]) by smtp.gmail.com with ESMTPSA id r28sm734247oiw.45.2021.08.26.18.48.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Aug 2021 18:48:49 -0700 (PDT) Date: Thu, 26 Aug 2021 19:48:48 -0600 From: Alex Williamson To: Colin Xu Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, zhenyuw@linux.intel.com, hang.yuan@linux.intel.com, swee.yee.fonn@intel.com, fred.gao@intel.com Subject: Re: [PATCH] vfio/pci: Add OpRegion 2.0 Extended VBT support. Message-ID: <20210826194848.140e7da7.alex.williamson@redhat.com> In-Reply-To: <365a1c5b-53fe-cd2c-11a1-9678dde0c5@outlook.office365.com> References: <20210813021329.128543-1-colin.xu@intel.com> <20210816163958.04ab019c.alex.williamson@redhat.com> <9dc7eba-2c49-8d40-46b-5f4382a25c1e@outlook.office365.com> <365a1c5b-53fe-cd2c-11a1-9678dde0c5@outlook.office365.com> Organization: Red Hat X-Mailer: Claws Mail 3.18.0 (GTK+ 2.24.33; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 27 Aug 2021 09:36:36 +0800 (CST) Colin Xu wrote: > Hi Alex, > > In addition to the background that devices on market may still need > OpRegion 2.0 support in vfio-pci, do you have other comments to the patch > body? Yes, there were further comments in my first reply below. Thanks, Alex > On Tue, 17 Aug 2021, Colin Xu wrote: > > > On Mon, 16 Aug 2021, Alex Williamson wrote: > > > >> On Fri, 13 Aug 2021 10:13:29 +0800 > >> Colin Xu wrote: > >> > >>> Due to historical reason, some legacy shipped system doesn't follow > >>> OpRegion 2.1 spec but still stick to OpRegion 2.0, in which the extended > >>> VBT is not contigious after OpRegion in physical address, but any > >>> location pointed by RVDA via absolute address. Thus it's impossible > >>> to map a contigious range to hold both OpRegion and extended VBT as 2.1. > >>> > >>> Since the only difference between OpRegion 2.0 and 2.1 is where extended > >>> VBT is stored: For 2.0, RVDA is the absolute address of extended VBT > >>> while for 2.1, RVDA is the relative address of extended VBT to OpRegion > >>> baes, and there is no other difference between OpRegion 2.0 and 2.1, > >>> it's feasible to amend OpRegion support for these legacy system (before > >>> upgrading the system firmware), by kazlloc a range to shadown OpRegion > >>> from the beginning and stitch VBT after closely, patch the shadow > >>> OpRegion version from 2.0 to 2.1, and patch the shadow RVDA to relative > >>> address. So that from the vfio igd OpRegion r/w ops view, only OpRegion > >>> 2.1 is exposed regardless the underneath host OpRegion is 2.0 or 2.1 > >>> if the extended VBT exists. vfio igd OpRegion r/w ops will return either > >>> shadowed data (OpRegion 2.0) or directly from physical address > >>> (OpRegion 2.1+) based on host OpRegion version and RVDA/RVDS. The shadow > >>> mechanism makes it possible to support legacy systems on the market. > >> > >> Which systems does this enable? There's a suggestion above that these > >> systems could update firmware to get OpRegion v2.1 support, why > >> shouldn't we ask users to do that instead? When we added OpRegion v2.1 > >> support we were told that v2.0 support was essentially non-existent, > >> why should we add code to support and old spec with few users for such > >> a niche use case? > > Hi Alex, there was some mis-alignment with the BIOS owner that we were told > > the 2.0 system doesn't for retail but only for internal development. However > > in other projects we DO see the retail market has such systems, including NUC > > NUC6CAYB, some APL industrial PC used in RT system, and some customized APL > > motherboard by commercial virtualization solution. We immediately contact the > > BIOS owner to ask for a clarification and they admit it. These system won't > > get updated BIOS for OpRegion update but still under warranty. That's why the > > OpRegion 2.0 support is still needed. > > > >> > >>> Cc: Zhenyu Wang > >>> Cc: Hang Yuan > >>> Cc: Swee Yee Fonn > >>> Cc: Fred Gao > >>> Signed-off-by: Colin Xu > >>> --- > >>> drivers/vfio/pci/vfio_pci_igd.c | 117 ++++++++++++++++++++------------ > >>> 1 file changed, 75 insertions(+), 42 deletions(-) > >>> > >>> diff --git a/drivers/vfio/pci/vfio_pci_igd.c > >>> b/drivers/vfio/pci/vfio_pci_igd.c > >>> index 228df565e9bc..22b9436a3044 100644 > >>> --- a/drivers/vfio/pci/vfio_pci_igd.c > >>> +++ b/drivers/vfio/pci/vfio_pci_igd.c > >>> @@ -48,7 +48,10 @@ static size_t vfio_pci_igd_rw(struct vfio_pci_device > >>> *vdev, char __user *buf, > >>> static void vfio_pci_igd_release(struct vfio_pci_device *vdev, > >>> struct vfio_pci_region *region) > >>> { > >>> - memunmap(region->data); > >>> + if (is_ioremap_addr(region->data)) > >>> + memunmap(region->data); > >>> + else > >>> + kfree(region->data); > >>> } > >>> > >>> static const struct vfio_pci_regops vfio_pci_igd_regops = { > >>> @@ -59,10 +62,11 @@ static const struct vfio_pci_regops > >>> vfio_pci_igd_regops = { > >>> static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev) > >>> { > >>> __le32 *dwordp = (__le32 *)(vdev->vconfig + OPREGION_PCI_ADDR); > >>> - u32 addr, size; > >>> - void *base; > >>> + u32 addr, size, rvds = 0; > >>> + void *base, *opregionvbt; > >>> int ret; > >>> u16 version; > >>> + u64 rvda = 0; > >>> > >>> ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr); > >>> if (ret) > >>> @@ -89,66 +93,95 @@ static int vfio_pci_igd_opregion_init(struct > >>> vfio_pci_device *vdev) > >>> size *= 1024; /* In KB */ > >>> > >>> /* > >>> - * Support opregion v2.1+ > >>> - * When VBT data exceeds 6KB size and cannot be within mailbox #4, > >>> then > >>> - * the Extended VBT region next to opregion is used to hold the VBT > >>> data. > >>> - * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS > >>> - * (Raw VBT Data Size) from opregion structure member are used to > >>> hold the > >>> - * address from region base and size of VBT data. RVDA/RVDS are not > >>> - * defined before opregion 2.0. > >>> + * OpRegion and VBT: > >>> + * When VBT data doesn't exceed 6KB, it's stored in Mailbox #4. > >>> + * When VBT data exceeds 6KB size, Mailbox #4 is no longer large > >>> enough > >>> + * to hold the VBT data, the Extended VBT region is introduced since > >>> + * OpRegion 2.0 to hold the VBT data. Since OpRegion 2.0, RVDA/RVDS > >>> are > >>> + * introduced to define the extended VBT data location and size. > >>> + * OpRegion 2.0: RVDA defines the absolute physical address of the > >>> + * extended VBT data, RVDS defines the VBT data size. > >>> + * OpRegion 2.1 and above: RVDA defines the relative address of the > >>> + * extended VBT data to OpRegion base, RVDS defines the VBT data > >>> size. > >>> * > >>> - * opregion 2.1+: RVDA is unsigned, relative offset from > >>> - * opregion base, and should point to the end of opregion. > >>> - * otherwise, exposing to userspace to allow read access to > >>> everything between > >>> - * the OpRegion and VBT is not safe. > >>> - * RVDS is defined as size in bytes. > >>> - * > >>> - * opregion 2.0: rvda is the physical VBT address. > >>> - * Since rvda is HPA it cannot be directly used in guest. > >>> - * And it should not be practically available for end user,so it is > >>> not supported. > >>> + * Due to the RVDA difference in OpRegion VBT (also the only diff > >>> between > >>> + * 2.0 and 2.1), while for OpRegion 2.1 and above it's possible to > >>> map > >>> + * a contigious memory to expose OpRegion and VBT r/w via the vfio > >>> + * region, for OpRegion 2.0 shadow and amendment mechanism is used to > >>> + * expose OpRegion and VBT r/w properly. So that from r/w ops view, > >>> only > >>> + * OpRegion 2.1 is exposed regardless underneath Region is 2.0 or > >>> 2.1. > >>> */ > >>> version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION)); > >>> - if (version >= 0x0200) { > >>> - u64 rvda; > >>> - u32 rvds; > >>> > >>> + if (version >= 0x0200) { > >>> rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA)); > >>> rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS)); > >>> + > >>> + /* The extended VBT is valid only when RVDA/RVDS are > >>> non-zero. */ > >>> if (rvda && rvds) { > >>> - /* no support for opregion v2.0 with physical VBT > >>> address */ > >>> - if (version == 0x0200) { > >>> + size += rvds; > >>> + } > >>> + > >>> + /* The extended VBT must follows OpRegion for OpRegion 2.1+ > >>> */ > >>> + if (rvda != size && version > 0x0200) { > >> > >> But we already added rvds to size, this is not compatible with the > >> previous code that required rvda == size BEFORE adding rvds. > >> > >>> + memunmap(base); > >>> + pci_err(vdev->pdev, > >>> + "Extended VBT does not follow opregion on > >>> version 0x%04x\n", > >>> + version); > >>> + return -EINVAL; > >>> + } > >>> + } > >>> + > >>> + if (size != OPREGION_SIZE) { > >>> + /* Allocate memory for OpRegion and extended VBT for 2.0 */ > >>> + if (rvda && rvds && version == 0x0200) { > >>> + void *vbt_base; > >>> + > >>> + vbt_base = memremap(rvda, rvds, MEMREMAP_WB); > >>> + if (!vbt_base) { > >>> memunmap(base); > >>> - pci_err(vdev->pdev, > >>> - "IGD assignment does not support > >>> opregion v2.0 with an extended VBT region\n"); > >>> - return -EINVAL; > >>> + return -ENOMEM; > >>> } > >>> > >>> - if (rvda != size) { > >>> + opregionvbt = kzalloc(size, GFP_KERNEL); > >>> + if (!opregionvbt) { > >>> memunmap(base); > >>> - pci_err(vdev->pdev, > >>> - "Extended VBT does not follow > >>> opregion on version 0x%04x\n", > >>> - version); > >>> - return -EINVAL; > >>> + memunmap(vbt_base); > >>> + return -ENOMEM; > >>> } > >>> > >>> - /* region size for opregion v2.0+: opregion and VBT > >>> size. */ > >>> - size += rvds; > >>> + /* Stitch VBT after OpRegion noncontigious */ > >>> + memcpy(opregionvbt, base, OPREGION_SIZE); > >>> + memcpy(opregionvbt + OPREGION_SIZE, vbt_base, rvds); > >>> + > >>> + /* Patch OpRegion 2.0 to 2.1 */ > >>> + *(__le16 *)(opregionvbt + OPREGION_VERSION) = 0x0201; > >>> + /* Patch RVDA to relative address after OpRegion */ > >>> + *(__le64 *)(opregionvbt + OPREGION_RVDA) = > >>> OPREGION_SIZE; > >> > >> AIUI, the OpRegion is a two-way channel between the IGD device/system > >> BIOS and the driver, numerous fields are writable by the driver. Now > >> the driver writes to a shadow copy of the OpRegion table. What > >> completes the write to the real OpRegion table for consumption by the > >> device/BIOS? Likewise, what updates the fields that are written by the > >> device/BIOS for consumption by the driver? > >> > >> If a shadow copy of the OpRegion detached from the physical table is > >> sufficient here, why wouldn't we always shadow the OpRegion and prevent > >> all userspace writes from touching the real version? Thanks, > >> > >> Alex > >> > >>> + > >>> + memunmap(vbt_base); > >>> + memunmap(base); > >>> + > >>> + /* Register shadow instead of map as vfio_region */ > >>> + base = opregionvbt; > >>> + /* Remap OpRegion + extended VBT for 2.1+ */ > >>> + } else { > >>> + memunmap(base); > >>> + base = memremap(addr, size, MEMREMAP_WB); > >>> + if (!base) > >>> + return -ENOMEM; > >>> } > >>> } > >>> > >>> - if (size != OPREGION_SIZE) { > >>> - memunmap(base); > >>> - base = memremap(addr, size, MEMREMAP_WB); > >>> - if (!base) > >>> - return -ENOMEM; > >>> - } > >>> - > >>> ret = vfio_pci_register_dev_region(vdev, > >>> PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE, > >>> VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, > >>> &vfio_pci_igd_regops, size, VFIO_REGION_INFO_FLAG_READ, base); > >>> if (ret) { > >>> - memunmap(base); > >>> + if (is_ioremap_addr(base)) > >>> + memunmap(base); > >>> + else > >>> + kfree(base); > >>> return ret; > >>> } > >>> > >> > >> > > > > -- > > Best Regards, > > Colin Xu > > > > > > -- > Best Regards, > Colin Xu >