Changes from v4:
- Switch from dev_flags to a bitfield
- Scrubbed improper use of MSE acronym
- Restored the fixes tag to patch 3 (but the other 2 patches are
now pre-reqs -- cc stable 5.8?)
Since commit abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO
access on disabled memory") VFIO rejects guest MMIO access when the
PCI_COMMAND_MEMORY bit is OFF. VFs hardwire this bit to 0, so the check
must not apply to them; that was fixed in commit ebfa440ce38b
("vfio/pci: Fix SR-IOV VF handling with MMIO blocking"). Furthermore,
on s390, where we always run under at least a bare-metal hypervisor
(LPAR), PCI_COMMAND_MEMORY, unlike Device/Vendor IDs and BARs, is not
emulated when VFs are passed-through to the OS independently.
Based upon Bjorn's most recent comment [1], I investigated the notion of
setting is_virtfn=1 for VFs passed-through to Linux and not linked to a
parent PF (referred to as a 'detached VF' in my prior post). However,
we rapidly run into issues on how to treat an is_virtfn device with no
linked PF. Further complicating the issue, a guest kernel may have a
passed-through VF but be built with CONFIG_PCI_IOV=n; in many locations
the is_virtfn check is then ifdef'd out altogether and the device is
assumed to be an independent PCI function.
The decision made by VFIO whether to require or emulate a PCI feature
(in this case PCI_COMMAND_MEMORY) is based upon the knowledge it has
about the device, including implicit expectations of what is / is not
emulated below VFIO (e.g. is it safe to read vendor/id from config
space?). Our firmware layer attempts similar behavior by emulating
things such as vendor/id/BAR access; without these an unlinked VF would
not be usable. But what is or is not emulated by the layer below may
differ depending on which entity is providing the emulation (vfio,
LPAR, some other hypervisor).
So, the proposal here aims to fix the immediate issue of s390
pass-through VFs becoming suddenly unusable by vfio by using a new
bit to identify a VF feature that we know is hardwired to 0 for any
VF (PCI_COMMAND_MEMORY) and de-coupling the need for emulating
PCI_COMMAND_MEMORY from the is_virtfn flag. The exact scope of is_virtfn
and physfn for bare-metal vs guest scenarios and identifying what
features are / are not emulated by the lower-level hypervisors is a much
bigger discussion independent of this limited proposal.
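To illustrate the decoupling, here is a minimal userspace sketch of the
check before and after the change. It is a model only: struct
model_pci_dev and both helper names are hypothetical stand-ins, not the
code in vfio_pci_config.c, and it assumes a detached VF on s390 shows up
with is_virtfn=0 and the new bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_MEMORY 0x2	/* Memory Space Enable bit of the command register */

/* Hypothetical stand-in for the two struct pci_dev bits discussed above. */
struct model_pci_dev {
	unsigned int is_virtfn:1;		/* VF linked to a parent PF */
	unsigned int no_command_memory:1;	/* PCI_COMMAND_MEMORY not implemented */
};

/* Model of the old check: only is_virtfn devices skip the bit test. */
static bool model_memory_enabled_old(const struct model_pci_dev *pdev,
				     uint16_t cmd)
{
	return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY);
}

/* Model of the proposed check: keyed on the new bit instead of is_virtfn. */
static bool model_memory_enabled_new(const struct model_pci_dev *pdev,
				     uint16_t cmd)
{
	return pdev->no_command_memory || (cmd & PCI_COMMAND_MEMORY);
}
```

With a detached VF (is_virtfn=0, no_command_memory=1) and a command
register that reads 0, the old model reports memory as disabled while
the new model reports it as enabled. A regular PF with neither bit set
still depends on PCI_COMMAND_MEMORY in both models.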
[1]: https://marc.info/?l=linux-pci&m=159856041930022&w=2
Matthew Rosato (3):
  PCI/IOV: Mark VFs as not implementing PCI_COMMAND_MEMORY
  s390/pci: Mark all VFs as not implementing PCI_COMMAND_MEMORY
  vfio/pci: Decouple PCI_COMMAND_MEMORY bit checks from is_virtfn

 arch/s390/pci/pci_bus.c            |  5 +++--
 drivers/pci/iov.c                  |  1 +
 drivers/vfio/pci/vfio_pci_config.c | 24 ++++++++++++++----------
 include/linux/pci.h                |  1 +
 4 files changed, 19 insertions(+), 12 deletions(-)
--
1.8.3.1
For s390 we can have VFs that are passed-through without the associated
PF. Firmware provides an emulation layer to allow these devices to
operate independently, but is missing emulation of the Memory Space
Enable bit. For these as well as linked VFs, set no_command_memory
which specifies these devices do not implement PCI_COMMAND_MEMORY.
Signed-off-by: Matthew Rosato <[email protected]>
Reviewed-by: Niklas Schnelle <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
---
arch/s390/pci/pci_bus.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
index 5967f30..c93486a 100644
--- a/arch/s390/pci/pci_bus.c
+++ b/arch/s390/pci/pci_bus.c
@@ -197,9 +197,10 @@ void pcibios_bus_add_device(struct pci_dev *pdev)
 	 * With pdev->no_vf_scan the common PCI probing code does not
 	 * perform PF/VF linking.
 	 */
-	if (zdev->vfn)
+	if (zdev->vfn) {
 		zpci_bus_setup_virtfn(zdev->zbus, pdev, zdev->vfn);
-
+		pdev->no_command_memory = 1;
+	}
 }
 
 static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev)
--
1.8.3.1