2009-09-17 23:49:42

by Kay, Allen M

Subject: [PATCH ACS v3 1/1]

This patch enables P2P upstream forwarding in ACS-capable PCIe switches.
This solves two potential problems in a virtualization environment where
a PCIe device is assigned to a guest domain using a HW IOMMU such as VT-d:

1) Unintentional failure caused by a guest physical address programmed
into the device's DMA that happens to match the memory address range
of other downstream ports in the same PCIe switch. This causes the PCI
transaction to go to the matching downstream port instead of going to the
root complex to get translated by VT-d as it should be.

2) Malicious guest software intentionally attacks another downstream
PCIe device by programming a DMA address into the assigned device
that matches the memory address range of the downstream PCIe port.

We are in the process of implementing device filtering software in KVM/Xen
management software to allow device assignment of PCIe devices behind
a PCIe switch only if the switch has ACS capability with the P2P upstream
forwarding bits enabled. This patch is intended to work for both KVM
and Xen environments.

Changes from initial to v1:
- removed #define ACS_ENABLE and dev_info() call
- changed ctrl value setting without using if-condition
- fixed ACS #defines in pci_regs.h

Changes from v2 to v3:
- change #define indentation to 2 for PCI reg and 1 for bit
position

Signed-off-by: Allen Kay <[email protected]>
Reviewed-by: Matthew Wilcox <[email protected]>
---
drivers/pci/pci.c | 35 +++++++++++++++++++++++++++++++++++
drivers/pci/pci.h | 1 +
drivers/pci/probe.c | 3 +++
include/linux/pci_regs.h | 13 +++++++++++++
4 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6edecff..1171c6d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1533,6 +1533,41 @@ void pci_enable_ari(struct pci_dev *dev)
}

/**
+ * pci_acs_enable - enable ACS if hardware supports it
+ * @dev: the PCI device
+ */
+void pci_acs_init(struct pci_dev *dev)
+{
+	int pos;
+	u16 cap;
+	u16 ctrl;
+
+	if (!dev->is_pcie)
+		return;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+	/* Source Validation */
+	ctrl |= (cap & PCI_ACS_SV);
+
+	/* P2P Request Redirect */
+	ctrl |= (cap & PCI_ACS_RR);
+
+	/* P2P Completion Redirect */
+	ctrl |= (cap & PCI_ACS_CR);
+
+	/* Upstream Forwarding */
+	ctrl |= (cap & PCI_ACS_UF);
+
+	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+}
+
+/**
* pci_swizzle_interrupt_pin - swizzle INTx for device behind bridge
* @dev: the PCI device
* @pin: the INTx pin (1=INTA, 2=INTB, 3=INTD, 4=INTD)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d92d195..ec8e2c1 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -202,6 +202,7 @@ static inline int pci_ari_enabled(struct pci_bus *bus)
{
return bus->self && bus->self->ari_enabled;
}
+extern void pci_acs_init(struct pci_dev *dev);

#ifdef CONFIG_PCI_QUIRKS
extern int pci_is_reassigndev(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8105e32..72b9822 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1014,6 +1014,9 @@ static void pci_init_capabilities(struct pci_dev *dev)

/* Single Root I/O Virtualization */
pci_iov_init(dev);
+
+	/* Access Control Service */
+	pci_acs_init(dev);
}

void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index dd0bed4..d798770 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -502,6 +502,7 @@
#define PCI_EXT_CAP_ID_VC 2
#define PCI_EXT_CAP_ID_DSN 3
#define PCI_EXT_CAP_ID_PWR 4
+#define PCI_EXT_CAP_ID_ACS 13
#define PCI_EXT_CAP_ID_ARI 14
#define PCI_EXT_CAP_ID_ATS 15
#define PCI_EXT_CAP_ID_SRIOV 16
@@ -662,4 +663,16 @@
#define PCI_SRIOV_VFM_MO 0x2 /* Active.MigrateOut */
#define PCI_SRIOV_VFM_AV 0x3 /* Active.Available */

+/* Access Control Service */
+#define PCI_ACS_CAP 0x04 /* ACS Capability Register */
+#define PCI_ACS_SV 0x01 /* Source Validation */
+#define PCI_ACS_TB 0x02 /* Translation Blocking */
+#define PCI_ACS_RR 0x04 /* P2P Request Redirect */
+#define PCI_ACS_CR 0x08 /* P2P Completion Redirect */
+#define PCI_ACS_UF 0x10 /* Upstream Forwarding */
+#define PCI_ACS_EC 0x20 /* P2P Egress Control */
+#define PCI_ACS_DT 0x40 /* Direct Translated P2P */
+#define PCI_ACS_CTRL 0x06 /* ACS Control Register */
+#define PCI_ACS_EGRESS_CTL_V 0x08 /* ACS Egress Control Vector */
+
#endif /* LINUX_PCI_REGS_H */
--
1.6.0.6
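
For readers following the patch, the `ctrl |= (cap & BIT)` idiom in pci_acs_init() can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: the bit values are copied from the pci_regs.h hunk above, while the helper name `acs_ctrl_value` is hypothetical and exists only for illustration. It shows that a control bit is set only when the capability register advertises the matching feature, and that bits already set in the control register are left alone.

```c
#include <stdint.h>

/* Bit positions copied from the pci_regs.h hunk in the patch. */
#define PCI_ACS_SV 0x01 /* Source Validation */
#define PCI_ACS_TB 0x02 /* Translation Blocking */
#define PCI_ACS_RR 0x04 /* P2P Request Redirect */
#define PCI_ACS_CR 0x08 /* P2P Completion Redirect */
#define PCI_ACS_UF 0x10 /* Upstream Forwarding */

/*
 * Hypothetical helper (not part of the patch): given the value read from
 * the ACS capability register and the current control register value,
 * return the control value the patch would write back.
 */
static uint16_t acs_ctrl_value(uint16_t cap, uint16_t ctrl)
{
	/* The four features the patch tries to enable. */
	const uint16_t wanted = PCI_ACS_SV | PCI_ACS_RR |
				PCI_ACS_CR | PCI_ACS_UF;

	/* Only bits present in both 'cap' and 'wanted' are added. */
	return (uint16_t)(ctrl | (cap & wanted));
}
```

With a made-up capability value of 0x07 (SV, TB and RR supported) and a control register of 0, the helper returns 0x05: SV and RR are enabled, TB is left off because the patch does not request it, and CR and UF are left off because the hardware does not advertise them.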


2009-09-29 00:00:47

by Chris Wright

Subject: Re: [PATCH ACS v3 1/1]

* Allen Kay ([email protected]) wrote:
> This patch enables P2P upstream forwarding in ACS capable PCIe switches.
> This solves two potential problems in virtualization environment where
> a PCIe device is assigned to a guest domain using a HW iommu such as VT-d:

This may negatively impact p2p traffic throughput for devices that don't
need it. Have you considered this impact or attempted to measure it?

An alternative approach would be to enable this during device assignment.

Also, there is no checking that the relevant path through the topology has
the right capabilities. Is there any reason you left that out? It would
certainly simplify the filtering logic, for example. And given some
states result in undefined behaviour, perhaps it makes sense to check
while enabling ACS.

> 1) Unintentional failure caused by guest physical address programmed
> into the device's DMA that happens to match the memory address range
> of other downstream ports in the same PCIe switch. This causes the PCI
> transaction to go to the matching downstream port instead of go to the
> root complex to get translated by VT-d as it should be.
>
> 2) Malicious guest software intentionally attacks another downstream
> PCIe device by programming the DMA address into the assigned device
> that matches memory address range of the downstream PCIe port.
>
> We are in process of implementing device filtering software in KVM/XEN
> management software to allow device assignment of PCIe devices behind
> a PCIe switch only if it has ACS capability and with the P2P upstream
> forwarding bits enabled. This patch is intended to work for both KVM
> and Xen environments.
>
> Changes from initial to v1:
> - removed #define ACS_ENABLE and dev_info() call
> - changed ctrl value setting without using if-condition
> - fixed ACS #defines in pci_regs.h
>
> Changes from v2 to v3:
> - change #define indention to 2 for PCI reg and 1 for bit
> position
>
> Signed-off-by: Allen Kay <[email protected]>
> Reviewed--by: Mathew Wilcox <[email protected]>
> ---
> drivers/pci/pci.c | 35 +++++++++++++++++++++++++++++++++++
> drivers/pci/pci.h | 1 +
> drivers/pci/probe.c | 3 +++
> include/linux/pci_regs.h | 13 +++++++++++++
> 4 files changed, 52 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 6edecff..1171c6d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1533,6 +1533,41 @@ void pci_enable_ari(struct pci_dev *dev)
> }
>
> /**
> + * pci_acs_enable - enable ACS if hardware support it
> + * @dev: the PCI device
> + */
> +void pci_acs_init(struct pci_dev *dev)

I'd call it pci_enable_acs...in fact, the kdoc above tries something
close to that ;-)

2009-09-29 17:46:41

by Kay, Allen M

Subject: RE: [PATCH ACS v3 1/1]

>
>This may negatively impact p2p traffic throughput for devices that don't
>need it. Have you considered this impact or attempted to measure it?
>

As far as I know, there are no existing PCIe devices that have ACS capable PCIe switches. This means this patch will not impact existing P2P devices. On the NHM platform I tested this patch on, only root ports support ACS, which has no material impact on PCIe transactions since whatever upstream traffic a root port sees is already forwarded to the root complex anyway.

As for future devices that do have ACS capable PCIe switches, this patch could cause a P2P performance issue as you indicated. Although the PCI IOV SIG has yet to make a decision on this issue, it would be reasonable to expect this problem can be mitigated with ATS capable devices. For example, it would be reasonable to expect translated addresses can be routed directly to the peer device while un-translated addresses would have to be routed to the root complex.

By the way, PLX Technology announced the first such switch on 8/26. We will take a look at these devices as soon as we get hold of them in our lab.

>
>An alternative approach would be to enable this during device assignment.
>

I have indeed spent some time playing around with a patch that does this. There are some potential drawbacks. Given that PCI is already enabled at the time of device assignment, enabling P2P upstream forwarding might disrupt in-flight PCIe transactions. In addition, this means we need separate patches for enabling ACS for KVM and Xen, as device assignment for KVM and for Xen does not share code paths.

>
>Also, there is no checking that the relevant path through the topology has
>the right capabilties. Is there any reason you left that out? It would
>certainly simplify the filtering logic, for example.
>

Do you mean enabling P2P forwarding on all upstream PCIe switches only if all of them are ACS capable? I can see that this could simplify the filtering software so that it only needs to check the lowest level PCIe switch.

This appears to be a trade-off between putting the complexity in the Linux PCI driver or in the user mode filtering code. In my mind, if we take the view that the device filtering software is the ultimate authority in determining whether a device is assignable, it probably should not trust the host to always do the right thing from a virtualization standpoint. If paranoid filtering software always checks the entire path from the device to the root complex anyway, it might be reasonable to simplify the code in the kernel.
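
To make the "check the entire path" idea concrete, here is a minimal sketch of what such a filter could look like. It is not taken from any KVM or Xen tool: `struct sketch_port` is a hypothetical, simplified stand-in for whatever the filtering software uses to describe the topology, and the choice to require the RR, CR and UF control bits on every port is an assumption based on the "P2P upstream forwarding bits" wording in the patch description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Control bits copied from the pci_regs.h hunk in the patch. */
#define PCI_ACS_RR 0x04 /* P2P Request Redirect */
#define PCI_ACS_CR 0x08 /* P2P Completion Redirect */
#define PCI_ACS_UF 0x10 /* Upstream Forwarding */

/* Hypothetical model of one port between the device and the root complex. */
struct sketch_port {
	const struct sketch_port *upstream; /* NULL at the root port */
	bool has_acs;                       /* ACS extended capability found */
	uint16_t acs_ctrl;                  /* value of the ACS control register */
};

/*
 * Walk from the port directly above the assigned device up to the root
 * port. Return true only if every port on the way has ACS and has the
 * redirect/forwarding bits enabled. An empty path (port == NULL) is
 * reported as true; a real filter would have to decide what that means.
 */
static bool path_has_acs_redirect(const struct sketch_port *port)
{
	const uint16_t need = PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF;

	for (; port != NULL; port = port->upstream) {
		if (!port->has_acs)
			return false;
		if ((port->acs_ctrl & need) != need)
			return false;
	}
	return true;
}
```

A filter written this way rejects the mixed topology discussed in this thread (an ACS capable switch below a non-ACS capable one) without relying on the kernel to have refused to enable ACS there.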

>
>And given some states result in undefined behavior, perhaps it makes sense to check
>while enabling ACS.
>

By "undefined behavior", do you mean the case where there is a mix of ACS and non-ACS capable PCIe switches and P2P upstream forwarding is enabled in the ACS capable ones? I would expect the aggregate behavior to be the same as with no P2P upstream forwarding.

Let's say we have a configuration where the lowest PCIe switch is ACS capable and it has P2P upstream forwarding enabled. However, the PCIe switch just above it is not ACS capable.

I would expect the following behavior:

1) P2P transaction is forwarded upstream by the ACS capable PCIe switch
2) non-ACS capable switch sends the transaction back
3) ACS capable switch sends the transaction to the peer device.

The aggregate result is that the transaction behaves as if none of the switches were ACS capable.

>
> I'd call it pci_enable_acs...in fact, the kdoc above tries something close to that ;-)
>

No problem, I can change the code to incorporate this once we have an agreement on other items.


Allen

2009-09-29 18:46:53

by Chris Wright

Subject: Re: [PATCH ACS v3 1/1]

* Kay, Allen M ([email protected]) wrote:
> >
> >This may negatively impact p2p traffic throughput for devices that don't
> >need it. Have you considered this impact or attempted to measure it?
> >
>
> As far as I know, there is no existing PCIe devices that have ACS capable PCIe switches. This means this patch will not impact existing P2P devices. On the NHM platform I tested this patch on, only root ports support ACS which has no material impact on PCIe transactions since whatever upstream traffic root port sees is already forwarded to the root complex anyways.

BTW, I tested it as well...works as advertised ;-) I did not generate
any errors to see if AER is working, did you?

> As for future devices that does have ACS capable PCIe switches, this patch can cause potential P2P performance issue as you indicated. Although PCI IOV SIG has yet to make a decision on this issue, it would be reasonable to expect this problem can be mitigated with ATS capable devices. For example, it would be reasonable to expect translated addresses can be routed directly to the peer device while un-translated addresses would have to be routed to the root complex.

This means adding direct translated P2P support, no? And what does the
device cache when the IOMMU is in PT mode? I'm mainly voicing concern
about the non-IOV case (i.e. common case) that this impacts by enabling
as a default.

> By the way, PLX technology announced first such switch on 8/26. We will be take a look at these devices as soon as we get hold of these in our lab.

Or multifunction devices, but any testing is good.

> >An alternative approach would be to enable this during device assignment.
> >
>
> I have indeed spent some time playing around with a patch that does this. There are some potential drawbacks. Given that PCI is already enabled at the time of device assignment, enabling P2P upstream forwarding might disrupt in flight PCIe transactions. In addition, this means we need separate patches for enabling ACS for KVM and Xen as device assignment for KVM and Xen do not share code paths.

I hadn't considered in-flight transactions. The device should be
quiesced and reset before assignment, but that doesn't account for
other devices affected by intermediate downstream port ACS changes.
It's also not entirely clear what to do on de-assignment. Would be a
bit odd, but could be driven from userspace.

> >Also, there is no checking that the relevant path through the topology has
> >the right capabilties. Is there any reason you left that out? It would
> >certainly simplify the filtering logic, for example.
> >
>
> Do you mean enable p2p forwarding on all upstream PCIe switches only if all of them are ACS capable? I can see this can potentially simplify filtering software to just check the lowest level PCIe switch.

Yeah, and the RC requirements too, of course.

> This appears to be a trade-off between whether we want put the complexity in Linux PCI driver or in the user mode filtering code. In my mind, if we take the view that the device filtering software is the ultimate authority in determining whether a device is assignable, it probably should not trust the host to always do the right thing from virtualization standpoint. If a paranoid filtering software always checks the entire path from the device to the root complex anyways, it might be reasonable to simplify the code in the kernel.

The reason I mention it is not just filtering: without checking, we can
create a platform w/ undefined behaviour.

> >And given some states result in undefined behavior, perhaps it makes sense to check
> >while enabling ACS.
> >
>
> By "undefined behavior", do you mean when there a mix of ACS and non-ACS capable PCIe switches and P2P upstream forwarding is enabled in ACS capable PCIe switches? I would expect the aggregate behavior is the same as no P2P upstream forwarding.

Yes, that's what I mean.

> Let's say we have a configuration where the lowest PCIe switch is ACS capable and it has P2P upstream forwarding enabled. However, the PCIe switch just above it is not ACS capable.
>
> I would expect the following behavior:
>
> 1) P2P transaction is forwarded upstream by the ACS capable PCIe switch
> 2) non-ACS capable switch sends the transaction back
> 3) ACS capable switch sends the transaction to the peer device.
>
> The aggregate result is the transaction behaved as if all the switches are not ACS capable.

Right, although it's implementation specific what actually happens.
May not matter much, I just don't know what switch vendors will do.

> > I'd call it pci_enable_acs...in fact, the kdoc above tries something close to that ;-)
> >
>
> No problem, I can change the code to incorporate this once we have an agreement on other items.

thanks,
-chris

2009-09-30 23:33:13

by Kay, Allen M

Subject: RE: [PATCH ACS v3 1/1]

>
> I did not generate any errors to see if AER is working, did you?
>

No, I believe AER RC interrupts are turned off by default. I can try to turn them on to see if I see anything.

> This means adding direct translated P2P support, no?

Yes, I believe PCIe switches will need to be enhanced to differentiate between transactions with translated addresses and un-translated addresses to support P2P.

>
> And what does the device cache when the IOMMU is in PT mode? I'm mainly voicing concern
> about the non-IOV case (i.e. common case) that this impacts by enabling as a default.
>

I don't think the device caches translations when VT-d is in PT mode.

On the other hand, can we say VT-d PT mode is mainly for the KVM virtualization use case? If so, is it reasonable to say performance of host P2P in this mode is not of highest priority?

If not, another option is to have a kernel boot parameter to configure a kernel boot instance to be either host kernel optimized or virtualization optimized. I don't know whether this is reasonable or not ...


2009-10-01 01:17:25

by Chris Wright

Subject: Re: [PATCH ACS v3 1/1]

* Kay, Allen M ([email protected]) wrote:
> On the other hand, can we say VT-d PT mode is mainly for KVM
> virtualization use case? If so, is it reasonable to say performance of
> host P2P in this mode is not of highest priority?

Guess it depends on the workload. Would be helpful to identify a use
case that is p2p heavy (and then the impact of enabling ACS).

> If not, another option is to have a kernel boot parameter to
> configure an kernel boot instance to be either host kernel optimized or
> virtualization optimized. I don't know whether this is a reasonable or
> not ...

I was thinking that ACS could be enabled if an IOMMU is enabled. Not a
perfect fit, but seems reasonably close.

thanks,
-chris

2009-10-06 00:15:07

by Jesse Barnes

Subject: Re: [PATCH ACS v3 1/1]

On Wed, 30 Sep 2009 18:17:21 -0700
Chris Wright <[email protected]> wrote:

> * Kay, Allen M ([email protected]) wrote:
> > On the other hand, can we say VT-d PT mode is mainly for KVM
> > virtualization use case? If so, is it reasonable to say
> > performance of host P2P in this mode is not of highest priority?
>
> Guess it depends on the workload. Would be helpful to identify a use
> case that is p2p heavy (and then the impact of enabling ACS).
>
> > If not, another option is to have a kernel boot parameter to
> > configure an kernel boot instance to be either host kernel
> > optimized or virtualization optimized. I don't know whether this
> > is a reasonable or not ...
>
> I was thinking that ACS could be enabled if an IOMMU is enabled. Not
> a perfect fit, but seems reasonably close.

Allen, did you want to do these changes as an incremental patch on your
last one or just send me a replacement? Either way is fine with me.

--
Jesse Barnes, Intel Open Source Technology Center

2009-10-06 00:29:43

by Kay, Allen M

Subject: RE: [PATCH ACS v3 1/1]

Jesse,

I will send you a replacement patch since the patch is not very big.

Allen

2009-10-06 20:16:50

by Kay, Allen M

Subject: RE: [PATCH ACS v3 1/1]

> I was thinking that ACS could be enabled if an IOMMU is enabled. Not a
> perfect fit, but seems reasonably close.

Chris, I'm working on a new version of the patch that incorporates the IOMMU check. I'm also adding a check for the dom0 kernel, as the HW IOMMU is not visible to dom0.
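
One possible reading of how the two checks combine is sketched below. This is an assumption for illustration, not the code of the new patch version: the function name and both boolean parameters are hypothetical, standing in for whatever the kernel uses to detect an enabled IOMMU and a Xen dom0 kernel.

```c
#include <stdbool.h>

/*
 * Hypothetical policy helper: enable ACS when an IOMMU is in use, and
 * also when running as the Xen dom0 kernel, where the hardware IOMMU
 * is not visible to the kernel and so cannot be detected directly.
 */
static bool should_enable_acs(bool iommu_enabled, bool is_xen_dom0)
{
	return iommu_enabled || is_xen_dom0;
}
```

Under this reading, a plain host kernel with no IOMMU enabled keeps ACS off and so keeps its existing P2P routing, which addresses the non-IOV concern raised earlier in the thread.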

Allen
