From: Kuppuswamy Sathyanarayanan <[email protected]>
Current implementation of ATS, PASID, PRI does not handle VF dependencies
correctly. Following patches addresses this issue.
Changes since v1:
* Added more details about the patches in commit log.
* Removed bulk spec check patch.
* Addressed comments from Bjorn Helgaas.
Kuppuswamy Sathyanarayanan (5):
PCI/ATS: Add PRI support for PCIe VF devices
PCI/ATS: Add PASID support for PCIe VF devices
PCI/ATS: Skip VF ATS initialization if PF does not implement it
PCI/ATS: Disable PF/VF ATS service independently
PCI: Skip Enhanced Allocation (EA) initalization for VF device
drivers/pci/ats.c | 131 +++++++++++++++++++++++++++++++++++++++-----
drivers/pci/pci.c | 7 +++
include/linux/pci.h | 3 +-
3 files changed, 126 insertions(+), 15 deletions(-)
--
2.20.1
From: Kuppuswamy Sathyanarayanan <[email protected]>
If PF does not implement ATS and VF implements/uses it, it might lead to
runtime issues. Also, as per spec r4.0, sec 9.3.7.8, PF should implement
ATS if VF implements it. So add additional check to confirm given device
aligns with the spec.
Cc: Ashok Raj <[email protected]>
Cc: Keith Busch <[email protected]>
Suggested-by: Ashok Raj <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
drivers/pci/ats.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index e7a904e347c3..718e6f414680 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -19,6 +19,7 @@
void pci_ats_init(struct pci_dev *dev)
{
int pos;
+ struct pci_dev *pdev;
if (pci_ats_disabled())
return;
@@ -27,6 +28,17 @@ void pci_ats_init(struct pci_dev *dev)
if (!pos)
return;
+ /*
+ * Per PCIe r4.0, sec 9.3.7.8, if VF implements Address Translation
+ * Services (ATS) Extended Capability then corresponding PF should
+ * also implement it.
+ */
+ if (dev->is_virtfn) {
+ pdev = pci_physfn(dev);
+ if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS))
+ return;
+ }
+
dev->ats_cap = pos;
}
--
2.20.1
From: Kuppuswamy Sathyanarayanan <[email protected]>
As per PCIe r4.0, sec 9.3.6, VF must not implement Enhanced Allocation
Capability. So skip pci_ea_init() for virtual devices.
Cc: Ashok Raj <[email protected]>
Cc: Keith Busch <[email protected]>
Suggested-by: Ashok Raj <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
drivers/pci/pci.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 766f5779db92..0ba3930e3b54 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2976,6 +2976,13 @@ void pci_ea_init(struct pci_dev *dev)
int offset;
int i;
+ /*
+ * Per PCIe r4.0, sec 9.3.6, VF must not implement Enhanced
+ * Allocation Capability.
+ */
+ if (dev->is_virtfn)
+ return;
+
/* find PCI EA capability in list */
ea = pci_find_capability(dev, PCI_CAP_ID_EA);
if (!ea)
--
2.20.1
From: Kuppuswamy Sathyanarayanan <[email protected]>
When IOMMU tries to enable PASID for VF device in
iommu_enable_dev_iotlb(), it always fails because PASID support for PCIe
VF device is currently broken in PCIE driver. Current implementation
expects the given PCIe device (PF & VF) to implement PASID capability
before enabling the PASID support. But this assumption is incorrect. As
per PCIe spec r4.0, sec 9.3.7.14, all VFs associated with PF can only
use the PASID of the PF and not implement it.
Since PASID is shared between PF/VF devices, following rules should
apply.
1. Enable PASID in VF only if its already enabled in PF.
2. Enable PASID in VF only if the requested features matches with PF
config, otherwise return error.
3. When enabling/disabling PASID for VF, instead of configuring the PF
registers just increase/decrease the usage count (pasid_ref_cnt).
4. Disable PASID in PF (configuring the registers) only if pasid_ref_cnt
is zero.
5. When reading PASID features/settings for VF, use registers of
corresponding PF.
Cc: Ashok Raj <[email protected]>
Cc: Keith Busch <[email protected]>
Suggested-by: Ashok Raj <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
drivers/pci/ats.c | 55 ++++++++++++++++++++++++++++++++++++++++++++-
include/linux/pci.h | 1 +
2 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 5582e5d83a3f..e7a904e347c3 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -345,6 +345,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
{
u16 control, supported;
int pos;
+ struct pci_dev *pf;
if (WARN_ON(pdev->pasid_enabled))
return -EBUSY;
@@ -353,7 +354,33 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
return -EINVAL;
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
- if (!pos)
+
+ if (pdev->is_virtfn) {
+ /*
+ * Per PCIe r4.0, sec 9.3.7.14, VF must not implement
+ * Process Address Space ID (PASID) Capability.
+ */
+ if (pos) {
+ dev_err(&pdev->dev, "VF must not implement PASID\n");
+ return -EINVAL
+ }
+ /* Since VF shares PASID with PF, use PF config */
+ pf = pci_physfn(pdev);
+
+ /* If VF config does not match with PF, return error */
+ if (!pf->pasid_enabled || pf->pasid_features != features)
+ return -EINVAL;
+
+ pdev->pasid_features = features;
+ pdev->pasid_enabled = 1;
+
+ /* Increment PF PASID refcount */
+ atomic_inc(&pf->pasid_ref_cnt);
+
+ return 0;
+ }
+
+ if (pdev->is_physfn && !pos)
return -EINVAL;
pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
@@ -382,10 +409,27 @@ void pci_disable_pasid(struct pci_dev *pdev)
{
u16 control = 0;
int pos;
+ struct pci_dev *pf;
if (WARN_ON(!pdev->pasid_enabled))
return;
+ /* All VFs PASID should be disabled before disabling PF PASID*/
+ if (atomic_read(&pdev->pasid_ref_cnt))
+ return;
+
+ if (pdev->is_virtfn) {
+ /* Since VF shares PASID with PF, use PF config. */
+ pf = pci_physfn(pdev);
+
+ /* Decrement PF PASID refcount */
+ atomic_dec(&pf->pasid_ref_cnt);
+
+ pdev->pasid_enabled = 0;
+
+ return;
+ }
+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
if (!pos)
return;
@@ -408,6 +452,9 @@ void pci_restore_pasid_state(struct pci_dev *pdev)
if (!pdev->pasid_enabled)
return;
+ if (pdev->is_virtfn)
+ return;
+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
if (!pos)
return;
@@ -432,6 +479,9 @@ int pci_pasid_features(struct pci_dev *pdev)
u16 supported;
int pos;
+ if (pdev->is_virtfn)
+ pdev = pci_physfn(pdev);
+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
if (!pos)
return -EINVAL;
@@ -488,6 +538,9 @@ int pci_max_pasids(struct pci_dev *pdev)
u16 supported;
int pos;
+ if (pdev->is_virtfn)
+ pdev = pci_physfn(pdev);
+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
if (!pos)
return -EINVAL;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 699c79c99a39..2a761ea63f8d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -454,6 +454,7 @@ struct pci_dev {
#endif
#ifdef CONFIG_PCI_PASID
u16 pasid_features;
+ atomic_t pasid_ref_cnt; /* Number of VFs with PASID enabled */
#endif
#ifdef CONFIG_PCI_P2PDMA
struct pci_p2pdma *p2pdma;
--
2.20.1
From: Kuppuswamy Sathyanarayanan <[email protected]>
When IOMMU tries to enable PRI for VF device in
iommu_enable_dev_iotlb(), it always fails because PRI support for PCIe
VF device is currently broken in PCIE driver. Current implementation
expects the given PCIe device (PF & VF) to implement PRI capability
before enabling the PRI support. But this assumption is incorrect. As
per PCIe spec r4.0, sec 9.3.7.11, all VFs associated with PF can only
use the Page Request Interface (PRI) of the PF and not implement it.
Hence we need to create exception for handling the PRI support for PCIe
VF device.
Since PRI is shared between PF/VF devices, following rules should apply.
1. Enable PRI in VF only if its already enabled in PF.
2. When enabling/disabling PRI for VF, instead of configuring the
registers just increase/decrease the usage count (pri_ref_cnt) of PF.
3. Disable PRI in PF only if pr_ref_cnt is zero.
Cc: Ashok Raj <[email protected]>
Cc: Keith Busch <[email protected]>
Suggested-by: Ashok Raj <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
drivers/pci/ats.c | 53 +++++++++++++++++++++++++++++++++++++++++++--
include/linux/pci.h | 1 +
2 files changed, 52 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 97c08146534a..5582e5d83a3f 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -181,12 +181,39 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
u16 control, status;
u32 max_requests;
int pos;
+ struct pci_dev *pf;
if (WARN_ON(pdev->pri_enabled))
return -EBUSY;
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
- if (!pos)
+
+ if (pdev->is_virtfn) {
+ /*
+ * Per PCIe r4.0, sec 9.3.7.11, VF must not implement PRI
+ * Capability.
+ */
+ if (pos) {
+ dev_err(&pdev->dev, "VF must not implement PRI");
+ return -EINVAL;
+ }
+
+ pf = pci_physfn(pdev);
+
+ /* If VF config does not match with PF, return error */
+ if (!pf->pri_enabled)
+ return -EINVAL;
+
+ pdev->pri_reqs_alloc = pf->pri_reqs_alloc;
+ pdev->pri_enabled = 1;
+
+ /* Increment PF PRI refcount */
+ atomic_inc(&pf->pri_ref_cnt);
+
+ return 0;
+ }
+
+ if (pdev->is_physfn && !pos)
return -EINVAL;
pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
@@ -202,7 +229,6 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
pdev->pri_enabled = 1;
-
return 0;
}
EXPORT_SYMBOL_GPL(pci_enable_pri);
@@ -217,10 +243,27 @@ void pci_disable_pri(struct pci_dev *pdev)
{
u16 control;
int pos;
+ struct pci_dev *pf;
if (WARN_ON(!pdev->pri_enabled))
return;
+ /* All VFs should be disabled before disabling PF */
+ if (atomic_read(&pdev->pri_ref_cnt))
+ return;
+
+ if (pdev->is_virtfn) {
+ /* Since VF shares PRI with PF, use PF config. */
+ pf = pci_physfn(pdev);
+
+ /* Decrement PF PRI refcount */
+ atomic_dec(&pf->pri_ref_cnt);
+
+ pdev->pri_enabled = 0;
+
+ return;
+ }
+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
if (!pos)
return;
@@ -246,6 +289,9 @@ void pci_restore_pri_state(struct pci_dev *pdev)
if (!pdev->pri_enabled)
return;
+ if (pdev->is_virtfn)
+ return;
+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
if (!pos)
return;
@@ -270,6 +316,9 @@ int pci_reset_pri(struct pci_dev *pdev)
if (WARN_ON(pdev->pri_enabled))
return -EBUSY;
+ if (pdev->is_virtfn)
+ return 0;
+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
if (!pos)
return -EINVAL;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 77448215ef5b..699c79c99a39 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -450,6 +450,7 @@ struct pci_dev {
#endif
#ifdef CONFIG_PCI_PRI
u32 pri_reqs_alloc; /* Number of PRI requests allocated */
+ atomic_t pri_ref_cnt; /* Number of VFs with PRI enabled */
#endif
#ifdef CONFIG_PCI_PASID
u16 pasid_features;
--
2.20.1
From: Kuppuswamy Sathyanarayanan <[email protected]>
Currently all VF's needs to be disable their ATS service before
disabling the ATS service in corresponding PF device. But this logic is
incorrect and does not align with the spec. Also it might lead to
some power and performance impact in the system. As per PCIe spec r4.0,
sec 9.3.7.8, ATS Capabilities in VFs and their associated PFs may be
enabled/disabled independently. So remove this dependency logic in
enable/disable code.
Cc: Ashok Raj <[email protected]>
Cc: Keith Busch <[email protected]>
Suggested-by: Ashok Raj <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
drivers/pci/ats.c | 11 -----------
include/linux/pci.h | 1 -
2 files changed, 12 deletions(-)
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 718e6f414680..977f5a1ace40 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -72,8 +72,6 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
pdev = pci_physfn(dev);
if (pdev->ats_stu != ps)
return -EINVAL;
-
- atomic_inc(&pdev->ats_ref_cnt); /* count enabled VFs */
} else {
dev->ats_stu = ps;
ctrl |= PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
@@ -91,20 +89,11 @@ EXPORT_SYMBOL_GPL(pci_enable_ats);
*/
void pci_disable_ats(struct pci_dev *dev)
{
- struct pci_dev *pdev;
u16 ctrl;
if (WARN_ON(!dev->ats_enabled))
return;
- if (atomic_read(&dev->ats_ref_cnt))
- return; /* VFs still enabled */
-
- if (dev->is_virtfn) {
- pdev = pci_physfn(dev);
- atomic_dec(&pdev->ats_ref_cnt);
- }
-
pci_read_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, &ctrl);
ctrl &= ~PCI_ATS_CTRL_ENABLE;
pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2a761ea63f8d..c38cbf8fa03b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -446,7 +446,6 @@ struct pci_dev {
};
u16 ats_cap; /* ATS Capability offset */
u8 ats_stu; /* ATS Smallest Translation Unit */
- atomic_t ats_ref_cnt; /* Number of VFs with ATS enabled */
#endif
#ifdef CONFIG_PCI_PRI
u32 pri_reqs_alloc; /* Number of PRI requests allocated */
--
2.20.1
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> [email protected]
> Sent: Monday, May 6, 2019 12:20 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [PATCH v2 3/5] PCI/ATS: Skip VF ATS initialization if PF does not implement it
>
> From: Kuppuswamy Sathyanarayanan <[email protected]>
>
> If PF does not implement ATS and VF implements/uses it, it might lead to
> runtime issues. Also, as per spec r4.0, sec 9.3.7.8, PF should implement
> ATS if VF implements it. So add additional check to confirm given device
> aligns with the spec.
...
> + /*
> + * Per PCIe r4.0, sec 9.3.7.8, if VF implements Address Translation
> + * Services (ATS) Extended Capability then corresponding PF should
> + * also implement it.
> + */
...
In standardese, "should" means recommended, not required. The PCIe spec uses
"must" for this rule; the comments should match.
Hi All,
On 5/6/19 10:20 AM, [email protected] wrote:
> From: Kuppuswamy Sathyanarayanan <[email protected]>
>
> When IOMMU tries to enable PASID for VF device in
> iommu_enable_dev_iotlb(), it always fails because PASID support for PCIe
> VF device is currently broken in PCIE driver. Current implementation
> expects the given PCIe device (PF & VF) to implement PASID capability
> before enabling the PASID support. But this assumption is incorrect. As
> per PCIe spec r4.0, sec 9.3.7.14, all VFs associated with PF can only
> use the PASID of the PF and not implement it.
>
> Since PASID is shared between PF/VF devices, following rules should
> apply.
>
> 1. Enable PASID in VF only if its already enabled in PF.
I need more eyes on how to interpret the spec for PF/VF PASID enable. As
per PCIe spec r4.0. sec 9.3.7.14, PASID VF/PF dependency details are,
An Endpoint device is permitted to support PASID. The PASID
configuration of the single function (Function or PF) representing the
device is also used by all VFs in the device. A PF is permitted to
implement the PASID capability, but VFs must not implement it. An
Endpoint device is permitted to support PASID. The PASID configuration
of the single function (Function or PF) representing the device is also
used by all VFs in the device. A PF is permitted to implement the PASID
capability, but VFs must not implement it.
Since it says that the VF uses PF configuration, I have interpreted it
as "VF follows what ever we configure for PF and don't enable PASID in
VF if its not enabled in PF" (otherwise it would require us modifying PF
registers for VF use case). But I am not sure whether the my assumption
is correct. In one of our testing, during IOMMU bind of VF device, PASID
enable fails because associated PF device did not enable PASID. But I am
sure whether we should expect PF PASID to be binded before VF.
So please let me know your comments.
> 2. Enable PASID in VF only if the requested features matches with PF
> config, otherwise return error.
> 3. When enabling/disabling PASID for VF, instead of configuring the PF
> registers just increase/decrease the usage count (pasid_ref_cnt).
> 4. Disable PASID in PF (configuring the registers) only if pasid_ref_cnt
> is zero.
> 5. When reading PASID features/settings for VF, use registers of
> corresponding PF.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Keith Busch <[email protected]>
> Suggested-by: Ashok Raj <[email protected]>
> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> ---
> drivers/pci/ats.c | 55 ++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/pci.h | 1 +
> 2 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 5582e5d83a3f..e7a904e347c3 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -345,6 +345,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
> {
> u16 control, supported;
> int pos;
> + struct pci_dev *pf;
>
> if (WARN_ON(pdev->pasid_enabled))
> return -EBUSY;
> @@ -353,7 +354,33 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
> return -EINVAL;
>
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> - if (!pos)
> +
> + if (pdev->is_virtfn) {
> + /*
> + * Per PCIe r4.0, sec 9.3.7.14, VF must not implement
> + * Process Address Space ID (PASID) Capability.
> + */
> + if (pos) {
> + dev_err(&pdev->dev, "VF must not implement PASID\n");
> + return -EINVAL
> + }
> + /* Since VF shares PASID with PF, use PF config */
> + pf = pci_physfn(pdev);
> +
> + /* If VF config does not match with PF, return error */
> + if (!pf->pasid_enabled || pf->pasid_features != features)
> + return -EINVAL;
> +
> + pdev->pasid_features = features;
> + pdev->pasid_enabled = 1;
> +
> + /* Increment PF PASID refcount */
> + atomic_inc(&pf->pasid_ref_cnt);
> +
> + return 0;
> + }
> +
> + if (pdev->is_physfn && !pos)
> return -EINVAL;
>
> pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> @@ -382,10 +409,27 @@ void pci_disable_pasid(struct pci_dev *pdev)
> {
> u16 control = 0;
> int pos;
> + struct pci_dev *pf;
>
> if (WARN_ON(!pdev->pasid_enabled))
> return;
>
> + /* All VFs PASID should be disabled before disabling PF PASID*/
> + if (atomic_read(&pdev->pasid_ref_cnt))
> + return;
> +
> + if (pdev->is_virtfn) {
> + /* Since VF shares PASID with PF, use PF config. */
> + pf = pci_physfn(pdev);
> +
> + /* Decrement PF PASID refcount */
> + atomic_dec(&pf->pasid_ref_cnt);
> +
> + pdev->pasid_enabled = 0;
> +
> + return;
> + }
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> if (!pos)
> return;
> @@ -408,6 +452,9 @@ void pci_restore_pasid_state(struct pci_dev *pdev)
> if (!pdev->pasid_enabled)
> return;
>
> + if (pdev->is_virtfn)
> + return;
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> if (!pos)
> return;
> @@ -432,6 +479,9 @@ int pci_pasid_features(struct pci_dev *pdev)
> u16 supported;
> int pos;
>
> + if (pdev->is_virtfn)
> + pdev = pci_physfn(pdev);
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> if (!pos)
> return -EINVAL;
> @@ -488,6 +538,9 @@ int pci_max_pasids(struct pci_dev *pdev)
> u16 supported;
> int pos;
>
> + if (pdev->is_virtfn)
> + pdev = pci_physfn(pdev);
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> if (!pos)
> return -EINVAL;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 699c79c99a39..2a761ea63f8d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -454,6 +454,7 @@ struct pci_dev {
> #endif
> #ifdef CONFIG_PCI_PASID
> u16 pasid_features;
> + atomic_t pasid_ref_cnt; /* Number of VFs with PASID enabled */
> #endif
> #ifdef CONFIG_PCI_P2PDMA
> struct pci_p2pdma *p2pdma;
--
Sathyanarayanan Kuppuswamy
Linux kernel developer
On Mon, May 06, 2019 at 10:20:03AM -0700, [email protected] wrote:
> From: Kuppuswamy Sathyanarayanan <[email protected]>
>
> When IOMMU tries to enable PRI for VF device in
> iommu_enable_dev_iotlb(), it always fails because PRI support for PCIe
> VF device is currently broken in PCIE driver. Current implementation
> expects the given PCIe device (PF & VF) to implement PRI capability
> before enabling the PRI support. But this assumption is incorrect. As
> per PCIe spec r4.0, sec 9.3.7.11, all VFs associated with PF can only
> use the Page Request Interface (PRI) of the PF and not implement it.
> Hence we need to create exception for handling the PRI support for PCIe
> VF device.
>
> Since PRI is shared between PF/VF devices, following rules should apply.
>
> 1. Enable PRI in VF only if its already enabled in PF.
> 2. When enabling/disabling PRI for VF, instead of configuring the
> registers just increase/decrease the usage count (pri_ref_cnt) of PF.
> 3. Disable PRI in PF only if pr_ref_cnt is zero.
s/pr_ref_cnt/pri_ref_cnt/
> Cc: Ashok Raj <[email protected]>
> Cc: Keith Busch <[email protected]>
> Suggested-by: Ashok Raj <[email protected]>
> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> ---
> drivers/pci/ats.c | 53 +++++++++++++++++++++++++++++++++++++++++++--
> include/linux/pci.h | 1 +
> 2 files changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 97c08146534a..5582e5d83a3f 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -181,12 +181,39 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> u16 control, status;
> u32 max_requests;
> int pos;
> + struct pci_dev *pf;
>
> if (WARN_ON(pdev->pri_enabled))
> return -EBUSY;
>
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> - if (!pos)
> +
> + if (pdev->is_virtfn) {
> + /*
> + * Per PCIe r4.0, sec 9.3.7.11, VF must not implement PRI
> + * Capability.
> + */
> + if (pos) {
> + dev_err(&pdev->dev, "VF must not implement PRI");
> + return -EINVAL;
> + }
This seems gratuitous. It finds implementation errors, but since we
correctly use the PF here anyway, it doesn't *need* to prevent PRI on
the VF from working.
I think you should just have:
if (pdev->is_virtfn) {
pf = pci_physfn(pdev);
if (!pf->pri_enabled)
return -EINVAL;
pdev->pri_enabled = 1;
atomic_inc(&pf->pri_ref_cnt);
}
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
if (!pos)
return -EINVAL;
> + pf = pci_physfn(pdev);
> +
> + /* If VF config does not match with PF, return error */
> + if (!pf->pri_enabled)
> + return -EINVAL;
> +
> + pdev->pri_reqs_alloc = pf->pri_reqs_alloc;
Is there any point in setting vf->pri_reqs_alloc? I don't think it's
used for anything since pri_reqs_alloc is only used to write the PF
capability, and we only do that for the PF.
> + pdev->pri_enabled = 1;
> +
> + /* Increment PF PRI refcount */
Superfluous comment, since that's exactly what the code says. It's
always good when the code is so clear that it doesn't require comments :)
> + atomic_inc(&pf->pri_ref_cnt);
> +
> + return 0;
> + }
> +
> + if (pdev->is_physfn && !pos)
> return -EINVAL;
>
> pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> @@ -202,7 +229,6 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>
> pdev->pri_enabled = 1;
> -
> return 0;
> }
> EXPORT_SYMBOL_GPL(pci_enable_pri);
> @@ -217,10 +243,27 @@ void pci_disable_pri(struct pci_dev *pdev)
> {
> u16 control;
> int pos;
> + struct pci_dev *pf;
>
> if (WARN_ON(!pdev->pri_enabled))
> return;
>
> + /* All VFs should be disabled before disabling PF */
> + if (atomic_read(&pdev->pri_ref_cnt))
> + return;
> +
> + if (pdev->is_virtfn) {
> + /* Since VF shares PRI with PF, use PF config. */
> + pf = pci_physfn(pdev);
> +
> + /* Decrement PF PRI refcount */
> + atomic_dec(&pf->pri_ref_cnt);
> +
> + pdev->pri_enabled = 0;
> +
> + return;
> + }
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> if (!pos)
> return;
> @@ -246,6 +289,9 @@ void pci_restore_pri_state(struct pci_dev *pdev)
> if (!pdev->pri_enabled)
> return;
>
> + if (pdev->is_virtfn)
> + return;
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> if (!pos)
> return;
> @@ -270,6 +316,9 @@ int pci_reset_pri(struct pci_dev *pdev)
> if (WARN_ON(pdev->pri_enabled))
> return -EBUSY;
>
> + if (pdev->is_virtfn)
> + return 0;
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> if (!pos)
> return -EINVAL;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 77448215ef5b..699c79c99a39 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -450,6 +450,7 @@ struct pci_dev {
> #endif
> #ifdef CONFIG_PCI_PRI
> u32 pri_reqs_alloc; /* Number of PRI requests allocated */
> + atomic_t pri_ref_cnt; /* Number of VFs with PRI enabled */
> #endif
> #ifdef CONFIG_PCI_PASID
> u16 pasid_features;
> --
> 2.20.1
>
On Mon, May 06, 2019 at 10:20:04AM -0700, [email protected] wrote:
> From: Kuppuswamy Sathyanarayanan <[email protected]>
>
> When IOMMU tries to enable PASID for VF device in
> iommu_enable_dev_iotlb(), it always fails because PASID support for PCIe
> VF device is currently broken in PCIE driver. Current implementation
> expects the given PCIe device (PF & VF) to implement PASID capability
> before enabling the PASID support. But this assumption is incorrect. As
> per PCIe spec r4.0, sec 9.3.7.14, all VFs associated with PF can only
> use the PASID of the PF and not implement it.
>
> Since PASID is shared between PF/VF devices, following rules should
> apply.
>
> 1. Enable PASID in VF only if its already enabled in PF.
s/its/it's/ (same comment applies to PRI patch, IIRC)
> 2. Enable PASID in VF only if the requested features matches with PF
> config, otherwise return error.
> 3. When enabling/disabling PASID for VF, instead of configuring the PF
> registers just increase/decrease the usage count (pasid_ref_cnt).
> 4. Disable PASID in PF (configuring the registers) only if pasid_ref_cnt
> is zero.
> 5. When reading PASID features/settings for VF, use registers of
> corresponding PF.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Keith Busch <[email protected]>
> Suggested-by: Ashok Raj <[email protected]>
> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> ---
> drivers/pci/ats.c | 55 ++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/pci.h | 1 +
> 2 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 5582e5d83a3f..e7a904e347c3 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -345,6 +345,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
> {
> u16 control, supported;
> int pos;
> + struct pci_dev *pf;
>
> if (WARN_ON(pdev->pasid_enabled))
> return -EBUSY;
> @@ -353,7 +354,33 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
> return -EINVAL;
>
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> - if (!pos)
> +
> + if (pdev->is_virtfn) {
> + /*
> + * Per PCIe r4.0, sec 9.3.7.14, VF must not implement
> + * Process Address Space ID (PASID) Capability.
> + */
> + if (pos) {
> + dev_err(&pdev->dev, "VF must not implement PASID\n");
> + return -EINVAL
> + }
Same comment as for PRI.
> + /* Since VF shares PASID with PF, use PF config */
> + pf = pci_physfn(pdev);
> +
> + /* If VF config does not match with PF, return error */
> + if (!pf->pasid_enabled || pf->pasid_features != features)
> + return -EINVAL;
> +
> + pdev->pasid_features = features;
I don't think this is used for VFs.
> + pdev->pasid_enabled = 1;
> +
> + /* Increment PF PASID refcount */
> + atomic_inc(&pf->pasid_ref_cnt);
> +
> + return 0;
> + }
> +
> + if (pdev->is_physfn && !pos)
> return -EINVAL;
>
> pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> @@ -382,10 +409,27 @@ void pci_disable_pasid(struct pci_dev *pdev)
> {
> u16 control = 0;
> int pos;
> + struct pci_dev *pf;
>
> if (WARN_ON(!pdev->pasid_enabled))
> return;
>
> + /* All VFs PASID should be disabled before disabling PF PASID*/
Add space at end of comment.
> + if (atomic_read(&pdev->pasid_ref_cnt))
> + return;
> +
> + if (pdev->is_virtfn) {
> + /* Since VF shares PASID with PF, use PF config. */
Most single-line, single-sentence comments here do not include a
trailing period.
> + pf = pci_physfn(pdev);
> +
> + /* Decrement PF PASID refcount */
> + atomic_dec(&pf->pasid_ref_cnt);
> +
> + pdev->pasid_enabled = 0;
> +
> + return;
> + }
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> if (!pos)
> return;
> @@ -408,6 +452,9 @@ void pci_restore_pasid_state(struct pci_dev *pdev)
> if (!pdev->pasid_enabled)
> return;
>
> + if (pdev->is_virtfn)
> + return;
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> if (!pos)
> return;
> @@ -432,6 +479,9 @@ int pci_pasid_features(struct pci_dev *pdev)
> u16 supported;
> int pos;
>
> + if (pdev->is_virtfn)
> + pdev = pci_physfn(pdev);
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> if (!pos)
> return -EINVAL;
> @@ -488,6 +538,9 @@ int pci_max_pasids(struct pci_dev *pdev)
> u16 supported;
> int pos;
>
> + if (pdev->is_virtfn)
> + pdev = pci_physfn(pdev);
> +
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> if (!pos)
> return -EINVAL;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 699c79c99a39..2a761ea63f8d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -454,6 +454,7 @@ struct pci_dev {
> #endif
> #ifdef CONFIG_PCI_PASID
> u16 pasid_features;
> + atomic_t pasid_ref_cnt; /* Number of VFs with PASID enabled */
> #endif
> #ifdef CONFIG_PCI_P2PDMA
> struct pci_p2pdma *p2pdma;
> --
> 2.20.1
>
On Wed, May 29, 2019 at 05:57:14PM -0500, Bjorn Helgaas wrote:
> On Mon, May 06, 2019 at 10:20:03AM -0700, [email protected] wrote:
> > From: Kuppuswamy Sathyanarayanan <[email protected]>
> >
> > When IOMMU tries to enable PRI for VF device in
> > iommu_enable_dev_iotlb(), it always fails because PRI support for PCIe
> > VF device is currently broken in PCIE driver. Current implementation
> > expects the given PCIe device (PF & VF) to implement PRI capability
> > before enabling the PRI support. But this assumption is incorrect. As
> > per PCIe spec r4.0, sec 9.3.7.11, all VFs associated with PF can only
> > use the Page Request Interface (PRI) of the PF and not implement it.
> > Hence we need to create exception for handling the PRI support for PCIe
> > VF device.
> >
> > Since PRI is shared between PF/VF devices, following rules should apply.
> >
> > 1. Enable PRI in VF only if its already enabled in PF.
> > 2. When enabling/disabling PRI for VF, instead of configuring the
> > registers just increase/decrease the usage count (pri_ref_cnt) of PF.
> > 3. Disable PRI in PF only if pr_ref_cnt is zero.
>
> s/pr_ref_cnt/pri_ref_cnt/
>
> > Cc: Ashok Raj <[email protected]>
> > Cc: Keith Busch <[email protected]>
> > Suggested-by: Ashok Raj <[email protected]>
> > Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> > ---
> > drivers/pci/ats.c | 53 +++++++++++++++++++++++++++++++++++++++++++--
> > include/linux/pci.h | 1 +
> > 2 files changed, 52 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > index 97c08146534a..5582e5d83a3f 100644
> > --- a/drivers/pci/ats.c
> > +++ b/drivers/pci/ats.c
> > @@ -181,12 +181,39 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> > u16 control, status;
> > u32 max_requests;
> > int pos;
> > + struct pci_dev *pf;
> >
> > if (WARN_ON(pdev->pri_enabled))
> > return -EBUSY;
> >
> > pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > - if (!pos)
> > +
> > + if (pdev->is_virtfn) {
> > + /*
> > + * Per PCIe r4.0, sec 9.3.7.11, VF must not implement PRI
> > + * Capability.
> > + */
> > + if (pos) {
> > + dev_err(&pdev->dev, "VF must not implement PRI");
> > + return -EINVAL;
> > + }
>
> This seems gratuitous. It finds implementation errors, but since we
> correctly use the PF here anyway, it doesn't *need* to prevent PRI on
> the VF from working.
>
> I think you should just have:
>
> if (pdev->is_virtfn) {
> pf = pci_physfn(pdev);
> if (!pf->pri_enabled)
> return -EINVAL;
This would be incorrect. Since if we never did any bind_mm to the PF
PRI would not have been enabled. Currently this is done in the IOMMU
driver, and not in the device driver.
I suppose we should enable PF capability if its not enabled. Same
comment would be applicable for PASID as well.
>
> pdev->pri_enabled = 1;
> atomic_inc(&pf->pri_ref_cnt);
> }
>
> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> if (!pos)
> return -EINVAL;
>
> > + pf = pci_physfn(pdev);
> > +
> > + /* If VF config does not match with PF, return error */
> > + if (!pf->pri_enabled)
> > + return -EINVAL;
> > +
> > + pdev->pri_reqs_alloc = pf->pri_reqs_alloc;
>
> Is there any point in setting vf->pri_reqs_alloc? I don't think it's
> used for anything since pri_reqs_alloc is only used to write the PF
> capability, and we only do that for the PF.
>
> > + pdev->pri_enabled = 1;
> > +
> > + /* Increment PF PRI refcount */
>
> Superfluous comment, since that's exactly what the code says. It's
> always good when the code is so clear that it doesn't require comments :)
>
> > + atomic_inc(&pf->pri_ref_cnt);
> > +
> > + return 0;
> > + }
> > +
> > + if (pdev->is_physfn && !pos)
> > return -EINVAL;
> >
> > pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> > @@ -202,7 +229,6 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> >
> > pdev->pri_enabled = 1;
> > -
> > return 0;
> > }
> > EXPORT_SYMBOL_GPL(pci_enable_pri);
> > @@ -217,10 +243,27 @@ void pci_disable_pri(struct pci_dev *pdev)
> > {
> > u16 control;
> > int pos;
> > + struct pci_dev *pf;
> >
> > if (WARN_ON(!pdev->pri_enabled))
> > return;
> >
> > + /* All VFs should be disabled before disabling PF */
> > + if (atomic_read(&pdev->pri_ref_cnt))
> > + return;
> > +
> > + if (pdev->is_virtfn) {
> > + /* Since VF shares PRI with PF, use PF config. */
> > + pf = pci_physfn(pdev);
> > +
> > + /* Decrement PF PRI refcount */
> > + atomic_dec(&pf->pri_ref_cnt);
> > +
> > + pdev->pri_enabled = 0;
> > +
> > + return;
> > + }
> > +
> > pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > if (!pos)
> > return;
> > @@ -246,6 +289,9 @@ void pci_restore_pri_state(struct pci_dev *pdev)
> > if (!pdev->pri_enabled)
> > return;
> >
> > + if (pdev->is_virtfn)
> > + return;
> > +
> > pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > if (!pos)
> > return;
> > @@ -270,6 +316,9 @@ int pci_reset_pri(struct pci_dev *pdev)
> > if (WARN_ON(pdev->pri_enabled))
> > return -EBUSY;
> >
> > + if (pdev->is_virtfn)
> > + return 0;
> > +
> > pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > if (!pos)
> > return -EINVAL;
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 77448215ef5b..699c79c99a39 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -450,6 +450,7 @@ struct pci_dev {
> > #endif
> > #ifdef CONFIG_PCI_PRI
> > u32 pri_reqs_alloc; /* Number of PRI requests allocated */
> > + atomic_t pri_ref_cnt; /* Number of VFs with PRI enabled */
> > #endif
> > #ifdef CONFIG_PCI_PASID
> > u16 pasid_features;
> > --
> > 2.20.1
> >
On 5/29/19 4:04 PM, Raj, Ashok wrote:
> On Wed, May 29, 2019 at 05:57:14PM -0500, Bjorn Helgaas wrote:
>> On Mon, May 06, 2019 at 10:20:03AM -0700, [email protected] wrote:
>>> From: Kuppuswamy Sathyanarayanan <[email protected]>
>>>
>>> When IOMMU tries to enable PRI for VF device in
>>> iommu_enable_dev_iotlb(), it always fails because PRI support for PCIe
>>> VF device is currently broken in PCIE driver. Current implementation
>>> expects the given PCIe device (PF & VF) to implement PRI capability
>>> before enabling the PRI support. But this assumption is incorrect. As
>>> per PCIe spec r4.0, sec 9.3.7.11, all VFs associated with PF can only
>>> use the Page Request Interface (PRI) of the PF and not implement it.
>>> Hence we need to create exception for handling the PRI support for PCIe
>>> VF device.
>>>
>>> Since PRI is shared between PF/VF devices, following rules should apply.
>>>
>>> 1. Enable PRI in VF only if its already enabled in PF.
>>> 2. When enabling/disabling PRI for VF, instead of configuring the
>>> registers just increase/decrease the usage count (pri_ref_cnt) of PF.
>>> 3. Disable PRI in PF only if pr_ref_cnt is zero.
>> s/pr_ref_cnt/pri_ref_cnt/
>>
>>> Cc: Ashok Raj <[email protected]>
>>> Cc: Keith Busch <[email protected]>
>>> Suggested-by: Ashok Raj <[email protected]>
>>> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
>>> ---
>>> drivers/pci/ats.c | 53 +++++++++++++++++++++++++++++++++++++++++++--
>>> include/linux/pci.h | 1 +
>>> 2 files changed, 52 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
>>> index 97c08146534a..5582e5d83a3f 100644
>>> --- a/drivers/pci/ats.c
>>> +++ b/drivers/pci/ats.c
>>> @@ -181,12 +181,39 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>>> u16 control, status;
>>> u32 max_requests;
>>> int pos;
>>> + struct pci_dev *pf;
>>>
>>> if (WARN_ON(pdev->pri_enabled))
>>> return -EBUSY;
>>>
>>> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>>> - if (!pos)
>>> +
>>> + if (pdev->is_virtfn) {
>>> + /*
>>> + * Per PCIe r4.0, sec 9.3.7.11, VF must not implement PRI
>>> + * Capability.
>>> + */
>>> + if (pos) {
>>> + dev_err(&pdev->dev, "VF must not implement PRI");
>>> + return -EINVAL;
>>> + }
>> This seems gratuitous. It finds implementation errors, but since we
>> correctly use the PF here anyway, it doesn't *need* to prevent PRI on
>> the VF from working.
>>
>> I think you should just have:
>>
>> if (pdev->is_virtfn) {
>> pf = pci_physfn(pdev);
>> if (!pf->pri_enabled)
>> return -EINVAL;
> This would be incorrect. Since if we never did any bind_mm to the PF
> PRI would not have been enabled. Currently this is done in the IOMMU
> driver, and not in the device driver.
>
> I suppose we should enable PF capability if its not enabled. Same
> comment would be applicable for PASID as well.
I am currently working on a fix to handle the bind issue (VF binding
before PF). My next version will have this update.
But, regarding VF spec compliance checks, Is there any issue in having
them in enable code ? Perhaps I can change dev_err to dev_warn and not
return error if it found implementation errors. I found it useful to
have them because it helped me in finding some faulty devices during my
testing. Let me know your comments.
>
>
>> pdev->pri_enabled = 1;
>> atomic_inc(&pf->pri_ref_cnt);
>> }
>>
>> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>> if (!pos)
>> return -EINVAL;
>>
>>> + pf = pci_physfn(pdev);
>>> +
>>> + /* If VF config does not match with PF, return error */
>>> + if (!pf->pri_enabled)
>>> + return -EINVAL;
>>> +
>>> + pdev->pri_reqs_alloc = pf->pri_reqs_alloc;
>> Is there any point in setting vf->pri_reqs_alloc? I don't think it's
>> used for anything since pri_reqs_alloc is only used to write the PF
>> capability, and we only do that for the PF.
>>
>>> + pdev->pri_enabled = 1;
>>> +
>>> + /* Increment PF PRI refcount */
>> Superfluous comment, since that's exactly what the code says. It's
>> always good when the code is so clear that it doesn't require comments :)
>>
>>> + atomic_inc(&pf->pri_ref_cnt);
>>> +
>>> + return 0;
>>> + }
>>> +
>>> + if (pdev->is_physfn && !pos)
>>> return -EINVAL;
>>>
>>> pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
>>> @@ -202,7 +229,6 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>>> pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>>>
>>> pdev->pri_enabled = 1;
>>> -
>>> return 0;
>>> }
>>> EXPORT_SYMBOL_GPL(pci_enable_pri);
>>> @@ -217,10 +243,27 @@ void pci_disable_pri(struct pci_dev *pdev)
>>> {
>>> u16 control;
>>> int pos;
>>> + struct pci_dev *pf;
>>>
>>> if (WARN_ON(!pdev->pri_enabled))
>>> return;
>>>
>>> + /* All VFs should be disabled before disabling PF */
>>> + if (atomic_read(&pdev->pri_ref_cnt))
>>> + return;
>>> +
>>> + if (pdev->is_virtfn) {
>>> + /* Since VF shares PRI with PF, use PF config. */
>>> + pf = pci_physfn(pdev);
>>> +
>>> + /* Decrement PF PRI refcount */
>>> + atomic_dec(&pf->pri_ref_cnt);
>>> +
>>> + pdev->pri_enabled = 0;
>>> +
>>> + return;
>>> + }
>>> +
>>> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>>> if (!pos)
>>> return;
>>> @@ -246,6 +289,9 @@ void pci_restore_pri_state(struct pci_dev *pdev)
>>> if (!pdev->pri_enabled)
>>> return;
>>>
>>> + if (pdev->is_virtfn)
>>> + return;
>>> +
>>> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>>> if (!pos)
>>> return;
>>> @@ -270,6 +316,9 @@ int pci_reset_pri(struct pci_dev *pdev)
>>> if (WARN_ON(pdev->pri_enabled))
>>> return -EBUSY;
>>>
>>> + if (pdev->is_virtfn)
>>> + return 0;
>>> +
>>> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>>> if (!pos)
>>> return -EINVAL;
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index 77448215ef5b..699c79c99a39 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -450,6 +450,7 @@ struct pci_dev {
>>> #endif
>>> #ifdef CONFIG_PCI_PRI
>>> u32 pri_reqs_alloc; /* Number of PRI requests allocated */
>>> + atomic_t pri_ref_cnt; /* Number of VFs with PRI enabled */
>>> #endif
>>> #ifdef CONFIG_PCI_PASID
>>> u16 pasid_features;
>>> --
>>> 2.20.1
>>>
--
Sathyanarayanan Kuppuswamy
Linux kernel developer
On Mon, May 06, 2019 at 10:20:05AM -0700, [email protected] wrote:
> From: Kuppuswamy Sathyanarayanan <[email protected]>
>
> If PF does not implement ATS and VF implements/uses it, it might lead to
> runtime issues. Also, as per spec r4.0, sec 9.3.7.8, PF should implement
> ATS if VF implements it. So add additional check to confirm given device
> aligns with the spec.
"might lead to runtime issues" is pretty wishy-washy. I really don't
want to prevent some device from working merely because it has
something in config space that doesn't follow the spec exactly but we
never touch.
> Cc: Ashok Raj <[email protected]>
> Cc: Keith Busch <[email protected]>
> Suggested-by: Ashok Raj <[email protected]>
> Reviewed-by: Keith Busch <[email protected]>
> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> ---
> drivers/pci/ats.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index e7a904e347c3..718e6f414680 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -19,6 +19,7 @@
> void pci_ats_init(struct pci_dev *dev)
> {
> int pos;
> + struct pci_dev *pdev;
>
> if (pci_ats_disabled())
> return;
> @@ -27,6 +28,17 @@ void pci_ats_init(struct pci_dev *dev)
> if (!pos)
> return;
>
> + /*
> + * Per PCIe r4.0, sec 9.3.7.8, if VF implements Address Translation
> + * Services (ATS) Extended Capability then corresponding PF should
> + * also implement it.
> + */
> + if (dev->is_virtfn) {
> + pdev = pci_physfn(dev);
> + if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS))
> + return;
> + }
> +
> dev->ats_cap = pos;
> }
>
> --
> 2.20.1
>
On Wed, May 29, 2019 at 04:04:27PM -0700, Raj, Ashok wrote:
> On Wed, May 29, 2019 at 05:57:14PM -0500, Bjorn Helgaas wrote:
> > On Mon, May 06, 2019 at 10:20:03AM -0700, [email protected] wrote:
> > > From: Kuppuswamy Sathyanarayanan <[email protected]>
> > >
> > > When IOMMU tries to enable PRI for VF device in
> > > iommu_enable_dev_iotlb(), it always fails because PRI support for PCIe
> > > VF device is currently broken in PCIE driver. Current implementation
> > > expects the given PCIe device (PF & VF) to implement PRI capability
> > > before enabling the PRI support. But this assumption is incorrect. As
> > > per PCIe spec r4.0, sec 9.3.7.11, all VFs associated with PF can only
> > > use the Page Request Interface (PRI) of the PF and not implement it.
> > > Hence we need to create exception for handling the PRI support for PCIe
> > > VF device.
> > >
> > > Since PRI is shared between PF/VF devices, following rules should apply.
> > >
> > > 1. Enable PRI in VF only if its already enabled in PF.
> > > 2. When enabling/disabling PRI for VF, instead of configuring the
> > > registers just increase/decrease the usage count (pri_ref_cnt) of PF.
> > > 3. Disable PRI in PF only if pr_ref_cnt is zero.
> >
> > s/pr_ref_cnt/pri_ref_cnt/
> >
> > > Cc: Ashok Raj <[email protected]>
> > > Cc: Keith Busch <[email protected]>
> > > Suggested-by: Ashok Raj <[email protected]>
> > > Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> > > ---
> > > drivers/pci/ats.c | 53 +++++++++++++++++++++++++++++++++++++++++++--
> > > include/linux/pci.h | 1 +
> > > 2 files changed, 52 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > > index 97c08146534a..5582e5d83a3f 100644
> > > --- a/drivers/pci/ats.c
> > > +++ b/drivers/pci/ats.c
> > > @@ -181,12 +181,39 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> > > u16 control, status;
> > > u32 max_requests;
> > > int pos;
> > > + struct pci_dev *pf;
> > >
> > > if (WARN_ON(pdev->pri_enabled))
> > > return -EBUSY;
> > >
> > > pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > > - if (!pos)
> > > +
> > > + if (pdev->is_virtfn) {
> > > + /*
> > > + * Per PCIe r4.0, sec 9.3.7.11, VF must not implement PRI
> > > + * Capability.
> > > + */
> > > + if (pos) {
> > > + dev_err(&pdev->dev, "VF must not implement PRI");
> > > + return -EINVAL;
> > > + }
> >
> > This seems gratuitous. It finds implementation errors, but since we
> > correctly use the PF here anyway, it doesn't *need* to prevent PRI on
> > the VF from working.
> >
> > I think you should just have:
> >
> > if (pdev->is_virtfn) {
> > pf = pci_physfn(pdev);
> > if (!pf->pri_enabled)
> > return -EINVAL;
>
> This would be incorrect. Since if we never did any bind_mm to the PF
> PRI would not have been enabled. Currently this is done in the IOMMU
> driver, and not in the device driver.
This is functionally the same as the original patch, only omitting the
"VF must not implement PRI" check.
> I suppose we should enable PF capability if its not enabled. Same
> comment would be applicable for PASID as well.
Operating on a device other than the one the driver owns opens the
issue of mutual exclusion and races, so would require careful
scrutiny. Are PRI/PASID things that could be *always* enabled for the
PF at enumeration-time, or do we have to wait until a driver claims
the VF? If the latter, are there coordination issues between drivers
of different VFs?
> > pdev->pri_enabled = 1;
> > atomic_inc(&pf->pri_ref_cnt);
> > }
> >
> > pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > if (!pos)
> > return -EINVAL;
> >
> > > + pf = pci_physfn(pdev);
> > > +
> > > + /* If VF config does not match with PF, return error */
> > > + if (!pf->pri_enabled)
> > > + return -EINVAL;
> > > +
> > > + pdev->pri_reqs_alloc = pf->pri_reqs_alloc;
> >
> > Is there any point in setting vf->pri_reqs_alloc? I don't think it's
> > used for anything since pri_reqs_alloc is only used to write the PF
> > capability, and we only do that for the PF.
> >
> > > + pdev->pri_enabled = 1;
> > > +
> > > + /* Increment PF PRI refcount */
> >
> > Superfluous comment, since that's exactly what the code says. It's
> > always good when the code is so clear that it doesn't require comments :)
> >
> > > + atomic_inc(&pf->pri_ref_cnt);
> > > +
> > > + return 0;
> > > + }
> > > +
> > > + if (pdev->is_physfn && !pos)
> > > return -EINVAL;
> > >
> > > pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> > > @@ -202,7 +229,6 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> > > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> > >
> > > pdev->pri_enabled = 1;
> > > -
> > > return 0;
On Wed, May 29, 2019 at 04:24:05PM -0700, sathyanarayanan kuppuswamy wrote:
> But, regarding VF spec compliance checks, Is there any issue in having them
> in enable code ? Perhaps I can change dev_err to dev_warn and not return
> error if it found implementation errors. I found it useful to have them
> because it helped me in finding some faulty devices during my testing. Let
> me know your comments.
If you need quirks to make these non-compliant devices usable, we
should check for compliance. If not, my personal opinion is that we
shouldn't touch things we don't need.
Bjorn
On Thu, May 30, 2019 at 08:17:38AM -0500, Bjorn Helgaas wrote:
> On Wed, May 29, 2019 at 04:04:27PM -0700, Raj, Ashok wrote:
> > On Wed, May 29, 2019 at 05:57:14PM -0500, Bjorn Helgaas wrote:
> > > On Mon, May 06, 2019 at 10:20:03AM -0700, [email protected] wrote:
> > > > From: Kuppuswamy Sathyanarayanan <[email protected]>
> > > >
> > > > When IOMMU tries to enable PRI for VF device in
> > > > iommu_enable_dev_iotlb(), it always fails because PRI support for PCIe
> > > > VF device is currently broken in PCIE driver. Current implementation
> > > > expects the given PCIe device (PF & VF) to implement PRI capability
> > > > before enabling the PRI support. But this assumption is incorrect. As
> > > > per PCIe spec r4.0, sec 9.3.7.11, all VFs associated with PF can only
> > > > use the Page Request Interface (PRI) of the PF and not implement it.
> > > > Hence we need to create exception for handling the PRI support for PCIe
> > > > VF device.
> > > >
> > > > Since PRI is shared between PF/VF devices, following rules should apply.
> > > >
> > > > 1. Enable PRI in VF only if its already enabled in PF.
> > > > 2. When enabling/disabling PRI for VF, instead of configuring the
> > > > registers just increase/decrease the usage count (pri_ref_cnt) of PF.
> > > > 3. Disable PRI in PF only if pr_ref_cnt is zero.
> > >
> > > s/pr_ref_cnt/pri_ref_cnt/
> > >
> > > > Cc: Ashok Raj <[email protected]>
> > > > Cc: Keith Busch <[email protected]>
> > > > Suggested-by: Ashok Raj <[email protected]>
> > > > Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> > > > ---
> > > > drivers/pci/ats.c | 53 +++++++++++++++++++++++++++++++++++++++++++--
> > > > include/linux/pci.h | 1 +
> > > > 2 files changed, 52 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > > > index 97c08146534a..5582e5d83a3f 100644
> > > > --- a/drivers/pci/ats.c
> > > > +++ b/drivers/pci/ats.c
> > > > @@ -181,12 +181,39 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> > > > u16 control, status;
> > > > u32 max_requests;
> > > > int pos;
> > > > + struct pci_dev *pf;
> > > >
> > > > if (WARN_ON(pdev->pri_enabled))
> > > > return -EBUSY;
> > > >
> > > > pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > > > - if (!pos)
> > > > +
> > > > + if (pdev->is_virtfn) {
> > > > + /*
> > > > + * Per PCIe r4.0, sec 9.3.7.11, VF must not implement PRI
> > > > + * Capability.
> > > > + */
> > > > + if (pos) {
> > > > + dev_err(&pdev->dev, "VF must not implement PRI");
> > > > + return -EINVAL;
> > > > + }
> > >
> > > This seems gratuitous. It finds implementation errors, but since we
> > > correctly use the PF here anyway, it doesn't *need* to prevent PRI on
> > > the VF from working.
> > >
> > > I think you should just have:
> > >
> > > if (pdev->is_virtfn) {
> > > pf = pci_physfn(pdev);
> > > if (!pf->pri_enabled)
> > > return -EINVAL;
> >
> > This would be incorrect. Since if we never did any bind_mm to the PF
> > PRI would not have been enabled. Currently this is done in the IOMMU
> > driver, and not in the device driver.
>
> This is functionally the same as the original patch, only omitting the
> "VF must not implement PRI" check.
>
> > I suppose we should enable PF capability if its not enabled. Same
> > comment would be applicable for PASID as well.
>
> Operating on a device other than the one the driver owns opens the
> issue of mutual exclusion and races, so would require careful
> scrutiny. Are PRI/PASID things that could be *always* enabled for the
> PF at enumeration-time, or do we have to wait until a driver claims
> the VF? If the latter, are there coordination issues between drivers
> of different VFs?
I suppose that's a reasonably good alternative. You mean we could
do this when VF's are being created? Otherwise we can do this as its
done today, on demand for all normal PF's.
Cheers,
Ashok
On 5/30/19 10:20 AM, Raj, Ashok wrote:
> On Thu, May 30, 2019 at 08:17:38AM -0500, Bjorn Helgaas wrote:
>> On Wed, May 29, 2019 at 04:04:27PM -0700, Raj, Ashok wrote:
>>> On Wed, May 29, 2019 at 05:57:14PM -0500, Bjorn Helgaas wrote:
>>>> On Mon, May 06, 2019 at 10:20:03AM -0700, [email protected] wrote:
>>>>> From: Kuppuswamy Sathyanarayanan <[email protected]>
>>>>>
>>>>> When IOMMU tries to enable PRI for VF device in
>>>>> iommu_enable_dev_iotlb(), it always fails because PRI support for PCIe
>>>>> VF device is currently broken in PCIE driver. Current implementation
>>>>> expects the given PCIe device (PF & VF) to implement PRI capability
>>>>> before enabling the PRI support. But this assumption is incorrect. As
>>>>> per PCIe spec r4.0, sec 9.3.7.11, all VFs associated with PF can only
>>>>> use the Page Request Interface (PRI) of the PF and not implement it.
>>>>> Hence we need to create exception for handling the PRI support for PCIe
>>>>> VF device.
>>>>>
>>>>> Since PRI is shared between PF/VF devices, following rules should apply.
>>>>>
>>>>> 1. Enable PRI in VF only if its already enabled in PF.
>>>>> 2. When enabling/disabling PRI for VF, instead of configuring the
>>>>> registers just increase/decrease the usage count (pri_ref_cnt) of PF.
>>>>> 3. Disable PRI in PF only if pr_ref_cnt is zero.
>>>> s/pr_ref_cnt/pri_ref_cnt/
>>>>
>>>>> Cc: Ashok Raj <[email protected]>
>>>>> Cc: Keith Busch <[email protected]>
>>>>> Suggested-by: Ashok Raj <[email protected]>
>>>>> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
>>>>> ---
>>>>> drivers/pci/ats.c | 53 +++++++++++++++++++++++++++++++++++++++++++--
>>>>> include/linux/pci.h | 1 +
>>>>> 2 files changed, 52 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
>>>>> index 97c08146534a..5582e5d83a3f 100644
>>>>> --- a/drivers/pci/ats.c
>>>>> +++ b/drivers/pci/ats.c
>>>>> @@ -181,12 +181,39 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>>>>> u16 control, status;
>>>>> u32 max_requests;
>>>>> int pos;
>>>>> + struct pci_dev *pf;
>>>>>
>>>>> if (WARN_ON(pdev->pri_enabled))
>>>>> return -EBUSY;
>>>>>
>>>>> pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>>>>> - if (!pos)
>>>>> +
>>>>> + if (pdev->is_virtfn) {
>>>>> + /*
>>>>> + * Per PCIe r4.0, sec 9.3.7.11, VF must not implement PRI
>>>>> + * Capability.
>>>>> + */
>>>>> + if (pos) {
>>>>> + dev_err(&pdev->dev, "VF must not implement PRI");
>>>>> + return -EINVAL;
>>>>> + }
>>>> This seems gratuitous. It finds implementation errors, but since we
>>>> correctly use the PF here anyway, it doesn't *need* to prevent PRI on
>>>> the VF from working.
>>>>
>>>> I think you should just have:
>>>>
>>>> if (pdev->is_virtfn) {
>>>> pf = pci_physfn(pdev);
>>>> if (!pf->pri_enabled)
>>>> return -EINVAL;
>>> This would be incorrect. Since if we never did any bind_mm to the PF
>>> PRI would not have been enabled. Currently this is done in the IOMMU
>>> driver, and not in the device driver.
>> This is functionally the same as the original patch, only omitting the
>> "VF must not implement PRI" check.
>>
>>> I suppose we should enable PF capability if its not enabled. Same
>>> comment would be applicable for PASID as well.
>> Operating on a device other than the one the driver owns opens the
>> issue of mutual exclusion and races, so would require careful
>> scrutiny. Are PRI/PASID things that could be *always* enabled for the
>> PF at enumeration-time, or do we have to wait until a driver claims
>> the VF? If the latter, are there coordination issues between drivers
>> of different VFs?
> I suppose that's a reasonably good alternative. You mean we could
> do this when VF's are being created? Otherwise we can do this as its
> done today, on demand for all normal PF's.
If we are going to enable it with default features then its doable. But
for cases with custom requirements, it will become complicated. For
example, in following code, IOMMU sets PRI Outstanding Page Allocation
quota as 32 or 1 based on errata info. So if we just enable it by
default then we may not be able to take these requirements into
consideration.
2051 static int pdev_iommuv2_enable(struct pci_dev *pdev)
2052 {
2053 bool reset_enable;
2054 int reqs, ret;
2055
2056 /* FIXME: Hardcode number of outstanding requests for now */
2057 reqs = 32;
2058 if (pdev_pri_erratum(pdev, AMD_PRI_DEV_ERRATUM_LIMIT_REQ_ONE))
2059 reqs = 1;
2060 reset_enable = pdev_pri_erratum(pdev,
AMD_PRI_DEV_ERRATUM_ENABLE_RESET);
2073 ret = pci_enable_pri(pdev, reqs);
>
>
> Cheers,
> Ashok
>
--
Sathyanarayanan Kuppuswamy
Linux kernel developer