Clear the optional capabilities ARI Forwarding Enable, AtomicOp Requester
Enable and 10-Bit Tag Requester Enable in DEVCTL2 unconditionally on a
hot-plug event. These are the bits which are negotiated between endpoint
and root port and should be re-negotiated and set up for optimal operation
on a hot add.
According to implementation notes in 2.2.6.2, of PCIe Base Specification
[1], "For platforms where the RC supports 10-Bit Tag Completer capability,
it is highly recommended for platform firmware or operating software that
configures PCIe hierarchies to Set the 10-Bit Tag Requester Enable bit
automatically in Endpoints with 10-Bit Tag Requester capability. This
enables the important class of 10-Bit Tag capable adapters that send
Memory Read Requests only to host memory". Hence, it must be noted that
Platform FW may enable these bits if endpoint supports them at boot time
for performance reasons even if Linux hasn't defined them.
Issues are being observed where a device became inaccessible and the port
was not able to be recovered without a system reset when a device with
10-bit tags was removed and replaced with a device that didn't support
10-bit tags.
Section 2.2.6.2, in PCIe Base Specification also implies that:
* If a Requester sends a 10-Bit Tag Request to a Completer that lacks
10-Bit Completer capability, the returned Completion(s) will have Tags
with Tag[9:8] equal to 00b. Since the Requester is forbidden to generate
these Tag values for 10-Bit Tags, such Completions will be handled as
Unexpected Completions, which by default are Advisory Non-Fatal Errors.
The Requester must follow standard PCI Express error handling
requirements.
* In configurations where a Requester with 10-Bit Tag Requester capability
needs to target multiple Completers, one needs to ensure that the Requester
sends 10-Bit Tag Requests only to Completers that have 10-Bit Tag Completer
capability.
Additionally, Section 6.13 and 6.15 of the PCIe Base Spec points out that
following a hot-plug event, clear the ARI Forwarding Enable bit and
AtomicOp Requester Enable as its not determined whether the next device
inserted will support these capabilities. AtomicOp capabilities are not
supported on PCI Express to PCI/PCI-X Bridges and any newly added
component may not be an ARI device.
[1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
https://members.pcisig.com/wg/PCI-SIG/document/16609
Signed-off-by: Smita Koralahalli <[email protected]>
---
v2:
Clear all optional capabilities in Device Control 2 register
instead of individually clearing ARI Forwarding Enable,
AtomicOp Requestor Enable and 10-bit Tag Requestor Enable.
v3:
Restore clearing only ARI, Atomic Op and 10 bit tags as these are
the optional capabilities.
Provide all necessary information in commit description.
Clear register bits of the hotplug port.
---
drivers/pci/hotplug/pciehp_pci.c | 4 ++++
include/uapi/linux/pci_regs.h | 1 +
2 files changed, 5 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index ad12515a4a12..e27fd2bc4ceb 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -102,6 +102,10 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
pci_lock_rescan_remove();
+ pcie_capability_clear_word(ctrl->pcie->port, PCI_EXP_DEVCTL2,
+ (PCI_EXP_DEVCTL2_ARI | PCI_EXP_DEVCTL2_ATOMIC_REQ |
+ PCI_EXP_DEVCTL2_TAG_REQ_EN));
+
/*
* Stopping an SR-IOV PF device removes all the associated VFs,
* which will update the bus->devices list and confuse the
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index dc2000e0fe3a..6fbc47f23d52 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -668,6 +668,7 @@
#define PCI_EXP_DEVCTL2_IDO_REQ_EN 0x0100 /* Allow IDO for requests */
#define PCI_EXP_DEVCTL2_IDO_CMP_EN 0x0200 /* Allow IDO for completions */
#define PCI_EXP_DEVCTL2_LTR_EN 0x0400 /* Enable LTR mechanism */
+#define PCI_EXP_DEVCTL2_TAG_REQ_EN 0x1000 /* Allow 10 Tags for Requester */
#define PCI_EXP_DEVCTL2_OBFF_MSGA_EN 0x2000 /* Enable OBFF Message type A */
#define PCI_EXP_DEVCTL2_OBFF_MSGB_EN 0x4000 /* Enable OBFF Message type B */
#define PCI_EXP_DEVCTL2_OBFF_WAKE_EN 0x6000 /* OBFF using WAKE# signaling */
--
2.17.1
On Wed, Jun 21, 2023 at 06:51:52PM +0000, Smita Koralahalli wrote:
> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -102,6 +102,10 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
>
> pci_lock_rescan_remove();
>
> + pcie_capability_clear_word(ctrl->pcie->port, PCI_EXP_DEVCTL2,
> + (PCI_EXP_DEVCTL2_ARI | PCI_EXP_DEVCTL2_ATOMIC_REQ |
> + PCI_EXP_DEVCTL2_TAG_REQ_EN));
> +
Hm, this will clear the bits while the device may still be present.
Note that the subsequent pci_stop_and_remove_bus_device() will unbind
the driver and may thus cause communication with the device.
Can clearing those bits in the hotplug port hamper communication with
the device?
I'd recommend avoiding that issue altogether by clearing the bits at
the end of the function after the call to pci_unlock_rescan_remove(),
so that negotiated state of the hotplug port gets cleared after all
subordinate devices are de-enumerated.
The commit message doesn't point out that PCI_EXP_DEVCTL2_ARI is
already being taken care of on enumeration of future subordinate
devices in pci_configure_ari() and is only cleared here for good
measure. If you intend to configure 10 bit tags and atomic ops
on enumeration in future patches, I'd recommend omitting
PCI_EXP_DEVCTL2_ARI here and clearing each of the other two bits
in the future patches which configure them on enumeration.
You don't need braces around the "or" operation for the bits.
Thanks,
Lukas
> /*
> * Stopping an SR-IOV PF device removes all the associated VFs,
> * which will update the bus->devices list and confuse the
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index dc2000e0fe3a..6fbc47f23d52 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -668,6 +668,7 @@
> #define PCI_EXP_DEVCTL2_IDO_REQ_EN 0x0100 /* Allow IDO for requests */
> #define PCI_EXP_DEVCTL2_IDO_CMP_EN 0x0200 /* Allow IDO for completions */
> #define PCI_EXP_DEVCTL2_LTR_EN 0x0400 /* Enable LTR mechanism */
> +#define PCI_EXP_DEVCTL2_TAG_REQ_EN 0x1000 /* Allow 10 Tags for Requester */
> #define PCI_EXP_DEVCTL2_OBFF_MSGA_EN 0x2000 /* Enable OBFF Message type A */
> #define PCI_EXP_DEVCTL2_OBFF_MSGB_EN 0x4000 /* Enable OBFF Message type B */
> #define PCI_EXP_DEVCTL2_OBFF_WAKE_EN 0x6000 /* OBFF using WAKE# signaling */
> --
> 2.17.1
On Thu, Jun 22, 2023 at 08:31:05AM +0200, Lukas Wunner wrote:
> On Wed, Jun 21, 2023 at 06:51:52PM +0000, Smita Koralahalli wrote:
> The commit message doesn't point out that PCI_EXP_DEVCTL2_ARI is
> already being taken care of on enumeration of future subordinate
> devices in pci_configure_ari() and is only cleared here for good
> measure. If you intend to configure 10 bit tags and atomic ops
> on enumeration in future patches, I'd recommend omitting
> PCI_EXP_DEVCTL2_ARI here and clearing each of the other two bits
^^^^^^^^
Sorry, I meant to say "dropping". I.e. once you enable or disable
the feature on enumeration and there's no longer a need to
unconditionally disable it on de-enumeration, drop clearing the bit
here.
Thanks,
Lukas
On 6/21/2023 11:31 PM, Lukas Wunner wrote:
> On Wed, Jun 21, 2023 at 06:51:52PM +0000, Smita Koralahalli wrote:
>> --- a/drivers/pci/hotplug/pciehp_pci.c
>> +++ b/drivers/pci/hotplug/pciehp_pci.c
>> @@ -102,6 +102,10 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
>>
>> pci_lock_rescan_remove();
>>
>> + pcie_capability_clear_word(ctrl->pcie->port, PCI_EXP_DEVCTL2,
>> + (PCI_EXP_DEVCTL2_ARI | PCI_EXP_DEVCTL2_ATOMIC_REQ |
>> + PCI_EXP_DEVCTL2_TAG_REQ_EN));
>> +
>
> Hm, this will clear the bits while the device may still be present.
> Note that the subsequent pci_stop_and_remove_bus_device() will unbind
> the driver and may thus cause communication with the device.
> Can clearing those bits in the hotplug port hamper communication with
> the device?
>
> I'd recommend avoiding that issue altogether by clearing the bits at
> the end of the function after the call to pci_unlock_rescan_remove(),
> so that negotiated state of the hotplug port gets cleared after all
> subordinate devices are de-enumerated.
This is a good point. Thanks!
>
> The commit message doesn't point out that PCI_EXP_DEVCTL2_ARI is
> already being taken care of on enumeration of future subordinate
> devices in pci_configure_ari() and is only cleared here for good
> measure. If you intend to configure 10 bit tags and atomic ops
> on enumeration in future patches, I'd recommend omitting
> PCI_EXP_DEVCTL2_ARI here and clearing each of the other two bits
> in the future patches which configure them on enumeration.
Would it be fair to just reuse pci_enable_atomic_ops_to_root() for
Atomic_Ops configuration?
>
> You don't need braces around the "or" operation for the bits.
Sure!
Thanks,
Smita
>
> Thanks,
>
> Lukas
>
>> /*
>> * Stopping an SR-IOV PF device removes all the associated VFs,
>> * which will update the bus->devices list and confuse the
>> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
>> index dc2000e0fe3a..6fbc47f23d52 100644
>> --- a/include/uapi/linux/pci_regs.h
>> +++ b/include/uapi/linux/pci_regs.h
>> @@ -668,6 +668,7 @@
>> #define PCI_EXP_DEVCTL2_IDO_REQ_EN 0x0100 /* Allow IDO for requests */
>> #define PCI_EXP_DEVCTL2_IDO_CMP_EN 0x0200 /* Allow IDO for completions */
>> #define PCI_EXP_DEVCTL2_LTR_EN 0x0400 /* Enable LTR mechanism */
>> +#define PCI_EXP_DEVCTL2_TAG_REQ_EN 0x1000 /* Allow 10 Tags for Requester */
>> #define PCI_EXP_DEVCTL2_OBFF_MSGA_EN 0x2000 /* Enable OBFF Message type A */
>> #define PCI_EXP_DEVCTL2_OBFF_MSGB_EN 0x4000 /* Enable OBFF Message type B */
>> #define PCI_EXP_DEVCTL2_OBFF_WAKE_EN 0x6000 /* OBFF using WAKE# signaling */
>> --
>> 2.17.1
[cc += Jay, Felix]
On Thu, Jun 22, 2023 at 02:02:12PM -0700, Smita Koralahalli wrote:
> Would it be fair to just reuse pci_enable_atomic_ops_to_root() for
> Atomic_Ops configuration?
Hm, that's a good question. I'm not an expert on that corner of
the PCI core.
But indeed what you could try is amend that function to not only
*set* PCI_EXP_DEVCTL2_ATOMIC_REQ if it's supported, but to also
*clear* it if it's not supported.
And you'd have to call pci_enable_atomic_ops_to_root() on enumeration,
e.g. from pci_init_capabilities().
That should obviate the need to call pci_enable_atomic_ops_to_root()
from drivers, so you could probably remove the call from all the
drivers which currently call it (amdgpu, infiniband, mellanox),
in one separate patch per driver.
An then you could drop the EXPORT clause for pci_enable_atomic_ops_to_root()
and make it private to the PCI core.
So that would be 5 patches (enablement/disablement on enumeration,
amendmend of the 3 drivers, making the call private).
I'm not sure if anyone will cry foul if you do that but if you want
to give it a try, go for it. :)
I don't now why commit 430a23689dea, which introduced
pci_enable_atomic_ops_to_root(), chose to add it as a library function
which is only called from specific drivers, instead of universally
enabling the feature for all devices. Adding the commit authors to cc
so they can chime in.
Thanks,
Lukas
On 2023-06-22 23:42, Lukas Wunner wrote:
> [cc += Jay, Felix]
>
> On Thu, Jun 22, 2023 at 02:02:12PM -0700, Smita Koralahalli wrote:
>> Would it be fair to just reuse pci_enable_atomic_ops_to_root() for
>> Atomic_Ops configuration?
> Hm, that's a good question. I'm not an expert on that corner of
> the PCI core.
>
> But indeed what you could try is amend that function to not only
> *set* PCI_EXP_DEVCTL2_ATOMIC_REQ if it's supported, but to also
> *clear* it if it's not supported.
>
> And you'd have to call pci_enable_atomic_ops_to_root() on enumeration,
> e.g. from pci_init_capabilities().
>
> That should obviate the need to call pci_enable_atomic_ops_to_root()
> from drivers, so you could probably remove the call from all the
> drivers which currently call it (amdgpu, infiniband, mellanox),
> in one separate patch per driver.
>
> An then you could drop the EXPORT clause for pci_enable_atomic_ops_to_root()
> and make it private to the PCI core.
Then our driver would need an alternative way to determine whether
atomic capabilities are enabled for a device. We currently use the
return value from pci_enable_atomic_ops_to_root to determine this.
Regards,
Felix
>
> So that would be 5 patches (enablement/disablement on enumeration,
> amendmend of the 3 drivers, making the call private).
>
> I'm not sure if anyone will cry foul if you do that but if you want
> to give it a try, go for it. :)
>
> I don't now why commit 430a23689dea, which introduced
> pci_enable_atomic_ops_to_root(), chose to add it as a library function
> which is only called from specific drivers, instead of universally
> enabling the feature for all devices. Adding the commit authors to cc
> so they can chime in.
>
> Thanks,
>
> Lukas
On Fri, Jun 23, 2023 at 05:59:55AM +0200, Felix Kuehling wrote:
> On 2023-06-22 23:42, Lukas Wunner wrote:
> > On Thu, Jun 22, 2023 at 02:02:12PM -0700, Smita Koralahalli wrote:
> > > Would it be fair to just reuse pci_enable_atomic_ops_to_root() for
> > > Atomic_Ops configuration?
> >
> > Hm, that's a good question. I'm not an expert on that corner of
> > the PCI core.
> >
> > But indeed what you could try is amend that function to not only
> > *set* PCI_EXP_DEVCTL2_ATOMIC_REQ if it's supported, but to also
> > *clear* it if it's not supported.
> >
> > And you'd have to call pci_enable_atomic_ops_to_root() on enumeration,
> > e.g. from pci_init_capabilities().
> >
> > That should obviate the need to call pci_enable_atomic_ops_to_root()
> > from drivers, so you could probably remove the call from all the
> > drivers which currently call it (amdgpu, infiniband, mellanox),
> > in one separate patch per driver.
> >
> > An then you could drop the EXPORT clause for pci_enable_atomic_ops_to_root()
> > and make it private to the PCI core.
>
> Then our driver would need an alternative way to determine whether atomic
> capabilities are enabled for a device. We currently use the return value
> from pci_enable_atomic_ops_to_root to determine this.
Just read PCI_EXP_DEVCTL2 and check whether PCI_EXP_DEVCTL2_ATOMIC_REQ
is set. (I.e. has been set by the PCI core on device enumeration.)
Problem solved, I guess?
Thanks,
Lukas
On 6/22/2023 16:42, Lukas Wunner wrote:
> I don't now why commit 430a23689dea, which introduced
> pci_enable_atomic_ops_to_root(), chose to add it as a library function
> which is only called from specific drivers, instead of universally
> enabling the feature for all devices. Adding the commit authors to cc
> so they can chime in.
IIRC during the initial design discussion on linux-pci this approach was suggested to avoid triggering potential bugs in devices without AtomicOps support. See quote below.
I've no objections to changing it.
On 2016-05-06 10:48, Bjorn Helgaas wrote:
> Once enabled in Device Control 2, a device's use of AtomicOps is
> competely device-specific. In many cases, the device probably doesn't
> support AtomicOps, so enabling them would be a no-op. But there could
> be devices where AtomicOps are nominally supported but untested or
> broken. Even if we didn't change their drivers, those devices could
> start using AtomicOps, so I'm not comfortable with the PCI core
> enabling AtomicOp requests indiscriminately.
On 6/22/2023 2:42 PM, Lukas Wunner wrote:
> [cc += Jay, Felix]
>
> On Thu, Jun 22, 2023 at 02:02:12PM -0700, Smita Koralahalli wrote:
>> Would it be fair to just reuse pci_enable_atomic_ops_to_root() for
>> Atomic_Ops configuration?
>
> Hm, that's a good question. I'm not an expert on that corner of
> the PCI core.
>
> But indeed what you could try is amend that function to not only
> *set* PCI_EXP_DEVCTL2_ATOMIC_REQ if it's supported, but to also
> *clear* it if it's not supported.
>
> And you'd have to call pci_enable_atomic_ops_to_root() on enumeration,
> e.g. from pci_init_capabilities().
>
> That should obviate the need to call pci_enable_atomic_ops_to_root()
> from drivers, so you could probably remove the call from all the
> drivers which currently call it (amdgpu, infiniband, mellanox),
> in one separate patch per driver.
>
> An then you could drop the EXPORT clause for pci_enable_atomic_ops_to_root()
> and make it private to the PCI core.
>
> So that would be 5 patches (enablement/disablement on enumeration,
> amendmend of the 3 drivers, making the call private).
>
> I'm not sure if anyone will cry foul if you do that but if you want
> to give it a try, go for it. :)
Okay, I see there are no objections except for Bjorn/Jay's comments on
"But there could be devices where AtomicOps are nominally supported but
untested or broken.."
Would this be an issue?
If not, I will start working on those 5 patches.
Thanks,
Smita
>
> I don't now why commit 430a23689dea, which introduced
> pci_enable_atomic_ops_to_root(), chose to add it as a library function
> which is only called from specific drivers, instead of universally
> enabling the feature for all devices. Adding the commit authors to cc
> so they can chime in.
>
> Thanks,
>
> Lukas
On Tue, Jun 27, 2023 at 10:38:54AM -0700, Smita Koralahalli wrote:
> Okay, I see there are no objections except for Bjorn/Jay's comments on
>
> "But there could be devices where AtomicOps are nominally supported but
> untested or broken.."
>
> Would this be an issue?
I think you said that BIOS enables AtomicOps on certain AMD machines?
Or did that observation only apply to 10 Bit tags?
If BIOS does enable AtomicOps, it would be interesting to know which
logic BIOS follows, i.e. how does it determine whether to set
AtomicOp Requester Enable on Endpoints?
It would also be interesting to know how far that BIOS has proliferated,
i.e. how much experience with various Endpoint devices exists in the
real world. If it turns out that BIOS has enabled the feature for
years on a wide range of Endpoints without any issues, I think
that would render concerns mute that enabling it in the kernel
might cause regressions.
I don't know why the spec says that "discovery of AtomicOp Requester
capabilities is outside the scope of this specification". I imagine
it would be possible to set AtomicOp Requester Enable, then read it
to determine whether the bit is now indeed 1 or hard-wired to 0.
In the latter case, AtomicOp Requester capabilities can be assumed
to be absent. So that would be a way to make do without any other
specific discovery of AtomicOp Requester capabilities.
Thanks,
Lukas
On 6/28/2023 6:25 AM, Lukas Wunner wrote:
> On Tue, Jun 27, 2023 at 10:38:54AM -0700, Smita Koralahalli wrote:
>> Okay, I see there are no objections except for Bjorn/Jay's comments on
>>
>> "But there could be devices where AtomicOps are nominally supported but
>> untested or broken.."
>>
>> Would this be an issue?
>
> I think you said that BIOS enables AtomicOps on certain AMD machines?
> Or did that observation only apply to 10 Bit tags?
Yes, that observation right now applies only to 10 bit tags.
>
> If BIOS does enable AtomicOps, it would be interesting to know which
> logic BIOS follows, i.e. how does it determine whether to set
> AtomicOp Requester Enable on Endpoints?
I agree this is a very good thing to know. I have followed up with the
BIOS team to get some pointers on this. I will get back to you soon.
>
> It would also be interesting to know how far that BIOS has proliferated,
> i.e. how much experience with various Endpoint devices exists in the
> real world. If it turns out that BIOS has enabled the feature for
> years on a wide range of Endpoints without any issues, I think
> that would render concerns mute that enabling it in the kernel
> might cause regressions.
>
> I don't know why the spec says that "discovery of AtomicOp Requester
> capabilities is outside the scope of this specification". I imagine
> it would be possible to set AtomicOp Requester Enable, then read it
> to determine whether the bit is now indeed 1 or hard-wired to 0.
> In the latter case, AtomicOp Requester capabilities can be assumed
> to be absent. So that would be a way to make do without any other
> specific discovery of AtomicOp Requester capabilities.
>
> Thanks,
>
> Lukas