If a downstream port has a PCIe switch connected to it, the enumeration
process leaves the 'Received Master Abort' bit set in the Secondary
Status Register of the downstream port because Unsupported Requests
(URs) occur in the downstream hierarchy. Since ownership of the
Secondary Status Register always lies with the OS, including on systems
that use the Firmware-First approach to error handling [1], clear the
error status bits in the Secondary Status Register after enumeration.
[1] https://lore.kernel.org/all/[email protected]/T/
Signed-off-by: Vidya Sagar <[email protected]>
---
drivers/pci/probe.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 43159965e09e..edf8202465d8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1470,6 +1470,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
}
out:
+ /* Clear errors in the Secondary Status Register */
+ pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff);
+
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);
pm_runtime_put(&dev->dev);
--
2.25.1
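The patch relies on the error bits of the Secondary Status register being write-1-to-clear, which is why writing 0xffff clears whatever was logged. The following is a minimal user-space sketch of that behaviour, not kernel code: the `model_*` names, the struct, and the choice of mask are assumptions made for illustration, and the mask covers only the standard error bits (8 and 11 through 15) of a Type 1 header.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 13 of the Secondary Status register: Received Master Abort. */
#define MODEL_REC_MASTER_ABORT	0x2000u
/* Assumed write-1-to-clear error bits: 8 and 11..15. Other bits are read-only here. */
#define MODEL_RW1C_MASK		0xf900u

/* Hypothetical software model of a bridge's Secondary Status register. */
struct sec_status_model {
	uint16_t value;
};

/* Hardware side of the model: latch one or more error bits. */
static void model_log_error(struct sec_status_model *reg, uint16_t bits)
{
	/* Only error bits can be latched in this model. */
	assert((bits & ~MODEL_RW1C_MASK) == 0);
	reg->value |= bits;
}

/* Software side of the model: a config write with write-1-to-clear semantics. */
static void model_write_rw1c(struct sec_status_model *reg, uint16_t written)
{
	/* Each 1 written to an error bit clears it; read-only bits ignore the write. */
	reg->value &= (uint16_t)~(written & MODEL_RW1C_MASK);
}
```

In this model, writing 0xffff after an error has been latched clears the error bits and leaves any read-only bits unchanged, which mirrors the intent of the added `pci_write_config_word()` call.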
Hi Bjorn,
Do you have any comments for this patch?
Thanks,
Vidya Sagar
On Thu, Jan 04, 2024 at 07:02:29AM +0530, Vidya Sagar wrote:
> If a downstream port has a PCIe switch connected to it, the enumeration
> process leaves the 'Received Master Abort' bit set in the Secondary
> Status Register of the downstream port because of the Unsupported
> Requests (URs) take place in the downstream hierarchy. Since the
> ownership of Secondary Status Register always lies with the OS including
> systems with Firmware-First approach for error handling[1], clear the
> error status bits in the Secondary Status Register post enumeration.
I would expect these URs to happen when enumerating below *all* PCIe
Root Ports (not just when switches are present), and Master Aborts
should happen in conventional PCI.
Similarly, I don't think Firmware-First is relevant here. Only the
fact that the OS owns PCI_SEC_STATUS because there's no mechanism to
negotiate for platform ownership of it.
We're in the merge window right now, so we'll start merging v6.9
material after v6.8-rc1 is tagged.
On 1/12/2024 10:36 PM, Bjorn Helgaas wrote:
> On Thu, Jan 04, 2024 at 07:02:29AM +0530, Vidya Sagar wrote:
>> If a downstream port has a PCIe switch connected to it, the enumeration
>> process leaves the 'Received Master Abort' bit set in the Secondary
>> Status Register of the downstream port because of the Unsupported
>> Requests (URs) take place in the downstream hierarchy. Since the
>> ownership of Secondary Status Register always lies with the OS including
>> systems with Firmware-First approach for error handling[1], clear the
>> error status bits in the Secondary Status Register post enumeration.
>
> I would expect these URs to happen when enumerating below *all* PCIe
> Root Ports (not just when switches are present), and Master Aborts
> should happen in conventional PCI.
Agreed.
I misunderstood this, which is why I said that the 'Received Master
Abort' bit gets set only if there is a PCIe switch connected
downstream. I'll correct it in my next patch.
>
> Similarly, I don't think Firmware-First is relevant here. Only the
> fact that the OS owns PCI_SEC_STATUS because there's no mechanism to
> negotiate for platform ownership of it.
I mentioned Firmware-First as a continuation of the discussion we had
in [1]. But I agree that, since this is a standalone patch, there is no
need to mention the Firmware-First flow.
The enumeration process leaves the 'Received Master Abort' bit set in
the Secondary Status Register of the downstream port in the following
scenarios.
(1) The device connected to the downstream port has ARI capability
and that makes the kernel set the 'ARI Forwarding Enable' bit in
the Device Control 2 Register of the downstream port. This
effectively makes the downstream port forward the configuration
requests targeting the devices downstream of it, even though they
don't exist in reality. This causes the downstream devices to return
completions with UR status, which in turn sets the 'Received
Master Abort' bit.
In contrast, if the downstream device doesn't have ARI capability,
the 'ARI Forwarding Enable' bit in the downstream port is not set
and any configuration requests targeting the downstream devices
that don't exist are terminated (section 6.13 of PCI Express Base
6.0 spec) in the downstream port itself resulting in no change of
the 'Received Master Abort' bit.
(2) A PCIe switch is connected to the downstream port and when the
enumeration flow tries to explore the presence of devices that
don't really exist downstream of the switch, the downstream
port receives completions with UR status, which sets the 'Received
Master Abort' bit.
Clear the 'Received Master Abort' bit to keep the bridge device in a clean
state after enumeration.
Signed-off-by: Vidya Sagar <[email protected]>
---
V2:
* Changed commit message based on Bjorn's feedback
drivers/pci/probe.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 795534589b98..640d2871b061 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1470,6 +1470,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
}
out:
+ /* Clear errors in the Secondary Status Register */
+ pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff);
+
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);
pm_runtime_put(&dev->dev);
--
2.25.1
On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote:
> The enumeration process leaves the 'Received Master Abort' bit set in
> the Secondary Status Register of the downstream port in the following
> scenarios.
>
> (1) The device connected to the downstream port has ARI capability
> and that makes the kernel set the 'ARI Forwarding Enable' bit in
> the Device Control 2 Register of the downstream port. This
> effectively makes the downstream port forward the configuration
> requests targeting the devices downstream of it, even though they
> don't exist in reality. It causes the downstream devices return
> completions with UR set in the status in turn causing 'Received
> Master Abort' bit set.
>
> In contrast, if the downstream device doesn't have ARI capability,
> the 'ARI Forwarding Enable' bit in the downstream port is not set
> and any configuration requests targeting the downstream devices
> that don't exist are terminated (section 6.13 of PCI Express Base
> 6.0 spec) in the downstream port itself resulting in no change of
> the 'Received Master Abort' bit.
>
> (2) A PCIe switch is connected to the downstream port and when the
> enumeration flow tries to explore the presence of devices that
> don't really exist downstream of the switch, the downstream
> port receives the completions with UR set causing the 'Received
> Master Abort' bit set.
Are these the only possible ways this error is logged? I expected
them to be logged when we enumerate below a Root Port that has nothing
attached, for example.
Does clearing them in pci_scan_bridge_extend() cover all ways this
error might be logged during enumeration? I can't remember whether
all enumeration goes through this path.
On 23-01-2024 04:30, Bjorn Helgaas wrote:
> On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote:
>> The enumeration process leaves the 'Received Master Abort' bit set in
>> the Secondary Status Register of the downstream port in the following
>> scenarios.
>>
>> (1) The device connected to the downstream port has ARI capability
>> and that makes the kernel set the 'ARI Forwarding Enable' bit in
>> the Device Control 2 Register of the downstream port. This
>> effectively makes the downstream port forward the configuration
>> requests targeting the devices downstream of it, even though they
>> don't exist in reality. It causes the downstream devices return
>> completions with UR set in the status in turn causing 'Received
>> Master Abort' bit set.
>>
>> In contrast, if the downstream device doesn't have ARI capability,
>> the 'ARI Forwarding Enable' bit in the downstream port is not set
>> and any configuration requests targeting the downstream devices
>> that don't exist are terminated (section 6.13 of PCI Express Base
>> 6.0 spec) in the downstream port itself resulting in no change of
>> the 'Received Master Abort' bit.
>>
>> (2) A PCIe switch is connected to the downstream port and when the
>> enumeration flow tries to explore the presence of devices that
>> don't really exist downstream of the switch, the downstream
>> port receives the completions with UR set causing the 'Received
>> Master Abort' bit set.
> Are these the only possible ways this error is logged? I expected
> them to be logged when we enumerate below a Root Port that has nothing
> attached, for example.
In this case, there won't be any TLP sent downstream. I covered this
scenario in the second paragraph of point (1) above.
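To illustrate the forwarding rule that point (1) refers to, here is a small user-space sketch. It models only the decision a downstream port makes about a configuration request aimed at its secondary bus: without ARI forwarding the port enforces Device Number 0, and with ARI forwarding the whole 8-bit field is treated as a function number. The `model_*` names are hypothetical, and the sketch deliberately does not model what happens to the status bits afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state a downstream port uses for this decision. */
struct model_port {
	bool ari_forwarding;	/* models 'ARI Forwarding Enable' in Device Control 2 */
};

/* Build a traditional 8-bit devfn: device in bits 7:3, function in bits 2:0. */
static uint8_t model_devfn(unsigned int device, unsigned int function)
{
	assert(device < 32 && function < 8);
	return (uint8_t)((device << 3) | function);
}

/* Return true if the port would forward the request to the link below it. */
static bool model_port_forwards(const struct model_port *port, uint8_t devfn)
{
	unsigned int device = devfn >> 3;

	/* With ARI forwarding, all 8 bits are a function number of device 0. */
	if (port->ari_forwarding)
		return true;

	/* Otherwise only Device 0 exists on the link below the port. */
	return device == 0;
}
```

With `ari_forwarding` false, a probe of device 1 is not forwarded in this model, so nothing downstream gets a chance to respond with UR; with it true, the same probe reaches the attached device.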
> Does clearing them in pci_scan_bridge_extend() cover all ways this
> error might be logged during enumeration? I can't remember whether
> all enumeration goes through this path.
So far in my testing, clearing it in pci_scan_bridge_extend() covers all
the cases.
Hi Bjorn,
Just checking on this thread.
Is there anything else you would like me to clarify?
Thanks,
Vidya Sagar
Hi Bjorn,
Sorry to bug you.
Is this change ready to be accepted?
Thanks,
Vidya Sagar
On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote:
> The enumeration process leaves the 'Received Master Abort' bit set in
> the Secondary Status Register of the downstream port in the following
> scenarios.
>
> (1) The device connected to the downstream port has ARI capability
> and that makes the kernel set the 'ARI Forwarding Enable' bit in
> the Device Control 2 Register of the downstream port. This
> effectively makes the downstream port forward the configuration
> requests targeting the devices downstream of it, even though they
> don't exist in reality. It causes the downstream devices return
> completions with UR set in the status in turn causing 'Received
> Master Abort' bit set.
>
> In contrast, if the downstream device doesn't have ARI capability,
> the 'ARI Forwarding Enable' bit in the downstream port is not set
> and any configuration requests targeting the downstream devices
> that don't exist are terminated (section 6.13 of PCI Express Base
> 6.0 spec) in the downstream port itself resulting in no change of
> the 'Received Master Abort' bit.
>
> (2) A PCIe switch is connected to the downstream port and when the
> enumeration flow tries to explore the presence of devices that
> don't really exist downstream of the switch, the downstream
> port receives the completions with UR set causing the 'Received
> Master Abort' bit set.
>
> Clear 'Received Master Abort' bit to keep the bridge device in a clean
> state post enumeration.
>
> Signed-off-by: Vidya Sagar <[email protected]>
Applied to pci/enumeration for v6.10, thanks!
I shortened the commit log because I think this happens all the time,
not just in the specific cases you mentioned above:
PCI: Clear Secondary Status errors after enumeration
We enumerate devices by attempting config reads to the Vendor ID of each
possible device. On conventional PCI, if no device responds, the read
terminates with a Master Abort (PCI r3.0, sec 6.1). On PCIe, the config
read is terminated as an Unsupported Request (PCIe r6.0, sec 2.3.2,
7.5.1.3.7). In either case, if the read addressed a device below a bridge,
it is logged by setting "Received Master Abort" in the bridge Secondary
Status register.
Clear any errors logged in the Secondary Status register after enumeration.
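The mechanism in this commit log can be sketched as a small user-space model. This is an illustration under stated assumptions, not the kernel's enumeration code: the `model_*` names and the struct are hypothetical, an absent device is represented by a Vendor ID of 0xffff, and the clear step treats every bit of the modelled register as write-1-to-clear.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_SLOTS		32u	/* device numbers 0..31 on one bus */
#define MODEL_NO_DEVICE		0xffffu	/* value read when nothing responds */
#define MODEL_REC_MASTER_ABORT	0x2000u	/* bit 13 of Secondary Status */

/* Hypothetical model of a bridge and the bus directly below it. */
struct model_bridge {
	uint16_t sec_status;
	uint16_t vendor_id[MODEL_NUM_SLOTS];
};

/* Start with an empty secondary bus and a clean status register. */
static void model_bridge_init(struct model_bridge *br)
{
	unsigned int i;

	br->sec_status = 0;
	for (i = 0; i < MODEL_NUM_SLOTS; i++)
		br->vendor_id[i] = MODEL_NO_DEVICE;
}

/* Probe one slot; an unanswered read is logged in the bridge. */
static uint16_t model_read_vendor_id(struct model_bridge *br, unsigned int devnum)
{
	assert(devnum < MODEL_NUM_SLOTS);
	if (br->vendor_id[devnum] == MODEL_NO_DEVICE)
		br->sec_status |= MODEL_REC_MASTER_ABORT;
	return br->vendor_id[devnum];
}

/* Probe every slot and return the number of devices that responded. */
static unsigned int model_scan_slots(struct model_bridge *br)
{
	unsigned int devnum, found = 0;

	for (devnum = 0; devnum < MODEL_NUM_SLOTS; devnum++)
		if (model_read_vendor_id(br, devnum) != MODEL_NO_DEVICE)
			found++;
	return found;
}

/* Model of writing 0xffff to the status register after the scan. */
static void model_clear_sec_status(struct model_bridge *br)
{
	br->sec_status &= (uint16_t)~0xffffu;
}
```

Scanning a bus with a single device in slot 0 finds one device and still leaves the Received Master Abort bit set because of the 31 empty slots; calling `model_clear_sec_status()` afterwards returns the modelled register to zero.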