2024-03-04 19:34:04

by Smita Koralahalli

Subject: Re: [PATCH pci-next] pci/edr: Ignore Surprise Down error on hot removal

Hi Ethan,

On 3/4/2024 3:58 AM, Lukas Wunner wrote:
> On Mon, Mar 04, 2024 at 04:08:19AM -0500, Ethan Zhao wrote:
>> Per PCI firmware spec r3.3 sec 4.6.12, for firmware first mode DPC
>> handling path, FW should clear UC errors logged by port and bring link
>> out of DPC, but because of ambiguity of wording in the spec, some BIOSes
>> doesn't clear the surprise down error and the error bits in pci status,
>> still notify OS to handle it. thus following trick is needed in EDR when
>> double reporting (hot removal interrupt && dpc notification) is hit.

Please correct me if I'm wrong.

When there is double reporting (hot removal interrupt && DPC
notification), won't the DPC handler always be called, and doesn't it
already take care of clearing the Surprise Down errors? Do we need to
do it again from the EDR handler?

Thanks
Smita

>
> Please provide more detailed information about the hardware and BIOS
> affected by this.
>
>
>> -static void dpc_handle_surprise_removal(struct pci_dev *pdev)
>> +bool dpc_handle_surprise_removal(struct pci_dev *pdev)
>> {
>> + if (!dpc_is_surprise_removal(pdev))
>> + return false;
>
> This change of moving dpc_is_surprise_removal() into
> dpc_handle_surprise_removal() seems unrelated to the problem at hand.
>
> Please drop it if it's unnecessary to fix the issue.
>
>
>> --- a/drivers/pci/pcie/edr.c
>> +++ b/drivers/pci/pcie/edr.c
>> @@ -184,6 +184,9 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
>> goto send_ost;
>> }
>>
>> + if (dpc_handle_surprise_removal(edev))
>> + goto send_ost;
>> +
>> dpc_process_error(edev);
>> pci_aer_raw_clear_status(edev);
>
> This seems to be the only necessary change. Please reduce the
> patch to contain only it and no other refactoring.
>
> Please capitalize the "PCI/EDR: " prefix in the subject and add
> a Fixes tag.
>
> Thanks,
>
> Lukas
>


2024-03-05 02:19:36

by Ethan Zhao

Subject: Re: [PATCH pci-next] pci/edr: Ignore Surprise Down error on hot removal

On 3/5/2024 3:33 AM, Smita Koralahalli wrote:
> Hi Ethan,
>
> On 3/4/2024 3:58 AM, Lukas Wunner wrote:
>> On Mon, Mar 04, 2024 at 04:08:19AM -0500, Ethan Zhao wrote:
>>> Per PCI firmware spec r3.3 sec 4.6.12, for firmware first mode DPC
>>> handling path, FW should clear UC errors logged by port and bring link
>>> out of DPC, but because of ambiguity of wording in the spec, some BIOSes
>>> doesn't clear the surprise down error and the error bits in pci status,
>>> still notify OS to handle it. thus following trick is needed in EDR when
>>> double reporting (hot removal interrupt && dpc notification) is hit.
>
> Please correct me if I'm wrong.
>
> When there is double reporting (hot removal interrupt && dpc
> notification), won't the DPC handler be called always which takes care
> of clearing the surprise down errors? Do we need it again from EDR
> handler?

My understanding is that when firmware-first mode is enabled, the DPC
driver wouldn't be enabled and the OS is notified through EDR instead.
EDR does reuse some of the common DPC functions (for example,
edr_handle_event() calls dpc_process_error()), but dpc_handler() isn't
called in that mode, and neither is dpc_handle_surprise_removal().
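
To illustrate the difference between the two paths, here is a minimal
user-space sketch. It is not the kernel code: struct mock_port and every
mock_* function are hypothetical stand-ins, the surprise-removal check is
simplified to two flags, and the with_patch argument only models the
early return proposed in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; tracks only what the sketch needs. */
struct mock_port {
	bool is_hotplug_bridge;     /* port supports hotplug */
	bool surprise_down_logged;  /* Surprise Down bit still set in UC status */
	bool recovery_attempted;    /* error processing/recovery path was taken */
};

/* Simplified model of the check done by dpc_is_surprise_removal(). */
static bool mock_is_surprise_removal(const struct mock_port *p)
{
	assert(p != NULL);
	return p->is_hotplug_bridge && p->surprise_down_logged;
}

/* Simplified model of dpc_handle_surprise_removal(): clear the logged error. */
static void mock_handle_surprise_removal(struct mock_port *p)
{
	assert(p != NULL);
	p->surprise_down_logged = false;
}

/* Simplified model of dpc_process_error() plus the recovery that follows. */
static void mock_process_error(struct mock_port *p)
{
	assert(p != NULL);
	p->recovery_attempted = true;
}

/* OS-native mode, modeled on dpc_handler(): surprise removal is filtered. */
void mock_native_dpc_path(struct mock_port *p)
{
	if (mock_is_surprise_removal(p)) {
		mock_handle_surprise_removal(p);
		return;
	}
	mock_process_error(p);
}

/*
 * Firmware-first mode, modeled on edr_handle_event(). dpc_handler() is not
 * involved here, so without the proposed check (with_patch == false) a
 * surprise removal falls through to normal error processing.
 */
void mock_edr_path(struct mock_port *p, bool with_patch)
{
	if (with_patch && mock_is_surprise_removal(p)) {
		mock_handle_surprise_removal(p);
		return;
	}
	mock_process_error(p);
}
```

In this model, a hot-removed port that reaches mock_edr_path() with
with_patch == false is processed as an ordinary error and keeps its
Surprise Down bit, whereas the native path and the patched EDR path both
clear the bit and skip error processing.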

Thanks,
Ethan

>
> Thanks
> Smita
>
>>
>> Please provide more detailed information about the hardware and BIOS
>> affected by this.
>>
>>
>>> -static void dpc_handle_surprise_removal(struct pci_dev *pdev)
>>> +bool  dpc_handle_surprise_removal(struct pci_dev *pdev)
>>>   {
>>> +    if (!dpc_is_surprise_removal(pdev))
>>> +        return false;
>>
>> This change of moving dpc_is_surprise_removal() into
>> dpc_handle_surprise_removal() seems unrelated to the problem at hand.
>>
>> Please drop it if it's unnecessary to fix the issue.
>>
>>
>>> --- a/drivers/pci/pcie/edr.c
>>> +++ b/drivers/pci/pcie/edr.c
>>> @@ -184,6 +184,9 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
>>>           goto send_ost;
>>>       }
>>> +    if (dpc_handle_surprise_removal(edev))
>>> +        goto send_ost;
>>> +
>>>       dpc_process_error(edev);
>>>       pci_aer_raw_clear_status(edev);
>>
>> This seems to be the only necessary change.  Please reduce the
>> patch to contain only it and no other refactoring.
>>
>> Please capitalize the "PCI/EDR: " prefix in the subject and add
>> a Fixes tag.
>>
>> Thanks,
>>
>> Lukas
>>