2021-05-25 13:22:44

by Krzysztof Wilczyński

[permalink] [raw]
Subject: Re: [PATCH] pci: add pci_dev_is_alive API

Hi Lambert,

Thank you for sending the patch over!

To match the style of other patches you'd need to capitalise "PCI" in
the subject, see the following for some examples:

$ git log --oneline drivers/pci/pci.c

Also, it might be worth mentioning in the subject that this is a new API
that will be added.

> Device drivers use this API to proactively check if the device
> is alive or not. It is helpful for some PCI devices to detect
> surprise removal and do recovery when Hotplug function is disabled.
>
> Note: Device in power states larger than D0 is also treated not alive
> by this function.
[...]

Question to you: do you have any particular users of this new API in
mind? Or is this solving some problem you've seen and/or reported via
the kernel Bugzilla?

> +/**
> + * pci_dev_is_alive - check the pci device is alive or not
> + * @pdev: the PCI device

That would be "PCI" in the function brief above. Also, try to be
consistent and capitalise everything plus add missing periods, see the
following for an example on how to write kernel-doc:

https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#function-documentation

> + * Device drivers use this API to proactively check if the device
> + * is alive or not. It is helpful for some PCI devices to detect
> + * surprise removal and do recovery when Hotplug function is disabled.

As per my question above - what users of this new API do you have in
mind? Are they any other patches pending adding users of this API?

> + * Note: Device in power state larger than D0 is also treated not alive
> + * by this function.
> + *
> + * Returns true if the device is alive.
> + */
> +bool pci_dev_is_alive(struct pci_dev *pdev)
> +{
> + u16 vendor;
> +
> + pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);
> +
> + return vendor == pdev->vendor;
> +}
> +EXPORT_SYMBOL(pci_dev_is_alive);

Why not use the EXPORT_SYMBOL_GPL()?

Krzysztof


2021-05-26 07:56:09

by Lambert Wang

[permalink] [raw]
Subject: Re: [PATCH] pci: add pci_dev_is_alive API

Hi Krzysztof and Bjorn,

Thanks for your comments and your time.Your questions are answered inline.

I have checked and tested *pci_device_is_present* as pointed out by Bjorn.
I think that API could satisfy my needs.

So this patch request could be dropped. Sorry for the inconvenience.

On Tue, May 25, 2021 at 9:20 PM Krzysztof Wilczyński <[email protected]> wrote:
>
> Hi Lambert,
>
> Thank you for sending the patch over!
>
> To match the style of other patches you'd need to capitalise "PCI" in
> the subject, see the following for some examples:
>
> $ git log --oneline drivers/pci/pci.c
>
> Also, it might be worth mentioning in the subject that this is a new API
> that will be added.
>

I will be careful on the styles of patch title and description in my
future patches.

> > Device drivers use this API to proactively check if the device
> > is alive or not. It is helpful for some PCI devices to detect
> > surprise removal and do recovery when Hotplug function is disabled.
> >
> > Note: Device in power states larger than D0 is also treated not alive
> > by this function.
> [...]
>
> Question to you: do you have any particular users of this new API in
> mind? Or is this solving some problem you've seen and/or reported via
> the kernel Bugzilla?
>

The user is our new PCI driver under development for WWAN devices .
Surprise removal could happen under multiple circumstances.
e.g. Exception, Link Failure, etc.

We wanted this API to detect surprise removal or check device recovery
when AER and Hotplug are disabled.

I thought the API could be commonly used for many similar devices.

> > +/**
> > + * pci_dev_is_alive - check the pci device is alive or not
> > + * @pdev: the PCI device
>
> That would be "PCI" in the function brief above. Also, try to be
> consistent and capitalise everything plus add missing periods, see the
> following for an example on how to write kernel-doc:
>
> https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#function-documentation
>
> > + * Device drivers use this API to proactively check if the device
> > + * is alive or not. It is helpful for some PCI devices to detect
> > + * surprise removal and do recovery when Hotplug function is disabled.
>
> As per my question above - what users of this new API do you have in
> mind? Are they any other patches pending adding users of this API?
>
> > + * Note: Device in power state larger than D0 is also treated not alive
> > + * by this function.
> > + *
> > + * Returns true if the device is alive.
> > + */
> > +bool pci_dev_is_alive(struct pci_dev *pdev)
> > +{
> > + u16 vendor;
> > +
> > + pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);
> > +
> > + return vendor == pdev->vendor;
> > +}
> > +EXPORT_SYMBOL(pci_dev_is_alive);
>
> Why not use the EXPORT_SYMBOL_GPL()?
>
> Krzysztof

2021-05-26 16:25:05

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] pci: add pci_dev_is_alive API

On Wed, May 26, 2021 at 02:12:38PM +0800, Lambert Wang wrote:
> ...
> The user is our new PCI driver under development for WWAN devices .
> Surprise removal could happen under multiple circumstances.
> e.g. Exception, Link Failure, etc.
>
> We wanted this API to detect surprise removal or check device recovery
> when AER and Hotplug are disabled.
>
> I thought the API could be commonly used for many similar devices.

Be careful with this. pci_device_is_present() is not a good way to
detect surprise removal. Surprise removal can happen at any time, for
example, it can occur after you call pci_device_is_present() but
before you use the result:

present = pci_device_is_present(pdev);
/* present == true */
/* device may be removed here */
if (present)
xxx; /* this operation may fail */

You have to assume that *any* operation on the device can fail because
the device has been removed. In general, there's no response for a
PCIe write to the device, so you can't really check whether a write
has failed.

There *are* responses for reads, of course, if the device has been
removed, a read will cause a failure response. Most PCIe controllers
turn that response into ~0 data to satisfy the read. So the only
reliable way to detect surprise removal is to check for ~0 data when
doing an MMIO read from the device. Of course, ~0 may be either valid
data or a symptom of a failure response, so you may have to do
additional work to distinguish those two cases.

Bjorn

2021-05-26 19:44:02

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH] pci: add pci_dev_is_alive API

On Wed, May 26, 2021 at 02:12:38PM +0800, Lambert Wang wrote:
> The user is our new PCI driver under development for WWAN devices .
> Surprise removal could happen under multiple circumstances.
> e.g. Exception, Link Failure, etc.
>
> We wanted this API to detect surprise removal or check device recovery
> when AER and Hotplug are disabled.

You may want to take a look at pci_dev_is_disconnected().

Be aware of its limitations, which Bjorn has already pointed out
and which are discussed in more detail under the following link
in the "Surprise removal" section:

https://lwn.net/Articles/767885/

Thanks,

Lukas

2021-05-27 04:06:38

by Lambert Wang

[permalink] [raw]
Subject: Re: [PATCH] pci: add pci_dev_is_alive API

On Thu, May 27, 2021 at 12:23 AM Bjorn Helgaas <[email protected]> wrote:
>
> On Wed, May 26, 2021 at 02:12:38PM +0800, Lambert Wang wrote:
> > ...
> > The user is our new PCI driver under development for WWAN devices .
> > Surprise removal could happen under multiple circumstances.
> > e.g. Exception, Link Failure, etc.
> >
> > We wanted this API to detect surprise removal or check device recovery
> > when AER and Hotplug are disabled.
> >
> > I thought the API could be commonly used for many similar devices.
>
> Be careful with this. pci_device_is_present() is not a good way to
> detect surprise removal. Surprise removal can happen at any time, for
> example, it can occur after you call pci_device_is_present() but
> before you use the result:
>
> present = pci_device_is_present(pdev);
> /* present == true */
> /* device may be removed here */
> if (present)
> xxx; /* this operation may fail */
>
> You have to assume that *any* operation on the device can fail because
> the device has been removed. In general, there's no response for a
> PCIe write to the device, so you can't really check whether a write
> has failed.
>
> There *are* responses for reads, of course, if the device has been
> removed, a read will cause a failure response. Most PCIe controllers
> turn that response into ~0 data to satisfy the read. So the only
> reliable way to detect surprise removal is to check for ~0 data when
> doing an MMIO read from the device. Of course, ~0 may be either valid
> data or a symptom of a failure response, so you may have to do
> additional work to distinguish those two cases.

Thanks for reminding. :)

Yes the check has race conditions. When the driver is doing recovery detection,
the check result is not reliable.

It is pretty useful when the driver wants to confirm if the device is
absent *after*
the driver finds it's not working as expected.

>
> Bjorn

2021-05-27 04:09:27

by Lambert Wang

[permalink] [raw]
Subject: Re: [PATCH] pci: add pci_dev_is_alive API

On Thu, May 27, 2021 at 2:18 AM Lukas Wunner <[email protected]> wrote:
>
> On Wed, May 26, 2021 at 02:12:38PM +0800, Lambert Wang wrote:
> > The user is our new PCI driver under development for WWAN devices .
> > Surprise removal could happen under multiple circumstances.
> > e.g. Exception, Link Failure, etc.
> >
> > We wanted this API to detect surprise removal or check device recovery
> > when AER and Hotplug are disabled.
>
> You may want to take a look at pci_dev_is_disconnected().
>
> Be aware of its limitations, which Bjorn has already pointed out
> and which are discussed in more detail under the following link
> in the "Surprise removal" section:
>
> https://lwn.net/Articles/767885/
>

Thanks for the suggestion and the article. Currently I prefer
pci_device_is_present() for my scenario.

e.g. pci_dev_is_disconnected() seems to use a cached value. If the
driver wants to check the device's absence
after it *senses* something abnormal, pci_device_is_present() is more suitable.

> Thanks,
>
> Lukas