2021-07-20 15:18:38

by Niklas Schnelle

[permalink] [raw]
Subject: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h

The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
PCI arch code of both s390 and powerpc leading to awkward relative
includes. Move it to the global include/linux/pci.h and get rid of these
includes just for that one function.

Signed-off-by: Niklas Schnelle <[email protected]>
---
Since v1 (and bad v2):
- Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
defines and also move these to include/linux/pci.h

arch/powerpc/platforms/powernv/pci-sriov.c | 3 ---
arch/powerpc/platforms/pseries/setup.c | 1 -
arch/s390/pci/pci_sysfs.c | 2 --
drivers/pci/hotplug/acpiphp_glue.c | 1 -
drivers/pci/pci.h | 15 ---------------
include/linux/pci.h | 15 +++++++++++++++
6 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
index 28aac933a439..2e0ca5451e85 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -9,9 +9,6 @@

#include "pci.h"

-/* for pci_dev_is_added() */
-#include "../../../../drivers/pci/pci.h"
-
/*
* The majority of the complexity in supporting SR-IOV on PowerNV comes from
* the need to put the MMIO space for each VF into a separate PE. Internally
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 631a0d57b6cd..17585ec9f955 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -74,7 +74,6 @@
#include <asm/hvconsole.h>

#include "pseries.h"
-#include "../../../../drivers/pci/pci.h"

DEFINE_STATIC_KEY_FALSE(shared_processor);
EXPORT_SYMBOL_GPL(shared_processor);
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 6e2450c2b9c1..8dbe54ef8f8e 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -13,8 +13,6 @@
#include <linux/stat.h>
#include <linux/pci.h>

-#include "../../../drivers/pci/pci.h"
-
#include <asm/sclp.h>

#define zpci_attr(name, fmt, member) \
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index f031302ad401..4cb963f88183 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -38,7 +38,6 @@
#include <linux/slab.h>
#include <linux/acpi.h>

-#include "../pci.h"
#include "acpiphp.h"

static LIST_HEAD(bridge_list);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 93dcdd431072..a159cd0f6f05 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
return dev->error_state == pci_channel_io_perm_failure;
}

-/* pci_dev priv_flags */
-#define PCI_DEV_ADDED 0
-#define PCI_DPC_RECOVERED 1
-#define PCI_DPC_RECOVERING 2
-
-static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
-{
- assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added);
-}
-
-static inline bool pci_dev_is_added(const struct pci_dev *dev)
-{
- return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
-}
-
#ifdef CONFIG_PCIEAER
#include <linux/aer.h>

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 540b377ca8f6..ea0e23dbc8ec 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -507,6 +507,21 @@ struct pci_dev {
unsigned long priv_flags; /* Private flags for the PCI driver */
};

+/* pci_dev priv_flags */
+#define PCI_DEV_ADDED 0
+#define PCI_DPC_RECOVERED 1
+#define PCI_DPC_RECOVERING 2
+
+static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
+{
+ assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added);
+}
+
+static inline bool pci_dev_is_added(const struct pci_dev *dev)
+{
+ return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
+}
+
static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
{
#ifdef CONFIG_PCI_IOV
--
2.25.1


2021-08-20 22:39:39

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h

On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote:
> The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> PCI arch code of both s390 and powerpc leading to awkward relative
> includes. Move it to the global include/linux/pci.h and get rid of these
> includes just for that one function.

I agree the includes are awkward.

But the arch code *using* pci_dev_is_added() seems awkward, too.

AFAICS, in powerpc, pci_dev_is_added() is only used by
pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those
are only called from pcibios_add_device(), which is only called from
pci_device_add().

Is it even possible for pci_dev_is_added() to be true in that path?

s390 uses pci_dev_is_added() in recover_store(), but I don't know what
that is (looks like a sysfs file, but it's not documented) or why s390
is the only arch that does this.

Maybe we should make powerpc and s390 less special?

> Signed-off-by: Niklas Schnelle <[email protected]>
> ---
> Since v1 (and bad v2):
> - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
> defines and also move these to include/linux/pci.h
>
> arch/powerpc/platforms/powernv/pci-sriov.c | 3 ---
> arch/powerpc/platforms/pseries/setup.c | 1 -
> arch/s390/pci/pci_sysfs.c | 2 --
> drivers/pci/hotplug/acpiphp_glue.c | 1 -
> drivers/pci/pci.h | 15 ---------------
> include/linux/pci.h | 15 +++++++++++++++
> 6 files changed, 15 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> index 28aac933a439..2e0ca5451e85 100644
> --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> @@ -9,9 +9,6 @@
>
> #include "pci.h"
>
> -/* for pci_dev_is_added() */
> -#include "../../../../drivers/pci/pci.h"
> -
> /*
> * The majority of the complexity in supporting SR-IOV on PowerNV comes from
> * the need to put the MMIO space for each VF into a separate PE. Internally
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 631a0d57b6cd..17585ec9f955 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -74,7 +74,6 @@
> #include <asm/hvconsole.h>
>
> #include "pseries.h"
> -#include "../../../../drivers/pci/pci.h"
>
> DEFINE_STATIC_KEY_FALSE(shared_processor);
> EXPORT_SYMBOL_GPL(shared_processor);
> diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
> index 6e2450c2b9c1..8dbe54ef8f8e 100644
> --- a/arch/s390/pci/pci_sysfs.c
> +++ b/arch/s390/pci/pci_sysfs.c
> @@ -13,8 +13,6 @@
> #include <linux/stat.h>
> #include <linux/pci.h>
>
> -#include "../../../drivers/pci/pci.h"
> -
> #include <asm/sclp.h>
>
> #define zpci_attr(name, fmt, member) \
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index f031302ad401..4cb963f88183 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -38,7 +38,6 @@
> #include <linux/slab.h>
> #include <linux/acpi.h>
>
> -#include "../pci.h"
> #include "acpiphp.h"
>
> static LIST_HEAD(bridge_list);
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 93dcdd431072..a159cd0f6f05 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
> return dev->error_state == pci_channel_io_perm_failure;
> }
>
> -/* pci_dev priv_flags */
> -#define PCI_DEV_ADDED 0
> -#define PCI_DPC_RECOVERED 1
> -#define PCI_DPC_RECOVERING 2
> -
> -static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
> -{
> - assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added);
> -}
> -
> -static inline bool pci_dev_is_added(const struct pci_dev *dev)
> -{
> - return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
> -}
> -
> #ifdef CONFIG_PCIEAER
> #include <linux/aer.h>
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 540b377ca8f6..ea0e23dbc8ec 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -507,6 +507,21 @@ struct pci_dev {
> unsigned long priv_flags; /* Private flags for the PCI driver */
> };
>
> +/* pci_dev priv_flags */
> +#define PCI_DEV_ADDED 0
> +#define PCI_DPC_RECOVERED 1
> +#define PCI_DPC_RECOVERING 2
> +
> +static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
> +{
> + assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added);
> +}
> +
> +static inline bool pci_dev_is_added(const struct pci_dev *dev)
> +{
> + return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
> +}
> +
> static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
> {
> #ifdef CONFIG_PCI_IOV
> --
> 2.25.1
>

2021-08-23 10:58:03

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h

On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote:
> On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote:
> > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> > PCI arch code of both s390 and powerpc leading to awkward relative
> > includes. Move it to the global include/linux/pci.h and get rid of these
> > includes just for that one function.
>
> I agree the includes are awkward.
>
> But the arch code *using* pci_dev_is_added() seems awkward, too.

See below for my interpretation why s390 has some driver like
functionality in its arch code which isn't necessarily awkward.

Independent from that I have found pci_dev_is_added() as the only way
deal with the case that one might be looking at a struct pci_dev
reference that has been removed via pci_stop_and_remove_bus_device() or
has never been fully scanned. This is quite useful when handling error
events which on s390 are part of the adapter event mechanism shared
with channel I/O devices.

>
> AFAICS, in powerpc, pci_dev_is_added() is only used by
> pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those
> are only called from pcibios_add_device(), which is only called from
> pci_device_add().
>
> Is it even possible for pci_dev_is_added() to be true in that path?
>
> s390 uses pci_dev_is_added() in recover_store()

I'm actually looking into this as I'm working on an s390 implementation
of the PCI recovery flow described in Documentation/PCI/pci-error-
recovery.rst that would also call pci_dev_is_added() because when we
get a platform notification of a PCI reset done by firmware it may be
that the sturct pci_dev is going away i.e. we still have a ref count
but it is not added to the PCI bus anymore. And pci_dev_is_added() is
the only way I've found to check for this state.

> , but I don't know what
> that is (looks like a sysfs file, but it's not documented) or why s390
> is the only arch that does this.

Good point about this not being documented, I'll look into adding docs.

This is a sysfs attribute that basically removes the pci_dev and re-
adds it. This has the complication that since the attribute sits at
/sys/bus/pci/devices/<dev>/recover it deletes its own parent directory
which requires extra caution and means concurrent accesses block on
pci_lock_rescan_remove() instead of a kernfs lock.
Long story short when concurrently triggering the attribute one thread
proceeds into the pci_lock_rescan_remove() section and does the
removal, while others would block on pci_lock_rescan_remove(). Now when
the threads unblock the removal is done. In this case there is a new
struct pci_dev found in the rescan but the previously blocked threads
still have references to the old struct pci_dev which was removed and
as far as I could tell can only be distinguihsed by checking
pci_dev_is_added().

>
> Maybe we should make powerpc and s390 less special?

On s390, as I see it, the reason for this is that all of the PCI
functionality is directly defined in the Architecture as special CPU
instructions which are kind of hypercalls but also an ISA extension.

These instructions range from the basic PCI memory accesses (no real
MMIO) to enumeration of the devices and on to reporting of hot-plug and
and resets/recovery events. Importantly we do not have any kind of
direct access to a real or virtual PCI controller and the architecture
has no concept of a comparable entity.

So in my opinion while there is some of the functionality of a PCI
controller in arch/s390/pci the cut off between controller
functionality and arch support isn't clear at all and exposing PCI
support as CPU instructions doesn't map well to the controller concept.

That said, in principle I'm open to moving some of that into
drivers/pci/controller/ if you think that would improve things and we
can find a good argument what should go where. One possible cut off
would be to have arch/s390/pci/ provide wrappers to the PCI
instructions but move all their uses to e.g.
drivers/pci/controller/s390/. This would of course be a major
refactoring and none of that code would be useful on any other
architecture but it would move a lot the accesses to PCI common code
functionality out of the arch code.

>
> > Signed-off-by: Niklas Schnelle <[email protected]>
> > ---
> > Since v1 (and bad v2):
> > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
> > defines and also move these to include/linux/pci.h
> >
> > arch/powerpc/platforms/powernv/pci-sriov.c | 3 ---
> > arch/powerpc/platforms/pseries/setup.c | 1 -
> > arch/s390/pci/pci_sysfs.c | 2 --
> > drivers/pci/hotplug/acpiphp_glue.c | 1 -
> > drivers/pci/pci.h | 15 ---------------
> > include/linux/pci.h | 15 +++++++++++++++
> > 6 files changed, 15 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> > index 28aac933a439..2e0ca5451e85 100644
> > --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> > +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> > @@ -9,9 +9,6 @@
> >
> > #include "pci.h"
> >
> > -/* for pci_dev_is_added() */
> > -#include "../../../../drivers/pci/pci.h"
> >
.. snip ..

2021-08-25 21:09:44

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h

On Mon, Aug 23, 2021 at 12:53:39PM +0200, Niklas Schnelle wrote:
> On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote:
> > On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote:
> > > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> > > PCI arch code of both s390 and powerpc leading to awkward relative
> > > includes. Move it to the global include/linux/pci.h and get rid of these
> > > includes just for that one function.
> >
> > I agree the includes are awkward.
> >
> > But the arch code *using* pci_dev_is_added() seems awkward, too.
>
> See below for my interpretation why s390 has some driver like
> functionality in its arch code which isn't necessarily awkward.
>
> Independent from that I have found pci_dev_is_added() as the only way
> deal with the case that one might be looking at a struct pci_dev
> reference that has been removed via pci_stop_and_remove_bus_device() or
> has never been fully scanned. This is quite useful when handling error
> events which on s390 are part of the adapter event mechanism shared
> with channel I/O devices.
>
> > AFAICS, in powerpc, pci_dev_is_added() is only used by
> > pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those
> > are only called from pcibios_add_device(), which is only called from
> > pci_device_add().
> >
> > Is it even possible for pci_dev_is_added() to be true in that path?

If the pci_dev_is_added() in powerpc is unreachable, we can remove it
and at least reduce this to an s390-only problem.

> > s390 uses pci_dev_is_added() in recover_store()
>
> I'm actually looking into this as I'm working on an s390 implementation
> of the PCI recovery flow described in Documentation/PCI/pci-error-
> recovery.rst that would also call pci_dev_is_added() because when we
> get a platform notification of a PCI reset done by firmware it may be
> that the struct pci_dev is going away i.e. we still have a ref count
> but it is not added to the PCI bus anymore. And pci_dev_is_added() is
> the only way I've found to check for this state.
>
> > , but I don't know what
> > that is (looks like a sysfs file, but it's not documented) or why s390
> > is the only arch that does this.
>
> Good point about this not being documented, I'll look into adding docs.
>
> This is a sysfs attribute that basically removes the pci_dev and re-
> adds it. This has the complication that since the attribute sits at
> /sys/bus/pci/devices/<dev>/recover it deletes its own parent directory
> which requires extra caution and means concurrent accesses block on
> pci_lock_rescan_remove() instead of a kernfs lock.
> Long story short when concurrently triggering the attribute one thread
> proceeds into the pci_lock_rescan_remove() section and does the
> removal, while others would block on pci_lock_rescan_remove(). Now when
> the threads unblock the removal is done. In this case there is a new
> struct pci_dev found in the rescan but the previously blocked threads
> still have references to the old struct pci_dev which was removed and
> as far as I could tell can only be distinguished by checking
> pci_dev_is_added().

Is this locking issue different from concurrently writing to
/sys/.../remove on other architectures?

> > Maybe we should make powerpc and s390 less special?
>
> On s390, as I see it, the reason for this is that all of the PCI
> functionality is directly defined in the Architecture as special CPU
> instructions which are kind of hypercalls but also an ISA extension.
>
> These instructions range from the basic PCI memory accesses (no real
> MMIO) to enumeration of the devices and on to reporting of hot-plug and
> and resets/recovery events. Importantly we do not have any kind of
> direct access to a real or virtual PCI controller and the architecture
> has no concept of a comparable entity.
>
> So in my opinion while there is some of the functionality of a PCI
> controller in arch/s390/pci the cut off between controller
> functionality and arch support isn't clear at all and exposing PCI
> support as CPU instructions doesn't map well to the controller concept.
>
> That said, in principle I'm open to moving some of that into
> drivers/pci/controller/ if you think that would improve things and we
> can find a good argument what should go where. One possible cut off
> would be to have arch/s390/pci/ provide wrappers to the PCI
> instructions but move all their uses to e.g.
> drivers/pci/controller/s390/. This would of course be a major
> refactoring and none of that code would be useful on any other
> architecture but it would move a lot the accesses to PCI common code
> functionality out of the arch code.

Looks like hotplug is already in drivers/pci/hotplug/s390_pci_hpc.c.

Might be worth considering putting the other PCI core-ish code in
drivers/pci as well, though it doesn't feel urgent to me. Maybe a
good internship or mentoring project.

I'm not sure this juggling around is worth it basically to just clean
up the include path. The downside to me is exposing
pci_dev_is_added() to outside the PCI core, because I don't want
to encourage any other users.

> > > Signed-off-by: Niklas Schnelle <[email protected]>
> > > ---
> > > Since v1 (and bad v2):
> > > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
> > > defines and also move these to include/linux/pci.h
> > >
> > > arch/powerpc/platforms/powernv/pci-sriov.c | 3 ---
> > > arch/powerpc/platforms/pseries/setup.c | 1 -
> > > arch/s390/pci/pci_sysfs.c | 2 --
> > > drivers/pci/hotplug/acpiphp_glue.c | 1 -
> > > drivers/pci/pci.h | 15 ---------------
> > > include/linux/pci.h | 15 +++++++++++++++
> > > 6 files changed, 15 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> > > index 28aac933a439..2e0ca5451e85 100644
> > > --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> > > +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> > > @@ -9,9 +9,6 @@
> > >
> > > #include "pci.h"
> > >
> > > -/* for pci_dev_is_added() */
> > > -#include "../../../../drivers/pci/pci.h"
> > >
> .. snip ..
>

2021-08-26 12:38:46

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h

On Wed, 2021-08-25 at 14:04 -0500, Bjorn Helgaas wrote:
> On Mon, Aug 23, 2021 at 12:53:39PM +0200, Niklas Schnelle wrote:
> > On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote:
> > > On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote:
> > > > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> > > > PCI arch code of both s390 and powerpc leading to awkward relative
> > > > includes. Move it to the global include/linux/pci.h and get rid of these
> > > > includes just for that one function.
> > >
> > > I agree the includes are awkward.
> > >
> > > But the arch code *using* pci_dev_is_added() seems awkward, too.
> >
> > See below for my interpretation why s390 has some driver like
> > functionality in its arch code which isn't necessarily awkward.
> >
> > Independent from that I have found pci_dev_is_added() as the only way
> > deal with the case that one might be looking at a struct pci_dev
> > reference that has been removed via pci_stop_and_remove_bus_device() or
> > has never been fully scanned. This is quite useful when handling error
> > events which on s390 are part of the adapter event mechanism shared
> > with channel I/O devices.
> >
> > > AFAICS, in powerpc, pci_dev_is_added() is only used by
> > > pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those
> > > are only called from pcibios_add_device(), which is only called from
> > > pci_device_add().
> > >
> > > Is it even possible for pci_dev_is_added() to be true in that path?
>
> If the pci_dev_is_added() in powerpc is unreachable, we can remove it
> and at least reduce this to an s390-only problem.

Ok. I might be missing something but I agree it does look these are
called from within pcibios_add_device() only so pci_dev_is_added() can
never be true. This looks pretty clear as pci_dev_assign_added() is
called after pcibios_add_device() in pci_bus_add_device().

>
> > > s390 uses pci_dev_is_added() in recover_store()
> >
> > I'm actually looking into this as I'm working on an s390 implementation
> > of the PCI recovery flow described in Documentation/PCI/pci-error-
> > recovery.rst that would also call pci_dev_is_added() because when we
> > get a platform notification of a PCI reset done by firmware it may be
> > that the struct pci_dev is going away i.e. we still have a ref count
> > but it is not added to the PCI bus anymore. And pci_dev_is_added() is
> > the only way I've found to check for this state.
> >
> > > , but I don't know what
> > > that is (looks like a sysfs file, but it's not documented) or why s390
> > > is the only arch that does this.
> >
> > Good point about this not being documented, I'll look into adding docs.
> >
> > This is a sysfs attribute that basically removes the pci_dev and re-
> > adds it. This has the complication that since the attribute sits at
> > /sys/bus/pci/devices/<dev>/recover it deletes its own parent directory
> > which requires extra caution and means concurrent accesses block on
> > pci_lock_rescan_remove() instead of a kernfs lock.
> > Long story short when concurrently triggering the attribute one thread
> > proceeds into the pci_lock_rescan_remove() section and does the
> > removal, while others would block on pci_lock_rescan_remove(). Now when
> > the threads unblock the removal is done. In this case there is a new
> > struct pci_dev found in the rescan but the previously blocked threads
> > still have references to the old struct pci_dev which was removed and
> > as far as I could tell can only be distinguished by checking
> > pci_dev_is_added().
>
> Is this locking issue different from concurrently writing to
> /sys/.../remove on other architectures?

In principle it is very similar except that we re-scan and thus the
removed pdev may co-exist with a new pdev for the same actual device if
there are other references to the pdev.

There is however also a significant difference in locking that fixes a
possible deadlock that I just confirmed also affects /sys/../remove
where it is hidden by a lockdep ignore, see lockdep splash below.

For /sys/../recover this was fixed by my commit dd712f0a53ca
("s390/pci: Fix possible deadlock in recover_store()") which also
introduced the need for pci_dev_is_added().

The change in the above commit is very similar to what is done in
drivers/scsi/scsi_sysfs.c:sdev_store_delete() which also has a self
deletion and in commit 0ee223b2e1f6 ("scsi: core: Avoid that SCSI
device removal through sysfs triggers a deadlock") fixed a very very
similar lock order problem.

Like the SCSI code I use sysfs_break_active_protection() which allows a
concurrent access to /sys/../recover to advance into recover_store().
It may then block on pci_lock_rescan_remove() instead of the kernfs
lock it would block on with just delete_remove_file_self(). Thus we
have to handle unblocking from pci_lock_rescan_remove() while holding
the stale struct pci_dev of the removed device. Together with the re-
scan this means I have to be able to determine after unblocking if I'm
looking at the old pdev.

I just tested that when removing the lockdep ignore from remove_store()
and do /sys/../power and /sys/../remove I do hit the same lock order
inversion issue there for remove. Note that unlike in the commit
introducing the ignore this is a completely flat hiearchy:

[ 234.196509] ======================================================
[ 234.196511] WARNING: possible circular locking dependency detected
[ 234.196512] 5.14.0-rc7-08025-gf6d7568b37df-dirty #18 Not tainted
[ 234.196514] ------------------------------------------------------
[ 234.196516] zsh/1214 is trying to acquire lock:
[ 234.196518] 000000013f296e30 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0x26/0x48
[ 234.196529]
but task is already holding lock:
[ 234.196530] 000000009b6c3a10 (kn->active#363){++++}-{0:0}, at: sysfs_remove_file_self+0x42/0x78
[ 234.196538]
which lock already depends on the new lock.

[ 234.196539]
the existing dependency chain (in reverse order) is:
[ 234.196541]
-> #1 (kn->active#363){++++}-{0:0}:
[ 234.196545] validate_chain+0x9ca/0xde8
[ 234.196548] __lock_acquire+0x64c/0xc40
[ 234.196550] lock_acquire.part.0+0xec/0x258
[ 234.196552] lock_acquire+0xb0/0x200
[ 234.196554] kernfs_drain+0x17a/0x1c8
[ 234.196556] __kernfs_remove+0x1f4/0x230
[ 234.196558] kernfs_remove_by_name_ns+0x5c/0xa8
[ 234.196560] remove_files+0x46/0x90
[ 234.196563] sysfs_remove_group+0x5a/0xb8
[ 234.196565] sysfs_remove_groups+0x46/0x68
[ 234.196567] device_remove_attrs+0x90/0xb8
[ 234.196598] device_del+0x192/0x3f8
[ 234.196600] pci_remove_bus_device+0x8a/0x128
[ 234.196602] pci_stop_and_remove_bus_device_locked+0x3a/0x48
[ 234.196604] zpci_bus_remove_device+0x68/0xa8
[ 234.196607] zpci_deconfigure_device+0x3a/0xe0
[ 234.196610] power_write_file+0x7c/0x130
[ 234.196615] kernfs_fop_write_iter+0x13e/0x1e0
[ 234.196617] new_sync_write+0x100/0x190
[ 234.196620] vfs_write+0x21e/0x2d0
[ 234.196622] ksys_write+0x6c/0xf8
[ 234.196624] __do_syscall+0x1c2/0x1f0
[ 234.196628] system_call+0x78/0xa0
[ 234.196632]
-> #0 (pci_rescan_remove_lock){+.+.}-{3:3}:
[ 234.196636] check_noncircular+0x168/0x188
[ 234.196637] check_prev_add+0xe0/0xed8
[ 234.196639] validate_chain+0x9ca/0xde8
[ 234.196641] __lock_acquire+0x64c/0xc40
[ 234.196642] lock_acquire.part.0+0xec/0x258
[ 234.196644] lock_acquire+0xb0/0x200
[ 234.196646] __mutex_lock+0xa2/0x8d8
[ 234.196648] mutex_lock_nested+0x32/0x40
[ 234.196649] pci_stop_and_remove_bus_device_locked+0x26/0x48
[ 234.196652] remove_store+0x7a/0x88
[ 234.196654] kernfs_fop_write_iter+0x13e/0x1e0
[ 234.196656] new_sync_write+0x100/0x190
[ 234.196657] vfs_write+0x21e/0x2d0
[ 234.196659] ksys_write+0x6c/0xf8
[ 234.196661] __do_syscall+0x1c2/0x1f0
[ 234.196663] system_call+0x78/0xa0
[ 234.196665]
other info that might help us debug this:

[ 234.196666] Possible unsafe locking scenario:

[ 234.196668] CPU0 CPU1
[ 234.196669] ---- ----
[ 234.196670] lock(kn->active#363);
[ 234.196672] lock(pci_rescan_remove_lock);
[ 234.196674] lock(kn->active#363);
[ 234.196677] lock(pci_rescan_remove_lock);
[ 234.196679]
*** DEADLOCK ***

[ 234.196680] 3 locks held by zsh/1214:
[ 234.196682] #0: 0000000099d44498 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x6c/0xf8
[ 234.196688] #1: 00000000b214ee90 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x102/0x1e0
[ 234.196693] #2: 000000009b6c3a10 (kn->active#363){++++}-{0:0}, at: sysfs_remove_file_self+0x42/0x78
[ 234.196699]
stack backtrace:
[ 234.196701] CPU: 6 PID: 1214 Comm: zsh Not tainted 5.14.0-rc7-08025-gf6d7568b37df-dirty #18
[ 234.196703] Hardware name: IBM 8561 T01 703 (LPAR)
[ 234.196705] Call Trace:
[ 234.196706] [<000000013ebba5d0>] show_stack+0x90/0xf8
[ 234.196711] [<000000013ebcc28e>] dump_stack_lvl+0x8e/0xc8
[ 234.196714] [<000000013dfbe908>] check_noncircular+0x168/0x188
[ 234.196716] [<000000013dfbf9c8>] check_prev_add+0xe0/0xed8
[ 234.196718] [<000000013dfc118a>] validate_chain+0x9ca/0xde8
[ 234.196720] [<000000013dfc4184>] __lock_acquire+0x64c/0xc40
[ 234.196721] [<000000013dfc2d8c>] lock_acquire.part.0+0xec/0x258
[ 234.196724] [<000000013dfc2fa8>] lock_acquire+0xb0/0x200
[ 234.196725] [<000000013ebdac7a>] __mutex_lock+0xa2/0x8d8
[ 234.196727] [<000000013ebdb4e2>] mutex_lock_nested+0x32/0x40
[ 234.196729] [<000000013e7daaf6>] pci_stop_and_remove_bus_device_locked+0x26/0x48
[ 234.196732] [<000000013e7e753a>] remove_store+0x7a/0x88
[ 234.196734] [<000000013e35b4be>] kernfs_fop_write_iter+0x13e/0x1e0
[ 234.196736] [<000000013e264098>] new_sync_write+0x100/0x190
[ 234.196738] [<000000013e266f06>] vfs_write+0x21e/0x2d0
[ 234.196740] [<000000013e267224>] ksys_write+0x6c/0xf8
[ 234.196742] [<000000013ebcf9da>] __do_syscall+0x1c2/0x1f0
[ 234.196744] [<000000013ebe2238>] system_call+0x78/0xa0


>
> > > Maybe we should make powerpc and s390 less special?
> >
> > On s390, as I see it, the reason for this is that all of the PCI
> > functionality is directly defined in the Architecture as special CPU
> > instructions which are kind of hypercalls but also an ISA extension.
> >
> > These instructions range from the basic PCI memory accesses (no real
> > MMIO) to enumeration of the devices and on to reporting of hot-plug and
> > and resets/recovery events. Importantly we do not have any kind of
> > direct access to a real or virtual PCI controller and the architecture
> > has no concept of a comparable entity.
> >
> > So in my opinion while there is some of the functionality of a PCI
> > controller in arch/s390/pci the cut off between controller
> > functionality and arch support isn't clear at all and exposing PCI
> > support as CPU instructions doesn't map well to the controller concept.
> >
> > That said, in principle I'm open to moving some of that into
> > drivers/pci/controller/ if you think that would improve things and we
> > can find a good argument what should go where. One possible cut off
> > would be to have arch/s390/pci/ provide wrappers to the PCI
> > instructions but move all their uses to e.g.
> > drivers/pci/controller/s390/. This would of course be a major
> > refactoring and none of that code would be useful on any other
> > architecture but it would move a lot the accesses to PCI common code
> > functionality out of the arch code.
>
> Looks like hotplug is already in drivers/pci/hotplug/s390_pci_hpc.c.
>
> Might be worth considering putting the other PCI core-ish code in
> drivers/pci as well, though it doesn't feel urgent to me. Maybe a
> good internship or mentoring project.

I agree

>
> I'm not sure this juggling around is worth it basically to just clean
> up the include path. The downside to me is exposing
> pci_dev_is_added() to outside the PCI core, because I don't want
> to encourage any other users.

Hmm, for me the big question how to then handle the need for
pci_dev_is_added() in my upcoming recovery code, that would be a new
user outside the PCI core unless I move arch/s390/pci/pci_event.c to
drivers/pci.

That said I'm not sure I understand yet what makes pci_dev_is_added()
unsuitable for outside the PCI core. I do understand that struct
pci_dev::priv_flags is supposed to be private to PCI drivers but on the
other hand the functionality of checking whether a PCI device has been
added to a PCI bus seems rather basic and useful whenever working with
a PCI device.

>
> > > > Signed-off-by: Niklas Schnelle <[email protected]>
> > > > ---
> > > > Since v1 (and bad v2):
> > > > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
> > > > defines and also move these to include/linux/pci.h
> > > >
> > > > arch/powerpc/platforms/powernv/pci-sriov.c | 3 ---
> > > > arch/powerpc/platforms/pseries/setup.c | 1 -
> > > > arch/s390/pci/pci_sysfs.c | 2 --
> > > > drivers/pci/hotplug/acpiphp_glue.c | 1 -
> > > > drivers/pci/pci.h | 15 ---------------
> > > > include/linux/pci.h | 15 +++++++++++++++
> > > > 6 files changed, 15 insertions(+), 22 deletions(-)
> > > >
> > > > diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> > > > index 28aac933a439..2e0ca5451e85 100644
> > > > --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> > > > +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> > > > @@ -9,9 +9,6 @@
> > > >
> > > > #include "pci.h"
> > > >
> > > > -/* for pci_dev_is_added() */
> > > > -#include "../../../../drivers/pci/pci.h"
> > > >
> > .. snip ..
> >

2021-10-08 22:14:30

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h

On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote:
> The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> PCI arch code of both s390 and powerpc leading to awkward relative
> includes. Move it to the global include/linux/pci.h and get rid of these
> includes just for that one function.
>
> Signed-off-by: Niklas Schnelle <[email protected]>
> ---
> Since v1 (and bad v2):
> - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
> defines and also move these to include/linux/pci.h
>
> arch/powerpc/platforms/powernv/pci-sriov.c | 3 ---
> arch/powerpc/platforms/pseries/setup.c | 1 -
> arch/s390/pci/pci_sysfs.c | 2 --
> drivers/pci/hotplug/acpiphp_glue.c | 1 -
> drivers/pci/pci.h | 15 ---------------
> include/linux/pci.h | 15 +++++++++++++++
> 6 files changed, 15 insertions(+), 22 deletions(-)

I dropped this one because I think a subsequent patch removed the use
in arch/powerpc, so if you still need this, it probably needs to be
updated to at least drop those hunks.