2021-08-30 08:10:05

by Jan Kiszka

Subject: [PATCH v2] PCI/portdrv: Do not setup up IRQs if there are no users

From: Jan Kiszka <[email protected]>

Avoid registering service IRQs if no service offers them or no driver
will register a handler for them. This saves IRQ vectors where they are
limited (e.g. on x86) and prevents spurious events from hitting a
missing handler. The Jailhouse hypervisor has to generate such spurious
events for active MSI vectors when it enables or disables itself.

Signed-off-by: Jan Kiszka <[email protected]>
---

Changes in v2:
- move initialization of irqs to address test bot finding

drivers/pci/pcie/portdrv_core.c | 47 +++++++++++++++++++++------------
1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e1fed6649c41..0e2556269429 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -166,9 +166,6 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
 {
 	int ret, i;
 
-	for (i = 0; i < PCIE_PORT_DEVICE_MAXSERVICES; i++)
-		irqs[i] = -1;
-
 	/*
 	 * If we support PME but can't use MSI/MSI-X for it, we have to
 	 * fall back to INTx or other interrupts, e.g., a system shared
@@ -312,8 +309,10 @@ static int pcie_device_init(struct pci_dev *pdev, int service, int irq)
  */
 int pcie_port_device_register(struct pci_dev *dev)
 {
-	int status, capabilities, i, nr_service;
-	int irqs[PCIE_PORT_DEVICE_MAXSERVICES];
+	int status, capabilities, irq_services, i, nr_service;
+	int irqs[PCIE_PORT_DEVICE_MAXSERVICES] = {
+		[0 ... PCIE_PORT_DEVICE_MAXSERVICES-1] = -1
+	};
 
 	/* Enable PCI Express port device */
 	status = pci_enable_device(dev);
@@ -326,18 +325,32 @@ int pcie_port_device_register(struct pci_dev *dev)
 		return 0;
 
 	pci_set_master(dev);
-	/*
-	 * Initialize service irqs. Don't use service devices that
-	 * require interrupts if there is no way to generate them.
-	 * However, some drivers may have a polling mode (e.g. pciehp_poll_mode)
-	 * that can be used in the absence of irqs. Allow them to determine
-	 * if that is to be used.
-	 */
-	status = pcie_init_service_irqs(dev, irqs, capabilities);
-	if (status) {
-		capabilities &= PCIE_PORT_SERVICE_HP;
-		if (!capabilities)
-			goto error_disable;
+
+	irq_services = 0;
+	if (IS_ENABLED(CONFIG_PCIE_PME))
+		irq_services |= PCIE_PORT_SERVICE_PME;
+	if (IS_ENABLED(CONFIG_PCIEAER))
+		irq_services |= PCIE_PORT_SERVICE_AER;
+	if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
+		irq_services |= PCIE_PORT_SERVICE_HP;
+	if (IS_ENABLED(CONFIG_PCIE_DPC))
+		irq_services |= PCIE_PORT_SERVICE_DPC;
+	irq_services &= capabilities;
+
+	if (irq_services) {
+		/*
+		 * Initialize service irqs. Don't use service devices that
+		 * require interrupts if there is no way to generate them.
+		 * However, some drivers may have a polling mode (e.g.
+		 * pciehp_poll_mode) that can be used in the absence of irqs.
+		 * Allow them to determine if that is to be used.
+		 */
+		status = pcie_init_service_irqs(dev, irqs, irq_services);
+		if (status) {
+			irq_services &= PCIE_PORT_SERVICE_HP;
+			if (!irq_services)
+				goto error_disable;
+		}
 	}
 
 	/* Allocate child services if any */
--
2.31.1


2021-09-21 02:56:09

by Bjorn Helgaas

Subject: Re: [PATCH v2] PCI/portdrv: Do not setup up IRQs if there are no users

On Mon, Aug 30, 2021 at 10:08:10AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <[email protected]>
>
> Avoid registering service IRQs if there is no service that offers them
> or no driver to register a handler against them. This saves IRQ vectors
> when they are limited (e.g. on x86) and also avoids that spurious events
> could hit a missing handler. Such spurious events need to be generated
> by the Jailhouse hypervisor for active MSI vectors when enabling or
> disabling itself.
>
> Signed-off-by: Jan Kiszka <[email protected]>

Applied to pci/portdrv for v5.16, thanks!


2022-02-01 20:57:32

by Jan Kiszka

Subject: Re: [PATCH v2] PCI/portdrv: Do not setup up IRQs if there are no users

On 30.08.21 10:08, Jan Kiszka wrote:
> From: Jan Kiszka <[email protected]>
>
> Avoid registering service IRQs if there is no service that offers them
> or no driver to register a handler against them. This saves IRQ vectors
> when they are limited (e.g. on x86) and also avoids that spurious events
> could hit a missing handler. Such spurious events need to be generated
> by the Jailhouse hypervisor for active MSI vectors when enabling or
> disabling itself.
>
> Signed-off-by: Jan Kiszka <[email protected]>
> ---
>
> Changes in v2:
> - move initialization of irqs to address test bot finding
>
> drivers/pci/pcie/portdrv_core.c | 47 +++++++++++++++++++++------------
> 1 file changed, 30 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index e1fed6649c41..0e2556269429 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -166,9 +166,6 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
> {
> int ret, i;
>
> - for (i = 0; i < PCIE_PORT_DEVICE_MAXSERVICES; i++)
> - irqs[i] = -1;
> -
> /*
> * If we support PME but can't use MSI/MSI-X for it, we have to
> * fall back to INTx or other interrupts, e.g., a system shared
> @@ -312,8 +309,10 @@ static int pcie_device_init(struct pci_dev *pdev, int service, int irq)
> */
> int pcie_port_device_register(struct pci_dev *dev)
> {
> - int status, capabilities, i, nr_service;
> - int irqs[PCIE_PORT_DEVICE_MAXSERVICES];
> + int status, capabilities, irq_services, i, nr_service;
> + int irqs[PCIE_PORT_DEVICE_MAXSERVICES] = {
> + [0 ... PCIE_PORT_DEVICE_MAXSERVICES-1] = -1
> + };
>
> /* Enable PCI Express port device */
> status = pci_enable_device(dev);
> @@ -326,18 +325,32 @@ int pcie_port_device_register(struct pci_dev *dev)
> return 0;
>
> pci_set_master(dev);
> - /*
> - * Initialize service irqs. Don't use service devices that
> - * require interrupts if there is no way to generate them.
> - * However, some drivers may have a polling mode (e.g. pciehp_poll_mode)
> - * that can be used in the absence of irqs. Allow them to determine
> - * if that is to be used.
> - */
> - status = pcie_init_service_irqs(dev, irqs, capabilities);
> - if (status) {
> - capabilities &= PCIE_PORT_SERVICE_HP;
> - if (!capabilities)
> - goto error_disable;
> +
> + irq_services = 0;
> + if (IS_ENABLED(CONFIG_PCIE_PME))
> + irq_services |= PCIE_PORT_SERVICE_PME;
> + if (IS_ENABLED(CONFIG_PCIEAER))
> + irq_services |= PCIE_PORT_SERVICE_AER;
> + if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> + irq_services |= PCIE_PORT_SERVICE_HP;
> + if (IS_ENABLED(CONFIG_PCIE_DPC))
> + irq_services |= PCIE_PORT_SERVICE_DPC;
> + irq_services &= capabilities;
> +
> + if (irq_services) {
> + /*
> + * Initialize service irqs. Don't use service devices that
> + * require interrupts if there is no way to generate them.
> + * However, some drivers may have a polling mode (e.g.
> + * pciehp_poll_mode) that can be used in the absence of irqs.
> + * Allow them to determine if that is to be used.
> + */
> + status = pcie_init_service_irqs(dev, irqs, irq_services);
> + if (status) {
> + irq_services &= PCIE_PORT_SERVICE_HP;
> + if (!irq_services)
> + goto error_disable;
> + }
> }
>
> /* Allocate child services if any */

It turns out that this patch causes trouble on some machines, see [1].
The problem can be "resolved" by doing

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index bda630889f95..68b0013c3662 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -331,7 +331,7 @@ int pcie_port_device_register(struct pci_dev *dev)
 
 	pci_set_master(dev);
 
-	irq_services = 0;
+	irq_services = PCIE_PORT_SERVICE_BWNOTIF;
 	if (IS_ENABLED(CONFIG_PCIE_PME))
 		irq_services |= PCIE_PORT_SERVICE_PME;
 	if (IS_ENABLED(CONFIG_PCIEAER))

i.e. by treating bandwidth notification as an IRQ-providing service as
well. But as far as I can see, there is no driver for this port service,
so no one should ever request, let alone handle, that interrupt.

I'm not yet seeing the key difference that could explain this effect.
What else happens via pcie_device_init() when called for
PCIE_PORT_SERVICE_BWNOTIF, although there will never be a driver?

Jan

[1] https://bugzilla.kernel.org/show_bug.cgi?id=215533

--
Siemens AG, Technology
Competence Center Embedded Linux

2022-02-09 02:39:52

by Bjorn Helgaas

Subject: Re: [PATCH v2] PCI/portdrv: Do not setup up IRQs if there are no users

[+cc David, Joey, Sergiu]

On Mon, Jan 31, 2022 at 10:22:28PM +0100, Jan Kiszka wrote:
> It turns out that this patch causes troubles on some machines, see [1].
> That could be "resolved" by doing
>
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index bda630889f95..68b0013c3662 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -331,7 +331,7 @@ int pcie_port_device_register(struct pci_dev *dev)
> pci_set_master(dev);
> - irq_services = 0;
> + irq_services = PCIE_PORT_SERVICE_BWNOTIF;
> if (IS_ENABLED(CONFIG_PCIE_PME))
> irq_services |= PCIE_PORT_SERVICE_PME;
> if (IS_ENABLED(CONFIG_PCIEAER))
>
> thus considering bandwidth notification as an IRQ-providing service as
> well. But as far as I can see, there is no driver for this port service,
> thus no one should ever request or even handle that interrupt.
>
> I'm not yet seeing the key difference that could explain this effect.
> What else happens via pcie_device_init() when called for
> PCIE_PORT_SERVICE_BWNOTIF, although there will never be a driver?
>
> Jan
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215533

Comparing David's "pci=earlydump" logs from [2,3] I see these
differences:

- BIOS 1.12
+ BIOS 1.14
00:1d.0 Root Port to [bus 04]
- Status: INTx-
+ Status: INTx+
- DevSta: CorrErr+
+ DevSta: CorrErr-
- LnkCtl: CommClk+ AutWidDis+ BWInt- AutBWInt+
+ LnkCtl: CommClk- AutWidDis- BWInt+ AutBWInt-
04:00.0 NVMe SSD
- LnkCtl: CommClk+ ClockPM-
+ LnkCtl: CommClk- ClockPM+

It looks like BIOS 1.14 leaves the BWInt bit (Link Bandwidth
Management Interrupt Enable) *set*, while BIOS 1.12 left it cleared.
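
For readers mapping the lspci flags back to register bits, the following
is a small userspace sketch that formats a Link Control value the way
the excerpts above show it. The bit masks are the ones I believe
pci_regs.h defines; flag() and format_lnkctl() are hypothetical helpers,
not part of lspci or the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_EXP_LNKCTL_CCC	0x0040	/* lspci: CommClk */
#define PCI_EXP_LNKCTL_HAWD	0x0200	/* lspci: AutWidDis */
#define PCI_EXP_LNKCTL_LBMIE	0x0400	/* lspci: BWInt */
#define PCI_EXP_LNKCTL_LABIE	0x0800	/* lspci: AutBWInt */

/* '+' if the bit is set in reg, '-' otherwise (lspci convention). */
static char flag(uint16_t reg, uint16_t bit)
{
	return (reg & bit) ? '+' : '-';
}

/* Format the four flags discussed here; returns snprintf()'s result. */
static int format_lnkctl(uint16_t lnkctl, char *buf, size_t len)
{
	assert(buf && len > 0);
	return snprintf(buf, len,
			"CommClk%c AutWidDis%c BWInt%c AutBWInt%c",
			flag(lnkctl, PCI_EXP_LNKCTL_CCC),
			flag(lnkctl, PCI_EXP_LNKCTL_HAWD),
			flag(lnkctl, PCI_EXP_LNKCTL_LBMIE),
			flag(lnkctl, PCI_EXP_LNKCTL_LABIE));
}
```

A value with only PCI_EXP_LNKCTL_LBMIE set formats as the BIOS 1.14
line, with BWInt+ and the other three flags cleared.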

Joey's log [4] with BIOS 1.14 also shows BWInt set:

+ BIOS 1.14
00:1d.0 Root Port to [bus 04]
+ Status: INTx+
+ DevSta: CorrErr-
+ LnkCtl: CommClk- AutWidDis- BWInt+ AutBWInt-
04:00.0 NVMe SSD
+ LnkCtl: CommClk- ClockPM-

In my opinion this is a BIOS defect. The BIOS should not leave an
interrupt enabled unless it is prepared to handle the interrupt.

But Linux should be able to tolerate this. Maybe we could clear
PCI_EXP_LNKCTL_LBMIE and PCI_EXP_LNKCTL_LABIE somewhere like
pci_configure_device().

Maybe we should also clear other PCIe interrupt enables like
PCI_EXP_SLTCTL_CCIE, PCI_EXP_SLTCTL_HPIE, PCI_EXP_RTCTL_PMEIE.
They should be enabled only after an interrupt handler has been installed for them.
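
As a rough illustration of that idea, the sketch below clears the
interrupt enables on in-memory copies of the registers. It is a
userspace model under stated assumptions, not a proposed patch: struct
pcie_ctl_regs and clear_stale_irq_enables() are hypothetical. In the
kernel the equivalent would presumably go through
pcie_capability_clear_word() on PCI_EXP_LNKCTL, PCI_EXP_SLTCTL and
PCI_EXP_RTCTL.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks as I believe pci_regs.h defines them. */
#define PCI_EXP_LNKCTL_LBMIE	0x0400	/* Link Bandwidth Management Int Enable */
#define PCI_EXP_LNKCTL_LABIE	0x0800	/* Link Autonomous Bandwidth Int Enable */
#define PCI_EXP_SLTCTL_CCIE	0x0010	/* Command Completed Interrupt Enable */
#define PCI_EXP_SLTCTL_HPIE	0x0020	/* Hot-Plug Interrupt Enable */
#define PCI_EXP_RTCTL_PMEIE	0x0008	/* PME Interrupt Enable */

/* Hypothetical in-memory copy of the three control registers. */
struct pcie_ctl_regs {
	uint16_t lnkctl;
	uint16_t sltctl;
	uint16_t rtctl;
};

/*
 * Clear interrupt enables that firmware may have left set, leaving all
 * other bits untouched. A driver would set them again once its handler
 * is installed.
 */
static void clear_stale_irq_enables(struct pcie_ctl_regs *regs)
{
	assert(regs);
	regs->lnkctl &= (uint16_t)~(PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
	regs->sltctl &= (uint16_t)~(PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
	regs->rtctl &= (uint16_t)~PCI_EXP_RTCTL_PMEIE;
}
```

The masks only touch the enable bits, so unrelated settings such as the
common clock configuration survive the cleanup.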

Bjorn

[2] https://bugzilla.kernel.org/attachment.cgi?id=300397 [BIOS 1.12 (David)]
[3] https://bugzilla.kernel.org/attachment.cgi?id=300396 [BIOS 1.14 (David)]
[4] https://bugzilla.kernel.org/attachment.cgi?id=300370 [BIOS 1.14 (Joey)]

2022-02-10 08:07:09

by Jan Kiszka

Subject: Re: [PATCH v2] PCI/portdrv: Do not setup up IRQs if there are no users

On 08.02.22 19:56, Bjorn Helgaas wrote:
> [+cc David, Joey, Sergiu]
>
> On Mon, Jan 31, 2022 at 10:22:28PM +0100, Jan Kiszka wrote:
>> It turns out that this patch causes troubles on some machines, see [1].
>> That could be "resolved" by doing
>>
>> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
>> index bda630889f95..68b0013c3662 100644
>> --- a/drivers/pci/pcie/portdrv_core.c
>> +++ b/drivers/pci/pcie/portdrv_core.c
>> @@ -331,7 +331,7 @@ int pcie_port_device_register(struct pci_dev *dev)
>> pci_set_master(dev);
>> - irq_services = 0;
>> + irq_services = PCIE_PORT_SERVICE_BWNOTIF;
>> if (IS_ENABLED(CONFIG_PCIE_PME))
>> irq_services |= PCIE_PORT_SERVICE_PME;
>> if (IS_ENABLED(CONFIG_PCIEAER))
>>
>> thus considering bandwidth notification as an IRQ-providing service as
>> well. But as far as I can see, there is no driver for this port service,
>> thus no one should ever request or even handle that interrupt.
>>
>> I'm not yet seeing the key difference that could explain this effect.
>> What else happens via pcie_device_init() when called for
>> PCIE_PORT_SERVICE_BWNOTIF, although there will never be a driver?
>>
>> Jan
>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215533
>
> Comparing David's "pci=earlydump" logs from [2,3] I see these
> differences:
>
> - BIOS 1.12
> + BIOS 1.14
> 00:1d.0 Root Port to [bus 04]
> - Status: INTx-
> + Status: INTx+
> - DevSta: CorrErr+
> + DevSta: CorrErr-
> - LnkCtl: CommClk+ AutWidDis+ BWInt- AutBWInt+
> + LnkCtl: CommClk- AutWidDis- BWInt+ AutBWInt-
> 04:00.0 NVMe SSD
> - LnkCtl: CommClk+ ClockPM-
> + LnkCtl: CommClk- ClockPM+
>
> It looks like BIOS 1.14 leaves the BWInt bit (Link Bandwidth
> Management Interrupt Enable) *set*, while BIOS 1.12 left it cleared.
>
> Joey's log [4] with BIOS 1.14 also shows BWInt set:
>
> + BIOS 1.14
> 00:1d.0 Root Port to [bus 04]
> + Status: INTx+
> + DevSta: CorrErr-
> + LnkCtl: CommClk- AutWidDis- BWInt+ AutBWInt-
> 04:00.0 NVMe SSD
> + LnkCtl: CommClk- ClockPM-
>
> In my opinion this is a BIOS defect. The BIOS should not leave an
> interrupt enabled unless it is prepared to handle the interrupt.
>
> But Linux should be able to tolerate this. Maybe we could clear
> PCI_EXP_LNKCTL_LBMIE and PCI_EXP_LNKCTL_LABIE somewhere like
> pci_configure_device().
>
> Maybe we should also clear other PCIe interrupt enables like
> PCI_EXP_SLTCTL_CCIE, PCI_EXP_SLTCTL_HPIE, PCI_EXP_RTCTL_PMEIE.
> They should be enabled after installing an interrupt handler for them.
>

So we have firmware that hands devices over to the OS with interrupt
sources still enabled? What a mess. The reason this "worked" before was
pure luck: that source shared its interrupt with other, disabled sources,
and handlers were registered unconditionally for those. Makes sense now.

I agree that we will need to disable all unused sources to cope with
such broken firmware. Conceptually, the same problem can hit any of
them once we stop registering handlers for unused sources.

Jan

--
Siemens AG, Technology
Competence Center Embedded Linux