The j721e, kirin, tegra, and mediatek drivers all implement .remove().
They also set ".suppress_bind_attrs = true". I think this means
bus_add_driver() will not create the "bind" and "unbind" sysfs
attributes for the driver that would allow users to users to manually
attach and detach devices from it.
Is there a reason for this, or should these drivers stop setting
.suppress_bind_attrs?
For example, Pali and Ley Foon *did* stop setting .suppress_bind_attrs
when adding .remove() methods in these commits:
0746ae1be121 ("PCI: mvebu: Add support for compiling driver as module")
526a76991b7b ("PCI: aardvark: Implement driver 'remove' function and allow to build it as module")
ec15c4d0d5d2 ("PCI: altera: Allow building as module")
Bjorn
On 21/07/2022 21:46, Pali Rohár wrote:
>
> Microchip driver wanted to change its type from bool to tristate
> https://lore.kernel.org/linux-pci/[email protected]/t/#u
> and after discussion it seems that it is needed to do more work for this
> driver.
>
Still planned, but other things are taking up time atm
unfortunately!
Hello!
On Thursday 21 July 2022 14:54:33 Bjorn Helgaas wrote:
> The j721e, kirin, tegra, and mediatek drivers all implement .remove().
>
> They also set ".suppress_bind_attrs = true". I think this means
> bus_add_driver() will not create the "bind" and "unbind" sysfs
> attributes for the driver that would allow users to users to manually
> attach and detach devices from it.
>
> Is there a reason for this, or should these drivers stop setting
> .suppress_bind_attrs?
I have already asked this question during review of kirin driver:
https://lore.kernel.org/linux-pci/20211031205527.ochhi72dfu4uidii@pali/
Microchip driver wanted to change its type from bool to tristate
https://lore.kernel.org/linux-pci/[email protected]/t/#u
and after discussion it seems that it is needed to do more work for this
driver.
> For example, Pali and Ley Foon *did* stop setting .suppress_bind_attrs
> when adding .remove() methods in these commits:
>
> 0746ae1be121 ("PCI: mvebu: Add support for compiling driver as module")
> 526a76991b7b ("PCI: aardvark: Implement driver 'remove' function and allow to build it as module")
> ec15c4d0d5d2 ("PCI: altera: Allow building as module")
>
> Bjorn
I added it for both pci-mvebu.c and pci-aardvark.c. And just few days
ago I realized why suppress_bind_attrs was set to true and remove method
was not implemented.
Implementing remove method is not really simple, specially when pci
controller driver implements also interrupt controller (e.g. for
handling legacy interrupts).
Here are waiting fixup patches for pci-mvebu.c and pci-aardvark.c which
fixes .remove callback. Without these patches calling 'rmmod driver' let
dangling pointer in kernel which may cause random kernel crashes. See:
https://lore.kernel.org/linux-pci/[email protected]/
https://lore.kernel.org/linux-pci/[email protected]/
https://lore.kernel.org/linux-pci/[email protected]/
So I would suggest to do more detailed review when adding .remove
callback for pci controller driver (or when remove suppress_bind_attrs)
and do more testings and checking if all IRQ mappings are disposed.
[+to Johan for qcom]
[-cc Tom, email bounces]
On Thu, Jul 21, 2022 at 10:46:07PM +0200, Pali Roh?r wrote:
> On Thursday 21 July 2022 14:54:33 Bjorn Helgaas wrote:
> > The j721e, kirin, tegra, and mediatek drivers all implement .remove().
> >
> > They also set ".suppress_bind_attrs = true". I think this means
> > bus_add_driver() will not create the "bind" and "unbind" sysfs
> > attributes for the driver that would allow users to users to manually
> > attach and detach devices from it.
> >
> > Is there a reason for this, or should these drivers stop setting
> > .suppress_bind_attrs?
>
> I have already asked this question during review of kirin driver:
> https://lore.kernel.org/linux-pci/20211031205527.ochhi72dfu4uidii@pali/
>
> Microchip driver wanted to change its type from bool to tristate
> https://lore.kernel.org/linux-pci/[email protected]/t/#u
> and after discussion it seems that it is needed to do more work for this
> driver.
>
> > For example, Pali and Ley Foon *did* stop setting .suppress_bind_attrs
> > when adding .remove() methods in these commits:
> >
> > 0746ae1be121 ("PCI: mvebu: Add support for compiling driver as module")
> > 526a76991b7b ("PCI: aardvark: Implement driver 'remove' function and allow to build it as module")
> > ec15c4d0d5d2 ("PCI: altera: Allow building as module")
>
> I added it for both pci-mvebu.c and pci-aardvark.c. And just few days
> ago I realized why suppress_bind_attrs was set to true and remove method
> was not implemented.
With suppress_bind_attrs, the user can't manually unbind a device, so
we can't get to mvebu_pcie_remove() that way, but since mvebu is a
modular driver, I assume we can unload the module and *that* would
call mvebu_pcie_remove(). Right?
> Implementing remove method is not really simple, specially when pci
> controller driver implements also interrupt controller (e.g. for
> handling legacy interrupts).
Hmmm. Based on your patches below, it looks like we need to call
irq_dispose_mapping() in some cases, but I'm very confused about
*which* cases.
I first thought it was for mappings created with irq_create_mapping(),
but pci-aardvark.c never calls that, so there must be more to it.
Currently only altera, iproc, mediatek-gen3, and mediatek call
irq_dispose_mapping() from their .remove() methods. (They all call
irq_domain_remove() *before* irq_dispose_mapping(). Is that legal?
Your patches do irq_dispose_mapping() *first*.)
altera, mediatek-gen3, and mediatek call irq_dispose_mapping() on IRQs
that came from platform_get_irq().
qcom is a DWC driver, so all the IRQ stuff happens in
dw_pcie_host_init(). qcom_pcie_remove() does call
dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
calls irq_dispose_mapping().
I'm thoroughly confused by all this. But I suspect that maybe I
should drop the "make qcom modular" patch because it seems susceptible
to this problem:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
> Here are waiting fixup patches for pci-mvebu.c and pci-aardvark.c which
> fixes .remove callback. Without these patches calling 'rmmod driver' let
> dangling pointer in kernel which may cause random kernel crashes. See:
>
> https://lore.kernel.org/linux-pci/[email protected]/
> https://lore.kernel.org/linux-pci/[email protected]/
> https://lore.kernel.org/linux-pci/[email protected]/
>
> So I would suggest to do more detailed review when adding .remove
> callback for pci controller driver (or when remove suppress_bind_attrs)
> and do more testings and checking if all IRQ mappings are disposed.
I'm not smart enough to do "more detailed review" because I don't know
what things to look for :) Thanks for all your work in sorting out
these arcane details!
Bjorn
On Thursday 21 July 2022 17:21:22 Bjorn Helgaas wrote:
> [+to Johan for qcom]
> [-cc Tom, email bounces]
>
> On Thu, Jul 21, 2022 at 10:46:07PM +0200, Pali Rohár wrote:
> > On Thursday 21 July 2022 14:54:33 Bjorn Helgaas wrote:
> > > The j721e, kirin, tegra, and mediatek drivers all implement .remove().
> > >
> > > They also set ".suppress_bind_attrs = true". I think this means
> > > bus_add_driver() will not create the "bind" and "unbind" sysfs
> > > attributes for the driver that would allow users to users to manually
> > > attach and detach devices from it.
> > >
> > > Is there a reason for this, or should these drivers stop setting
> > > .suppress_bind_attrs?
> >
> > I have already asked this question during review of kirin driver:
> > https://lore.kernel.org/linux-pci/20211031205527.ochhi72dfu4uidii@pali/
> >
> > Microchip driver wanted to change its type from bool to tristate
> > https://lore.kernel.org/linux-pci/[email protected]/t/#u
> > and after discussion it seems that it is needed to do more work for this
> > driver.
> >
> > > For example, Pali and Ley Foon *did* stop setting .suppress_bind_attrs
> > > when adding .remove() methods in these commits:
> > >
> > > 0746ae1be121 ("PCI: mvebu: Add support for compiling driver as module")
> > > 526a76991b7b ("PCI: aardvark: Implement driver 'remove' function and allow to build it as module")
> > > ec15c4d0d5d2 ("PCI: altera: Allow building as module")
> >
> > I added it for both pci-mvebu.c and pci-aardvark.c. And just few days
> > ago I realized why suppress_bind_attrs was set to true and remove method
> > was not implemented.
>
> With suppress_bind_attrs, the user can't manually unbind a device, so
> we can't get to mvebu_pcie_remove() that way, but since mvebu is a
> modular driver, I assume we can unload the module and *that* would
> call mvebu_pcie_remove(). Right?
Yes! It looks like that if module unloading is allowed then .remove
callback would be called despite suppress_bind_attrs was set.
> > Implementing remove method is not really simple, specially when pci
> > controller driver implements also interrupt controller (e.g. for
> > handling legacy interrupts).
>
> Hmmm. Based on your patches below, it looks like we need to call
> irq_dispose_mapping() in some cases, but I'm very confused about
> *which* cases.
>
> I first thought it was for mappings created with irq_create_mapping(),
> but pci-aardvark.c never calls that, so there must be more to it.
From reading kernel code it looks like that mapping is created when
some other driver request shared INTx IRQ. So it is not done by
pci-aardvark.c but e.g. by wifi driver ath9k (which uses INTx).
> Currently only altera, iproc, mediatek-gen3, and mediatek call
> irq_dispose_mapping() from their .remove() methods. (They all call
> irq_domain_remove() *before* irq_dispose_mapping(). Is that legal?
> Your patches do irq_dispose_mapping() *first*.)
Documentation says:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/irqdomain.c?h=v5.18#n240
/**
* irq_domain_remove() - Remove an irq domain.
* @domain: domain to remove
*
* This routine is used to remove an irq domain. The caller must ensure
* that all mappings within the domain have been disposed of prior to
* use, depending on the revmap type.
*/
> altera, mediatek-gen3, and mediatek call irq_dispose_mapping() on IRQs
> that came from platform_get_irq().
I think that if IRQ is not used it could be disposed (no need to have it
mapped). So if this IRQ from platform_get_irq() is not shared then in
pci controller driver .remove callback for sure it is not used anymore.
So for non-shared IRQs this looks fine.
> qcom is a DWC driver, so all the IRQ stuff happens in
> dw_pcie_host_init(). qcom_pcie_remove() does call
> dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> calls irq_dispose_mapping().
>
> I'm thoroughly confused by all this.
Beware that there are IRQs which pci controller uses as "client" and
then there may be also IRQs which pci controller driver allocates via
own IRQ chip driver and provides for other drivers (e.g. as chained irq
handler). So it is needed to distinguish between them. IRQs which
controller driver create via own IRQ chip driver have to be properly
cleaned when calling .remove driver. IRQ which controller driver get via
platform_get_irq() (or similar of method) is under parent driver control
and parent driver is responsible for proper dispose/cleanup.
> But I suspect that maybe I
> should drop the "make qcom modular" patch because it seems susceptible
> to this problem:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
In pcie-qcom.c there is no irq reference. So maybe it is safe? But I do
not understand designware driver API design...
The best is probably to do runtime test. Boot kernel with pci=nomsi and
connect some PCIe card which supports legacy INTx interrupt. This should
force usage of INTx. Check that card driver is loaded and in file
/proc/interrupts is IRQ line for that driver. Then rmmod pci controller
driver and check that IRQ line in /proc/interrupts disappeared. And also
check that /sys/kernel/debug/irq/irqs/<num> does not exist.
Also another runtime test is to call rmmod and modprobe of pci
controller driver more times (with PCIe card connected) and check that
IRQ numbers assigned to card in /proc/interrupts are same. If after
rmmod + modprobe sequence are numbers increasing then there is some
IRQ-leak in .remove callback.
> > Here are waiting fixup patches for pci-mvebu.c and pci-aardvark.c which
> > fixes .remove callback. Without these patches calling 'rmmod driver' let
> > dangling pointer in kernel which may cause random kernel crashes. See:
> >
> > https://lore.kernel.org/linux-pci/[email protected]/
> > https://lore.kernel.org/linux-pci/[email protected]/
> > https://lore.kernel.org/linux-pci/[email protected]/
> >
> > So I would suggest to do more detailed review when adding .remove
> > callback for pci controller driver (or when remove suppress_bind_attrs)
> > and do more testings and checking if all IRQ mappings are disposed.
>
> I'm not smart enough to do "more detailed review" because I don't know
> what things to look for :) Thanks for all your work in sorting out
> these arcane details!
>
> Bjorn
Anyway, this issue was discovered by Hajo Noerenberg as a "Side note:"
during debugging SATA controller:
https://bugzilla.kernel.org/show_bug.cgi?id=216094#c31
It looks like that on some machines rmmod cause immediate kernel crash
and on other machines it is hard to reproduce.
On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> [+to Johan for qcom]
> [-cc Tom, email bounces]
>
> On Thu, Jul 21, 2022 at 10:46:07PM +0200, Pali Rohár wrote:
> > On Thursday 21 July 2022 14:54:33 Bjorn Helgaas wrote:
> With suppress_bind_attrs, the user can't manually unbind a device, so
> we can't get to mvebu_pcie_remove() that way, but since mvebu is a
> modular driver, I assume we can unload the module and *that* would
> call mvebu_pcie_remove(). Right?
Correct.
> qcom is a DWC driver, so all the IRQ stuff happens in
> dw_pcie_host_init(). qcom_pcie_remove() does call
> dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> calls irq_dispose_mapping().
>
> I'm thoroughly confused by all this. But I suspect that maybe I
> should drop the "make qcom modular" patch because it seems susceptible
> to this problem:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
That should not be necessary.
As you note above, interrupt handling is implemented in dwc core so if
there are any issue here at all, which I doubt, then all of the dwc
drivers that currently can be built as modules would all be broken and
this would need to be fixed in core.
I've been using the modular pcie-qcom patch for months now, unloading
and reloading the driver repeatedly to test power sequencing, without
noticing any problems whatsoever.
Johan
On Fri, Jul 22, 2022 at 03:26:44PM +0200, Johan Hovold wrote:
> On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> > qcom is a DWC driver, so all the IRQ stuff happens in
> > dw_pcie_host_init(). qcom_pcie_remove() does call
> > dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> > calls irq_dispose_mapping().
> >
> > I'm thoroughly confused by all this. But I suspect that maybe I
> > should drop the "make qcom modular" patch because it seems susceptible
> > to this problem:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
>
> That should not be necessary.
>
> As you note above, interrupt handling is implemented in dwc core so if
> there are any issue here at all, which I doubt, then all of the dwc
> drivers that currently can be built as modules would all be broken and
> this would need to be fixed in core.
I don't know yet whether there's an issue. We need a clear argument
for why there is or is not. The fact that others might be broken is
not an argument for breaking another one ;)
> I've been using the modular pcie-qcom patch for months now, unloading
> and reloading the driver repeatedly to test power sequencing, without
> noticing any problems whatsoever.
Pali's commit log suggests that unloading the module is not, by
itself, enough to trigger the problem:
https://lore.kernel.org/linux-pci/[email protected]/
Can you test the scenario he mentions?
Bjorn
[+cc Marc, can you clarify when we need irq_dispose_mapping()?]
On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> On Thu, Jul 21, 2022 at 10:46:07PM +0200, Pali Roh?r wrote:
> > On Thursday 21 July 2022 14:54:33 Bjorn Helgaas wrote:
> > > The j721e, kirin, tegra, and mediatek drivers all implement .remove().
> > >
> > > They also set ".suppress_bind_attrs = true". I think this means
> > > bus_add_driver() will not create the "bind" and "unbind" sysfs
> > > attributes for the driver that would allow users to users to manually
> > > attach and detach devices from it.
> > >
> > > Is there a reason for this, or should these drivers stop setting
> > > .suppress_bind_attrs?
> >
> > I have already asked this question during review of kirin driver:
> > https://lore.kernel.org/linux-pci/20211031205527.ochhi72dfu4uidii@pali/
> >
> > Microchip driver wanted to change its type from bool to tristate
> > https://lore.kernel.org/linux-pci/[email protected]/t/#u
> > and after discussion it seems that it is needed to do more work for this
> > driver.
> >
> > > For example, Pali and Ley Foon *did* stop setting .suppress_bind_attrs
> > > when adding .remove() methods in these commits:
> > >
> > > 0746ae1be121 ("PCI: mvebu: Add support for compiling driver as module")
> > > 526a76991b7b ("PCI: aardvark: Implement driver 'remove' function and allow to build it as module")
> > > ec15c4d0d5d2 ("PCI: altera: Allow building as module")
> >
> > I added it for both pci-mvebu.c and pci-aardvark.c. And just few days
> > ago I realized why suppress_bind_attrs was set to true and remove method
> > was not implemented.
>
> With suppress_bind_attrs, the user can't manually unbind a device, so
> we can't get to mvebu_pcie_remove() that way, but since mvebu is a
> modular driver, I assume we can unload the module and *that* would
> call mvebu_pcie_remove(). Right?
>
> > Implementing remove method is not really simple, specially when pci
> > controller driver implements also interrupt controller (e.g. for
> > handling legacy interrupts).
>
> Hmmm. Based on your patches below, it looks like we need to call
> irq_dispose_mapping() in some cases, but I'm very confused about
> *which* cases.
>
> I first thought it was for mappings created with irq_create_mapping(),
> but pci-aardvark.c never calls that, so there must be more to it.
>
> Currently only altera, iproc, mediatek-gen3, and mediatek call
> irq_dispose_mapping() from their .remove() methods. (They all call
> irq_domain_remove() *before* irq_dispose_mapping(). Is that legal?
> Your patches do irq_dispose_mapping() *first*.)
>
> altera, mediatek-gen3, and mediatek call irq_dispose_mapping() on IRQs
> that came from platform_get_irq().
>
> qcom is a DWC driver, so all the IRQ stuff happens in
> dw_pcie_host_init(). qcom_pcie_remove() does call
> dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> calls irq_dispose_mapping().
>
> I'm thoroughly confused by all this. But I suspect that maybe I
> should drop the "make qcom modular" patch because it seems susceptible
> to this problem:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
>
> > Here are waiting fixup patches for pci-mvebu.c and pci-aardvark.c which
> > fixes .remove callback. Without these patches calling 'rmmod driver' let
> > dangling pointer in kernel which may cause random kernel crashes. See:
> >
> > https://lore.kernel.org/linux-pci/[email protected]/
> > https://lore.kernel.org/linux-pci/[email protected]/
> > https://lore.kernel.org/linux-pci/[email protected]/
> >
> > So I would suggest to do more detailed review when adding .remove
> > callback for pci controller driver (or when remove suppress_bind_attrs)
> > and do more testings and checking if all IRQ mappings are disposed.
>
> I'm not smart enough to do "more detailed review" because I don't know
> what things to look for :) Thanks for all your work in sorting out
> these arcane details!
>
> Bjorn
Hi Bjorn,
On Fri, 22 Jul 2022 15:39:05 +0100,
Bjorn Helgaas <[email protected]> wrote:
>
> [+cc Marc, can you clarify when we need irq_dispose_mapping()?]
In general, interrupt controllers should not have to discard mappings
themselves, just like they rarely create mappings themselves. That's
usually a different layer that has created it (DT, for example).
The problem is that these mappings persist even if the interrupt has
been released by the driver (it called free_irq()), and the IRQ number
can be further reused. The client driver could dispose of the mapping
after having released the IRQ, but nobody does that in practice.
From the point of view of the controller, there is no simple way to
tell when an interrupt is "unused". And even if a driver was
overzealous and called irq_dispose_mapping() on all the possible
mappings (and made sure no mapping could be created in parallel), this
could result in a bunch of dangling pointers should a client driver
still have the interrupt requested.
Fixing this is pretty hard, as IRQ descriptors are leaky (you can
either have a pointer to one, or just an IRQ number -- they are
strictly equivalent). So in general, being able to remove an interrupt
controller driver is at best fragile, and I'm trying not to get more
of this in the tree.
Hope this helps,
M.
--
Without deviation from the norm, progress is not possible.
On Fri, Jul 22, 2022 at 06:06:07PM +0100, Marc Zyngier wrote:
> On Fri, 22 Jul 2022 15:39:05 +0100,
> Bjorn Helgaas <[email protected]> wrote:
> >
> > [+cc Marc, can you clarify when we need irq_dispose_mapping()?]
>
> In general, interrupt controllers should not have to discard mappings
> themselves, just like they rarely create mappings themselves. That's
> usually a different layer that has created it (DT, for example).
>
> The problem is that these mappings persist even if the interrupt has
> been released by the driver (it called free_irq()), and the IRQ number
> can be further reused. The client driver could dispose of the mapping
> after having released the IRQ, but nobody does that in practice.
>
> From the point of view of the controller, there is no simple way to
> tell when an interrupt is "unused". And even if a driver was
> overzealous and called irq_dispose_mapping() on all the possible
> mappings (and made sure no mapping could be created in parallel), this
> could result in a bunch of dangling pointers should a client driver
> still have the interrupt requested.
>
> Fixing this is pretty hard, as IRQ descriptors are leaky (you can
> either have a pointer to one, or just an IRQ number -- they are
> strictly equivalent). So in general, being able to remove an interrupt
> controller driver is at best fragile, and I'm trying not to get more
> of this in the tree.
Thank you!
How do we identify an interrupt controller driver? Apparently some of
these PCIe controller drivers also include an interrupt controller
driver, but I don't know what to look for to find them.
Bjorn
On Fri, 22 Jul 2022 18:17:06 +0100,
Bjorn Helgaas <[email protected]> wrote:
>
> On Fri, Jul 22, 2022 at 06:06:07PM +0100, Marc Zyngier wrote:
> > On Fri, 22 Jul 2022 15:39:05 +0100,
> > Bjorn Helgaas <[email protected]> wrote:
> > >
> > > [+cc Marc, can you clarify when we need irq_dispose_mapping()?]
> >
> > In general, interrupt controllers should not have to discard mappings
> > themselves, just like they rarely create mappings themselves. That's
> > usually a different layer that has created it (DT, for example).
> >
> > The problem is that these mappings persist even if the interrupt has
> > been released by the driver (it called free_irq()), and the IRQ number
> > can be further reused. The client driver could dispose of the mapping
> > after having released the IRQ, but nobody does that in practice.
> >
> > From the point of view of the controller, there is no simple way to
> > tell when an interrupt is "unused". And even if a driver was
> > overzealous and called irq_dispose_mapping() on all the possible
> > mappings (and made sure no mapping could be created in parallel), this
> > could result in a bunch of dangling pointers should a client driver
> > still have the interrupt requested.
> >
> > Fixing this is pretty hard, as IRQ descriptors are leaky (you can
> > either have a pointer to one, or just an IRQ number -- they are
> > strictly equivalent). So in general, being able to remove an interrupt
> > controller driver is at best fragile, and I'm trying not to get more
> > of this in the tree.
>
> Thank you!
>
> How do we identify an interrupt controller driver? Apparently some of
> these PCIe controller drivers also include an interrupt controller
> driver, but I don't know what to look for to find them.
If you see a struct irq_chip somewhere, this is an interrupt
controller. And yes, most of the PCIe RC drivers will have some sort
of interrupt controller driver for INTx support, as well as MSI when
the RC doesn't/cannot rely on the platform providing one.
It means that these PCIe RC drivers probably shouldn't be removable if
built as modules. Which I don't think is a big problem. You want
modularity to reduce the size of the kernel image and only load the
drivers the platform actually requires, saving memory in the process.
And for something as fundamental as an interrupt controller (and PCIe
in general), you probably want to keep it around for the lifetime of
the machine.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
[ +CC: maz ]
On Fri, Jul 22, 2022 at 09:38:58AM -0500, Bjorn Helgaas wrote:
> On Fri, Jul 22, 2022 at 03:26:44PM +0200, Johan Hovold wrote:
> > On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
>
> > > qcom is a DWC driver, so all the IRQ stuff happens in
> > > dw_pcie_host_init(). qcom_pcie_remove() does call
> > > dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> > > calls irq_dispose_mapping().
> > >
> > > I'm thoroughly confused by all this. But I suspect that maybe I
> > > should drop the "make qcom modular" patch because it seems susceptible
> > > to this problem:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
> >
> > That should not be necessary.
> >
> > As you note above, interrupt handling is implemented in dwc core so if
> > there are any issue here at all, which I doubt, then all of the dwc
> > drivers that currently can be built as modules would all be broken and
> > this would need to be fixed in core.
>
> I don't know yet whether there's an issue. We need a clear argument
> for why there is or is not. The fact that others might be broken is
> not an argument for breaking another one ;)
It's not breaking anything that is currently working, and if there's
some corner case during module unload, that's not the end of the world
either.
It's a feature useful for developers and no one expects remove code to
be perfect (e.g. resilient against someone trying to break it by doing
things in parallel, etc.).
> > I've been using the modular pcie-qcom patch for months now, unloading
> > and reloading the driver repeatedly to test power sequencing, without
> > noticing any problems whatsoever.
>
> Pali's commit log suggests that unloading the module is not, by
> itself, enough to trigger the problem:
>
> https://lore.kernel.org/linux-pci/[email protected]/
>
> Can you test the scenario he mentions?
Turns out the pcie-qcom driver does not support legacy interrupts so
there's no risk of there being any lingering mappings if I understand
things correctly.
Johan
On Mon, 25 Jul 2022 14:25:49 +0100,
Johan Hovold <[email protected]> wrote:
>
> [ +CC: maz ]
>
> On Fri, Jul 22, 2022 at 09:38:58AM -0500, Bjorn Helgaas wrote:
> > On Fri, Jul 22, 2022 at 03:26:44PM +0200, Johan Hovold wrote:
> > > On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> >
> > > > qcom is a DWC driver, so all the IRQ stuff happens in
> > > > dw_pcie_host_init(). qcom_pcie_remove() does call
> > > > dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> > > > calls irq_dispose_mapping().
> > > >
> > > > I'm thoroughly confused by all this. But I suspect that maybe I
> > > > should drop the "make qcom modular" patch because it seems susceptible
> > > > to this problem:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
> > >
> > > That should not be necessary.
> > >
> > > As you note above, interrupt handling is implemented in dwc core so if
> > > there are any issue here at all, which I doubt, then all of the dwc
> > > drivers that currently can be built as modules would all be broken and
> > > this would need to be fixed in core.
> >
> > I don't know yet whether there's an issue. We need a clear argument
> > for why there is or is not. The fact that others might be broken is
> > not an argument for breaking another one ;)
>
> It's not breaking anything that is currently working, and if there's
> some corner case during module unload, that's not the end of the world
> either.
It may not be the end of the world for you, but you have absolutely no
idea of what dangling pointers to kernel memory will do on a user
machine, nor how this can be further exploited. Unloading a module
should never result in an unsafe kernel.
> It's a feature useful for developers and no one expects remove code to
> be perfect (e.g. resilient against someone trying to break it by doing
> things in parallel, etc.).
If that's a feature for you while you are developing, then please keep
this change as part of your own hacking toolbox. IMO the upstream
kernel shouldn't be subjected to this.
>
> > > I've been using the modular pcie-qcom patch for months now, unloading
> > > and reloading the driver repeatedly to test power sequencing, without
> > > noticing any problems whatsoever.
> >
> > Pali's commit log suggests that unloading the module is not, by
> > itself, enough to trigger the problem:
> >
> > https://lore.kernel.org/linux-pci/[email protected]/
> >
> > Can you test the scenario he mentions?
>
> Turns out the pcie-qcom driver does not support legacy interrupts so
> there's no risk of there being any lingering mappings if I understand
> things correctly.
It still does MSIs, thanks to dw_pcie_host_init(). If you can remove
the driver while devices are up and running with MSIs allocated,
things may get ugly if things align the wrong way (if a driver still
has a reference to an irq_desc or irq_data, for example).
M.
--
Without deviation from the norm, progress is not possible.
On Mon, Jul 25, 2022 at 03:43:40PM +0100, Marc Zyngier wrote:
> On Mon, 25 Jul 2022 14:25:49 +0100,
> Johan Hovold <[email protected]> wrote:
> >
> > [ +CC: maz ]
> >
> > On Fri, Jul 22, 2022 at 09:38:58AM -0500, Bjorn Helgaas wrote:
> > > On Fri, Jul 22, 2022 at 03:26:44PM +0200, Johan Hovold wrote:
> > > > On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> > >
> > > > > qcom is a DWC driver, so all the IRQ stuff happens in
> > > > > dw_pcie_host_init(). qcom_pcie_remove() does call
> > > > > dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> > > > > calls irq_dispose_mapping().
> > > > >
> > > > > I'm thoroughly confused by all this. But I suspect that maybe I
> > > > > should drop the "make qcom modular" patch because it seems susceptible
> > > > > to this problem:
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
> > > >
> > > > That should not be necessary.
> > > >
> > > > As you note above, interrupt handling is implemented in dwc core so if
> > > > there are any issue here at all, which I doubt, then all of the dwc
> > > > drivers that currently can be built as modules would all be broken and
> > > > this would need to be fixed in core.
> > >
> > > I don't know yet whether there's an issue. We need a clear argument
> > > for why there is or is not. The fact that others might be broken is
> > > not an argument for breaking another one ;)
> >
> > It's not breaking anything that is currently working, and if there's
> > some corner case during module unload, that's not the end of the world
> > either.
>
> It may not be the end of the world for you, but you have absolutely no
> idea of what dangling pointers to kernel memory will do on a user
> machine, nor how this can be further exploited. Unloading a module
> should never result in an unsafe kernel.
Since when is unloading modules something that is expected to work
perfectly? I keep hearing "well, don't do that then" when someone
complains about unloading this module while doing this or that broke
something. (And it's only root that can unload modules in the first
place.)
If this was the general understanding, then it seems the only option
would be to disable module unloading completely as module remove code
almost by definition gets less testing and is subject to bit rot.
It's useful for developers, but use it at your own risk.
That said, I agree that if something is next to impossible to get right,
as may be the case with interrupt controllers generally, then fine,
let's disable module unloading for that class of drivers.
And this would mean disabling driver unbind for the 20+ driver PCI
drivers that currently implement it to some degree.
Also note that we only appear to have some 60 drivers in the tree that
can be built as modules but cannot be unloaded (if my grep patterns
were correct).
> > > > I've been using the modular pcie-qcom patch for months now, unloading
> > > > and reloading the driver repeatedly to test power sequencing, without
> > > > noticing any problems whatsoever.
> > >
> > > Pali's commit log suggests that unloading the module is not, by
> > > itself, enough to trigger the problem:
> > >
> > > https://lore.kernel.org/linux-pci/[email protected]/
> > >
> > > Can you test the scenario he mentions?
> >
> > Turns out the pcie-qcom driver does not support legacy interrupts so
> > there's no risk of there being any lingering mappings if I understand
> > things correctly.
>
> It still does MSIs, thanks to dw_pcie_host_init(). If you can remove
> the driver while devices are up and running with MSIs allocated,
> things may get ugly if things align the wrong way (if a driver still
> has a reference to an irq_desc or irq_data, for example).
That is precisely the way I've been testing it and everything appears
to be tore down as it should.
And a PCI driver that has been unbound should have released its
resources, or that's a driver bug. Right?
And for the OF INTx case you mentioned earlier, aren't those mapped by
PCI core and could in theory be released by core as well?
Johan
On Mon, 25 Jul 2022 16:18:48 +0100,
Johan Hovold <[email protected]> wrote:
>
> On Mon, Jul 25, 2022 at 03:43:40PM +0100, Marc Zyngier wrote:
> > On Mon, 25 Jul 2022 14:25:49 +0100,
> > Johan Hovold <[email protected]> wrote:
> > >
> > > [ +CC: maz ]
> > >
> > > On Fri, Jul 22, 2022 at 09:38:58AM -0500, Bjorn Helgaas wrote:
> > > > On Fri, Jul 22, 2022 at 03:26:44PM +0200, Johan Hovold wrote:
> > > > > On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> > > >
> > > > > > qcom is a DWC driver, so all the IRQ stuff happens in
> > > > > > dw_pcie_host_init(). qcom_pcie_remove() does call
> > > > > > dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> > > > > > calls irq_dispose_mapping().
> > > > > >
> > > > > > I'm thoroughly confused by all this. But I suspect that maybe I
> > > > > > should drop the "make qcom modular" patch because it seems susceptible
> > > > > > to this problem:
> > > > > >
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
> > > > >
> > > > > That should not be necessary.
> > > > >
> > > > > As you note above, interrupt handling is implemented in dwc core so if
> > > > > there are any issue here at all, which I doubt, then all of the dwc
> > > > > drivers that currently can be built as modules would all be broken and
> > > > > this would need to be fixed in core.
> > > >
> > > > I don't know yet whether there's an issue. We need a clear argument
> > > > for why there is or is not. The fact that others might be broken is
> > > > not an argument for breaking another one ;)
> > >
> > > It's not breaking anything that is currently working, and if there's
> > > some corner case during module unload, that's not the end of the world
> > > either.
> >
> > It may not be the end of the world for you, but you have absolutely no
> > idea of what dangling pointers to kernel memory will do on a user
> > machine, nor how this can be further exploited. Unloading a module
> > should never result in an unsafe kernel.
>
> Since when is unloading modules something that is expected to work
> perfectly? I keep hearing "well, don't do that then" when someone
> complains about unloading this module while doing this or that broke
> something. (And it's only root that can unload modules in the first
> place.)
Well, maybe I have higher standards. For the stuff I maintain, I now
point-blank refuse to support module unloading if this can result in a
crash. Or worse.
> If this was the general understanding, then it seems the only option
> would be to disable module unloading completely as module remove code
> almost by definition gets less testing and is subject to bit rot.
My personal preference would be to prevent module unloading by default
if the probing has succeeded, and have modules to actually buy into
unloading. But that ship has sailed a long time ago.
> It's useful for developers, but use it at your own risk.
>
> That said, I agree that if something is next to impossible to get right,
> as may be the case with interrupt controllers generally, then fine,
> let's disable module unloading for that class of drivers.
>
> And this would mean disabling driver unbind for the 20+ driver PCI
> drivers that currently implement it to some degree.
That would be Bjorn's and Lorenzo's call.
> Also note that we only appear to have some 60 drivers in the tree that
> can be built as modules but cannot be unloaded (if my grep patterns
> were correct).
I'm not surprised. Preventing module unload requires extra "code", and
hardly anyone cares.
>
> > > > > I've been using the modular pcie-qcom patch for months now, unloading
> > > > > and reloading the driver repeatedly to test power sequencing, without
> > > > > noticing any problems whatsoever.
> > > >
> > > > Pali's commit log suggests that unloading the module is not, by
> > > > itself, enough to trigger the problem:
> > > >
> > > > https://lore.kernel.org/linux-pci/[email protected]/
> > > >
> > > > Can you test the scenario he mentions?
> > >
> > > Turns out the pcie-qcom driver does not support legacy interrupts so
> > > there's no risk of there being any lingering mappings if I understand
> > > things correctly.
> >
> > It still does MSIs, thanks to dw_pcie_host_init(). If you can remove
> > the driver while devices are up and running with MSIs allocated,
> > things may get ugly if things align the wrong way (if a driver still
> > has a reference to an irq_desc or irq_data, for example).
>
> That is precisely the way I've been testing it and everything appears
> to be tore down as it should.
>
> And a PCI driver that has been unbound should have released its
> resources, or that's a driver bug. Right?
But that's the thing: you can easily remove part of the infrastructure
without the endpoint driver even noticing. It may not happen in your
particular case if removing the RC driver will also nuke the endpoints
in the process, but I can't see this is an absolute guarantee. The
crash pointed to by an earlier email is symptomatic of it.
> And for the OF INTx case you mentioned earlier, aren't those mapped by
> PCI core and could in theory be released by core as well?
Potentially, though I haven't tried to follow the life cycle of those.
The whole thing is pretty fragile, and this sort of resource is rarely
expected to be removed...
M.
--
Without deviation from the norm, progress is not possible.
On 22/07/2022 18:06, Marc Zyngier wrote:
> Hi Bjorn,
>
> On Fri, 22 Jul 2022 15:39:05 +0100,
> Bjorn Helgaas <[email protected]> wrote:
>>
>> [+cc Marc, can you clarify when we need irq_dispose_mapping()?]
>
> In general, interrupt controllers should not have to discard mappings
> themselves, just like they rarely create mappings themselves. That's
> usually a different layer that has created it (DT, for example).
>
> The problem is that these mappings persist even if the interrupt has
> been released by the driver (it called free_irq()), and the IRQ number
> can be further reused. The client driver could dispose of the mapping
> after having released the IRQ, but nobody does that in practice.
>
> From the point of view of the controller, there is no simple way to
> tell when an interrupt is "unused". And even if a driver was
> overzealous and called irq_dispose_mapping() on all the possible
> mappings (and made sure no mapping could be created in parallel), this
> could result in a bunch of dangling pointers should a client driver
> still have the interrupt requested.
>
> Fixing this is pretty hard, as IRQ descriptors are leaky (you can
> either have a pointer to one, or just an IRQ number -- they are
> strictly equivalent). So in general, being able to remove an interrupt
> controller driver is at best fragile, and I'm trying not to get more
> of this in the tree.
>
Sorry to butt back in here - but I am taking this to mean that rather
than add a remove callback for the microchip pci controller driver when
making it buildable as a module it would instead be better to forgo that
entirely and prevent unloading the module (since it does INTX & MSI).
Would that be an accurate assessment?
Thanks,
Conor.
On 7/24/22 02:38, Marc Zyngier wrote:
> On Fri, 22 Jul 2022 18:17:06 +0100,
> Bjorn Helgaas <[email protected]> wrote:
>>
>> On Fri, Jul 22, 2022 at 06:06:07PM +0100, Marc Zyngier wrote:
>>> On Fri, 22 Jul 2022 15:39:05 +0100,
>>> Bjorn Helgaas <[email protected]> wrote:
>>>>
>>>> [+cc Marc, can you clarify when we need irq_dispose_mapping()?]
>>>
>>> In general, interrupt controllers should not have to discard mappings
>>> themselves, just like they rarely create mappings themselves. That's
>>> usually a different layer that has created it (DT, for example).
>>>
>>> The problem is that these mappings persist even if the interrupt has
>>> been released by the driver (it called free_irq()), and the IRQ number
>>> can be further reused. The client driver could dispose of the mapping
>>> after having released the IRQ, but nobody does that in practice.
>>>
>>> From the point of view of the controller, there is no simple way to
>>> tell when an interrupt is "unused". And even if a driver was
>>> overzealous and called irq_dispose_mapping() on all the possible
>>> mappings (and made sure no mapping could be created in parallel), this
>>> could result in a bunch of dangling pointers should a client driver
>>> still have the interrupt requested.
>>>
>>> Fixing this is pretty hard, as IRQ descriptors are leaky (you can
>>> either have a pointer to one, or just an IRQ number -- they are
>>> strictly equivalent). So in general, being able to remove an interrupt
>>> controller driver is at best fragile, and I'm trying not to get more
>>> of this in the tree.
>>
>> Thank you!
>>
>> How do we identify an interrupt controller driver? Apparently some of
>> these PCIe controller drivers also include an interrupt controller
>> driver, but I don't know what to look for to find them.
>
> If you see a struct irq_chip somewhere, this is an interrupt
> controller. And yes, most of the PCIe RC drivers will have some sort
> of interrupt controller driver for INTx support, as well as MSI when
> the RC doesn't/cannot rely on the platform providing one.
>
> It means that these PCIe RC drivers probably shouldn't be removable if
> built as modules. Which I don't think is a big problem. You want
> modularity to reduce the size of the kernel image and only load the
> drivers the platform actually requires, saving memory in the process.
> And for something as fundamental as an interrupt controller (and PCIe
> in general), you probably want to keep it around for the lifetime of
> the machine.
No disagreement, however being able to fully remove and load the module again ensures that you bring the hardware and software in a sane state every time, i.e.: it does help find actual bugs in either implementations. It's also a faster turn around time if you are working on that specific subsystem in avoid rebooting the kernel needlessly (that puts a lot of faith into not crashing the kernel, still).
--
Florian
On Mon, 25 Jul 2022 18:49:05 +0100,
<[email protected]> wrote:
>
> Sorry to butt back in here - but I am taking this to mean that rather
> than add a remove callback for the microchip pci controller driver when
> making it buildable as a module it would instead be better to forgo that
> entirely and prevent unloading the module (since it does INTX & MSI).
>
> Would that be an accurate assessment?
Yes.
M.
--
Without deviation from the norm, progress is not possible.
On Mon, Jul 25, 2022 at 06:35:27PM +0100, Marc Zyngier wrote:
> On Mon, 25 Jul 2022 16:18:48 +0100,
> Johan Hovold <[email protected]> wrote:
> > Since when is unloading modules something that is expected to work
> > perfectly? I keep hearing "well, don't do that then" when someone
> > complains about unloading this module while doing this or that broke
> > something. (And it's only root that can unload modules in the first
> > place.)
>
> Well, maybe I have higher standards. For the stuff I maintain, I now
> point-blank refuse to support module unloading if this can result in a
> crash. Or worse.
That makes sense for regular interrupt controllers where its hard to
tell that all consumers are gone, but I don't think that should limit
the usefulness of having modular PCI controller drivers where we know
that the consumers are gone after deregistering the bus (i.e. the
consumers are descendants of the controller in the device tree).
> > If this was the general understanding, then it seems the only option
> > would be to disable module unloading completely as module remove code
> > almost by definition gets less testing and is subject to bit rot.
>
> My personal preference would be to prevent module unloading by default
> if the probing has succeeded, and have modules to actually buy into
> unloading. But that ship has sailed a long time ago.
We obviously agree that modular driver are important (e.g. for
multi-platform kernels), but being able to unload a module is also a
useful debugging and development tool. I've fixed several driver bugs in
paths that are rarely tested (or hard to test) by unloading the
pcie-qcom driver.
This old quote from Linus seems to agree with my position on this:
The proper thing to do (and what we _have_ done) is to say
"unloading of modules is not supported". It's a debugging
feature, and you literally shouldn't do it unless you are
actively developing that module.
https://lore.kernel.org/all/[email protected]/
But of course we should fix any issues we find, such as the missing
unmapping of legacy interrupts pointed out by Pali earlier in this
thread.
> > It's useful for developers, but use it at your own risk.
> >
> > That said, I agree that if something is next to impossible to get right,
> > as may be the case with interrupt controllers generally, then fine,
> > let's disable module unloading for that class of drivers.
> >
> > And this would mean disabling driver unbind for the 20+ driver PCI
> > drivers that currently implement it to some degree.
>
> That would be Bjorn's and Lorenzo's call.
Sure, but I think it would be the wrong decision here. Especially, since
the end result will likely just be that more drivers will become always
compiled-in.
> > Also note that we only appear to have some 60 drivers in the tree that
> > can be built as modules but cannot be unloaded (if my grep patterns
> > were correct).
>
> I'm not surprised. Preventing module unload requires extra "code", and
> hardly anyone cares.
And it's primarily a debugging feature.
> > > > Turns out the pcie-qcom driver does not support legacy interrupts so
> > > > there's no risk of there being any lingering mappings if I understand
> > > > things correctly.
> > >
> > > It still does MSIs, thanks to dw_pcie_host_init(). If you can remove
> > > the driver while devices are up and running with MSIs allocated,
> > > things may get ugly if things align the wrong way (if a driver still
> > > has a reference to an irq_desc or irq_data, for example).
> >
> > That is precisely the way I've been testing it and everything appears
> > to be tore down as it should.
> >
> > And a PCI driver that has been unbound should have released its
> > resources, or that's a driver bug. Right?
>
> But that's the thing: you can easily remove part of the infrastructure
> without the endpoint driver even noticing. It may not happen in your
> particular case if removing the RC driver will also nuke the endpoints
> in the process, but I can't see this is an absolute guarantee. The
> crash pointed to by an earlier email is symptomatic of it.
But that was arguably due to a driver bug, which we know how to fix. For
MSIs the endpoint driver will free its interrupts and all is good.
The key observation is that the driver model will make sure that any
endpoint drivers have been unbound before the bus is deregistered.
That means there are no longer any consumers of the interrupts, which
can be disposed. For MSI this is handled by pci_free_irq_vectors() when
unbinding the endpoint drivers. For legacy interrupts, which can be
shared, the PCIe RC driver needs to manage this itself after the
consumers are gone.
> > And for the OF INTx case you mentioned earlier, aren't those mapped by
> > PCI core and could in theory be released by core as well?
>
> Potentially, though I haven't tried to follow the life cycle of those.
> The whole thing is pretty fragile, and this sort of resource is rarely
> expected to be removed...
The OF mapping is typically done by PCI core when binding the endpoint
driver, but as the interrupts can be shared, they cannot be disposed at
unbind.
Instead the host-bridge driver needs to dispose the mappings after
deregistering the bus and before removing the domain, as in Pali's
fixes:
https://lore.kernel.org/linux-pci/[email protected]/
at which point the consumers are gone.
Johan
On Tue, Jul 26, 2022 at 11:56:59AM +0200, Johan Hovold wrote:
> On Mon, Jul 25, 2022 at 06:35:27PM +0100, Marc Zyngier wrote:
> > On Mon, 25 Jul 2022 16:18:48 +0100,
> > Johan Hovold <[email protected]> wrote:
>
> > > Since when is unloading modules something that is expected to
> > > work perfectly? I keep hearing "well, don't do that then" when
> > > someone complains about unloading this module while doing this
> > > or that broke something. (And it's only root that can unload
> > > modules in the first place.)
> >
> > Well, maybe I have higher standards. For the stuff I maintain, I
> > now point-blank refuse to support module unloading if this can
> > result in a crash. Or worse.
>
> That makes sense for regular interrupt controllers where its hard to
> tell that all consumers are gone, but I don't think that should
> limit the usefulness of having modular PCI controller drivers where
> we know that the consumers are gone after deregistering the bus
> (i.e. the consumers are descendants of the controller in the device
> tree).
Those consumers are endpoint drivers, so I think this depends on those
drivers correctly unmapping the interrupts they use, right?
> > > It's useful for developers, but use it at your own risk.
> > >
> > > That said, I agree that if something is next to impossible to
> > > get right, as may be the case with interrupt controllers
> > > generally, then fine, let's disable module unloading for that
> > > class of drivers.
> > >
> > > And this would mean disabling driver unbind for the 20+ driver
> > > PCI drivers that currently implement it to some degree.
> >
> > That would be Bjorn's and Lorenzo's call.
>
> Sure, but I think it would be the wrong decision here. Especially,
> since the end result will likely just be that more drivers will
> become always compiled-in.
Can you elaborate on this? I think Marc is suggesting that these PCI
controller drivers be modular but not removable. Why would that cause
more of them to be compiled-in?
> > > > > Turns out the pcie-qcom driver does not support legacy
> > > > > interrupts so there's no risk of there being any lingering
> > > > > mappings if I understand things correctly.
> > > >
> > > > It still does MSIs, thanks to dw_pcie_host_init(). If you can
> > > > remove the driver while devices are up and running with MSIs
> > > > allocated, things may get ugly if things align the wrong way
> > > > (if a driver still has a reference to an irq_desc or irq_data,
> > > > for example).
> > >
> > > That is precisely the way I've been testing it and everything
> > > appears to be tore down as it should.
> > >
> > > And a PCI driver that has been unbound should have released its
> > > resources, or that's a driver bug. Right?
> >
> > But that's the thing: you can easily remove part of the
> > infrastructure without the endpoint driver even noticing. It may
> > not happen in your particular case if removing the RC driver will
> > also nuke the endpoints in the process, but I can't see this is an
> > absolute guarantee. The crash pointed to by an earlier email is
> > symptomatic of it.
>
> But that was arguably due to a driver bug, which we know how to fix.
> For MSIs the endpoint driver will free its interrupts and all is
> good.
>
> The key observation is that the driver model will make sure that any
> endpoint drivers have been unbound before the bus is deregistered.
>
> That means there are no longer any consumers of the interrupts,
> which can be disposed. For MSI this is handled by
> pci_free_irq_vectors() when unbinding the endpoint drivers. For
> legacy interrupts, which can be shared, the PCIe RC driver needs to
> manage this itself after the consumers are gone.
The driver model ensures that endpoint drivers have been unbound. But
doesn't the interrupt disposal depend on the correct functioning of
those endpoint drivers? So if a buggy endpoint driver failed to
dispose of them, we're still vulnerable?
Bjorn
On Wed, Jul 27, 2022 at 02:57:16PM -0500, Bjorn Helgaas wrote:
> On Tue, Jul 26, 2022 at 11:56:59AM +0200, Johan Hovold wrote:
> > On Mon, Jul 25, 2022 at 06:35:27PM +0100, Marc Zyngier wrote:
> > > On Mon, 25 Jul 2022 16:18:48 +0100,
> > > Johan Hovold <[email protected]> wrote:
> >
> > > > Since when is unloading modules something that is expected to
> > > > work perfectly? I keep hearing "well, don't do that then" when
> > > > someone complains about unloading this module while doing this
> > > > or that broke something. (And it's only root that can unload
> > > > modules in the first place.)
> > >
> > > Well, maybe I have higher standards. For the stuff I maintain, I
> > > now point-blank refuse to support module unloading if this can
> > > result in a crash. Or worse.
> >
> > That makes sense for regular interrupt controllers where its hard to
> > tell that all consumers are gone, but I don't think that should
> > limit the usefulness of having modular PCI controller drivers where
> > we know that the consumers are gone after deregistering the bus
> > (i.e. the consumers are descendants of the controller in the device
> > tree).
>
> Those consumers are endpoint drivers, so I think this depends on those
> drivers correctly unmapping the interrupts they use, right?
Right. For MSI this means that pci_alloc_irq_vectors() in probe should
be matched by pci_free_irq_vectors() on remove.
For legacy interrupts, which can be shared, the mapping is created by
PCI core when binding to the first device and can only be disposed by
the host-bridge driver once all descendants have been removed.
The endpoint drivers still need to disable their interrupts of course.
Buggy endpoint-driver remove implementations can lead to all sorts of
crashes (e.g. after failing to deregister a class device), and if that's
a worry then don't unload modules (and possibly disable it completely
using CONFIG_MODULE_UNLOAD).
> > > > It's useful for developers, but use it at your own risk.
> > > >
> > > > That said, I agree that if something is next to impossible to
> > > > get right, as may be the case with interrupt controllers
> > > > generally, then fine, let's disable module unloading for that
> > > > class of drivers.
> > > >
> > > > And this would mean disabling driver unbind for the 20+ driver
> > > > PCI drivers that currently implement it to some degree.
> > >
> > > That would be Bjorn's and Lorenzo's call.
> >
> > Sure, but I think it would be the wrong decision here. Especially,
> > since the end result will likely just be that more drivers will
> > become always compiled-in.
>
> Can you elaborate on this? I think Marc is suggesting that these PCI
> controller drivers be modular but not removable. Why would that cause
> more of them to be compiled-in?
As mentioned earlier in this thread, we only appear to have some 60
drivers in the entire tree that bother to try to implement that. I fear
that blocking the use of modules (including being able to unload them)
will just make people submit drivers that can only be built in.
Not everyone cares about Android's GKI, but being able to unload a
module during development is very useful (and keeping that out-of-tree
prevents sharing the implementation and make it susceptible to even
further bit rot).
So continuing to supporting modules properly is a win for everyone (e.g.
GKI and developers).
> > > > > > Turns out the pcie-qcom driver does not support legacy
> > > > > > interrupts so there's no risk of there being any lingering
> > > > > > mappings if I understand things correctly.
> > > > >
> > > > > It still does MSIs, thanks to dw_pcie_host_init(). If you can
> > > > > remove the driver while devices are up and running with MSIs
> > > > > allocated, things may get ugly if things align the wrong way
> > > > > (if a driver still has a reference to an irq_desc or irq_data,
> > > > > for example).
> > > >
> > > > That is precisely the way I've been testing it and everything
> > > > appears to be tore down as it should.
> > > >
> > > > And a PCI driver that has been unbound should have released its
> > > > resources, or that's a driver bug. Right?
> > >
> > > But that's the thing: you can easily remove part of the
> > > infrastructure without the endpoint driver even noticing. It may
> > > not happen in your particular case if removing the RC driver will
> > > also nuke the endpoints in the process, but I can't see this is an
> > > absolute guarantee. The crash pointed to by an earlier email is
> > > symptomatic of it.
> >
> > But that was arguably due to a driver bug, which we know how to fix.
> > For MSIs the endpoint driver will free its interrupts and all is
> > good.
> >
> > The key observation is that the driver model will make sure that any
> > endpoint drivers have been unbound before the bus is deregistered.
> >
> > That means there are no longer any consumers of the interrupts,
> > which can be disposed. For MSI this is handled by
> > pci_free_irq_vectors() when unbinding the endpoint drivers. For
> > legacy interrupts, which can be shared, the PCIe RC driver needs to
> > manage this itself after the consumers are gone.
>
> The driver model ensures that endpoint drivers have been unbound. But
> doesn't the interrupt disposal depend on the correct functioning of
> those endpoint drivers? So if a buggy endpoint driver failed to
> dispose of them, we're still vulnerable?
Just as you are if an endpoint-driver fails to clean up after itself in
some other way (e.g. leaves the interrupt enabled).
Johan
On Mon, Jul 25, 2022 at 06:35:27PM +0100, Marc Zyngier wrote:
[...]
> > That is precisely the way I've been testing it and everything appears
> > to be tore down as it should.
> >
> > And a PCI driver that has been unbound should have released its
> > resources, or that's a driver bug. Right?
>
> But that's the thing: you can easily remove part of the infrastructure
> without the endpoint driver even noticing. It may not happen in your
> particular case if removing the RC driver will also nuke the endpoints
> in the process, but I can't see this is an absolute guarantee. The
> crash pointed to by an earlier email is symptomatic of it.
>
> > And for the OF INTx case you mentioned earlier, aren't those mapped by
> > PCI core and could in theory be released by core as well?
>
> Potentially, though I haven't tried to follow the life cycle of those.
> The whole thing is pretty fragile, and this sort of resource is rarely
> expected to be removed...
This made me notice that we don't undo the actions (ie bridge->map_irq())
executed in pci_assign_irq() in pci_device_remove(); I don't think this
can be right and that's already a candidate for a fix.
It is not necessarily related to this thread topic, though I believe,
in an _ideal_ world, removing a bridge should guarantee that all
the downstream devices (ie drivers) had a chance of freeing/disposing
the resources they allocated. This in theory; I totally understand
Marc's point of view here and we should make up our mind about what
we want to do on host bridge removal policy - this will take me more
time to get to the bottom of it.
Lorenzo
On Tue, Sep 27, 2022 at 05:27:42PM +0200, Lorenzo Pieralisi wrote:
> On Mon, Jul 25, 2022 at 06:35:27PM +0100, Marc Zyngier wrote:
>
> [...]
>
> > > That is precisely the way I've been testing it and everything appears
> > > to be tore down as it should.
> > >
> > > And a PCI driver that has been unbound should have released its
> > > resources, or that's a driver bug. Right?
> >
> > But that's the thing: you can easily remove part of the infrastructure
> > without the endpoint driver even noticing. It may not happen in your
> > particular case if removing the RC driver will also nuke the endpoints
> > in the process, but I can't see this is an absolute guarantee. The
> > crash pointed to by an earlier email is symptomatic of it.
> >
> > > And for the OF INTx case you mentioned earlier, aren't those mapped by
> > > PCI core and could in theory be released by core as well?
> >
> > Potentially, though I haven't tried to follow the life cycle of those.
> > The whole thing is pretty fragile, and this sort of resource is rarely
> > expected to be removed...
>
> This made me notice that we don't undo the actions (ie bridge->map_irq())
> executed in pci_assign_irq() in pci_device_remove(); I don't think this
> can be right and that's already a candidate for a fix.
There's an inherent asymmetry here as a legacy interrupt can be used by
more than one device. It is mapped on first use as each user calls
->map_irq() but can only be disposed when the final user is gone as I
mentioned here:
https://lore.kernel.org/all/Yt+6azfwd%[email protected]/
> It is not necessarily related to this thread topic, though I believe,
> in an _ideal_ world, removing a bridge should guarantee that all
> the downstream devices (ie drivers) had a chance of freeing/disposing
> the resources they allocated. This in theory; I totally understand
> Marc's point of view here and we should make up our mind about what
> we want to do on host bridge removal policy - this will take me more
> time to get to the bottom of it.
Johan