While running Android15-6.6 and hacking on latest Qemu, I observed
the following with KFENCE:
[ 10.761992][ T81] pcieport 0000:00:03.0: Adding to iommu group 0
[ 10.782536][ T81] pcieport 0000:00:03.0: enabling device (0000 -> 0003)
[ 10.814259][ T81] ==================================================================
[ 10.816534][ T81] BUG: KFENCE: use-after-free read in __pci_enable_msi_range+0x2c0/0x488
[ 10.816534][ T81]
[ 10.820189][ T81] Use-after-free read at 0x0000000024629571 (in kfence-#12):
[ 10.822652][ T81] __pci_enable_msi_range+0x2c0/0x488
[ 10.824304][ T81] pci_alloc_irq_vectors_affinity+0xec/0x14c
[ 10.826129][ T81] pci_alloc_irq_vectors+0x18/0x28
[ 10.827606][ T81] pcie_portdrv_probe+0x1e0/0x608
[ 10.828979][ T81] pci_device_probe+0xa8/0x174
[ 10.830316][ T81] really_probe+0x150/0x2b8
[ 10.831717][ T81] __driver_probe_device+0x7c/0x130
[ 10.833368][ T81] driver_probe_device+0x40/0x118
[ 10.834899][ T81] __device_attach_driver+0xc4/0x108
[ 10.836464][ T81] bus_for_each_drv+0x8c/0xf0
[ 10.837915][ T81] __device_attach+0xa4/0x198
[ 10.839260][ T81] device_initial_probe+0x18/0x28
[ 10.840684][ T81] bus_probe_device+0xb0/0xb4
[ 10.842174][ T81] deferred_probe_work_func+0xac/0xf8
[ 10.843760][ T81] process_one_work+0x18c/0x414
[ 10.845129][ T81] worker_thread+0x41c/0x53c
[ 10.846611][ T81] kthread+0x114/0x118
[ 10.847841][ T81] ret_from_fork+0x10/0x20
[ 10.849298][ T81]
[ 10.850435][ T81] kfence-#12: 0x0000000008614900-0x00000000e06c228d, size=104, cache=kmalloc-128
[ 10.850435][ T81]
[ 10.853455][ T81] allocated by task 81 on cpu 7 at 10.808142s:
[ 10.856329][ T81] __kmem_cache_alloc_node+0x1f0/0x2bc
[ 10.857988][ T81] kmalloc_trace+0x44/0x138
[ 10.859299][ T81] msi_alloc_desc+0x3c/0x9c
[ 10.860582][ T81] msi_domain_insert_msi_desc+0x30/0x78
[ 10.862058][ T81] msi_setup_msi_desc+0x13c/0x184
[ 10.862942][ T81] __pci_enable_msi_range+0x258/0x488
[ 10.863847][ T81] pci_alloc_irq_vectors_affinity+0xec/0x14c
[ 10.864821][ T81] pci_alloc_irq_vectors+0x18/0x28
[ 10.866011][ T81] pcie_portdrv_probe+0x1e0/0x608
[ 10.867047][ T81] pci_device_probe+0xa8/0x174
[ 10.867937][ T81] really_probe+0x150/0x2b8
[ 10.868774][ T81] __driver_probe_device+0x7c/0x130
[ 10.869755][ T81] driver_probe_device+0x40/0x118
[ 10.870665][ T81] __device_attach_driver+0xc4/0x108
[ 10.871654][ T81] bus_for_each_drv+0x8c/0xf0
[ 10.872585][ T81] __device_attach+0xa4/0x198
[ 10.873581][ T81] device_initial_probe+0x18/0x28
[ 10.874645][ T81] bus_probe_device+0xb0/0xb4
[ 10.875610][ T81] deferred_probe_work_func+0xac/0xf8
[ 10.876696][ T81] process_one_work+0x18c/0x414
[ 10.877573][ T81] worker_thread+0x41c/0x53c
[ 10.878468][ T81] kthread+0x114/0x118
[ 10.879331][ T81] ret_from_fork+0x10/0x20
[ 10.880238][ T81]
[ 10.880767][ T81] freed by task 81 on cpu 7 at 10.811436s:
[ 10.882311][ T81] msi_domain_free_descs+0xd4/0x10c
[ 10.883318][ T81] msi_domain_free_locked.part.0+0xc0/0x1d8
[ 10.884401][ T81] msi_domain_alloc_irqs_all_locked+0xb4/0xbc
[ 10.885519][ T81] pci_msi_setup_msi_irqs+0x30/0x4c
[ 10.886549][ T81] __pci_enable_msi_range+0x2a8/0x488
[ 10.887537][ T81] pci_alloc_irq_vectors_affinity+0xec/0x14c
[ 10.888576][ T81] pci_alloc_irq_vectors+0x18/0x28
[ 10.889693][ T81] pcie_portdrv_probe+0x1e0/0x608
[ 10.890719][ T81] pci_device_probe+0xa8/0x174
[ 10.891623][ T81] really_probe+0x150/0x2b8
[ 10.892534][ T81] __driver_probe_device+0x7c/0x130
[ 10.893541][ T81] driver_probe_device+0x40/0x118
[ 10.894491][ T81] __device_attach_driver+0xc4/0x108
[ 10.895433][ T81] bus_for_each_drv+0x8c/0xf0
[ 10.896381][ T81] __device_attach+0xa4/0x198
[ 10.897359][ T81] device_initial_probe+0x18/0x28
[ 10.898401][ T81] bus_probe_device+0xb0/0xb4
[ 10.899268][ T81] deferred_probe_work_func+0xac/0xf8
[ 10.900314][ T81] process_one_work+0x18c/0x414
[ 10.901279][ T81] worker_thread+0x41c/0x53c
[ 10.902138][ T81] kthread+0x114/0x118
[ 10.902916][ T81] ret_from_fork+0x10/0x20
[ 10.903816][ T81]
[ 10.904429][ T81] CPU: 7 PID: 81 Comm: kworker/u16:2 Not tainted 6.6.23mostafa+ #224
[ 10.906362][ T81] Hardware name: linux,dummy-virt (DT)
[ 10.907934][ T81] Workqueue: events_unbound deferred_probe_work_func
[ 10.909959][ T81] ==================================================================
Looking at the upstream code, it seems to have the same issue where:
Descriptor allocation done in:
__pci_enable_msi_range
msi_capability_init
msi_setup_msi_desc
msi_insert_msi_desc
msi_domain_insert_msi_desc
msi_alloc_desc
...
Freed in case of failure in __msi_domain_alloc_locked()
__pci_enable_msi_range
msi_capability_init
pci_msi_setup_msi_irqs
msi_domain_alloc_irqs_all_locked
msi_domain_alloc_locked
__msi_domain_alloc_locked => fails
msi_domain_free_locked
...
That failure would propagate back till pci_msi_setup_msi_irqs() call
in msi_capability_init() which have "goto err" which would access the
descriptor to clear the mask.
However, we can’t make assumptions if the descriptor is freed or not
as it depends on the failure location and on MSI_FLAG_FREE_MSI_DESCS,
so one simple solution is to re-read it.
Also, calling pci_free_msi_irqs() after will trigger
msi_domain_free_locked() again, however it re-iterates through the xa
array, which should be safe.
I hit this only once, but I can confirm the bug(and verify the fix)
for the upstream kernel through stubbing return of
__msi_domain_alloc_locked() to fail and with KASAN enabled.
bf6e054e0e3f ("genirq/msi: Provide msi_device_populate/destroy_sysfs()")
is the first commit that introduced the free logic from the context
of pci_msi_setup_msi_irqs()
Signed-off-by: Mostafa Saleh <[email protected]>
---
drivers/pci/msi/msi.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 682fa877478f..dd961151fecf 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -393,7 +393,10 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
goto unlock;
err:
- pci_msi_unmask(entry, msi_multi_mask(entry));
+ /* Re-read the descriptor as it might have been freed */
+ entry = msi_first_desc(&dev->dev, MSI_DESC_ALL);
+ if (entry)
+ pci_msi_unmask(entry, msi_multi_mask(entry));
pci_free_msi_irqs(dev);
fail:
dev->msi_enabled = 0;
--
2.44.0.769.g3c40516874-goog
On Mon, Apr 29 2024 at 03:41, Mostafa Saleh wrote:
> err:
> - pci_msi_unmask(entry, msi_multi_mask(entry));
> + /* Re-read the descriptor as it might have been freed */
> + entry = msi_first_desc(&dev->dev, MSI_DESC_ALL);
> + if (entry)
> + pci_msi_unmask(entry, msi_multi_mask(entry));
What unmasks the entry in the NULL case?
The mask has to be undone. So you need something like the uncompiled
below.
Thanks,
tglx
---
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -349,7 +349,7 @@ static int msi_capability_init(struct pc
struct irq_affinity *affd)
{
struct irq_affinity_desc *masks = NULL;
- struct msi_desc *entry;
+ struct msi_desc *entry, desc;
int ret;
/* Reject multi-MSI early on irq domain enabled architectures */
@@ -374,6 +374,12 @@ static int msi_capability_init(struct pc
/* All MSIs are unmasked by default; mask them all */
entry = msi_first_desc(&dev->dev, MSI_DESC_ALL);
pci_msi_mask(entry, msi_multi_mask(entry));
+ /*
+ * Copy the MSI descriptor for the error path because
+ * pci_msi_setup_msi_irqs() will free it for the hierarchical
+ * interrupt domain case.
+ */
+ memcpy(&desc, entry, sizeof(desc));
/* Configure MSI capability structure */
ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI);
@@ -393,7 +399,7 @@ static int msi_capability_init(struct pc
goto unlock;
err:
- pci_msi_unmask(entry, msi_multi_mask(entry));
+ pci_msi_unmask(&desc, msi_multi_mask(&desc));
pci_free_msi_irqs(dev);
fail:
dev->msi_enabled = 0;
Hi Thomas,
On Mon, Apr 29, 2024 at 11:04:39PM +0200, Thomas Gleixner wrote:
> On Mon, Apr 29 2024 at 03:41, Mostafa Saleh wrote:
> > err:
> > - pci_msi_unmask(entry, msi_multi_mask(entry));
> > + /* Re-read the descriptor as it might have been freed */
> > + entry = msi_first_desc(&dev->dev, MSI_DESC_ALL);
> > + if (entry)
> > + pci_msi_unmask(entry, msi_multi_mask(entry));
>
> What unmasks the entry in the NULL case?
Apparently nothing, I missed that. (PCI isn’t really my area, I
prefer dealing with non standardised platform devices :))
>
> The mask has to be undone. So you need something like the uncompiled
> below.
>
> Thanks,
>
> tglx
> ---
>
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -349,7 +349,7 @@ static int msi_capability_init(struct pc
> struct irq_affinity *affd)
> {
> struct irq_affinity_desc *masks = NULL;
> - struct msi_desc *entry;
> + struct msi_desc *entry, desc;
> int ret;
>
> /* Reject multi-MSI early on irq domain enabled architectures */
> @@ -374,6 +374,12 @@ static int msi_capability_init(struct pc
> /* All MSIs are unmasked by default; mask them all */
> entry = msi_first_desc(&dev->dev, MSI_DESC_ALL);
> pci_msi_mask(entry, msi_multi_mask(entry));
> + /*
> + * Copy the MSI descriptor for the error path because
> + * pci_msi_setup_msi_irqs() will free it for the hierarchical
> + * interrupt domain case.
> + */
> + memcpy(&desc, entry, sizeof(desc));
>
> /* Configure MSI capability structure */
> ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI);
> @@ -393,7 +399,7 @@ static int msi_capability_init(struct pc
> goto unlock;
>
> err:
> - pci_msi_unmask(entry, msi_multi_mask(entry));
> + pci_msi_unmask(&desc, msi_multi_mask(&desc));
> pci_free_msi_irqs(dev);
> fail:
> dev->msi_enabled = 0;
I tested it with my stub, but since I didn't write the code, here
is my tag, let me know if you want me to respin.
Tested-by: Mostafa Saleh <[email protected]>
Thanks,
Mostafa