2024-03-19 14:43:42

by Francisco Ayala Le Brun

Subject: Bug report: probe of AMDI0040:00 failed with error -16

Hello,

I would like to report a bug.

Issue description:
After updating a GHF51 SBC to a newer kernel version, the system was
no longer able to boot. Running the "lsblk" command in the recovery
console showed no mmc storage detected.

System Information:
OS: Fedora 40 x86_64
Kernel: 6.8.0-0.rc6.49.fc40.x86_64

Relevant Logs:
[ 10.920756] Call Trace:
[ 10.920763] <TASK>
[ 10.920771] dump_stack_lvl+0x4d/0x70
[ 10.920786] __setup_irq+0x530/0x6c0
[ 10.920801] request_threaded_irq+0xe5/0x180
[ 10.920813] ? __pfx_sdhci_thread_irq+0x10/0x10 [sdhci]
[ 10.920843] __sdhci_add_host+0x108/0x360 [sdhci]
[ 10.920871] sdhci_acpi_probe+0x3a8/0x500 [sdhci_acpi]
[ 10.920894] platform_probe+0x44/0xa0
[ 10.920908] really_probe+0x19e/0x3e0
[ 10.930244] __driver_probe_device+0x78/0x160
[ 10.930264] driver_probe_device+0x1f/0xa0
[ 10.930273] __driver_attach_async_helper+0x5e/0xe0
[ 10.930284] async_run_entry_fn+0x34/0x130
[ 10.930296] process_one_work+0x170/0x330
[ 10.930309] worker_thread+0x273/0x3c0
[ 10.934639] ? __pfx_worker_thread+0x10/0x10
[ 10.934654] kthread+0xe8/0x120
[ 10.934663] ? __pfx_kthread+0x10/0x10
[ 10.934671] ret_from_fork+0x34/0x50
[ 10.934681] ? __pfx_kthread+0x10/0x10
[ 10.934688] ret_from_fork_asm+0x1b/0x30
[ 10.934708] </TASK>
[ 10.940978] mmc0: Failed to request IRQ 7: -16
[ 10.943885] sdhci-acpi: probe of AMDI0040:00 failed with error -16


2024-03-19 16:26:57

by Adrian Hunter

Subject: Re: Bug report: probe of AMDI0040:00 failed with error -16

On 19/03/24 16:43, Francisco Ayala Le Brun wrote:
> Hello,
>
> I would like to report a bug.
>
> Issue description:
> After updating a GHF51 SBC to a newer kernel version, the system was

What was the older / working kernel version? Are you able
to git bisect?

> no longer able to boot. Running the "lsblk" command in the recovery
> console showed no mmc storage detected.
>
> System Information:
> OS: Fedora 40 x86_64
> Kernel: 6.8.0-0.rc6.49.fc40.x86_64
>
> Relevant Logs:

Really no error / fail messages before the stack dump?

> [ 10.920756] Call Trace:
> [ 10.920763] <TASK>
> [ 10.920771] dump_stack_lvl+0x4d/0x70
> [ 10.920786] __setup_irq+0x530/0x6c0
> [ 10.920801] request_threaded_irq+0xe5/0x180
> [ 10.920813] ? __pfx_sdhci_thread_irq+0x10/0x10 [sdhci]
> [ 10.920843] __sdhci_add_host+0x108/0x360 [sdhci]
> [ 10.920871] sdhci_acpi_probe+0x3a8/0x500 [sdhci_acpi]
> [ 10.920894] platform_probe+0x44/0xa0
> [ 10.920908] really_probe+0x19e/0x3e0
> [ 10.930244] __driver_probe_device+0x78/0x160
> [ 10.930264] driver_probe_device+0x1f/0xa0
> [ 10.930273] __driver_attach_async_helper+0x5e/0xe0
> [ 10.930284] async_run_entry_fn+0x34/0x130
> [ 10.930296] process_one_work+0x170/0x330
> [ 10.930309] worker_thread+0x273/0x3c0
> [ 10.934639] ? __pfx_worker_thread+0x10/0x10
> [ 10.934654] kthread+0xe8/0x120
> [ 10.934663] ? __pfx_kthread+0x10/0x10
> [ 10.934671] ret_from_fork+0x34/0x50
> [ 10.934681] ? __pfx_kthread+0x10/0x10
> [ 10.934688] ret_from_fork_asm+0x1b/0x30
> [ 10.934708] </TASK>
> [ 10.940978] mmc0: Failed to request IRQ 7: -16
> [ 10.943885] sdhci-acpi: probe of AMDI0040:00 failed with error -16

16 is EBUSY which seems to be used by __setup_irq() for
irq mismatch
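
For readers following along, the check that produces this -EBUSY can be
modeled in user space roughly as follows. This is a minimal sketch, not
kernel code: check_shared_flags() is a hypothetical name, the flag values
are copied from include/linux/interrupt.h, and the trigger type is taken
from the old flags for simplicity, whereas the kernel consults the IRQ
descriptor for it.

```c
#include <errno.h>

/* Flag values as defined in include/linux/interrupt.h. */
#define IRQF_TRIGGER_MASK 0x0000000f
#define IRQF_SHARED       0x00000080
#define IRQF_ONESHOT      0x00002000

/*
 * Hypothetical user-space model of the sharing test in __setup_irq().
 * Returns 0 if a handler requesting new_flags may join a line whose
 * existing handler was installed with old_flags, -EBUSY otherwise.
 */
static int check_shared_flags(unsigned long old_flags, unsigned long new_flags)
{
	/* Simplification: the kernel reads the trigger type from the descriptor. */
	unsigned long oldtype = old_flags & IRQF_TRIGGER_MASK;

	if (!((old_flags & new_flags) & IRQF_SHARED) ||        /* both must share */
	    oldtype != (new_flags & IRQF_TRIGGER_MASK) ||      /* same trigger    */
	    ((old_flags ^ new_flags) & IRQF_ONESHOT))          /* same ONESHOT    */
		return -EBUSY;

	return 0;
}
```

In this model, a line first requested with IRQF_SHARED | IRQF_ONESHOT
rejects a later request that passes only IRQF_SHARED, which matches the
"Failed to request IRQ 7: -16" symptom in the log.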


2024-03-20 19:30:13

by Rafael J. Wysocki

Subject: Re: Bug report: probe of AMDI0040:00 failed with error -16

On Tuesday, March 19, 2024 5:20:41 PM CET Adrian Hunter wrote:
> On 19/03/24 16:43, Francisco Ayala Le Brun wrote:
> > Hello,
> >
> > I would like to report a bug.
> >
> > Issue description:
> > After updating a GHF51 SBC to a newer kernel version, the system was
>
> What was the older / working kernel version? Are you able
> to git bisect?
>
> > no longer able to boot. Running the "lsblk" command in the recovery
> > console showed no mmc storage detected.
> >
> > System Information:
> > OS: Fedora 40 x86_64
> > Kernel: 6.8.0-0.rc6.49.fc40.x86_64
> >
> > Relevant Logs:
>
> Really no error / fail messages before the stack dump?
>
> > [ 10.920756] Call Trace:
> > [ 10.920763] <TASK>
> > [ 10.920771] dump_stack_lvl+0x4d/0x70
> > [ 10.920786] __setup_irq+0x530/0x6c0
> > [ 10.920801] request_threaded_irq+0xe5/0x180
> > [ 10.920813] ? __pfx_sdhci_thread_irq+0x10/0x10 [sdhci]
> > [ 10.920843] __sdhci_add_host+0x108/0x360 [sdhci]
> > [ 10.920871] sdhci_acpi_probe+0x3a8/0x500 [sdhci_acpi]
> > [ 10.920894] platform_probe+0x44/0xa0
> > [ 10.920908] really_probe+0x19e/0x3e0
> > [ 10.930244] __driver_probe_device+0x78/0x160
> > [ 10.930264] driver_probe_device+0x1f/0xa0
> > [ 10.930273] __driver_attach_async_helper+0x5e/0xe0
> > [ 10.930284] async_run_entry_fn+0x34/0x130
> > [ 10.930296] process_one_work+0x170/0x330
> > [ 10.930309] worker_thread+0x273/0x3c0
> > [ 10.934639] ? __pfx_worker_thread+0x10/0x10
> > [ 10.934654] kthread+0xe8/0x120
> > [ 10.934663] ? __pfx_kthread+0x10/0x10
> > [ 10.934671] ret_from_fork+0x34/0x50
> > [ 10.934681] ? __pfx_kthread+0x10/0x10
> > [ 10.934688] ret_from_fork_asm+0x1b/0x30
> > [ 10.934708] </TASK>
> > [ 10.940978] mmc0: Failed to request IRQ 7: -16
> > [ 10.943885] sdhci-acpi: probe of AMDI0040:00 failed with error -16
>
> 16 is EBUSY which seems to be used by __setup_irq() for
> irq mismatch

Would you be able to test the patch below and see if it helps?

---
drivers/pinctrl/pinctrl-amd.c | 2 +-
include/linux/interrupt.h | 5 ++++-
kernel/irq/manage.c | 13 +++++++++++--
3 files changed, 16 insertions(+), 4 deletions(-)

Index: linux-pm/include/linux/interrupt.h
===================================================================
--- linux-pm.orig/include/linux/interrupt.h
+++ linux-pm/include/linux/interrupt.h
@@ -67,6 +67,8 @@
* later.
* IRQF_NO_DEBUG - Exclude from runnaway detection for IPI and similar handlers,
* depends on IRQF_PERCPU.
+ * IRQF_COND_ONESHOT - Agree to do IRQF_ONESHOT if already set for a shared
+ * interrupt.
*/
#define IRQF_SHARED 0x00000080
#define IRQF_PROBE_SHARED 0x00000100
@@ -82,6 +84,7 @@
#define IRQF_COND_SUSPEND 0x00040000
#define IRQF_NO_AUTOEN 0x00080000
#define IRQF_NO_DEBUG 0x00100000
+#define IRQF_COND_ONESHOT 0x00200000

#define IRQF_TIMER (__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)

@@ -784,7 +787,7 @@ extern void tasklet_setup(struct tasklet
* if more than one irq occurred.
*/

-#if !defined(CONFIG_GENERIC_IRQ_PROBE)
+#if !defined(CONFIG_GENERIC_IRQ_PROBE)
static inline unsigned long probe_irq_on(void)
{
return 0;
Index: linux-pm/kernel/irq/manage.c
===================================================================
--- linux-pm.orig/kernel/irq/manage.c
+++ linux-pm/kernel/irq/manage.c
@@ -1642,8 +1642,14 @@ __setup_irq(unsigned int irq, struct irq
}

if (!((old->flags & new->flags) & IRQF_SHARED) ||
- (oldtype != (new->flags & IRQF_TRIGGER_MASK)) ||
- ((old->flags ^ new->flags) & IRQF_ONESHOT))
+ (oldtype != (new->flags & IRQF_TRIGGER_MASK)))
+ goto mismatch;
+
+ if ((old->flags & IRQF_ONESHOT) &&
+ (new->flags & IRQF_COND_ONESHOT))
+ new->flags |= IRQF_ONESHOT;
+
+ if ((old->flags ^ new->flags) & IRQF_ONESHOT)
goto mismatch;

/* All handlers must agree on per-cpuness */
@@ -1665,6 +1671,9 @@ __setup_irq(unsigned int irq, struct irq
shared = 1;
}

+ /* IRQF_COND_ONESHOT has no meaning from now on, so clear it. */
+ new->flags &= ~IRQF_COND_ONESHOT;
+
/*
* Setup the thread mask for this irqaction for ONESHOT. For
* !ONESHOT irqs the thread mask is 0 so we can avoid a
Index: linux-pm/drivers/pinctrl/pinctrl-amd.c
===================================================================
--- linux-pm.orig/drivers/pinctrl/pinctrl-amd.c
+++ linux-pm/drivers/pinctrl/pinctrl-amd.c
@@ -1159,7 +1159,7 @@ static int amd_gpio_probe(struct platfor
}

ret = devm_request_irq(&pdev->dev, gpio_dev->irq, amd_gpio_irq_handler,
- IRQF_SHARED | IRQF_ONESHOT, KBUILD_MODNAME, gpio_dev);
+ IRQF_SHARED | IRQF_COND_ONESHOT, KBUILD_MODNAME, gpio_dev);
if (ret)
goto out2;
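
The effect of the kernel/irq/manage.c hunks above can be modeled in user
space as below. This is only a sketch under stated assumptions:
check_shared_flags_cond() is a hypothetical name, it covers only the path
where a handler is already installed, and the trigger type is taken from
the old flags rather than from the IRQ descriptor. IRQF_COND_ONESHOT uses
the value proposed in the patch.

```c
#include <errno.h>

/* Existing flag values from include/linux/interrupt.h. */
#define IRQF_TRIGGER_MASK 0x0000000f
#define IRQF_SHARED       0x00000080
#define IRQF_ONESHOT      0x00002000
/* Value proposed by the patch above. */
#define IRQF_COND_ONESHOT 0x00200000

/*
 * Hypothetical user-space model of the patched sharing test.
 * *new_flags is updated in place, as new->flags is in the patch.
 * Returns 0 if the new handler may share the line, -EBUSY otherwise.
 */
static int check_shared_flags_cond(unsigned long old_flags,
				   unsigned long *new_flags)
{
	/* Simplification: the kernel reads the trigger type from the descriptor. */
	unsigned long oldtype = old_flags & IRQF_TRIGGER_MASK;

	if (!((old_flags & *new_flags) & IRQF_SHARED) ||
	    oldtype != (*new_flags & IRQF_TRIGGER_MASK))
		return -EBUSY;

	/* Adopt ONESHOT only if the existing handler already uses it. */
	if ((old_flags & IRQF_ONESHOT) && (*new_flags & IRQF_COND_ONESHOT))
		*new_flags |= IRQF_ONESHOT;

	if ((old_flags ^ *new_flags) & IRQF_ONESHOT)
		return -EBUSY;

	/* The conditional flag has no meaning after this point. */
	*new_flags &= ~IRQF_COND_ONESHOT;
	return 0;
}
```

With IRQF_SHARED | IRQF_COND_ONESHOT as the request, the model accepts
both an existing IRQF_SHARED | IRQF_ONESHOT handler (the request becomes
ONESHOT) and an existing plain IRQF_SHARED handler (the request stays
non-ONESHOT). A request that sets neither ONESHOT flag is still rejected
when the existing handler is ONESHOT.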






2024-03-21 16:34:20

by Rafael J. Wysocki

Subject: Re: Bug report: probe of AMDI0040:00 failed with error -16

On Wednesday, March 20, 2024 8:29:55 PM CET Rafael J. Wysocki wrote:
> On Tuesday, March 19, 2024 5:20:41 PM CET Adrian Hunter wrote:
> > On 19/03/24 16:43, Francisco Ayala Le Brun wrote:
> > > Hello,
> > >
> > > I would like to report a bug.
> > >
> > > Issue description:
> > > After updating a GHF51 SBC to a newer kernel version, the system was
> >
> > What was the older / working kernel version? Are you able
> > to git bisect?
> >
> > > no longer able to boot. Running the "lsblk" command in the recovery
> > > console showed no mmc storage detected.
> > >
> > > System Information:
> > > OS: Fedora 40 x86_64
> > > Kernel: 6.8.0-0.rc6.49.fc40.x86_64
> > >
> > > Relevant Logs:
> >
> > Really no error / fail messages before the stack dump?
> >
> > > [ 10.920756] Call Trace:
> > > [ 10.920763] <TASK>
> > > [ 10.920771] dump_stack_lvl+0x4d/0x70
> > > [ 10.920786] __setup_irq+0x530/0x6c0
> > > [ 10.920801] request_threaded_irq+0xe5/0x180
> > > [ 10.920813] ? __pfx_sdhci_thread_irq+0x10/0x10 [sdhci]
> > > [ 10.920843] __sdhci_add_host+0x108/0x360 [sdhci]
> > > [ 10.920871] sdhci_acpi_probe+0x3a8/0x500 [sdhci_acpi]
> > > [ 10.920894] platform_probe+0x44/0xa0
> > > [ 10.920908] really_probe+0x19e/0x3e0
> > > [ 10.930244] __driver_probe_device+0x78/0x160
> > > [ 10.930264] driver_probe_device+0x1f/0xa0
> > > [ 10.930273] __driver_attach_async_helper+0x5e/0xe0
> > > [ 10.930284] async_run_entry_fn+0x34/0x130
> > > [ 10.930296] process_one_work+0x170/0x330
> > > [ 10.930309] worker_thread+0x273/0x3c0
> > > [ 10.934639] ? __pfx_worker_thread+0x10/0x10
> > > [ 10.934654] kthread+0xe8/0x120
> > > [ 10.934663] ? __pfx_kthread+0x10/0x10
> > > [ 10.934671] ret_from_fork+0x34/0x50
> > > [ 10.934681] ? __pfx_kthread+0x10/0x10
> > > [ 10.934688] ret_from_fork_asm+0x1b/0x30
> > > [ 10.934708] </TASK>
> > > [ 10.940978] mmc0: Failed to request IRQ 7: -16
> > > [ 10.943885] sdhci-acpi: probe of AMDI0040:00 failed with error -16
> >
> > 16 is EBUSY which seems to be used by __setup_irq() for
> > irq mismatch
>
> Would you be able to test the patch below and see if it helps?
>
> ---
> drivers/pinctrl/pinctrl-amd.c | 2 +-
> include/linux/interrupt.h | 5 ++++-
> kernel/irq/manage.c | 13 +++++++++++--
> 3 files changed, 16 insertions(+), 4 deletions(-)
>
> Index: linux-pm/include/linux/interrupt.h
> ===================================================================
> --- linux-pm.orig/include/linux/interrupt.h
> +++ linux-pm/include/linux/interrupt.h
> @@ -67,6 +67,8 @@
> * later.
> * IRQF_NO_DEBUG - Exclude from runnaway detection for IPI and similar handlers,
> * depends on IRQF_PERCPU.
> + * IRQF_COND_ONESHOT - Agree to do IRQF_ONESHOT if already set for a shared
> + * interrupt.
> */
> #define IRQF_SHARED 0x00000080
> #define IRQF_PROBE_SHARED 0x00000100
> @@ -82,6 +84,7 @@
> #define IRQF_COND_SUSPEND 0x00040000
> #define IRQF_NO_AUTOEN 0x00080000
> #define IRQF_NO_DEBUG 0x00100000
> +#define IRQF_COND_ONESHOT 0x00200000
>
> #define IRQF_TIMER (__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)
>

We actually can get away without defining a new IRQ flag, as in
the patch below.

It is not super-clean, but should do the work.

Linus, what do you think?

---
drivers/pinctrl/pinctrl-amd.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/pinctrl/pinctrl-amd.c
===================================================================
--- linux-pm.orig/drivers/pinctrl/pinctrl-amd.c
+++ linux-pm/drivers/pinctrl/pinctrl-amd.c
@@ -1159,7 +1159,18 @@ static int amd_gpio_probe(struct platfor
}

ret = devm_request_irq(&pdev->dev, gpio_dev->irq, amd_gpio_irq_handler,
- IRQF_SHARED | IRQF_ONESHOT, KBUILD_MODNAME, gpio_dev);
+ IRQF_SHARED | IRQF_PROBE_SHARED, KBUILD_MODNAME,
+ gpio_dev);
+ /*
+ * There can be a flags mismatch if IRQF_ONESHOT has been set for the
+ * IRQ already, so if the error code indicates that, try again with
+ * IRQF_ONESHOT set.
+ */
+ if (ret == -EBUSY)
+ ret = devm_request_irq(&pdev->dev, gpio_dev->irq, amd_gpio_irq_handler,
+ IRQF_SHARED | IRQF_ONESHOT, KBUILD_MODNAME,
+ gpio_dev);
+
if (ret)
goto out2;
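
The retry approach in this second patch can be illustrated with a small
user-space model. Everything here is an assumption made for illustration:
struct fake_irq_line, fake_request_irq() and
request_with_oneshot_fallback() are hypothetical names, and the sharing
test is reduced to the IRQF_SHARED and IRQF_ONESHOT comparisons.
IRQF_PROBE_SHARED is left out because, as far as I can tell, it only
tells the kernel that a mismatch is expected so the failure is not
reported, and this model reports nothing.

```c
#include <errno.h>

/* Flag values from include/linux/interrupt.h. */
#define IRQF_SHARED  0x00000080
#define IRQF_ONESHOT 0x00002000

/* Hypothetical stand-in for one IRQ line. */
struct fake_irq_line {
	unsigned long flags;	/* flags of the first handler, if any */
	int users;		/* number of installed handlers */
};

/* Hypothetical request function applying a reduced sharing test. */
static int fake_request_irq(struct fake_irq_line *line, unsigned long flags)
{
	if (line->users &&
	    (!((line->flags & flags) & IRQF_SHARED) ||
	     ((line->flags ^ flags) & IRQF_ONESHOT)))
		return -EBUSY;

	if (!line->users)
		line->flags = flags;
	line->users++;
	return 0;
}

/* Try without ONESHOT first; on -EBUSY, retry once with ONESHOT set. */
static int request_with_oneshot_fallback(struct fake_irq_line *line)
{
	int ret = fake_request_irq(line, IRQF_SHARED);

	if (ret == -EBUSY)
		ret = fake_request_irq(line, IRQF_SHARED | IRQF_ONESHOT);

	return ret;
}
```

The model also shows a limit of this approach: -EBUSY does not say which
flag mismatched, so the retry is attempted, and fails again, when the
existing handler is not shared at all.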





2024-03-22 14:30:41

by Linus Walleij

Subject: Re: Bug report: probe of AMDI0040:00 failed with error -16

On Thu, Mar 21, 2024 at 5:33 PM Rafael J. Wysocki <[email protected]> wrote:

> We actually can get away without defining a new IRQ flag, as in
> the patch below.
>
> It is not super-clean, but should do the work.
>
> Linus, what do you think?

Uhhh I rather not, the other approach will cover the invariably recurring
instances of this, it will not be the last time we see something like this.

We need tglx input on this, I could merge the patch below with some
big TODO to fix it properly if the discussion about the proper solution
takes too much time.

But I rather not hack around with IRQs without tglx (or marcz, but he
got overloaded) input.

Yours,
Linus Walleij

2024-03-22 14:49:51

by Rafael J. Wysocki

Subject: Re: Bug report: probe of AMDI0040:00 failed with error -16

On Fri, Mar 22, 2024 at 3:28 PM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Thu, Mar 21, 2024 at 5:33 PM Rafael J. Wysocki <[email protected]> wrote:
>
> > We actually can get away without defining a new IRQ flag, as in
> > the patch below.
> >
> > It is not super-clean, but should do the work.
> >
> > Linus, what do you think?
>
> Uhhh I rather not, the other approach will cover the invariably recurring
> instances of this, it will not be the last time we see something like this.

I'm not actually sure how likely this is.

The ACPI SCI is generally heavy-weight, so it is not shared very often
(and I believe that there is a particular reason for sharing it with a
GPIO chip) and this very well may be an exception.

> We need tglx input on this, I could merge the patch below with some
> big TODO to fix it properly if the discussion about the proper solution
> takes too much time.
>
> But I rather not hack around with IRQs without tglx (or marcz, but he
> got overloaded) input.

Fair enough.

I guess I'll post the first patch with a proper changelog next week
and we'll see.

Thanks!

2024-03-22 20:07:22

by Thomas Gleixner

Subject: Re: Bug report: probe of AMDI0040:00 failed with error -16

On Fri, Mar 22 2024 at 15:49, Rafael J. Wysocki wrote:
> On Fri, Mar 22, 2024 at 3:28 PM Linus Walleij <linus.walleij@linaro.org> wrote:
>> Uhhh I rather not, the other approach will cover the invariably recurring
>> instances of this, it will not be the last time we see something like this.
>
> I'm not actually sure how likely this is.
>
> The ACPI SCI is generally heavy-weight, so it is not shared very often
> (and I believe that there is a particular reason for sharing it with a
> GPIO chip) and this very well may be an exception.
>
>> We need tglx input on this, I could merge the patch below with some
>> big TODO to fix it properly if the discussion about the proper solution
>> takes too much time.
>>
>> But I rather not hack around with IRQs without tglx (or marcz, but he
>> got overloaded) input.
>
> Fair enough.
>
> I guess I'll post the first patch with a proper changelog next week
> and we'll see.

Yes please. The COND flag makes a lot of sense. Hacking around it in the
driver is just a bandaid.

Thanks,

tglx