2023-10-18 18:50:58

by Mario Limonciello

[permalink] [raw]
Subject: PIC probing code from e179f6914152 failing

Hi,

Recently an issue was reported to Bugzilla [1] that the Keyboard (IRQ 1)
and GPIO controller (IRQ 7) weren't working properly on two separate
Lenovo machines. The issues are unique to Linux.

In digging through them, they happen because Lenovo didn't set up the
PIC in the BIOS.
Specifically the PIC probing code introduced by e179f6914152 ("x86, irq,
pic: Probe for legacy PIC and set legacy_pic appropriately") expects
that the BIOS sets up the PIC and uses that assertion to let Linux set
it up.

One of the reporters confirmed that reverting e179f6914152 fixes the
issue. Keyboard and GPIO controller both work properly.

I wanted to ask if we can please revert that and come up with a
different solution for kexec with HyperV.
Can guests instead perhaps detect in early boot code they're running in
an EFI based hypervisor and explicitly set 'legacy_pic = &null_legacy_pic;'?

[1] https://bugzilla.kernel.org/show_bug.cgi?id=218003

Thanks,


2023-10-18 22:50:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On Wed, Oct 18 2023 at 13:50, Mario Limonciello wrote:
> Recently an issue was reported to Bugzilla [1] that the Keyboard (IRQ 1)
> and GPIO controller (IRQ 7) weren't working properly on two separate
> Lenovo machines. The issues are unique to Linux.
>
> In digging through them, they happen because Lenovo didn't set up the
> PIC in the BIOS.
> Specifically the PIC probing code introduced by e179f6914152 ("x86, irq,
> pic: Probe for legacy PIC and set legacy_pic appropriately") expects
> that the BIOS sets up the PIC and uses that assertion to let Linux set
> it up.
>
> One of the reporters confirmed that reverting e179f6914152 fixes the
> issue. Keyboard and GPIO controller both work properly.
>
> I wanted to ask if we can please revert that and come up with a
> different solution for kexec with HyperV.
> Can guests instead perhaps detect in early boot code they're running in
> an EFI based hypervisor and explicitly set 'legacy_pic = &null_legacy_pic;'?

No. This detection mechanism prevents PIC usage also in other
scenarios.

It's perfectly valid code and the assumption that you can read back what
you wrote to the master IMR is entirely correct. If that's not the case
then there is no PIC, the BIOS has disabled some parts of the legacy
block or did not initialize it.

Letting the kernel blindly assume that there is always a PIC is just
wrong. Worse, it puts the burden on everyone else to sprinkle
"legacy_pic = null_pic;" all over the place with dubious
conditions. That's exactly what the commit in question avoided.

So no, we are not going back there.

We could obviously change the probe() function to issue a full PIC
initialization sequence before reading a known written value
back. That's basically what the revert causes to happen via
legacy_pic->init().

But I fundamentally hate to do that because forcing the init sequence
just to make presumably broken code which has some dubious dependencies
on the PIC or on nr_legacy_irqs > 0 happy is not really a solution when
the PIC is actually not needed at all. For anything modern where all
legacy interrupts are routed to the IO/APIC the PIC does not make any
sense whatsoever.

We rather go and fix the real underlying problem.

The kernel can handle the legacy interrupts perfectly fine through the
IOAPIC. There is no hard requirement for the PIC at all except for
really old systems which had the timer interrupt wired to the PIC and
therefore required to route the timer interrupt through the PIC instead
of the IO/APIC or did not provide routing entries for the IO/APIC. See
the horrible hackery in check_timer() and the grossly misnomed
init_IO_APIC_traps().

I just took a random machine, forced the NULL PIC and added
'apic=verbose' to the kernel command line and magically all the legacy
interrupts are set up via IO/APIC despite the NULL PIC and therefore 0
preallocated legacy interrupts.

apic 0 pin 0 not connected
IOAPIC[0]: Preconfigured routing entry (0-1 -> IRQ 1 Level:0 ActiveLow:0)
IOAPIC[0]: Preconfigured routing entry (0-2 -> IRQ 2 Level:0 ActiveLow:0)
IOAPIC[0]: Preconfigured routing entry (0-3 -> IRQ 3 Level:0 ActiveLow:0)
...

which is identical to the output with PIC enabled. That debug message is
emitted from mp_irqdomain_alloc() which is invoked via the PNP resource
management code.

Now /proc/interrupts:

CPU0 CPU1 CPU2 CPU3
1: 0 56 0 0 IO-APIC 1-edge i8042
4: 442 0 0 0 IO-APIC 4-edge ttyS0
8: 0 0 0 0 IO-APIC 8-edge rtc0
9: 0 0 0 0 IO-APIC 9-fasteoi acpi
12: 0 0 122 0 IO-APIC 12-edge i8042

Keyboard and serial are working, see?

The only interrupt which does not work is interrupt 0 because nothing
allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
solvable problem.

That machine does not even need the timer interrupt because it has a
usable APIC and TSC deadline timer, so no APIC timer calibration
required. The same is true for CPUs which do not have a TSC deadline
timer, but enumerate the APIC frequency via CPUID or MSRs.

But that brings up an interesting question. How are those affected
machines even reaching a state where the user notices that just the
keyboard and the GPIO are not working? Why?

The CPUID/MSR APIC frequency enumeration is Intel specific and
everything else depends on a working timer interrupt to calibrate the
APIC timer frequency. So AMD CPUs require the timer interrupt to
work. The only explanation why this "works" in that null PIC case is
that the PIT/HPET interrupt is actually wired to pin 0, but that's
something to be determined...

Can I please get the following information from an affected machine:

1) dmesg with 'apic=verbose' on the command line
2) /proc/interrupts
3) /sys/kernel/debug/irq/irqs/{0..15}

#3 requires CONFIG_GENERIC_IRQ_DEBUGFS to be set.

Two versions of that please:

1) Unpatched kernel
2) Patched kernel

Thanks,

tglx

2023-10-19 16:39:57

by David Lazar

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing


--- On Thu, 19 Oct 2023, Thomas Gleixner wrote:
> Can I please get the following information from an affected machine:
>
> 1) dmesg with 'apic=verbose' on the command line
> 2) /proc/interrupts
> 3) /sys/kernel/debug/irq/irqs/{0..15}
>
> #3 requires CONFIG_GENERIC_IRQ_DEBUGFS to be set.
>
> Two versions of that please:
>
> 1) Unpatched kernel
> 2) Patched kernel

Since they're a lot of files, I've uploaded them in a .tar.gz[0]
attached to the bug[1].

[0] https://bugzilla.kernel.org/attachment.cgi?id=305266&action=edit
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218003

Hope this helps.

Cheers,
-=[david]=-


Attachments:
(No filename) (661.00 B)
signature.asc (849.00 B)
Download all attachments

2023-10-19 21:20:41

by Mario Limonciello

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On 10/18/2023 17:50, Thomas Gleixner wrote:
> On Wed, Oct 18 2023 at 13:50, Mario Limonciello wrote:
>> Recently an issue was reported to Bugzilla [1] that the Keyboard (IRQ 1)
>> and GPIO controller (IRQ 7) weren't working properly on two separate
>> Lenovo machines. The issues are unique to Linux.
>>
>> In digging through them, they happen because Lenovo didn't set up the
>> PIC in the BIOS.
>> Specifically the PIC probing code introduced by e179f6914152 ("x86, irq,
>> pic: Probe for legacy PIC and set legacy_pic appropriately") expects
>> that the BIOS sets up the PIC and uses that assertion to let Linux set
>> it up.
>>
>> One of the reporters confirmed that reverting e179f6914152 fixes the
>> issue. Keyboard and GPIO controller both work properly.
>>
>> I wanted to ask if we can please revert that and come up with a
>> different solution for kexec with HyperV.
>> Can guests instead perhaps detect in early boot code they're running in
>> an EFI based hypervisor and explicitly set 'legacy_pic = &null_legacy_pic;'?
>
> No. This detection mechanism prevents PIC usage also in other
> scenarios.
>
> It's perfectly valid code and the assumption that you can read back what
> you wrote to the master IMR is entirely correct. If that's not the case
> then there is no PIC, the BIOS has disabled some parts of the legacy
> block or did not initialize it.
>
> Letting the kernel blindly assume that there is always a PIC is just
> wrong. Worse, it puts the burden on everyone else to sprinkle
> "legacy_pic = null_pic;" all over the place with dubious
> conditions. That's exactly what the commit in question avoided.
>
> So no, we are not going back there.
>
> We could obviously change the probe() function to issue a full PIC
> initialization sequence before reading a known written value
> back. That's basically what the revert causes to happen via
> legacy_pic->init().
>
> But I fundamentally hate to do that because forcing the init sequence
> just to make presumably broken code which has some dubious dependencies
> on the PIC or on nr_legacy_irqs > 0 happy is not really a solution when
> the PIC is actually not needed at all. For anything modern where all
> legacy interrupts are routed to the IO/APIC the PIC does not make any
> sense whatsoever.
>
> We rather go and fix the real underlying problem.

Looking at the logs from David and also trying to mock up the failure on
my side I have a few findings I'll share, please agree or disagree with
them.

>
> The kernel can handle the legacy interrupts perfectly fine through the
> IOAPIC. There is no hard requirement for the PIC at all except for
> really old systems which had the timer interrupt wired to the PIC and
> therefore required to route the timer interrupt through the PIC instead
> of the IO/APIC or did not provide routing entries for the IO/APIC. See
> the horrible hackery in check_timer() and the grossly misnomed
> init_IO_APIC_traps().
>
> I just took a random machine, forced the NULL PIC and added
> 'apic=verbose' to the kernel command line and magically all the legacy
> interrupts are set up via IO/APIC despite the NULL PIC and therefore 0
> preallocated legacy interrupts.
>
> apic 0 pin 0 not connected
> IOAPIC[0]: Preconfigured routing entry (0-1 -> IRQ 1 Level:0 ActiveLow:0)
> IOAPIC[0]: Preconfigured routing entry (0-2 -> IRQ 2 Level:0 ActiveLow:0)
> IOAPIC[0]: Preconfigured routing entry (0-3 -> IRQ 3 Level:0 ActiveLow:0)
> ...
>
> which is identical to the output with PIC enabled. That debug message is
> emitted from mp_irqdomain_alloc() which is invoked via the PNP resource
> management code.
>
> Now /proc/interrupts:
>
> CPU0 CPU1 CPU2 CPU3
> 1: 0 56 0 0 IO-APIC 1-edge i8042
> 4: 442 0 0 0 IO-APIC 4-edge ttyS0
> 8: 0 0 0 0 IO-APIC 8-edge rtc0
> 9: 0 0 0 0 IO-APIC 9-fasteoi acpi
> 12: 0 0 122 0 IO-APIC 12-edge i8042
>
> Keyboard and serial are working, see?
>
> The only interrupt which does not work is interrupt 0 because nothing
> allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
> solvable problem.

From David's logs I can see that the timer interrupt gets wired up to
IRQ2 instead of IRQ0.

IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 2 Level:0 ActiveLow:0)

In my hacked up forcing NULL pic case this fixes that:

diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index 43c1c24e934b..885687e64e4e 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -424,7 +424,7 @@ static int legacy_pic_probe(void)
}

struct legacy_pic null_legacy_pic = {
- .nr_legacy_irqs = 0,
+ .nr_legacy_irqs = 1,
.chip = &dummy_irq_chip,
.mask = legacy_pic_uint_noop,
.unmask = legacy_pic_uint_noop,

I think it's cleaner than changing all the places that use
nr_legacy_irqs(). On my side this makes:

IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 0 Level:0 ActiveLow:0)

>
> That machine does not even need the timer interrupt because it has a
> usable APIC and TSC deadline timer, so no APIC timer calibration
> required. The same is true for CPUs which do not have a TSC deadline
> timer, but enumerate the APIC frequency via CPUID or MSRs.

Don't you need it for things like rtcwake to be able to work?

>
> But that brings up an interesting question. How are those affected
> machines even reaching a state where the user notices that just the
> keyboard and the GPIO are not working? Why?

So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to
try to discover the IRQ to use.

This calls acpi_irq_get() which isn't implemented on x86 (hardcodes
-EINVAL).

I can "work around it" by:

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 76bfcba25003..2b4b436c65d8 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device
*dev, unsigned int num)
}

r = platform_get_resource(dev, IORESOURCE_IRQ, num);
- if (has_acpi_companion(&dev->dev)) {
+ if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
+ has_acpi_companion(&dev->dev)) {
if (r && r->flags & IORESOURCE_DISABLED) {
ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r);
if (ret)

but the resource that is returned from the next hunk has the resource
flags set wrong in the NULL pic case:

NULL case:
r: AMDI0030:00 flags: 0x30000418
PIC case:
r: AMDI0030:00 flags: 0x418

IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET

This then later the GPIO controller interrupts are not actually working.
For example the attn pin for my I2C touchpad doesn't work.

Checking the debugfs with my "work around" in place I can see a few
things set up differently:

NULL case:
handler: handle_edge_irq
dstate: 0x3740c208
IRQ_TYPE_LEVEL_LOW
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_ON_ACTIVATE
IRQD_CAN_RESERVE
IRQD_WAKEUP_STATE
IRQD_DEFAULT_TRIGGER_SET
IRQD_HANDLE_ENFORCE_IRQCTX

PIC case:
handler: handle_fasteoi_irq
dstate: 0x3740e208
IRQ_TYPE_LEVEL_LOW
IRQD_LEVEL
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_ON_ACTIVATE
IRQD_CAN_RESERVE
IRQD_WAKEUP_STATE
IRQD_DEFAULT_TRIGGER_SET
IRQD_HANDLE_ENFORCE_IRQCTX

I guess something related to the callpath for mp_register_handler().

Maybe this is the same reason for the keyboard not working right.

>
> The CPUID/MSR APIC frequency enumeration is Intel specific and
> everything else depends on a working timer interrupt to calibrate the
> APIC timer frequency. So AMD CPUs require the timer interrupt to
> work. The only explanation why this "works" in that null PIC case is
> that the PIT/HPET interrupt is actually wired to pin 0, but that's
> something to be determined...
>
> Can I please get the following information from an affected machine:
>
> 1) dmesg with 'apic=verbose' on the command line
> 2) /proc/interrupts
> 3) /sys/kernel/debug/irq/irqs/{0..15}
>
> #3 requires CONFIG_GENERIC_IRQ_DEBUGFS to be set.
>
> Two versions of that please:
>
> 1) Unpatched kernel
> 2) Patched kernel
>
> Thanks,
>
> tglx

2023-10-20 03:44:20

by Mario Limonciello

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On 10/19/2023 16:20, Mario Limonciello wrote:
> On 10/18/2023 17:50, Thomas Gleixner wrote:
>> On Wed, Oct 18 2023 at 13:50, Mario Limonciello wrote:
>>> Recently an issue was reported to Bugzilla [1] that the Keyboard (IRQ 1)
>>> and GPIO controller (IRQ 7) weren't working properly on two separate
>>> Lenovo machines.  The issues are unique to Linux.
>>>
>>> In digging through them, they happen because Lenovo didn't set up the
>>> PIC in the BIOS.
>>> Specifically the PIC probing code introduced by e179f6914152 ("x86, irq,
>>> pic: Probe for legacy PIC and set legacy_pic appropriately") expects
>>> that the BIOS sets up the PIC and uses that assertion to let Linux set
>>> it up.
>>>
>>> One of the reporters confirmed that reverting e179f6914152 fixes the
>>> issue.  Keyboard and GPIO controller both work properly.
>>>
>>> I wanted to ask if we can please revert that and come up with a
>>> different solution for kexec with HyperV.
>>> Can guests instead perhaps detect in early boot code they're running in
>>> an EFI based hypervisor and explicitly set 'legacy_pic =
>>> &null_legacy_pic;'?
>>
>> No. This detection mechanism prevents PIC usage also in other
>> scenarios.
>>
>> It's perfectly valid code and the assumption that you can read back what
>> you wrote to the master IMR is entirely correct. If that's not the case
>> then there is no PIC, the BIOS has disabled some parts of the legacy
>> block or did not initialize it.
>>
>> Letting the kernel blindly assume that there is always a PIC is just
>> wrong. Worse, it puts the burden on everyone else to sprinkle
>> "legacy_pic = null_pic;" all over the place with dubious
>> conditions. That's exactly what the commit in question avoided.
>>
>> So no, we are not going back there.
>>
>> We could obviously change the probe() function to issue a full PIC
>> initialization sequence before reading a known written value
>> back. That's basically what the revert causes to happen via
>> legacy_pic->init().
>>
>> But I fundamentally hate to do that because forcing the init sequence
>> just to make presumably broken code which has some dubious dependencies
>> on the PIC or on nr_legacy_irqs > 0 happy is not really a solution when
>> the PIC is actually not needed at all. For anything modern where all
>> legacy interrupts are routed to the IO/APIC the PIC does not make any
>> sense whatsoever.
>>
>> We rather go and fix the real underlying problem.
>
> Looking at the logs from David and also trying to mock up the failure on
> my side I have a few findings I'll share, please agree or disagree with
> them.
>
>>
>> The kernel can handle the legacy interrupts perfectly fine through the
>> IOAPIC. There is no hard requirement for the PIC at all except for
>> really old systems which had the timer interrupt wired to the PIC and
>> therefore required to route the timer interrupt through the PIC instead
>> of the IO/APIC or did not provide routing entries for the IO/APIC. See
>> the horrible hackery in check_timer() and the grossly misnomed
>> init_IO_APIC_traps().
>>
>> I just took a random machine, forced the NULL PIC and added
>> 'apic=verbose' to the kernel command line and magically all the legacy
>> interrupts are set up via IO/APIC despite the NULL PIC and therefore 0
>> preallocated legacy interrupts.
>>
>>    apic 0 pin 0 not connected
>>   IOAPIC[0]: Preconfigured routing entry (0-1 -> IRQ 1 Level:0
>> ActiveLow:0)
>>   IOAPIC[0]: Preconfigured routing entry (0-2 -> IRQ 2 Level:0
>> ActiveLow:0)
>>   IOAPIC[0]: Preconfigured routing entry (0-3 -> IRQ 3 Level:0
>> ActiveLow:0)
>>   ...
>>
>> which is identical to the output with PIC enabled. That debug message is
>> emitted from mp_irqdomain_alloc() which is invoked via the PNP resource
>> management code.
>>
>> Now /proc/interrupts:
>>
>>             CPU0       CPU1       CPU2       CPU3
>>    1:          0         56          0          0    IO-APIC
>> 1-edge      i8042
>>    4:        442          0          0          0    IO-APIC
>> 4-edge      ttyS0
>>    8:          0          0          0          0    IO-APIC
>> 8-edge      rtc0
>>    9:          0          0          0          0    IO-APIC
>> 9-fasteoi   acpi
>>   12:          0          0        122          0    IO-APIC
>> 12-edge      i8042
>>
>> Keyboard and serial are working, see?
>>
>> The only interrupt which does not work is interrupt 0 because nothing
>> allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
>> solvable problem.
>
> From David's logs I can see that the timer interrupt gets wired up to
> IRQ2 instead of IRQ0.
>
> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 2 Level:0 ActiveLow:0)
>
> In my hacked up forcing NULL pic case this fixes that:
>
> diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
> index 43c1c24e934b..885687e64e4e 100644
> --- a/arch/x86/kernel/i8259.c
> +++ b/arch/x86/kernel/i8259.c
> @@ -424,7 +424,7 @@ static int legacy_pic_probe(void)
>  }
>
>  struct legacy_pic null_legacy_pic = {
> -       .nr_legacy_irqs = 0,
> +       .nr_legacy_irqs = 1,
>         .chip = &dummy_irq_chip,
>         .mask = legacy_pic_uint_noop,
>         .unmask = legacy_pic_uint_noop,
>
> I think it's cleaner than changing all the places that use
> nr_legacy_irqs().  On my side this makes:
>
> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 0 Level:0 ActiveLow:0)
>
>>
>> That machine does not even need the timer interrupt because it has a
>> usable APIC and TSC deadline timer, so no APIC timer calibration
>> required. The same is true for CPUs which do not have a TSC deadline
>> timer, but enumerate the APIC frequency via CPUID or MSRs.
>
> Don't you need it for things like rtcwake to be able to work?
>
>>
>> But that brings up an interesting question. How are those affected
>> machines even reaching a state where the user notices that just the
>> keyboard and the GPIO are not working? Why?
>
> So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to
> try to discover the IRQ to use.
>
> This calls acpi_irq_get() which isn't implemented on x86 (hardcodes
> -EINVAL).
>
> I can "work around it" by:
>
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 76bfcba25003..2b4b436c65d8 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device
> *dev, unsigned int num)
>         }
>
>         r = platform_get_resource(dev, IORESOURCE_IRQ, num);
> -       if (has_acpi_companion(&dev->dev)) {
> +       if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
> +            has_acpi_companion(&dev->dev)) {
>                 if (r && r->flags & IORESOURCE_DISABLED) {
>                         ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num,
> r);
>                         if (ret)
>
> but the resource that is returned from the next hunk has the resource
> flags set wrong in the NULL pic case:
>
> NULL case:
> r: AMDI0030:00 flags: 0x30000418
> PIC case:
> r: AMDI0030:00 flags: 0x418
>
> IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET
>
> This then later the GPIO controller interrupts are not actually working.
> For example the attn pin for my I2C touchpad doesn't work.
>
> Checking the debugfs with my "work around" in place I can see a few
> things set up differently:
>
> NULL case:
> handler:  handle_edge_irq
> dstate:   0x3740c208
>             IRQ_TYPE_LEVEL_LOW
>             IRQD_ACTIVATED
>             IRQD_IRQ_STARTED
>             IRQD_SINGLE_TARGET
>             IRQD_MOVE_PCNTXT
>             IRQD_AFFINITY_ON_ACTIVATE
>             IRQD_CAN_RESERVE
>             IRQD_WAKEUP_STATE
>             IRQD_DEFAULT_TRIGGER_SET
>             IRQD_HANDLE_ENFORCE_IRQCTX
>
> PIC case:
> handler:  handle_fasteoi_irq
> dstate:   0x3740e208
>             IRQ_TYPE_LEVEL_LOW
>             IRQD_LEVEL
>             IRQD_ACTIVATED
>             IRQD_IRQ_STARTED
>             IRQD_SINGLE_TARGET
>             IRQD_MOVE_PCNTXT
>             IRQD_AFFINITY_ON_ACTIVATE
>             IRQD_CAN_RESERVE
>             IRQD_WAKEUP_STATE
>             IRQD_DEFAULT_TRIGGER_SET
>             IRQD_HANDLE_ENFORCE_IRQCTX
>
> I guess something related to the callpath for mp_register_handler().
>
> Maybe this is the same reason for the keyboard not working right.
>
>>
>> The CPUID/MSR APIC frequency enumeration is Intel specific and
>> everything else depends on a working timer interrupt to calibrate the
>> APIC timer frequency. So AMD CPUs require the timer interrupt to
>> work. The only explanation why this "works" in that null PIC case is
>> that the PIT/HPET interrupt is actually wired to pin 0, but that's
>> something to be determined...
>>
>> Can I please get the following information from an affected machine:
>>
>>    1) dmesg with 'apic=verbose' on the command line
>>    2) /proc/interrupts
>>    3) /sys/kernel/debug/irq/irqs/{0..15}
>>
>>    #3 requires CONFIG_GENERIC_IRQ_DEBUGFS to be set.
>>
>> Two versions of that please:
>>
>>    1) Unpatched kernel
>>    2) Patched kernel
>>
>> Thanks,
>>
>>          tglx
>

At least on my system with forcing NULL PIC I've come up with this
series to make everything work.

David, can you please have a try with it on your system?

https://lore.kernel.org/linux-kernel/[email protected]/#t

Thanks,

2023-10-20 15:17:59

by Hans de Goede

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

Hi Mario,

On 10/19/23 23:20, Mario Limonciello wrote:
> On 10/18/2023 17:50, Thomas Gleixner wrote:

<snip>

>> But that brings up an interesting question. How are those affected
>> machines even reaching a state where the user notices that just the
>> keyboard and the GPIO are not working? Why?
>
> So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to try to discover the IRQ to use.
>
> This calls acpi_irq_get() which isn't implemented on x86 (hardcodes -EINVAL).
>
> I can "work around it" by:
>
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 76bfcba25003..2b4b436c65d8 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
>         }
>
>         r = platform_get_resource(dev, IORESOURCE_IRQ, num);
> -       if (has_acpi_companion(&dev->dev)) {
> +       if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
> +            has_acpi_companion(&dev->dev)) {
>                 if (r && r->flags & IORESOURCE_DISABLED) {
>                         ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r);
>                         if (ret)
>
> but the resource that is returned from the next hunk has the resource flags set wrong in the NULL pic case:
>
> NULL case:
> r: AMDI0030:00 flags: 0x30000418
> PIC case:
> r: AMDI0030:00 flags: 0x418
>
> IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET
>
> This then later the GPIO controller interrupts are not actually working.
> For example the attn pin for my I2C touchpad doesn't work.

Right the issue is that with the legacy-pic path disabled /
with nr_legacy_irqs() returning 0 them there is no mapping
added for the Legacy ISA IRQs which causes this problem.

My hack to set nr_legacy_irqs to 16 also for the NULL PIC from:
https://bugzilla.kernel.org/show_bug.cgi?id=218003

Does cause the Legacy ISA IRQ mappings to get added and makes
the GPIO controller actually work, as can be seen from:

https://bugzilla.kernel.org/attachment.cgi?id=305241&action=edit

Which is a dmesg with that hack and it does NOT have this error:

[ 0.276113] amd_gpio AMDI0030:00: error -EINVAL: IRQ index 0 not found
[ 0.278464] amd_gpio: probe of AMDI0030:00 failed with error -22

and the reporter also reports the touchpad works with this patch.

As Thomas already said the legayc PIC really is not necessary,
but what is still necessary on these laptops with the legacy PIC
not initialized is to have the Legacy ISA IRQ mappings added
by the kernel itself since these are missing from the MADT
(if I have my ACPI/IOAPIC terminology correct).

This quick hack (which is the one from the working dmesg)
does this:

--- a/arch/x86/kernel/i8259.c
+++ a/arch/x86/kernel/i8259.c
@@ -394,7 +394,7 @@ static int legacy_pic_probe(void)
}

struct legacy_pic null_legacy_pic = {
- .nr_legacy_irqs = 0,
+ .nr_legacy_irqs = NR_IRQS_LEGACY,
.chip = &dummy_irq_chip,
.mask = legacy_pic_uint_noop,
.unmask = legacy_pic_uint_noop,

But I believe this will break things when there are actually
non legacy ISA IRQs / GSI-s using GSI numbers < NR_IRQS_LEGACY

Thomas, I'm not at all familiar with this area of the kernel,
but would checking if the MADT defines any non ISA GSIs under
16 and if NOT use nr_legacy_irqs = NR_IRQS_LEGACY for the
NULL PIC be an option?

Or maybe some sort of DMI (sys_vendor == Lenovo) quirk to
set nr_legacy_irqs = NR_IRQS_LEGACY for the NULL PIC ?

Regards,

Hans



2023-10-20 17:13:46

by Mario Limonciello

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On 10/20/2023 10:16, Hans de Goede wrote:
> Hi Mario,
>
> On 10/19/23 23:20, Mario Limonciello wrote:
>> On 10/18/2023 17:50, Thomas Gleixner wrote:
>
> <snip>
>
>>> But that brings up an interesting question. How are those affected
>>> machines even reaching a state where the user notices that just the
>>> keyboard and the GPIO are not working? Why?
>>
>> So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to try to discover the IRQ to use.
>>
>> This calls acpi_irq_get() which isn't implemented on x86 (hardcodes -EINVAL).
>>
>> I can "work around it" by:
>>
>> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
>> index 76bfcba25003..2b4b436c65d8 100644
>> --- a/drivers/base/platform.c
>> +++ b/drivers/base/platform.c
>> @@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
>>         }
>>
>>         r = platform_get_resource(dev, IORESOURCE_IRQ, num);
>> -       if (has_acpi_companion(&dev->dev)) {
>> +       if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
>> +            has_acpi_companion(&dev->dev)) {
>>                 if (r && r->flags & IORESOURCE_DISABLED) {
>>                         ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r);
>>                         if (ret)
>>
>> but the resource that is returned from the next hunk has the resource flags set wrong in the NULL pic case:
>>
>> NULL case:
>> r: AMDI0030:00 flags: 0x30000418
>> PIC case:
>> r: AMDI0030:00 flags: 0x418
>>
>> IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET
>>
>> This then later the GPIO controller interrupts are not actually working.
>> For example the attn pin for my I2C touchpad doesn't work.
>
> Right the issue is that with the legacy-pic path disabled /
> with nr_legacy_irqs() returning 0 them there is no mapping
> added for the Legacy ISA IRQs which causes this problem.
>
> My hack to set nr_legacy_irqs to 16 also for the NULL PIC from:
> https://bugzilla.kernel.org/show_bug.cgi?id=218003
>
> Does cause the Legacy ISA IRQ mappings to get added and makes
> the GPIO controller actually work, as can be seen from:
>
> https://bugzilla.kernel.org/attachment.cgi?id=305241&action=edit
>
> Which is a dmesg with that hack and it does NOT have this error:
>
> [ 0.276113] amd_gpio AMDI0030:00: error -EINVAL: IRQ index 0 not found
> [ 0.278464] amd_gpio: probe of AMDI0030:00 failed with error -22
>
> and the reporter also reports the touchpad works with this patch.
>
> As Thomas already said the legayc PIC really is not necessary,
> but what is still necessary on these laptops with the legacy PIC
> not initialized is to have the Legacy ISA IRQ mappings added
> by the kernel itself since these are missing from the MADT
> (if I have my ACPI/IOAPIC terminology correct).

They're not missing, the problem is that the ioapic code doesn't
let it get updated because of what I see as an extra nr_legacy_irqs()
check.

The series I posted I believe fixes this issue.

>
> This quick hack (which is the one from the working dmesg)
> does this:
>
> --- a/arch/x86/kernel/i8259.c
> +++ a/arch/x86/kernel/i8259.c
> @@ -394,7 +394,7 @@ static int legacy_pic_probe(void)
> }
>
> struct legacy_pic null_legacy_pic = {
> - .nr_legacy_irqs = 0,
> + .nr_legacy_irqs = NR_IRQS_LEGACY,
> .chip = &dummy_irq_chip,
> .mask = legacy_pic_uint_noop,
> .unmask = legacy_pic_uint_noop,
>
> But I believe this will break things when there are actually
> non legacy ISA IRQs / GSI-s using GSI numbers < NR_IRQS_LEGACY
>
> Thomas, I'm not at all familiar with this area of the kernel,
> but would checking if the MADT defines any non ISA GSIs under
> 16 and if NOT use nr_legacy_irqs = NR_IRQS_LEGACY for the
> NULL PIC be an option?
>
> Or maybe some sort of DMI (sys_vendor == Lenovo) quirk to
> set nr_legacy_irqs = NR_IRQS_LEGACY for the NULL PIC ?
>

I'd prefer we don't do this.
As tglx pointed out there is an underlying bug and we shouldn't paper
over it with quirks.

My guess at what he doesn't see this issue on his system is that the
default preconfigured IOAPIC mappings (polarity and triggering) happen
to match the values that would have been programmed from _CRS.

That's not the case here.

2023-10-23 15:59:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On Thu, Oct 19 2023 at 16:20, Mario Limonciello wrote:
> On 10/18/2023 17:50, Thomas Gleixner wrote:
>> The only interrupt which does not work is interrupt 0 because nothing
>> allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
>> solvable problem.
>
> From David's logs I can see that the timer interrupt gets wired up to
> IRQ2 instead of IRQ0.

Sure, but that's not really a problem. Nothing needs the timer
interrupt in principle.

> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 2 Level:0 ActiveLow:0)
>
> In my hacked up forcing NULL pic case this fixes that:
>
> diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
> index 43c1c24e934b..885687e64e4e 100644
> --- a/arch/x86/kernel/i8259.c
> +++ b/arch/x86/kernel/i8259.c
> @@ -424,7 +424,7 @@ static int legacy_pic_probe(void)
> }
>
> struct legacy_pic null_legacy_pic = {
> - .nr_legacy_irqs = 0,
> + .nr_legacy_irqs = 1,
> .chip = &dummy_irq_chip,
> .mask = legacy_pic_uint_noop,
> .unmask = legacy_pic_uint_noop,
>
> I think it's cleaner than changing all the places that use
> nr_legacy_irqs().

No. It's not cleaner. It's a hack and you still need to audit all places
which depend on nr_legacy_irqs(). Also why '1'? You could as well use
'16', no?

> On my side this makes:
>
> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 0 Level:0
> ActiveLow:0)

Sure, but that can be achieved by other means in a clean way as
well. Can we please focus on analyzing the underlying problems instead
of trying random hacks? The timer part is well understood already.

>> That machine does not even need the timer interrupt because it has a
>> usable APIC and TSC deadline timer, so no APIC timer calibration
>> required. The same is true for CPUs which do not have a TSC deadline
>> timer, but enumerate the APIC frequency via CPUID or MSRs.
>
> Don't you need it for things like rtcwake to be able to work?

Timer != RTC.

The RTC interrupt is separate (IRQ 8), but in the case of this system it
is using the HPET-RTC emulation which fails to initialize because
interrupt 0 is not available.

>> But that brings up an interesting question. How are those affected
>> machines even reaching a state where the user notices that just the
>> keyboard and the GPIO are not working? Why?
>
> So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to
> try to discover the IRQ to use.
>
> This calls acpi_irq_get() which isn't implemented on x86 (hardcodes
> -EINVAL).
>
> I can "work around it" by:
>
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 76bfcba25003..2b4b436c65d8 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device
> *dev, unsigned int num)
> }
>
> r = platform_get_resource(dev, IORESOURCE_IRQ, num);
> - if (has_acpi_companion(&dev->dev)) {
> + if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
> + has_acpi_companion(&dev->dev)) {
> if (r && r->flags & IORESOURCE_DISABLED) {
> ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r);
> if (ret)

So why is acpi_irq_get() reached when the PIC is disabled, but not when
the PIC is enabled? Because of the below:

> but the resource that is returned from the next hunk ?

next hunk? The resource is returned by platform_get_resource() above, no?

> has the resource flags set wrong in the NULL pic case:
>
> NULL case:
> r: AMDI0030:00 flags: 0x30000418
> PIC case:
> r: AMDI0030:00 flags: 0x418
>
> IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET

So the real question is WHY are the DISABLED/UNSET flags not set in the
PIC case?

> NULL case:
> handler: handle_edge_irq
> dstate: 0x3740c208
> IRQ_TYPE_LEVEL_LOW
>
> PIC case:
> handler: handle_fasteoi_irq
> dstate: 0x3740e208
> IRQ_TYPE_LEVEL_LOW
> IRQD_LEVEL
>
> I guess something related to the callpath for mp_register_handler().

Guessing is not helpful.

There is a difference in how the allocation info is set up when legacy
PIC is enabled, but that does not explain the above resource flag
difference.

As there is no override for IRQ7:

[ 0.011415] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.011417] Int: type 0, pol 0, trig 0, bus 00, IRQ 00, APIC ID 20, APIC INT 02
[ 0.011418] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.011419] Int: type 0, pol 3, trig 3, bus 00, IRQ 09, APIC ID 20, APIC INT 09
...
[ 0.011425] Int: type 0, pol 0, trig 0, bus 00, IRQ 07, APIC ID 20, APIC INT 07

the initial setup of the IOAPIC interrupt is edge, while the initial
setup of the legacy PIC is level. But that gets changed later to edge
when the IOAPIC is initialized.

I'm not seeing the magic which make the above different yet, though I'm
100% sure by now that this "works" definitely not by design. It just
works by pure luck.

Now when platform_get_irq_optional() sets the trigger type via
irqd_set_trigger_type() it just sets LEVEL_LOW, but does not change the
handler and does not set IRQD_LEVEL. It does neither change the IO/APIC
pin setup. This happens because the IOAPIC interrupt chip does not
implement an irq_set_type() callback.

IOW the whole machinery depends on magic setup ordering vs. PIC and pure
luck. Adding the callback is not rocket science, but while it should
make the interrupt work it still does not explain the magic "working"
when the legacy PIC is enabled.

Let me sit down and add a pile of debug printks to all the relevant
places as we really need to understand the underlying magic effects of
legacy PIC first.

Thanks,

tglx

2023-10-23 16:17:19

by Mario Limonciello

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On 10/23/2023 10:59, Thomas Gleixner wrote:
> On Thu, Oct 19 2023 at 16:20, Mario Limonciello wrote:
>> On 10/18/2023 17:50, Thomas Gleixner wrote:
>>> The only interrupt which does not work is interrupt 0 because nothing
>>> allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
>>> solvable problem.
>>
>> From David's logs I can see that the timer interrupt gets wired up to
>> IRQ2 instead of IRQ0.
>
> Sure, but that's not really a problem. Nothing needs the timer
> interrupt in principle.
>
>> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 2 Level:0 ActiveLow:0)
>>
>> In my hacked up forcing NULL pic case this fixes that:
>>
>> diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
>> index 43c1c24e934b..885687e64e4e 100644
>> --- a/arch/x86/kernel/i8259.c
>> +++ b/arch/x86/kernel/i8259.c
>> @@ -424,7 +424,7 @@ static int legacy_pic_probe(void)
>> }
>>
>> struct legacy_pic null_legacy_pic = {
>> - .nr_legacy_irqs = 0,
>> + .nr_legacy_irqs = 1,
>> .chip = &dummy_irq_chip,
>> .mask = legacy_pic_uint_noop,
>> .unmask = legacy_pic_uint_noop,
>>
>> I think it's cleaner than changing all the places that use
>> nr_legacy_irqs().
>
> No. It's not cleaner. It's a hack and you still need to audit all places
> which depend on nr_legacy_irqs(). Also why '1'? You could as well use
> '16', no? >
>> On my side this makes:
>>
>> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 0 Level:0
>> ActiveLow:0)
>
> Sure, but that can be achieved by other means in a clean way as
> well. Can we please focus on analyzing the underlying problems instead
> of trying random hacks? The timer part is well understood already.
>
>>> That machine does not even need the timer interrupt because it has a
>>> usable APIC and TSC deadline timer, so no APIC timer calibration
>>> required. The same is true for CPUs which do not have a TSC deadline
>>> timer, but enumerate the APIC frequency via CPUID or MSRs.
>>
>> Don't you need it for things like rtcwake to be able to work?
>
> Timer != RTC.
>
> The RTC interrupt is separate (IRQ 8), but in the case of this system it
> is using the HPET-RTC emulation which fails to initialize because
> interrupt 0 is not available.

That's exactly why I allocated 1 IRQ for IRQ 0.

>
>>> But that brings up an interesting question. How are those affected
>>> machines even reaching a state where the user notices that just the
>>> keyboard and the GPIO are not working? Why?
>>
>> So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to
>> try to discover the IRQ to use.
>>
>> This calls acpi_irq_get() which isn't implemented on x86 (hardcodes
>> -EINVAL).
>>
>> I can "work around it" by:
>>
>> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
>> index 76bfcba25003..2b4b436c65d8 100644
>> --- a/drivers/base/platform.c
>> +++ b/drivers/base/platform.c
>> @@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device
>> *dev, unsigned int num)
>> }
>>
>> r = platform_get_resource(dev, IORESOURCE_IRQ, num);
>> - if (has_acpi_companion(&dev->dev)) {
>> + if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
>> + has_acpi_companion(&dev->dev)) {
>> if (r && r->flags & IORESOURCE_DISABLED) {
>> ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r);
>> if (ret)
>
> So why is acpi_irq_get() reached when the PIC is disabled, but not when
> the PIC is enabled? Because of the below:
>
>> but the resource that is returned from the next hunk ?
>
> next hunk? The resource is returned by platform_get_resource() above, no?
>
>> has the resource flags set wrong in the NULL pic case:
>>
>> NULL case:
>> r: AMDI0030:00 flags: 0x30000418
>> PIC case:
>> r: AMDI0030:00 flags: 0x418
>>
>> IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET
>
> So the real question is WHY are the DISABLED/UNSET flags not set in the
> PIC case?
>
>> NULL case:
>> handler: handle_edge_irq
>> dstate: 0x3740c208
>> IRQ_TYPE_LEVEL_LOW
>>
>> PIC case:
>> handler: handle_fasteoi_irq
>> dstate: 0x3740e208
>> IRQ_TYPE_LEVEL_LOW
>> IRQD_LEVEL
>>
>> I guess something related to the callpath for mp_register_handler().
>
> Guessing is not helpful.
>
> There is a difference in how the allocation info is set up when legacy
> PIC is enabled, but that does not explain the above resource flag
> difference.

I did a pile of printks and that's how I realized it's because of the
missing call to mp_register_handler() which is dependent upon what
appeared to me to be a superfluous number of legacy IRQs check (patch 1
in my solution).

>
> As there is no override for IRQ7:
>
> [ 0.011415] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [ 0.011417] Int: type 0, pol 0, trig 0, bus 00, IRQ 00, APIC ID 20, APIC INT 02
> [ 0.011418] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> [ 0.011419] Int: type 0, pol 3, trig 3, bus 00, IRQ 09, APIC ID 20, APIC INT 09
> ...
> [ 0.011425] Int: type 0, pol 0, trig 0, bus 00, IRQ 07, APIC ID 20, APIC INT 07
>
> the initial setup of the IOAPIC interrupt is edge, while the initial
> setup of the legacy PIC is level. But that gets changed later to edge
> when the IOAPIC is initialized.
>
> I'm not seeing the magic which make the above different yet, though I'm
> 100% sure by now that this "works" definitely not by design. It just
> works by pure luck.
>
> Now when platform_get_irq_optional() sets the trigger type via
> irqd_set_trigger_type() it just sets LEVEL_LOW, but does not change the
> handler and does not set IRQD_LEVEL. It does neither change the IO/APIC
> pin setup. This happens because the IOAPIC interrupt chip does not
> implement an irq_set_type() callback.
>
> IOW the whole machinery depends on magic setup ordering vs. PIC and pure
> luck. Adding the callback is not rocket science, but while it should
> make the interrupt work it still does not explain the magic "working"
> when the legacy PIC is enabled.
>
> Let me sit down and add a pile of debug printks to all the relevant
> places as we really need to understand the underlying magic effects of
> legacy PIC first.
>

OK let's see if you come up with different conclusions.

2023-10-23 17:51:22

by Thomas Gleixner

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On Mon, Oct 23 2023 at 11:17, Mario Limonciello wrote:
> On 10/23/2023 10:59, Thomas Gleixner wrote:
>>> IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET
>>
>> So the real question is WHY are the DISABLED/UNSET flags not set in the
>> PIC case?

Do you have an answer for this?

>>> NULL case:
>>> handler: handle_edge_irq
>>> dstate: 0x3740c208
>>> IRQ_TYPE_LEVEL_LOW
>>>
>>> PIC case:
>>> handler: handle_fasteoi_irq
>>> dstate: 0x3740e208
>>> IRQ_TYPE_LEVEL_LOW
>>> IRQD_LEVEL
>>>
>>> I guess something related to the callpath for mp_register_handler().
>>
>> Guessing is not helpful.
>>
>> There is a difference in how the allocation info is set up when legacy
>> PIC is enabled, but that does not explain the above resource flag
>> difference.
>
> I did a pile of printks and that's how I realized it's because of the
> missing call to mp_register_handler() which is dependent upon what
> appeared to me to be a superfluous number of legacy IRQs check (patch 1
> in my solution).

What exactly is superfluous about these legacy checks?

Thanks,

tglx

2023-10-23 17:59:35

by Mario Limonciello

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On 10/23/2023 12:50, Thomas Gleixner wrote:
> On Mon, Oct 23 2023 at 11:17, Mario Limonciello wrote:
>> On 10/23/2023 10:59, Thomas Gleixner wrote:
>>>> IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET
>>>
>>> So the real question is WHY are the DISABLED/UNSET flags not set in the
>>> PIC case?
>
> Do you have an answer for this?

Here's the problematic call path:

acpi_dev_get_irqresource()
->acpi_register_gsi()
->->__acpi_register_gsi() [ Which is acpi_register_gsi_ioapic() ]
->->->mp_map_gsi_to_irq()
->->->->mp_map_pin_to_irq()
->->->->->mp_check_pin_attr()

In the legacy PIC programmed case this function can overwrite level and
active when acpi_register_gsi() is called.

Without the change I made in the NULL PIC case it can't.
So the resources get disabled by acpi_dev_get_irqresource().

>
>>>> NULL case:
>>>> handler: handle_edge_irq
>>>> dstate: 0x3740c208
>>>> IRQ_TYPE_LEVEL_LOW
>>>>
>>>> PIC case:
>>>> handler: handle_fasteoi_irq
>>>> dstate: 0x3740e208
>>>> IRQ_TYPE_LEVEL_LOW
>>>> IRQD_LEVEL
>>>>
>>>> I guess something related to the callpath for mp_register_handler().
>>>
>>> Guessing is not helpful.
>>>
>>> There is a difference in how the allocation info is set up when legacy
>>> PIC is enabled, but that does not explain the above resource flag
>>> difference.
>>
>> I did a pile of printks and that's how I realized it's because of the
>> missing call to mp_register_handler() which is dependent upon what
>> appeared to me to be a superfluous number of legacy IRQs check (patch 1
>> in my solution).
>
> What exactly is superfluous about these legacy checks?
>
> Thanks,
>
> tglx

acpi_dev_get_irqresource() tries to set up to match what's in _CRS.
If acpi_register_gsi() fails, the resource can't get setup.

2023-10-25 09:23:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On Mon, Oct 23 2023 at 17:59, Thomas Gleixner wrote:
> On Thu, Oct 19 2023 at 16:20, Mario Limonciello wrote:
>> struct legacy_pic null_legacy_pic = {
>> - .nr_legacy_irqs = 0,
>> + .nr_legacy_irqs = 1,
>> .chip = &dummy_irq_chip,
>> .mask = legacy_pic_uint_noop,
>> .unmask = legacy_pic_uint_noop,
>>
>> I think it's cleaner than changing all the places that use
>> nr_legacy_irqs().
>
> No. It's not cleaner. It's a hack and you still need to audit all places
> which depend on nr_legacy_irqs(). Also why '1'? You could as well use
> '16', no?

So I sat down and did a thorough analysis of legacy PIC dependencies.

Unfortunately this is an unholy mess and sprinkled all over the place,
so there is no trivial way to resolve this quickly. This needs a proper
overhaul to decouple the actual PIC driver selection from the fact that
the kernel runs on a i8259 equipped hardware and therefore needs to
honour the legacy PNP overrides etc.

The probing itself is to stay in order to avoid sprinkling weird
conditions and NULL PIC selections all over the place.

It could be argued that the probe function should try to initialize the
PIC, but that's overkill for scenarios where the PIC does not exist.

Though it turns out that ACPI/MADT is helpful here because the MADT
header has a flags field which denotes in bit 0, whether the system has
a 8259 setup or not.

This allows to override the probe for now until we actually resolved the
dependency problems in a clean way.

Untested patch below.

Thanks,

tglx
---
--- a/arch/x86/include/asm/i8259.h
+++ b/arch/x86/include/asm/i8259.h
@@ -69,6 +69,8 @@ struct legacy_pic {
void (*make_irq)(unsigned int irq);
};

+void legacy_pic_pcat_compat(void);
+
extern struct legacy_pic *legacy_pic;
extern struct legacy_pic null_legacy_pic;

--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -148,6 +148,9 @@ static int __init acpi_parse_madt(struct
pr_debug("Local APIC address 0x%08x\n", madt->address);
}

+ if (madt->flags & ACPI_MADT_PCAT_COMPAT)
+ legacy_pic_pcat_compat();
+
/* ACPI 6.3 and newer support the online capable bit. */
if (acpi_gbl_FADT.header.revision > 6 ||
(acpi_gbl_FADT.header.revision == 6 &&
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -32,6 +32,7 @@
*/
static void init_8259A(int auto_eoi);

+static bool pcat_compat __ro_after_init;
static int i8259A_auto_eoi;
DEFINE_RAW_SPINLOCK(i8259A_lock);

@@ -299,15 +300,32 @@ static void unmask_8259A(void)

static int probe_8259A(void)
{
+ unsigned char new_val, probe_val = ~(1 << PIC_CASCADE_IR);
unsigned long flags;
- unsigned char probe_val = ~(1 << PIC_CASCADE_IR);
- unsigned char new_val;
+
+ /*
+ * If MADT has the PCAT_COMPAT flag set, then do not bother probing
+ * for the PIC. Some BIOSes leave the PIC uninitialized and probing
+ * fails.
+ *
+ * Right now this causes problems as quite some code depends on
+ * nr_legacy_irqs() > 0 or has_legacy_pic() == true. This is silly
+ * when the system has an IO/APIC because then PIC is not required
+ * at all, except for really old machines where the timer interrupt
+ * must be routed through the PIC. So just pretend that the PIC is
+ * there and let legacy_pic->init() initialize it for nothing.
+ *
+ * Alternatively this could just try to initialize the PIC and
+ * repeat the probe, but for cases where there is no PIC that's
+ * just pointless.
+ */
+ if (pcat_compat)
+ return nr_legacy_irqs();
+
/*
- * Check to see if we have a PIC.
- * Mask all except the cascade and read
- * back the value we just wrote. If we don't
- * have a PIC, we will read 0xff as opposed to the
- * value we wrote.
+ * Check to see if we have a PIC. Mask all except the cascade and
+ * read back the value we just wrote. If we don't have a PIC, we
+ * will read 0xff as opposed to the value we wrote.
*/
raw_spin_lock_irqsave(&i8259A_lock, flags);

@@ -429,5 +447,9 @@ static int __init i8259A_init_ops(void)

return 0;
}
-
device_initcall(i8259A_init_ops);
+
+void __init legacy_pic_pcat_compat(void)
+{
+ pcat_compat = true;
+}

2023-10-25 14:42:06

by Mario Limonciello

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On 10/25/2023 04:23, Thomas Gleixner wrote:
> On Mon, Oct 23 2023 at 17:59, Thomas Gleixner wrote:
>> On Thu, Oct 19 2023 at 16:20, Mario Limonciello wrote:
>>> struct legacy_pic null_legacy_pic = {
>>> - .nr_legacy_irqs = 0,
>>> + .nr_legacy_irqs = 1,
>>> .chip = &dummy_irq_chip,
>>> .mask = legacy_pic_uint_noop,
>>> .unmask = legacy_pic_uint_noop,
>>>
>>> I think it's cleaner than changing all the places that use
>>> nr_legacy_irqs().
>>
>> No. It's not cleaner. It's a hack and you still need to audit all places
>> which depend on nr_legacy_irqs(). Also why '1'? You could as well use
>> '16', no?
>
> So I sat down and did a thorough analysis of legacy PIC dependencies.
>
> Unfortunately this is an unholy mess and sprinkled all over the place,
> so there is no trivial way to resolve this quickly. This needs a proper
> overhaul to decouple the actual PIC driver selection from the fact that
> the kernel runs on a i8259 equipped hardware and therefore needs to
> honour the legacy PNP overrides etc.
>
> The probing itself is to stay in order to avoid sprinkling weird
> conditions and NULL PIC selections all over the place.
>
> It could be argued that the probe function should try to initialize the
> PIC, but that's overkill for scenarios where the PIC does not exist.
>
> Though it turns out that ACPI/MADT is helpful here because the MADT
> header has a flags field which denotes in bit 0, whether the system has
> a 8259 setup or not.
>
> This allows to override the probe for now until we actually resolved the
> dependency problems in a clean way.
>
> Untested patch below.

+David from the bugzilla.

I checked his acpidump and I do think this will work for him.

[024h 0036 4] Local Apic Address : FEE00000
[028h 0040 4] Flags (decoded below) : 00000001
PC-AT Compatibility : 1


David - can you see if the below helps your hardware?

>
> Thanks,
>
> tglx
> ---
> --- a/arch/x86/include/asm/i8259.h
> +++ b/arch/x86/include/asm/i8259.h
> @@ -69,6 +69,8 @@ struct legacy_pic {
> void (*make_irq)(unsigned int irq);
> };
>
> +void legacy_pic_pcat_compat(void);
> +
> extern struct legacy_pic *legacy_pic;
> extern struct legacy_pic null_legacy_pic;
>
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -148,6 +148,9 @@ static int __init acpi_parse_madt(struct
> pr_debug("Local APIC address 0x%08x\n", madt->address);
> }
>
> + if (madt->flags & ACPI_MADT_PCAT_COMPAT)
> + legacy_pic_pcat_compat();
> +
> /* ACPI 6.3 and newer support the online capable bit. */
> if (acpi_gbl_FADT.header.revision > 6 ||
> (acpi_gbl_FADT.header.revision == 6 &&
> --- a/arch/x86/kernel/i8259.c
> +++ b/arch/x86/kernel/i8259.c
> @@ -32,6 +32,7 @@
> */
> static void init_8259A(int auto_eoi);
>
> +static bool pcat_compat __ro_after_init;
> static int i8259A_auto_eoi;
> DEFINE_RAW_SPINLOCK(i8259A_lock);
>
> @@ -299,15 +300,32 @@ static void unmask_8259A(void)
>
> static int probe_8259A(void)
> {
> + unsigned char new_val, probe_val = ~(1 << PIC_CASCADE_IR);
> unsigned long flags;
> - unsigned char probe_val = ~(1 << PIC_CASCADE_IR);
> - unsigned char new_val;
> +
> + /*
> + * If MADT has the PCAT_COMPAT flag set, then do not bother probing
> + * for the PIC. Some BIOSes leave the PIC uninitialized and probing
> + * fails.
> + *
> + * Right now this causes problems as quite some code depends on
> + * nr_legacy_irqs() > 0 or has_legacy_pic() == true. This is silly
> + * when the system has an IO/APIC because then PIC is not required
> + * at all, except for really old machines where the timer interrupt
> + * must be routed through the PIC. So just pretend that the PIC is
> + * there and let legacy_pic->init() initialize it for nothing.
> + *
> + * Alternatively this could just try to initialize the PIC and
> + * repeat the probe, but for cases where there is no PIC that's
> + * just pointless.
> + */
> + if (pcat_compat)
> + return nr_legacy_irqs();
> +
> /*
> - * Check to see if we have a PIC.
> - * Mask all except the cascade and read
> - * back the value we just wrote. If we don't
> - * have a PIC, we will read 0xff as opposed to the
> - * value we wrote.
> + * Check to see if we have a PIC. Mask all except the cascade and
> + * read back the value we just wrote. If we don't have a PIC, we
> + * will read 0xff as opposed to the value we wrote.
> */
> raw_spin_lock_irqsave(&i8259A_lock, flags);
>
> @@ -429,5 +447,9 @@ static int __init i8259A_init_ops(void)
>
> return 0;
> }
> -
> device_initcall(i8259A_init_ops);
> +
> +void __init legacy_pic_pcat_compat(void)
> +{
> + pcat_compat = true;
> +}
>

2023-10-25 15:27:10

by Mario Limonciello

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On 10/25/2023 09:41, Mario Limonciello wrote:
> On 10/25/2023 04:23, Thomas Gleixner wrote:
>> On Mon, Oct 23 2023 at 17:59, Thomas Gleixner wrote:
>>> On Thu, Oct 19 2023 at 16:20, Mario Limonciello wrote:
>>>>    struct legacy_pic null_legacy_pic = {
>>>> -       .nr_legacy_irqs = 0,
>>>> +       .nr_legacy_irqs = 1,
>>>>           .chip = &dummy_irq_chip,
>>>>           .mask = legacy_pic_uint_noop,
>>>>           .unmask = legacy_pic_uint_noop,
>>>>
>>>> I think it's cleaner than changing all the places that use
>>>> nr_legacy_irqs().
>>>
>>> No. It's not cleaner. It's a hack and you still need to audit all places
>>> which depend on nr_legacy_irqs(). Also why '1'? You could as well use
>>> '16', no?
>>
>> So I sat down and did a thorough analysis of legacy PIC dependencies.
>>
>> Unfortunately this is an unholy mess and sprinkled all over the place,
>> so there is no trivial way to resolve this quickly. This needs a proper
>> overhaul to decouple the actual PIC driver selection from the fact that
>> the kernel runs on a i8259 equipped hardware and therefore needs to
>> honour the legacy PNP overrides etc.
>>
>> The probing itself is to stay in order to avoid sprinkling weird
>> conditions and NULL PIC selections all over the place.
>>
>> It could be argued that the probe function should try to initialize the
>> PIC, but that's overkill for scenarios where the PIC does not exist.
>>
>> Though it turns out that ACPI/MADT is helpful here because the MADT
>> header has a flags field which denotes in bit 0, whether the system has
>> a 8259 setup or not.
>>
>> This allows to override the probe for now until we actually resolved the
>> dependency problems in a clean way.
>>
>> Untested patch below.
>
> +David from the bugzilla.
>
> I checked his acpidump and I do think this will work for him.
>
> [024h 0036   4]           Local Apic Address : FEE00000
> [028h 0040   4]        Flags (decoded below) : 00000001
>                          PC-AT Compatibility : 1
>
>
> David - can you see if the below helps your hardware?

FYI, David confirmed this works for fixing his hardware, thanks.

https://bugzilla.kernel.org/show_bug.cgi?id=218003#c84

>
>>
>> Thanks,
>>
>>          tglx
>> ---
>> --- a/arch/x86/include/asm/i8259.h
>> +++ b/arch/x86/include/asm/i8259.h
>> @@ -69,6 +69,8 @@ struct legacy_pic {
>>       void (*make_irq)(unsigned int irq);
>>   };
>> +void legacy_pic_pcat_compat(void);
>> +
>>   extern struct legacy_pic *legacy_pic;
>>   extern struct legacy_pic null_legacy_pic;
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -148,6 +148,9 @@ static int __init acpi_parse_madt(struct
>>           pr_debug("Local APIC address 0x%08x\n", madt->address);
>>       }
>> +    if (madt->flags & ACPI_MADT_PCAT_COMPAT)
>> +        legacy_pic_pcat_compat();
>> +
>>       /* ACPI 6.3 and newer support the online capable bit. */
>>       if (acpi_gbl_FADT.header.revision > 6 ||
>>           (acpi_gbl_FADT.header.revision == 6 &&
>> --- a/arch/x86/kernel/i8259.c
>> +++ b/arch/x86/kernel/i8259.c
>> @@ -32,6 +32,7 @@
>>    */
>>   static void init_8259A(int auto_eoi);
>> +static bool pcat_compat __ro_after_init;
>>   static int i8259A_auto_eoi;
>>   DEFINE_RAW_SPINLOCK(i8259A_lock);
>> @@ -299,15 +300,32 @@ static void unmask_8259A(void)
>>   static int probe_8259A(void)
>>   {
>> +    unsigned char new_val, probe_val = ~(1 << PIC_CASCADE_IR);
>>       unsigned long flags;
>> -    unsigned char probe_val = ~(1 << PIC_CASCADE_IR);
>> -    unsigned char new_val;
>> +
>> +    /*
>> +     * If MADT has the PCAT_COMPAT flag set, then do not bother probing
>> +     * for the PIC. Some BIOSes leave the PIC uninitialized and probing
>> +     * fails.
>> +     *
>> +     * Right now this causes problems as quite some code depends on
>> +     * nr_legacy_irqs() > 0 or has_legacy_pic() == true. This is silly
>> +     * when the system has an IO/APIC because then PIC is not required
>> +     * at all, except for really old machines where the timer interrupt
>> +     * must be routed through the PIC. So just pretend that the PIC is
>> +     * there and let legacy_pic->init() initialize it for nothing.
>> +     *
>> +     * Alternatively this could just try to initialize the PIC and
>> +     * repeat the probe, but for cases where there is no PIC that's
>> +     * just pointless.
>> +     */
>> +    if (pcat_compat)
>> +        return nr_legacy_irqs();
>> +
>>       /*
>> -     * Check to see if we have a PIC.
>> -     * Mask all except the cascade and read
>> -     * back the value we just wrote. If we don't
>> -     * have a PIC, we will read 0xff as opposed to the
>> -     * value we wrote.
>> +     * Check to see if we have a PIC.  Mask all except the cascade and
>> +     * read back the value we just wrote. If we don't have a PIC, we
>> +     * will read 0xff as opposed to the value we wrote.
>>        */
>>       raw_spin_lock_irqsave(&i8259A_lock, flags);
>> @@ -429,5 +447,9 @@ static int __init i8259A_init_ops(void)
>>       return 0;
>>   }
>> -
>>   device_initcall(i8259A_init_ops);
>> +
>> +void __init legacy_pic_pcat_compat(void)
>> +{
>> +    pcat_compat = true;
>> +}
>>
>

2023-10-25 15:27:10

by David Lazar

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

--- On Wed, 25 Oct 2023, Mario Limonciello wrote:
> David - can you see if the below helps your hardware?

The keyboard and mouse work fine with Thomas' patch.

I've uploaded the debug info to the bug:

https://bugzilla.kernel.org/attachment.cgi?id=305291&action=edit

2023-10-25 17:32:06

by Thomas Gleixner

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On Wed, Oct 25 2023 at 17:25, David Lazar wrote:
> --- On Wed, 25 Oct 2023, Mario Limonciello wrote:
>> David - can you see if the below helps your hardware?
>
> The keyboard and mouse work fine with Thomas' patch.
>
> I've uploaded the debug info to the bug:
>
> https://bugzilla.kernel.org/attachment.cgi?id=305291&action=edit

Let me write a changelog then. Unless Rafael has any objections to that
approach.

Thanks,

tglx

2023-10-25 17:38:14

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: PIC probing code from e179f6914152 failing

On Wed, Oct 25, 2023 at 7:31 PM Thomas Gleixner <[email protected]> wrote:
>
> On Wed, Oct 25 2023 at 17:25, David Lazar wrote:
> > --- On Wed, 25 Oct 2023, Mario Limonciello wrote:
> >> David - can you see if the below helps your hardware?
> >
> > The keyboard and mouse work fine with Thomas' patch.
> >
> > I've uploaded the debug info to the bug:
> >
> > https://bugzilla.kernel.org/attachment.cgi?id=305291&action=edit
>
> Let me write a changelog then. Unless Rafael has any objections to that
> approach.

I don't have any, thanks!