2015-07-27 15:21:14

by Matt Fleming

[permalink] [raw]
Subject: Regression in v4.2-rc1 caused by hierarchical irqdomain changes

A git bisect just pointed me at commit d32932d02e18 ("x86/irq: Convert
IOAPIC to use hierarchical irqdomain interfaces") as the reason for why
the trackpad on my Dell XPS13 is no longer working with v4.2-rc1.

I'm now seeing the following errors when booting,

[ 1.615017] i2c_designware INT3433:00: controller timed out
[ 1.642496] i2c_designware INT3433:00: timeout in disabling adapter
[ 1.642500] i2c_hid i2c-DLL0665:01: hid_descr_cmd failed

I tried commit d32932d02e18~1, which works, and things definitely break
starting with commit d32932d02e18.

Any suggestions or requests to try and diagnose why the irqdomain
changes broke this i2c controller driver?

--
Matt Fleming, Intel Open Source Technology Center


2015-07-27 16:35:41

by Jiang Liu

[permalink] [raw]
Subject: Re: Regression in v4.2-rc1 caused by hierarchical irqdomain changes

On 2015/7/27 23:21, Matt Fleming wrote:
> A git bisect just pointed me at commit d32932d02e18 ("x86/irq: Convert
> IOAPIC to use hierarchical irqdomain interfaces") as the reason for why
> the trackpad on my Dell XPS13 is no longer working with v4.2-rc1.
>
> I'm now seeing the following errors when booting,
>
> [ 1.615017] i2c_designware INT3433:00: controller timed out
> [ 1.642496] i2c_designware INT3433:00: timeout in disabling adapter
> [ 1.642500] i2c_hid i2c-DLL0665:01: hid_descr_cmd failed
>
> I tried commit d32932d02e18~1, which works, and things definitely break
> starting with commit d32932d02e18.
>
> Any suggestions or requests to try and diagnose why the irqdomain
> changes broke this i2c controller driver?
Hi Matt,
Sorry for the regression. Could you please help to provide
more information about the regression, such dmesg, /proc/interrupts
and hardware(PCI) info from good and bad kernels?
Thanks!
Gerry

2015-07-27 21:15:18

by Matt Fleming

[permalink] [raw]
Subject: Re: Regression in v4.2-rc1 caused by hierarchical irqdomain changes

On Tue, 28 Jul, at 12:35:36AM, Jiang Liu wrote:
> Hi Matt,
> Sorry for the regression. Could you please help to provide
> more information about the regression, such dmesg, /proc/interrupts
> and hardware(PCI) info from good and bad kernels?

See below. Let me know if you need anything else.

00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (rev 09)
00:03.0 Audio device: Intel Corporation Broadwell-U Audio Controller (rev 09)
00:04.0 Signal processing controller: Intel Corporation Broadwell-U Camarillo Device (rev 09)
00:14.0 USB controller: Intel Corporation Wildcat Point-LP USB xHCI Controller (rev 03)
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #1 (rev e3)
00:1c.3 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #4 (rev e3)
00:1f.0 ISA bridge: Intel Corporation Wildcat Point-LP LPC Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] (rev 03)
00:1f.3 SMBus: Intel Corporation Wildcat Point-LP SMBus Controller (rev 03)
00:1f.6 Signal processing controller: Intel Corporation Wildcat Point-LP Thermal Management Controller (rev 03)
01:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5249 PCI Express Card Reader (rev 01)
02:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)

[BAD]

CPU0 CPU1 CPU2 CPU3
0: 60 0 0 0 IR-IO-APIC 2-edge timer
1: 4535 2225 12990 2486 IR-IO-APIC 1-edge i8042
3: 1 0 0 0 IR-IO-APIC 3-edge
6: 0 0 0 0 IR-IO-APIC 6-edge dw_dmac
7: 15 1 0 0 IR-IO-APIC 7-edge INT3432:00, INT3433:00
8: 0 0 1 0 IR-IO-APIC 8-edge rtc0
9: 1078 737 1761 536 IR-IO-APIC 9-fasteoi acpi
12: 5 4 64 13 IR-IO-APIC 12-edge i8042
40: 0 0 0 0 DMAR-MSI 0-edge dmar0
41: 0 0 0 0 DMAR-MSI 1-edge dmar1
42: 0 0 0 0 IR-PCI-MSI 458752-edge PCIe PME
43: 0 0 0 0 IR-PCI-MSI 464896-edge PCIe PME
44: 263825 97055 149799 86574 IR-PCI-MSI 32768-edge i915
45: 4 1 20 1 IR-PCI-MSI 360448-edge mei_me
46: 8 7 10 5 IR-PCI-MSI 524288-edge rtsx_pci
47: 265262 96130 273599 126469 IR-PCI-MSI 512000-edge 0000:00:1f.2
48: 6151 3298 11082 2702 IR-PCI-MSI 327680-edge xhci_hcd
49: 826 91 920 257 IR-PCI-MSI 49152-edge snd_hda_intel
50: 363127 229106 850576 196544 IR-PCI-MSI 1048576-edge iwlwifi
NMI: 2190 2399 2221 1874 Non-maskable interrupts
LOC: 8199446 7602675 7137979 4973130 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 2190 2399 2221 1874 Performance monitoring interrupts
IWI: 0 1 1 0 IRQ work interrupts
RTR: 3 0 0 0 APIC ICR read retries
RES: 1117932 1200561 1398843 873543 Rescheduling interrupts
CAL: 145275 153343 122202 118672 Function call interrupts
TLB: 30552 30936 31487 32651 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 71 71 71 71 Machine check polls
ERR: 15
MIS: 0

[GOOD]

CPU0 CPU1 CPU2 CPU3
0: 60 0 0 0 IR-IO-APIC-edge timer
1: 73 26 135 54 IR-IO-APIC-edge i8042
3: 16 17 49 20 IR-IO-APIC-fasteoi AudioDSP, dw_dmac
6: 0 0 0 0 IR-IO-APIC-fasteoi dw_dmac
7: 2662 3072 12307 5419 IR-IO-APIC-fasteoi INT3432:00, INT3433:00
8: 0 0 1 0 IR-IO-APIC-edge rtc0
9: 265 48 280 85 IR-IO-APIC-fasteoi acpi
12: 20 5 61 2 IR-IO-APIC-edge i8042
37: 0 0 0 0 IR-IO-APIC 37-fasteoi rt286
39: 415 74 480 116 IR-IO-APIC 39-fasteoi DLL0665:01
40: 0 0 0 0 DMAR-MSI 0-edge dmar0
41: 0 0 0 0 DMAR-MSI 1-edge dmar1
42: 0 0 0 0 IR-PCI-MSI 458752-edge PCIe PME
43: 0 0 0 0 IR-PCI-MSI 464896-edge PCIe PME
44: 1007 402 696 280 IR-PCI-MSI 32768-edge i915
45: 4 1 20 0 IR-PCI-MSI 360448-edge mei_me
46: 19 1 10 0 IR-PCI-MSI 524288-edge rtsx_pci
47: 12101 9057 14481 30320 IR-PCI-MSI 512000-edge 0000:00:1f.2
48: 64 29 126 27 IR-PCI-MSI 327680-edge xhci_hcd
49: 635 21 1179 269 IR-PCI-MSI 49152-edge snd_hda_intel
50: 147 60 164 222 IR-PCI-MSI 1048576-edge iwlwifi
NMI: 16 12 18 11 Non-maskable interrupts
LOC: 21542 28287 23149 17203 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 16 12 18 11 Performance monitoring interrupts
IWI: 0 0 1 0 IRQ work interrupts
RTR: 3 0 0 0 APIC ICR read retries
RES: 5425 8358 6069 5476 Rescheduling interrupts
CAL: 308 297 340 293 Function call interrupts
TLB: 424 422 432 437 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 2 2 2 2 Machine check polls
ERR: 0
MIS: 0


--
Matt Fleming, Intel Open Source Technology Center


Attachments:
(No filename) (6.91 kB)
dmesg.bad (73.33 kB)
dmesg.good (96.33 kB)
Download all attachments

2015-07-29 21:03:35

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Regression in v4.2-rc1 caused by hierarchical irqdomain changes

On Mon, 27 Jul 2015, Matt Fleming wrote:
> [BAD]
> 3: 1 0 0 0 IR-IO-APIC 3-edge
> 6: 0 0 0 0 IR-IO-APIC 6-edge dw_dmac
> 7: 15 1 0 0 IR-IO-APIC 7-edge INT3432:00, INT3433:00

[GOOD]
> 3: 16 17 49 20 IR-IO-APIC-fasteoi AudioDSP, dw_dmac
> 6: 0 0 0 0 IR-IO-APIC-fasteoi dw_dmac
> 7: 2662 3072 12307 5419 IR-IO-APIC-fasteoi INT3432:00, INT3433:00

So the old code uses fasteoi while the new one uses edge.

Jiang????


2015-07-30 04:08:55

by Jiang Liu

[permalink] [raw]
Subject: Re: Regression in v4.2-rc1 caused by hierarchical irqdomain changes

On 2015/7/30 5:03, Thomas Gleixner wrote:
> On Mon, 27 Jul 2015, Matt Fleming wrote:
>> [BAD]
>> 3: 1 0 0 0 IR-IO-APIC 3-edge
>> 6: 0 0 0 0 IR-IO-APIC 6-edge dw_dmac
>> 7: 15 1 0 0 IR-IO-APIC 7-edge INT3432:00, INT3433:00
>
> [GOOD]
>> 3: 16 17 49 20 IR-IO-APIC-fasteoi AudioDSP, dw_dmac
>> 6: 0 0 0 0 IR-IO-APIC-fasteoi dw_dmac
>> 7: 2662 3072 12307 5419 IR-IO-APIC-fasteoi INT3432:00, INT3433:00
>
> So the old code uses fasteoi while the new one uses edge.
>
> Jiang????
Sorry for the slow response.
We have reproduced this regression on Surface Pro 3, but are still
trying to figure out the root cause.
Thanks!
Gerry

2015-07-30 07:50:13

by Jiang Liu

[permalink] [raw]
Subject: [Bugfix] x86, irq: Fix regression caused by commit d32932d02e18

Commit d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical
irqdomain interfaces") introduced a regression when converting IOAPIC
to hierarchy irqdomain, which sets wrong irq flow handler for IOAPIC
pins on Surface Pro 3 and causes failure when detecting I2C controllers.
It's caused by using stale attribute value when setting up IOAPIC pins.
With this patch applied, all IOAPIC IRQ configuration are identical
to v4.1 on Surface Pro 3.

Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-by: Matt Fleming <[email protected]>
Tested-and-reported-by: Chen Yu <[email protected]>
Signed-off-by: Jiang Liu <[email protected]>
---
Hi Matt,
Could you please help to test this patch?
Thanks!
Gerry
---
arch/x86/kernel/apic/io_apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 845dc0df2002..206052e55517 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -943,7 +943,7 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
*/
if (irq < nr_legacy_irqs() && data->count == 1) {
if (info->ioapic_trigger != data->trigger)
- mp_register_handler(irq, data->trigger);
+ mp_register_handler(irq, info->ioapic_trigger);
data->entry.trigger = data->trigger = info->ioapic_trigger;
data->entry.polarity = data->polarity = info->ioapic_polarity;
}
--
1.7.10.4

2015-07-30 08:34:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Bugfix] x86, irq: Fix regression caused by commit d32932d02e18

On Thu, 30 Jul 2015, Jiang Liu wrote:

> Commit d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical
> irqdomain interfaces") introduced a regression when converting IOAPIC
> to hierarchy irqdomain, which sets wrong irq flow handler for IOAPIC
> pins on Surface Pro 3 and causes failure when detecting I2C controllers.
> It's caused by using stale attribute value when setting up IOAPIC pins.
> With this patch applied, all IOAPIC IRQ configuration are identical
> to v4.1 on Surface Pro 3.
>
> Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
> Reported-by: Matt Fleming <[email protected]>
> Tested-and-reported-by: Chen Yu <[email protected]>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> Hi Matt,
> Could you please help to test this patch?
> Thanks!
> Gerry
> ---
> arch/x86/kernel/apic/io_apic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 845dc0df2002..206052e55517 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -943,7 +943,7 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
> */
> if (irq < nr_legacy_irqs() && data->count == 1) {
> if (info->ioapic_trigger != data->trigger)
> - mp_register_handler(irq, data->trigger);
> + mp_register_handler(irq, info->ioapic_trigger);
> data->entry.trigger = data->trigger = info->ioapic_trigger;
> data->entry.polarity = data->polarity = info->ioapic_polarity;

Pretty obvious, but I was staring at this very piece of code for hours
without spotting it.

2015-07-30 08:55:58

by Jiang Liu

[permalink] [raw]
Subject: Re: [Bugfix] x86, irq: Fix regression caused by commit d32932d02e18

On 2015/7/30 16:33, Thomas Gleixner wrote:
> On Thu, 30 Jul 2015, Jiang Liu wrote:
>
>> Commit d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical
>> irqdomain interfaces") introduced a regression when converting IOAPIC
>> to hierarchy irqdomain, which sets wrong irq flow handler for IOAPIC
>> pins on Surface Pro 3 and causes failure when detecting I2C controllers.
>> It's caused by using stale attribute value when setting up IOAPIC pins.
>> With this patch applied, all IOAPIC IRQ configuration are identical
>> to v4.1 on Surface Pro 3.
>>
>> Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
>> Reported-by: Matt Fleming <[email protected]>
>> Tested-and-reported-by: Chen Yu <[email protected]>
>> Signed-off-by: Jiang Liu <[email protected]>
>> ---
>> Hi Matt,
>> Could you please help to test this patch?
>> Thanks!
>> Gerry
>> ---
>> arch/x86/kernel/apic/io_apic.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
>> index 845dc0df2002..206052e55517 100644
>> --- a/arch/x86/kernel/apic/io_apic.c
>> +++ b/arch/x86/kernel/apic/io_apic.c
>> @@ -943,7 +943,7 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
>> */
>> if (irq < nr_legacy_irqs() && data->count == 1) {
>> if (info->ioapic_trigger != data->trigger)
>> - mp_register_handler(irq, data->trigger);
>> + mp_register_handler(irq, info->ioapic_trigger);
>> data->entry.trigger = data->trigger = info->ioapic_trigger;
>> data->entry.polarity = data->polarity = info->ioapic_polarity;
>
> Pretty obvious, but I was staring at this very piece of code for hours
> without spotting it.
You have given me the most valuable hints about the difference in
irq flow handler:)

2015-07-30 09:07:53

by Matt Fleming

[permalink] [raw]
Subject: Re: [Bugfix] x86, irq: Fix regression caused by commit d32932d02e18

On Thu, 30 Jul, at 03:51:32PM, Jiang Liu wrote:
> Commit d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical
> irqdomain interfaces") introduced a regression when converting IOAPIC
> to hierarchy irqdomain, which sets wrong irq flow handler for IOAPIC
> pins on Surface Pro 3 and causes failure when detecting I2C controllers.
> It's caused by using stale attribute value when setting up IOAPIC pins.
> With this patch applied, all IOAPIC IRQ configuration are identical
> to v4.1 on Surface Pro 3.
>
> Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
> Reported-by: Matt Fleming <[email protected]>
> Tested-and-reported-by: Chen Yu <[email protected]>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> Hi Matt,
> Could you please help to test this patch?
> Thanks!
> Gerry
> ---
> arch/x86/kernel/apic/io_apic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

Works for me! What I actually did was apply this fixup commit ontop of
commit d32932d02e18, just to be sure that it fixes the issue. I'm about
to apply it on top of -rc4 just to be sure.

But failing any issues arising,

Tested-by: Matt Fleming <[email protected]>

--
Matt Fleming, Intel Open Source Technology Center

2015-07-30 09:12:46

by Jiang Liu

[permalink] [raw]
Subject: Re: [Bugfix] x86, irq: Fix regression caused by commit d32932d02e18

On 2015/7/30 17:07, Matt Fleming wrote:
> On Thu, 30 Jul, at 03:51:32PM, Jiang Liu wrote:
>> Commit d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical
>> irqdomain interfaces") introduced a regression when converting IOAPIC
>> to hierarchy irqdomain, which sets wrong irq flow handler for IOAPIC
>> pins on Surface Pro 3 and causes failure when detecting I2C controllers.
>> It's caused by using stale attribute value when setting up IOAPIC pins.
>> With this patch applied, all IOAPIC IRQ configuration are identical
>> to v4.1 on Surface Pro 3.
>>
>> Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
>> Reported-by: Matt Fleming <[email protected]>
>> Tested-and-reported-by: Chen Yu <[email protected]>
>> Signed-off-by: Jiang Liu <[email protected]>
>> ---
>> Hi Matt,
>> Could you please help to test this patch?
>> Thanks!
>> Gerry
>> ---
>> arch/x86/kernel/apic/io_apic.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Works for me! What I actually did was apply this fixup commit ontop of
> commit d32932d02e18, just to be sure that it fixes the issue. I'm about
> to apply it on top of -rc4 just to be sure.
Hi Matt,
Chen found there are other issues with 4.2-rc4 which
breaks Surface Pro 3's I2C and multitouch drivers. If you run into
the same trouble, please contact Chen Yu <[email protected]>
for details:)
Thanks!
Gerry

2015-07-30 09:46:48

by Matt Fleming

[permalink] [raw]
Subject: Re: [Bugfix] x86, irq: Fix regression caused by commit d32932d02e18

On Thu, 30 Jul, at 05:12:39PM, Jiang Liu wrote:
> Hi Matt,
> Chen found there are other issues with 4.2-rc4 which
> breaks Surface Pro 3's I2C and multitouch drivers. If you run into
> the same trouble, please contact Chen Yu <[email protected]>
> for details:)

No i2c/multitouch issues here on v4.2-rc4, everything works fine!

--
Matt Fleming, Intel Open Source Technology Center

Subject: [tip:x86/urgent] x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()

Commit-ID: 646c4b75494747887f936513b669bb8a2d794459
Gitweb: http://git.kernel.org/tip/646c4b75494747887f936513b669bb8a2d794459
Author: Jiang Liu <[email protected]>
AuthorDate: Thu, 30 Jul 2015 15:51:32 +0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Thu, 30 Jul 2015 21:15:29 +0200

x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()

Commit d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical
irqdomain interfaces") introduced a regression which causes
malfunction of interrupt lines.

The reason is that the conversion of mp_check_pin_attr() missed to
update the polarity selection of the interrupt pin with the caller
provided setting and instead uses a stale attribute value. That in
turn results in chosing the wrong interrupt flow handler.

Use the caller supplied setting to configure the pin correctly which
also choses the correct interrupt flow handler.

This restores the original behaviour and on the affected
machine/driver (Surface Pro 3, i2c controller) all IOAPIC IRQ
configuration are identical to v4.1.

Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-and-tested-by: Matt Fleming <[email protected]>
Reported-and-tested-by: Chen Yu <[email protected]>
Signed-off-by: Jiang Liu <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Chen Yu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 845dc0d..206052e 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -943,7 +943,7 @@ static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
*/
if (irq < nr_legacy_irqs() && data->count == 1) {
if (info->ioapic_trigger != data->trigger)
- mp_register_handler(irq, data->trigger);
+ mp_register_handler(irq, info->ioapic_trigger);
data->entry.trigger = data->trigger = info->ioapic_trigger;
data->entry.polarity = data->polarity = info->ioapic_polarity;
}