2020-11-03 14:20:52

by kernel test robot

Subject: [x86/ioapic] b643128b91: Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC

Greetings,

FYI, we noticed the following commit (built with gcc-9):

commit: b643128b917ca8f1c8b1e14af64ebdc81147b2d1 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/apic


in testcase: nvml
version: nvml-x86_64-9a558d859-1_20201008
with the following parameters:

test: pmem
group: unicode
nr_pmem: 1
fs: ext4
mount_option: dax
bp_memmap: 32G!4G
ucode: 0x7000019



on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 48G memory

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):

If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>

[ 3.148819] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 3.154825] DMAR: DRHD: handling fault status reg 2
[ 3.159701] DMAR: [INTR-REMAP] Request device [f0:1f.7] fault index 0 [fault reason 37] Blocked a compatibility format interrupt request
[ 3.173870] Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
[ 3.182381] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc1-00029-gb643128b917c #1
[ 3.190370] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-TLN4F, BIOS 1.1 03/02/2016
[ 3.198534] Call Trace:
[ 3.200983] dump_stack+0x57/0x6a
[ 3.204298] panic+0x102/0x2d2
[ 3.207349] panic_if_irq_remap.cold+0x5/0x5
[ 3.211613] check_timer+0x1f6/0x694
[ 3.215184] ? printk+0x58/0x6f
[ 3.218320] setup_IO_APIC+0x17b/0x1c3
[ 3.222067] x86_late_time_init+0x20/0x30
[ 3.226077] start_kernel+0x40c/0x4c7
[ 3.229734] secondary_startup_64_no_verify+0xb8/0xbb

To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Oliver Sang


Attachments:
(No filename) (1.91 kB)
config-5.10.0-rc1-00029-gb643128b917c (174.21 kB)
job-script (5.92 kB)
dmesg.xz (5.72 kB)
job.yaml (4.90 kB)

2020-11-03 15:26:20

by Thomas Gleixner

Subject: Re: [x86/ioapic] b643128b91: Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC

Hi!

On Tue, Nov 03 2020 at 22:31, lkp wrote:
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: b643128b917ca8f1c8b1e14af64ebdc81147b2d1 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain")
>
> [ 3.148819] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 3.154825] DMAR: DRHD: handling fault status reg 2
> [ 3.159701] DMAR: [INTR-REMAP] Request device [f0:1f.7] fault index 0 [fault reason 37] Blocked a compatibility format interrupt request
> [ 3.173870] Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
> [ 3.182381] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc1-00029-gb643128b917c #1
> [ 3.190370] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-TLN4F, BIOS 1.1 03/02/2016
> [ 3.198534] Call Trace:
> [ 3.200983] dump_stack+0x57/0x6a
> [ 3.204298] panic+0x102/0x2d2
> [ 3.207349] panic_if_irq_remap.cold+0x5/0x5
> [ 3.211613] check_timer+0x1f6/0x694
> [ 3.215184] ? printk+0x58/0x6f
> [ 3.218320] setup_IO_APIC+0x17b/0x1c3
> [ 3.222067] x86_late_time_init+0x20/0x30
> [ 3.226077] start_kernel+0x40c/0x4c7
> [ 3.229734] secondary_startup_64_no_verify+0xb8/0xbb

It's not reproducing here. Can you please redo the test with
apic=verbose on the kernel command line and provide the full dmesg
output?

Thanks,

tglx

2020-11-03 16:12:44

by Woodhouse, David

Subject: Re: [x86/ioapic] b643128b91: Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC

On Tue, 2020-11-03 at 16:22 +0100, Thomas Gleixner wrote:
> Hi!
>
> On Tue, Nov 03 2020 at 22:31, lkp wrote:
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: b643128b917ca8f1c8b1e14af64ebdc81147b2d1 ("x86/ioapic: Use
> > irq_find_matching_fwspec() to find remapping irqdomain")
> >
> > [ 3.148819] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> > [ 3.154825] DMAR: DRHD: handling fault status reg 2
> > [ 3.159701] DMAR: [INTR-REMAP] Request device [f0:1f.7] fault
> > index 0 [fault reason 37] Blocked a compatibility format interrupt
> > request
> > [ 3.173870] Kernel panic - not syncing: timer doesn't work
> > through Interrupt-remapped IO-APIC
> > [ 3.182381] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-
> > rc1-00029-gb643128b917c #1
> > [ 3.190370] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-
> > TLN4F, BIOS 1.1 03/02/2016
> > [ 3.198534] Call Trace:
> > [ 3.200983] dump_stack+0x57/0x6a
> > [ 3.204298] panic+0x102/0x2d2
> > [ 3.207349] panic_if_irq_remap.cold+0x5/0x5
> > [ 3.211613] check_timer+0x1f6/0x694
> > [ 3.215184] ? printk+0x58/0x6f
> > [ 3.218320] setup_IO_APIC+0x17b/0x1c3
> > [ 3.222067] x86_late_time_init+0x20/0x30
> > [ 3.226077] start_kernel+0x40c/0x4c7
> > [ 3.229734] secondary_startup_64_no_verify+0xb8/0xbb
>
> It's not reproducing here. Can you please redo the test with
> apic=verbose on the kernel command line and provide the full dmesg
> output?

Ah, it already had apic=debug; sorry. I was looking for the IRTE setup
messages, which clearly aren't there, which is why it was generating
compatibility-format interrupts.

It's probably this. Will try harder to reproduce to confirm...

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2335,7 +2335,7 @@ static int mp_irqdomain_create(int ioapic)
 	if (cfg->dev) {
 		fn = of_node_to_fwnode(cfg->dev);
 	} else {
-		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
+		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", mpc_ioapic_id(ioapic));
 		if (!fn)
 			return -ENOMEM;
 	}





2020-11-03 16:41:10

by David Woodhouse

Subject: [PATCH] x86/ioapic: Use I/OAPIC ID for finding irqdomain, not index

From: David Woodhouse <[email protected]>

In commit b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to
find remapping irqdomain") the I/OAPIC code was changed to find its
parent irqdomain using irq_find_matching_fwspec(), but the key used
for the lookup was wrong. It shouldn't use 'ioapic' which is the index
into its own ioapics[] array. It should use the actual arbitration
ID of the I/OAPIC in question, which is mpc_ioapic_id(ioapic).

Fixes: b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain")
Reported-by: lkp <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---

The X2APIC_OPT_OUT bit was a red herring. Once I spotted the real problem
and set up a repro case such that mpc_ioapic_id(N) != N, I can see it here.


arch/x86/kernel/apic/io_apic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 443d2c9086b9..0d68e7c286e2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2335,14 +2335,14 @@ static int mp_irqdomain_create(int ioapic)
 	if (cfg->dev) {
 		fn = of_node_to_fwnode(cfg->dev);
 	} else {
-		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
+		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", mpc_ioapic_id(ioapic));
 		if (!fn)
 			return -ENOMEM;
 	}

 	fwspec.fwnode = fn;
 	fwspec.param_count = 1;
-	fwspec.param[0] = ioapic;
+	fwspec.param[0] = mpc_ioapic_id(ioapic);

 	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
 	if (!parent) {
--
2.17.1


Subject: [tip: x86/apic] x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index

The following commit has been merged into the x86/apic branch of tip:

Commit-ID: f36a74b9345aebaf5d325380df87a54720229d18
Gitweb: https://git.kernel.org/tip/f36a74b9345aebaf5d325380df87a54720229d18
Author: David Woodhouse <[email protected]>
AuthorDate: Tue, 03 Nov 2020 16:36:22
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 04 Nov 2020 11:11:35 +01:00

x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index

In commit b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to
find remapping irqdomain") the I/O-APIC code was changed to find its
parent irqdomain using irq_find_matching_fwspec(), but the key used
for the lookup was wrong. It shouldn't use 'ioapic' which is the index
into its own ioapics[] array. It should use the actual arbitration
ID of the I/O-APIC in question, which is mpc_ioapic_id(ioapic).

Fixes: b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain")
Reported-by: lkp <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
arch/x86/kernel/apic/io_apic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 1cfd65e..0602c95 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2345,14 +2345,14 @@ static int mp_irqdomain_create(int ioapic)
 	if (cfg->dev) {
 		fn = of_node_to_fwnode(cfg->dev);
 	} else {
-		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
+		fn = irq_domain_alloc_named_id_fwnode("IO-APIC", mpc_ioapic_id(ioapic));
 		if (!fn)
 			return -ENOMEM;
 	}

 	fwspec.fwnode = fn;
 	fwspec.param_count = 1;
-	fwspec.param[0] = ioapic;
+	fwspec.param[0] = mpc_ioapic_id(ioapic);

 	parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
 	if (!parent) {
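
The index-versus-ID mix-up described in the commit message can be
illustrated outside the kernel. The following is a minimal userspace
sketch, not kernel code: every name prefixed with sketch_ is hypothetical,
the ID value 8 is invented for illustration, and the linear search only
stands in for the irq_find_matching_fwspec() lookup. It assumes a single
I/O-APIC whose array index (0) differs from its ID, which is the
mpc_ioapic_id(N) != N condition mentioned in the patch notes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a remapping irqdomain keyed by I/O-APIC ID. */
struct sketch_domain {
	int ioapic_id;
	const char *name;
};

/* Invented firmware data: the I/O-APIC at array index 0 reports ID 8. */
static const int sketch_ioapic_ids[] = { 8 };

/* Invented registry: the remapping side registered its domain under ID 8. */
static const struct sketch_domain sketch_domains[] = {
	{ 8, "remap-domain-for-id-8" },
};

/* Plays the role of mpc_ioapic_id(): array index -> hardware ID. */
static int sketch_ioapic_id(int idx)
{
	return sketch_ioapic_ids[idx];
}

/* Plays the role of the fwspec lookup: return the domain matching key. */
static const struct sketch_domain *sketch_find_domain(int key)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_domains) / sizeof(sketch_domains[0]); i++) {
		if (sketch_domains[i].ioapic_id == key)
			return &sketch_domains[i];
	}
	return NULL;
}

/* Looking up by array index misses; looking up by ID finds the domain. */
static void sketch_self_check(void)
{
	assert(sketch_find_domain(0) == NULL);
	assert(sketch_find_domain(sketch_ioapic_id(0)) != NULL);
}
```

Calling sketch_self_check() from a test harness exercises both lookups.
When index and ID happen to coincide, both lookups succeed, which would be
consistent with the report not reproducing on every machine.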