I'm Seunghun Han, and I work for National Security Research Institute of
South Korea.
I found a kernel panic while I tested latest kernel version.
The kernel panic log is as follows:
>[ 0.058851] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>[ 0.060000] IP: io_apic_modify_irq+0x11/0x80
>[ 0.060000] PGD 0
>[ 0.060000] P4D 0
>[ 0.060000]
>[ 0.060000] Oops: 0000 [#1] SMP
>[ 0.060000] Modules linked in:
>[ 0.060000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc5 #26
>[ 0.060000] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>[ 0.060000] task: ffff88021619acc0 task.stack: ffffc90000c50000
>[ 0.060000] RIP: 0010:io_apic_modify_irq+0x11/0x80
>[ 0.060000] RSP: 0000:ffffc90000c53e40 EFLAGS: 00010046
>[ 0.060000] RAX: 0000000000000082 RBX: 0000000000000000 RCX: 0000000000000000
>[ 0.060000] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: 0000000000000000
>[ 0.060000] RBP: 0000000000000000 R08: ffff8802170cda00 R09: 0000000000000117
>[ 0.060000] R10: 0000000000000000 R11: 0000000000000116 R12: 0000000000000000
>[ 0.060000] R13: 0000000000000000 R14: ffff8802170cda58 R15: ffff8802170cec00
>[ 0.060000] FS: 0000000000000000(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
>[ 0.060000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[ 0.060000] CR2: 0000000000000010 CR3: 0000000001a09000 CR4: 00000000000006f0
>[ 0.060000] Call Trace:
>[ 0.060000] ? unmask_ioapic_irq+0x2b/0x40
>[ 0.060000] ? setup_IO_APIC+0x79b/0x7a3
>[ 0.060000] ? clear_IO_APIC_pin+0x93/0x130
>[ 0.060000] ? apic_bsp_setup+0xa2/0xac
>[ 0.060000] ? native_smp_prepare_cpus+0x245/0x2b1
>[ 0.060000] ? kernel_init_freeable+0xd0/0x21f
>[ 0.060000] ? rest_init+0x80/0x80
>[ 0.060000] ? kernel_init+0xa/0x100
>[ 0.060000] ? ret_from_fork+0x25/0x30
>[ 0.060000] Code: 47 04 83 c6 10 25 ff 0f 00 00 48 2d 00 10 80 00 48 29 c8 89 30 89 50 10 c3 90 0f 1f 44 00 00 41 55 41 54 49 89 cd 55 53 48 89 fb <23> 77 10 09 d6 89 73 10 4c 8b 27 89 f5 4c 39 e7 74 4f 41 8b 44
>[ 0.060000] RIP: io_apic_modify_irq+0x11/0x80 RSP: ffffc90000c53e40
>[ 0.060000] CR2: 0000000000000010
>[ 0.060000] ---[ end trace c0d793cf886c6f59 ]---
>[ 0.060000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>[ 0.060000]
>[ 0.060000] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>[ 0.060000]
I analyzed this kernel panic in detail and found that check_timer() function
passed an invalid argument to unmask_ioapic_irq() function.
The code is as follows:
> int idx;
> idx = find_irq_entry(apic1, pin1, mp_INT);
> if (idx != -1 && irq_trigger(idx))
> unmask_ioapic_irq(irq_get_chip_data(0));
The unmask_ioapic_irq() function wants a "struct irq_data *" argument, but
the code, irq_get_chip_data(0), passes a "struct irq_chip *" value.
To fix an invalid argument passing, I made a patch which passes a return
value of irq_get_irq_data(0).
Thank you.
Signed-off-by: Seunghun Han <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index b4f5f73..237e9c2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2093,7 +2093,7 @@ static inline void __init check_timer(void)
int idx;
idx = find_irq_entry(apic1, pin1, mp_INT);
if (idx != -1 && irq_trigger(idx))
- unmask_ioapic_irq(irq_get_chip_data(0));
+ unmask_ioapic_irq(irq_get_irq_data(0));
}
irq_domain_deactivate_irq(irq_data);
irq_domain_activate_irq(irq_data);
--
2.1.4
Seunghun,
On Tue, 18 Jul 2017, Seunghun Han wrote:
first of all thanks for the patch and the analysis.
> I'm Seunghun Han, and I work for National Security Research Institute of
> South Korea.
While I appreciate your detailed description of the problem, please try to
follow the rules for change logs as outlined in Documentation/process/
> >[ 0.058851] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> >[ 0.060000] IP: io_apic_modify_irq+0x11/0x80
> >[ 0.060000] PGD 0
> >[ 0.060000] P4D 0
> >[ 0.060000]
> >[ 0.060000] Oops: 0000 [#1] SMP
> >[ 0.060000] Modules linked in:
> >[ 0.060000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc5 #26
> >[ 0.060000] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> >[ 0.060000] task: ffff88021619acc0 task.stack: ffffc90000c50000
> >[ 0.060000] RIP: 0010:io_apic_modify_irq+0x11/0x80
> >[ 0.060000] RSP: 0000:ffffc90000c53e40 EFLAGS: 00010046
> >[ 0.060000] RAX: 0000000000000082 RBX: 0000000000000000 RCX: 0000000000000000
> >[ 0.060000] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: 0000000000000000
> >[ 0.060000] RBP: 0000000000000000 R08: ffff8802170cda00 R09: 0000000000000117
> >[ 0.060000] R10: 0000000000000000 R11: 0000000000000116 R12: 0000000000000000
> >[ 0.060000] R13: 0000000000000000 R14: ffff8802170cda58 R15: ffff8802170cec00
> >[ 0.060000] FS: 0000000000000000(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
> >[ 0.060000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >[ 0.060000] CR2: 0000000000000010 CR3: 0000000001a09000 CR4: 00000000000006f0
> >[ 0.060000] Call Trace:
> >[ 0.060000] ? unmask_ioapic_irq+0x2b/0x40
> >[ 0.060000] ? setup_IO_APIC+0x79b/0x7a3
> >[ 0.060000] ? clear_IO_APIC_pin+0x93/0x130
> >[ 0.060000] ? apic_bsp_setup+0xa2/0xac
> >[ 0.060000] ? native_smp_prepare_cpus+0x245/0x2b1
> >[ 0.060000] ? kernel_init_freeable+0xd0/0x21f
> >[ 0.060000] ? rest_init+0x80/0x80
> >[ 0.060000] ? kernel_init+0xa/0x100
> >[ 0.060000] ? ret_from_fork+0x25/0x30
> >[ 0.060000] Code: 47 04 83 c6 10 25 ff 0f 00 00 48 2d 00 10 80 00 48 29 c8 89 30 89 50 10 c3 90 0f 1f 44 00 00 41 55 41 54 49 89 cd 55 53 48 89 fb <23> 77 10 09 d6 89 73 10 4c 8b 27 89 f5 4c 39 e7 74 4f 41 8b 44
> >[ 0.060000] RIP: io_apic_modify_irq+0x11/0x80 RSP: ffffc90000c53e40
> >[ 0.060000] CR2: 0000000000000010
> >[ 0.060000] ---[ end trace c0d793cf886c6f59 ]---
> >[ 0.060000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> >[ 0.060000]
> >[ 0.060000] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> >[ 0.060000]
Please remove irrelevant information, like time stamps and other data which
does not help in understanding the problem. In this case the back trace is
not really useful as the call chain of check_timer() is well known.
> I analyzed this kernel panic in detail and found that check_timer() function
> passed an invalid argument to unmask_ioapic_irq() function.
>
> The code is as follows:
> > int idx;
> > idx = find_irq_entry(apic1, pin1, mp_INT);
> > if (idx != -1 && irq_trigger(idx))
> > unmask_ioapic_irq(irq_get_chip_data(0));
Adding the code here is not helpful either, as we can see that from the patch.
> The unmask_ioapic_irq() function wants a "struct irq_data *" argument, but
> the code, irq_get_chip_data(0), passes a "struct irq_chip *" value.
That's actually wrong. irq_get_chip_data() returns a void pointer. If it
would return a pointer to a struct irq_chip, the compiler would have
noticed and emitted a warning. Due to the void pointer return the problem
went unnoticed. This code path seems to be never executed, but has been
left there for historical reasons. It was required for old machines, which
either are not running newer kernels or simply do not exist anymore.
> To fix an invalid argument passing, I made a patch which passes a return
> value of irq_get_irq_data(0).
I'll fix up the changelog for you this time as I did for the other
patch. Please study it along with the Documentation for further guidance.
Thanks for your contribution!
tglx
Commit-ID: afabde6986911394c95c596f96d2ac833eef04cc
Gitweb: http://git.kernel.org/tip/afabde6986911394c95c596f96d2ac833eef04cc
Author: Seunghun Han <[email protected]>
AuthorDate: Tue, 18 Jul 2017 18:20:44 +0900
Committer: Thomas Gleixner <[email protected]>
CommitDate: Tue, 18 Jul 2017 17:39:54 +0200
x86/ioapic: Pass the correct data to unmask_ioapic_irq()
One of the rarely executed code pathes in check_timer() calls
unmask_ioapic_irq() passing irq_get_chip_data(0) as argument.
That's wrong as unmask_ioapic_irq() expects a pointer to the irq data of
interrupt 0. irq_get_chip_data(0) returns NULL, so the following
dereference in unmask_ioapic_irq() causes a kernel panic.
The issue went unnoticed in the first place because irq_get_chip_data()
returns a void pointer so the compiler cannot do a type check on the
argument. The code path was added for machines with broken configuration,
but it seems that those machines are either not running current kernels or
simply do not longer exist.
Hand in irq_get_irq_data(0) as argument which provides the correct data.
[ tglx: Rewrote changelog ]
Fixes: 4467715a44cc ("x86/irq: Move irq_cfg.irq_2_pin into io_apic.c")
Signed-off-by: Seunghun Han <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/apic/io_apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index b4f5f73..237e9c2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2093,7 +2093,7 @@ static inline void __init check_timer(void)
int idx;
idx = find_irq_entry(apic1, pin1, mp_INT);
if (idx != -1 && irq_trigger(idx))
- unmask_ioapic_irq(irq_get_chip_data(0));
+ unmask_ioapic_irq(irq_get_irq_data(0));
}
irq_domain_deactivate_irq(irq_data);
irq_domain_activate_irq(irq_data);
Commit-ID: e708e35ba6d89ff785b225cd07dcccab04fa954a
Gitweb: http://git.kernel.org/tip/e708e35ba6d89ff785b225cd07dcccab04fa954a
Author: Seunghun Han <[email protected]>
AuthorDate: Tue, 18 Jul 2017 18:20:44 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 20 Jul 2017 10:28:10 +0200
x86/ioapic: Pass the correct data to unmask_ioapic_irq()
One of the rarely executed code pathes in check_timer() calls
unmask_ioapic_irq() passing irq_get_chip_data(0) as argument.
That's wrong as unmask_ioapic_irq() expects a pointer to the irq data of
interrupt 0. irq_get_chip_data(0) returns NULL, so the following
dereference in unmask_ioapic_irq() causes a kernel panic.
The issue went unnoticed in the first place because irq_get_chip_data()
returns a void pointer so the compiler cannot do a type check on the
argument. The code path was added for machines with broken configuration,
but it seems that those machines are either not running current kernels or
simply do not longer exist.
Hand in irq_get_irq_data(0) as argument which provides the correct data.
[ tglx: Rewrote changelog ]
Fixes: 4467715a44cc ("x86/irq: Move irq_cfg.irq_2_pin into io_apic.c")
Signed-off-by: Seunghun Han <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index b4f5f73..237e9c2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2093,7 +2093,7 @@ static inline void __init check_timer(void)
int idx;
idx = find_irq_entry(apic1, pin1, mp_INT);
if (idx != -1 && irq_trigger(idx))
- unmask_ioapic_irq(irq_get_chip_data(0));
+ unmask_ioapic_irq(irq_get_irq_data(0));
}
irq_domain_deactivate_irq(irq_data);
irq_domain_activate_irq(irq_data);