From: xiao jin
To: jroedel@suse.de, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, bp@suse.de, boris.ostrovsky@oracle.com, dave.hansen@linux.intel.com, rientjes@google.com, imammedo@redhat.com, paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org
Cc: yanmin_zhang@linux.intel.com, xiao jin
Subject: [PATCH] smpboot.c: move setup_vector_irq after set_cpu_online
Date: Thu, 2 Jul 2015 12:24:34 +0800
Message-Id: <1435811074-30062-1-git-send-email-jin.xiao@intel.com>

While running a CPU hotplug + reboot test, I can easily hit an IPANIC on
kernel 3.14:

[ 106.107851] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[ 106.116702] IP:
[ 106.118490] [] check_irq_vectors_for_cpu_disable+0x76/0x180
[ 106.126809] PGD 0
[ 106.129110] Oops: 0000 [#1] PREEMPT SMP
[ 106.133613] Modules linked in: atomisp_css2401a0_v21 lm3554 ov2722 hid_sensor_hub sens_col_core hid_heci_ish heci_ish heci vidt_driver rfkill_gpio bcmdhd_pcie(O) cfg80211 ov5693 videobuf_vmalloc pn544_nfc(C) videobuf_core bt_lpm 6lowpan_iphc ip6table_raw iptable_raw atmel_mxt_ts
[ 106.161897] CPU: 2 PID: 18 Comm: migration/2 Tainted: G WC O 3.14.37-x86_64-L1-R467-g68db82c #1
[ 106.172323] Hardware name: Intel Corporation CHERRYVIEW C0 PLATFORM/Cherry Trail FFD, BIOS CH2TFFD.X64.0004.R83.1506171149 06/17/2015
[ 106.185758] task: ffff880077e98510 ti: ffff880077e9a000 task.ti: ffff880077e9a000
[ 106.194143] RIP: 0010:[]
[ 106.198646] [] check_irq_vectors_for_cpu_disable+0x76/0x180
[ 106.206969] RSP: 0000:ffff880077e9bcf8 EFLAGS: 00010046
[ 106.212926] RAX: 0000000000000000 RBX: 00000000000000d3 RCX: 0000000000000000
[ 106.220921] RDX: 0000000000000000 RSI: 0000000000000088 RDI: 0000000000000001
[ 106.228918] RBP: ffff880077e9bd28 R08: 0000000000000000 R09: ffff8800784008e0
[ 106.236915] R10: 000000000000000a R11: 0000000000000000 R12: 0000000000000015
[ 106.244911] R13: 0000000000000000 R14: 0000000000000088 R15: 0000000000000002
[ 106.252898] FS: 0000000000000000(0000) GS:ffff88007a300000(0000) knlGS:0000000000000000
[ 106.261961] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 106.268405] CR2: 0000000000000040 CR3: 000000006e03c000 CR4: 00000000001007e0
[ 106.276400] Last Branch Records:
[ 106.280052] to: [] page_fault+0x0/0x80
[ 106.286335] from: [] check_irq_vectors_for_cpu_disable+0x76/0x180
[ 106.295044] to: [] check_irq_vectors_for_cpu_disable+0x73/0x180
[ 106.303749] from: [] irq_to_desc+0x18/0x20
[ 106.310227] to: [] irq_to_desc+0x17/0x20
[ 106.316701] from: [] radix_tree_lookup+0xc/0x10
[ 106.323655] to: [] radix_tree_lookup+0xb/0x10
[ 106.330616] from: [] radix_tree_lookup_element+0x55/0x90
[ 106.338450] to: [] radix_tree_lookup_element+0x30/0x90
[ 106.346274] from: [] radix_tree_lookup_element+0x50/0x90
[ 106.354108] to: [] radix_tree_lookup_element+0x3b/0x90
[ 106.361934] from: [] radix_tree_lookup_element+0x2d/0x90
[ 106.369750] to: [] radix_tree_lookup_element+0x0/0x90
[ 106.377487] from: [] radix_tree_lookup+0x6/0x10
[ 106.384447] to: [] radix_tree_lookup+0x0/0x10
[ 106.391408] from: [] irq_to_desc+0x12/0x20
[ 106.397882] Stack:
[ 106.400140] 00000002810bb4f2 ffff88006e07bde8 ffff88006e07bd88 0000000000000000
[ 106.408551] ffffffff8110a801 0000000000000202 ffff880077e9bd40 ffffffff81030f62
[ 106.416967] 0000000000000282 ffff880077e9bd58 ffffffff819dd043 0000000000000003
[ 106.425375] Call Trace:
[ 106.428136] [] ? multi_cpu_stop+0x1/0x110
[ 106.434475] [] native_cpu_disable+0x12/0x40
[ 106.441018] [] take_cpu_down+0x13/0x40
[ 106.447074] [] multi_cpu_stop+0xc1/0x110
[ 106.453324] [] ? cpu_stop_should_run+0x50/0x50
[ 106.460156] [] cpu_stopper_thread+0x78/0x150
[ 106.466795] [] ? _raw_spin_unlock_irq+0x1e/0x40
[ 106.473726] [] ? finish_task_switch+0x57/0xd0
[ 106.480464] [] ? __schedule+0x37e/0x7b0
[ 106.486619] [] smpboot_thread_fn+0x17d/0x2b0
[ 106.493259] [] ? SyS_setgroups+0x160/0x160
[ 106.499704] [] kthread+0xe4/0x100

Latest upstream has commit d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1 to
solve this IPANIC, but according to
http://lkml.kernel.org/r/20150204132754.GA10078@suse.de the root cause
was not clear. Since the problem is easy to hit with this specific test
case, we investigated further and found the IPANIC scenario below.

cpu N (N = 1, or 2, or 3)          cpu 0
native_cpu_up                      device_shutdown
=> do_boot_cpu
=> start_secondary
=> smp_callin
=> setup_vector_irq
=> __setup_vector_irq
                                   => free_msi_irqs
                                   => arch_teardown_msi_irqs
                                   => default_teardown_msi_irqs
                                   => arch_teardown_msi_irq
                                   => native_teardown_msi_irq
                                   => destroy_irq
                                   => __clear_irq_vector
=> set_cpu_online

CPU N is not yet online when cpu 0 clears the irq vector. Because
__clear_irq_vector() only clears the vector on online CPUs, the irq
number remains in CPU N's vector table after free_msi_irqs(). The next
native_cpu_disable() on that CPU then hits a NULL pointer when it checks
the irq vectors.

This patch moves setup_vector_irq() after set_cpu_online().

Signed-off-by: xiao jin
---
 arch/x86/kernel/smpboot.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 50e547e..f7d5d79 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -172,11 +172,6 @@ static void smp_callin(void)
 	apic_ap_setup();
 
 	/*
-	 * Need to setup vector mappings before we enable interrupts.
-	 */
-	setup_vector_irq(smp_processor_id());
-
-	/*
 	 * Save our processor parameters. Note: this information
 	 * is needed for clock calibration.
 	 */
@@ -257,6 +252,11 @@ static void notrace start_secondary(void *unused)
 	cpu_set_state_online(smp_processor_id());
 	x86_platform.nmi_init();
 
+	/*
+	 * Need to setup vector mappings before we enable interrupts.
+	 */
+	setup_vector_irq(smp_processor_id());
+
 	/* enable local interrupts */
 	local_irq_enable();
 
-- 
1.7.9.5
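
For illustration only, the ordering problem described in the changelog can be reproduced with a small userspace model. This is a rough sketch, not kernel code: every `model_*` name, the table sizes, and the irq/vector numbers are hypothetical, and the model ignores locking (vector_lock) and real interrupt delivery. It only assumes what the changelog states, namely that clearing a vector skips CPUs that are not yet marked online.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS    2
#define MODEL_NR_VECTORS 8
#define MODEL_NR_IRQS    8
#define MODEL_UNUSED     (-1)

/* Hypothetical stand-ins for the per-CPU vector tables, the irq to
 * vector assignment, and the online mask. */
static int  model_vector_irq[MODEL_NR_CPUS][MODEL_NR_VECTORS];
static int  model_irq_vector[MODEL_NR_IRQS];
static bool model_cpu_online[MODEL_NR_CPUS];

static void model_reset(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_cpu_online[cpu] = false;
		for (int v = 0; v < MODEL_NR_VECTORS; v++)
			model_vector_irq[cpu][v] = MODEL_UNUSED;
	}
	for (int irq = 0; irq < MODEL_NR_IRQS; irq++)
		model_irq_vector[irq] = MODEL_UNUSED;
}

/* Copy every currently assigned irq into this CPU's vector table. */
static void model_setup_vector_irq(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	for (int irq = 0; irq < MODEL_NR_IRQS; irq++)
		if (model_irq_vector[irq] != MODEL_UNUSED)
			model_vector_irq[cpu][model_irq_vector[irq]] = irq;
}

/* Tear down an irq: only CPUs marked online get their entry cleared. */
static void model_clear_irq_vector(int irq)
{
	int vector = model_irq_vector[irq];

	assert(vector != MODEL_UNUSED);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_online[cpu])
			model_vector_irq[cpu][vector] = MODEL_UNUSED;
	model_irq_vector[irq] = MODEL_UNUSED;
}

/* Count table entries that still name an irq which no longer exists. */
static int model_count_stale(int cpu)
{
	int stale = 0;

	for (int v = 0; v < MODEL_NR_VECTORS; v++) {
		int irq = model_vector_irq[cpu][v];

		if (irq != MODEL_UNUSED && model_irq_vector[irq] == MODEL_UNUSED)
			stale++;
	}
	return stale;
}

/* Run the teardown between the two bring-up steps, in either order,
 * and report how many stale entries the new CPU ends up with. */
int model_stale_after_race(bool setup_before_online)
{
	const int irq = 5, vector = 3, boot_cpu = 0, new_cpu = 1;

	model_reset();
	model_cpu_online[boot_cpu] = true;
	model_irq_vector[irq] = vector;
	model_vector_irq[boot_cpu][vector] = irq;

	if (setup_before_online) {
		model_setup_vector_irq(new_cpu);   /* old ordering */
		model_clear_irq_vector(irq);       /* misses new_cpu */
		model_cpu_online[new_cpu] = true;
	} else {
		model_cpu_online[new_cpu] = true;  /* ordering in this patch */
		model_clear_irq_vector(irq);
		model_setup_vector_irq(new_cpu);
	}
	return model_count_stale(new_cpu);
}
```

Calling `model_stale_after_race(true)` leaves one stale entry in the new CPU's table, which corresponds to the entry that a later CPU-disable check would look up and find no descriptor for. Calling `model_stale_after_race(false)` leaves none. The model covers only this one interleaving and says nothing about other windows in the real code.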