Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1945922AbbD2O66 (ORCPT ); Wed, 29 Apr 2015 10:58:58 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:41746 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1423451AbbD2O6e (ORCPT ); Wed, 29 Apr 2015 10:58:34 -0400 From: Ethan Zhao To: tglx@linutronix.de, john.stultz@linaro.org, linux-kernel@vger.kernel.org Cc: ethan.kernel@gmail.com, tim.uglow@oracle.com, Ethan Zhao Subject: [PATCH] Timer: fix a race condition between init_timers_cpu() and get_next_timer_interrupt() Date: Wed, 29 Apr 2015 23:58:07 +0900 Message-Id: <1430319487-23216-1-git-send-email-ethan.zhao@oracle.com> X-Mailer: git-send-email 1.8.3.1 X-Source-IP: aserv0021.oracle.com [141.146.126.233] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3531 Lines: 105 while testing CPU hotplug and MCE with following two scripts, script 1: for i in {1..30}; do while :; do ((a=$RANDOM%160)); echo 0 >> /sys/devices/system/cpu/cpu${i}/online; echo 1 >> /sys/devices/system/cpu/cpu${i}/online; done & done script 2: while :; do for i in $(ls /sys/devices/system/machinecheck/machinecheck*/check_interval); do echo 1 >> $i; done; done We got panic call trace as: sh> bt PID: 0 TASK: ffff881028e28080 CPU: 14 COMMAND: "swapper/14" #0 [ffff881028e2ba90] machine_kexec at ffffffff810402aa #1 [ffff881028e2baf8] crash_kexec at ffffffff810c4ea4 #2 [ffff881028e2bbc0] oops_end at ffffffff81575c50 #3 [ffff881028e2bbe8] no_context at ffffffff8156a8f9 #4 [ffff881028e2bc30] __bad_area_nosemaphore at ffffffff8156a979 #5 [ffff881028e2bc78] bad_area_nosemaphore at ffffffff8156aae3 #6 [ffff881028e2bc88] __do_page_fault at ffffffff8157852d #7 [ffff881028e2bd80] do_page_fault at ffffffff8157896e #8 [ffff881028e2bd90] page_fault at ffffffff815750d8 [exception RIP: get_next_timer_interrupt+344] RIP: ffffffff8106c228 RSP: ffff881028e2be48 RFLAGS: 00010013 RAX: 0000000000000000 RBX: 00000001006d7a29 RCX: 00000001006d7dee RDX: 0000000000000001 RSI: ffff88602978d3f8 RDI: ffff88602978d028 RBP: ffff881028e2be90 R8: 000000000000003d R9: 000000000000003b R10: 0000000001006d7b R11: ffff881028e2be50 R12: 00000001006d7dee R13: 00000001406d7a28 R14: ffff88602978c000 R15: ffff881028e2be68 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff881028e2be40] get_next_timer_interrupt at ffffffff8106c120 This panic (NULL pointer dereference) was caused by race condition between init_timers_cpu() and get_next_timer_interrupt(), there is no protection with lock when does initialization in function init_timers_cpu(). The two threads cause the race condition are shown as following: Thread A: store_online() cpu_up() __cpu_notify(CPU_UP_PREPARE...) timer_cpu_notify() CPU_UP_PREPARE: init_timers_cpu() { ... INIT_LIST_HEAD(base->tv5.vec + j); ... } Thread B: tick_nohz_idle_enter() __tick_nohz_idle_enter() get_next_timer_interrupt() __next_timer_interrupt() { ... list_for_each_entry(nte, varp->vec + slot, entry) { if (tbase_get_deferrable(nte->base)) ... } This bug will affect stable branch 4.0, 3.8, 3.19, I didn't check other branches. The patch has been tested and verfied on stable 4.0. Reported-by: Tim Uglow Signed-off-by: Ethan Zhao --- kernel/time/timer.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 2d3f5c5..cc1cf35 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1573,6 +1573,9 @@ static int init_timers_cpu(int cpu) base = per_cpu(tvec_bases, cpu); } + if (unlikely(!base)) + return -EINVAL; + spin_lock_irq(&base->lock); for (j = 0; j < TVN_SIZE; j++) { INIT_LIST_HEAD(base->tv5.vec + j); @@ -1587,6 +1590,7 @@ static int init_timers_cpu(int cpu) base->next_timer = base->timer_jiffies; base->active_timers = 0; base->all_timers = 0; + spin_unlock_irq(&base->lock); return 0; } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/