Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755580Ab3IIU3y (ORCPT ); Mon, 9 Sep 2013 16:29:54 -0400 Received: from mail-pa0-f51.google.com ([209.85.220.51]:54472 "EHLO mail-pa0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754841Ab3IIU3w (ORCPT ); Mon, 9 Sep 2013 16:29:52 -0400 Message-ID: <522E2FBB.4070406@linaro.org> Date: Mon, 09 Sep 2013 13:29:47 -0700 From: John Stultz User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130803 Thunderbird/17.0.8 MIME-Version: 1.0 To: Gerlando Falauto CC: "linux-kernel@vger.kernel.org" , Thomas Gleixner , Richard Cochran , Prarit Bhargava , "Brunck, Holger" , "Longchamp, Valentin" , "Bigler, Stefan" , Peter Zijlstra , Ingo Molnar Subject: Re: kernel deadlock References: <521F6D06.1040107@keymile.com> <521FDD12.7050000@linaro.org> <52212511.9050206@keymile.com> <5221264F.4070402@linaro.org> <5225F8EF.3040701@keymile.com> <52261BBB.60904@linaro.org> <5226EB35.6080604@keymile.com> In-Reply-To: <5226EB35.6080604@keymile.com> X-Enigmail-Version: 1.5.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 14018 Lines: 331 On 09/04/2013 01:11 AM, Gerlando Falauto wrote: > Hi John, > > On 09/03/2013 07:26 PM, John Stultz wrote: >> On 09/03/2013 07:57 AM, Gerlando Falauto wrote: >>> Hi, >>> >>> I tried again from scratch, so let me recap the whole situation, so we >>> can all view it from the same standpoint. This should make the problem >>> easier to see and reproduce. >>> >>> I can confirm that running a stock 3.10 kernel with HRTICK enabled: >> [snip] >>> makes the following program (and the whole board, as a matter of fact) >>> hang with no further notice: >>> >> [snip] >>> If I then revert everything up to (and including) the offending commit >>> (mind the '~'): >>> >>> $ git log --oneline ...06c017f~ -- kernel/time/timekeeping.c >>> kernel/time/ntp.c | cut -f1 -d' ' | xargs git revert >>> >>> The problem disappears. >>> >>> If I then cherry-pick again the offending commit: >>> >>> $ git cherry-pick 06c017f; git log -1 >>> >>> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 >>> Author: John Stultz >>> Date: Fri Mar 22 11:37:28 2013 -0700 >>> >>> timekeeping: Hold timekeepering locks in do_adjtimex and hardpps >>> >> [snip] >>> And as soon as I also cherry-pick (notice there is another commit in >>> between, which seems not to be relevant on this matter): >>> >>> $ git cherry-pick a076b2146fabb0894cae5e0189a8ba3f1502d737; git show >>> >>> commit a076b2146fabb0894cae5e0189a8ba3f1502d737 >>> Author: John Stultz >>> Date: Fri Mar 22 11:52:03 2013 -0700 >>> >>> ntp: Remove ntp_lock, using the timekeeping locks to protect ntp >>> state >>> >> [snip] >>> >>> I end up in the situation where the system hangs completely and NO >>> deadlock report whatsoever is output. >>> >>> So it looks like 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 introduces >>> the deadlock, while a076b2146fabb0894cae5e0189a8ba3f1502d737 cares to >>> hide the report. >>> >>> Notice how I tested the above on an ARM board; on PowerPC I get >>> similar results, although I am not able to see the deadlock report >>> under any circumstances (enabling CONFIG_PROVE_LOCKING, which is the >>> flag that triggers the deadlock report, causes the kernel to hang at >>> startup even on a vanilla 3.10 kernel). >>> >>> John, could you please confirm whether you're at least able to >>> reproduce it somehow? >> >> Thanks again for the detailed notes. I've so far been unable to >> reproduce this, but I'm still looking at it. So I think I've managed to finally reproduce this and hunt it down. With Peter's "sched: Fix HRTICK" patch and HRTICK enabled, I found I could trigger a hard hang at boot on my x86_64 kvm system. sysrq didn't function, so I checked out info cpus and that pointed to both cpus being in ktime_get() and ktime_get_update_offsets(), which suggested a seqcount deadlock (basically calling something that reads the seqlock while we hold the write on it). Unfortunately the seqlock/seqcount infrastructure doesn't support lockdep, so I added some debug code to take and release the timekeeper_lock in every function that does a read on the timekeeper_seq. This popped up with: [ 15.846386] ====================================================== [ 15.847174] [ INFO: possible circular locking dependency detected ] [ 15.847959] 3.10.0+ #30 Not tainted [ 15.848402] ------------------------------------------------------- [ 15.849182] ntpd/4062 is trying to acquire lock: [ 15.849765] (&(&pool->lock)->rlock){..-...}, at: [] __queue_work+0x145/0x480 [ 15.850051] [ 15.850051] but task is already holding lock: [ 15.850051] (timekeeper_lock){-.-.-.}, at: [] do_adjtimex+0x7f/0x100 [ 15.850051] [ 15.850051] which lock already depends on the new lock. [ 15.850051] [ 15.850051] [ 15.850051] the existing dependency chain (in reverse order) is: [ 15.850051] -> #2 (timekeeper_lock){-.-.-.}: [ 15.850051] [] lock_acquire+0x9e/0x1e0 [ 15.850051] [] _raw_spin_lock_irqsave+0x53/0x90 [ 15.850051] [] ktime_get+0x32/0x110 [ 15.850051] [] hrtick_start+0x20/0x90 [ 15.850051] [] hrtick_start_fair+0x83/0xd0 [ 15.850051] [] hrtick_update+0x5e/0x60 [ 15.850051] [] enqueue_task_fair+0x961/0xfd0 [ 15.850051] [] enqueue_task+0x61/0x70 [ 15.850051] [] activate_task+0x1e/0x20 [ 15.850051] [] __schedule+0x799/0xa10 [ 15.850051] [] schedule+0x24/0x70 [ 15.850051] [] schedule_timeout+0x16c/0x2c0 [ 15.850051] [] io_schedule_timeout+0x9d/0xf0 [ 15.850051] [] wait_for_completion_io+0xaf/0x120 [ 15.850051] [] blk_execute_rq+0x8c/0xe0 [ 15.850051] [] scsi_execute+0xdf/0x170 [ 15.850051] [] scsi_execute_req_flags+0x95/0x110 [ 15.850051] [] sr_check_events+0xa4/0x2e0 [ 15.850051] [] cdrom_check_events+0x13/0x30 [ 15.850051] [] sr_block_check_events+0x34/0x50 [ 15.850051] [] disk_check_events+0x57/0x180 [ 15.850051] [] disk_events_workfn+0x17/0x20 [ 15.850051] [] process_one_work+0x1e7/0x660 [ 15.850051] [] worker_thread+0x119/0x370 [ 15.850051] [] kthread+0xd6/0xe0 [ 15.850051] [] ret_from_fork+0x7c/0xb0 [ 15.850051] -> #1 (&p->pi_lock){-.-.-.}: [ 15.850051] [] lock_acquire+0x9e/0x1e0 [ 15.850051] [] _raw_spin_lock_irqsave+0x53/0x90 [ 15.850051] [] try_to_wake_up+0x2c/0x300 [ 15.850051] [] wake_up_process+0x1e/0x40 [ 15.850051] [] start_worker+0x22/0x30 [ 15.850051] [] create_and_start_worker+0x48/0x90 [ 15.850051] [] init_workqueues+0x1b8/0x380 [ 15.850051] [] do_one_initcall+0x7b/0x114 [ 15.850051] [] kernel_init_freeable+0x99/0x1c3 [ 15.850051] [] kernel_init+0x9/0xf0 [ 15.850051] [] ret_from_fork+0x7c/0xb0 [ 15.850051] -> #0 (&(&pool->lock)->rlock){..-...}: [ 15.850051] [] __lock_acquire+0x1dce/0x1ee0 [ 15.850051] [] lock_acquire+0x9e/0x1e0 [ 15.850051] [] _raw_spin_lock+0x41/0x80 [ 15.850051] [] __queue_work+0x145/0x480 [ 15.850051] [] __queue_delayed_work+0xaa/0x1a0 [ 15.850051] [] queue_delayed_work_on+0x8e/0xa0 [ 15.850051] [] __do_adjtimex+0x158/0x560 [ 15.850051] [] do_adjtimex+0xa3/0x100 [ 15.850051] [] SYSC_adjtimex+0x3b/0x80 [ 15.850051] [] SyS_adjtimex+0x9/0x10 [ 15.850051] [] tracesys+0xdd/0xe2 [ 15.850051] [ 15.850051] other info that might help us debug this: [ 15.850051] [ 15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock [ 15.850051] Possible unsafe locking scenario: [ 15.850051] [ 15.850051] CPU0 CPU1 [ 15.850051] ---- ---- [ 15.850051] lock(timekeeper_lock); [ 15.850051] lock(&p->pi_lock); [ 15.850051] lock(timekeeper_lock); [ 15.850051] lock(&(&pool->lock)->rlock); [ 15.850051] [ 15.850051] *** DEADLOCK *** [ 15.850051] [ 15.850051] 1 lock held by ntpd/4062: [ 15.850051] #0: (timekeeper_lock){-.-.-.}, at: [] do_adjtimex+0x7f/0x100 [ 15.850051] [ 15.850051] stack backtrace: [ 15.850051] CPU: 0 PID: 4062 Comm: ntpd Not tainted 3.10.0+ #30 [ 15.850051] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 15.850051] ffffffff82354260 ffff880000239b78 ffffffff818f6cfe ffff880000239bb8 [ 15.850051] ffffffff818f2f51 ffff880000239c40 ffff88000002ea78 ffff88000002ea40 [ 15.850051] ffff88000002e310 0000000000000000 0000000000000001 ffff880000239ca8 [ 15.850051] Call Trace: [ 15.850051] [] dump_stack+0x19/0x1b [ 15.850051] [] print_circular_bug+0x1f9/0x208 [ 15.850051] [] __lock_acquire+0x1dce/0x1ee0 [ 15.850051] [] ? trace_hardirqs_off+0xd/0x10 [ 15.850051] [] ? sched_clock_local+0x1d/0x90 [ 15.850051] [] lock_acquire+0x9e/0x1e0 [ 15.850051] [] ? __queue_work+0x145/0x480 [ 15.850051] [] _raw_spin_lock+0x41/0x80 [ 15.850051] [] ? __queue_work+0x145/0x480 [ 15.850051] [] __queue_work+0x145/0x480 [ 15.850051] [] __queue_delayed_work+0xaa/0x1a0 [ 15.850051] [] queue_delayed_work_on+0x8e/0xa0 [ 15.850051] [] __do_adjtimex+0x158/0x560 [ 15.850051] [] do_adjtimex+0xa3/0x100 [ 15.850051] [] SYSC_adjtimex+0x3b/0x80 [ 15.850051] [] SyS_adjtimex+0x9/0x10 [ 15.850051] [] tracesys+0xdd/0xe2 So yea, thanks to the lockdep code we can see there's a three-lock deadlock. And as you discovered, it connected to adjtimex and the timekeeping locking changes. Specifically that we're now calling schedule_delayed_work() under a timekeeper_seq write, and that somehow ends up calling back to a timekeeping accessor function. Now, I'm still in the dark as to why HRTICK exposes this, but seems like the following patch should resolve the issue (and quiets the lockdep warnings in my testing). Let me know how it works for you! Ingo: This makes me think we really should have some lockdep smarts added to seqlock/seqcount structures. Is there something already discovered thats preventing this, or has this just not yet been tried? thanks -john >From 09885df9734aa678ccb61dbef356fea4faeff16c Mon Sep 17 00:00:00 2001 From: John Stultz Date: Mon, 9 Sep 2013 13:07:18 -0700 Subject: [PATCH] timekeepeing: Fix HRTICK related deadlock from ntp lock changes It was reported that when HRTICK is enabled, its possible to trigger system deadlocks. These were hard to reproduce, as HRTICK has been broken in the past, but seemed to be connected to the timekeeping_seq lock. Since seqlock/seqcount's aren't supported w/ lockdep, I added some extra spinlock based locking and triggered the following lockdep output: [ 15.849182] ntpd/4062 is trying to acquire lock: [ 15.849765] (&(&pool->lock)->rlock){..-...}, at: [] __queue_work+0x145/0x480 [ 15.850051] [ 15.850051] but task is already holding lock: [ 15.850051] (timekeeper_lock){-.-.-.}, at: [] do_adjtimex+0x7f/0x100 [ 15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock [ 15.850051] Possible unsafe locking scenario: [ 15.850051] [ 15.850051] CPU0 CPU1 [ 15.850051] ---- ---- [ 15.850051] lock(timekeeper_lock); [ 15.850051] lock(&p->pi_lock); [ 15.850051] lock(timekeeper_lock); [ 15.850051] lock(&(&pool->lock)->rlock); [ 15.850051] [ 15.850051] *** DEADLOCK *** This patch avoids this deadlock, by moving the call to schedule_delayed_work() outside of the timekeeper lock critical section. Reported-by: Gerlando Falauto Signed-off-by: John Stultz --- include/linux/timex.h | 1 + kernel/time/ntp.c | 6 ++---- kernel/time/timekeeping.c | 2 ++ 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/include/linux/timex.h b/include/linux/timex.h index b3726e6..dd3edd7 100644 --- a/include/linux/timex.h +++ b/include/linux/timex.h @@ -141,6 +141,7 @@ extern int do_adjtimex(struct timex *); extern void hardpps(const struct timespec *, const struct timespec *); int read_current_timer(unsigned long *timer_val); +void ntp_notify_cmos_timer(void); /* The clock frequency of the i8253/i8254 PIT */ #define PIT_TICK_RATE 1193182ul diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 8f5b3b9..bb22151 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -516,13 +516,13 @@ static void sync_cmos_clock(struct work_struct *work) schedule_delayed_work(&sync_cmos_work, timespec_to_jiffies(&next)); } -static void notify_cmos_timer(void) +void ntp_notify_cmos_timer(void) { schedule_delayed_work(&sync_cmos_work, 0); } #else -static inline void notify_cmos_timer(void) { } +void ntp_notify_cmos_timer(void) { } #endif @@ -687,8 +687,6 @@ int __do_adjtimex(struct timex *txc, struct timespec *ts, s32 *time_tai) if (!(time_status & STA_NANO)) txc->time.tv_usec /= NSEC_PER_USEC; - notify_cmos_timer(); - return result; } diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index baeeb5c..fcc261c 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1682,6 +1682,8 @@ int do_adjtimex(struct timex *txc) write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); + ntp_notify_cmos_timer(); + return ret; } -- 1.8.1.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/