Date: Fri, 5 Jun 2015 11:52:03 -0700
From: John Stultz
To: Ingo Molnar
Cc: Thomas Gleixner, Jeremiah Mahler, Preeti U Murthy, Peter Zijlstra,
    Viresh Kumar, Marcelo Tosatti, Frederic Weisbecker, lkml
Subject: Re: [BUG, bisect] hrtimer: severe lag after suspend & resume

On Fri, Jun 5, 2015 at 3:07 AM, Ingo Molnar wrote:
>
> * Thomas Gleixner wrote:
>
>> On Thu, 4 Jun 2015, John Stultz wrote:
>> > On Wed, Jun 3, 2015 at 5:56 PM, Jeremiah Mahler wrote:
>> > So I suspect the problem is that the change to clock_was_set_seq in
>> > timekeeping_update() is done prior to mirroring the time state to
>> > the shadow timekeeper. Thus the next time we do update_wall_time(),
>> > the updated sequence is overwritten by what's in the shadow copy.
>> > The attached patch, moving the modification up, seems to avoid the
>> > issue for me.
>>
>> Duh, yes.
>>
>> > Thomas: Looking at the problematic change, I'm not a big fan of it.
>> > Caching timekeeping state here in the hrtimer code has been a
>> > source of bugs in the past, and I'm not sure I see how avoiding
>> > copying 24 bytes is that big of a win. Especially since it adds
>> > more state to the timekeeper and hrtimer base that we have to read
>> > and manage.
>>
>> It's not about copying 24 bytes. It's about touching 3 cache lines
>> for nothing. In situations where we run high-frequency periodic
>> timers on clock monotonic and nothing is going on in the other clock
>> domains, which is a pretty common situation, this is measurable in
>> terms of cache utilization. [...]
>
> It's not just about 'touching': it's about _dirtying_ cachelines from
> a globally executed function (timekeeping), which is then accessed by
> per-CPU functionality (hrtimers).

Right, but part of that issue is that we're caching, in the hrtimer
cpu bases, data that *should not be cached*. That was the core issue
behind the 2012 leap-second bug, and I'd prefer not to reintroduce it.
The offset data is only valid for the monotonic time it was read at.

So dirtying three cache lines really is just a consequence of the data
being stored in the cpu_base structure, where I'd argue it doesn't
provide real value (other than the convenience of indexing it
cleanly). Reading the offset data into three values on the stack would
be fine too, and (I think) would avoid dirtying much extra, since we
have to store the now value anyway (rough sketch below).

BTW: Thomas, what are you using to do measurements here? I hesitate to
argue in these sorts of performance discussions, since I only have an
embarrassingly theoretical understanding of the issues and suspect I'm
a bit naive here. Additionally, these sorts of constraints aren't
always clearly documented, so being able to measure and compare would
help ensure that future changes don't regress things here.
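To make the ordering problem concrete, here's a tiny self-contained
userspace toy - emphatically not kernel code, just a model of the two
orderings, with all the real timekeeper state elided:

#include <stdio.h>

/*
 * Toy model of the bug: if timekeeping_update() mirrors the live
 * timekeeper into the shadow copy *before* bumping clock_was_set_seq,
 * then the next update_wall_time(), which works on the shadow and
 * copies it back, reverts the bump and the hrtimer code never sees
 * the clock change.
 */
struct timekeeper {
	unsigned int clock_was_set_seq;
	/* the real struct carries clocksource state, offsets, etc. */
};

static struct timekeeper tk;     /* live timekeeper   */
static struct timekeeper shadow; /* shadow timekeeper */

static void timekeeping_update_buggy(void)
{
	shadow = tk;            /* mirror first ...                  */
	tk.clock_was_set_seq++; /* ... so the bump misses the shadow */
}

static void timekeeping_update_fixed(void)
{
	tk.clock_was_set_seq++; /* bump first ...                    */
	shadow = tk;            /* ... so the shadow carries it      */
}

static void update_wall_time(void)
{
	tk = shadow; /* periodic update copies the shadow back */
}

int main(void)
{
	timekeeping_update_buggy();
	update_wall_time();
	printf("buggy ordering: seq = %u (bump lost)\n",
	       tk.clock_was_set_seq);

	tk.clock_was_set_seq = 0;
	timekeeping_update_fixed();
	update_wall_time();
	printf("fixed ordering: seq = %u (bump survives)\n",
	       tk.clock_was_set_seq);
	return 0;
}

The buggy ordering prints seq = 0, the fixed one seq = 1, which is the
failure mode here: after resume, the hrtimer code never observes the
sequence change, so it keeps using stale offsets.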
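And just to be concrete about what I mean by reading the offsets into
values on the stack, something along these lines - a hypothetical
sketch, not a patch; hrtimer_interrupt(), hrtimer_bases and
ktime_get_update_offsets_now() are from memory of the current code,
the expiry loop is elided, and the shape rather than the details is
the point:

void hrtimer_interrupt(struct clock_event_device *dev)
{
	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
	ktime_t offs_real, offs_boot, offs_tai, now;

	/*
	 * Sample 'now' and the offsets together, into locals. The
	 * offsets are then only ever used with the 'now' they match,
	 * and timekeeping never has to dirty per-CPU cachelines to
	 * keep a cached copy coherent.
	 */
	now = ktime_get_update_offsets_now(&offs_real, &offs_boot,
					   &offs_tai);

	/*
	 * ... run the usual expiry loop here, computing each clock
	 * base's time as ktime_add(now, <the matching local offset>)
	 * instead of reading a cached cpu_base offset ...
	 */
}

That keeps the convenience of a single sample point without adding
timekeeping-written state to the per-CPU structure.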
> That makes it far more expensive, and it has scalability-limiting
> effects similar to a global lock - while if we do it smart, it can
> perform as essentially lockless code in most cases.

Another reason I don't like this approach of caching the data is that
it also prevents fixing the leap-second adjustment to happen exactly
on the second's edge: we would either need an async update to the
seqcounter in order to refresh the cached real_offset, or we would
have to export more ntp state data so we can duplicate the adjustment
to the cached data in the hrtimer code - which is more of the
complexity you've objected to.

thanks
-john