Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932367AbbFEH5m (ORCPT ); Fri, 5 Jun 2015 03:57:42 -0400 Received: from mail-wg0-f52.google.com ([74.125.82.52]:34749 "EHLO mail-wg0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752527AbbFEH5k (ORCPT ); Fri, 5 Jun 2015 03:57:40 -0400 Date: Fri, 5 Jun 2015 09:57:34 +0200 From: Ingo Molnar To: John Stultz Cc: Jeremiah Mahler , Thomas Gleixner , Preeti U Murthy , Peter Zijlstra , Viresh Kumar , Marcelo Tosatti , Frederic Weisbecker , lkml Subject: Re: [BUG, bisect] hrtimer: severe lag after suspend & resume Message-ID: <20150605075734.GA25340@gmail.com> References: <20150604005624.GA1789@hudson.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2682 Lines: 65 * John Stultz wrote: > So I suspect the problem is the change to clock_was_set_seq in > timekeeping_update is done prior to mirroring the time state to the > shadow-timekeeper. Thus the next time we do update_wall_time() the updated > sequence is overwritten by whats in the shadow copy. The attached patch moving > the modification up seems to avoid the issue for me. > > Thomas: Looking at the problematic change, I'm not a big fan of it. Caching > timekeeping state here in the hrtimer code has been a source of bugs in the > past, and I'm not sure I see how avoiding copying 24bytes is that big of a win. > Especially since it adds more state to the timekeeper and hrtimer base that we > have to read and mange. Personally I'd prefer a revert to my fix. So it's not really the copying of the 24 bytes that is the problem with that pattern. It's the constant dirtying of a cacheline that would otherwise be read-mostly, and then the reading of it from the percpu hrtimer interrupts. That can have negative scalability effects similar to a global lock. ( Now I have not checked whether this cacheline truly becomes read-mostly after the original commit - I suspect Thomas did. ) > + if (action & TK_CLOCK_WAS_SET) > + tk->clock_was_set_seq++; > + > tk_update_ktime_data(tk); So I'd also add a comment that this update should be done before: if (action & TK_MIRROR) memcpy(&shadow_timekeeper, &tk_core.timekeeper, sizeof(tk_core.timekeeper)); Also, there appears to be a layering violation here (unless I mis-read the tk_real/tk logic): this should copy 'tk', not access tk_core directly, correct? Right now this does not matter, because 'tk == &tk_core' is always supposed to be true, or at least it should point to an identical shadow copy, right? But nevertheless tk_core should only ever be directly referenced when 'tk' is initialized from it. Also, there's a few more of these apparent layering violations: git grep 'tk_core' kernel/time/timekeeping.c | grep -v 'struct timekeeper' I realize that we are transitioning from former global variables based timekeeping to a more parametric model, so this isn't a complaint, just an observation. Right now it's a code cleanliness detail, but if/when we introduce clocks that are updated independently from each other then it will also matter functionally. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/