Date: Fri, 8 Apr 2016 14:53:21 +0200
From: Frederic Weisbecker
To: Peter Zijlstra
Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner, Luiz Capitulino, Christoph Lameter, "Paul E . McKenney", Mike Galbraith, Rik van Riel, Ingo Molnar
Subject: Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
Message-ID: <20160408125320.GB24956@lerouge>
References: <1460077633-23431-1-git-send-email-fweisbec@gmail.com> <1460077633-23431-3-git-send-email-fweisbec@gmail.com> <20160408094153.GL3448@twins.programming.kicks-ass.net>
In-Reply-To: <20160408094153.GL3448@twins.programming.kicks-ass.net>

On Fri, Apr 08, 2016 at 11:41:54AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 03:07:12AM +0200, Frederic Weisbecker wrote:
> > +void cpu_load_update_nohz_start(void)
> > {
> > 	struct rq *this_rq = this_rq();
> > +
> > +	/*
> > +	 * This is all lockless but should be fine. If weighted_cpuload changes
> > +	 * concurrently we'll exit nohz. And cpu_load write can race with
> > +	 * cpu_load_update_idle() but both updater would be writing the same.
> > +	 */
> > +	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
> > +}
>
> There is more to this; this also updates ->cpu_load[0] at possibly much
> higher frequency than we've done before, while not updating the other
> ->cpu_load[] members.
>
> Now, I'm not sure we care, but it is a bit odd.
This is right. cpu_load[0] is aimed at showing the most recent load, without
knowing when the last update happened (the last tick/update may or may not
have been recent; readers shouldn't care). We could indeed worry about it if
this field were read together with the other indexes and the values put in
some relation to each other. But it seems that each rq->cpu_load[i] is read
independently, without relation or comparison. I say that without much
assurance though, as I don't know the details of the load balancing code.

If in doubt I can create a field in struct rq to record the tickless load
instead of storing it in cpu_load[0]. That was in fact the first direction
I took in my drafts.

> > +/*
> > + * Account the tickless load in the end of a nohz frame.
> > + */
> > +void cpu_load_update_nohz_stop(void)
> > +{
> > 	unsigned long curr_jiffies = READ_ONCE(jiffies);
> > +	struct rq *this_rq = this_rq();
> > +	unsigned long load;
> >
> > 	if (curr_jiffies == this_rq->last_load_update_tick)
> > 		return;
> >
> > +	load = weighted_cpuload(cpu_of(this_rq));
> > 	raw_spin_lock(&this_rq->lock);
> > +	cpu_load_update_nohz(this_rq, curr_jiffies, load);
> > 	raw_spin_unlock(&this_rq->lock);
> > }
>
> And this makes us take rq->lock when waking from nohz; a bit
> unfortunate. Do we really need this though?

Note this was already the case before this patchset; the function was simply
called cpu_load_update_nohz() instead. I think we do need it, at least
because of the remote updates performed by cpu_load_update_idle() (which
only updates when the load is 0, though).

> Will not a tick be forthcoming real-soon-now?

There is no guarantee of that. If the next task runs for less than 10ms
(with HZ=100), there may be no tick at all until the next idle period.
Besides, in nohz_full the next task may itself run tickless, in which case
we really need to update now.