Date: Fri, 8 Apr 2016 14:53:21 +0200
From: Frederic Weisbecker
To: Peter Zijlstra
Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner, Luiz Capitulino, Christoph Lameter, "Paul E . McKenney", Mike Galbraith, Rik van Riel, Ingo Molnar
Subject: Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
Message-ID: <20160408125320.GB24956@lerouge>
References: <1460077633-23431-1-git-send-email-fweisbec@gmail.com> <1460077633-23431-3-git-send-email-fweisbec@gmail.com> <20160408094153.GL3448@twins.programming.kicks-ass.net>
In-Reply-To: <20160408094153.GL3448@twins.programming.kicks-ass.net>

On Fri, Apr 08, 2016 at 11:41:54AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 03:07:12AM +0200, Frederic Weisbecker wrote:
> > +void cpu_load_update_nohz_start(void)
> > {
> > 	struct rq *this_rq = this_rq();
> > +
> > +	/*
> > +	 * This is all lockless but should be fine. If weighted_cpuload changes
> > +	 * concurrently we'll exit nohz. And cpu_load write can race with
> > +	 * cpu_load_update_idle() but both updater would be writing the same.
> > +	 */
> > +	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
> > +}
>
> There is more to this; this also updates ->cpu_load[0] at possibly much
> higher frequency than we've done before, while not updating the other
> ->cpu_load[] members.
>
> Now, I'm not sure we care, but it is a bit odd.
This is right. cpu_load[0] is aimed at showing the most recent load, without
knowing when the last update happened (the last tick/update may or may not
have been recent; readers shouldn't care). We could indeed worry about it if
this field were read together with the other indexes and the values put in
some relation to each other. But it seems that each rq->cpu_load[i] is read
independently, without relation or comparison. I say that without much
assurance though, as I don't know the details of the load balancing code.

If in doubt I can create a field in struct rq to record the tickless load
instead of storing it in cpu_load[0]. That was in fact the first direction
I took in my drafts.

> > +/*
> > + * Account the tickless load in the end of a nohz frame.
> > + */
> > +void cpu_load_update_nohz_stop(void)
> > +{
> > 	unsigned long curr_jiffies = READ_ONCE(jiffies);
> > +	struct rq *this_rq = this_rq();
> > +	unsigned long load;
> >
> > 	if (curr_jiffies == this_rq->last_load_update_tick)
> > 		return;
> >
> > +	load = weighted_cpuload(cpu_of(this_rq));
> > 	raw_spin_lock(&this_rq->lock);
> > +	cpu_load_update_nohz(this_rq, curr_jiffies, load);
> > 	raw_spin_unlock(&this_rq->lock);
> > }
>
> And this makes us take rq->lock when waking from nohz; a bit
> unfortunate. Do we really need this though?

Note this was already the case before this patchset; the function was simply
called cpu_load_update_nohz() instead. I think we do need it, at least
because of the remote updates performed by cpu_load_update_idle() (which
only updates when the load is 0, though).

> Will not a tick be forthcoming real-soon-now?

There is no guarantee of that. If the next task runs for less than 10ms
(with HZ=100), there may be no tick at all until the next idle period.
Besides, in nohz_full the next task may itself run tickless, in which case
we really need to update now.