Date: Tue, 4 Jul 2017 12:13:09 +0200
From: Ingo Molnar
To: Peter Zijlstra
Cc: josef@toxicpanda.com, mingo@redhat.com, linux-kernel@vger.kernel.org, kernel-team@fb.com, Josef Bacik
Subject: Re: [RFC][PATCH] sched: attach extra runtime to the right avg
Message-ID: <20170704101308.odsijqc6qa7p2pe3@gmail.com>
References: <1498787766-9593-1-git-send-email-jbacik@fb.com> <20170702093718.aq5p5xxfvrjdeful@gmail.com> <20170704094141.mebcs2pjv2s6vynt@hirez.programming.kicks-ass.net>
In-Reply-To: <20170704094141.mebcs2pjv2s6vynt@hirez.programming.kicks-ass.net>
User-Agent: NeoMutt/20170113 (1.7.2)


* Peter Zijlstra wrote:

> > An intermediate approach to improve that skew would be something like below.
> > It doesn't track the remainder like your patch does, but doesn't lose
> > precision either, just rounds down 'now' to the nearest 1024 boundary.
>
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 008c514dc241..b03703cd7989 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2965,7 +2965,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >  	if (!delta)
> >  		return 0;
> >
> > -	sa->last_update_time += delta << 10;
> > +	sa->last_update_time = now & ~1023ULL;
> >
>
> So if we have a task that always runs <1024ns it should still get blocks
> of runtime because the difference between now and now&~1023 can be !0
> and spill.

Agreed - in the first approximation I was trying to figure out why Josef was
seeing an effect from the patch.

> I'm just not immediately seeing how its different from the 0-sum we had.
> It should be identical since delta*1024 would equally land us on those
> same edges (there's an offset in the differential form between the two,
> but since we start with last_update_time=0, the resulting edges are the
> same afaict).

So I think the difference is that this:

	sa->last_update_time = now & ~1023ULL;

is tracking the absolute value of 'now' (i.e. rq->clock in most cases) by and
large, with a 1024 ns imprecision.

This code on the other hand:

	sa->last_update_time += delta << 10;

... in essence creates a whole new absolute clock value that slowly but surely is
drifting away from the real rq->clock, because 'delta' is always rounded down to
the nearest 1024 ns boundary, so we accumulate the 'remainder' losses.

That is because:

	delta >>= 10;
	...
	sa->last_update_time += delta << 10;

Given enough time, ->last_update_time can drift a long way, and this delta:

	delta = now - sa->last_update_time;

... becomes meaningless AFAICS, because it's essentially two different clocks that
get compared.

But I might be super confused about this myself ...

Thanks,

	Ingo
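
One way to check the question above is a small user-space model of just the two update rules, not kernel code. The helper names update_incremental(), update_absolute() and max_divergence() are hypothetical, 'now' is assumed to be a monotonic nanosecond clock, and nothing else in ___update_load_avg() is modelled. In this simplified model the two rules give the same ->last_update_time after every step when both start from 0, which is consistent with the "same edges" point in the quoted text. A starting value that is not 1024-aligned gives an offset that stays below 1024 ns and does not grow.

```c
#include <stdint.h>

/*
 * Hypothetical model of the existing update:
 *	delta = now - last; delta >>= 10;
 *	if (!delta) return; last += delta << 10;
 * Returns the new last_update_time.
 */
static uint64_t update_incremental(uint64_t last, uint64_t now)
{
	uint64_t delta = (now - last) >> 10;	/* whole 1024 ns periods */

	if (!delta)
		return last;			/* under one period: no update */
	return last + (delta << 10);
}

/*
 * Hypothetical model of the proposed update: same early return,
 * then last = now & ~1023ULL.
 */
static uint64_t update_absolute(uint64_t last, uint64_t now)
{
	if (!((now - last) >> 10))
		return last;
	return now & ~1023ULL;
}

/*
 * Run both rules from the same starting value over 'steps' clock
 * advances of 1..4999 ns each (simple LCG, assumption: any monotonic
 * sequence would do) and return the largest difference seen between
 * the two last_update_time values.
 */
static uint64_t max_divergence(uint64_t start, unsigned int steps, uint64_t seed)
{
	uint64_t now = start, a = start, b = start, worst = 0;
	unsigned int i;

	for (i = 0; i < steps; i++) {
		uint64_t d;

		seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
		now += 1 + (seed >> 33) % 4999;

		a = update_incremental(a, now);
		b = update_absolute(b, now);

		d = a > b ? a - b : b - a;
		if (d > worst)
			worst = d;
	}
	return worst;
}
```

Calling max_divergence(0, 100000, 1) covers the last_update_time=0 case; passing a non-aligned start such as 500 shows the constant-offset case.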