Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752053AbdGDMkL (ORCPT ); Tue, 4 Jul 2017 08:40:11 -0400 Received: from bombadil.infradead.org ([65.50.211.133]:56998 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750838AbdGDMkK (ORCPT ); Tue, 4 Jul 2017 08:40:10 -0400 Date: Tue, 4 Jul 2017 14:40:03 +0200 From: Peter Zijlstra To: Ingo Molnar Cc: josef@toxicpanda.com, mingo@redhat.com, linux-kernel@vger.kernel.org, kernel-team@fb.com, Josef Bacik Subject: Re: [RFC][PATCH] sched: attach extra runtime to the right avg Message-ID: <20170704124003.i7lg4c7gdnqqbjyo@hirez.programming.kicks-ass.net> References: <1498787766-9593-1-git-send-email-jbacik@fb.com> <20170702093718.aq5p5xxfvrjdeful@gmail.com> <20170704094141.mebcs2pjv2s6vynt@hirez.programming.kicks-ass.net> <20170704101308.odsijqc6qa7p2pe3@gmail.com> <20170704122150.f2bqv55g7vvjztxa@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170704122150.f2bqv55g7vvjztxa@hirez.programming.kicks-ass.net> User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2903 Lines: 91 On Tue, Jul 04, 2017 at 02:21:50PM +0200, Peter Zijlstra wrote: > On Tue, Jul 04, 2017 at 12:13:09PM +0200, Ingo Molnar wrote: > > > > This code on the other hand: > > > > sa->last_update_time += delta << 10; > > > > ... in essence creates a whole new absolute clock value that slowly but surely is > > drifting away from the real rq->clock, because 'delta' is always rounded down to > > the nearest 1024 ns boundary, so we accumulate the 'remainder' losses. > > > > That is because: > > > > delta >>= 10; > > ... > > sa->last_update_time += delta << 10; > > > > Given enough time, ->last_update_time can drift a long way, and this delta: > > > > delta = now - sa->last_update_time; > > > > ... becomes meaningless AFAICS, because it's essentially two different clocks that > > get compared. > > Thing is, once you drift over 1023 (ns) your delta increases and you > catch up again. > > > > A B C D E F > | | | | | | > +----+----+----+----+----+----+----+----+----+----+----+ > > > A: now = 0 > sa->last_update_time = 0 > delta := (now - sa->last_update_time) >> 10 = 0 > > B: now = 614 (+614) > delta = (614 - 0) >> 10 = 0 > sa->last_update_time += 0 (0) > sa->last_update_time = now & ~1023 (0) > > C: now = 1843 (+1229) > delta = (1843 - 0) >> 10 = 1 > sa->last_update_time += 1024 (1024) > sa->last_update_time = now & ~1023 (1024) > > > D: now = 3481 (+1638) > delta = (3481 - 1024) >> 10 = 2 > sa->last_update_time += 2048 (3072) > sa->last_update_time = now & ~1023 (3072) > > E: now = 5734 (+2253) > delta = (5734 - 3072) = 2 > sa->last_update_time += 2048 (5120) > sa->last_update_time = now & ~1023 (5120) > > F: now = 6348 (+614) > delta = (6348 - 5120) >> 10 = 1 > sa->last_update_time += 1024 (6144) > sa->last_update_time = now & ~1023 (6144) > > > > And you'll see that both are identical, and that both D and F have > gotten a spill from sub-chunk accounting. Where the two approaches differ is when we have different modifications to sa->last_update_time (and we do). The differential (+=) one does not mandate initial value of ->last_update_time has the bottom 9 bits cleared. It will simply continue from wherever. The absolute (&) one however mandates that ->last_update_time always has the bottom few bits 0, otherwise we can 'gain' time. The first iteration will clear those bits and we'll then double account them. It so happens that we have an explicit assign in migrate (attach_entity_load_avg / set_task_rq_fair). And on negative delta. In all those cases we use the immediate 'now' value, no clearing of bottom bits. The differential should work fine with that, the absolute one has double accounting issues in that case. So it would be very good to find what exactly causes Josef's workload to get 'fixed'.