From: Vincent Guittot
Date: Tue, 9 Jul 2019 17:36:15 +0200
Subject: Re: [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for asym cpu capacity
To: Chris Redpath
Cc: Peter Zijlstra, "linux-kernel@vger.kernel.org", Ingo Molnar, Morten Rasmussen, Dietmar Eggemann
References: <20190709115759.10451-1-chris.redpath@arm.com> <20190709135054.GF3402@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Chris,

On Tue, 9 Jul 2019 at 17:23, Chris Redpath wrote:
>
> Hi Peter,
>
> On 09/07/2019 14:50, Peter Zijlstra wrote:
> > On Tue, Jul 09, 2019 at 12:57:59PM +0100, Chris Redpath wrote:
> >> The ancient workaround to avoid the cost of updating rq clocks in the
> >> middle of a migration causes some issues on asymmetric CPU capacity
> >> systems where we use task utilization to determine which cpus fit a task.
> >> On quiet systems we can inflate task util after a migration which
> >> causes misfit to fire and force-migrate the task.
> >>
> >> This occurs when:
> >>
> >> (a) a task has util close to the non-overutilized capacity limit of a
> >>     particular cpu (cpu0 here); and
> >> (b) the prev_cpu was quiet otherwise, such that rq clock is
> >>     sufficiently out of date (cpu1 here).
> >>
> >> e.g.
> >>                               _____
> >> cpu0: ________________________|   |______________
> >>
> >>                                   |<- misfit happens
> >>           ______                   ___           ___
> >> cpu1: ____|    |______________|___|   |_________|
> >>
> >>              ->|              |<- wakeup migration time
> >>       last rq clock update
> >>
> >> When the task util is in just the right range for the system, we can end
> >> up migrating an unlucky task back and forth many times until we are lucky
> >> and the source rq happens to be updated close to the migration time.
> >>
> >> In order to address this, let's update both rq_clock and cfs_rq where
> >> this could be an issue.
> >
> > Can you quantify how much of a problem this really is? It is really sad,
> > but this is already the second place where we take rq->lock on
> > migration. We worked so hard to avoid having to acquire it :/
> >
>
> I think you're familiar with the way we test the EAS and misfit stuff,
> but some might not be, so I'll just outline them.
>
> We have performance and placement tests for a suite of simple synthetic
> scenarios selected to trigger the EAS & misfit mechanisms. The
> performance tests use rt-app's slack metric, and we try to minimise
> negative slack (i.e. missed deadlines).
>
> In the placement tests we estimate the minimum energy consumed to run a
> particular synthetic test job and we calculate the energy consumed in
> the actual execution according to a trace. We pass the test if our
> estimate of actual is less than ideal+20%.
>
> We enter this code quite often in our testing, most individual runs of a
> test which has small tasks involved have at least one hit where we make
> a change to the clock with this patch in.

Do you have an rt-app file that you can share?

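For anyone following along who has not seen these tests, the two checks
described above can be sketched roughly as below. This is only an
illustration under my own assumptions: the function names, the integer
energy units and the slack array are made up for this note and are not
taken from the actual test suite or from rt-app.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: the placement test passes when the energy
 * estimated from the trace is below the ideal estimate plus 20%.
 * Integer form of: actual < 1.2 * ideal.
 */
bool placement_test_passes(uint64_t ideal_energy, uint64_t actual_energy)
{
	return actual_energy * 100 < ideal_energy * 120;
}

/*
 * Hypothetical helper: count activations whose slack is negative,
 * i.e. activations that missed their deadline.
 */
size_t count_missed_deadlines(const int64_t *slack_us, size_t n)
{
	size_t missed = 0;

	for (size_t i = 0; i < n; i++)
		if (slack_us[i] < 0)
			missed++;
	return missed;
}
```

With an ideal estimate of 1000 units, a run that consumed 1199 units passes
and one that consumed 1200 fails.
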
>
> That said - despite the relatively high number of hits only about 5% of
> runs see enough additional energy consumed to trigger a test failure. We
> do try to keep a quiet system as much as possible and only run for a few
> seconds so the impact we see in testing is also probably higher than in
> the real world.

Yeah, I'm curious to see the impact on a real system which has a 60fps
screen update, like an Android phone.

>
> I totally appreciate the reluctance to add this - I don't much like it
> either, but I was hoping that sticking it behind the asym_cpucapacity
> key might be a reasonable compromise.
>
> At least only those people who select a CPU using task util and capacity
> see this cost, and we have smaller systems so in theory the cost is smaller.
>
> I'm very open to exploring alternatives :)
>
> >> Signed-off-by: Chris Redpath
> >> ---
> >>  kernel/sched/fair.c | 15 +++++++++++++++
> >>  1 file changed, 15 insertions(+)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index b798fe7ff7cd..51791db26a2a 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -6545,6 +6545,21 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> >>  		 * wakee task is less decayed, but giving the wakee more load
> >>  		 * sounds not bad.
> >>  		 */
> >> +		if (static_branch_unlikely(&sched_asym_cpucapacity) &&
> >> +			p->state == TASK_WAKING) {
> >
> > nit: indent fail.
> >
>
> oops, will tweak it
>
> --Chris
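
To make the util inflation discussed in this thread concrete, here is a
rough model of it. It is a sketch under stated assumptions, not the kernel
code: it treats one PELT period as 1 ms, uses floating point instead of the
kernel's fixed-point tables, and the function names and parameters are
invented for this note. The only facts relied on are that the PELT signal
halves every 32 periods and that a waking task is only decayed up to the
source cfs_rq's last update time, so idle time the stale clock has not seen
is never decayed away.

```c
/* Per-period decay factor y, chosen so that y^32 == 0.5 (32-period half-life). */
#define PELT_Y 0.97857206

/* Decay a utilization value across `periods` fully idle periods. */
double decay_util(double util, unsigned int periods)
{
	while (periods--)
		util *= PELT_Y;
	return util;
}

/*
 * Utilization a sleeping task carries to its new CPU (simplified model).
 * slept_ms: how long the task really slept
 * stale_ms: how far the source rq clock lags behind the wakeup time
 * Only the part of the sleep visible through the stale clock is decayed.
 */
double util_at_migration(double util, unsigned int slept_ms,
			 unsigned int stale_ms)
{
	unsigned int seen_ms = stale_ms < slept_ms ? slept_ms - stale_ms : 0;

	return decay_util(util, seen_ms);
}
```

In this model a task with util 400 that sleeps for 64 ms arrives with
roughly a quarter of its util when the source clock is current, but with
about half when the clock is 32 ms stale, and with the full 400 when the
clock has not moved at all since the task went to sleep. That extra util
is what can push the task over the misfit threshold.
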