From: Vincent Guittot
Date: Thu, 22 Apr 2021 10:37:19 +0200
Subject: Re: [PATCH v3] sched,fair: skip newidle_balance if a wakeup is pending
To: Rik van Riel
Cc: linux-kernel, Kernel Team, Peter Zijlstra, Ingo Molnar,
 Dietmar Eggemann, Mel Gorman, Valentin Schneider
References: <20210420120705.5c705d4b@imladris.surriel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 21 Apr 2021 at 19:27, Vincent Guittot wrote:
>
> Hi Rik,
>
> On Tue, 20 Apr 2021 at 18:07, Rik van Riel wrote:
> >
> > The try_to_wake_up function has an optimization where it can queue
> > a task for wakeup on its previous CPU, if the task is still in the
> > middle of going to sleep inside schedule().
> >
> > Once schedule() re-enables IRQs, the task will be woken up with an
> > IPI, and placed back on the runqueue.
> >
> > If we have such a wakeup pending, there is no need to search other
> > CPUs for runnable tasks. Just skip (or bail out early from) newidle
> > balancing, and run the just woken up task.
> >
> > For a memcache-like workload test, this reduces total CPU use by
> > about 2%, proportionally split between user and system time,
> > and p99 and p95 application response time by 10% on average.
> > The schedstats run_delay number shows a similar improvement.
> >
> > Signed-off-by: Rik van Riel
> > ---
> >  kernel/sched/fair.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 69680158963f..fd80175c3b3e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10594,6 +10594,14 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >         u64 curr_cost = 0;
> >
> >         update_misfit_status(NULL, this_rq);
> > +
> > +       /*
> > +        * There is a task waiting to run. No need to search for one.
> > +        * Return 0; the task will be enqueued when switching to idle.
> > +        */
> > +       if (this_rq->ttwu_pending)
> > +               return 0;
> > +
> >         /*
> >          * We must set idle_stamp _before_ calling idle_balance(), such that we
> >          * measure the duration of idle_balance() as idle time.
> > @@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >                  * Stop searching for tasks to pull if there are
> >                  * now runnable tasks on this rq.
> >                  */
> > -               if (pulled_task || this_rq->nr_running > 0)
> > +               if (pulled_task || this_rq->nr_running > 0 ||
> > +                   this_rq->ttwu_pending)
> >                         break;
> >         }
> >         rcu_read_unlock();
> >
> > @@ -10688,7 +10697,12 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >         if (this_rq->nr_running != this_rq->cfs.h_nr_running)
> >                 pulled_task = -1;
> >
> > -       if (pulled_task)
> > +       /*
> > +        * If we are no longer idle, do not let the time spent here pull
> > +        * down this_rq->avg_idle. That could lead to newidle_balance not
> > +        * doing enough work, and the CPU actually going idle.
> > +        */
> > +       if (pulled_task || this_rq->ttwu_pending)
>
> I'm still running some benchmarks to evaluate the impact of your patch,
> and more specifically of the line above, which clears this_rq->idle_stamp
> and prevents the time spent in newidle_balance from being accounted for
> in avg_idle. I have some results that show a regression because of this
> test, especially with hackbench.
> On a large system, the time spent in newidle_balance can be significant,
> and we can't ignore it just because this_rq->ttwu_pending got set while
> looping over the domains: without newidle_balance the idle time would
> have been large, and we end up screwing up the metric.

I confirmed that the line above generates a hackbench regression on my
large arm64 system (2 * 112 CPUs).

I'm testing hackbench with various numbers of groups: 1, 2, 4, 16, 32,
64, 128, 256, but I have only included the two results that regress
significantly. The others are within the +/-1% variation range.

hackbench -g $group

group    v5.12-rc8+tip        w/ this patch          w/ this patch, without the line above
64       2.862(+/- 9%)        2.952(+/-11%)  -3%     2.807(+/- 7%)  +2%
128      3.334(+/-10%)        3.561(+/-13%)  -7%     3.181(+/- 6%)  +4%

> >                 this_rq->idle_stamp = 0;
> >
> >         rq_repin_lock(this_rq, rf);
> > --
> > 2.25.4
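
For context on the avg_idle metric at the center of this exchange:
newidle_balance() records rq->idle_stamp when the CPU starts going newly
idle, the wakeup path later folds the elapsed time into rq->avg_idle with
a divide-by-8 exponentially weighted average, and future newidle_balance()
calls consult avg_idle to decide whether a balance pass is worth its cost.
Below is a minimal standalone sketch of that bookkeeping. update_avg()
mirrors the helper in kernel/sched/core.c, but rq_model,
account_idle_period(), and newidle_balance_worthwhile() are illustrative
names invented for this sketch; it is a simplified model, not the actual
kernel implementation.

/*
 * Simplified model of the idle_stamp/avg_idle bookkeeping (a sketch
 * for illustration; the gating in the real kernel also considers
 * root-domain overload and per-domain balance costs).
 */
#include <stdbool.h>
#include <stdint.h>

struct rq_model {
        uint64_t avg_idle;               /* EWMA of recent idle durations */
        uint64_t idle_stamp;             /* nonzero while the CPU is going idle */
        uint64_t max_idle_balance_cost;  /* cost estimate of a balance pass */
};

/* default sysctl_sched_migration_cost: 0.5 ms, in nanoseconds */
static const uint64_t sched_migration_cost = 500000ULL;

/* divide-by-8 EWMA, as in kernel/sched/core.c's update_avg() */
static void update_avg(uint64_t *avg, uint64_t sample)
{
        int64_t diff = (int64_t)(sample - *avg);

        *avg += diff / 8;
}

/*
 * Wakeup path: fold the just-ended idle period into avg_idle. If
 * newidle_balance cleared idle_stamp (the line under debate), this
 * becomes a no-op and the time spent scanning domains never feeds
 * the average.
 */
static void account_idle_period(struct rq_model *rq, uint64_t now)
{
        if (rq->idle_stamp) {
                uint64_t delta = now - rq->idle_stamp;
                uint64_t max = 2 * rq->max_idle_balance_cost;

                update_avg(&rq->avg_idle, delta);
                if (rq->avg_idle > max)
                        rq->avg_idle = max;
                rq->idle_stamp = 0;
        }
}

/*
 * Gate applied before scanning domains: if avg_idle has been pulled
 * below the migration cost, the balance pass is skipped, which is how
 * a deflated avg_idle leads to "not doing enough work, and the CPU
 * actually going idle".
 */
static bool newidle_balance_worthwhile(const struct rq_model *rq)
{
        return rq->avg_idle >= sched_migration_cost;
}

Clearing idle_stamp before the pending wakeup arrives makes
account_idle_period() a no-op for that idle episode, which is exactly the
trade-off being measured here: Rik's patch avoids feeding an artificially
short sample into the average, while Vincent's numbers show that on large
machines the domain-scanning time is itself a significant part of the idle
period and dropping it skews the metric.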