From: Vincent Guittot
Date: Tue, 23 Feb 2021 18:33:26 +0100
Subject: Re: [RFC PATCH v1] sched/fair: limit load balance redo times at the same sched_domain level
To: "Li, Aubrey"
Cc: Aubrey Li, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
    linux-kernel, Andi Kleen, Tim Chen, Srinivas Pandruvada,
    "Rafael J. Wysocki"
References: <1611554578-6464-1-git-send-email-aubrey.li@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 23 Feb 2021 at 06:41, Li, Aubrey wrote:
>
> Hi Vincent,
>
> Sorry for the delay, I just returned from the Chinese New Year holiday.
>
> On 2021/1/25 22:51, Vincent Guittot wrote:
> > On Mon, 25 Jan 2021 at 15:00, Li, Aubrey wrote:
> >>
> >> On 2021/1/25 18:56, Vincent Guittot wrote:
> >>> On Mon, 25 Jan 2021 at 06:50, Aubrey Li wrote:
> >>>>
> >>>> A long-tail load balance cost is observed on the newly idle path.
> >>>> It is caused by a race window between the first nr_running check
> >>>> of the busiest runqueue and the nr_running recheck in detach_tasks.
> >>>>
> >>>> Before the busiest runqueue is locked, its tasks can be pulled by
> >>>> other CPUs and its nr_running can drop to 1, which makes
> >>>> detach_tasks break out with LBF_ALL_PINNED set.
> >>>
> >>> We should better detect that when trying to detach a task, like below.
> >>
> >> This should be a compromise, from my understanding. If we give up the load
> >> balance this time due to the race condition, we do reduce the load balance
> >> cost on the newly idle path, but if there is indeed an imbalance at the same
> >> sched_domain
> >
> > The redo path is there in case LB has found an imbalance but can't move
> > some load from the busiest rq to the dest rq because of cpu affinity.
> > So it tries to fix the imbalance by moving load onto another rq of the
> > group. In your case, the imbalance has disappeared because it has
> > already been pulled by another rq, so you don't have to try to find
> > another imbalance. And I would even say you should not, in order to let
> > other levels take a chance to spread the load.
> >
> >> level, we have to wait for the next softirq entry to handle that imbalance.
> >> This means the tasks on the second busiest runqueue have to stay longer,
> >> which could introduce tail latency as well. That's why I introduced a
> >> variable to control the redo loops. I'll send this to the benchmark queue
> >> to see if it makes any
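To make the redo cost above concrete: when detach_tasks() comes back with
LBF_ALL_PINNED still (spuriously) set, load_balance() drops the busiest CPU
from the candidate mask and retries at the same sched_domain level. Roughly,
abridged from kernel/sched/fair.c from memory, so the exact code may differ
between versions:

        /* All tasks on this runqueue were pinned by CPU affinity */
        if (unlikely(env.flags & LBF_ALL_PINNED)) {
                /* Stop considering this rq as a source of load... */
                __cpumask_clear_cpu(cpu_of(busiest), cpus);
                /*
                 * ...and retry at the same sched_domain level as long as
                 * other candidate CPUs remain in the group.
                 */
                if (!cpumask_subset(cpus, env.dst_grpmask)) {
                        env.loop = 0;
                        env.loop_break = sched_nr_migrate_break;
                        goto redo;
                }
                goto out_all_pinned;
        }

A spuriously set LBF_ALL_PINNED therefore costs an extra
find_busiest_group()/find_busiest_queue() pass on the newly idle path, which
is the long-tail cost being discussed.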
> > TBH, I don't like multiplying the number of knobs.
>
> Sure, I can take your approach. :)
>
> >>>
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -7688,6 +7688,16 @@ static int detach_tasks(struct lb_env *env)
> >>>
> >>>         lockdep_assert_held(&env->src_rq->lock);
> >>>
> >>> +       /*
> >>> +        * Another CPU has emptied this runqueue in the meantime.
> >>> +        * Just return and leave the load_balance properly.
> >>> +        */
> >>> +       if (env->src_rq->nr_running <= 1 && !env->loop) {
>
> May I know why !env->loop is needed here? IIUC, if detach_tasks is invoked

IIRC, my point was to do the test only when trying to detach the 1st task.
A lot of things can happen when a break is involved, but TBH I can't
remember a precise use case. It may be overcautious.

> from LBF_NEED_BREAK, env->loop could be non-zero, but as long as src_rq's
> nr_running <= 1, we should return immediately with the LBF_ALL_PINNED flag
> cleared.
>
> How about the following change?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04a3ce20da67..1761d33accaa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7683,8 +7683,11 @@ static int detach_tasks(struct lb_env *env)
>          * We don't want to steal all, otherwise we may be treated likewise,
>          * which could at worst lead to a livelock crash.
>          */
> -       if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
> +       if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) {

IMO, we must do the test before:

        while (!list_empty(tasks)) {

because src_rq might have become empty if waiting tasks have been pulled by
another cpu and the running one became idle in the meantime (a combined
sketch is at the end of this mail).

> +               /* Clear the flag as we will not test any task */
> +               env->flags &= ~LBF_ALL_PINNED;
>                 break;
> +       }
>
>                 p = list_last_entry(tasks, struct task_struct, se.group_node);
>
> Thanks,
> -Aubrey
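For completeness, here is a sketch of where this seems to be heading: do the
nr_running check before the while (!list_empty(tasks)) loop and clear
LBF_ALL_PINNED there. This only illustrates the combined suggestion; the
actual follow-up patch may look different:

        static int detach_tasks(struct lb_env *env)
        {
                struct list_head *tasks = &env->src_rq->cfs_tasks;
                int detached = 0;

                lockdep_assert_held(&env->src_rq->lock);

                /*
                 * The source runqueue has been emptied by another CPU in the
                 * meantime: no task was actually tested against CPU affinity,
                 * so clear LBF_ALL_PINNED and bail out before the loop.
                 */
                if (env->src_rq->nr_running <= 1) {
                        env->flags &= ~LBF_ALL_PINNED;
                        return 0;
                }

                while (!list_empty(tasks)) {
                        /* ... existing detach loop, unchanged ... */
                }

                return detached;
        }

With the check hoisted above the loop, the newly idle path returns
immediately when the race is lost, and load_balance() no longer sees a stale
LBF_ALL_PINNED, so it does not redo the balance at this sched_domain level.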