Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp2139203pxb; Mon, 22 Feb 2021 22:25:18 -0800 (PST) X-Google-Smtp-Source: ABdhPJwxF0s1TMxCmjbOtQhdGpbyeo/HXdQ/ozwst2r1srCRtlUyoHswJW145u7Jmqu72w5JMHGJ X-Received: by 2002:a05:6402:893:: with SMTP id e19mr16742411edy.206.1614061517728; Mon, 22 Feb 2021 22:25:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614061517; cv=none; d=google.com; s=arc-20160816; b=MJ40FayvXL6oXlg0dF6ADNGsF1dilDsVzbKGc9dIPYpp0+WO9coGa77yTF2U+B3pl0 0m5WaQ9CyqDgWgDU/vI7ivs2kNn+qzTvGRRexddyYJ9tlJtJZ3Mmm6O8jLJW6M+Z7/or 12le7syMprdfNfYKnnybY0cjQzyv0KCVyvw17aEOccyeS98H0nQXPmmDav2cHi0Cwwjd /z7IAx1zEV0udPzicD4VcdALI9cb7nDrOGem/dDCgiNGDMriP87D3PDhiPTh+ysAPQrS xcHNIIbyfBpS+a4OWx3UEi7/KyTUdTKbU62WjjiHdh3ZkHImQRrUf+xGlWl3GTJrYSj5 wZjg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:ironport-sdr:ironport-sdr; bh=Gb+YcygLeZnBZd/PhQdiWCgfFqMT+9tQqPwbBGWJ+P8=; b=sV1xBhS6ZszqM8+sXX6rgCypk4H6ZRJTByge8pHrPSaP7yUWZ3es/awH3ACj7+ddAz LRchTJtu07B8WCbmFZheB9NTw+eXRX+reXO0OAgPAgcb3EGHTKtKdWH34AoD8bju/xy+ IEXJoDTWWbGOYHim96mMKecuKQdM+pR8Pu+8T47SMtLD6BdLxjYw7zriX6EhYyhJvnPV z40BcPacYyNlsZ2/qYKa5dD26dC1CRvjGwvVAKAA7viRol0NE5/VieIF14Z4hQBHvL/8 FORNaCN4vL83goU+UnHZQmSHTioAobCL+Zs+SRv2bOoMPsHE6J1txujHchyMqUcIKRCh jWQQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h1si2478346ejb.448.2021.02.22.22.24.50; Mon, 22 Feb 2021 22:25:17 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231134AbhBWFnm (ORCPT + 99 others); Tue, 23 Feb 2021 00:43:42 -0500 Received: from mga18.intel.com ([134.134.136.126]:36790 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230060AbhBWFnl (ORCPT ); Tue, 23 Feb 2021 00:43:41 -0500 IronPort-SDR: g+oyo/3WWySuM29wub5QOF//C+Z8dmq5tHfH+1XCNqHG1qiBqolJTFIPWbbzmwg8HbK9OjOj/n EhKCnOU+7gJA== X-IronPort-AV: E=McAfee;i="6000,8403,9903"; a="172352142" X-IronPort-AV: E=Sophos;i="5.81,199,1610438400"; d="scan'208";a="172352142" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Feb 2021 21:41:54 -0800 IronPort-SDR: eKRRZd1CTz9ylWia3EWK3WKqnmUlQrJ1jFvLEwSNJZprSV/G5xVkm/i8lYCxOdzl41N+ydahNY ITM/xOfFEEBA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,199,1610438400"; d="scan'208";a="432536197" Received: from cli6-desk1.ccr.corp.intel.com (HELO [10.239.161.125]) ([10.239.161.125]) by fmsmga002.fm.intel.com with ESMTP; 22 Feb 2021 21:41:50 -0800 Subject: Re: [RFC PATCH v1] sched/fair: limit load balance redo times at the same sched_domain level To: Vincent Guittot Cc: Aubrey Li , Ingo Molnar , Peter Zijlstra , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , linux-kernel , Andi Kleen , Tim Chen , Srinivas Pandruvada , "Rafael J . Wysocki" References: <1611554578-6464-1-git-send-email-aubrey.li@intel.com> From: "Li, Aubrey" Message-ID: Date: Tue, 23 Feb 2021 13:41:49 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Vincent, Sorry for the delay, I just returned from Chinese New Year holiday. On 2021/1/25 22:51, Vincent Guittot wrote: > On Mon, 25 Jan 2021 at 15:00, Li, Aubrey wrote: >> >> On 2021/1/25 18:56, Vincent Guittot wrote: >>> On Mon, 25 Jan 2021 at 06:50, Aubrey Li wrote: >>>> >>>> A long-tail load balance cost is observed on the newly idle path, >>>> this is caused by a race window between the first nr_running check >>>> of the busiest runqueue and its nr_running recheck in detach_tasks. >>>> >>>> Before the busiest runqueue is locked, the tasks on the busiest >>>> runqueue could be pulled by other CPUs and nr_running of the busiest >>>> runqueu becomes 1, this causes detach_tasks breaks with LBF_ALL_PINNED >>> >>> We should better detect that when trying to detach task like below >> >> This should be a compromise from my understanding. If we give up load balance >> this time due to the race condition, we do reduce the load balance cost on the >> newly idle path, but if there is an imbalance indeed at the same sched_domain > > Redo path is there in case, LB has found an imbalance but it can't > move some loads from this busiest rq to dest rq because of some cpu > affinity. So it tries to fix the imbalance by moving load onto another > rq of the group. In your case, the imbalance has disappeared because > it has already been pulled by another rq so you don't have to try to > find another imbalance. And I would even say you should not in order > to let other level to take a chance to spread the load > >> level, we have to wait the next softirq entry to handle that imbalance. This >> means the tasks on the second busiest runqueue have to stay longer, which could >> introduce tail latency as well. That's why I introduced a variable to control >> the redo loops. I'll send this to the benchmark queue to see if it makes any > > TBH, I don't like multiplying the number of knobs Sure, I can take your approach, :) >>> >>> --- a/kernel/sched/fair.c >>> +++ b/kernel/sched/fair.c >>> @@ -7688,6 +7688,16 @@ static int detach_tasks(struct lb_env *env) >>> >>> lockdep_assert_held(&env->src_rq->lock); >>> >>> + /* >>> + * Another CPU has emptied this runqueue in the meantime. >>> + * Just return and leave the load_balance properly. >>> + */ >>> + if (env->src_rq->nr_running <= 1 && !env->loop) { May I know why !env->loop is needed here? IIUC, if detach_tasks is invoked from LBF_NEED_BREAK, env->loop could be non-zero, but as long as src_rq's nr_running <=1, we should return immediately with LBF_ALL_PINNED flag cleared. How about the following change? diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 04a3ce20da67..1761d33accaa 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7683,8 +7683,11 @@ static int detach_tasks(struct lb_env *env) * We don't want to steal all, otherwise we may be treated likewise, * which could at worst lead to a livelock crash. */ - if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) + if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) { + /* Clear the flag as we will not test any task */ + env->flag &= ~LBF_ALL_PINNED; break; + } p = list_last_entry(tasks, struct task_struct, se.group_node); Thanks, -Aubrey