From: Vincent Guittot
Date: Wed, 2 Oct 2019 10:23:16 +0200
Subject: Re: [PATCH v3 04/10] sched/fair: rework load_balance
To: Dietmar Eggemann
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Phil Auld, Valentin Schneider, Srikar Dronamraju, Quentin Perret, Morten Rasmussen, Hillf Danton

On Tue, 1 Oct 2019 at 18:53, Dietmar Eggemann wrote:
>
> On 01/10/2019 10:14, Vincent Guittot wrote:
> > On Mon, 30 Sep 2019 at 18:24, Dietmar Eggemann wrote:
> >>
> >> Hi Vincent,
> >>
> >> On 19/09/2019 09:33, Vincent Guittot wrote:
> [...]
>
> >>> +		if (busiest->group_weight == 1 || sds->prefer_sibling) {
> >>> +			/*
> >>> +			 * When prefer sibling, evenly spread running tasks on
> >>> +			 * groups.
> >>> +			 */
> >>> +			env->balance_type = migrate_task;
> >>> +			env->imbalance = (busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1;
> >>> +			return;
> >>> +		}
> >>> +
> >>> +		/*
> >>> +		 * If there is no overload, we just want to even the number of
> >>> +		 * idle cpus.
> >>> +		 */
> >>> +		env->balance_type = migrate_task;
> >>> +		env->imbalance = max_t(long, 0, (local->idle_cpus - busiest->idle_cpus) >> 1);
> >>
> >> Why do we need a max_t(long, 0, ...) here and not for the 'if
> >> (busiest->group_weight == 1 || sds->prefer_sibling)' case?
> >
> > For env->imbalance = (busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1;
> >
> > either we have sds->prefer_sibling && busiest->sum_nr_running >
> > local->sum_nr_running + 1
>
> I see, this corresponds to
>
>	/* Try to move all excess tasks to child's sibling domain */
>	if (sds.prefer_sibling && local->group_type == group_has_spare &&
>	    busiest->sum_h_nr_running > local->sum_h_nr_running + 1)
>		goto force_balance;
>
> in find_busiest_group, I assume.

Yes. But it seems that I missed a case:

prefer_sibling is set
busiest->sum_h_nr_running <= local->sum_h_nr_running + 1, so we skip
goto force_balance above.
But env->idle != CPU_NOT_IDLE and local->idle_cpus >
(busiest->idle_cpus + 1), so we also skip goto out_balanced and finally
call calculate_imbalance().

In calculate_imbalance() with prefer_sibling set, imbalance =
(busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1;

so we probably want something similar to max_t(long, 0,
(busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1)

>
> Haven't been able to recreate this yet on my arm64 platform since there
> is no prefer_sibling and in case local and busiest have
> group_type=group_has_spare they bailout in
>
>	if (busiest->group_type != group_overloaded &&
>	    (env->idle == CPU_NOT_IDLE ||
>	     local->idle_cpus <= (busiest->idle_cpus + 1)))
>		goto out_balanced;
>
>
> > [...]
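
The missed prefer_sibling case described in the reply above can be illustrated with a small standalone sketch. It is not kernel code: struct group_stats, imbalance_unclamped(), imbalance_clamped() and clamp_to_zero() are hypothetical names, and the counter is declared as a signed long purely for illustration (the kernel's field types may differ). It only shows why clamping at zero, in the spirit of the max_t(long, 0, ...) suggested above, matters when busiest has fewer running tasks than local.

```c
/*
 * Illustrative model only; names and types are assumptions, not the
 * kernel's definitions.
 */
struct group_stats {
	long sum_h_nr_running;	/* signed here for illustration only */
};

/* Plays the role of max_t(long, 0, x) from the mail. */
long clamp_to_zero(long x)
{
	return x > 0 ? x : 0;
}

/*
 * Unclamped form quoted from the patch. Right-shifting a negative signed
 * value is implementation-defined in C; GCC and Clang shift arithmetically,
 * so the result stays negative there.
 */
long imbalance_unclamped(const struct group_stats *busiest,
			 const struct group_stats *local)
{
	return (busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1;
}

/* Clamped form along the lines proposed in the reply. */
long imbalance_clamped(const struct group_stats *busiest,
		       const struct group_stats *local)
{
	return clamp_to_zero(imbalance_unclamped(busiest, local));
}
```

With busiest running 1 task and local running 3, the unclamped expression goes negative in this model while the clamped one yields 0; when busiest clearly exceeds local, both forms agree.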
>
> >>> -	if (busiest->group_type == group_overloaded &&
> >>> -	    local->group_type == group_overloaded) {
> >>> -		load_above_capacity = busiest->sum_h_nr_running * SCHED_CAPACITY_SCALE;
> >>> -		if (load_above_capacity > busiest->group_capacity) {
> >>> -			load_above_capacity -= busiest->group_capacity;
> >>> -			load_above_capacity *= scale_load_down(NICE_0_LOAD);
> >>> -			load_above_capacity /= busiest->group_capacity;
> >>> -		} else
> >>> -			load_above_capacity = ~0UL;
> >>> +	if (local->group_type < group_overloaded) {
> >>> +		/*
> >>> +		 * Local will become overloaded so the avg_load metrics are
> >>> +		 * finally needed.
> >>> +		 */
> >>
> >> How does this relate to the decision_matrix[local, busiest] (dm[])? E.g.
> >> dm[overload, overload] == avg_load or dm[fully_busy, overload] == force.
> >> It would be nice to be able to match all allowed fields of dm to code sections.
> >
> > decision_matrix describes how it decides between balanced or unbalanced.
> > In case of dm[overload, overload], we use the avg_load to decide if it
> > is balanced or not
>
> OK, that's why you calculate sgs->avg_load in update_sg_lb_stats() only
> for 'sgs->group_type == group_overloaded'.
>
> > In case of dm[fully_busy, overload], the groups are unbalanced because
> > fully_busy < overload and we force the balance. Then
> > calculate_imbalance() uses the avg_load to decide how much will be
> > moved
>
> And in this case 'local->group_type < group_overloaded' in
> calculate_imbalance(), 'local->avg_load' and 'sds->avg_load' have to be
> calculated before using them in env->imbalance = min(...).
>
> OK, got it now.
>
> > dm[overload, overload]=force means that we force the balance and we
> > will compute later the imbalance. avg_load may be used to calculate
> > the imbalance
> > dm[overload, overload]=avg_load means that we compare the avg_load to
> > decide whether we need to balance load between groups
> > dm[overload, overload]=nr_idle means that we compare the number of
> > idle cpus to decide whether we need to balance. In fact this is no
> > more true with patch 7 because we also take into account the number of
> > nr_h_running when weight =1
>
> This becomes clearer now ... slowly.
>
> [...]
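
For readers following the dm[] discussion quoted above, the two entries that the thread spells out can be sketched as a tiny lookup. This is only a reading aid under stated assumptions: the enum values, their ordering, and the names lb_decision, decide() and needs_avg_load_for_imbalance() are hypothetical, and only dm[overload, overload] == avg_load and dm[fully_busy, overload] == force come from the mail. Treating every local type below overloaded as "force" when busiest is overloaded is an extrapolation from the "fully_busy < overload" argument, and everything else is reported as not covered rather than guessed.

```c
/*
 * Subset of group types named in this thread. The ordering (smaller
 * value means less loaded) is an assumption of this sketch.
 */
enum group_type {
	group_has_spare,
	group_fully_busy,
	group_overloaded
};

/* Hypothetical decision labels mirroring the words used in the mail. */
enum lb_decision {
	LB_FORCE,	/* unbalanced: balance, compute the imbalance later */
	LB_AVG_LOAD,	/* compare avg_load to decide whether to balance */
	LB_NOT_COVERED	/* combination not discussed in this thread */
};

/* dm[local, busiest] for the entries spelled out in the thread. */
enum lb_decision decide(enum group_type local, enum group_type busiest)
{
	if (busiest == group_overloaded && local == group_overloaded)
		return LB_AVG_LOAD;
	if (busiest == group_overloaded && local < busiest)
		return LB_FORCE;	/* e.g. fully_busy < overload */
	return LB_NOT_COVERED;
}

/*
 * Mirrors the quoted 'local->group_type < group_overloaded' test: in that
 * case the avg_load values still have to be computed before they are used
 * to size the imbalance.
 */
int needs_avg_load_for_imbalance(enum group_type local)
{
	return local < group_overloaded;
}
```

The lookup deliberately leaves the nr_idle entries out, since the thread does not say which [local, busiest] pairs they apply to.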