From: bsegall@google.com
To: Yuyang Du
Cc: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
	pjt@google.com, morten.rasmussen@arm.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, lizefan@huawei.com, umgwanakikbuti@gmail.com
Subject: Re: [PATCH RESEND v2 6/6] sched/fair: Remove unconditionally inactive code
Date: Thu, 31 Mar 2016 10:53:02 -0700
References: <1459369015-28375-1-git-send-email-yuyang.du@intel.com>
	<1459369015-28375-7-git-send-email-yuyang.du@intel.com>
In-Reply-To: <1459369015-28375-7-git-send-email-yuyang.du@intel.com>
	(Yuyang Du's message of "Thu, 31 Mar 2016 04:16:55 +0800")

Yuyang Du writes:

> The increased load resolution (fixed point arithmetic range) is
> unconditionally deactivated with #if 0, so it is effectively broken.
>
> But the increased load range is still used somewhere (e.g., in Google),
> so we keep this feature. The reconciliation is we define
> CONFIG_CFS_INCREASE_LOAD_RANGE and it depends on FAIR_GROUP_SCHED and
> 64BIT and BROKEN.
>
> Suggested-by: Ingo Molnar
> Signed-off-by: Yuyang Du

The title of this patch, "Remove unconditionally inactive code", is
misleading, since what it actually does is put that code behind a
CONFIG option.

Also, as a side note: does anyone remember, or have a test for, whatever
got this turned off to begin with, given all the changes in load
tracking, the load balancer, and everything else since then?

> ---
>  init/Kconfig         | 16 +++++++++++++++
>  kernel/sched/sched.h | 55 +++++++++++++++++++++-------------------------
>  2 files changed, 38 insertions(+), 33 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 2232080..d072c09 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1025,6 +1025,22 @@ config CFS_BANDWIDTH
>  	  restriction.
>  	  See tip/Documentation/scheduler/sched-bwc.txt for more information.
>
> +config CFS_INCREASE_LOAD_RANGE
> +	bool "Increase kernel load range"
> +	depends on 64BIT && BROKEN
> +	default n
> +	help
> +	  Increase resolution of nice-level calculations for 64-bit architectures.
> +	  The extra resolution improves shares distribution and load balancing of
> +	  low-weight task groups (eg. nice +19 on an autogroup), deeper taskgroup
> +	  hierarchies, especially on larger systems. This is not a user-visible change
> +	  and does not change the user-interface for setting shares/weights.
> +	  We increase resolution only if we have enough bits to allow this increased
> +	  resolution (i.e. BITS_PER_LONG > 32). The costs for increasing resolution
> +	  when BITS_PER_LONG <= 32 are pretty high and the returns do not justify the
> +	  increased costs.
> +	  Currently broken: it increases power usage under light load.
> +
>  config RT_GROUP_SCHED
>  	bool "Group scheduling for SCHED_RR/FIFO"
>  	depends on CGROUP_SCHED
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index ebe16e3..1bb0d69 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -42,39 +42,6 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>  #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
>
>  /*
> - * Increase resolution of nice-level calculations for 64-bit architectures.
> - * The extra resolution improves shares distribution and load balancing of
> - * low-weight task groups (eg. nice +19 on an autogroup), deeper taskgroup
> - * hierarchies, especially on larger systems. This is not a user-visible change
> - * and does not change the user-interface for setting shares/weights.
> - *
> - * We increase resolution only if we have enough bits to allow this increased
> - * resolution (i.e. BITS_PER_LONG > 32). The costs for increasing resolution
> - * when BITS_PER_LONG <= 32 are pretty high and the returns do not justify the
> - * increased costs.
> - */
> -#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
> -# define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
> -# define user_to_kernel_load(w)	((w) << SCHED_FIXEDPOINT_SHIFT)
> -# define kernel_to_user_load(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)
> -#else
> -# define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
> -# define user_to_kernel_load(w)	(w)
> -# define kernel_to_user_load(w)	(w)
> -#endif
> -
> -/*
> - * Task weight (visible to user) and its load (invisible to user) have
> - * independent resolution, but they should be well calibrated. We use
> - * user_to_kernel_load() and kernel_to_user_load(w) to convert between
> - * them. The following must be true:
> - *
> - * user_to_kernel_load(sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
> - * kernel_to_user_load(NICE_0_LOAD) == sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]
> - */
> -#define NICE_0_LOAD		(1L << NICE_0_LOAD_SHIFT)
> -
> -/*
>   * Single value that decides SCHED_DEADLINE internal math precision.
>   * 10 -> just above 1us
>   * 9  -> just above 0.5us
> @@ -1150,6 +1117,28 @@ extern const int sched_prio_to_weight[40];
>  extern const u32 sched_prio_to_wmult[40];
>
>  /*
> + * Task weight (visible to user) and its load (invisible to user) have
> + * independent ranges, but they should be well calibrated. We use
> + * user_to_kernel_load() and kernel_to_user_load(w) to convert between
> + * them.
> + *
> + * The following must also be true:
> + * user_to_kernel_load(sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
> + * kernel_to_user_load(NICE_0_LOAD) == sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]
> + */
> +#ifdef CONFIG_CFS_INCREASE_LOAD_RANGE
> +#define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
> +#define user_to_kernel_load(w)	(w << SCHED_FIXEDPOINT_SHIFT)
> +#define kernel_to_user_load(w)	(w >> SCHED_FIXEDPOINT_SHIFT)
> +#else
> +#define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
> +#define user_to_kernel_load(w)	(w)
> +#define kernel_to_user_load(w)	(w)
> +#endif
> +
> +#define NICE_0_LOAD		(1UL << NICE_0_LOAD_SHIFT)
> +
> +/*
>   * {de,en}queue flags:
>   *
>   * DEQUEUE_SLEEP - task is no longer runnable
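For anyone reading along, here is a quick userspace sketch (my own
illustration, not kernel code) of what that calibration comment
requires and of why the extra bits matter.  The weight values 1024
(nice 0) and 15 (nice +19) are copied from sched_prio_to_weight[], and
SCHED_FIXEDPOINT_SHIFT is 10, as in the kernel:

/* Standalone illustration of the CONFIG_CFS_INCREASE_LOAD_RANGE=y case. */
#include <assert.h>
#include <stdio.h>

#define SCHED_FIXEDPOINT_SHIFT	10
#define NICE_0_WEIGHT		1024UL	/* sched_prio_to_weight[] entry for nice 0 */

#define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
#define NICE_0_LOAD		(1UL << NICE_0_LOAD_SHIFT)
#define user_to_kernel_load(w)	((unsigned long)(w) << SCHED_FIXEDPOINT_SHIFT)
#define kernel_to_user_load(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)

int main(void)
{
	/* The two calibration conditions from the comment above: */
	assert(user_to_kernel_load(NICE_0_WEIGHT) == NICE_0_LOAD);
	assert(kernel_to_user_load(NICE_0_LOAD) == NICE_0_WEIGHT);

	/*
	 * A nice +19 task has user weight 15.  Split across 64 buckets
	 * (think per-cpu shares), user resolution truncates to 0 while
	 * kernel resolution keeps 240 counts per bucket.
	 */
	printf("weight 15 / 64: user res = %lu, kernel res = %lu\n",
	       15UL / 64, user_to_kernel_load(15) / 64);
	return 0;
}

With the option off, both conversions are the identity and the 15/64
split truncates to zero, which is exactly the low-weight-group case the
Kconfig help text describes.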