Subject: Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
To: Qais Yousef, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti, Dietmar Eggemann
Cc: Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman, linux-kernel@vger.kernel.org
References: <20200214163949.27850-1-qais.yousef@arm.com> <20200214163949.27850-2-qais.yousef@arm.com>
From: Valentin Schneider
Message-ID: <3feb31bb-3412-b38c-07a3-136433c87e66@arm.com>
Date: Mon, 17 Feb 2020 17:07:43 +0000
In-Reply-To: <20200214163949.27850-2-qais.yousef@arm.com>

On 2/14/20 4:39 PM, Qais Yousef wrote:
> When searching for the best lowest_mask with a fitness_fn passed, make
> sure we record the lowest_level that returns a valid lowest_mask so that
> we can use it as a fallback in case we fail to find a fitting CPU at
> any level.
>
> The intention in the original patch was not to allow a down migration
> to an unfitting CPU. But this missed the case where we are already
> running on an unfitting one.
>
> With this change, RT tasks can still move between unfitting CPUs when
> they're already running on such a CPU.
>
> And as Steve suggested: to adhere to the strict priority rules of RT,
> if a task is already running on a fitting CPU but can't run on it due
> to priority, allow it to down-migrate to an unfitting CPU so it can run.
>
> Reported-by: Pavan Kondeti
> Signed-off-by: Qais Yousef
> ---
>  kernel/sched/cpupri.c | 157 +++++++++++++++++++++++++++---------------
>  1 file changed, 101 insertions(+), 56 deletions(-)
>
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index 1a2719e1350a..1bcfa1995550 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -41,6 +41,59 @@ static int convert_prio(int prio)
>  	return cpupri;
>  }
>
> +static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
> +				struct cpumask *lowest_mask, int idx)
> +{
> +	struct cpupri_vec *vec = &cp->pri_to_cpu[idx];
> +	int skip = 0;
> +
> +	if (!atomic_read(&(vec)->count))
> +		skip = 1;
> +
> +	/*
> +	 * When looking at the vector, we need to read the counter,
> +	 * do a memory barrier, then read the mask.
> +	 *
> +	 * Note: This is still all racy, but we can deal with it.
> +	 * Ideally, we only want to look at masks that are set.
> +	 *
> +	 * If a mask is not set, then the only thing wrong is that we
> +	 * did a little more work than necessary.
> +	 *
> +	 * If we read a zero count but the mask is set, because of the
> +	 * memory barriers, that can only happen when the highest prio
> +	 * task for a run queue has left the run queue, in which case,
> +	 * it will be followed by a pull. If the task we are processing
> +	 * fails to find a proper place to go, that pull request will
> +	 * pull this task if the run queue is running at a lower
> +	 * priority.
> +	 */
> +	smp_rmb();
> +
> +	/* Need to do the rmb for every iteration */
> +	if (skip)
> +		return 0;
> +
> +	if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
> +		return 0;
> +
> +	if (lowest_mask) {
> +		cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
> +
> +		/*
> +		 * We have to ensure that we have at least one bit
> +		 * still set in the array, since the map could have
> +		 * been concurrently emptied between the first and
> +		 * second reads of vec->mask. If we hit this
> +		 * condition, simply act as though we never hit this
> +		 * priority level and continue on.
> +		 */
> +		if (cpumask_empty(lowest_mask))
> +			return 0;
> +	}
> +
> +	return 1;
> +}
> +

Just a drive-by comment; could you split that code move into its own
patch? It'd make the history a bit easier to read IMO.
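
FWIW, to check my understanding of the fallback the changelog describes,
the caller side would end up looking roughly like the sketch below. This
is illustrative only -- cpupri_find_sketch, its exact control flow and
the comments are mine, not a hunk from this patch:

int cpupri_find_sketch(struct cpupri *cp, struct task_struct *p,
		       struct cpumask *lowest_mask,
		       bool (*fitness_fn)(struct task_struct *p, int cpu))
{
	int best_unfit_idx = -1;
	int task_pri = convert_prio(p->prio);
	int idx, cpu;

	for (idx = 0; idx < task_pri; idx++) {
		if (!__cpupri_find(cp, p, lowest_mask, idx))
			continue;

		if (!lowest_mask || !fitness_fn)
			return 1;

		/* Remember the lowest level that had a valid lowest_mask. */
		if (best_unfit_idx == -1)
			best_unfit_idx = idx;

		/* Drop the CPUs the task doesn't fit on... */
		for_each_cpu(cpu, lowest_mask) {
			if (!fitness_fn(p, cpu))
				cpumask_clear_cpu(cpu, lowest_mask);
		}

		/* ...and stop here if any fitting CPU survived. */
		if (!cpumask_empty(lowest_mask))
			return 1;
	}

	/*
	 * No level had a fitting CPU: fall back to the lowest level that
	 * had a valid mask at all, so the task can at least run somewhere.
	 */
	if (best_unfit_idx != -1)
		return __cpupri_find(cp, p, lowest_mask, best_unfit_idx);

	return 0;
}

If that matches the intent, it might be worth spelling out in the
changelog that the fallback only kicks in after *every* level has been
scanned, since that's what keeps the strict-priority behaviour Steve
asked for intact.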