2021-04-15 14:44:06

by Johannes Weiner

Subject: Re: [PATCH] sched,psi: fix the 'int' underflow for psi

On Thu, Apr 15, 2021 at 07:59:41PM +0530, Charan Teja Reddy wrote:
> psi_group_cpu->tasks, represented as unsigned int, stores the number
> of tasks that could be stalled on a psi resource (io/mem/cpu).
> Decrementing one of these counters at zero makes it wrap around, which
> in turn sets the respective pressure state in
> psi_group_cpu->state_mask. This can result in unnecessary time
> sampling for that pressure state and thus cause spurious psi events,
> which can further lead to wrong actions being taken in userland based
> on those events.
> psi_bug is set under these conditions, but that is only for debugging
> purposes. Fix it by decrementing the ->tasks count only when it is
> non-zero.

Makes sense, it's more graceful in the event of a bug.

But what motivates this change? Is it something you hit recently with
an upstream kernel and we should investigate?

> Signed-off-by: Charan Teja Reddy <[email protected]>
> ---
> kernel/sched/psi.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 967732c..f925468 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -718,7 +718,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
>  					groupc->tasks[3], clear, set);
>  			psi_bug = 1;
>  		}
> -		groupc->tasks[t]--;
> +		if (groupc->tasks[t])
> +			groupc->tasks[t]--;

There is already a branch on the tasks to signal the bug. How about:

	if (groupc->tasks[t]) {
		groupc->tasks[t]--;
	} else if (!psi_bug) {
		printk_deferred(...
		psi_bug = 1;
	}
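
To illustrate the difference outside the kernel, here is a minimal
userspace sketch. It is not the psi code: struct counters, dec_unguarded(),
dec_guarded() and bug_reported are hypothetical stand-ins, and fprintf()
merely takes the place of the deferred printk. It shows the wrap to
UINT_MAX on an unguarded decrement at zero, and the single-branch variant
that leaves the counter at zero and warns once.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU task counters; not the kernel struct. */
enum { NR_COUNTS = 4 };

struct counters {
	unsigned int tasks[NR_COUNTS];
};

static bool bug_reported;	/* plays the role of psi_bug in this sketch */

/* Unguarded decrement: at zero, unsigned arithmetic wraps to UINT_MAX. */
static unsigned int dec_unguarded(struct counters *c, int t)
{
	return --c->tasks[t];
}

/* Guarded decrement: leave a zero counter alone and warn only once. */
static unsigned int dec_guarded(struct counters *c, int t)
{
	if (c->tasks[t]) {
		c->tasks[t]--;
	} else if (!bug_reported) {
		fprintf(stderr, "counter %d underflow\n", t);
		bug_reported = true;
	}
	return c->tasks[t];
}
```

With the unguarded variant, any later "is this counter non-zero" test sees
a huge value and reports a stall that does not exist. The guarded variant
keeps the counter at zero and still records that something went wrong.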