From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Mark Simmons, Juri Lelli, "Peter Zijlstra (Intel)"
Subject: [PATCH 5.10 124/135] sched/rt: Fix double enqueue caused by rt_effective_prio
Date: Tue, 10 Aug 2021 19:30:58 +0200
Message-Id: <20210810173000.019310392@linuxfoundation.org>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210810172955.660225700@linuxfoundation.org>
References: <20210810172955.660225700@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Peter Zijlstra

commit f558c2b834ec27e75d37b1c860c139e7b7c3a8e4 upstream.

Double enqueues in rt runqueues (list) have been reported while running
a simple test that spawns a number of threads doing a short sleep/run
pattern while being concurrently setscheduled between rt and fair class.

  WARNING: CPU: 3 PID: 2825 at kernel/sched/rt.c:1294 enqueue_task_rt+0x355/0x360
  CPU: 3 PID: 2825 Comm: setsched__13
  RIP: 0010:enqueue_task_rt+0x355/0x360
  Call Trace:
   __sched_setscheduler+0x581/0x9d0
   _sched_setscheduler+0x63/0xa0
   do_sched_setscheduler+0xa0/0x150
   __x64_sys_sched_setscheduler+0x1a/0x30
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  list_add double add: new=ffff9867cb629b40, prev=ffff9867cb629b40, next=ffff98679fc67ca0.
  kernel BUG at lib/list_debug.c:31!
  invalid opcode: 0000 [#1] PREEMPT_RT SMP PTI
  CPU: 3 PID: 2825 Comm: setsched__13
  RIP: 0010:__list_add_valid+0x41/0x50
  Call Trace:
   enqueue_task_rt+0x291/0x360
   __sched_setscheduler+0x581/0x9d0
   _sched_setscheduler+0x63/0xa0
   do_sched_setscheduler+0xa0/0x150
   __x64_sys_sched_setscheduler+0x1a/0x30
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xae

__sched_setscheduler() uses rt_effective_prio() to handle proper queuing
of priority boosted tasks that are setscheduled while being boosted.
rt_effective_prio() is however called twice per __sched_setscheduler()
call: first directly by __sched_setscheduler() before dequeuing the task
and then by __setscheduler() to actually do the priority change. If the
priority of the pi_top_task is concurrently being changed however, it
might happen that the two calls return different results. If, for
example, the first call returned the same rt priority the task was
running at and the second one a fair priority, the task won't be removed
from the rt list (on_list still set) and is then enqueued in the fair
runqueue.
When eventually setscheduled back to rt it will be seen as enqueued
already and the WARNING/BUG will be issued.

Fix this by calling rt_effective_prio() only once and then reusing the
return value. While at it refactor code as well for clarity. Concurrent
priority inheritance handling is still safe and will eventually converge
to a new state by following the inheritance chain(s).

Fixes: 0782e63bc6fe ("sched: Handle priority boosted tasks proper in setscheduler()")
[squashed Peterz changes; added changelog]
Reported-by: Mark Simmons
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20210803104501.38333-1-juri.lelli@redhat.com
Signed-off-by: Greg Kroah-Hartman
---
 kernel/sched/core.c | 90 ++++++++++++++++++++--------------------------------
 1 file changed, 35 insertions(+), 55 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1598,12 +1598,18 @@ void deactivate_task(struct rq *rq, stru
 	dequeue_task(rq, p, flags);
 }
 
-/*
- * __normal_prio - return the priority that is based on the static prio
- */
-static inline int __normal_prio(struct task_struct *p)
+static inline int __normal_prio(int policy, int rt_prio, int nice)
 {
-	return p->static_prio;
+	int prio;
+
+	if (dl_policy(policy))
+		prio = MAX_DL_PRIO - 1;
+	else if (rt_policy(policy))
+		prio = MAX_RT_PRIO - 1 - rt_prio;
+	else
+		prio = NICE_TO_PRIO(nice);
+
+	return prio;
 }
 
 /*
@@ -1615,15 +1621,7 @@ static inline int __normal_prio(struct t
  */
 static inline int normal_prio(struct task_struct *p)
 {
-	int prio;
-
-	if (task_has_dl_policy(p))
-		prio = MAX_DL_PRIO-1;
-	else if (task_has_rt_policy(p))
-		prio = MAX_RT_PRIO-1 - p->rt_priority;
-	else
-		prio = __normal_prio(p);
-	return prio;
+	return __normal_prio(p->policy, p->rt_priority, PRIO_TO_NICE(p->static_prio));
 }
 
 /*
@@ -3248,7 +3246,7 @@ int sched_fork(unsigned long clone_flags
 		} else if (PRIO_TO_NICE(p->static_prio) < 0)
 			p->static_prio = NICE_TO_PRIO(0);
 
-		p->prio = p->normal_prio = __normal_prio(p);
+		p->prio = p->normal_prio = p->static_prio;
 		set_load_weight(p, false);
 
 		/*
@@ -4799,6 +4797,18 @@ int default_wake_function(wait_queue_ent
 }
 EXPORT_SYMBOL(default_wake_function);
 
+static void __setscheduler_prio(struct task_struct *p, int prio)
+{
+	if (dl_prio(prio))
+		p->sched_class = &dl_sched_class;
+	else if (rt_prio(prio))
+		p->sched_class = &rt_sched_class;
+	else
+		p->sched_class = &fair_sched_class;
+
+	p->prio = prio;
+}
+
 #ifdef CONFIG_RT_MUTEXES
 
 static inline int __rt_effective_prio(struct task_struct *pi_task, int prio)
@@ -4914,22 +4924,19 @@ void rt_mutex_setprio(struct task_struct
 		} else {
 			p->dl.pi_se = &p->dl;
 		}
-		p->sched_class = &dl_sched_class;
 	} else if (rt_prio(prio)) {
 		if (dl_prio(oldprio))
 			p->dl.pi_se = &p->dl;
 		if (oldprio < prio)
 			queue_flag |= ENQUEUE_HEAD;
-		p->sched_class = &rt_sched_class;
 	} else {
 		if (dl_prio(oldprio))
 			p->dl.pi_se = &p->dl;
 		if (rt_prio(oldprio))
 			p->rt.timeout = 0;
-		p->sched_class = &fair_sched_class;
 	}
 
-	p->prio = prio;
+	__setscheduler_prio(p, prio);
 
 	if (queued)
 		enqueue_task(rq, p, queue_flag);
@@ -5162,35 +5169,6 @@ static void __setscheduler_params(struct
 	set_load_weight(p, true);
 }
 
-/* Actually do priority change: must hold pi & rq lock. */
-static void __setscheduler(struct rq *rq, struct task_struct *p,
-			   const struct sched_attr *attr, bool keep_boost)
-{
-	/*
-	 * If params can't change scheduling class changes aren't allowed
-	 * either.
-	 */
-	if (attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)
-		return;
-
-	__setscheduler_params(p, attr);
-
-	/*
-	 * Keep a potential priority boosting if called from
-	 * sched_setscheduler().
-	 */
-	p->prio = normal_prio(p);
-	if (keep_boost)
-		p->prio = rt_effective_prio(p, p->prio);
-
-	if (dl_prio(p->prio))
-		p->sched_class = &dl_sched_class;
-	else if (rt_prio(p->prio))
-		p->sched_class = &rt_sched_class;
-	else
-		p->sched_class = &fair_sched_class;
-}
-
 /*
  * Check the target process has a UID that matches the current process's:
  */
@@ -5211,10 +5189,8 @@ static int __sched_setscheduler(struct t
 				const struct sched_attr *attr,
 				bool user, bool pi)
 {
-	int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 :
-		      MAX_RT_PRIO - 1 - attr->sched_priority;
-	int retval, oldprio, oldpolicy = -1, queued, running;
-	int new_effective_prio, policy = attr->sched_policy;
+	int oldpolicy = -1, policy = attr->sched_policy;
+	int retval, oldprio, newprio, queued, running;
 	const struct sched_class *prev_class;
 	struct rq_flags rf;
 	int reset_on_fork;
@@ -5412,6 +5388,7 @@ change:
 	p->sched_reset_on_fork = reset_on_fork;
 	oldprio = p->prio;
 
+	newprio = __normal_prio(policy, attr->sched_priority, attr->sched_nice);
 	if (pi) {
 		/*
 		 * Take priority boosted tasks into account. If the new
@@ -5420,8 +5397,8 @@ change:
 		 * the runqueue. This will be done when the task deboost
 		 * itself.
 		 */
-		new_effective_prio = rt_effective_prio(p, newprio);
-		if (new_effective_prio == oldprio)
+		newprio = rt_effective_prio(p, newprio);
+		if (newprio == oldprio)
 			queue_flags &= ~DEQUEUE_MOVE;
 	}
 
@@ -5434,7 +5411,10 @@ change:
 
 	prev_class = p->sched_class;
 
-	__setscheduler(rq, p, attr, pi);
+	if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) {
+		__setscheduler_params(p, attr);
+		__setscheduler_prio(p, newprio);
+	}
 	__setscheduler_uclamp(p, attr);
 
 	if (queued) {