From: Metin Kaya
Date: Thu, 21 Dec 2023 15:30:28 +0000
Subject: Re: [PATCH v7 13/23] sched: Start blocked_on chain processing in find_proxy_task()
To: John Stultz, LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Xuewen Yan, K Prateek Nayak, Thomas Gleixner, kernel-team@android.com, Valentin Schneider
Message-ID: <49fb9cc3-c91d-4e9d-b75f-b31a8b5b2a91@arm.com>
In-Reply-To: <20231220001856.3710363-14-jstultz@google.com>
References: <20231220001856.3710363-1-jstultz@google.com> <20231220001856.3710363-14-jstultz@google.com>

On 20/12/2023 12:18 am, John Stultz wrote:
> From: Peter Zijlstra
>
> Start to flesh out the real find_proxy_task() implementation,
> but avoid the migration cases for now, in those cases just
> deactivate the selected task and pick again.
>
> To ensure the selected task or other blocked tasks in the chain
> aren't migrated away while we're running the proxy, this patch
> also tweaks CFS logic to avoid migrating selected or mutex
> blocked tasks.
>
> Cc: Joel Fernandes
> Cc: Qais Yousef
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Juri Lelli
> Cc: Vincent Guittot
> Cc: Dietmar Eggemann
> Cc: Valentin Schneider
> Cc: Steven Rostedt
> Cc: Ben Segall
> Cc: Zimuzo Ezeozue
> Cc: Youssef Esmat
> Cc: Mel Gorman
> Cc: Daniel Bristot de Oliveira
> Cc: Will Deacon
> Cc: Waiman Long
> Cc: Boqun Feng
> Cc: "Paul E. McKenney"
> Cc: Metin Kaya
> Cc: Xuewen Yan
> Cc: K Prateek Nayak
> Cc: Thomas Gleixner
> Cc: kernel-team@android.com
> Signed-off-by: Peter Zijlstra (Intel)
> Signed-off-by: Juri Lelli
> Signed-off-by: Valentin Schneider
> Signed-off-by: Connor O'Brien
> [jstultz: This change was split out from the larger proxy patch]
> Signed-off-by: John Stultz
> ---
> v5:
> * Split this out from larger proxy patch
> v7:
> * Minor refactoring of core find_proxy_task() function
> * Minor spelling and corrections suggested by Metin Kaya
> * Dropped an added BUG_ON that was frequently tripped
> * Minor commit message tweaks from Metin Kaya
> ---
>  kernel/sched/core.c | 154 +++++++++++++++++++++++++++++++++++++-------
>  kernel/sched/fair.c |   9 ++-
>  2 files changed, 137 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f6bf3b62194c..42e25bbdfe6b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -94,6 +94,7 @@
>  #include "../workqueue_internal.h"
>  #include "../../io_uring/io-wq.h"
>  #include "../smpboot.h"
> +#include "../locking/mutex.h"
>
>  EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpu);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpumask);
> @@ -6609,6 +6610,15 @@ static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p,
>
>  #ifdef CONFIG_SCHED_PROXY_EXEC
>
> +static inline struct task_struct *
> +proxy_resched_idle(struct rq *rq, struct task_struct *next)
> +{
> +	put_prev_task(rq, next);
> +	rq_set_selected(rq, rq->idle);
> +	set_tsk_need_resched(rq->idle);
> +	return rq->idle;
> +}
> +
>  static bool proxy_deactivate(struct rq *rq, struct task_struct *next)
>  {
>  	unsigned long state = READ_ONCE(next->__state);
> @@ -6618,48 +6628,138 @@ static bool proxy_deactivate(struct rq *rq, struct task_struct *next)
>  		return false;
>  	if (!try_to_deactivate_task(rq, next, state, true))
>  		return false;
> -	put_prev_task(rq, next);
> -	rq_set_selected(rq, rq->idle);
> -	resched_curr(rq);
> +	proxy_resched_idle(rq, next);
>  	return true;
>  }
>
>  /*
> - * Initial simple proxy that just returns the task if it's waking
> - * or deactivates the blocked task so we can pick something that
> - * isn't blocked.
> + * Find who @next (currently blocked on a mutex) can proxy for.
> + *
> + * Follow the blocked-on relation:
> + *   task->blocked_on -> mutex->owner -> task...
> + *
> + * Lock order:
> + *
> + *   p->pi_lock
> + *     rq->lock
> + *       mutex->wait_lock
> + *         p->blocked_lock
> + *
> + * Returns the task that is going to be used as execution context (the one
> + * that is actually going to be put to run on cpu_of(rq)).
>   */
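As an aside for anyone reading along: the comment above is a big readability
win. Purely for illustration (this is not kernel code; the types below are
hypothetical stand-ins and all locking, rq and migration checks are elided),
the chain resolution the comment describes boils down to:

#include <stdio.h>

struct task;

struct mutex_model {                    /* stand-in for struct mutex */
	struct task *owner;             /* NULL once the lock is released */
};

struct task {                           /* stand-in for task_struct */
	const char *comm;
	struct mutex_model *blocked_on; /* NULL when the task is runnable */
};

/* Follow task->blocked_on -> mutex->owner until a runnable task is found. */
static struct task *resolve_exec_ctx(struct task *next)
{
	struct task *p = next;

	while (p->blocked_on) {
		struct task *owner = p->blocked_on->owner;

		if (!owner)     /* lock released: the waiter itself can run */
			return p;
		p = owner;      /* keep walking down the chain */
	}
	return p;               /* runnable: usable as execution context */
}

int main(void)
{
	struct task c = { "C", NULL };          /* runnable lock owner     */
	struct mutex_model m1 = { &c };
	struct task b = { "B", &m1 };           /* blocked on C's mutex    */
	struct mutex_model m0 = { &b };
	struct task a = { "A", &m0 };           /* selected, blocked on B  */

	printf("run '%s' using '%s' as scheduling context\n",
	       resolve_exec_ctx(&a)->comm, a.comm);
	return 0;
}
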
> + * > + * TODO: deadlock detection > */ > - raw_spin_lock(&mutex->wait_lock); > - raw_spin_lock(&p->blocked_lock); > + for (p = next; task_is_blocked(p); p = owner) { > + mutex = p->blocked_on; > + /* Something changed in the chain, so pick again */ > + if (!mutex) > + return NULL; > > - /* Check again that p is blocked with blocked_lock held */ > - if (!task_is_blocked(p) || mutex != p->blocked_on) { > /* > - * Something changed in the blocked_on chain and > - * we don't know if only at this level. So, let's > - * just bail out completely and let __schedule > - * figure things out (pick_again loop). > + * By taking mutex->wait_lock we hold off concurrent mutex_unlock() > + * and ensure @owner sticks around. > */ > - goto out; > + raw_spin_lock(&mutex->wait_lock); > + raw_spin_lock(&p->blocked_lock); > + > + /* Check again that p is blocked with blocked_lock held */ Is this comment still valid? > + if (mutex != p->blocked_on) { > + /* > + * Something changed in the blocked_on chain and > + * we don't know if only at this level. So, let's > + * just bail out completely and let __schedule > + * figure things out (pick_again loop). > + */ > + goto out; > + } > + > + owner = __mutex_owner(mutex); > + if (!owner) { > + ret = p; > + goto out; > + } > + > + if (task_cpu(owner) != this_cpu) { > + /* XXX Don't handle migrations yet */ > + if (!proxy_deactivate(rq, next)) > + ret = next; > + goto out; > + } > + > + if (task_on_rq_migrating(owner)) { > + /* > + * One of the chain of mutex owners is currently migrating to this > + * CPU, but has not yet been enqueued because we are holding the > + * rq lock. As a simple solution, just schedule rq->idle to give > + * the migration a chance to complete. Much like the migrate_task > + * case we should end up back in proxy(), this time hopefully with s/proxy/find_proxy_task/ > + * all relevant tasks already enqueued. > + */ > + raw_spin_unlock(&p->blocked_lock); > + raw_spin_unlock(&mutex->wait_lock); > + return proxy_resched_idle(rq, next); > + } > + > + if (!owner->on_rq) { > + /* XXX Don't handle blocked owners yet */ > + if (!proxy_deactivate(rq, next)) > + ret = next; > + goto out; > + } > + > + if (owner == p) { > + /* > + * It's possible we interleave with mutex_unlock like: > + * > + * lock(&rq->lock); > + * find_proxy_task() > + * mutex_unlock() > + * lock(&wait_lock); > + * next(owner) = current->blocked_donor; > + * unlock(&wait_lock); > + * > + * wake_up_q(); > + * ... > + * ttwu_runnable() > + * __task_rq_lock() > + * lock(&wait_lock); > + * owner == p > + * > + * Which leaves us to finish the ttwu_runnable() and make it go. > + * > + * So schedule rq->idle so that ttwu_runnable can get the rq lock > + * and mark owner as running. > + */ > + raw_spin_unlock(&p->blocked_lock); > + raw_spin_unlock(&mutex->wait_lock); > + return proxy_resched_idle(rq, next); > + } > + > + /* > + * OK, now we're absolutely sure @owner is not blocked _and_ > + * on this rq, therefore holding @rq->lock is sufficient to > + * guarantee its existence, as per ttwu_remote(). 
> +
> +		/*
> +		 * OK, now we're absolutely sure @owner is not blocked _and_
> +		 * on this rq, therefore holding @rq->lock is sufficient to
> +		 * guarantee its existence, as per ttwu_remote().
> +		 */
> +		raw_spin_unlock(&p->blocked_lock);
> +		raw_spin_unlock(&mutex->wait_lock);
>  	}
>
> -	if (!proxy_deactivate(rq, next))
> -		ret = p;
> +	WARN_ON_ONCE(owner && !owner->on_rq);
> +	return owner;
> +
>  out:
>  	raw_spin_unlock(&p->blocked_lock);
>  	raw_spin_unlock(&mutex->wait_lock);
> @@ -6738,6 +6838,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
>  	struct rq_flags rf;
>  	struct rq *rq;
>  	int cpu;
> +	bool preserve_need_resched = false;
>
>  	cpu = smp_processor_id();
>  	rq = cpu_rq(cpu);
> @@ -6798,9 +6899,12 @@ static void __sched notrace __schedule(unsigned int sched_mode)
>  			rq_repin_lock(rq, &rf);
>  			goto pick_again;
>  		}
> +		if (next == rq->idle && prev == rq->idle)
> +			preserve_need_resched = true;
>  	}
>
> -	clear_tsk_need_resched(prev);
> +	if (!preserve_need_resched)
> +		clear_tsk_need_resched(prev);
>  	clear_preempt_need_resched();
>  #ifdef CONFIG_SCHED_DEBUG
>  	rq->last_seen_need_resched_ns = 0;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 085941db5bf1..954b41e5b7df 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8905,6 +8905,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	if (kthread_is_per_cpu(p))
>  		return 0;
>
> +	if (task_is_blocked(p))
> +		return 0;

I think the "We do not migrate tasks that are: ..." comment
(kernel/sched/fair.c:8897) needs to be updated for this change.

> +
>  	if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
>  		int cpu;
>
> @@ -8941,7 +8944,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	/* Record that we found at least one task that could run on dst_cpu */
>  	env->flags &= ~LBF_ALL_PINNED;
>
> -	if (task_on_cpu(env->src_rq, p)) {
> +	if (task_on_cpu(env->src_rq, p) ||
> +	    task_current_selected(env->src_rq, p)) {
>  		schedstat_inc(p->stats.nr_failed_migrations_running);
>  		return 0;
>  	}
> @@ -8980,6 +8984,9 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
>  {
>  	lockdep_assert_rq_held(env->src_rq);
>
> +	BUG_ON(task_current(env->src_rq, p));
> +	BUG_ON(task_current_selected(env->src_rq, p));
> +
>  	deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
>  	set_task_cpu(p, env->dst_cpu);
>  }
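One last illustration on the fair.c side (a userspace model only; the struct
and field names are mine, standing in for task_is_blocked() and
task_current_selected()): the net effect is that both mutex-blocked tasks and
the currently selected execution context are pinned as far as the load
balancer is concerned:

#include <stdio.h>
#include <stdbool.h>

struct task_model {
	const char *comm;
	bool blocked_on_mutex;  /* would be task_is_blocked(p) */
	bool selected_on_rq;    /* would be task_current_selected(rq, p) */
};

/* Models the two new can_migrate_task() guards. */
static bool can_migrate(const struct task_model *p)
{
	if (p->blocked_on_mutex)        /* keep proxy-chain members local */
		return false;
	if (p->selected_on_rq)          /* never pull the chosen exec context */
		return false;
	return true;
}

int main(void)
{
	const struct task_model tasks[] = {
		{ "plain",    false, false },
		{ "blocked",  true,  false },
		{ "selected", false, true  },
	};

	for (unsigned i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++)
		printf("%-8s: %s\n", tasks[i].comm,
		       can_migrate(&tasks[i]) ? "migratable" : "pinned");
	return 0;
}

That matches the BUG_ON()s added to detach_task(): by the time we detach, a
selected or mutex-blocked task should already have been filtered out.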