Date: Thu, 21 Dec 2023 10:43:54 +0000
From: Metin Kaya
Subject: Re: [PATCH v7 08/23] sched: Split scheduler and execution contexts
To: John Stultz, LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
    Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
    Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
    Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng,
    "Paul E. McKenney", Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
    kernel-team@android.com, Connor O'Brien
References: <20231220001856.3710363-1-jstultz@google.com> <20231220001856.3710363-9-jstultz@google.com>
In-Reply-To: <20231220001856.3710363-9-jstultz@google.com>

On 20/12/2023 12:18 am, John Stultz wrote:
> From: Peter Zijlstra
>
> Let's define the scheduling context as all the scheduler state
> in task_struct for the task selected to run, and the execution
> context as all state required to actually run the task.
>
> Currently both are intertwined in task_struct. We want to
> logically split these such that we can use the scheduling
> context of the task selected to be scheduled, but use the
> execution context of a different task to actually be run.
Should we update Documentation/kernel-hacking/hacking.rst (line #348:
:c:macro:`current`) or another appropriate doc to announce the
separation of scheduling & execution contexts?

>
> To this purpose, introduce the rq_selected() macro to point to the
> task_struct selected from the runqueue by the scheduler, which will
> be used for scheduler state, and preserve rq->curr to indicate the
> execution context of the task that will actually be run.
>
> NOTE: Peter previously mentioned he didn't like the name
> "rq_selected()", but I've not come up with a better alternative.
> I'm very open to other name proposals.
>
> Question for Peter: Dietmar suggested you'd prefer I drop the
> conditionalization of the scheduler context pointer on the rq
> (so rq_selected() would be open coded as rq->curr_selected or
> whatever we agree on for a name), but I'd think in the
> !CONFIG_PROXY_EXEC case we'd want to avoid the wasted pointer
> and its use (since curr_selected would always be == curr)?
> If I'm wrong I'm fine switching this, but would appreciate
> clarification.
>
> Cc: Joel Fernandes
> Cc: Qais Yousef
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Juri Lelli
> Cc: Vincent Guittot
> Cc: Dietmar Eggemann
> Cc: Valentin Schneider
> Cc: Steven Rostedt
> Cc: Ben Segall
> Cc: Zimuzo Ezeozue
> Cc: Youssef Esmat
> Cc: Mel Gorman
> Cc: Daniel Bristot de Oliveira
> Cc: Will Deacon
> Cc: Waiman Long
> Cc: Boqun Feng
> Cc: "Paul E. McKenney"
> Cc: Xuewen Yan
> Cc: K Prateek Nayak
> Cc: Metin Kaya
> Cc: Thomas Gleixner
> Cc: kernel-team@android.com
> Signed-off-by: Peter Zijlstra (Intel)
> Signed-off-by: Juri Lelli
> Signed-off-by: Peter Zijlstra (Intel)
> Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com
> [add additional comments and update more sched_class code to use
>  rq::proxy]
> Signed-off-by: Connor O'Brien
> [jstultz: Rebased and resolved minor collisions, reworked to use
>  accessors, tweaked update_curr_common to use rq_proxy fixing rt
>  scheduling issues]
> Signed-off-by: John Stultz
> ---
> v2:
> * Reworked to use accessors
> * Fixed update_curr_common to use proxy instead of curr
> v3:
> * Tweaked wrapper names
> * Swapped proxy for selected for clarity
> v4:
> * Minor variable name tweaks for readability
> * Use a macro instead of an inline function and drop
>   other helper functions as suggested by Peter.
> * Remove verbose comments/questions to avoid review
>   distractions, as suggested by Dietmar
> v5:
> * Add CONFIG_PROXY_EXEC option to this patch so the
>   new logic can be tested with this change
> * Minor fix to grab rq_selected when holding the rq lock
> v7:
> * Minor spelling fix and unused argument fixes suggested by
>   Metin Kaya
> * Switch to curr_selected for consistency, and minor rewording
>   of commit message for clarity
> * Rename variables selected instead of curr when we're using
>   rq_selected()
> * Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections,
>   as suggested by Metin Kaya
> ---
>  kernel/sched/core.c     | 46 ++++++++++++++++++++++++++---------------
>  kernel/sched/deadline.c | 35 ++++++++++++++++---------------
>  kernel/sched/fair.c     | 18 ++++++++--------
>  kernel/sched/rt.c       | 40 +++++++++++++++++------------------
>  kernel/sched/sched.h    | 35 +++++++++++++++++++++++++++++--
>  5 files changed, 109 insertions(+), 65 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e06558fb08aa..0ce34f5c0e0c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -822,7 +822,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
>
>  	rq_lock(rq, &rf);
>  	update_rq_clock(rq);
> -	rq->curr->sched_class->task_tick(rq, rq->curr, 1);
> +	rq_selected(rq)->sched_class->task_tick(rq, rq_selected(rq), 1);
>  	rq_unlock(rq, &rf);
>
>  	return HRTIMER_NORESTART;
> @@ -2242,16 +2242,18 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,
>
>  void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
>  {
> -	if (p->sched_class == rq->curr->sched_class)
> -		rq->curr->sched_class->wakeup_preempt(rq, p, flags);
> -	else if (sched_class_above(p->sched_class, rq->curr->sched_class))
> +	struct task_struct *selected = rq_selected(rq);
> +
> +	if (p->sched_class == selected->sched_class)
> +		selected->sched_class->wakeup_preempt(rq, p, flags);
> +	else if (sched_class_above(p->sched_class, selected->sched_class))
>  		resched_curr(rq);
>
>  	/*
>  	 * A queue event has occurred, and we're going to schedule. In
>  	 * this case, we can save a useless back to back clock update.
>  	 */
> -	if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
> +	if (task_on_rq_queued(selected) && test_tsk_need_resched(rq->curr))
>  		rq_clock_skip_update(rq);
>  }
>
> @@ -2780,7 +2782,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
>  	lockdep_assert_held(&p->pi_lock);
>
>  	queued = task_on_rq_queued(p);
> -	running = task_current(rq, p);
> +	running = task_current_selected(rq, p);
>
>  	if (queued) {
>  		/*
> @@ -5600,7 +5602,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)
>  	 * project cycles that may never be accounted to this
>  	 * thread, breaking clock_gettime().
>  	 */
> -	if (task_current(rq, p) && task_on_rq_queued(p)) {
> +	if (task_current_selected(rq, p) && task_on_rq_queued(p)) {
>  		prefetch_curr_exec_start(p);
>  		update_rq_clock(rq);
>  		p->sched_class->update_curr(rq);
> @@ -5668,7 +5670,8 @@ void scheduler_tick(void)
>  {
>  	int cpu = smp_processor_id();
>  	struct rq *rq = cpu_rq(cpu);
> -	struct task_struct *curr = rq->curr;
> +	/* accounting goes to the selected task */
> +	struct task_struct *selected;
>  	struct rq_flags rf;
>  	unsigned long thermal_pressure;
>  	u64 resched_latency;
> @@ -5679,16 +5682,17 @@ void scheduler_tick(void)
>  	sched_clock_tick();
>
>  	rq_lock(rq, &rf);
> +	selected = rq_selected(rq);
>
>  	update_rq_clock(rq);
>  	thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
>  	update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure);
> -	curr->sched_class->task_tick(rq, curr, 0);
> +	selected->sched_class->task_tick(rq, selected, 0);
>  	if (sched_feat(LATENCY_WARN))
>  		resched_latency = cpu_resched_latency(rq);
>  	calc_global_load_tick(rq);
>  	sched_core_tick(rq);
> -	task_tick_mm_cid(rq, curr);
> +	task_tick_mm_cid(rq, selected);
>
>  	rq_unlock(rq, &rf);
>
> @@ -5697,8 +5701,8 @@ void scheduler_tick(void)
>
>  	perf_event_task_tick();
>
> -	if (curr->flags & PF_WQ_WORKER)
> -		wq_worker_tick(curr);
> +	if (selected->flags & PF_WQ_WORKER)
> +		wq_worker_tick(selected);
>
>  #ifdef CONFIG_SMP
>  	rq->idle_balance = idle_cpu(cpu);
> @@ -5763,6 +5767,12 @@ static void sched_tick_remote(struct work_struct *work)
>  		struct task_struct *curr = rq->curr;
>
>  		if (cpu_online(cpu)) {
> +			/*
> +			 * Since this is a remote tick for full dynticks mode,
> +			 * we are always sure that there is no proxy (only a
> +			 * single task is running).
> +			 */
> +			SCHED_WARN_ON(rq->curr != rq_selected(rq));
>  			update_rq_clock(rq);
>
>  			if (!is_idle_task(curr)) {
> @@ -6685,6 +6695,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
>  	}
>
>  	next = pick_next_task(rq, prev, &rf);
> +	rq_set_selected(rq, next);
>  	clear_tsk_need_resched(prev);
>  	clear_preempt_need_resched();
>  #ifdef CONFIG_SCHED_DEBUG
> @@ -7185,7 +7196,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
>
>  	prev_class = p->sched_class;
>  	queued = task_on_rq_queued(p);
> -	running = task_current(rq, p);
> +	running = task_current_selected(rq, p);
>  	if (queued)
>  		dequeue_task(rq, p, queue_flag);
>  	if (running)
> @@ -7275,7 +7286,7 @@ void set_user_nice(struct task_struct *p, long nice)
>  	}
>
>  	queued = task_on_rq_queued(p);
> -	running = task_current(rq, p);
> +	running = task_current_selected(rq, p);
>  	if (queued)
>  		dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
>  	if (running)
> @@ -7868,7 +7879,7 @@ static int __sched_setscheduler(struct task_struct *p,
>  	}
>
>  	queued = task_on_rq_queued(p);
> -	running = task_current(rq, p);
> +	running = task_current_selected(rq, p);
>  	if (queued)
>  		dequeue_task(rq, p, queue_flags);
>  	if (running)
> @@ -9295,6 +9306,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
>  	rcu_read_unlock();
>
>  	rq->idle = idle;
> +	rq_set_selected(rq, idle);
>  	rcu_assign_pointer(rq->curr, idle);
>  	idle->on_rq = TASK_ON_RQ_QUEUED;
>  #ifdef CONFIG_SMP
> @@ -9384,7 +9396,7 @@ void sched_setnuma(struct task_struct *p, int nid)
>
>  	rq = task_rq_lock(p, &rf);
>  	queued = task_on_rq_queued(p);
> -	running = task_current(rq, p);
> +	running = task_current_selected(rq, p);
>
>  	if (queued)
>  		dequeue_task(rq, p, DEQUEUE_SAVE);
> @@ -10489,7 +10501,7 @@ void sched_move_task(struct task_struct *tsk)
>
>  	update_rq_clock(rq);
>
> -	running = task_current(rq, tsk);
> +	running = task_current_selected(rq, tsk);
>  	queued = task_on_rq_queued(tsk);
>
>  	if (queued)
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 6140f1f51da1..9cf20f4ac5f9 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1150,7 +1150,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>  #endif
>
>  	enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
> -	if (dl_task(rq->curr))
> +	if (dl_task(rq_selected(rq)))
>  		wakeup_preempt_dl(rq, p, 0);
>  	else
>  		resched_curr(rq);
> @@ -1273,7 +1273,7 @@ static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
>   */
>  static void update_curr_dl(struct rq *rq)
>  {
> -	struct task_struct *curr = rq->curr;
> +	struct task_struct *curr = rq_selected(rq);
>  	struct sched_dl_entity *dl_se = &curr->dl;
>  	s64 delta_exec, scaled_delta_exec;
>  	int cpu = cpu_of(rq);
> @@ -1784,7 +1784,7 @@ static int find_later_rq(struct task_struct *task);
>  static int
>  select_task_rq_dl(struct task_struct *p, int cpu, int flags)
>  {
> -	struct task_struct *curr;
> +	struct task_struct *curr, *selected;
>  	bool select_rq;
>  	struct rq *rq;
>
> @@ -1795,6 +1795,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
>
>  	rcu_read_lock();
>  	curr = READ_ONCE(rq->curr); /* unlocked access */
> +	selected = READ_ONCE(rq_selected(rq));
>
>  	/*
>  	 * If we are dealing with a -deadline task, we must
> @@ -1805,9 +1806,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
>  	 * other hand, if it has a shorter deadline, we
>  	 * try to make it stay here, it might be important.
>  	 */
> -	select_rq = unlikely(dl_task(curr)) &&
> +	select_rq = unlikely(dl_task(selected)) &&
>  		    (curr->nr_cpus_allowed < 2 ||
> -		     !dl_entity_preempt(&p->dl, &curr->dl)) &&
> +		     !dl_entity_preempt(&p->dl, &selected->dl)) &&
>  		    p->nr_cpus_allowed > 1;
>
>  	/*
> @@ -1870,7 +1871,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
>  	 * let's hope p can move out.
>  	 */
>  	if (rq->curr->nr_cpus_allowed == 1 ||
> -	    !cpudl_find(&rq->rd->cpudl, rq->curr, NULL))
> +	    !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL))
>  		return;
>
>  	/*
> @@ -1909,7 +1910,7 @@ static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
>  static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
>  			      int flags)
>  {
> -	if (dl_entity_preempt(&p->dl, &rq->curr->dl)) {
> +	if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) {
>  		resched_curr(rq);
>  		return;
>  	}
> @@ -1919,7 +1920,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
>  	 * In the unlikely case current and p have the same deadline
>  	 * let us try to decide what's the best thing to do...
>  	 */
> -	if ((p->dl.deadline == rq->curr->dl.deadline) &&
> +	if ((p->dl.deadline == rq_selected(rq)->dl.deadline) &&
>  	    !test_tsk_need_resched(rq->curr))
>  		check_preempt_equal_dl(rq, p);
>  #endif /* CONFIG_SMP */
> @@ -1954,7 +1955,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
>  	if (hrtick_enabled_dl(rq))
>  		start_hrtick_dl(rq, p);
>
> -	if (rq->curr->sched_class != &dl_sched_class)
> +	if (rq_selected(rq)->sched_class != &dl_sched_class)
>  		update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0);
>
>  	deadline_queue_push_tasks(rq);
> @@ -2268,8 +2269,8 @@ static int push_dl_task(struct rq *rq)
>  	 * can move away, it makes sense to just reschedule
>  	 * without going further in pushing next_task.
>  	 */
> -	if (dl_task(rq->curr) &&
> -	    dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) &&
> +	if (dl_task(rq_selected(rq)) &&
> +	    dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) &&
>  	    rq->curr->nr_cpus_allowed > 1) {
>  		resched_curr(rq);
>  		return 0;
> @@ -2394,7 +2395,7 @@ static void pull_dl_task(struct rq *this_rq)
>  			 * deadline than the current task of its runqueue.
>  			 */
>  			if (dl_time_before(p->dl.deadline,
> -					   src_rq->curr->dl.deadline))
> +					   rq_selected(src_rq)->dl.deadline))
>  				goto skip;
>
>  			if (is_migration_disabled(p)) {
> @@ -2435,9 +2436,9 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
>  	if (!task_on_cpu(rq, p) &&
>  	    !test_tsk_need_resched(rq->curr) &&
>  	    p->nr_cpus_allowed > 1 &&
> -	    dl_task(rq->curr) &&
> +	    dl_task(rq_selected(rq)) &&
>  	    (rq->curr->nr_cpus_allowed < 2 ||
> -	     !dl_entity_preempt(&p->dl, &rq->curr->dl))) {
> +	     !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) {
>  		push_dl_tasks(rq);
>  	}
>  }
> @@ -2612,12 +2613,12 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p)
>  		return;
>  	}
>
> -	if (rq->curr != p) {
> +	if (rq_selected(rq) != p) {
>  #ifdef CONFIG_SMP
>  		if (p->nr_cpus_allowed > 1 && rq->dl.overloaded)
>  			deadline_queue_push_tasks(rq);
>  #endif
> -		if (dl_task(rq->curr))
> +		if (dl_task(rq_selected(rq)))
>  			wakeup_preempt_dl(rq, p, 0);
>  		else
>  			resched_curr(rq);
> @@ -2646,7 +2647,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
>  		if (!rq->dl.overloaded)
>  			deadline_queue_pull_task(rq);
>
> -		if (task_current(rq, p)) {
> +		if (task_current_selected(rq, p)) {
>  			/*
>  			 * If we now have a earlier deadline task than p,
>  			 * then reschedule, provided p is still on this
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1251fd01a555..07216ea3ed53 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1157,7 +1157,7 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
>   */
>  s64 update_curr_common(struct rq *rq)
>  {
> -	struct task_struct *curr = rq->curr;
> +	struct task_struct *curr = rq_selected(rq);
>  	s64 delta_exec;
>
>  	delta_exec = update_curr_se(rq, &curr->se);
> @@ -1203,7 +1203,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
>
>  static void update_curr_fair(struct rq *rq)
>  {
> -	update_curr(cfs_rq_of(&rq->curr->se));
> +	update_curr(cfs_rq_of(&rq_selected(rq)->se));
>  }
>
>  static inline void
> @@ -6611,7 +6611,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>  		s64 delta = slice - ran;
>
>  		if (delta < 0) {
> -			if (task_current(rq, p))
> +			if (task_current_selected(rq, p))
>  				resched_curr(rq);
>  			return;
>  		}
> @@ -6626,7 +6626,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>   */
>  static void hrtick_update(struct rq *rq)
>  {
> -	struct task_struct *curr = rq->curr;
> +	struct task_struct *curr = rq_selected(rq);
>
>  	if (!hrtick_enabled_fair(rq) || curr->sched_class != &fair_sched_class)
>  		return;
> @@ -8235,7 +8235,7 @@ static void set_next_buddy(struct sched_entity *se)
>   */
>  static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
>  {
> -	struct task_struct *curr = rq->curr;
> +	struct task_struct *curr = rq_selected(rq);
>  	struct sched_entity *se = &curr->se, *pse = &p->se;
>  	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
>  	int next_buddy_marked = 0;
> @@ -8268,7 +8268,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>  	 * prevents us from potentially nominating it as a false LAST_BUDDY
>  	 * below.
>  	 */
> -	if (test_tsk_need_resched(curr))
> +	if (test_tsk_need_resched(rq->curr))
>  		return;
>
>  	/* Idle tasks are by definition preempted by non-idle tasks. */
> @@ -9252,7 +9252,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
>  	 * update_load_avg() can call cpufreq_update_util(). Make sure that RT,
>  	 * DL and IRQ signals have been updated before updating CFS.
>  	 */
> -	curr_class = rq->curr->sched_class;
> +	curr_class = rq_selected(rq)->sched_class;
>
>  	thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
>
> @@ -12640,7 +12640,7 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
>  	 * our priority decreased, or if we are not currently running on
>  	 * this runqueue and our priority is higher than the current's
>  	 */
> -	if (task_current(rq, p)) {
> +	if (task_current_selected(rq, p)) {
>  		if (p->prio > oldprio)
>  			resched_curr(rq);
>  	} else
> @@ -12743,7 +12743,7 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
>  		 * kick off the schedule if running, otherwise just see
>  		 * if we can still preempt the current task.
>  		 */
> -		if (task_current(rq, p))
> +		if (task_current_selected(rq, p))
>  			resched_curr(rq);
>  		else
>  			wakeup_preempt(rq, p, 0);
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 9cdea3ea47da..2682cec45aaa 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -530,7 +530,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
>
>  static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
>  {
> -	struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr;
> +	struct task_struct *curr = rq_selected(rq_of_rt_rq(rt_rq));
>  	struct rq *rq = rq_of_rt_rq(rt_rq);
>  	struct sched_rt_entity *rt_se;
>
> @@ -1000,7 +1000,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
>   */
>  static void update_curr_rt(struct rq *rq)
>  {
> -	struct task_struct *curr = rq->curr;
> +	struct task_struct *curr = rq_selected(rq);
>  	struct sched_rt_entity *rt_se = &curr->rt;
>  	s64 delta_exec;
>
> @@ -1545,7 +1545,7 @@ static int find_lowest_rq(struct task_struct *task);
>  static int
>  select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>  {
> -	struct task_struct *curr;
> +	struct task_struct *curr, *selected;
>  	struct rq *rq;
>  	bool test;
>
> @@ -1557,6 +1557,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>
>  	rcu_read_lock();
>  	curr = READ_ONCE(rq->curr); /* unlocked access */
> +	selected = READ_ONCE(rq_selected(rq));
>
>  	/*
>  	 * If the current task on @p's runqueue is an RT task, then
> @@ -1585,8 +1586,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>  	 * systems like big.LITTLE.
>  	 */
>  	test = curr &&
> -	       unlikely(rt_task(curr)) &&
> -	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
> +	       unlikely(rt_task(selected)) &&
> +	       (curr->nr_cpus_allowed < 2 || selected->prio <= p->prio);
>
>  	if (test || !rt_task_fits_capacity(p, cpu)) {
>  		int target = find_lowest_rq(p);
> @@ -1616,12 +1617,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>
>  static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
>  {
> -	/*
> -	 * Current can't be migrated, useless to reschedule,
> -	 * let's hope p can move out.
> -	 */
>  	if (rq->curr->nr_cpus_allowed == 1 ||
> -	    !cpupri_find(&rq->rd->cpupri, rq->curr, NULL))
> +	    !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL))
>  		return;
>
>  	/*
> @@ -1664,7 +1661,9 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
>   */
>  static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
>  {
> -	if (p->prio < rq->curr->prio) {
> +	struct task_struct *curr = rq_selected(rq);
> +
> +	if (p->prio < curr->prio) {
>  		resched_curr(rq);
>  		return;
>  	}
> @@ -1682,7 +1681,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
>  	 * to move current somewhere else, making room for our non-migratable
>  	 * task.
>  	 */
> -	if (p->prio == rq->curr->prio && !test_tsk_need_resched(rq->curr))
> +	if (p->prio == curr->prio && !test_tsk_need_resched(rq->curr))
>  		check_preempt_equal_prio(rq, p);
>  #endif
>  }
> @@ -1707,7 +1706,7 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
>  	 * utilization. We only care of the case where we start to schedule a
>  	 * rt task
>  	 */
> -	if (rq->curr->sched_class != &rt_sched_class)
> +	if (rq_selected(rq)->sched_class != &rt_sched_class)
>  		update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);
>
>  	rt_queue_push_tasks(rq);
> @@ -1988,6 +1987,7 @@ static struct task_struct *pick_next_pushable_task(struct rq *rq)
>
>  	BUG_ON(rq->cpu != task_cpu(p));
>  	BUG_ON(task_current(rq, p));
> +	BUG_ON(task_current_selected(rq, p));
>  	BUG_ON(p->nr_cpus_allowed <= 1);
>
>  	BUG_ON(!task_on_rq_queued(p));
> @@ -2020,7 +2020,7 @@ static int push_rt_task(struct rq *rq, bool pull)
>  	 * higher priority than current. If that's the case
>  	 * just reschedule current.
>  	 */
> -	if (unlikely(next_task->prio < rq->curr->prio)) {
> +	if (unlikely(next_task->prio < rq_selected(rq)->prio)) {
>  		resched_curr(rq);
>  		return 0;
>  	}
> @@ -2375,7 +2375,7 @@ static void pull_rt_task(struct rq *this_rq)
>  			 * p if it is lower in priority than the
>  			 * current task on the run queue
>  			 */
> -			if (p->prio < src_rq->curr->prio)
> +			if (p->prio < rq_selected(src_rq)->prio)
>  				goto skip;
>
>  			if (is_migration_disabled(p)) {
> @@ -2419,9 +2419,9 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
>  	bool need_to_push = !task_on_cpu(rq, p) &&
>  			    !test_tsk_need_resched(rq->curr) &&
>  			    p->nr_cpus_allowed > 1 &&
> -			    (dl_task(rq->curr) || rt_task(rq->curr)) &&
> +			    (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) &&
>  			    (rq->curr->nr_cpus_allowed < 2 ||
> -			     rq->curr->prio <= p->prio);
> +			     rq_selected(rq)->prio <= p->prio);
>
>  	if (need_to_push)
>  		push_rt_tasks(rq);
> @@ -2505,7 +2505,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
>  		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
>  			rt_queue_push_tasks(rq);
>  #endif /* CONFIG_SMP */
> -		if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
> +		if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq)))
>  			resched_curr(rq);
>  	}
>  }
> @@ -2520,7 +2520,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
>  	if (!task_on_rq_queued(p))
>  		return;
>
> -	if (task_current(rq, p)) {
> +	if (task_current_selected(rq, p)) {
>  #ifdef CONFIG_SMP
>  		/*
>  		 * If our priority decreases while running, we
> @@ -2546,7 +2546,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
>  		 * greater than the current running task
>  		 * then reschedule.
>  		 */
> -		if (p->prio < rq->curr->prio)
> +		if (p->prio < rq_selected(rq)->prio)
>  			resched_curr(rq);
>  	}
>  }
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e0e4fc8734b..6ea1dfbe502a 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -994,7 +994,10 @@ struct rq {
>  	 */
>  	unsigned int		nr_uninterruptible;
>
> -	struct task_struct __rcu	*curr;
> +	struct task_struct __rcu	*curr;		/* Execution context */
> +#ifdef CONFIG_SCHED_PROXY_EXEC
> +	struct task_struct __rcu	*curr_selected;	/* Scheduling context (policy) */
> +#endif
>  	struct task_struct	*idle;
>  	struct task_struct	*stop;
>  	unsigned long		next_balance;
> @@ -1189,6 +1192,20 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
>  #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
>  #define raw_rq()		raw_cpu_ptr(&runqueues)
>
> +#ifdef CONFIG_SCHED_PROXY_EXEC
> +#define rq_selected(rq)		((rq)->curr_selected)
> +static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
> +{
> +	rcu_assign_pointer(rq->curr_selected, t);
> +}
> +#else
> +#define rq_selected(rq)		((rq)->curr)
> +static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
> +{
> +	/* Do nothing */
> +}
> +#endif
> +
>  struct sched_group;
>  #ifdef CONFIG_SCHED_CORE
>  static inline struct cpumask *sched_group_span(struct sched_group *sg);
> @@ -2112,11 +2129,25 @@ static inline u64 global_rt_runtime(void)
>  	return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
>  }
>
> +/*
> + * Is p the current execution context?
> + */
>  static inline int task_current(struct rq *rq, struct task_struct *p)
>  {
>  	return rq->curr == p;
>  }
>
> +/*
> + * Is p the current scheduling context?
> + *
> + * Note that it might be the current execution context at the same time if
> + * rq->curr == rq_selected() == p.
> + */
> +static inline int task_current_selected(struct rq *rq, struct task_struct *p)
> +{
> +	return rq_selected(rq) == p;
> +}
> +
>  static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
>  {
>  #ifdef CONFIG_SMP
> @@ -2280,7 +2311,7 @@ struct sched_class {
>
>  static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
>  {
> -	WARN_ON_ONCE(rq->curr != prev);
> +	WARN_ON_ONCE(rq_selected(rq) != prev);
>  	prev->sched_class->put_prev_task(rq, prev);
>  }
>
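
To double-check my understanding of the new invariant, here is a tiny
sketch (my reading only, NOT code from this patch; proxy_resolve() is a
made-up placeholder for what the later patches in this series would do):

	/*
	 * Sketch: rq_selected() carries the scheduling context (policy,
	 * accounting), while rq->curr carries the execution context (the
	 * task whose stack/registers/mm actually run on the CPU).
	 */
	static void __schedule_sketch(struct rq *rq, struct task_struct *prev,
				      struct rq_flags *rf)
	{
		struct task_struct *next = pick_next_task(rq, prev, rf);

		rq_set_selected(rq, next);		/* scheduling context */

		/*
		 * A later patch would resolve a possibly different execution
		 * context here, e.g. next = proxy_resolve(rq, next); (made-up
		 * name). For now both contexts stay identical, and in the
		 * !CONFIG_SCHED_PROXY_EXEC case rq_selected() simply aliases
		 * rq->curr at no extra cost.
		 */
		rcu_assign_pointer(rq->curr, next);	/* execution context */
	}

If that matches the intent, a short paragraph along these lines in the
documentation (per my question above) should be enough.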