2024-05-30 12:24:32

by Ze Gao

[permalink] [raw]
Subject: [RFC PATCH 0/2] nohz idle time accounting cleanup

Dear all,

Currently we use is_idle_task(current) check to decide if we need
to call tick_nohz_irq_enter() on irq entry but use idle_cpu() to
decide if we need to call tick_nohz_irq_exit() on irq exit.
and we rely this pair (which internally calls tick_nohz_stop_idle()
and tick_nohz_start_idle() separately) to do accurate idle time
accounting in most cases.

IIUC, idle_cpu() now is mainly for scheduler and for tick user,
we seem to ask less than what idle_cpu() gives us and the use of
idle_cpu() here only make things complicated which can be proved
by the introduction of sched_core_idle_cpu() for forcing idle
time accounting.

So I make this RFC to do this cleanup and make things simple again
here and it should introduce no functional changes.

Reviews welcome and please let me know if I'm being stupid
or missing something obvious.

Regards,
Ze

--

Ze Gao (2):
timer: Use is_idle_task() check instead of idle_cpu() on irq exit
sched/core: Remove sched_core_idle_cpu()

include/linux/sched.h | 2 --
kernel/sched/core.c | 13 -------------
kernel/softirq.c | 2 +-
3 files changed, 1 insertion(+), 16 deletions(-)

--
2.41.0



2024-05-30 12:24:42

by Ze Gao

[permalink] [raw]
Subject: [RFC PATCH 1/2] timer: Use is_idle_task() check instead of idle_cpu() on irq exit

idle_cpu() was initially introduced in irq_enter()/exit() to check
whether an irq interrupts an idle cpu or not since commit
79bf2bb335b8 ("[PATCH] tick-management: dyntick / highres functionality")
and at that time, it's implemented via a simple check if the curr
of task of that rq is idle or not. And then commit 6378ddb59215 ("time:
track accurate idle time with tick_sched.idle_sleeptime") uses the same
check to do accurate idle time accounting.

But since commit 908a3283728d ("sched: Fix idle_cpu()"), idle_cpu()
takes scheduler stats into consideration and becomes more constrained,
and therefore it tells more than if we have interrupted an idle
process but also whether a cpu is going to be idle or not since it
takes queued tasks and queued to be woken tasks into account.

However for tick user, it is too much as now we only rely on this check
to do nohz idle time accounting in tick_nohz_start_idle() just in case
that tick_nohz_stop_idle() is called upon irq_enter() if we actually
rupture an idle cpu(process). The use of idle_cpu() simply complicates
things here, and the introduction of sched_core_idle_cpu() in
commit 548796e2e70b ("sched/core: introduce sched_core_idle_cpu()")
proves this.

The use of is_idle_task() just like in commit 0a8a2e78b7ee ("timer: Fix
bad idle check on irq entry") helps to save one unnecessary fix for idle
time accounting for the newly force idle state. Note this also preps for
the remove of sched_core_idle_cpu() in the following patch.

Signed-off-by: Ze Gao <[email protected]>
---
kernel/softirq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 02582017759a..24c7bf3c3f6c 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -617,7 +617,7 @@ static inline void tick_irq_exit(void)
int cpu = smp_processor_id();

/* Make sure that timer wheel updates are propagated */
- if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+ if ((is_idle_task(current) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
if (!in_hardirq())
tick_nohz_irq_exit();
}
--
2.41.0


2024-05-30 12:24:51

by Ze Gao

[permalink] [raw]
Subject: [RFC PATCH 2/2] sched/core: Remove sched_core_idle_cpu()

Since there is no user of sched_core_idle_cpu(), delete it.

Signed-off-by: Ze Gao <[email protected]>
---
include/linux/sched.h | 2 --
kernel/sched/core.c | 13 -------------
2 files changed, 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6..85ef086362c9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2182,11 +2182,9 @@ extern void sched_core_free(struct task_struct *tsk);
extern void sched_core_fork(struct task_struct *p);
extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
unsigned long uaddr);
-extern int sched_core_idle_cpu(int cpu);
#else
static inline void sched_core_free(struct task_struct *tsk) { }
static inline void sched_core_fork(struct task_struct *p) { }
-static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); }
#endif

extern void sched_set_stop_task(int cpu, struct task_struct *stop);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bcf2c4cc0522..c42fe87e07d0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7453,19 +7453,6 @@ struct task_struct *idle_task(int cpu)
return cpu_rq(cpu)->idle;
}

-#ifdef CONFIG_SCHED_CORE
-int sched_core_idle_cpu(int cpu)
-{
- struct rq *rq = cpu_rq(cpu);
-
- if (sched_core_enabled(rq) && rq->curr == rq->idle)
- return 1;
-
- return idle_cpu(cpu);
-}
-
-#endif
-
#ifdef CONFIG_SMP
/*
* This function computes an effective utilization for the given CPU, to be
--
2.41.0


2024-06-07 03:33:43

by Ze Gao

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] nohz idle time accounting cleanup

Gently ping for comments on this change ;D

Thanks,
Ze

On Thu, May 30, 2024 at 8:24 PM Ze Gao <[email protected]> wrote:
>
> Dear all,
>
> Currently we use is_idle_task(current) check to decide if we need
> to call tick_nohz_irq_enter() on irq entry but use idle_cpu() to
> decide if we need to call tick_nohz_irq_exit() on irq exit.
> and we rely this pair (which internally calls tick_nohz_stop_idle()
> and tick_nohz_start_idle() separately) to do accurate idle time
> accounting in most cases.
>
> IIUC, idle_cpu() now is mainly for scheduler and for tick user,
> we seem to ask less than what idle_cpu() gives us and the use of
> idle_cpu() here only make things complicated which can be proved
> by the introduction of sched_core_idle_cpu() for forcing idle
> time accounting.
>
> So I make this RFC to do this cleanup and make things simple again
> here and it should introduce no functional changes.
>
> Reviews welcome and please let me know if I'm being stupid
> or missing something obvious.
>
> Regards,
> Ze
>
> --
>
> Ze Gao (2):
> timer: Use is_idle_task() check instead of idle_cpu() on irq exit
> sched/core: Remove sched_core_idle_cpu()
>
> include/linux/sched.h | 2 --
> kernel/sched/core.c | 13 -------------
> kernel/softirq.c | 2 +-
> 3 files changed, 1 insertion(+), 16 deletions(-)
>
> --
> 2.41.0
>

2024-06-07 22:46:31

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] timer: Use is_idle_task() check instead of idle_cpu() on irq exit

Le Thu, May 30, 2024 at 08:24:00AM -0400, Ze Gao a ?crit :
> idle_cpu() was initially introduced in irq_enter()/exit() to check
> whether an irq interrupts an idle cpu or not since commit
> 79bf2bb335b8 ("[PATCH] tick-management: dyntick / highres functionality")
> and at that time, it's implemented via a simple check if the curr
> of task of that rq is idle or not. And then commit 6378ddb59215 ("time:
> track accurate idle time with tick_sched.idle_sleeptime") uses the same
> check to do accurate idle time accounting.
>
> But since commit 908a3283728d ("sched: Fix idle_cpu()"), idle_cpu()
> takes scheduler stats into consideration and becomes more constrained,
> and therefore it tells more than if we have interrupted an idle
> process but also whether a cpu is going to be idle or not since it
> takes queued tasks and queued to be woken tasks into account.
>
> However for tick user, it is too much as now we only rely on this check
> to do nohz idle time accounting in tick_nohz_start_idle() just in case
> that tick_nohz_stop_idle() is called upon irq_enter() if we actually
> rupture an idle cpu(process). The use of idle_cpu() simply complicates
> things here, and the introduction of sched_core_idle_cpu() in
> commit 548796e2e70b ("sched/core: introduce sched_core_idle_cpu()")
> proves this.
>
> The use of is_idle_task() just like in commit 0a8a2e78b7ee ("timer: Fix
> bad idle check on irq entry") helps to save one unnecessary fix for idle
> time accounting for the newly force idle state. Note this also preps for
> the remove of sched_core_idle_cpu() in the following patch.
>
> Signed-off-by: Ze Gao <[email protected]>
> ---
> kernel/softirq.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 02582017759a..24c7bf3c3f6c 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -617,7 +617,7 @@ static inline void tick_irq_exit(void)
> int cpu = smp_processor_id();
>
> /* Make sure that timer wheel updates are propagated */
> - if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> + if ((is_idle_task(current) && !need_resched()) || tick_nohz_full_cpu(cpu)) {

The reason why there is a check here for idle_cpu() (or sched_core_idle_cpu())
is to avoid calling again tick_nohz_start_idle() and then again
tick_nohz_stop_idle() later from tick_nohz_idle_exit(). This can save two costly
calls to ktime_get() when a real task is waiting for the CPU. So any quick clue to
know if a task is going to be scheduled is good to get. And idle_cpu() gives
them all:

int idle_cpu(int cpu)
{
struct rq *rq = cpu_rq(cpu);

if (rq->curr != rq->idle)
return 0;

// This is the necessary is_idle_task() check

if (rq->nr_running)
return 0;

// This tells if there is a real task pending. Ok that check
// is perhaps a bit redundant with need_resched()...

#ifdef CONFIG_SMP
if (rq->ttwu_pending)
return 0;
#endif

// This one tells if there is a remote wakeup pending for this CPU.
// And need_resched() doesn't tell about that yet...

return 1;
}

So it looks to me that idle_cpu() is still a good fit at this place.
And sched_core_idle_cpu() doesn't bring more overhead since the static
key in sched_core_enabled() is rarely active (I guess...). And if it is,
then the check is even more simple.

Thanks.

> if (!in_hardirq())
> tick_nohz_irq_exit();
> }
> --
> 2.41.0
>

2024-06-13 02:24:53

by Ze Gao

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] timer: Use is_idle_task() check instead of idle_cpu() on irq exit

Hi Frederic,

Thanks for your reply and even more thanks for the detailed comments
and elaboration.

On Sat, Jun 8, 2024 at 6:46 AM Frederic Weisbecker <[email protected]> wrote:
>
> Le Thu, May 30, 2024 at 08:24:00AM -0400, Ze Gao a écrit :
> > idle_cpu() was initially introduced in irq_enter()/exit() to check
> > whether an irq interrupts an idle cpu or not since commit
> > 79bf2bb335b8 ("[PATCH] tick-management: dyntick / highres functionality")
> > and at that time, it's implemented via a simple check if the curr
> > of task of that rq is idle or not. And then commit 6378ddb59215 ("time:
> > track accurate idle time with tick_sched.idle_sleeptime") uses the same
> > check to do accurate idle time accounting.
> >
> > But since commit 908a3283728d ("sched: Fix idle_cpu()"), idle_cpu()
> > takes scheduler stats into consideration and becomes more constrained,
> > and therefore it tells more than if we have interrupted an idle
> > process but also whether a cpu is going to be idle or not since it
> > takes queued tasks and queued to be woken tasks into account.
> >
> > However for tick user, it is too much as now we only rely on this check
> > to do nohz idle time accounting in tick_nohz_start_idle() just in case
> > that tick_nohz_stop_idle() is called upon irq_enter() if we actually
> > rupture an idle cpu(process). The use of idle_cpu() simply complicates
> > things here, and the introduction of sched_core_idle_cpu() in
> > commit 548796e2e70b ("sched/core: introduce sched_core_idle_cpu()")
> > proves this.
> >
> > The use of is_idle_task() just like in commit 0a8a2e78b7ee ("timer: Fix
> > bad idle check on irq entry") helps to save one unnecessary fix for idle
> > time accounting for the newly force idle state. Note this also preps for
> > the remove of sched_core_idle_cpu() in the following patch.
> >
> > Signed-off-by: Ze Gao <[email protected]>
> > ---
> > kernel/softirq.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > index 02582017759a..24c7bf3c3f6c 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -617,7 +617,7 @@ static inline void tick_irq_exit(void)
> > int cpu = smp_processor_id();
> >
> > /* Make sure that timer wheel updates are propagated */
> > - if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> > + if ((is_idle_task(current) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
>
> The reason why there is a check here for idle_cpu() (or sched_core_idle_cpu())
> is to avoid calling again tick_nohz_start_idle() and then again
> tick_nohz_stop_idle() later from tick_nohz_idle_exit(). This can save two costly
> calls to ktime_get() when a real task is waiting for the CPU. So any quick clue to
> know if a task is going to be scheduled is good to get. And idle_cpu() gives
> them all:
>
> int idle_cpu(int cpu)
> {
> struct rq *rq = cpu_rq(cpu);
>
> if (rq->curr != rq->idle)
> return 0;
>
> // This is the necessary is_idle_task() check
>
> if (rq->nr_running)
> return 0;
>
> // This tells if there is a real task pending. Ok that check
> // is perhaps a bit redundant with need_resched()...
>
> #ifdef CONFIG_SMP
> if (rq->ttwu_pending)
> return 0;
> #endif
>
> // This one tells if there is a remote wakeup pending for this CPU.
> // And need_resched() doesn't tell about that yet...

Please correct me if I'm stupid here.

Is it possible that there is a time window between this becoming true and
schedule_idle(), which is TIF_NEED_RESCHED is not set in time and this CPU
will be doing arch idle again? If so, we're actually counting less
idle time than it is.

I will test if this is true and provide statistics later. Appreciate
your attention again:)

Thanks,
Ze

> return 1;
> }
>
> So it looks to me that idle_cpu() is still a good fit at this place.
> And sched_core_idle_cpu() doesn't bring more overhead since the static
> key in sched_core_enabled() is rarely active (I guess...). And if it is,
> then the check is even more simple.
>
> Thanks.
>
> > if (!in_hardirq())
> > tick_nohz_irq_exit();
> > }
> > --
> > 2.41.0
> >