2024-01-03 08:09:15

by alexs

Subject: [PATCH v2] sched/stat: correct the task blocking state

From: Alex Shi <[email protected]>

The commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
stopped the idle kthreads from contributing to the load average. However,
the idle state time still contributes to the blocked state time instead of
the sleep time. As a result, we cannot tell whether a task is blocked for
some reason or is idle on its own initiative.

Distinguishing between these two states would make the system state clearer
and provide us with an opportunity to use the 'D' state of a task as an
indicator of latency issues.

Originally-from: Curu Wong <[email protected]>
Signed-off-by: Alex Shi <[email protected]>
To: [email protected]
To: Valentin Schneider <[email protected]>
To: Daniel Bristot de Oliveira <[email protected]>
To: Mel Gorman <[email protected]>
To: Ben Segall <[email protected]>
To: Steven Rostedt <[email protected]>
To: Dietmar Eggemann <[email protected]>
To: Vincent Guittot <[email protected]>
To: Juri Lelli <[email protected]>
To: Peter Zijlstra <[email protected]>
To: Ingo Molnar <[email protected]>
---
include/linux/sched.h | 6 ++++++
kernel/sched/deadline.c | 5 +++--
kernel/sched/fair.c | 5 +++--
kernel/sched/rt.c | 5 +++--
4 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..002f80291837 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -140,6 +140,12 @@ struct user_event_mm;
#define is_special_task_state(state) \
((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | TASK_DEAD))

+/* blocked task is UNINTERRUPTIBLE but not NOLOAD */
+#define is_blocked_state(state) \
+ ((state) & TASK_UNINTERRUPTIBLE && (!((state) & TASK_NOLOAD)))
+
+#define is_idle_state(state) (((state) & TASK_IDLE) == TASK_IDLE)
+
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
# define debug_normal_state_change(state_value) \
do { \
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b28114478b82..99d46affc2aa 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1566,11 +1566,12 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
unsigned int state;

state = READ_ONCE(p->__state);
- if (state & TASK_INTERRUPTIBLE)
+ /* idle state still accounts into sleep */
+ if (state & TASK_INTERRUPTIBLE || is_idle_state(state))
__schedstat_set(p->stats.sleep_start,
rq_clock(rq_of_dl_rq(dl_rq)));

- if (state & TASK_UNINTERRUPTIBLE)
+ if (is_blocked_state(state))
__schedstat_set(p->stats.block_start,
rq_clock(rq_of_dl_rq(dl_rq)));
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7a3c63a2171..69506253aadf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1278,10 +1278,11 @@ update_stats_dequeue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int fl

/* XXX racy against TTWU */
state = READ_ONCE(tsk->__state);
- if (state & TASK_INTERRUPTIBLE)
+ /* idle state still accounts into sleep */
+ if (state & TASK_INTERRUPTIBLE || is_idle_state(state))
__schedstat_set(tsk->stats.sleep_start,
rq_clock(rq_of(cfs_rq)));
- if (state & TASK_UNINTERRUPTIBLE)
+ if (is_blocked_state(state))
__schedstat_set(tsk->stats.block_start,
rq_clock(rq_of(cfs_rq)));
}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6aaf0a3d6081..dd0e381689f8 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1371,11 +1371,12 @@ update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
unsigned int state;

state = READ_ONCE(p->__state);
- if (state & TASK_INTERRUPTIBLE)
+ /* idle state still accounts into sleep */
+ if (state & TASK_INTERRUPTIBLE || is_idle_state(state))
__schedstat_set(p->stats.sleep_start,
rq_clock(rq_of_rt_rq(rt_rq)));

- if (state & TASK_UNINTERRUPTIBLE)
+ if (is_blocked_state(state))
__schedstat_set(p->stats.block_start,
rq_clock(rq_of_rt_rq(rt_rq)));
}
--
2.43.0
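
For readers following the thread, the effect of the two new macros can be tried outside the kernel. The following is a minimal userspace sketch, not part of the patch: the state bit values are copied by hand and are assumed to match include/linux/sched.h at the time of posting, and starts_sleep_clock() / starts_block_clock() are hypothetical helpers that only mirror the two conditions in update_stats_dequeue_*().

```c
#include <stdbool.h>

/*
 * Task state bits, copied here for illustration only. The values are
 * assumed to match include/linux/sched.h at the time of this patch.
 */
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_NOLOAD		0x0400
#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* The two helpers as proposed by the patch. */
#define is_blocked_state(state) \
	((state) & TASK_UNINTERRUPTIBLE && (!((state) & TASK_NOLOAD)))
#define is_idle_state(state)	(((state) & TASK_IDLE) == TASK_IDLE)

/* Hypothetical helper: would this state set stats.sleep_start? */
static bool starts_sleep_clock(unsigned int state)
{
	return (state & TASK_INTERRUPTIBLE) || is_idle_state(state);
}

/* Hypothetical helper: would this state set stats.block_start? */
static bool starts_block_clock(unsigned int state)
{
	return is_blocked_state(state);
}
```

Under these assumptions, TASK_IDLE is accounted as sleep only, plain TASK_UNINTERRUPTIBLE as blocked only, and TASK_INTERRUPTIBLE as sleep only, which is the split the changelog describes.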



2024-01-04 10:39:00

by Valentin Schneider

Subject: Re: [PATCH v2] sched/stat: correct the task blocking state

On 03/01/24 16:10, [email protected] wrote:
> From: Alex Shi <[email protected]>
>
> The commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> stopped the idle kthreads from contributing to the load average. However,
> the idle state time still contributes to the blocked state time instead of
> the sleep time. As a result, we cannot determine if a task is stopped due
> to some reasons or if it is idle by its own initiative.
>
> Distinguishing between these two states would make the system state clearer
> and provide us with an opportunity to use the 'D' state of a task as an
> indicator of latency issues.
>

get_task_state() already reports TASK_IDLE as 'I', which should be what
userspace sees (e.g. via /proc/$pid/stat). This is also the case for the
sched_switch and sched_wakeup trace events.

I assume what you mean here is you first turn to schedstats to check
whether there is any abnormal amount of blocking, and then if there is any
you turn to tracing, in which case you'd like the schedstats to not make
things look worse than they really are?
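
As an illustration of the userspace view mentioned above, here is a small sketch of how a tool might pull the state letter ('D', 'I', 'S', ...) out of one line of /proc/$pid/stat. The helper name stat_line_state() is hypothetical, and the sketch assumes the usual "pid (comm) state ..." layout of that file.

```c
#include <string.h>

/*
 * Return the state character from one line of /proc/<pid>/stat,
 * or '\0' if the line does not look as expected. The comm field is
 * wrapped in parentheses and may itself contain ')' or spaces, so
 * search for the last ')' instead of splitting on whitespace.
 */
static char stat_line_state(const char *line)
{
	const char *p = strrchr(line, ')');

	/* Expect ") X" where X is the single-letter state. */
	if (!p || p[1] != ' ' || p[2] == '\0')
		return '\0';
	return p[2];
}
```

A caller would read the file into a buffer and pass it to stat_line_state(); this only shows the instantaneous state, whereas the schedstats fields touched by the patch accumulate time.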


2024-01-04 11:30:01

by Alex Shi

Subject: Re: [PATCH v2] sched/stat: correct the task blocking state

On Thu, Jan 4, 2024 at 6:38 PM Valentin Schneider <[email protected]> wrote:
>
> On 03/01/24 16:10, [email protected] wrote:
> > From: Alex Shi <[email protected]>
> >
> > The commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > stopped the idle kthreads from contributing to the load average. However,
> > the idle state time still contributes to the blocked state time instead of
> > the sleep time. As a result, we cannot determine if a task is stopped due
> > to some reasons or if it is idle by its own initiative.
> >
> > Distinguishing between these two states would make the system state clearer
> > and provide us with an opportunity to use the 'D' state of a task as an
> > indicator of latency issues.
> >
>
> get_task_state() already reports TASK_IDLE as 'I', which should be what
> userspace sees (e.g. via /proc/$pid/stat). This is also the case for the
> sched_switch and sched_wakeup trace events.
>
> I assume what you mean here is you first turn to schedstats to check
> whether there is any abnormal amount of blocking, and then if there is any
> you turn to tracing, in which case you'd like the schedstats to not make
> things look worse than they really are?

Yes, the switch/wakeup events or other tracing can help figure out the
real blocked time, but with this change schedstats alone gives a neat
and simple way to see it.

Thanks!
>

2024-01-17 08:49:01

by Alex Shi

Subject: Re: [PATCH v2] sched/stat: correct the task blocking state

On Thu, Jan 4, 2024 at 19:29, Alex Shi <[email protected]> wrote:
>
> On Thu, Jan 4, 2024 at 6:38 PM Valentin Schneider <[email protected]> wrote:
> >
> > On 03/01/24 16:10, [email protected] wrote:
> > > From: Alex Shi <[email protected]>
> > >
> > > The commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > > stopped the idle kthreads from contributing to the load average. However,
> > > the idle state time still contributes to the blocked state time instead of
> > > the sleep time. As a result, we cannot determine if a task is stopped due
> > > to some reasons or if it is idle by its own initiative.
> > >
> > > Distinguishing between these two states would make the system state clearer
> > > and provide us with an opportunity to use the 'D' state of a task as an
> > > indicator of latency issues.
> > >
> >
> > get_task_state() already reports TASK_IDLE as 'I', which should be what
> > userspace sees (e.g. via /proc/$pid/stat). This is also the case for the
> > sched_switch and sched_wakeup trace events.
> >
> > I assume what you mean here is you first turn to schedstats to check
> > whether there is any abnormal amount of blocking, and then if there is any
> > you turn to tracing, in which case you'd like the schedstats to not make
> > things look worse than they really are?
>
> Yes, switch/wakeup or others could help to figure out real blocked
> time, but with this change, schedstats could give a neat and elegant
> way.

The only drawback I can think of is that blocked and sleep time can no
longer be interpreted the way they were before, which may force some
scripts to change. Still, accounting the two states correctly is the
better approach, and it makes the sleep/block statistics more meaningful
and useful in normal usage. Are there any concerns about this?

Thanks
Alex