2024-01-02 07:32:30

by alexs

[permalink] [raw]
Subject: [PATCH] sched/tracing: correct the task blocking state

From: Alex Shi <[email protected]>

commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
stopped the idle kthreads contribution to loadavg. Also task idle should
separated from blocked state too, otherwise we will get incorrect task
blocking state from event tracing sched:sched_stat_blocked.

Originally-from: Curu Wong <[email protected]>
Signed-off-by: Alex Shi <[email protected]>
To: [email protected]
To: Valentin Schneider <[email protected]>
To: Daniel Bristot de Oliveira <[email protected]>
To: Mel Gorman <[email protected]>
To: Ben Segall <[email protected]>
To: Steven Rostedt <[email protected]>
To: Dietmar Eggemann <[email protected]>
To: Vincent Guittot <[email protected]>
To: Juri Lelli <[email protected]>
To: Peter Zijlstra <[email protected]>
To: Ingo Molnar <[email protected]>
---
include/linux/sched.h | 4 ++++
kernel/sched/deadline.c | 2 +-
kernel/sched/fair.c | 2 +-
kernel/sched/rt.c | 2 +-
4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..341e62255ea7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -140,6 +140,10 @@ struct user_event_mm;
#define is_special_task_state(state) \
((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | TASK_DEAD))

+/* blocked task is UNINTERRUPTIBLE but not NOLOAD */
+#define is_blocked_task_state(state) \
+ ((state) & TASK_UNINTERRUPTIBLE && (!((state) & TASK_NOLOAD)))
+
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
# define debug_normal_state_change(state_value) \
do { \
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b28114478b82..b6afa596f071 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
__schedstat_set(p->stats.sleep_start,
rq_clock(rq_of_dl_rq(dl_rq)));

- if (state & TASK_UNINTERRUPTIBLE)
+ if (is_blocked_task_state(state))
__schedstat_set(p->stats.block_start,
rq_clock(rq_of_dl_rq(dl_rq)));
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7a3c63a2171..349b0c5104b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1281,7 +1281,7 @@ update_stats_dequeue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int fl
if (state & TASK_INTERRUPTIBLE)
__schedstat_set(tsk->stats.sleep_start,
rq_clock(rq_of(cfs_rq)));
- if (state & TASK_UNINTERRUPTIBLE)
+ if (is_blocked_task_state(state))
__schedstat_set(tsk->stats.block_start,
rq_clock(rq_of(cfs_rq)));
}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6aaf0a3d6081..2fdf3d71428d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1375,7 +1375,7 @@ update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
__schedstat_set(p->stats.sleep_start,
rq_clock(rq_of_rt_rq(rt_rq)));

- if (state & TASK_UNINTERRUPTIBLE)
+ if (is_blocked_task_state(state))
__schedstat_set(p->stats.block_start,
rq_clock(rq_of_rt_rq(rt_rq)));
}
--
2.43.0



2024-01-02 10:19:21

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched/tracing: correct the task blocking state

On 02/01/24 15:33, [email protected] wrote:
> From: Alex Shi <[email protected]>
>
> commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> stopped the idle kthreads contribution to loadavg. Also task idle should
> separated from blocked state too, otherwise we will get incorrect task
> blocking state from event tracing sched:sched_stat_blocked.
>

Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
loadavg yes, but they are still in an UNINTERRUPTIBLE wait.

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b28114478b82..b6afa596f071 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> __schedstat_set(p->stats.sleep_start,
> rq_clock(rq_of_dl_rq(dl_rq)));
>
> - if (state & TASK_UNINTERRUPTIBLE)
> + if (is_blocked_task_state(state))
> __schedstat_set(p->stats.block_start,
> rq_clock(rq_of_dl_rq(dl_rq)));

This change makes it so tasks waiting in TASK_IDLE have their waiting
ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).


2024-01-02 13:01:13

by Alex Shi

[permalink] [raw]
Subject: Re: [PATCH] sched/tracing: correct the task blocking state

On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <[email protected]> wrote:
>
> On 02/01/24 15:33, [email protected] wrote:
> > From: Alex Shi <[email protected]>
> >
> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > stopped the idle kthreads contribution to loadavg. Also task idle should
> > separated from blocked state too, otherwise we will get incorrect task
> > blocking state from event tracing sched:sched_stat_blocked.
> >
>
> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.


Hi Valentin,
Thanks a lot for the reply.

I agree with you the current usage, but if so, we account for the idle task into
blocked state. And it's better to distinguish between idle and block.

>
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index b28114478b82..b6afa596f071 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> > __schedstat_set(p->stats.sleep_start,
> > rq_clock(rq_of_dl_rq(dl_rq)));
> >
> > - if (state & TASK_UNINTERRUPTIBLE)
> > + if (is_blocked_task_state(state))
> > __schedstat_set(p->stats.block_start,
> > rq_clock(rq_of_dl_rq(dl_rq)));
>
> This change makes it so tasks waiting in TASK_IDLE have their waiting
> ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).
>

Right, maybe it's time to add a new state for TASK_IDLE?

Many thanks!
Alex Shi

2024-01-02 16:17:22

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched/tracing: correct the task blocking state

On 02/01/24 21:00, Alex Shi wrote:
> On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <[email protected]> wrote:
>>
>> On 02/01/24 15:33, [email protected] wrote:
>> > From: Alex Shi <[email protected]>
>> >
>> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
>> > stopped the idle kthreads contribution to loadavg. Also task idle should
>> > separated from blocked state too, otherwise we will get incorrect task
>> > blocking state from event tracing sched:sched_stat_blocked.
>> >
>>
>> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
>> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
>> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
>
>
> Hi Valentin,
> Thanks a lot for the reply.
>
> I agree with you the current usage, but if so, we account for the idle task into
> blocked state. And it's better to distinguish between idle and block.
>

Why is that an issue? If those tasks didn't have to be
TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').

What problem are you facing with those tasks being flagged as blocked during
their wait?


2024-01-03 03:15:18

by Alex Shi

[permalink] [raw]
Subject: Re: [PATCH] sched/tracing: correct the task blocking state

On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <[email protected]> wrote:
>
> On 02/01/24 21:00, Alex Shi wrote:
> > On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <[email protected]> wrote:
> >>
> >> On 02/01/24 15:33, [email protected] wrote:
> >> > From: Alex Shi <[email protected]>
> >> >
> >> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> >> > stopped the idle kthreads contribution to loadavg. Also task idle should
> >> > separated from blocked state too, otherwise we will get incorrect task
> >> > blocking state from event tracing sched:sched_stat_blocked.
> >> >
> >>
> >> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> >> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> >> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
> >
> >
> > Hi Valentin,
> > Thanks a lot for the reply.
> >
> > I agree with you the current usage, but if so, we account for the idle task into
> > blocked state. And it's better to distinguish between idle and block.
> >
>
> Why is that an issue? If those tasks didn't have to be
> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
>
> What problem are you facing with those tasks being flagged as blocked during
> their wait?
>

Uh, Tencent cloud has some latency sensitive services, a blocked state
means the service has
some trouble, but with IDLE state involved, it's failed on this judgement.
and 2nd, if a service has abnormal, we want to check if it's hanging
on io or sth else, but the top
3 D tasks are often queuework in our system, and even a task in
blocked state we have no
quick way to figure out if it's IDLE or BLOCKED. 2 different states
will help us a lot.

Thanks!

2024-01-03 07:21:17

by Alex Shi

[permalink] [raw]
Subject: Re: [PATCH] sched/tracing: correct the task blocking state

On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <[email protected]> wrote:
>
> On 02/01/24 15:33, [email protected] wrote:
> > From: Alex Shi <[email protected]>
> >
> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > stopped the idle kthreads contribution to loadavg. Also task idle should
> > separated from blocked state too, otherwise we will get incorrect task
> > blocking state from event tracing sched:sched_stat_blocked.
> >
>
> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
>
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index b28114478b82..b6afa596f071 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> > __schedstat_set(p->stats.sleep_start,
> > rq_clock(rq_of_dl_rq(dl_rq)));
> >
> > - if (state & TASK_UNINTERRUPTIBLE)
> > + if (is_blocked_task_state(state))
> > __schedstat_set(p->stats.block_start,
> > rq_clock(rq_of_dl_rq(dl_rq)));
>
> This change makes it so tasks waiting in TASK_IDLE have their waiting
> ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).

Right, I will fix it by adding idle time into sleep. will send the 2nd
version patch.

Thanks
Alex

2024-01-03 13:57:16

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched/tracing: correct the task blocking state

On 03/01/24 11:14, Alex Shi wrote:
> On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <[email protected]> wrote:
>>
>> Why is that an issue? If those tasks didn't have to be
>> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
>> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
>>
>> What problem are you facing with those tasks being flagged as blocked during
>> their wait?
>>
>
> Uh, Tencent cloud has some latency sensitive services, a blocked state
> means the service has
> some trouble, but with IDLE state involved, it's failed on this judgement.
> and 2nd, if a service has abnormal, we want to check if it's hanging
> on io or sth else, but the top
> 3 D tasks are often queuework in our system, and even a task in
> blocked state we have no
> quick way to figure out if it's IDLE or BLOCKED. 2 different states
> will help us a lot.
>

That's useful information - generally it's good to add the motivation for a
patch in the changelog.

> Thanks!


2024-01-04 07:57:42

by Alex Shi

[permalink] [raw]
Subject: Re: [PATCH] sched/tracing: correct the task blocking state

On Wed, Jan 3, 2024 at 9:56 PM Valentin Schneider <[email protected]> wrote:
>
> On 03/01/24 11:14, Alex Shi wrote:
> > On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <[email protected]> wrote:
> >>
> >> Why is that an issue? If those tasks didn't have to be
> >> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
> >> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
> >>
> >> What problem are you facing with those tasks being flagged as blocked during
> >> their wait?
> >>
> >
> > Uh, Tencent cloud has some latency sensitive services, a blocked state
> > means the service has
> > some trouble, but with IDLE state involved, it's failed on this judgement.
> > and 2nd, if a service has abnormal, we want to check if it's hanging
> > on io or sth else, but the top
> > 3 D tasks are often queuework in our system, and even a task in
> > blocked state we have no
> > quick way to figure out if it's IDLE or BLOCKED. 2 different states
> > will help us a lot.
> >
>
> That's useful information - generally it's good to add the motivation for a
> patch in the changelog.

Thanks a lot for coaching, I changed the commit log in the v2 patch:
https://lore.kernel.org/lkml/[email protected]/
Btw, which way is usual prefered, send v2 patch in a separate thread
or add '--in-reply-to' to follow this thread?

Thanks
Alex

>
> > Thanks!
>

2024-01-04 09:06:09

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched/tracing: correct the task blocking state

On 04/01/24 15:56, Alex Shi wrote:
> On Wed, Jan 3, 2024 at 9:56 PM Valentin Schneider <[email protected]> wrote:
>>
>> That's useful information - generally it's good to add the motivation for a
>> patch in the changelog.
>
> Thanks a lot for coaching, I changed the commit log in the v2 patch:
> https://lore.kernel.org/lkml/[email protected]/
> Btw, which way is usual prefered, send v2 patch in a separate thread
> or add '--in-reply-to' to follow this thread?
>

New threads are usually better, it makes it clear that it is a new version
of the patch and not just a reply to the previous version's
thread.

Reviewers/Maintainers can also jump more easily on the v(n+1) when it's a
separate thread, in case they missed / didn't have time to review the
previous version.

> Thanks
> Alex
>
>>
>> > Thanks!
>>