From: Alex Shi <[email protected]>
commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
stopped the idle kthreads contribution to loadavg. Also task idle should
separated from blocked state too, otherwise we will get incorrect task
blocking state from event tracing sched:sched_stat_blocked.
Originally-from: Curu Wong <[email protected]>
Signed-off-by: Alex Shi <[email protected]>
To: [email protected]
To: Valentin Schneider <[email protected]>
To: Daniel Bristot de Oliveira <[email protected]>
To: Mel Gorman <[email protected]>
To: Ben Segall <[email protected]>
To: Steven Rostedt <[email protected]>
To: Dietmar Eggemann <[email protected]>
To: Vincent Guittot <[email protected]>
To: Juri Lelli <[email protected]>
To: Peter Zijlstra <[email protected]>
To: Ingo Molnar <[email protected]>
---
include/linux/sched.h | 4 ++++
kernel/sched/deadline.c | 2 +-
kernel/sched/fair.c | 2 +-
kernel/sched/rt.c | 2 +-
4 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..341e62255ea7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -140,6 +140,10 @@ struct user_event_mm;
#define is_special_task_state(state) \
((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | TASK_DEAD))
+/* blocked task is UNINTERRUPTIBLE but not NOLOAD */
+#define is_blocked_task_state(state) \
+ ((state) & TASK_UNINTERRUPTIBLE && (!((state) & TASK_NOLOAD)))
+
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
# define debug_normal_state_change(state_value) \
do { \
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b28114478b82..b6afa596f071 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
__schedstat_set(p->stats.sleep_start,
rq_clock(rq_of_dl_rq(dl_rq)));
- if (state & TASK_UNINTERRUPTIBLE)
+ if (is_blocked_task_state(state))
__schedstat_set(p->stats.block_start,
rq_clock(rq_of_dl_rq(dl_rq)));
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7a3c63a2171..349b0c5104b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1281,7 +1281,7 @@ update_stats_dequeue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int fl
if (state & TASK_INTERRUPTIBLE)
__schedstat_set(tsk->stats.sleep_start,
rq_clock(rq_of(cfs_rq)));
- if (state & TASK_UNINTERRUPTIBLE)
+ if (is_blocked_task_state(state))
__schedstat_set(tsk->stats.block_start,
rq_clock(rq_of(cfs_rq)));
}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6aaf0a3d6081..2fdf3d71428d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1375,7 +1375,7 @@ update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
__schedstat_set(p->stats.sleep_start,
rq_clock(rq_of_rt_rq(rt_rq)));
- if (state & TASK_UNINTERRUPTIBLE)
+ if (is_blocked_task_state(state))
__schedstat_set(p->stats.block_start,
rq_clock(rq_of_rt_rq(rt_rq)));
}
--
2.43.0
On 02/01/24 15:33, [email protected] wrote:
> From: Alex Shi <[email protected]>
>
> commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> stopped the idle kthreads contribution to loadavg. Also task idle should
> separated from blocked state too, otherwise we will get incorrect task
> blocking state from event tracing sched:sched_stat_blocked.
>
Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b28114478b82..b6afa596f071 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> __schedstat_set(p->stats.sleep_start,
> rq_clock(rq_of_dl_rq(dl_rq)));
>
> - if (state & TASK_UNINTERRUPTIBLE)
> + if (is_blocked_task_state(state))
> __schedstat_set(p->stats.block_start,
> rq_clock(rq_of_dl_rq(dl_rq)));
This change makes it so tasks waiting in TASK_IDLE have their waiting
ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).
On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <[email protected]> wrote:
>
> On 02/01/24 15:33, [email protected] wrote:
> > From: Alex Shi <[email protected]>
> >
> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > stopped the idle kthreads contribution to loadavg. Also task idle should
> > separated from blocked state too, otherwise we will get incorrect task
> > blocking state from event tracing sched:sched_stat_blocked.
> >
>
> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
Hi Valentin,
Thanks a lot for the reply.
I agree with you the current usage, but if so, we account for the idle task into
blocked state. And it's better to distinguish between idle and block.
>
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index b28114478b82..b6afa596f071 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> > __schedstat_set(p->stats.sleep_start,
> > rq_clock(rq_of_dl_rq(dl_rq)));
> >
> > - if (state & TASK_UNINTERRUPTIBLE)
> > + if (is_blocked_task_state(state))
> > __schedstat_set(p->stats.block_start,
> > rq_clock(rq_of_dl_rq(dl_rq)));
>
> This change makes it so tasks waiting in TASK_IDLE have their waiting
> ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).
>
Right, maybe it's time to add a new state for TASK_IDLE?
Many thanks!
Alex Shi
On 02/01/24 21:00, Alex Shi wrote:
> On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <[email protected]> wrote:
>>
>> On 02/01/24 15:33, [email protected] wrote:
>> > From: Alex Shi <[email protected]>
>> >
>> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
>> > stopped the idle kthreads contribution to loadavg. Also task idle should
>> > separated from blocked state too, otherwise we will get incorrect task
>> > blocking state from event tracing sched:sched_stat_blocked.
>> >
>>
>> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
>> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
>> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
>
>
> Hi Valentin,
> Thanks a lot for the reply.
>
> I agree with you the current usage, but if so, we account for the idle task into
> blocked state. And it's better to distinguish between idle and block.
>
Why is that an issue? If those tasks didn't have to be
TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
What problem are you facing with those tasks being flagged as blocked during
their wait?
On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <[email protected]> wrote:
>
> On 02/01/24 21:00, Alex Shi wrote:
> > On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <[email protected]> wrote:
> >>
> >> On 02/01/24 15:33, [email protected] wrote:
> >> > From: Alex Shi <[email protected]>
> >> >
> >> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> >> > stopped the idle kthreads contribution to loadavg. Also task idle should
> >> > separated from blocked state too, otherwise we will get incorrect task
> >> > blocking state from event tracing sched:sched_stat_blocked.
> >> >
> >>
> >> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> >> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> >> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
> >
> >
> > Hi Valentin,
> > Thanks a lot for the reply.
> >
> > I agree with you the current usage, but if so, we account for the idle task into
> > blocked state. And it's better to distinguish between idle and block.
> >
>
> Why is that an issue? If those tasks didn't have to be
> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
>
> What problem are you facing with those tasks being flagged as blocked during
> their wait?
>
Uh, Tencent cloud has some latency sensitive services, a blocked state
means the service has
some trouble, but with IDLE state involved, it's failed on this judgement.
and 2nd, if a service has abnormal, we want to check if it's hanging
on io or sth else, but the top
3 D tasks are often queuework in our system, and even a task in
blocked state we have no
quick way to figure out if it's IDLE or BLOCKED. 2 different states
will help us a lot.
Thanks!
On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <[email protected]> wrote:
>
> On 02/01/24 15:33, [email protected] wrote:
> > From: Alex Shi <[email protected]>
> >
> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > stopped the idle kthreads contribution to loadavg. Also task idle should
> > separated from blocked state too, otherwise we will get incorrect task
> > blocking state from event tracing sched:sched_stat_blocked.
> >
>
> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
>
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index b28114478b82..b6afa596f071 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> > __schedstat_set(p->stats.sleep_start,
> > rq_clock(rq_of_dl_rq(dl_rq)));
> >
> > - if (state & TASK_UNINTERRUPTIBLE)
> > + if (is_blocked_task_state(state))
> > __schedstat_set(p->stats.block_start,
> > rq_clock(rq_of_dl_rq(dl_rq)));
>
> This change makes it so tasks waiting in TASK_IDLE have their waiting
> ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).
Right, I will fix it by adding idle time into sleep. will send the 2nd
version patch.
Thanks
Alex
On 03/01/24 11:14, Alex Shi wrote:
> On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <[email protected]> wrote:
>>
>> Why is that an issue? If those tasks didn't have to be
>> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
>> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
>>
>> What problem are you facing with those tasks being flagged as blocked during
>> their wait?
>>
>
> Uh, Tencent cloud has some latency sensitive services, a blocked state
> means the service has
> some trouble, but with IDLE state involved, it's failed on this judgement.
> and 2nd, if a service has abnormal, we want to check if it's hanging
> on io or sth else, but the top
> 3 D tasks are often queuework in our system, and even a task in
> blocked state we have no
> quick way to figure out if it's IDLE or BLOCKED. 2 different states
> will help us a lot.
>
That's useful information - generally it's good to add the motivation for a
patch in the changelog.
> Thanks!
On Wed, Jan 3, 2024 at 9:56 PM Valentin Schneider <[email protected]> wrote:
>
> On 03/01/24 11:14, Alex Shi wrote:
> > On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <[email protected]> wrote:
> >>
> >> Why is that an issue? If those tasks didn't have to be
> >> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
> >> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
> >>
> >> What problem are you facing with those tasks being flagged as blocked during
> >> their wait?
> >>
> >
> > Uh, Tencent cloud has some latency sensitive services, a blocked state
> > means the service has
> > some trouble, but with IDLE state involved, it's failed on this judgement.
> > and 2nd, if a service has abnormal, we want to check if it's hanging
> > on io or sth else, but the top
> > 3 D tasks are often queuework in our system, and even a task in
> > blocked state we have no
> > quick way to figure out if it's IDLE or BLOCKED. 2 different states
> > will help us a lot.
> >
>
> That's useful information - generally it's good to add the motivation for a
> patch in the changelog.
Thanks a lot for coaching, I changed the commit log in the v2 patch:
https://lore.kernel.org/lkml/[email protected]/
Btw, which way is usual prefered, send v2 patch in a separate thread
or add '--in-reply-to' to follow this thread?
Thanks
Alex
>
> > Thanks!
>
On 04/01/24 15:56, Alex Shi wrote:
> On Wed, Jan 3, 2024 at 9:56 PM Valentin Schneider <[email protected]> wrote:
>>
>> That's useful information - generally it's good to add the motivation for a
>> patch in the changelog.
>
> Thanks a lot for coaching, I changed the commit log in the v2 patch:
> https://lore.kernel.org/lkml/[email protected]/
> Btw, which way is usual prefered, send v2 patch in a separate thread
> or add '--in-reply-to' to follow this thread?
>
New threads are usually better, it makes it clear that it is a new version
of the patch and not just a reply to the previous version's
thread.
Reviewers/Maintainers can also jump more easily on the v(n+1) when it's a
separate thread, in case they missed / didn't have time to review the
previous version.
> Thanks
> Alex
>
>>
>> > Thanks!
>>