2023-05-16 12:46:57

by Alexandru Ardelean

[permalink] [raw]
Subject: [PATCH][V2] sched/rt: Print curr when RT throttling activated

From: Xianting Tian <[email protected]>

We may meet the issue, that one RT thread occupied the cpu by 950ms/1s,
The RT thread maybe is a business thread or other unknown thread.

Currently, it only outputs the print "sched: RT throttling activated"
when RT throttling happen. It is hard to know what is the RT thread,
For further analysis, we need add more prints.

This patch is to print current RT task when RT throttling activated,
It help us to know what is the RT thread in the first time.

Tested-by: Alexandru Ardelean <[email protected]>
Signed-off-by: Xianting Tian <[email protected]>
---

Initial patch submission:
https://lore.kernel.org/all/[email protected]/T/

We've been having some issues of our own with some applications + some RT
configuration == some threads endded up taking too much CPU time.
This patch came in quite in handy to see in logs which thread is more
problematic.

We've applied this patch onto a 5.10.116 tree. It did need a bit of
re-applying since some context has changed since the initial version (hence
the V2 tag).
Since 5.10.116 (where we used it), it applied nicely to the latest/current
linux-next tree (hence the Tested-by tag).

It would be nice to apply this to the mainline kernel and have this handy.

kernel/sched/rt.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 00e0e5074115..44b161e42733 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -995,7 +995,7 @@ static inline int rt_se_prio(struct sched_rt_entity *rt_se)
return rt_task_of(rt_se)->prio;
}

-static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
+static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq, struct task_struct *curr)
{
u64 runtime = sched_rt_runtime(rt_rq);

@@ -1019,7 +1019,8 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
*/
if (likely(rt_b->rt_runtime)) {
rt_rq->rt_throttled = 1;
- printk_deferred_once("sched: RT throttling activated\n");
+ printk_deferred_once("sched: RT throttling activated (curr: pid %d, comm %s)\n",
+ curr->pid, curr->comm);
} else {
/*
* In case we did anyway, make it go away,
@@ -1074,7 +1075,7 @@ static void update_curr_rt(struct rq *rq)
if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
raw_spin_lock(&rt_rq->rt_runtime_lock);
rt_rq->rt_time += delta_exec;
- exceeded = sched_rt_runtime_exceeded(rt_rq);
+ exceeded = sched_rt_runtime_exceeded(rt_rq, curr);
if (exceeded)
resched_curr(rq);
raw_spin_unlock(&rt_rq->rt_runtime_lock);
--
2.40.1



2023-07-13 20:14:06

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH][V2] sched/rt: Print curr when RT throttling activated

On Tue, May 16, 2023 at 3:22 PM Alexandru Ardelean <[email protected]> wrote:
>
> From: Xianting Tian <[email protected]>
>
> We may meet the issue, that one RT thread occupied the cpu by 950ms/1s,
> The RT thread maybe is a business thread or other unknown thread.
>
> Currently, it only outputs the print "sched: RT throttling activated"
> when RT throttling happen. It is hard to know what is the RT thread,
> For further analysis, we need add more prints.
>
> This patch is to print current RT task when RT throttling activated,
> It help us to know what is the RT thread in the first time.
>

Adding Greg on this patch, since it 's been a while.
And also: ping :)

> Tested-by: Alexandru Ardelean <[email protected]>
> Signed-off-by: Xianting Tian <[email protected]>
> ---
>
> Initial patch submission:
> https://lore.kernel.org/all/[email protected]/T/
>
> We've been having some issues of our own with some applications + some RT
> configuration == some threads endded up taking too much CPU time.
> This patch came in quite in handy to see in logs which thread is more
> problematic.
>
> We've applied this patch onto a 5.10.116 tree. It did need a bit of
> re-applying since some context has changed since the initial version (hence
> the V2 tag).
> Since 5.10.116 (where we used it), it applied nicely to the latest/current
> linux-next tree (hence the Tested-by tag).
>
> It would be nice to apply this to the mainline kernel and have this handy.
>
> kernel/sched/rt.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 00e0e5074115..44b161e42733 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -995,7 +995,7 @@ static inline int rt_se_prio(struct sched_rt_entity *rt_se)
> return rt_task_of(rt_se)->prio;
> }
>
> -static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
> +static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq, struct task_struct *curr)
> {
> u64 runtime = sched_rt_runtime(rt_rq);
>
> @@ -1019,7 +1019,8 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
> */
> if (likely(rt_b->rt_runtime)) {
> rt_rq->rt_throttled = 1;
> - printk_deferred_once("sched: RT throttling activated\n");
> + printk_deferred_once("sched: RT throttling activated (curr: pid %d, comm %s)\n",
> + curr->pid, curr->comm);
> } else {
> /*
> * In case we did anyway, make it go away,
> @@ -1074,7 +1075,7 @@ static void update_curr_rt(struct rq *rq)
> if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
> raw_spin_lock(&rt_rq->rt_runtime_lock);
> rt_rq->rt_time += delta_exec;
> - exceeded = sched_rt_runtime_exceeded(rt_rq);
> + exceeded = sched_rt_runtime_exceeded(rt_rq, curr);
> if (exceeded)
> resched_curr(rq);
> raw_spin_unlock(&rt_rq->rt_runtime_lock);
> --
> 2.40.1
>

2024-04-01 15:48:24

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH][V2] sched/rt: Print curr when RT throttling activated

On Thu, Jul 13, 2023 at 11:07 PM Alexandru Ardelean <[email protected]> wrote:
>
> On Tue, May 16, 2023 at 3:22 PM Alexandru Ardelean <alex@shruggiero> wrote:
> >
> > From: Xianting Tian <[email protected]>
> >
> > We may meet the issue, that one RT thread occupied the cpu by 950ms/1s,
> > The RT thread maybe is a business thread or other unknown thread.
> >
> > Currently, it only outputs the print "sched: RT throttling activated"
> > when RT throttling happen. It is hard to know what is the RT thread,
> > For further analysis, we need add more prints.
> >
> > This patch is to print current RT task when RT throttling activated,
> > It help us to know what is the RT thread in the first time.
> >
>
> Adding Greg on this patch, since it 's been a while.
> And also: ping :)

Ping on this :)

Thanks
Alex

>
> > Tested-by: Alexandru Ardelean <[email protected]>
> > Signed-off-by: Xianting Tian <[email protected]>
> > ---
> >
> > Initial patch submission:
> > https://lore.kernel.org/all/[email protected]/T/
> >
> > We've been having some issues of our own with some applications + some RT
> > configuration == some threads endded up taking too much CPU time.
> > This patch came in quite in handy to see in logs which thread is more
> > problematic.
> >
> > We've applied this patch onto a 5.10.116 tree. It did need a bit of
> > re-applying since some context has changed since the initial version (hence
> > the V2 tag).
> > Since 5.10.116 (where we used it), it applied nicely to the latest/current
> > linux-next tree (hence the Tested-by tag).
> >
> > It would be nice to apply this to the mainline kernel and have this handy.
> >
> > kernel/sched/rt.c | 7 ++++---
> > 1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 00e0e5074115..44b161e42733 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -995,7 +995,7 @@ static inline int rt_se_prio(struct sched_rt_entity *rt_se)
> > return rt_task_of(rt_se)->prio;
> > }
> >
> > -static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
> > +static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq, struct task_struct *curr)
> > {
> > u64 runtime = sched_rt_runtime(rt_rq);
> >
> > @@ -1019,7 +1019,8 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
> > */
> > if (likely(rt_b->rt_runtime)) {
> > rt_rq->rt_throttled = 1;
> > - printk_deferred_once("sched: RT throttling activated\n");
> > + printk_deferred_once("sched: RT throttling activated (curr: pid %d, comm %s)\n",
> > + curr->pid, curr->comm);
> > } else {
> > /*
> > * In case we did anyway, make it go away,
> > @@ -1074,7 +1075,7 @@ static void update_curr_rt(struct rq *rq)
> > if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
> > raw_spin_lock(&rt_rq->rt_runtime_lock);
> > rt_rq->rt_time += delta_exec;
> > - exceeded = sched_rt_runtime_exceeded(rt_rq);
> > + exceeded = sched_rt_runtime_exceeded(rt_rq, curr);
> > if (exceeded)
> > resched_curr(rq);
> > raw_spin_unlock(&rt_rq->rt_runtime_lock);
> > --
> > 2.40.1
> >

2024-04-08 15:26:04

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH][V2] sched/rt: Print curr when RT throttling activated

On 01/04/24 18:47, Alexandru Ardelean wrote:
> On Thu, Jul 13, 2023 at 11:07 PM Alexandru Ardelean <[email protected]> wrote:
>>
>> On Tue, May 16, 2023 at 3:22 PM Alexandru Ardelean <[email protected]> wrote:
>> >
>> > From: Xianting Tian <[email protected]>
>> >
>> > We may meet the issue, that one RT thread occupied the cpu by 950ms/1s,
>> > The RT thread maybe is a business thread or other unknown thread.
>> >
>> > Currently, it only outputs the print "sched: RT throttling activated"
>> > when RT throttling happen. It is hard to know what is the RT thread,
>> > For further analysis, we need add more prints.
>> >
>> > This patch is to print current RT task when RT throttling activated,
>> > It help us to know what is the RT thread in the first time.
>> >
>>
>> Adding Greg on this patch, since it 's been a while.
>> And also: ping :)
>
> Ping on this :)
>

AFAIA this has been proposed a few times in the past, the problem is that
printing the current task isn't the right thing to do as it may not be the
one that contributed most of the runtime that lead to throttling.

See https://lore.kernel.org/lkml/[email protected]/


2024-04-17 06:49:00

by Alexandru Ardelean

[permalink] [raw]
Subject: Re: [PATCH][V2] sched/rt: Print curr when RT throttling activated

On Mon, Apr 8, 2024 at 6:25 PM Valentin Schneider <[email protected]> wrote:
>
> On 01/04/24 18:47, Alexandru Ardelean wrote:
> > On Thu, Jul 13, 2023 at 11:07 PM Alexandru Ardelean <[email protected]> wrote:
> >>
> >> On Tue, May 16, 2023 at 3:22 PM Alexandru Ardelean <[email protected]> wrote:
> >> >
> >> > From: Xianting Tian <[email protected]>
> >> >
> >> > We may meet the issue, that one RT thread occupied the cpu by 950ms/1s,
> >> > The RT thread maybe is a business thread or other unknown thread.
> >> >
> >> > Currently, it only outputs the print "sched: RT throttling activated"
> >> > when RT throttling happen. It is hard to know what is the RT thread,
> >> > For further analysis, we need add more prints.
> >> >
> >> > This patch is to print current RT task when RT throttling activated,
> >> > It help us to know what is the RT thread in the first time.
> >> >
> >>
> >> Adding Greg on this patch, since it 's been a while.
> >> And also: ping :)
> >
> > Ping on this :)
> >
>
> AFAIA this has been proposed a few times in the past, the problem is that
> printing the current task isn't the right thing to do as it may not be the
> one that contributed most of the runtime that lead to throttling.
>
> See https://lore.kernel.org/lkml/[email protected]/

Ah.
Thanks for the input :)
I'll shut up on this.

Thanks
Alex

>