Subject: [PATCH RT 1/2] softirq: Avoid "local_softirq_pending" messages if ksoftirqd is blocked

If the ksoftirqd thread has a softirq pending and is blocked on the
`local_softirq_locks' lock then softirq_check_pending_idle() won't
complain because the "lock owner" will mask away this softirq from the
mask of pending softirqs.
If ksoftirqd has an additional softirq pending then it won't be masked
out because we never look at ksoftirqd's mask.

If there are still pending softirqs while going to idle, check
ksoftirqd's and ktimersoftd's masks before complaining about unhandled
softirqs.

Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/softirq.c | 58 +++++++++++++++++++++++++++++++++++-------------
1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index c15583162a559..8da664eb843a4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -92,6 +92,31 @@ static inline void softirq_clr_runner(unsigned int sirq)
sr->runner[sirq] = NULL;
}

+static bool softirq_check_runner_tsk(struct task_struct *tsk,
+ unsigned int *pending)
+{
+ bool ret = false;
+
+ if (!tsk)
+ return ret;
+
+ /*
+ * The wakeup code in rtmutex.c wakes up the task
+ * _before_ it sets pi_blocked_on to NULL under
+ * tsk->pi_lock. So we need to check for both: state
+ * and pi_blocked_on.
+ */
+ raw_spin_lock(&tsk->pi_lock);
+ if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING) {
+ /* Clear all bits pending in that task */
+ *pending &= ~(tsk->softirqs_raised);
+ ret = true;
+ }
+ raw_spin_unlock(&tsk->pi_lock);
+
+ return ret;
+}
+
/*
* On preempt-rt a softirq running context might be blocked on a
* lock. There might be no other runnable task on this CPU because the
@@ -104,6 +129,7 @@ static inline void softirq_clr_runner(unsigned int sirq)
*/
void softirq_check_pending_idle(void)
{
+ struct task_struct *tsk;
static int rate_limit;
struct softirq_runner *sr = this_cpu_ptr(&softirq_runners);
u32 warnpending;
@@ -113,26 +139,26 @@ void softirq_check_pending_idle(void)
return;

warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK;
+ if (!warnpending)
+ return;
for (i = 0; i < NR_SOFTIRQS; i++) {
- struct task_struct *tsk = sr->runner[i];
+ tsk = sr->runner[i];

- /*
- * The wakeup code in rtmutex.c wakes up the task
- * _before_ it sets pi_blocked_on to NULL under
- * tsk->pi_lock. So we need to check for both: state
- * and pi_blocked_on.
- */
- if (tsk) {
- raw_spin_lock(&tsk->pi_lock);
- if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING) {
- /* Clear all bits pending in that task */
- warnpending &= ~(tsk->softirqs_raised);
- warnpending &= ~(1 << i);
- }
- raw_spin_unlock(&tsk->pi_lock);
- }
+ if (softirq_check_runner_tsk(tsk, &warnpending))
+ warnpending &= ~(1 << i);
}

+ if (warnpending) {
+ tsk = __this_cpu_read(ksoftirqd);
+ softirq_check_runner_tsk(tsk, &warnpending);
+ }
+
+ if (warnpending) {
+ tsk = __this_cpu_read(ktimer_softirqd);
+ softirq_check_runner_tsk(tsk, &warnpending);
+ }
+
+
if (warnpending) {
printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
warnpending);
--
2.20.1
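
The masking logic after this patch can be condensed into a small userspace
model (plain C, compilable on its own; the struct, the STOP_IDLE_MASK value
and the helper names are illustrative stand-ins for the kernel's per-CPU
data, not the real implementation):

#include <stdbool.h>
#include <stdio.h>

#define NR_SOFTIRQS             10
#define STOP_IDLE_MASK          0x3ffu  /* stand-in for SOFTIRQ_STOP_IDLE_MASK */

/* Minimal stand-in for the per-task data the check looks at. */
struct task {
        bool blocked_or_running;        /* pi_blocked_on || state == TASK_RUNNING */
        unsigned int softirqs_raised;
};

/* Mirrors the role of softirq_check_runner_tsk(): if the task is runnable
 * (or merely blocked on a lock), its raised softirqs will still be handled,
 * so drop them from the warning mask. */
static bool check_runner(struct task *tsk, unsigned int *pending)
{
        if (!tsk)
                return false;
        if (!tsk->blocked_or_running)
                return false;
        *pending &= ~tsk->softirqs_raised;
        return true;
}

/* Shape of softirq_check_pending_idle() after the patch: first the
 * per-softirq runners, then ksoftirqd, then ktimersoftd. */
static void check_pending_idle(unsigned int pending, struct task *runner[],
                               struct task *ksoftirqd, struct task *ktimersoftd)
{
        unsigned int warnpending = pending & STOP_IDLE_MASK;
        int i;

        if (!warnpending)
                return;

        for (i = 0; i < NR_SOFTIRQS; i++)
                if (check_runner(runner[i], &warnpending))
                        warnpending &= ~(1u << i);

        /* New: the two softirq threads' masks are consulted as well. */
        if (warnpending)
                check_runner(ksoftirqd, &warnpending);
        if (warnpending)
                check_runner(ktimersoftd, &warnpending);

        if (warnpending)
                printf("NOHZ: local_softirq_pending %02x\n", warnpending);
}

int main(void)
{
        /* ksoftirqd is blocked on a lock with two softirqs raised.  Before
         * the patch only the per-softirq runner entries were consulted, so
         * the bits raised on ksoftirqd triggered the bogus warning; now
         * they are cleared via ksoftirqd's mask and nothing is printed. */
        struct task ksoftirqd = {
                .blocked_or_running = true,
                .softirqs_raised = (1u << 1) | (1u << 3),
        };
        struct task *runner[NR_SOFTIRQS] = { NULL };

        check_pending_idle((1u << 1) | (1u << 3), runner, &ksoftirqd, NULL);
        return 0;
}

The point of the change sits in the last two if (warnpending) blocks: even
when no per-softirq runner claims a pending bit, a blocked ksoftirqd or
ktimersoftd still does, so no warning is emitted.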



Subject: [PATCH RT 2/2] x86: lazy-preempt: properly check against preempt-mask

should_resched() should compare the preempt count against preempt_offset
after masking out the need-resched bit. Otherwise should_resched() won't
work for preempt_offset != 0 with lazy-preempt enabled.

Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/include/asm/preempt.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 22992c8377952..f667087792747 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -118,7 +118,7 @@ static __always_inline bool should_resched(int preempt_offset)

/* preempt count == 0 ? */
tmp &= ~PREEMPT_NEED_RESCHED;
- if (tmp)
+ if (tmp != preempt_offset)
return false;
if (current_thread_info()->preempt_lazy_count)
return false;
--
2.20.1
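
The effect of the one-line change is easier to see with concrete numbers.
Below is a rough userspace model of only the fragment touched by the diff,
treating the x86 preempt count as a plain integer with the need-resched flag
folded in as an inverted bit; the lazy-preempt checks that follow in the
real function are left out:

#include <stdbool.h>
#include <stdio.h>

/* On x86 the need-resched flag lives inside the preempt count as an
 * inverted bit; it has to be masked out before looking at the count. */
#define PREEMPT_NEED_RESCHED 0x80000000u

/* Old form of the check that the patch replaces: after masking out the
 * need-resched bit, only a count of exactly 0 passes. */
static bool count_check_old(unsigned int count, unsigned int preempt_offset)
{
        unsigned int tmp = count & ~PREEMPT_NEED_RESCHED;

        (void)preempt_offset;   /* the old code never looked at it here */
        return tmp == 0;        /* i.e. "if (tmp) return false;" */
}

/* Fixed form: the count must match the offset the caller is allowed to
 * hold, e.g. the PREEMPT_LOCK_OFFSET that cond_resched_lock() passes. */
static bool count_check_new(unsigned int count, unsigned int preempt_offset)
{
        unsigned int tmp = count & ~PREEMPT_NEED_RESCHED;

        return tmp == preempt_offset;
}

int main(void)
{
        /* One preemption level legitimately held (count == 1), reschedule
         * requested (inverted flag cleared), caller passes offset 1. */
        unsigned int count = 1;

        printf("old: %d  new: %d\n",
               count_check_old(count, 1), count_check_new(count, 1));
        /* old: 0 -- the lazy path never reschedules with a non-zero
         * offset; new: 1 -- it does. */
        return 0;
}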


2019-02-19 15:00:31

by Juri Lelli

Subject: Re: [PATCH RT 1/2] softirq: Avoid "local_softirq_pending" messages if ksoftirqd is blocked

Hi,

On 18/02/19 17:31, Sebastian Andrzej Siewior wrote:
> If the ksoftirqd thread has a softirq pending and is blocked on the
> `local_softirq_locks' lock then softirq_check_pending_idle() won't
> complain because the "lock owner" will mask away this softirq from the
> mask of pending softirqs.
> If ksoftirqd has an additional softirq pending then it won't be masked
> out because we never look at ksoftirqd's mask.
>
> If there are still pending softirqs while going to idle, check
> ksoftirqd's and ktimersoftd's masks before complaining about unhandled
> softirqs.
>
> Cc: [email protected]
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>

I've been seeing those messages while running some stress tests (hog
tasks pinned to CPUs).

Have yet to see them after I applied this patch earlier this morning (it
usually took not much time to reproduce).

Tested-by: Juri Lelli <[email protected]>

Thanks!

- Juri

Subject: Re: [PATCH RT 1/2] softirq: Avoid "local_softirq_pending" messages if ksoftirqd is blocked

On 2019-02-19 15:58:26 [+0100], Juri Lelli wrote:
> Hi,
Hi,

> I've been seeing those messages while running some stress tests (hog
> tasks pinned to CPUs).
>
> Have yet to see them after I applied this patch earlier this morning (it
> usually took not much time to reproduce).

So is it better or not? I kind of read this as "nothing changed".

> Tested-by: Juri Lelli <[email protected]>
>
> Thanks!
>
> - Juri

Sebastian

Subject: [PATCH RT 4/2] hrtimer: Don't lose state in cpu_chill()

In cpu_chill() the state is set to TASK_UNINTERRUPTIBLE and a timer is
programmed. On return the state is always TASK_RUNNING, which means we
lose the state if it was something other than RUNNING. Also,
set_current_state() sets ->task_state_change to a location within
cpu_chill(), which is not expected.

Save the task state on entry and restore it on return. Simply set the
state in order to avoid updating ->task_state_change.

Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/time/hrtimer.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 851b2134e77f4..6f2736ec4b8ef 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1902,15 +1902,18 @@ void cpu_chill(void)
{
ktime_t chill_time;
unsigned int freeze_flag = current->flags & PF_NOFREEZE;
+ long saved_state;

+ saved_state = current->state;
chill_time = ktime_set(0, NSEC_PER_MSEC);
- set_current_state(TASK_UNINTERRUPTIBLE);
+ __set_current_state_no_track(TASK_UNINTERRUPTIBLE);
current->flags |= PF_NOFREEZE;
sleeping_lock_inc();
schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
sleeping_lock_dec();
if (!freeze_flag)
current->flags &= ~PF_NOFREEZE;
+ __set_current_state_no_track(saved_state);
}
EXPORT_SYMBOL(cpu_chill);
#endif
--
2.20.1
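
The intent is easy to see in a toy model (states reduced to plain integers,
with the freezer flag, the sleeping-lock accounting and the state-change
tracking left out; the ->saved_state refinement discussed further down the
thread is not modelled here):

#include <stdio.h>

enum { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

static int task_state = TASK_RUNNING;   /* stand-in for current->state */

/* Old behaviour: cpu_chill() always returns with the state forced back
 * to RUNNING, discarding whatever the caller had set. */
static void cpu_chill_old(void)
{
        task_state = TASK_UNINTERRUPTIBLE;
        /* ... schedule_hrtimeout() sleeps for ~1ms ... */
        task_state = TASK_RUNNING;
}

/* Patched behaviour: the state the caller had set before chilling
 * survives the call. */
static void cpu_chill_new(void)
{
        int saved_state = task_state;

        task_state = TASK_UNINTERRUPTIBLE;
        /* ... schedule_hrtimeout() sleeps for ~1ms ... */
        task_state = saved_state;
}

int main(void)
{
        task_state = TASK_INTERRUPTIBLE;        /* set by the caller */
        cpu_chill_old();
        printf("old: %d (caller's INTERRUPTIBLE state lost)\n", task_state);

        task_state = TASK_INTERRUPTIBLE;
        cpu_chill_new();
        printf("new: %d (caller's state preserved)\n", task_state);
        return 0;
}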


Subject: [PATCH RT 3/2] softirq: Avoid "local_softirq_pending" messages if task is in cpu_chill()

If the softirq thread enters cpu_chill() then ->state is TASK_UNINTERRUPTIBLE
and ->pi_blocked_on is not set, so its mask is not taken into account.

->sleeping_lock is incremented by cpu_chill() since it is also required to
avoid an RCU splat in case cpu_chill() is used while an RCU read lock
is held. Use the same mechanism for the softirq-pending check.

Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/softirq.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 48ae7dae81b9c..25bcf2f2714ba 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -105,9 +105,12 @@ static bool softirq_check_runner_tsk(struct task_struct *tsk,
* _before_ it sets pi_blocked_on to NULL under
* tsk->pi_lock. So we need to check for both: state
* and pi_blocked_on.
+ * The test against UNINTERRUPTIBLE + ->sleeping_lock is in case the
+ * task does cpu_chill().
*/
raw_spin_lock(&tsk->pi_lock);
- if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING) {
+ if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING ||
+ (tsk->state == TASK_UNINTERRUPTIBLE && tsk->sleeping_lock)) {
/* Clear all bits pending in that task */
*pending &= ~(tsk->softirqs_raised);
ret = true;
--
2.20.1
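
A compact model of the extended predicate (plain C; the field names mirror
the kernel ones but the struct is an illustrative stand-in, not the real
task_struct):

#include <stdbool.h>
#include <stdio.h>

enum { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

struct task {
        int state;
        bool pi_blocked_on;     /* non-NULL pi_blocked_on in the kernel */
        int sleeping_lock;      /* counter bumped by cpu_chill() */
};

/* The predicate softirq_check_runner_tsk() uses after this patch: a
 * softirq thread parked in cpu_chill() (UNINTERRUPTIBLE with the
 * sleeping_lock counter held) will still run its raised softirqs, so it
 * must not trigger the "pending" warning either. */
static bool will_handle_softirqs(const struct task *tsk)
{
        return tsk->pi_blocked_on ||
               tsk->state == TASK_RUNNING ||
               (tsk->state == TASK_UNINTERRUPTIBLE && tsk->sleeping_lock);
}

int main(void)
{
        struct task chilling = { TASK_UNINTERRUPTIBLE, false, 1 };
        struct task sleeping = { TASK_UNINTERRUPTIBLE, false, 0 };

        printf("in cpu_chill(): %d, plain uninterruptible sleep: %d\n",
               will_handle_softirqs(&chilling),
               will_handle_softirqs(&sleeping));    /* 1, 0 */
        return 0;
}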


2019-02-19 16:28:35

by Juri Lelli

Subject: Re: [PATCH RT 1/2] softirq: Avoid "local_softirq_pending" messages if ksoftirqd is blocked

On 19/02/19 17:06, Sebastian Andrzej Siewior wrote:
> On 2019-02-19 15:58:26 [+0100], Juri Lelli wrote:
> > Hi,
> Hi,
>
> > I've been seeing those messages while running some stress tests (hog
> > tasks pinned to CPUs).
> >
> > Have yet to see them after I applied this patch earlier this morning (it
> > usually took not much time to reproduce).
>
> So is it better or not? I kind of read this as "nothing changed".

It is better. Warning message doesn't appear anymore.

Subject: Re: [PATCH RT 1/2] softirq: Avoid "local_softirq_pending" messages if ksoftirqd is blocked

On 2019-02-19 17:27:41 [+0100], Juri Lelli wrote:
> It is better. Warning message doesn't appear anymore.

Okay, thanks.

Sebastian

2019-02-25 14:46:28

by Mike Galbraith

Subject: Re: [PATCH RT 4/2] hrtimer: Don't lose state in cpu_chill()

Hi Sebastian,

My box claims that this patch is busted. It argues its case by IO
deadlocking any kernel this patch is applied to when spinning rust is
flogged, including virgin 4.19-rt14, said kernel becoming stable again
when I whack the accused.

On Tue, 2019-02-19 at 17:08 +0100, Sebastian Andrzej Siewior wrote:
> In cpu_chill() the state is set to TASK_UNINTERRUPTIBLE and a timer is
> programmed. On return the state is always TASK_RUNNING, which means we
> lose the state if it was something other than RUNNING. Also,
> set_current_state() sets ->task_state_change to a location within
> cpu_chill(), which is not expected.
>
> Save the task state on entry and restore it on return. Simply set the
> state in order to avoid updating ->task_state_change.
>
> Cc: [email protected]
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> kernel/time/hrtimer.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 851b2134e77f4..6f2736ec4b8ef 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1902,15 +1902,18 @@ void cpu_chill(void)
> {
> ktime_t chill_time;
> unsigned int freeze_flag = current->flags & PF_NOFREEZE;
> + long saved_state;
>
> + saved_state = current->state;
> chill_time = ktime_set(0, NSEC_PER_MSEC);
> - set_current_state(TASK_UNINTERRUPTIBLE);
> + __set_current_state_no_track(TASK_UNINTERRUPTIBLE);
> current->flags |= PF_NOFREEZE;
> sleeping_lock_inc();
> schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
> sleeping_lock_dec();
> if (!freeze_flag)
> current->flags &= ~PF_NOFREEZE;
> + __set_current_state_no_track(saved_state);
> }
> EXPORT_SYMBOL(cpu_chill);
> #endif

Subject: Re: [PATCH RT 4/2] hrtimer: Don't lose state in cpu_chill()

On 2019-02-25 15:43:35 [+0100], Mike Galbraith wrote:
> Hi Sebastian,
Hi Mike,

> My box claims that this patch is busted. It argues its case by IO
> deadlocking any kernel this patch is applied to when spinning rust is
> flogged, including virgin 4.19-rt14, said kernel becoming stable again
> when I whack the accused.

does the following hunk make any difference?

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 6ecdb9469ca9..e154632b90b4 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1884,20 +1884,29 @@ COMPAT_SYSCALL_DEFINE2(nanosleep, struct old_timespec32 __user *, rqtp,
*/
void cpu_chill(void)
{
+ struct task_struct *self = current;
ktime_t chill_time;
unsigned int freeze_flag = current->flags & PF_NOFREEZE;
- long saved_state;

- saved_state = current->state;
- chill_time = ktime_set(0, NSEC_PER_MSEC);
+ raw_spin_lock_irq(&self->pi_lock);
+ WARN_ON(self->saved_state != TASK_RUNNING);
+ self->saved_state = self->state;
__set_current_state_no_track(TASK_UNINTERRUPTIBLE);
+ raw_spin_unlock_irq(&self->pi_lock);
+
+ chill_time = ktime_set(0, NSEC_PER_MSEC);
+
current->flags |= PF_NOFREEZE;
sleeping_lock_inc();
schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
sleeping_lock_dec();
if (!freeze_flag)
current->flags &= ~PF_NOFREEZE;
- __set_current_state_no_track(saved_state);
+
+ raw_spin_lock_irq(&self->pi_lock);
+ __set_current_state_no_track(self->saved_state);
+ self->saved_state = TASK_RUNNING;
+ raw_spin_unlock_irq(&self->pi_lock);
}
EXPORT_SYMBOL(cpu_chill);
#endif

Sebastian

2019-02-25 17:41:58

by Mike Galbraith

Subject: Re: [PATCH RT 4/2] hrtimer: Don't lose state in cpu_chill()

On Mon, 2019-02-25 at 17:34 +0100, Sebastian Andrzej Siewior wrote:
> On 2019-02-25 15:43:35 [+0100], Mike Galbraith wrote:
> > Hi Sebastian,
> Hi Mike,
>
> > My box claims that this patch is busted. It argues its case by IO
> > deadlocking any kernel this patch is applied to when spinning rust is
> > flogged, including virgin 4.19-rt14, said kernel becoming stable again
> > when I whack the accused.
>
> does the following hunk make any difference?

Ah. Yup, box says you're right.

> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 6ecdb9469ca9..e154632b90b4 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1884,20 +1884,29 @@ COMPAT_SYSCALL_DEFINE2(nanosleep, struct old_timespec32 __user *, rqtp,
> */
> void cpu_chill(void)
> {
> + struct task_struct *self = current;
> ktime_t chill_time;
> unsigned int freeze_flag = current->flags & PF_NOFREEZE;
> - long saved_state;
>
> - saved_state = current->state;
> - chill_time = ktime_set(0, NSEC_PER_MSEC);
> + raw_spin_lock_irq(&self->pi_lock);
> + WARN_ON(self->saved_state != TASK_RUNNING);
> + self->saved_state = self->state;
> __set_current_state_no_track(TASK_UNINTERRUPTIBLE);
> + raw_spin_unlock_irq(&self->pi_lock);
> +
> + chill_time = ktime_set(0, NSEC_PER_MSEC);
> +
> current->flags |= PF_NOFREEZE;
> sleeping_lock_inc();
> schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
> sleeping_lock_dec();
> if (!freeze_flag)
> current->flags &= ~PF_NOFREEZE;
> - __set_current_state_no_track(saved_state);
> +
> + raw_spin_lock_irq(&self->pi_lock);
> + __set_current_state_no_track(self->saved_state);
> + self->saved_state = TASK_RUNNING;
> + raw_spin_unlock_irq(&self->pi_lock);
> }
> EXPORT_SYMBOL(cpu_chill);
> #endif
>
> Sebastian

Subject: Re: [PATCH RT 4/2] hrtimer: Don't lose state in cpu_chill()

On 2019-02-25 18:40:36 [+0100], Mike Galbraith wrote:
> Ah. Yup, box says you're right.

Thank you. Added:

Subject: [PATCH] hrtimer: cpu_chill(): save task state in ->saved_state()

In the previous change I saved the current task state on the stack. This
was bad because while the task is scheduled out it might receive a wakeup.
The wakeup changes the task state and we must not destroy it.

Save the task state in ->saved_state under the PI-lock to ensure that
state changes which occur while the task is temporarily scheduled out are
not missed.

Reported-by: Mike Galbraith <[email protected]>
Tested-by: Mike Galbraith <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/time/hrtimer.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 6f2736ec4b8ef..e1040b80362c9 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1900,20 +1900,28 @@ COMPAT_SYSCALL_DEFINE2(nanosleep, struct compat_timespec __user *, rqtp,
*/
void cpu_chill(void)
{
- ktime_t chill_time;
unsigned int freeze_flag = current->flags & PF_NOFREEZE;
- long saved_state;
+ struct task_struct *self = current;
+ ktime_t chill_time;

- saved_state = current->state;
- chill_time = ktime_set(0, NSEC_PER_MSEC);
+ raw_spin_lock_irq(&self->pi_lock);
+ self->saved_state = self->state;
__set_current_state_no_track(TASK_UNINTERRUPTIBLE);
+ raw_spin_unlock_irq(&self->pi_lock);
+
+ chill_time = ktime_set(0, NSEC_PER_MSEC);
+
current->flags |= PF_NOFREEZE;
sleeping_lock_inc();
schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
sleeping_lock_dec();
if (!freeze_flag)
current->flags &= ~PF_NOFREEZE;
- __set_current_state_no_track(saved_state);
+
+ raw_spin_lock_irq(&self->pi_lock);
+ __set_current_state_no_track(self->saved_state);
+ self->saved_state = TASK_RUNNING;
+ raw_spin_unlock_irq(&self->pi_lock);
}
EXPORT_SYMBOL(cpu_chill);
#endif
--
2.20.1


Sebastian
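
The race behind Mike's report and the reason the follow-up helps can be
condensed into a small single-threaded model. The wakeup rule below is a
rough simplification of the RT try_to_wake_up() handling of ->saved_state,
and the interleaving is driven by plain function calls instead of a second
CPU, so this is only an illustration of the idea:

#include <stdio.h>

enum { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

struct task {
        int state;
        int saved_state;        /* state parked by sleeping locks / cpu_chill() */
};

/* Simplified model of a regular (non lock-sleeper) wakeup on RT: it sets
 * the visible state to RUNNING and also clears any parked saved_state so
 * the wakeup cannot be undone by a later restore. */
static void wake_up_task(struct task *t)
{
        t->state = TASK_RUNNING;
        t->saved_state = TASK_RUNNING;
}

/* First version of the patch: the pre-chill state lives on the stack,
 * invisible to the waker, so the restore overwrites the wakeup. */
static void cpu_chill_stack(struct task *t)
{
        int saved = t->state;

        t->state = TASK_UNINTERRUPTIBLE;
        wake_up_task(t);        /* wakeup arrives while we are chilled */
        t->state = saved;       /* wakeup destroyed: back to INTERRUPTIBLE */
}

/* Follow-up version: the pre-chill state is parked in ->saved_state
 * (under ->pi_lock in the real code), where the waker can clear it. */
static void cpu_chill_saved_state(struct task *t)
{
        t->saved_state = t->state;
        t->state = TASK_UNINTERRUPTIBLE;
        wake_up_task(t);        /* wakeup arrives while we are chilled */
        t->state = t->saved_state;      /* restores RUNNING, wakeup kept */
        t->saved_state = TASK_RUNNING;
}

int main(void)
{
        struct task t = { TASK_INTERRUPTIBLE, TASK_RUNNING };

        cpu_chill_stack(&t);
        printf("stack save:    state = %d (wakeup lost, next schedule() hangs)\n",
               t.state);

        t.state = TASK_INTERRUPTIBLE;
        t.saved_state = TASK_RUNNING;
        cpu_chill_saved_state(&t);
        printf("->saved_state: state = %d (wakeup preserved)\n", t.state);
        return 0;
}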