Subject: [RFC PATCH 0/3] Allow to preempt a timer softirq on PREEMPT_RT.

Hi,

while the softirqs are served, bottom halves are disabled. By disabling
bottom halves (as per local_bh_disable()) PREEMPT_RT acquires a
local_lock_t. This lock ensures that the softirq is synchronized against
other softirq user on that CPU while keeping the context preemptible.

This leads to a scenario where context itself is preemptible but needs
to "complete" before the system can make progress. For instance, the
timer callback (in TIMER_SOFTIRQ) gets preempted because a
force-threaded interrupt thread, with higher priority, gets woken up.
Before the handler of the forced-threaded interrupt can be invoked,
bottom halves get disabled and this blocks on the same per-CPU lock.
This in turn leads to a PI-boost and the preempted timer softirq is back
on the CPU with higher priority completing its job (not just the timer,
all pending softirqs).

In the end the force threaded interrupt is blocked until all pending
softirqs have been served.

The PI-boost is usually intended to allow the thread with lower priority
to "quickly" finish what it was doing and leave the critical section
ASAP. This is not the case with softirqs and how this is handled by the
individual callbacks. Additionally the need_resched() check in
__do_softirq() is never true due to the boost. This means in worst case
this can run for MAX_SOFTIRQ_TIME or MAX_SOFTIRQ_RESTART.

One way of out would be to add preemption within the softirq handling at
which point the softirq-BKL can be dropped. This can be after all
softirqs have been served (__do_softirq() where the need_resched() check
is located), after each softirq handler or within the softirq handler
where it is considered safe to do so.

This series adds as an example such a preemption point to the timer
softirq handler. Should this fly then it would be needed the remaining
handlers as well.

Sebastian




Subject: [RFC PATCH 1/3] sched/core: Provide a method to check if a task is PI-boosted.

Provide a method to check if a task inherited the priority from another
task. This happens if a task owns a lock which is requested by a task
with higher priority. This can be used as a hint to add a preemption
point to the critical section.

Provide a function which reports true if the task is PI-boosted.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
include/linux/sched.h | 1 +
kernel/sched/core.c | 15 +++++++++++++++
2 files changed, 16 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 609bde814cb06..77fd274133750 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1887,6 +1887,7 @@ static inline int dl_task_check_affinity(struct task_struct *p, const struct cpu
}
#endif

+extern bool task_is_pi_boosted(const struct task_struct *p);
extern int yield_to(struct task_struct *p, bool preempt);
extern void set_user_nice(struct task_struct *p, long nice);
extern int task_prio(const struct task_struct *p);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c739..132f06522efa0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8886,6 +8886,21 @@ static inline void preempt_dynamic_init(void) { }

#endif /* #ifdef CONFIG_PREEMPT_DYNAMIC */

+/*
+ * task_is_pi_boosted - Check if task has been PI boosted.
+ * @p: Task to check.
+ *
+ * Return true if task is subject to priority inheritance.
+ */
+bool task_is_pi_boosted(const struct task_struct *p)
+{
+ int prio = p->prio;
+
+ if (!rt_prio(prio))
+ return false;
+ return prio != p->normal_prio;
+}
+
/**
* yield - yield the current processor to other threads.
*
--
2.40.1


Subject: [RFC PATCH 2/3] softirq: Add function to preempt serving softirqs.

Add a functionality for the softirq handler to preempt its current work
if needed. The softirq core has no particular state. It reads and resets
the pending softirq bits and then processes one after the other.
It can already be preempted while it invokes a certain softirq handler.

By enabling the BH the softirq core releases the per-CPU bh lock which
serializes all softirq handler. It is safe to do as long as the code
does not expect any serialisation in between. A typical scenarion would
after the invocation of callback where no state needs to be preserved
before the next callback is invoked.

Add functionaliry to preempt the serving softirqs.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
include/linux/bottom_half.h | 2 ++
kernel/softirq.c | 13 +++++++++++++
2 files changed, 15 insertions(+)

diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
index fc53e0ad56d90..448bbef474564 100644
--- a/include/linux/bottom_half.h
+++ b/include/linux/bottom_half.h
@@ -35,8 +35,10 @@ static inline void local_bh_enable(void)

#ifdef CONFIG_PREEMPT_RT
extern bool local_bh_blocked(void);
+extern void softirq_preempt(void);
#else
static inline bool local_bh_blocked(void) { return false; }
+static inline void softirq_preempt(void) { }
#endif

#endif /* _LINUX_BH_H */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 807b34ccd7973..dd3307a619af7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -247,6 +247,19 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
}
EXPORT_SYMBOL(__local_bh_enable_ip);

+void softirq_preempt(void)
+{
+ if (WARN_ON_ONCE(!preemptible()))
+ return;
+
+ if (WARN_ON_ONCE(__this_cpu_read(softirq_ctrl.cnt) != SOFTIRQ_OFFSET))
+ return;
+
+ __local_bh_enable(SOFTIRQ_OFFSET, true);
+ /* preemption point */
+ __local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
+}
+
/*
* Invoked from ksoftirqd_run() outside of the interrupt disabled section
* to acquire the per CPU local lock for reentrancy protection.
--
2.40.1


Subject: [RFC PATCH 3/3] time: Allow to preempt after a callback.

The TIMER_SOFTIRQ handler invokes timer callbacks of the expired timers.
Before each invocation the timer_base::lock is dropped. The only lock
that is still held is the timer_base::expiry_lock and the per-CPU
bh-lock as part of local_bh_disable(). The former is released as part
of lock up prevention if the timer is preempted by the caller which is
waiting for its completion.

Both locks are already released as part of timer_sync_wait_running().
This can be extended by also releasing in bh-lock. The timer core does
not rely on any state that is serialized by the bh-lock. The timer
callback expects the bh-state to be serialized by the lock but there is
no need to keep state synchronized while invoking multiple callbacks.

Preempt handling softirqs and release all locks after a timer invocation
if the current has inherited priority.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/time/timer.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 63a8ce7177dd4..b76856bc2edd2 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1470,9 +1470,16 @@ static inline void timer_base_unlock_expiry(struct timer_base *base)
*/
static void timer_sync_wait_running(struct timer_base *base)
{
- if (atomic_read(&base->timer_waiters)) {
+ bool need_preempt;
+
+ need_preempt = task_is_pi_boosted(current);
+ if (need_preempt || atomic_read(&base->timer_waiters)) {
raw_spin_unlock_irq(&base->lock);
spin_unlock(&base->expiry_lock);
+
+ if (need_preempt)
+ softirq_preempt();
+
spin_lock(&base->expiry_lock);
raw_spin_lock_irq(&base->lock);
}
--
2.40.1


2023-11-02 09:31:46

by Juri Lelli

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] sched/core: Provide a method to check if a task is PI-boosted.

Hi,

Apologies for noticing only now, but I believe this is still part of the
6.6-rt patches and I've got the below question to ask.

On 04/08/23 13:30, Sebastian Andrzej Siewior wrote:
> Provide a method to check if a task inherited the priority from another
> task. This happens if a task owns a lock which is requested by a task
> with higher priority. This can be used as a hint to add a preemption
> point to the critical section.
>
> Provide a function which reports true if the task is PI-boosted.
>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> include/linux/sched.h | 1 +
> kernel/sched/core.c | 15 +++++++++++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 609bde814cb06..77fd274133750 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1887,6 +1887,7 @@ static inline int dl_task_check_affinity(struct task_struct *p, const struct cpu
> }
> #endif
>
> +extern bool task_is_pi_boosted(const struct task_struct *p);
> extern int yield_to(struct task_struct *p, bool preempt);
> extern void set_user_nice(struct task_struct *p, long nice);
> extern int task_prio(const struct task_struct *p);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c52c2eba7c739..132f06522efa0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8886,6 +8886,21 @@ static inline void preempt_dynamic_init(void) { }
>
> #endif /* #ifdef CONFIG_PREEMPT_DYNAMIC */
>
> +/*
> + * task_is_pi_boosted - Check if task has been PI boosted.
> + * @p: Task to check.
> + *
> + * Return true if task is subject to priority inheritance.
> + */
> +bool task_is_pi_boosted(const struct task_struct *p)
> +{
> + int prio = p->prio;
> +
> + if (!rt_prio(prio))
> + return false;
> + return prio != p->normal_prio;

Does this need to also take DEADLINE tasks into consideration? We don't
change priority when they are boosted, only pi_se changes (please check
is_dl_boosted()).

Thanks,
Juri

Subject: Re: [RFC PATCH 1/3] sched/core: Provide a method to check if a task is PI-boosted.

On 2023-11-02 10:30:50 [+0100], Juri Lelli wrote:
> Hi,
Hi,

> Apologies for noticing only now, but I believe this is still part of the
> 6.6-rt patches and I've got the below question to ask.

I'm not sure if we want this. I do have an alternative solution in the
meantime.

> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -8886,6 +8886,21 @@ static inline void preempt_dynamic_init(void) { }
> >
> > #endif /* #ifdef CONFIG_PREEMPT_DYNAMIC */
> >
> > +/*
> > + * task_is_pi_boosted - Check if task has been PI boosted.
> > + * @p: Task to check.
> > + *
> > + * Return true if task is subject to priority inheritance.
> > + */
> > +bool task_is_pi_boosted(const struct task_struct *p)
> > +{
> > + int prio = p->prio;
> > +
> > + if (!rt_prio(prio))
> > + return false;
> > + return prio != p->normal_prio;
>
> Does this need to also take DEADLINE tasks into consideration? We don't
> change priority when they are boosted, only pi_se changes (please check
> is_dl_boosted()).

If we want this, then probably yes.
Isn't task_struct::prio for the DL (boosted) task set to 0? While the RT
priority go from 1…MAX_RT_PRIO - 1.

> Thanks,
> Juri

Sebastian

2023-11-03 07:12:48

by Juri Lelli

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] sched/core: Provide a method to check if a task is PI-boosted.

On 02/11/23 17:03, Sebastian Andrzej Siewior wrote:
> On 2023-11-02 10:30:50 [+0100], Juri Lelli wrote:
> > Hi,
> Hi,
>
> > Apologies for noticing only now, but I believe this is still part of the
> > 6.6-rt patches and I've got the below question to ask.
>
> I'm not sure if we want this. I do have an alternative solution in the
> meantime.

Ah, OK. Guess what below is moot then. :) Will be watching for the
alternative solution to show up.
>
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -8886,6 +8886,21 @@ static inline void preempt_dynamic_init(void) { }
> > >
> > > #endif /* #ifdef CONFIG_PREEMPT_DYNAMIC */
> > >
> > > +/*
> > > + * task_is_pi_boosted - Check if task has been PI boosted.
> > > + * @p: Task to check.
> > > + *
> > > + * Return true if task is subject to priority inheritance.
> > > + */
> > > +bool task_is_pi_boosted(const struct task_struct *p)
> > > +{
> > > + int prio = p->prio;
> > > +
> > > + if (!rt_prio(prio))
> > > + return false;
> > > + return prio != p->normal_prio;
> >
> > Does this need to also take DEADLINE tasks into consideration? We don't
> > change priority when they are boosted, only pi_se changes (please check
> > is_dl_boosted()).
>
> If we want this, then probably yes.
> Isn't task_struct::prio for the DL (boosted) task set to 0? While the RT
> priority go from 1…MAX_RT_PRIO - 1.

But then we can also have DL on DL boosting and in this case only the
pi_se changes while prio stays the same. But, again, looks like this is
moot anyway.

Thanks!
Juri