2019-07-26 19:59:50

by Thomas Gleixner

Subject: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

From: Sebastian Andrzej Siewior <[email protected]>

On PREEMPT_RT enabled kernels hrtimers which are not explicitly marked for
hard interrupt expiry mode are moved into soft interrupt context either for
latency reasons or because the hrtimer callback takes regular spinlocks or
invokes other functions which are not suitable for hard interrupt context
on PREEMPT_RT.

The hrtimer_sleeper callback is RT compatible in hard interrupt context,
but there is a latency concern: Untrusted userspace can spawn many threads
which arm timers for the same expiry time on the same CPU. On expiry that
causes a latency spike due to the wakeup of a gazillion threads.

OTOH, privileged real-time user space applications rely on the low latency
of hard interrupt wakeups. These syscall related wakeups are all based on
hrtimer sleepers.

If the current task is in a real-time scheduling class, mark the mode for
hard interrupt expiry.

[ tglx: Split out of a larger combo patch. Added changelog ]

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/time/hrtimer.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1662,6 +1662,30 @@ static enum hrtimer_restart hrtimer_wake
static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
clockid_t clock_id, enum hrtimer_mode mode)
{
+ /*
+ * On PREEMPT_RT enabled kernels hrtimers which are not explicitly
+ * marked for hard interrupt expiry mode are moved into soft
+ * interrupt context either for latency reasons or because the
+ * hrtimer callback takes regular spinlocks or invokes other
+ * functions which are not suitable for hard interrupt context on
+ * PREEMPT_RT.
+ *
+ * The hrtimer_sleeper callback is RT compatible in hard interrupt
+ * context, but there is a latency concern: Untrusted userspace can
+ * spawn many threads which arm timers for the same expiry time on
+ * the same CPU. That causes a latency spike due to the wakeup of
+ * a gazillion threads.
+ *
+ * OTOH, privileged real-time user space applications rely on the
+ * low latency of hard interrupt wakeups. If the current task is in
+ * a real-time scheduling class, mark the mode for hard interrupt
+ * expiry.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
+ mode |= HRTIMER_MODE_HARD;
+ }
+
__hrtimer_init(&sl->timer, clock_id, mode);
sl->timer.function = hrtimer_wakeup;
sl->task = current;




2019-07-26 20:46:01

by Steven Rostedt

Subject: Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

On Fri, 26 Jul 2019 20:30:58 +0200
Thomas Gleixner <[email protected]> wrote:

> +++ b/kernel/time/hrtimer.c
> @@ -1662,6 +1662,30 @@ static enum hrtimer_restart hrtimer_wake
> static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
> clockid_t clock_id, enum hrtimer_mode mode)
> {
> + /*
> + * On PREEMPT_RT enabled kernels hrtimers which are not explicitly
> + * marked for hard interrupt expiry mode are moved into soft
> + * interrupt context either for latency reasons or because the
> + * hrtimer callback takes regular spinlocks or invokes other
> + * functions which are not suitable for hard interrupt context on
> + * PREEMPT_RT.

Have we marked all timer handlers that have normal spin_locks as
HRTIMER_MODE_SOFT? Otherwise, can't we switch one to hard below and
have their handler grab a spin_lock/mutex in hard interrupt context
in RT?

-- Steve


> + *
> + * The hrtimer_sleeper callback is RT compatible in hard interrupt
> + * context, but there is a latency concern: Untrusted userspace can
> + * spawn many threads which arm timers for the same expiry time on
> + * the same CPU. That causes a latency spike due to the wakeup of
> + * a gazillion threads.
> + *
> + * OTOH, privileged real-time user space applications rely on the
> + * low latency of hard interrupt wakeups. If the current task is in
> + * a real-time scheduling class, mark the mode for hard interrupt
> + * expiry.
> + */
> + if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> + if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> + mode |= HRTIMER_MODE_HARD;
> + }
> +
> __hrtimer_init(&sl->timer, clock_id, mode);
> sl->timer.function = hrtimer_wakeup;
> sl->task = current;


2019-07-26 20:53:50

by Thomas Gleixner

Subject: Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

On Fri, 26 Jul 2019, Steven Rostedt wrote:

> On Fri, 26 Jul 2019 20:30:58 +0200
> Thomas Gleixner <[email protected]> wrote:
>
> > +++ b/kernel/time/hrtimer.c
> > @@ -1662,6 +1662,30 @@ static enum hrtimer_restart hrtimer_wake
> > static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
> > clockid_t clock_id, enum hrtimer_mode mode)
> > {
> > + /*
> > + * On PREEMPT_RT enabled kernels hrtimers which are not explicitly
> > + * marked for hard interrupt expiry mode are moved into soft
> > + * interrupt context either for latency reasons or because the
> > + * hrtimer callback takes regular spinlocks or invokes other
> > + * functions which are not suitable for hard interrupt context on
> > + * PREEMPT_RT.
>
> Have we marked all timer handlers that have normal spin_locks as
> HRTIMER_MODE_SOFT? Otherwise, can't we switch one to hard below and
> have their handler grab a spin_lock/mutex in hard interrupt context
> in RT?

See patch 09/12. We move all timers into soft mode which are not marked
MODE_HARD.

> > + *
> > + * The hrtimer_sleeper callback is RT compatible in hard interrupt

^^^^^^^^^^^^^^
> > + * context, but there is a latency concern: Untrusted userspace can
> > + * spawn many threads which arm timers for the same expiry time on
> > + * the same CPU. That causes a latency spike due to the wakeup of
> > + * a gazillion threads.
> > + *
> > + * OTOH, privileged real-time user space applications rely on the
> > + * low latency of hard interrupt wakeups. If the current task is in
> > + * a real-time scheduling class, mark the mode for hard interrupt
> > + * expiry.
> > + */
> > + if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > + if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> > + mode |= HRTIMER_MODE_HARD;
> > + }
> > +
> > __hrtimer_init(&sl->timer, clock_id, mode);
> > sl->timer.function = hrtimer_wakeup;

It's the wakeup function and nothing is supposed to override that.

Thanks,

tglx

2019-07-26 20:59:24

by Steven Rostedt

Subject: Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

On Fri, 26 Jul 2019 22:52:18 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:
> > Have we marked all timer handlers that have normal spin_locks as
> > HRTIMER_MODE_SOFT? Otherwise, can't we switch one to hard below and
> > have their handler grab a spin_lock/mutex in hard interrupt context
> > in RT?
>
> See patch 09/12. We move all timers into soft mode which are not marked
> MODE_HARD.
>


> > > sl->timer.function = hrtimer_wakeup;
>
> It's the wakeup function and nothing is supposed to override that.

Ah, that makes sense. Not the actual handler then.

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

-- Steve


2019-07-26 21:18:32

by Julia Cartwright

Subject: Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

On Fri, Jul 26, 2019 at 08:30:58PM +0200, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior <[email protected]>
>
> On PREEMPT_RT enabled kernels hrtimers which are not explicitly marked for
> hard interrupt expiry mode are moved into soft interrupt context either for
> latency reasons or because the hrtimer callback takes regular spinlocks or
> invokes other functions which are not suitable for hard interrupt context
> on PREEMPT_RT.
>
> The hrtimer_sleeper callback is RT compatible in hard interrupt context,
> but there is a latency concern: Untrusted userspace can spawn many threads
> which arm timers for the same expiry time on the same CPU. On expiry that
> causes a latency spike due to the wakeup of a gazillion threads.
>
> OTOH, privileged real-time user space applications rely on the low latency
> of hard interrupt wakeups. These syscall related wakeups are all based on
> hrtimer sleepers.
>
> If the current task is in a real-time scheduling class, mark the mode for
> hard interrupt expiry.
>
> [ tglx: Split out of a larger combo patch. Added changelog ]
>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> kernel/time/hrtimer.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1662,6 +1662,30 @@ static enum hrtimer_restart hrtimer_wake
> static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
> clockid_t clock_id, enum hrtimer_mode mode)
> {
> + /*
> + * On PREEMPT_RT enabled kernels hrtimers which are not explicitly
> + * marked for hard interrupt expiry mode are moved into soft
> + * interrupt context either for latency reasons or because the
> + * hrtimer callback takes regular spinlocks or invokes other
> + * functions which are not suitable for hard interrupt context on
> + * PREEMPT_RT.
> + *
> + * The hrtimer_sleeper callback is RT compatible in hard interrupt
> + * context, but there is a latency concern: Untrusted userspace can
> + * spawn many threads which arm timers for the same expiry time on
> + * the same CPU. That causes a latency spike due to the wakeup of
> + * a gazillion threads.
> + *
> + * OTOH, privileged real-time user space applications rely on the
> + * low latency of hard interrupt wakeups. If the current task is in
> + * a real-time scheduling class, mark the mode for hard interrupt
> + * expiry.
> + */
> + if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> + if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> + mode |= HRTIMER_MODE_HARD;

Because this ends up sampling the task's scheduling parameters only at
the time of enqueue, it doesn't take into consideration whether or not
the task may be holding a PI lock and may later be boosted if contended by an
RT thread.

Am I correct in assuming there is an induced inversion here in this
case, because the deferred wakeup mechanism isn't part of the PI chain?

If so, is this just to be an accepted limitation at this point? Is the
intent to argue this away as bad RT application design? :)

Julia

2019-07-26 21:32:00

by Steven Rostedt

Subject: Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

On Fri, 26 Jul 2019 21:16:24 +0000
Julia Cartwright <[email protected]> wrote:

> > + if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > + if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> > + mode |= HRTIMER_MODE_HARD;
>
> Because this ends up sampling the task's scheduling parameters only at
> the time of enqueue, it doesn't take into consideration whether or not
> the task may be holding a PI lock and may later be boosted if contended by an
> RT thread.
>
> Am I correct in assuming there is an induced inversion here in this
> case, because the deferred wakeup mechanism isn't part of the PI chain?
>
> If so, is this just to be an accepted limitation at this point? Is the
> intent to argue this away as bad RT application design? :)
>

Well, it shouldn't be holding any kernel PI locks (aka spin_lock) when
it sleeps, but it may be holding a PI futex. In that case, I would say it
is a bad RT application: it has a thread sleep on a non-RT timer while
holding a lock that an RT task might take.

-- Steve

2019-07-26 21:36:28

by Thomas Gleixner

Subject: Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

Julia,

On Fri, 26 Jul 2019, Julia Cartwright wrote:
> On Fri, Jul 26, 2019 at 08:30:58PM +0200, Thomas Gleixner wrote:
> > + * OTOH, privileged real-time user space applications rely on the
> > + * low latency of hard interrupt wakeups. If the current task is in
> > + * a real-time scheduling class, mark the mode for hard interrupt
> > + * expiry.
> > + */
> > + if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > + if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> > + mode |= HRTIMER_MODE_HARD;
>
> Because this ends up sampling the task's scheduling parameters only at
> the time of enqueue, it doesn't take into consideration whether or not
> the task may be holding a PI lock and may later be boosted if contended by an
> RT thread.
>
> Am I correct in assuming there is an induced inversion here in this
> case, because the deferred wakeup mechanism isn't part of the PI chain?
>
> If so, is this just to be an accepted limitation at this point? Is the
> intent to argue this away as bad RT application design? :)

This would bring us back to the point where we moved the hrtimers
which were not marked for hardirq expiry onto the separate softirq expiry
list. That caused horrible latencies in some scenarios.

The separation of the bases into hard and soft expiry mode solved that
nicely, and I haven't heard a complaint since we changed that in 4.14-rt.

So yes, I'd argue it's an application issue. Holding a lock while doing
e.g. a nanosleep is not the most brilliant idea.

Thanks,

tglx

Subject: [tip:timers/core] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

Commit-ID: a6bc84e64a7dbfdeaae02f434ad1b296f2f9cd1e
Gitweb: https://git.kernel.org/tip/a6bc84e64a7dbfdeaae02f434ad1b296f2f9cd1e
Author: Sebastian Andrzej Siewior <[email protected]>
AuthorDate: Fri, 26 Jul 2019 20:30:58 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Tue, 30 Jul 2019 23:57:57 +0200

hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

On PREEMPT_RT enabled kernels hrtimers which are not explicitly marked for
hard interrupt expiry mode are moved into soft interrupt context either for
latency reasons or because the hrtimer callback takes regular spinlocks or
invokes other functions which are not suitable for hard interrupt context
on PREEMPT_RT.

The hrtimer_sleeper callback is RT compatible in hard interrupt context,
but there is a latency concern: Untrusted userspace can spawn many threads
which arm timers for the same expiry time on the same CPU. On expiry that
causes a latency spike due to the wakeup of a gazillion threads.

OTOH, privileged real-time user space applications rely on the low latency
of hard interrupt wakeups. These syscall related wakeups are all based on
hrtimer sleepers.

If the current task is in a real-time scheduling class, mark the mode for
hard interrupt expiry.

[ tglx: Split out of a larger combo patch. Added changelog ]

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
kernel/time/hrtimer.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 90dcc4d95e91..c101f88ae8aa 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1676,6 +1676,16 @@ static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer)
void hrtimer_sleeper_start_expires(struct hrtimer_sleeper *sl,
enum hrtimer_mode mode)
{
+ /*
+ * Make the enqueue delivery mode check work on RT. If the sleeper
+ * was initialized for hard interrupt delivery, force the mode bit.
+ * This is a special case for hrtimer_sleepers because
+ * hrtimer_init_sleeper() determines the delivery mode on RT so the
+ * fiddling with this decision is avoided at the call sites.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && sl->timer.is_hard)
+ mode |= HRTIMER_MODE_HARD;
+
hrtimer_start_expires(&sl->timer, mode);
}
EXPORT_SYMBOL_GPL(hrtimer_sleeper_start_expires);
@@ -1683,6 +1693,30 @@ EXPORT_SYMBOL_GPL(hrtimer_sleeper_start_expires);
static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
clockid_t clock_id, enum hrtimer_mode mode)
{
+ /*
+ * On PREEMPT_RT enabled kernels hrtimers which are not explicitly
+ * marked for hard interrupt expiry mode are moved into soft
+ * interrupt context either for latency reasons or because the
+ * hrtimer callback takes regular spinlocks or invokes other
+ * functions which are not suitable for hard interrupt context on
+ * PREEMPT_RT.
+ *
+ * The hrtimer_sleeper callback is RT compatible in hard interrupt
+ * context, but there is a latency concern: Untrusted userspace can
+ * spawn many threads which arm timers for the same expiry time on
+ * the same CPU. That causes a latency spike due to the wakeup of
+ * a gazillion threads.
+ *
+ * OTOH, privileged real-time user space applications rely on the
+ * low latency of hard interrupt wakeups. If the current task is in
+ * a real-time scheduling class, mark the mode for hard interrupt
+ * expiry.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
+ mode |= HRTIMER_MODE_HARD;
+ }
+
__hrtimer_init(&sl->timer, clock_id, mode);
sl->timer.function = hrtimer_wakeup;
sl->task = current;

Subject: [tip:timers/core] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

Commit-ID: 876f28e7bdf152da7514a28c79f83e61e0c6d30e
Gitweb: https://git.kernel.org/tip/876f28e7bdf152da7514a28c79f83e61e0c6d30e
Author: Sebastian Andrzej Siewior <[email protected]>
AuthorDate: Fri, 26 Jul 2019 20:30:58 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Thu, 1 Aug 2019 17:43:19 +0200

Subject: [tip:timers/core] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

Commit-ID: 1842f5a427f5323f5c19ab99b55d09b3ab5172a5
Gitweb: https://git.kernel.org/tip/1842f5a427f5323f5c19ab99b55d09b3ab5172a5
Author: Sebastian Andrzej Siewior <[email protected]>
AuthorDate: Fri, 26 Jul 2019 20:30:58 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Thu, 1 Aug 2019 20:51:22 +0200
