This is the second attempt at cleaning this up. Version 1 can be found
here:
https://lore.kernel.org/lkml/[email protected]/
Last year I reread a 15 year old comment about the SIG_IGN problem:
"FIXME: What we really want, is to stop this timer completely and restart
it in case the SIG_IGN is removed. This is a non trivial change which
involves sighand locking (sigh !), which we don't want to do late in the
release cycle. ... A more complex fix which solves also another related
inconsistency is already in the pipeline."
The embarrassing part was that I put that comment in back then. So I went
back and rummaged through old notes as I had completely forgotten why our
attempts to fix this back then failed.
It turned out that the comment is about right: sighand locking and life
time issues. So I sat down with the old notes and started to wrap my head
around this again.
The problem to solve:
Posix interval timers are not rearmed automatically by the kernel for
various reasons:
  1) To prevent DoS by extremely short intervals.
  2) To avoid timer overhead when a signal is pending and has not
     yet been delivered.
This is achieved by queueing the signal at timer expiry and rearming the
timer at signal delivery to user space. This puts the rearming basically
under scheduler control and the work happens in context of the task which
asked for the signal.
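As a rough illustration, the queue-at-expiry, rearm-at-delivery scheme can be
modelled in a few lines of user-space C. This is not kernel code: struct
model_timer, model_expire(), model_deliver() and model_run() are hypothetical
names invented for this sketch, and all locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not the kernel's struct k_itimer. */
struct model_timer {
	bool armed;		/* queued in the underlying timer mechanism */
	bool signal_pending;	/* signal queued, not yet delivered */
	unsigned int rearms;	/* how often delivery rearmed the timer */
};

/* Timer expiry: queue the signal, but do not rearm. */
static void model_expire(struct model_timer *t)
{
	if (!t->armed)
		return;
	t->armed = false;
	t->signal_pending = true;
}

/* Signal delivery to user space: this is where the rearm happens. */
static void model_deliver(struct model_timer *t)
{
	if (!t->signal_pending)
		return;
	t->signal_pending = false;
	t->armed = true;
	t->rearms++;
}

/*
 * Attempt 'expiries' (>= 1) expiries with a single delivery at the end
 * and return the number of rearms.
 */
unsigned int model_run(unsigned int expiries)
{
	struct model_timer t = { .armed = true };
	unsigned int i;

	for (i = 0; i < expiries; i++)
		model_expire(&t);

	/* Only the first expiry found the timer armed. */
	assert(!t.armed && t.signal_pending);

	model_deliver(&t);
	return t.rearms;
}
```

In this model the timer cannot fire again before the task has actually taken
the signal, no matter how short the interval is, which is the property both
reasons above rely on.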
There is a problem with that vs. SIG_IGN. If a signal has SIG_IGN installed
as handler the related signals are discarded. So in case of posix interval
timers this means that such a timer is never rearmed even when SIG_IGN is
replaced later with a real handler (including SIG_DFL).
To work around that the kernel self rearms those timers and throttles them
when the interval is smaller than a tick to prevent a DoS.
That just keeps timers ticking, which obviously has effects on power and
just creates work for nothing.
So ideally these timers should be stopped and rearmed when SIG_IGN is
replaced, which aligns with the regular handling of posix timers.
Sounds trivial, but isn't:
  1) Lock ordering
     The timer lock cannot be taken with sighand lock held which is
     problematic vs. the atomicity of sigaction().
  2) Life time rules
     The timer and the sigqueue are separate entities which requires a
     lookup of the timer ID in the signal rearm code. This can be handled,
     but the separate life time rules are not necessarily robust.
  3) Finding the relevant timers
     Obviously it is possible to walk the posix timer list under sighand
     lock and handle it from there. That can be expensive especially in the
     case that there are no affected timers as the walk would just end up
     doing nothing.
The following series is a new and this time actually working attempt to
solve this. It addresses it by:
  1) Embedding the preallocated sigqueue into struct k_itimer, which makes
     the life time rules way simpler and just needs a trivial reference
     count.
  2) Having a separate list in task::signal on which ignored timers are
     queued.
     This avoids walking a potentially large timer list for nothing on a
     SIG_IGN to handler transition.
  3) Requeueing the timer's signal in the relevant signal queue so the timer
     is rearmed when the signal is actually delivered.
     That turned out to be the least complicated way to address the sighand
     lock vs. timer lock ordering issue.
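Points 2) and 3) can be sketched as a user-space model. It is only an
illustration under simplifying assumptions: struct ign_timer, struct
ign_signal and the ign_*() helpers are hypothetical names, locking and
reference counting are ignored completely, and the embedded sigqueue from
point 1) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; all names are illustrative only. */
struct ign_timer {
	struct ign_timer *next;	/* link on the ignored or pending list */
	bool armed;
};

struct ign_signal {
	bool ignored;			/* SIG_IGN is installed */
	struct ign_timer *ignored_list;	/* timers whose signal was ignored */
	struct ign_timer *pending_list;	/* signals awaiting delivery */
};

static void ign_push(struct ign_timer **head, struct ign_timer *t)
{
	t->next = *head;
	*head = t;
}

/* Expiry: never rearm here, park the timer according to the disposition. */
static void ign_expire(struct ign_signal *sig, struct ign_timer *t)
{
	if (!t->armed)
		return;
	t->armed = false;
	ign_push(sig->ignored ? &sig->ignored_list : &sig->pending_list, t);
}

/* SIG_IGN replaced by a handler: requeue only the parked timers. */
static void ign_set_handler(struct ign_signal *sig)
{
	sig->ignored = false;
	while (sig->ignored_list) {
		struct ign_timer *t = sig->ignored_list;

		sig->ignored_list = t->next;
		ign_push(&sig->pending_list, t);
	}
}

/* Delivery: dequeue and rearm. Returns the number of rearmed timers. */
static unsigned int ign_deliver(struct ign_signal *sig)
{
	unsigned int n = 0;

	while (sig->pending_list) {
		struct ign_timer *t = sig->pending_list;

		sig->pending_list = t->next;
		t->next = NULL;
		t->armed = true;
		n++;
	}
	return n;
}

/* Two timers expire while ignored, then a handler is installed. */
unsigned int ign_ignore_then_handle(void)
{
	struct ign_signal sig = { .ignored = true };
	struct ign_timer a = { .armed = true }, b = { .armed = true };
	unsigned int rearmed;

	ign_expire(&sig, &a);
	ign_expire(&sig, &b);

	/* While ignored: nothing pending and nothing ticking. */
	assert(!sig.pending_list && !a.armed && !b.armed);
	assert(ign_deliver(&sig) == 0);

	ign_set_handler(&sig);
	rearmed = ign_deliver(&sig);
	assert(a.armed && b.armed);
	return rearmed;
}
```

The handler transition in this model only touches timers which are actually
parked on the ignored list, and the rearm itself still happens at delivery
time, so the transition never needs to take a timer lock.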
With that, timers which have their signal ignored are no longer self
rearmed and the relevant workarounds including throttling for DoS
prevention are removed.
Aside from the SIG_IGN issues it also addresses a few inconsistencies in
posix CPU timers and the general inconsistency of signal handling
vs. disarmed, reprogrammed and deleted timers.
To actually validate the fixes the posix timer self test has been expanded
with tests which cover not only the simple SIG_IGN case but also more
complex scenarios which have never been handled correctly by the current
self rearming work around.
The series is based on:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/urgent
and is also available from git:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git timers/posix
Changes vs. V1:
- Dropped the timer distribution check changes as that has been handled
upstream differently (it's in timers/urgent and soon in Linus' tree)
- Split up patch 9 for the sake of easier review - Frederic
- Addressed the review comments from Frederic
- Picked up Reviewed-by tags where appropriate
Thanks,
tglx
---
arch/x86/kernel/signal_32.c | 2
arch/x86/kernel/signal_64.c | 2
drivers/power/supply/charger-manager.c | 3
fs/proc/base.c | 10
fs/signalfd.c | 4
fs/timerfd.c | 4
include/linux/alarmtimer.h | 10
include/linux/posix-timers.h | 69 ++-
include/linux/sched/signal.h | 11
include/uapi/asm-generic/siginfo.h | 2
init/init_task.c | 5
kernel/fork.c | 3
kernel/signal.c | 486 +++++++++++++---------
kernel/time/alarmtimer.c | 82 ---
kernel/time/itimer.c | 22 -
kernel/time/posix-cpu-timers.c | 231 ++++------
kernel/time/posix-timers.c | 276 ++++++-------
kernel/time/posix-timers.h | 9
net/netfilter/xt_IDLETIMER.c | 4
tools/testing/selftests/timers/posix_timers.c | 550 +++++++++++++++++++++-----
20 files changed, 1092 insertions(+), 693 deletions(-)
timer_delete_hook() returns -EINVAL when the clock or the timer_del
callback of the clock does not exist. This return value is not handled by
the callsites timer_delete() and itimer_delete().
Therefore add proper error handling.
Signed-off-by: Anna-Maria Behnsen <[email protected]>
---
While looking at the posix timer code when reviewing the queue, I
stumbled over this inconsistency. Maybe you want to have it in your
cleanup queue. Patch applies on top of your queue.
kernel/time/posix-timers.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1009,6 +1009,7 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
{
struct k_itimer *timer;
unsigned long flags;
+ int ret;
timer = lock_timer(timer_id, &flags);
@@ -1019,7 +1020,11 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
/* Prevent signal delivery and rearming. */
timer->it_signal_seq++;
- if (unlikely(timer_delete_hook(timer) == TIMER_RETRY)) {
+ ret = timer_delete_hook(timer);
+ if (ret < 0)
+ return ret;
+
+ if (unlikely(ret == TIMER_RETRY)) {
/* Unlocks and relocks the timer if it still exists */
timer = timer_wait_running(timer, &flags);
goto retry_delete;
@@ -1047,6 +1052,7 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
static void itimer_delete(struct k_itimer *timer)
{
unsigned long flags;
+ int ret;
/*
* irqsave is required to make timer_wait_running() work.
@@ -1054,13 +1060,17 @@ static void itimer_delete(struct k_itime
spin_lock_irqsave(&timer->it_lock, flags);
retry_delete:
+ ret = timer_delete_hook(timer);
+ if (WARN_ON_ONCE(ret < 0))
+ return;
+
/*
* Even if the timer is not longer accessible from other tasks
* it still might be armed and queued in the underlying timer
* mechanism. Worse, that timer mechanism might run the expiry
* function concurrently.
*/
- if (timer_delete_hook(timer) == TIMER_RETRY) {
+ if (ret == TIMER_RETRY) {
/*
* Timer is expired concurrently, prevent livelocks
* and pointless spinning on RT.
On 04/15, Anna-Maria Behnsen wrote:
>
> timer_delete_hook() returns -EINVAL when the clock or the timer_del
> callback of the clock does not exist. This return value is not handled by
> the callsites timer_delete() and itimer_delete().
IIUC this shouldn't happen? timer_delete_hook() WARN()s in this case,
not sure we need to return this error to userspace...
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -1009,6 +1009,7 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
> {
> struct k_itimer *timer;
> unsigned long flags;
> + int ret;
>
> timer = lock_timer(timer_id, &flags);
>
> @@ -1019,7 +1020,11 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
> /* Prevent signal delivery and rearming. */
> timer->it_signal_seq++;
>
> - if (unlikely(timer_delete_hook(timer) == TIMER_RETRY)) {
> + ret = timer_delete_hook(timer);
> + if (ret < 0)
> + return ret;
unlock_timer() ?
> static void itimer_delete(struct k_itimer *timer)
> {
> unsigned long flags;
> + int ret;
>
> /*
> * irqsave is required to make timer_wait_running() work.
> @@ -1054,13 +1060,17 @@ static void itimer_delete(struct k_itime
> spin_lock_irqsave(&timer->it_lock, flags);
>
> retry_delete:
> + ret = timer_delete_hook(timer);
> + if (WARN_ON_ONCE(ret < 0))
> + return;
the same.
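To illustrate the point for both callsites: an early return taken after the
lock has been acquired has to drop the lock first. A minimal user-space
sketch of that pattern follows. struct del_timer and the del_*() helpers are
hypothetical, the lock is a plain flag rather than a spinlock, and the sketch
takes no position on whether the error should be returned to userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: a flag stands in for timer->it_lock. */
struct del_timer {
	bool locked;
	int hook_result;	/* what the modelled delete hook returns */
};

static void del_lock(struct del_timer *t)
{
	assert(!t->locked);
	t->locked = true;
}

static void del_unlock(struct del_timer *t)
{
	assert(t->locked);
	t->locked = false;
}

static int del_hook(struct del_timer *t)
{
	return t->hook_result;
}

/* Every exit path after del_lock() goes through out_unlock. */
int del_timer_delete(struct del_timer *t)
{
	int ret;

	del_lock(t);

	ret = del_hook(t);
	if (ret < 0)
		goto out_unlock;

	/* ... the actual teardown would go here ... */
	ret = 0;

out_unlock:
	del_unlock(t);
	return ret;
}

/* Returns 1 if the lock is free again after a failed delete. */
int del_lock_released_on_error(void)
{
	struct del_timer t = { .hook_result = -EINVAL };
	int ret = del_timer_delete(&t);

	assert(ret == -EINVAL);
	return t.locked ? 0 : 1;
}
```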
Oleg.
Oleg Nesterov <[email protected]> writes:
> On 04/15, Anna-Maria Behnsen wrote:
>>
>> timer_delete_hook() returns -EINVAL when the clock or the timer_del
>> callback of the clock does not exist. This return value is not handled by
>> the callsites timer_delete() and itimer_delete().
>
> IIUC this shouldn't happen? timer_delete_hook() WARN()s in this case,
> not sure we need to return this error to userspace...
This shouldn't happen, right.
Even if we do not return this error to userspace, is it valid to proceed
with the rest of the callsites? When it is fine to just ignore the
-EINVAL return, then I would propose just to add a comment to the code.
>> --- a/kernel/time/posix-timers.c
>> +++ b/kernel/time/posix-timers.c
>> @@ -1009,6 +1009,7 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
>> {
>> struct k_itimer *timer;
>> unsigned long flags;
>> + int ret;
>>
>> timer = lock_timer(timer_id, &flags);
>>
>> @@ -1019,7 +1020,11 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
>> /* Prevent signal delivery and rearming. */
>> timer->it_signal_seq++;
>>
>> - if (unlikely(timer_delete_hook(timer) == TIMER_RETRY)) {
>> + ret = timer_delete_hook(timer);
>> + if (ret < 0)
>> + return ret;
>
> unlock_timer() ?
>
bah... was done in a hurry...
Thanks,
Anna-Maria
Anna-Maria, I can't really answer, I don't understand this code today ;)
That said, let me try to explain my opinion,
On 04/15, Anna-Maria Behnsen wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > On 04/15, Anna-Maria Behnsen wrote:
> >>
> >> timer_delete_hook() returns -EINVAL when the clock or the timer_del
> >> callback of the clock does not exist. This return value is not handled by
> >> the callsites timer_delete() and itimer_delete().
> >
> > IIUC this shouldn't happen? timer_delete_hook() WARN()s in this case,
> > not sure we need to return this error to userspace...
>
> This shouldn't happen, right.
>
> Even if we do not return this error to userspace, is it valid to proceed
> with the rest of the callsites?
Well, I'd say that nothing is safe after we hit the kernel problem.
But let's suppose we return EINVAL and skip list_del(&timer->list)/etc.
How can this help? What can userspace do to resolve this problem? Is it
better to "leak" this timer? I dunno.
> When it is fine to just ignore the
> -EINVAL return, then I would propose just to add a comment to the code.
Agreed!
Oleg.