Summary:
Mixing HR timers with itimers occasionally hides an EINTR from a
blocking syscall.
Description:
In my test program I have a High Resolution timer firing every one
second (with SA_RESTART) and I set an itimer (without SA_RESTART) to
fire after three seconds. I then execute a blocking system call (flock
in this case) and expect the three second itimer to interrupt the system
call with EINTR. However, I frequently notice that the itimer will fire
but it will not interrupt the blocking system call. There appears to be
a race between the HR timer firing and the itimer firing. If I offset
the HR timer frequency by a half second, the itimer always interrupts
the system call.
Kernel version:
These kernels both demonstrate the condition:
2.6.29.6-217.2.16.fc11.x86_64
and
2.6.30.5-43.fc11.x86_64
I do not see this condition on:
2.6.18-53.el5
Test program:
The following program illustrates this condition:
http://github.com/mheffner/scripts/commits/master/hrtimer_vs_itimer.c
Is this behavior expected?
Cheers,
Mike
--
Mike Heffner <[email protected]>
(cc's added)
(it's a regression)
(it has a testcase!)
On 09/24, Andrew Morton wrote:
>
> (cc's added)
add Roland.
> (it's a regression)
Not sure...
> On Fri, 04 Sep 2009 17:26:35 -0400
> Mike Heffner <[email protected]> wrote:
>
> > I do not see this condition on:
> >
> > 2.6.18-53.el5
This is strange.
> > The following program illustrates this condition:
> >
> > http://github.com/mheffner/scripts/commits/master/hrtimer_vs_itimer.c
I didn't try this test-case, but afaics everything is clear, please
see below.
> > Is this behavior expected?
I don't know ;)
Well, I'd say this is expected. I mean, I am not surprised. But I can't
"prove" this is correct.
OK, I wrote a simple test-case to simplify the explanation. The child
installs the same handler for SIGHUP < SIGINT < SIGQUIT, but SIGINT doesn't
use SA_RESTART.
The test-case:
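	#include <assert.h>
	#include <signal.h>
	#include <stdio.h>
	#include <sys/wait.h>
	#include <unistd.h>

	/* printf() is not async-signal-safe; tolerable in a throwaway test */
	static void sigh(int sig)
	{
		printf("SIG: %d\n", sig);
	}

	int main(void)
	{
		int pid;

		if (!(pid = fork())) {
			/* child: one handler for all three signals */
			struct sigaction sa = { .sa_handler = sigh };

			sa.sa_flags = SA_RESTART;
			assert(0 == sigaction(SIGHUP, &sa, NULL));
			sa.sa_flags = 0;	/* SIGINT: no SA_RESTART */
			assert(0 == sigaction(SIGINT, &sa, NULL));
			sa.sa_flags = SA_RESTART;
			assert(0 == sigaction(SIGQUIT, &sa, NULL));

			printf("block...\n");
			getchar();	// any restartable syscall
			printf("exit\n");
			return 0;
		}

		/* parent: SIGHUP is dequeued before SIGINT */
		sleep(1);
		printf("it shouldn't exit\n");
		kill(pid, SIGHUP); kill(pid, SIGINT);

		/* parent: SIGINT is dequeued before SIGQUIT */
		sleep(1);
		printf("now it should exit!\n");
		kill(pid, SIGINT); kill(pid, SIGQUIT);

		wait(NULL);
		return 0;
	}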
The output:
	block...
	it shouldn't exit
	SIG: 2
	SIG: 1
	now it should exit!
	SIG: 3
	SIG: 2
	exit
So. The child sleeps in getchar().
The parent sends SIGHUP + SIGINT. The child receives both signals and
restarts the syscall, despite the fact that the handler for SIGINT does
not have the SA_RESTART flag.
What happens is:
	syscall returns -ERESTARTSYS
	SIGHUP < SIGINT, the child dequeues SIGHUP first.
	handle_signal() notices -ERESTARTSYS and does:
		regs->ax = regs->orig_ax;
		regs->ip -= 2;
Before the child returns to user-mode, it will also dequeue SIGINT, but
this does not matter. regs->ax was changed, so the next signal can't see
that the soon-to-be-restarted syscall returned -ERESTARTSYS.
When we send SIGINT + SIGQUIT, SIGINT is dequeued first and wins. It
changes ->ax too, but doesn't change ->ip, so the child returns from the
syscall.
Again, this test-case relies on SIGHUP < SIGINT < SIGQUIT, but this is
not necessary. The thing is, if we dequeue the !SA_RESTART signal after
the SA_RESTART signal, the syscall will be restarted.
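To make that rule concrete, here is a small user-space model of the decision.
It is only a sketch, not kernel code: struct dequeued_sig and
syscall_is_restarted() are names made up for this illustration, and it assumes
the syscall returned -ERESTARTSYS and that sigs[] is given in dequeue order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model type; nothing like this exists in the kernel. */
struct dequeued_sig {
	int signo;
	bool sa_restart;	/* handler installed with SA_RESTART? */
};

/*
 * sigs[] lists the signals in the order they are dequeued before the task
 * returns to user mode, after the syscall returned -ERESTARTSYS.
 * Returns true if the syscall is restarted, false if it fails with EINTR.
 */
bool syscall_is_restarted(const struct dequeued_sig *sigs, size_t n)
{
	bool erestartsys_visible = true;	/* return register still holds it */
	bool restart = true;			/* no handler ran: syscall restarts */
	size_t i;

	assert(sigs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!erestartsys_visible)
			break;	/* register already rewritten, too late to matter */
		restart = sigs[i].sa_restart;
		erestartsys_visible = false;
	}
	return restart;
}
```

With { SIGHUP (SA_RESTART), SIGINT } this returns true, and with
{ SIGINT, SIGQUIT (SA_RESTART) } it returns false, which matches the two
rounds of the test-case above.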
And this does not look like a bug to me. Because we can pretend that
SIGINT was sent _after_ the task has actually returned to user-mode
and before it restarts this syscall. In this case SIGINT can not
cancel the syscall which was not called yet.
IOW, we have SIG_1 and SIG_2. SIG_1 has SA_RESTART, SIG_2 does not. The
task sleeps in syscall(). Then,
	the task receives SIG_1
	syscall() returns -ERESTARTSYS
	the task returns to user mode to restart syscall()
	the task receives SIG_2, handles the new signal
	syscall() restarted
We can change this test-case so that the SIGHUP handler blocks all
signals, but this will only change the order of the printf's from the
handler.
If we want to change the current behaviour, we would need nontrivial
changes.
Oleg.
Oleg is correct. As he's explained, what we expect to be likely in your
scenario is that the SA_RESTART signal has already "prevented the syscall
from happening" by rolling it back at the time the non-SA_RESTART signal
happens, so there is no syscall to get EINTR. Then it goes ahead and
starts the syscall, after both signals have come and gone.
It's really the same case as if you'd suspended the program with ^Z between
setting the timer and making the flock call and then resumed it more than
three seconds later. Or just an obscenely long scheduling delay from
higher priority tasks. Or if you'd been stepping it in the debugger and
took that long to hit return that many times. Or if that printf/fflush had
blocked in write because the pipe/pty/socket buffers were all full along
the path to your terminal and it took three seconds for the network to
unclog, or whatever it was.
It's true that you can observe the difference between those cases and the
syscall restart case if, e.g., the syscall had clearly begun to happen
because you'd already blocked for most of a second and could somehow tell
that was so. But the thread itself can't really tell, and nothing
guarantees that, just because you externally believe the syscall had been
started, your thread won't semantically be said to be in user mode,
sitting at the syscall instruction but not having executed it yet. And
indeed that's what we'll say once the SA_RESTART signal hits.
You really can't get the kind of guaranteed-raceless interruption you are
expecting using signals with arbitrary calls in POSIX. Only with a few
that specifically take a blocked signal set to install inside the syscall
before they block, like ppoll/pselect. You may be looking for the model
that pthread_cancel gives you (unfortunately it requires you to use a
separate thread and to have it entirely cancelled to effect an interrupt).
fcntl (flock) is a cancellation point, meaning a prior pthread_cancel has
the "sticky" effect you want even in "deferred cancel" mode.
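
For the ppoll/pselect case, the pattern looks roughly like the sketch below.
It only helps when you are waiting on file descriptors, so it is not a
drop-in fix for a blocking lock call. wait_readable_or_alarm() and
demo_empty_pipe() are hypothetical names invented for this example, and the
code assumes a single-threaded process and a timeout greater than zero.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired;

static void on_alarm(int sig)
{
	(void)sig;
	alarm_fired = 1;
}

/*
 * Hypothetical helper (name and return convention are made up here):
 * 1 = fd is readable, 0 = SIGALRM ended the wait, -1 = error.
 */
int wait_readable_or_alarm(int fd, long timeout_ms)
{
	struct sigaction sa;
	struct itimerval it, off;
	sigset_t block, orig, inside;
	fd_set rfds;
	int ret, err;

	assert(timeout_ms > 0);	/* a zero it_value would disarm the timer */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	/* Keep SIGALRM blocked everywhere except inside pselect(). */
	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	if (sigprocmask(SIG_BLOCK, &block, &orig) < 0)
		return -1;
	inside = orig;
	sigdelset(&inside, SIGALRM);

	/* Arm a one-shot ITIMER_REAL; it cannot be delivered yet. */
	alarm_fired = 0;
	memset(&it, 0, sizeof(it));
	memset(&off, 0, sizeof(off));
	it.it_value.tv_sec = timeout_ms / 1000;
	it.it_value.tv_usec = (timeout_ms % 1000) * 1000;
	if (setitimer(ITIMER_REAL, &it, NULL) < 0) {
		err = errno;
		sigprocmask(SIG_SETMASK, &orig, NULL);
		errno = err;
		return -1;
	}

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	/* The kernel installs "inside" and starts waiting as one step. */
	ret = pselect(fd + 1, &rfds, NULL, NULL, NULL, &inside);
	err = errno;

	/* Disarm the timer and restore the caller's signal mask. */
	setitimer(ITIMER_REAL, &off, NULL);
	sigprocmask(SIG_SETMASK, &orig, NULL);

	if (ret > 0)
		return 1;
	if (ret < 0 && err == EINTR && alarm_fired)
		return 0;
	errno = err;
	return -1;
}

/* Usage sketch: an empty pipe never becomes readable, so the timer ends the wait. */
int demo_empty_pipe(long timeout_ms)
{
	int p[2], r;

	if (pipe(p) < 0)
		return -1;
	r = wait_readable_or_alarm(p[0], timeout_ms);
	close(p[0]);
	close(p[1]);
	return r;
}
```

Because SIGALRM stays blocked until pselect swaps the mask, the signal is
either already pending when pselect starts or arrives while it waits. In both
cases the call fails with EINTR rather than being rolled back and restarted.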
Thanks,
Roland