2001-12-05 03:02:19

by Richard Henderson

[permalink] [raw]
Subject: alpha bug in signal handling

On Tue, Dec 04, 2001 at 06:15:50PM -0800, David Mosberger wrote:
> Oh, sorry, I was referring to teh *other* problem... ;-)
>
> What I meant is that the check for re-scheduling
> (current->need_resched) and signal deliverify (current->sigpending)
> needs to be done with interrupts turned off, and the interrupts need
> to be left off until user space is reached. Otherwise, you could get
> an interrupt which would wake up a higher priority task or post a
> signal between the check and the return to user space.
>
> I didn't see this interrupt disabling in the Alpha version of entry.S,
> but I have to admit my Alpha assembly is getting quite rusty.

Oh, yes, I see. This should fix it.


r~



--- arch/alpha/kernel/entry.S.orig Tue Dec 4 18:40:53 2001
+++ arch/alpha/kernel/entry.S Tue Dec 4 18:46:33 2001
@@ -580,6 +580,10 @@
and $0,8,$0
beq $0,restore_all
ret_from_reschedule:
+ /* Turn off interrupts so that resched and signal delivery
+ checks are done atomically. */
+ addq $31,7,$16
+ call_pal PAL_swpipl
ldq $2,TASK_NEED_RESCHED($8)
lda $4,init_task_union
bne $2,reschedule


2001-12-05 11:23:48

by David Miller

[permalink] [raw]
Subject: Re: alpha bug in signal handling

From: Richard Henderson <[email protected]>
Date: Tue, 4 Dec 2001 19:00:48 -0800

On Tue, Dec 04, 2001 at 06:15:50PM -0800, David Mosberger wrote:
> Oh, sorry, I was referring to teh *other* problem... ;-)
>
> What I meant is that the check for re-scheduling
> (current->need_resched) and signal deliverify (current->sigpending)
> needs to be done with interrupts turned off, and the interrupts need
> to be left off until user space is reached. Otherwise, you could get
> an interrupt which would wake up a higher priority task or post a
> signal between the check and the return to user space.
>
> I didn't see this interrupt disabling in the Alpha version of entry.S,
> but I have to admit my Alpha assembly is getting quite rusty.

Oh, yes, I see. This should fix it.

I don't understand why this is even necessary.

What if the interrupt comes in on another processor. How does this
return from trap behavior avoid that interrupt modifying the signal
and/or scheduling state wrt. the current cpu's task?

I think the change is bogus, we don't do this on sparc64 and things
have been perfectly fine.

And if the change isn't necessary, it's bad to disable interrupts for
a longer period of time than necessary.

2001-12-05 17:00:24

by Richard Henderson

[permalink] [raw]
Subject: Re: alpha bug in signal handling

On Wed, Dec 05, 2001 at 03:23:04AM -0800, David S. Miller wrote:
> I don't understand why this is even necessary.
>
> What if the interrupt comes in on another processor. How does this
> return from trap behavior avoid that interrupt modifying the signal
> and/or scheduling state wrt. the current cpu's task?

It doesn't. But it also prevents the IPI from being recognized
until we are back in userland. Apparently DMT had a test case
that failed without disabling interrupts; I didn't see it myself.


r~

2001-12-05 20:18:46

by David Miller

[permalink] [raw]
Subject: Re: alpha bug in signal handling

From: Richard Henderson <[email protected]>
Date: Wed, 5 Dec 2001 08:58:08 -0800

It doesn't. But it also prevents the IPI from being recognized
until we are back in userland. Apparently DMT had a test case
that failed without disabling interrupts; I didn't see it myself.

I would like to see this test case :-)

2001-12-05 20:59:28

by Paul Mackerras

[permalink] [raw]
Subject: Re: alpha bug in signal handling

David S. Miller writes:

> From: Richard Henderson <[email protected]>
> Date: Tue, 4 Dec 2001 19:00:48 -0800
>
> On Tue, Dec 04, 2001 at 06:15:50PM -0800, David Mosberger wrote:
> > Oh, sorry, I was referring to teh *other* problem... ;-)
> >
> > What I meant is that the check for re-scheduling
> > (current->need_resched) and signal deliverify (current->sigpending)
> > needs to be done with interrupts turned off, and the interrupts need
> > to be left off until user space is reached. Otherwise, you could get
> > an interrupt which would wake up a higher priority task or post a
> > signal between the check and the return to user space.
> >
> > I didn't see this interrupt disabling in the Alpha version of entry.S,
> > but I have to admit my Alpha assembly is getting quite rusty.
>
> Oh, yes, I see. This should fix it.
>
> I don't understand why this is even necessary.

I think David (Mosberger, not Miller :) is right here, and in fact
this is also wrong on PPC at the moment. However, the worst case is
that the reschedule or signal delivery will get delayed until the next
interrupt on that cpu (max 1/HZ seconds). Since it is a pretty narrow
window for the race I think it would be hard to observe the effect of
the bug.

> What if the interrupt comes in on another processor. How does this
> return from trap behavior avoid that interrupt modifying the signal
> and/or scheduling state wrt. the current cpu's task?

If the interrupt comes in on another processor, then we call schedule()
on that processor and it does an IPI if it thinks that this processor
should reschedule.

The logic requires that we do the signal/schedule checks on exit to
userland. The scenario where they can get missed is where we are
returning to userland, and we get an interrupt just after we have done
the checks. Suppose the interrupt routine raises a signal or wakes up
a task. On the return from the interrupt routine it won't do the
signal/schedule checks because we are returning to the kernel
mainline. When we get back there it just completes the return to
userspace without doing the checks (since we have already done them).

Paul.

2001-12-05 21:16:27

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: alpha bug in signal handling

>
>I think David (Mosberger, not Miller :) is right here, and in fact
>this is also wrong on PPC at the moment. However, the worst case is
>that the reschedule or signal delivery will get delayed until the next
>interrupt on that cpu (max 1/HZ seconds). Since it is a pretty narrow
>window for the race I think it would be hard to observe the effect of
>the bug.

Yup, I found that some time ago. I though I even fixed it while hacking
on the return from exception path. The fix has probably been lost along
with some of that experimental stuff I was doing at that time though,
I remember you saying the problem wasn't that big.

Ben.


2001-12-06 01:10:41

by David Miller

[permalink] [raw]
Subject: Re: alpha bug in signal handling

From: Paul Mackerras <[email protected]>
Date: Thu, 6 Dec 2001 07:55:26 +1100 (EST)

I think David (Mosberger, not Miller :) is right here, and in fact
this is also wrong on PPC at the moment. However, the worst case is
that the reschedule or signal delivery will get delayed until the next
interrupt on that cpu (max 1/HZ seconds). Since it is a pretty narrow
window for the race I think it would be hard to observe the effect of
the bug.

I understand now, thanks. I've fixed up sparc64.

So it's a latency problem, not a correctness problem.