2002-02-27 08:44:42

by Martin Wirth

Subject: Re: [PATCH] Lightweight userspace semaphores

Hi Hubertus,

I just had a quick look at your semaphore code. As far as I can see, you
do not re-check the user-space semaphore counter while going to sleep
on your kernel semaphore. But this way you may lose synchronization
between the kernel semaphore and the user-space semaphore counter if
more than two processes are involved. Or did I miss some tricky way
in which you avoid this problem?

Martin Wirth


2002-02-27 15:24:19

by Hubertus Franke

Subject: Re: [PATCH] Lightweight userspace semaphores

On Wed, Feb 27, 2002 at 09:43:45AM +0100, Martin Wirth wrote:
> Hi Hubertus,
>
> I just had a quick look at your semaphore code. As far as I can see, you
> do not re-check the user-space semaphore counter while going to sleep
> on your kernel semaphore. But this way you may lose synchronization
> between the kernel semaphore and the user-space semaphore counter if
> more than two processes are involved. Or did I miss some tricky way
> in which you avoid this problem?
>
> Martin Wirth

Yes, you are missing something.
I assume you are looking at the usema (non-spinning) version.
The trick is that the kernel semaphore stores a token in case a race
condition occurs, and resolves it that way.
I rely on the fact that the kernel only provides a wait queue and nothing
else; the entire state of how many are waiting is stored in user space.

Effectively you have a P operation in user space followed by a P operation
in kernel space. Same for V operation.
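A rough sketch of what this two-level composition looks like, using C11 atomics. All names here are hypothetical (`kernel_wait`/`kernel_signal` stand in for the real ulock system calls), and the token counter below is only a single-process model of the kernel side, not the actual patch code:

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative layout, not the actual patch: the user-space lockword holds
 * the full count of holders/waiters; the kernel keeps only a wait queue
 * plus a stored token. */
typedef struct { atomic_int count; } ulock_t;

/* Stand-ins for the kernel entry points, modeled as a token counter here.
 * The real wait call sleeps until a token arrives. */
static atomic_int ktoken;
static int kernel_wait(ulock_t *l)   { (void)l; atomic_fetch_sub(&ktoken, 1); return 0; }
static int kernel_signal(ulock_t *l) { (void)l; atomic_fetch_add(&ktoken, 1); return 0; }

/* P: uncontended acquires never enter the kernel */
int usema_down(ulock_t *l)
{
    if (atomic_fetch_sub(&l->count, 1) - 1 >= 0)
        return 0;                  /* got it in user space */
    return kernel_wait(l);         /* P on the kernel side */
}

/* V: only enter the kernel if someone is (or is about to be) waiting */
int usema_up(ulock_t *l)
{
    if (atomic_fetch_add(&l->count, 1) + 1 > 0)
        return 0;                  /* nobody waiting */
    return kernel_signal(l);       /* V on the kernel side */
}
```

The point of the split is visible in the fast paths: an uncontended P or V is a single atomic op in user space, and the kernel is touched only to park or wake waiters.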

Assuming you have two or more processes 0,1,2 ... you can show
that race conditions are properly resolved. Again, the trick is to not
sync the state of the kernel and the user level. It comes naturally
if you properly separate the duties.

The spinning versions were much more interesting; you can get some
really tricky situations with 3 or more processes. The ulockflex program
really helped to identify problems.

In future messages please point out which of the user locks are in question.
Remember I provide several:
(1) semaphores
(2) semaphores with timed spinning
(3) convoy avoidance locks
(4) convoy avoidance locks with spinning
(5) shared locks == multiple reader/single writer locks
This list is growing as we speak.

I also modified the code (I will post the update at the lse site).
This also fixes a small bug in the ulockflex program, that I introduced
in the last posting at lse.
I took Ben LaHaise's suggestions and provide multiple runqueues
up to a limit, rather than explicitly coding an exclusive/shared version.


2002-02-27 17:20:17

by Martin Wirth

Subject: Re: [PATCH] Lightweight userspace semaphores...

Hubertus Franke wrote:

> Again, the trick is to not
> sync the state of the kernel and the user level. It comes naturally
> if you properly separate the duties.
>

Your code for usema_down is simply:

int usema_down(ulock_t *ulock)
{
	if (!__ulock_down(ulock)) {
		return 0;
	}
	return ulock_wait(ulock, 0);
}

This means you do not re-check whether the usema is really available after
you return from ulock_wait(), but you may have the following situation:

Process 1: down

Process 2: down
           -> calls usema_wait,
           but gets preempted shortly
           before it can enter kernel mode

Process 1: up
           (kernel sema is free)

Process 3: down -> got it

Process 2: resumes execution,
           thinks it also got it
           because the kernel sema is free!
Now you have two owners!!


Martin Wirth




2002-02-27 19:04:35

by Hubertus Franke

Subject: Re: [PATCH] Lightweight userspace semaphores...

On Wed, Feb 27, 2002 at 06:17:41PM +0100, Martin Wirth wrote:
> Hubertus Franke wrote:
>
> > Again, the trick is to not
> > sync the state of the kernel and the user level. It comes naturally
> > if you properly separate the duties.
> >
>
> Your code for usema_down is simply:
>
> int usema_down(ulock_t *ulock)
> {
> 	if (!__ulock_down(ulock)) {
> 		return 0;
> 	}
> 	return ulock_wait(ulock, 0);
> }
>
> This means you do not re-check whether the usema is really available after
> you return from ulock_wait(), but you may have the following situation:

Not so. First, remember the following:

__ulock_down is basically an
atomic_dec(&lockword); it returns 0 (lock acquired) if the lockword
after the decrement is >= 0.

__ulock_up is basically an
atomic_inc(&lockword); it returns 0 (nobody to wake) if the lockword
after the increment is > 0.

After each line I print the value of the lockword after the atomic op.
I assume the initial value was "1". The initial value of the kernel semaphore is "0".
Let's write this as the state [ 1, 0 ] (user, kernel).
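With C11 atomics and the return convention implied by the usema_down code (0 means the fast path sufficed), the two primitives reduce to roughly the following sketch. The names mirror the patch's, but this is illustrative code, not the patch source:

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct { atomic_int lockword; } ulock_t;

/* returns 0 (lock acquired) iff the lockword is >= 0 after the decrement */
static int __ulock_down(ulock_t *l)
{
    return (atomic_fetch_sub(&l->lockword, 1) - 1) < 0;
}

/* returns 0 (nobody to wake) iff the lockword is > 0 after the increment */
static int __ulock_up(ulock_t *l)
{
    return (atomic_fetch_add(&l->lockword, 1) + 1) <= 0;
}
```

(Strictly, identifiers with leading double underscores are reserved in C; they are kept here only to match the patch's naming.)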

>
> Process 1: down
[ 0, 0 ]
>
> Process 2: down
[ -1, 0 ]
> -> calls usema_wait,
> but gets preempted shortly
> before it can enter kernel mode
>
> Process 1: up
[ 0, 0 ] now signal the kernel, but nobody is waiting on it yet
[ 0, 1 ] the kernel stores the signal token in its state
> (kernel sema is free)
>
> Process 3: down -> got it
(not so........)
[ -1, 1 ]
P3 does not have the lock, since __ulock_down returned != 0;
it is going to call the kernel's ulock_wait.

Two scenarios now: either P3 gets to the kernel first, or P2 does.

P3 first:

[ -1, 1 ] (start state from above)
P3's wait in the kernel is a down on the kernel semaphore;
it succeeds, eats the kernel sem token, and P3 continues
out of the kernel holding the lock.
[ -1, 0 ]

P2 is rescheduled; its down in the kernel blocks the process.
[ -1, -1 ]

P2 first, assuming that P3 also got preempted before the kernel call
("Shit Happens", Forrest Gump):

[ -1, 1 ] (start state from above)
P2's wait in the kernel is a down on the kernel semaphore;
it succeeds, eats the kernel sem token, and P2 continues
out of the kernel holding the lock.
[ -1, 0 ]

P3 is rescheduled; its down in the kernel blocks the process.
[ -1, -1 ]


In either case, any up call now will drop into the kernel and signal the
waiting process to continue, handing the lock over to it.
[ 0, 0 ]
>
> Process 2: resumes execution,
> thinks it also got it,
> because the kernel sema is free!

So, it works and is correct. I hope that clarifies it; if not, let me know.
Interestingly enough, this scheme doesn't work for spinning locks.
See lib/ulocks.c[91-133]; I have integrated this dead code there
to show how not to do it. A similar analysis to the above shows
that that approach wouldn't work. You need cmpxchg for these scenarios
(more or less).
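The hand-off above can be replayed mechanically. The following single-threaded simulation is my own sketch of the assumed semantics (hypothetical helper names, not the patch code); stepping it through Martin's interleaving shows that exactly one process ends up owning the lock:

```c
#include <assert.h>

/* [user, kernel] state pair as in the trace above; kernel == 1 means the
 * kernel semaphore holds a stored wake-up token. */
typedef struct { int user; int kernel; } state_t;

/* user-level fast path: nonzero means the caller got the lock outright */
static int u_down(state_t *s) { return --s->user >= 0; }
/* user-level release: nonzero means nobody was waiting */
static int u_up(state_t *s)   { return ++s->user > 0; }

/* kernel down: nonzero means a token was available and the lock is handed
 * over; zero means the caller would block (sleep, in the real kernel) */
static int k_wait(state_t *s) { return --s->kernel >= 0; }
/* kernel up: store a token, or wake a sleeper in the real kernel */
static void k_signal(state_t *s) { s->kernel++; }
```

Each step of the test below corresponds to one line of the annotated trace, ending in the [ -1, -1 ] state with a single owner.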

>
> Now you have two owners!!
>

Not so, QED.

>
> Martin Wirth
>
>

-- Hubertus Franke ([email protected])


2002-03-02 14:08:28

by Hubertus Franke

Subject: Re: [PATCH] Lightweight userspace semaphores...

On Fri, Mar 01, 2002 at 09:07:18PM +0100, Martin Wirth wrote:
>
>
> Hubertus Franke wrote:
>
> >
> >So, it works and is correct. Hope that clarifies it, if not let me know.
> >Interestingly enough. This scheme doesn't work for spinning locks.
> >Goto lib/ulocks.c[91-133], I have integrated this dead code here
> >to show how not to do it. A similar analysis as above shows
> >that this approach wouldn't work. You need cmpxchg for these scenarios
> >(more or less).
> >
>
> You are right, I falsely assumed the initial state to be [1,1].
>
> But as mentioned in your README, your approach is currently not able
> to manage signal handling correctly.
> You have to ignore all non-fatal signals by using ERESTARTSYS, and a
> SIGKILL sent to one of the processes
> may corrupt your user-kernel synchronisation.
>
> I don't think a user-space semaphore implementation is acceptable until
> it provides (signal-) interruptibility and
> timeouts. But maybe you have some idea how to manage this.
>
> Martin Wirth
>
>

As for the signal handling and ERESTARTSYS:
The user code on the slow path can check the return code and
has two choices: (a) re-enter the kernel and wait, or (b) correct the
status word (since it is still counted as a waiting process) and return
0. I have chosen (a) and automated it.
I have tried sending signals (e.g. SIGIO) and it
seems to work fine. The signal is handled in user space and control returns
to the kernel section. (b) is feasible but requires a bit more exception
work in the routine.
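Choice (a) can be sketched as a simple retry loop. Everything below is hypothetical (the real ulock_wait is a system call; here a stub pretends to be interrupted twice so the loop is visible), but it shows why retrying is safe: the caller is still counted as a waiter in the user-space lockword, so no wakeup can be lost:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

typedef struct { atomic_int lockword; } ulock_t;

/* fast path: returns 0 (lock acquired) iff lockword >= 0 after decrement */
static int __ulock_down(ulock_t *l)
{
    return (atomic_fetch_sub(&l->lockword, 1) - 1) < 0;
}

/* Stub standing in for the real ulock_wait syscall: it "succeeds" after
 * pretending to be interrupted by a signal twice. */
static int fake_interrupts = 2;
static int ulock_wait(ulock_t *ulock, int flags)
{
    (void)ulock; (void)flags;
    return fake_interrupts-- > 0 ? -EINTR : 0;
}

/* Choice (a): on a non-fatal signal, simply re-enter the kernel and wait.
 * The caller is still counted as a waiter in the user-space lockword,
 * so retrying cannot lose a wakeup. */
int usema_down_restartable(ulock_t *ulock)
{
    int ret;
    if (!__ulock_down(ulock))
        return 0;               /* uncontended fast path */
    while ((ret = ulock_wait(ulock, 0)) == -EINTR)
        ;                       /* signal was handled in user space; retry */
    return ret;
}
```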
As for the corruption case, there is flat-out (almost) nothing you can do.
For instance, if a process grabs the lock and exits, and then another process
tries to acquire it, you get a deadlock. At least on the mailing list,
most people feel that you are dead in the water anyway.

I have been thinking about the problem of corruption.
The owner of the lock could register its pid in the lock area (2nd word).
That still leaves non-atomic windows open and is a half-assed solution.

But more conceptually, if your process dies while holding a lock that protects
some shared data, there is no guarantee you can make about the data
itself if the updating process dies. In that kind of scenario, why try
to continue as if nothing happened?
It seems the general consensus on the list is that your app is toast at that
point.


-- Hubertus

2002-03-03 22:14:13

by Paul Jackson

Subject: Re: [Lse-tech] Re: [PATCH] Lightweight userspace semaphores...

On Sat, 2 Mar 2002, Hubertus Franke wrote:
> But more conceptually, if your process dies while holding a lock ...
> your app is toast at that point.

Without application specific knowledge, certainly.

Is there some way one could support a hook, to enable
an application to register a handler that could clean
up, for those apps that find it worthwhile?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2002-03-04 16:07:18

by Hubertus Franke

Subject: Re: [Lse-tech] Re: [PATCH] Lightweight userspace semaphores...

On Sun, Mar 03, 2002 at 02:13:45PM -0800, Paul Jackson wrote:
> On Sat, 2 Mar 2002, Hubertus Franke wrote:
> > But more conceptually, if your process dies while holding a lock ...
> > your app is toast at that point.
>
> Without application specific knowledge, certainly.
>
> Is there some way one could support a hook, to enable
> an application to register a handler that could clean
> up, for those apps that find it worthwhile?
>

I think that's a good idea; I'll think it through and see what Rusty thinks.



2002-03-05 07:09:06

by Rusty Russell

Subject: Re: [Lse-tech] Re: [PATCH] Lightweight userspace semaphores...

On Sun, 3 Mar 2002 14:13:45 -0800
Paul Jackson <[email protected]> wrote:

> On Sat, 2 Mar 2002, Hubertus Franke wrote:
> > But more conceptually, if your process dies while holding a lock ...
> > your app is toast at that point.
>
> Without application specific knowledge, certainly.
>
> Is there some way one could support a hook, to enable
> an application to register a handler that could clean
> up, for those apps that find it worthwhile?

If you want this, use fcntl locks (see TDB). If you don't tell the kernel
what you are doing (which is the reason these locks are fast), it cannot
clean up for you.

One could conceive of a solution where a process told the kernel
"here is where I keep my lock states: if I die, clean up". Of course,
there's a race there too.

IMHO, given that the lock is protecting something which is left in an
unknown state, this is something which would require serious testing
to be proven worthwhile.

Hope that helps,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.