2002-03-08 06:16:11

by Doug Siebert

Subject: Fast Userspace Mutexes (futex) vs. msem_*

I'm a long-time lurker on linux-kernel, but the discussion about fast
userspace mutexes ("futexes") has piqued my interest. I made great use
of the msem_* functions (msem_init, msem_lock, msem_unlock) on HP-UX
in the mid 90s (HP-UX 9.01 and on). At the time they were invaluable
for me, with 1000+ processes needing exclusive access to resources on
a 50MHz machine. Requiring a system call for this purpose would have
been a major performance hit (remember, HP-UX's system calls are not
nearly as lightweight as Linux's).
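
For anyone who hasn't used the interface, the pattern looks roughly
like this (a minimal sketch from memory against the XPG4.2 msem_*
signatures; error handling omitted, and an anonymous MAP_SHARED
mapping stands in for whatever shared memory the processes actually
use):

#include <sys/mman.h>

static msemaphore *sem;

static void setup(void)
{
    /* The msemaphore must live in memory shared by all the
       processes involved; here, an anonymous shared mapping
       inherited across fork(). */
    sem = (msemaphore *)mmap(0, sizeof(msemaphore),
                             PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    msem_init(sem, MSEM_UNLOCKED);
}

static void use_resource(void)
{
    msem_lock(sem, 0);      /* blocks until the lock is free */
    /* ... exclusive access to the shared resource ... */
    msem_unlock(sem, 0);    /* wakes a waiter, if any */
}

An uncontended msem_lock()/msem_unlock() pair never left userspace,
which is what made 1000+ processes workable on that hardware.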

The direction that the futex implementation is going looks a lot like
how they are implemented on HP-UX (as well as on Tru64 and AIX). I am
curious, though, why the case of "what happens if the process holding
the lock dies" is considered unimportant by some people. It wouldn't be
all that much more work to "do it right" (IMHO) and handle this case.
AFAIK, the HP-UX implementation kept a "locker id" and a linked list of
waiters' lock ids (to allow first-come-first-served ordering as well as
handling the case of a lock holder dying). There was an underlying
system call that was made when the userspace part in libc found the
lock already held and waiting for the lock was desired.
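
To make the fast-path/slow-path split concrete, such a lock comes out
looking roughly like the sketch below. This is written with GCC's
atomic builtins and assumed FUTEX_WAIT/FUTEX_WAKE operation names; it
is my guess at the general shape, not code from the actual patch:

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word states: 0 = unlocked, 1 = locked with no waiters,
   2 = locked, possibly with waiters.  The kernel is entered only
   when the lock is contended. */

static void lock(int *futex)
{
    int c = __sync_val_compare_and_swap(futex, 0, 1);
    if (c != 0) {
        if (c != 2)
            c = __sync_lock_test_and_set(futex, 2); /* atomic xchg */
        while (c != 0) {
            /* Sleep until the lock word stops being 2. */
            syscall(SYS_futex, futex, FUTEX_WAIT, 2, NULL, NULL, 0);
            c = __sync_lock_test_and_set(futex, 2);
        }
    }
}

static void unlock(int *futex)
{
    /* Fast path: if nobody was waiting, no syscall at all. */
    if (__sync_fetch_and_sub(futex, 1) != 1) {
        *futex = 0;
        syscall(SYS_futex, futex, FUTEX_WAKE, 1, NULL, NULL, 0);
    }
}

The point is that an uncontended lock/unlock pair is two atomic
operations and zero system calls; the kernel only gets involved when
a waiter actually has to sleep.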

If the implementation does move further towards the msem_* standard, it
might make sense to just implement it as defined, and provide source
compatibility with existing applications on HP-UX, AIX, and Tru64 that
use it.

I'm not subscribed (I skim the hypermail archives on zork a couple of
times a week), so please cc: me on comments for a faster response.

--
Douglas Siebert
[email protected]


2002-03-08 06:30:59

by Davide Libenzi

Subject: Re: Fast Userspace Mutexes (futex) vs. msem_*

On Fri, 8 Mar 2002, Doug Siebert wrote:

> The direction that the futex implementation is going looks a lot like
> how they are implemented on HP-UX (as well as on Tru64 and AIX). I am
> curious, though, why the case of "what happens if the process holding
> the lock dies" is considered unimportant by some people. It wouldn't be
> all that much more work to "do it right" (IMHO) and handle this case.
> AFAIK, the HP-UX implementation kept a "locker id" and a linked list of
> waiters' lock ids (to allow first-come-first-served ordering as well as
> handling the case of a lock holder dying). There was an underlying
> system call that was made when the userspace part in libc found the
> lock already held and waiting for the lock was desired.

Rusty, you should really make a futex FAQ :-)

- Davide


2002-03-08 22:06:29

by Edgar Toernig

Subject: Re: Fast Userspace Mutexes (futex) vs. msem_*

Doug Siebert wrote:
>
> I am curious, though, why the case of "what happens if the process
> holding the lock dies" is considered unimportant by some people. It
> wouldn't be all that much more work to "do it right" (IMHO) and
> handle this case.

And what is "right"? You have two problems:

- The kernel has no idea of how many locks a dying process holds. The
kernel only starts to know about a lock when another process has to
wait for it.

- Locked data may be in an inconsistent state. The kernel has no idea
how to "repair" it.

I once suggested sending a signal to all processes that share a
PROT_SEM page with the dying process[1]. But even then it's pretty
difficult for the other processes to decide which locks were held by
the dead one; you need additional data and handling in userspace for
that. At least it would let them know about the death, and in the
unhandled case it would kill all possibly affected processes. Doing
that in userspace would require a manager process, with all its
ugliness.
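
The userspace side would look something like the sketch below.
Everything in it is invented for illustration (the owner field, the
recovery policy, and SIGRTMIN standing in for the suggested signal);
the point is only that survivors need extra per-lock data to identify
the dead holder's locks:

#include <signal.h>
#include <sys/types.h>

struct dlock {
    volatile pid_t owner;   /* pid of current holder, 0 if free */
    int word;               /* the lock word itself             */
};

#define NLOCKS 16
static struct dlock locks[NLOCKS];

/* Handler for the hypothetical "a sharer of your PROT_SEM page
   died" signal; si->si_pid would identify the dead process. */
static void sharer_died(int sig, siginfo_t *si, void *ctx)
{
    int i;

    for (i = 0; i < NLOCKS; i++) {
        if (locks[i].owner == si->si_pid) {
            /* Lock orphaned: the data it protects is suspect.
               Here we just release it; a real application would
               have to repair or discard the data first. */
            locks[i].owner = 0;
            __sync_lock_release(&locks[i].word);
        }
    }
}

static void install_handler(void)
{
    struct sigaction sa;

    sa.sa_sigaction = sharer_died;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGRTMIN, &sa, NULL);
}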

Ciao, ET.


[1] I've no idea how difficult it would be to find these processes.

2002-03-09 07:14:50

by Rusty Russell

Subject: Re: Fast Userspace Mutexes (futex) vs. msem_*

On Fri, 8 Mar 2002 00:15:52 -0600 (CST)
Doug Siebert <[email protected]> wrote:

> The direction that the futex implementation is going looks a lot like
> how they are implemented on HP-UX (as well as on Tru64 and AIX). I am
> curious, though, why the case of "what happens if the process holding
> the lock dies" is considered unimportant by some people. It wouldn't be
> all that much more work to "do it right" (IMHO) and handle this case.
> AFAIK, the HP-UX implementation kept a "locker id" and a linked list of
> waiters' lock ids (to allow first-come-first-served ordering as well as
> handling the case of a lock holder dying).

Fundamentally, these locks are fast because they *don't* tell the kernel.
Also, if you die holding a lock, "autocleaning" it is often far worse than
just locking up, since the data it was protecting may be inconsistent.

At the end of the day, this is the place for an application-specific
answer. And this is nothing that can't be done in userspace (i.e. if
you can't get the lock for some length of time, notify the lock master
and it will do all the cleanups and release the lock).
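
In code, that comes out something like the sketch below (assuming a
FUTEX_WAIT that takes a relative timeout; notify_lock_master() is a
stand-in for whatever the application's recovery path really is):

#include <errno.h>
#include <stdio.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Stand-in for the application-specific recovery path: ask a
   manager process to audit the lock, clean up the protected data,
   and release it. */
static void notify_lock_master(int *futex)
{
    fprintf(stderr, "lock %p stuck, asking the lock master\n",
            (void *)futex);
}

/* Pairs with an unlock that always does FUTEX_WAKE; the protocol
   is kept deliberately simple here. */
static void lock_or_recover(int *futex)
{
    struct timespec timeout = { 5, 0 };   /* give the holder 5s */

    while (__sync_val_compare_and_swap(futex, 0, 1) != 0) {
        if (syscall(SYS_futex, futex, FUTEX_WAIT, 1, &timeout,
                    NULL, 0) == -1 && errno == ETIMEDOUT)
            notify_lock_master(futex);    /* holder may have died */
    }
}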

Hope that clarifies,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-03-12 16:52:20

by Bill Davidsen

Subject: Re: Fast Userspace Mutexes (futex) vs. msem_*

On Fri, 8 Mar 2002, Edgar Toernig wrote:

> And what is "right"? You have two problems:
>
> - The kernel has no idea of how many locks a dying process holds. The
> kernel only starts to know about a lock when another process has to
> wait for it.

There have been some discussions on this; it seems possible to make the
information available to {kernel, lock manager, other}.

> - Locked data may be in an inconsistent state. The kernel has no idea
> how to "repair" it.

But that's true for other things as well, like locks on regions of files.
That hasn't prevented locking from working and being useful.

What I don't like is the idea of processes sitting forever with their
thumb up their ass waiting for an unlock which isn't going to come. I
have an application like that: it starts a few thousand threads and
spins at 10-70k ctx/sec forever. It's (a) a commercial binary-only
product, and (b) selected by management against my recommendations.
The vendor blames Linux, of course: "threads don't work right," where
"right" means "like Solaris" rather than "like POSIX."

I would LOVE to see Andrea's "Child Run First" as an optional part of
the mainline kernel; the patch seems to solve the hang, but doesn't
play well with other patches.

I don't see either of the above as show stoppers, although I agree they
are implementation issues.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.