Hi Ingo, all,
Noticed (via LWN, hence the delay) your robust futex work. Have you
considered the less-perfect, but simpler option of simply having futex
calls which tell the kernel that the u32 value is in fact the holder's
TID?
In this case, you don't get perfect robustness when TID wrap occurs:
the kernel won't know that the lock holder is dead. However, it's
simple, and telling the kernel that the lock value is the TID allows the
kernel to do prio inheritance etc. in the future.
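Roughly, the userspace side I have in mind (a sketch only: the
FUTEX_WAIT_TID op and its number are hypothetical, and the GCC atomic
builtins stand in for whatever glibc would really use):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>

#define FUTEX_WAIT_TID  128     /* hypothetical new multiplexed op */

static pid_t gettid_(void)
{
        return syscall(SYS_gettid);     /* glibc has no gettid() wrapper */
}

static void tid_lock(volatile uint32_t *lock)
{
        uint32_t tid = gettid_();

        /* 0 == unlocked; otherwise the word holds the owner's TID */
        while (!__sync_bool_compare_and_swap(lock, 0, tid)) {
                uint32_t val = *lock;
                if (val != 0)   /* still held: let the kernel check the TID */
                        syscall(SYS_futex, lock, FUTEX_WAIT_TID, val,
                                NULL, NULL, 0);
        }
}

static void tid_unlock(volatile uint32_t *lock)
{
        __sync_lock_release(lock);      /* store 0 with release semantics */
        /* no 'contended' bit in this sketch, so always wake one waiter */
        syscall(SYS_futex, lock, FUTEX_WAKE, 1, NULL, NULL, 0);
}

Note that the retry loop re-reads the lock word each time around, so
the TID the kernel is asked to validate is always the current holder's.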
Cheers!
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol
Rusty wrote:
> having futex
> calls which tell the kernel that the u32 value is in fact the holder's
> TID?
Huh - I must be dense. When would these calls be made?
Once per task creation, once per allocation of memory
for the lock, once per contested lock attempt, once per
uncontested lock attempt, ... ?
With Ingo's robust_futexes, you could have a task that
has taken and released a gazillion futex locks, and is
still at the present moment holding 47 of them, drop dead
and be able to initiate cleanup of exactly those 47 locks,
while having made only a single system call, at the birth of the
thread.
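(For concreteness, that single system call looks roughly like this; I'm
going from Ingo's patchset, so take the names as approximate:

#include <sys/syscall.h>
#include <unistd.h>

struct robust_list {
        struct robust_list *next;
};

struct robust_list_head {
        struct robust_list list;            /* locks currently held */
        long futex_offset;                  /* lock word offset per entry */
        struct robust_list *list_op_pending;
};

static struct robust_list_head head = {
        .list = { .next = &head.list },     /* empty circular list */
};

static void register_robust_list(void)
{
        /* once, at thread start; one head per thread in reality */
        syscall(__NR_set_robust_list, &head, sizeof(head));
}

After that, taking and releasing locks is pure userspace list
manipulation; the kernel only looks at the list when the thread dies.)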
Can your idea do that?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401
On Thu, 2006-02-16 at 22:42 -0800, Paul Jackson wrote:
> Rusty wrote:
> > having futex
> > calls which tell the kernel that the u32 value is in fact the holder's
> > TID?
>
> Huh - I must be dense. When would these calls be made?
> Once per task creation, once per allocation of memory
> for the lock, once per contested lock attempt, once per
> uncontested lock attempt, ... ?
Hi Paul,
Sorry if I wasn't clear. A flag on the futex_wait operation (or, given
the current implementation, yet another multiplexed FUTEX_WAIT variant).
> With Ingo's robust_futexes, you could have a task that
> has taken and released a gazillion futex locks, and is
> still at the present moment holding 47 of them, drop dead
> and be able to initiate cleanup of exactly those 47 locks,
> while having made only a single system call, at the birth of the
> thread.
>
> Can your idea do that?
I think so, yes. The kernel realizes it has to sleep, checks that the
thread corresponding to the TID it just read is still alive, and if it
isn't, goes into the cleanup path...
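In (very rough) kernel-side terms, something like this in the wait
slowpath, where handle_dead_owner() is a placeholder for whatever the
cleanup ends up being:

        u32 uval;
        struct task_struct *owner;

        if (get_user(uval, (u32 __user *)uaddr))
                return -EFAULT;

        read_lock(&tasklist_lock);
        owner = find_task_by_pid(uval);  /* the word holds the owner TID */
        if (!owner || (owner->flags & PF_EXITING)) {
                read_unlock(&tasklist_lock);
                return handle_dead_owner(uaddr, uval);  /* placeholder */
        }
        read_unlock(&tasklist_lock);
        /* owner is alive: queue ourselves and sleep as usual */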
Does that clarify?
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol
Ah - that makes more sense - thanks.
So the point is that we only have to clean up the stale locks of dead
threads when some other task has the misfortune of trying to take
the orphaned lock and gets forced into a wait.
The wait call essentially becomes a "wait unless said other TID is
dead, in which case, a new owner is summarily declared."
Hmmm ...
How do you handle the case where the wait occurred before the death,
not after? And the case where the problem is caused by the death of a
task that was not the one holding the lock when the wait was called?
Say task A is holding the lock for a while, during which tasks B,
C and D queue up waiting for the lock, then task A releases and task
B gets it, then task B drops dead unexpectedly.
When C and D began their wait, A owned the lock. Now it is the death
of B that should lead to the awakening of C.
What does your solution look like in that case?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401
* Paul Jackson <[email protected]> wrote:
> So the point is that we only have to clean up the stale locks of dead
> threads when some other task has the misfortune of trying to take the
> orphaned lock and gets forced into a wait.
>
> The wait call essentially becomes a "wait unless said other TID is
> dead, in which case, a new owner is summarily declared."
the fundamental problem i see here: how do you 'declare' a TID as dead?
32-bit TIDs can be reused, quite fundamentally. A 64-bit TID space with
no wrapover was suggested before in this discussion, but that's not
possible for many reasons (ABI impact, user impact, and due to futexes
being designed as 32-bit variables).
also, CPU support is problematic: not all 32-bit CPUs we support can do
64-bit atomic ops. The moment the TID cannot be handled atomically, we
are back to square 1 and to ->list_op_pending type of techniques.
[ and even a 64-bit TID space might be too narrow: let's fast-forward 5
years and assume a CPU that can create/destroy a thread in 0.1 usecs
(right now we can do that in ~1 usec), and assume a total number of
2048 cores within the system [say 128x16]. A 64-bit TID space with 3
high bits set aside could then wrap around in 3 years. That's just a
single order of magnitude away from being 'months' and causing
practical problems. And that assumes the most 'compressed' variant: a
central TID counter - not including things like clustering or
scalability enhancements by partitioning the TID space along CPUs. ]
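[ spelling the arithmetic out: 2^61 usable IDs / (2048 cores * 10^7
creates/sec per core) ~ 2.3e18 / 2.0e10 per second ~ 1.1e8 seconds,
i.e. 3-4 years. ]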
also, this approach has a futex performance issue: at every FUTEX_WAIT
we'd have to look up the TID, just to make sure that it isn't dead. This
will likely be an expensive operation [it probably needs to take the
tasklist_lock, to ensure that the task _really_ isn't dead] - and futexes
are supposed to be lightweight, even in their in-kernel slowpath. With
the userspace-list based approach, by contrast, the normal in-kernel
futex path is not impacted _at all_ - and even in the failure case, the
TID only has to be matched against current->pid.
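for reference, the exit-time path of the list-based approach is
conceptually just the following (a simplified sketch, not the actual
code: for_each_robust_entry() is a stand-in for the real list walk, and
the real code uses an atomic cmpxchg rather than put_user()):

        void exit_robust_list_sketch(struct task_struct *curr)
        {
                u32 __user *uaddr;
                u32 uval;

                /* walk the list userspace registered at thread start */
                for_each_robust_entry(uaddr, curr) {
                        if (get_user(uval, uaddr))
                                continue;
                        /* the only TID check needed: current->pid */
                        if ((uval & FUTEX_TID_MASK) == curr->pid) {
                                put_user(uval | FUTEX_OWNER_DIED, uaddr);
                                futex_wake(uaddr, 1);  /* one waiter */
                        }
                }
        }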
but yes, in theory, if we had a unique ID (per bootup) for every task
started in the system, things would be somewhat simpler in some areas.
In practice though, even if all the other (big) hurdles are overcome,
handling that unique ID likely needs techniques similar to handling the
list, and won't perform as well as the userspace-list based approach.
Ingo
On Fri, 2006-02-17 at 15:57 +1100, Rusty Russell wrote:
> Hi Ingo, all,
>
> Noticed (via LWN, hence the delay) your robust futex work. Have you
> considered the less-perfect, but simpler option of simply having futex
> calls which tell the kernel that the u32 value is in fact the holder's
> TID?
>
> In this case, you don't get perfect robustness when TID wrap occurs:
> the kernel won't know that the lock holder is dead. However, it's
> simple, and telling the kernel that the lock value is the TID allows the
> kernel to do prio inheritance etc. in the future.
I think this was Todd Kneisel's approach. His version used vma
scanning, which is what Ingo is trying to replace; it just used the
current u32 value.
Daniel
Rusty Russell wrote:
> Hi Ingo, all,
>
> Noticed (via LWN, hence the delay) your robust futex work. Have you
> considered the less-perfect, but simpler option of simply having futex
> calls which tell the kernel that the u32 value is in fact the holder's
> TID?
>
> In this case, you don't get perfect robustness when TID wrap occurs:
> the kernel won't know that the lock holder is dead. However, it's
> simple, and telling the kernel that the lock value is the TID allows the
> kernel to do prio inheritance etc. in the future.
Priority Inheritance has come up a couple of times in relation to Ingo's new
LightWeight Robust Futexes. Ingo has said that PI is orthogonal to LWRF, but I
don't think we've heard whether there are plans already in the works (or in his head
:-) for PI. Rusty's comment above reads as "the current LWRF implementation
cannot support PI" - is there something about it that makes PI impractical to
implement?
Thanks,
--
Darren Hart
On Fri, 2006-02-17 at 10:13 +0100, Ingo Molnar wrote:
> * Paul Jackson <[email protected]> wrote:
>
> > So the point is that we only have to clean up the stale locks of dead
> > threads when some other task has the misfortune of trying to take the
> > orphaned lock and gets forced into a wait.
> >
> > The wait call essentially becomes a "wait unless said other TID is
> > dead, in which case, a new owner is summarily declared."
>
> the fundamental problem i see here: how do you 'declare' a TID as dead?
> 32-bit TIDs can be reused, quite fundamentally.
Yes. I was asking if we actually need perfect robustness. I'm fairly
confident that this approach would work well in practice, since if TIDs
are being churned, the thread with the wrapped TID will exit soon anyway.
I mentioned the possibility to make sure you had considered it.
As an added bonus, the tone of the first response I received (not
yours!) reminded me why I am not subscribed to lkml...
Cheers!
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol
> As an added bonus, the tone of the first response I received (not
> yours!) reminded me why I am not subscribed to lkml...
Oh dear. Now I have on my conscience reminding Rusty why
he keeps off lkml.
I'm not sure quite how I did that, but I wish I hadn't.
I for one found your lkml posts, Rusty, to be delightful.
Take good care of yourself, Rusty.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401
* Rusty Russell <[email protected]> wrote:
> > the fundamental problem i see here: how do you 'declare' a TID as dead?
> > 32-bit TIDs can be reused, quite fundamentally.
>
> Yes. I was asking if we actually need perfect robustness. [...]
yes - we need, at a minimum, robustness that will work for all sane
workloads. If we make applications rely on it, we should offer an
implementation that works under well-specified circumstances. "The
feature might not work if some other, unrelated application happens to
churn more than 32768 threads" is not well-specified. [not to mention
the problems with the upcoming POSIX specification: we certainly won't
be able to claim to support the feature if it doesn't reliably work
under an easily reproducible, normal workload.]
> [...] I'm fairly confident that this approach would work well in
> practice, since if TIDs are being churned, the thread with the wrapped
> TID will exit soon anyway.
we cannot assume that - e.g. if there are two unrelated apps, one a
fast-churner, and another one relying on robust mutexes.
> As an added bonus, the tone of the first response I received (not
> yours!) reminded me why I am not subscribed to lkml...
hey, if that response is deemed too confrontational, then i'd have to
discard 97% of the lkml responses i get to patches ;-) Another thing is
that this particular topic was pretty hotly discussed from the outset,
which could easily have carried over into other replies as well. So i'd
really not take it personally.
Ingo
Ingo wrote:
> hey, if that response is deemed as too confrontational, then ...
That's ok, Ingo. The reply was no doubt fine by lkml standards,
but apparently not what Rusty chooses to deal with on a routine basis.
His choice. I wish him well.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401
On Fri, 2006-02-17 at 08:23 -0800, Darren Hart wrote:
> Rusty Russell wrote:
> > telling the kernel that the lock value is the TID allows the
> > kernel to do prio inheritance etc. in the future.
>
> Priority Inheritance has come up a couple of times in relation to Ingo's new
> LightWeight Robust Futexes. Ingo has said that PI is orthogonal to LWRF, but I
> don't think we've heard if there are plans already in the works (or in his head
> :-) for PI. Rusty's comment above reads as "the current LWRF implementation
> cannot support PI" - is there something about it that makes PI impractical to
> implement?
Hi Darren!
Ingo's approach is indeed orthogonal. But the obvious approach to PI
etc. is to tell the kernel who is holding the lock, by making the lock
value == the TID of the holder. If we are heading towards this anyway,
the kernel could use this to implement robust mutexes too, although not
with a 100% guarantee (due to TID wrap). Ingo doesn't like that,
though.
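The germ of it, as a fragment (boost_prio() is entirely hypothetical,
and presumably the real thing would share code with the rt_mutex work):

        read_lock(&tasklist_lock);
        owner = find_task_by_pid(uval);    /* lock word == owner TID */
        /* lower ->prio value means higher priority */
        if (owner && owner->prio > current->prio)
                boost_prio(owner, current->prio);  /* lend our priority */
        read_unlock(&tasklist_lock);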
Hope that clarifies!
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol