2021-06-05 09:01:24

by Andrey Semashev

Subject: Re: [PATCH v4 00/15] Add futex2 syscalls

On 6/5/21 4:09 AM, Nicholas Piggin wrote:
> Excerpts from André Almeida's message of June 5, 2021 6:01 am:
>> On 04/06/21 at 08:36, Nicholas Piggin wrote:
>
>>> I'll be burned at the stake for suggesting it but it would be great if
>>> we could use file descriptors. At least for the shared futex, maybe
>>> private could use a per-process futex allocator. It solves all of the
>>> above, although I'm sure it has many problems of its own. It may not play
>>> so nicely with the pthread mutex API because of the whole static
>>> initialiser problem, but the first futex proposal did use fds. But it's
>>> an example of an alternate API.
>>>
>>
>> FDs and futexes don't play well, because for futex_wait() you need to
>> tell the kernel the expected value in the futex address to avoid
>> sleeping on a free lock. FD operations (poll, select) don't have this
>> `value` argument, so they could sleep forever, but I'm not sure if you
>> had taken this into consideration.
>
> I had. The futex wait API would take an additional fd. The only
> difference is the waitqueue that is used when a sleep or wake is
> required is derived from the fd, not from an address.
>
> I think the bigger sticking points would be if it's too heavyweight an
> object to use (which could be somewhat mitigated with a simpler ida
> allocator, although that's difficult to do for shared futexes), and whether libc
> could sanely use them due to the static initialiser problem of pthread
> mutexes.
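A minimal sketch of the expected-value check described above, using the existing futex(2) call through syscall(2) on Linux (`futex_wait_if` is a hypothetical helper name, not part of any proposed API). The kernel re-reads the futex word before sleeping and refuses with EAGAIN if it no longer equals `expected`, so a waiter cannot go to sleep on a lock that was released in the meantime. poll(2) and select(2) have no equivalent argument.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: sleep on *word only while it still equals `expected`.
 * Returns 0 after a wakeup, or -1 with errno set. EAGAIN means the value had
 * already changed, so the caller should re-check its condition instead of
 * sleeping on a lock that may already be free. */
long futex_wait_if(uint32_t *word, uint32_t expected)
{
    return syscall(SYS_futex, word, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}
```

For example, with `uint32_t w = 1;`, calling `futex_wait_if(&w, 0)` returns -1 immediately with errno set to EAGAIN instead of blocking.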

The static initialization feature is not the only benefit of the current
futex design, and probably not the most important one. You can work
around the static initialization in userspace, e.g. by initializing the fd
to an invalid value and creating a valid fd upon first use, although
that would still incur a performance penalty and add a new source of
failure.
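A sketch of that lazy-initialization workaround, assuming a hypothetical fd-backed lock whose kernel object is an eventfd(2); `fd_lock`, `FD_LOCK_INITIALIZER` and `fd_lock_get_fd` are illustrative names only, not a proposed API. The static initializer stores -1, and the first user creates the fd and publishes it with a compare-and-swap. It makes both costs visible: an extra atomic load on every use, and a failure path (for example EMFILE) that a plain futex word does not have.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical fd-backed lock: -1 means "no kernel object created yet",
 * which keeps the type statically initializable. */
typedef struct { atomic_int fd; } fd_lock;
#define FD_LOCK_INITIALIZER { -1 }

/* Returns the lock's fd, creating it on first use, or -1 on failure
 * (e.g. EMFILE when the process is out of descriptors). */
int fd_lock_get_fd(fd_lock *l)
{
    int fd = atomic_load_explicit(&l->fd, memory_order_acquire);
    if (fd >= 0)
        return fd;                      /* common case: one extra load */

    int fresh = eventfd(0, EFD_CLOEXEC);
    if (fresh < 0)
        return -1;                      /* new failure mode vs. a futex word */

    int expected = -1;
    if (atomic_compare_exchange_strong_explicit(&l->fd, &expected, fresh,
                                                memory_order_acq_rel,
                                                memory_order_acquire))
        return fresh;                   /* we published our fd */

    close(fresh);                       /* another thread won the race */
    return expected;                    /* CAS loaded the winner's fd */
}
```

If two threads race on first use, the loser closes its own eventfd and adopts the winner's, so all users agree on a single fd.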

What is more important is that waiting on an fd always requires a kernel
call. This would be terrible for the performance of uncontended locks,
which are the majority of cases.
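To illustrate why the uncontended case matters, here is a minimal futex-based mutex in the style of the three-state mutex from Ulrich Drepper's "Futexes Are Tricky" (0 = unlocked, 1 = locked, 2 = locked with possible waiters), assuming Linux and C11 atomics. The names are hypothetical and this is not glibc's implementation. An uncontended lock/unlock pair is one compare-and-swap plus one atomic subtraction with no system call; the kernel is entered only when the state shows contention.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked (no waiters), 2 = locked (waiters possible). */
typedef struct { atomic_int state; } fmutex;
#define FMUTEX_INITIALIZER { 0 }

static long sys_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void fmutex_lock(fmutex *m)
{
    int c = 0;
    /* Fast path: 0 -> 1 entirely in user space, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended (2) and sleep until we own it. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        /* Sleeps only if the state is still 2; otherwise returns at once. */
        sys_futex(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

void fmutex_unlock(fmutex *m)
{
    /* Fast path: 1 -> 0 means nobody was waiting, so no system call. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        sys_futex(&m->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

Nick's position in the replies is that an fd-based variant could keep this same user-space fast path and change only where the kernel wait queue comes from; the sketch shows only the existing address-based scheme.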

Another important point is that a futex that is not being waited on
consumes zero kernel resources, while an fd is a limited resource even when
not used. You can have millions of futexes in userspace and you are
guaranteed not to exhaust any limit as long as you have memory. That is
an important feature, and the current userspace is relying on it by
assuming that creating mutexes and condition variables is cheap.
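To make the resource argument concrete, a small sketch assuming Linux and glibc (all helper names are hypothetical): a million futex words are just a calloc(3) of 32-bit integers, and FUTEX_WAKE on a word that nobody waits on is valid and simply reports 0 woken threads, because the kernel keeps no per-futex state until a thread actually sleeps. File descriptors, by contrast, are capped per process by RLIMIT_NOFILE.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Futex words are ordinary aligned 32-bit integers: creating n of them
 * involves no system call and no kernel object. */
uint32_t *alloc_futex_words(size_t n)
{
    return calloc(n, sizeof(uint32_t));
}

/* Waking a word with no waiters is legal; it returns the number of
 * threads woken, which is 0 here. */
long wake_all(uint32_t *word)
{
    return syscall(SYS_futex, word, FUTEX_WAKE_PRIVATE, INT_MAX, NULL, NULL, 0);
}

/* The per-process fd cap an fd-per-futex design would run into. */
long fd_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY ? LONG_MAX : (long)rl.rlim_cur;
}
```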

Having a futex fd would be useful in some cases to be able to integrate
futexes with IO. I did have use cases where I would have liked to have
FUTEX_FD in the past. These cases arise when you already have a thread
that operates on fds and you want to avoid having a separate thread that
blocks on futexes in a similar fashion. But, IMO, that should be an
optional opt-in feature. The vast majority of futexes don't need an fd.
For just waiting on multiple futexes, the native support that futex2
provides is superior.

PS: I'm not asking FUTEX_FD to be implemented as part of futex2 API.
futex2 would be great even without it.


2021-06-06 12:02:43

by Nicholas Piggin

Subject: Re: [PATCH v4 00/15] Add futex2 syscalls

Excerpts from Andrey Semashev's message of June 5, 2021 6:56 pm:
> On 6/5/21 4:09 AM, Nicholas Piggin wrote:
>> Excerpts from André Almeida's message of June 5, 2021 6:01 am:
>>> On 04/06/21 at 08:36, Nicholas Piggin wrote:
>>
>>>> I'll be burned at the stake for suggesting it but it would be great if
>>>> we could use file descriptors. At least for the shared futex, maybe
>>>> private could use a per-process futex allocator. It solves all of the
>>>> above, although I'm sure it has many problems of its own. It may not play
>>>> so nicely with the pthread mutex API because of the whole static
>>>> initialiser problem, but the first futex proposal did use fds. But it's
>>>> an example of an alternate API.
>>>>
>>>
>>> FDs and futexes don't play well, because for futex_wait() you need to
>>> tell the kernel the expected value in the futex address to avoid
>>> sleeping on a free lock. FD operations (poll, select) don't have this
>>> `value` argument, so they could sleep forever, but I'm not sure if you
>>> had taken this into consideration.
>>
>> I had. The futex wait API would take an additional fd. The only
>> difference is the waitqueue that is used when a sleep or wake is
>> required is derived from the fd, not from an address.
>>
>> I think the bigger sticking points would be if it's too heavyweight an
>> object to use (which could be somewhat mitigated with a simpler ida
>> allocator, although that's difficult to do for shared futexes), and whether libc
>> could sanely use them due to the static initialiser problem of pthread
>> mutexes.
>
> The static initialization feature is not the only benefit of the current
> futex design, and probably not the most important one. You can work
> around the static initialization in userspace, e.g. by initializing the fd
> to an invalid value and creating a valid fd upon first use, although
> that would still incur a performance penalty and add a new source of
> failure.

Sounds like a serious problem, but maybe it isn't. On the other hand,
maybe we don't have to support pthread mutexes as they are anyway
because futex already does that fairly well.

> What is more important is that waiting on an fd always requires a kernel
> call. This would be terrible for the performance of uncontended locks,
> which are the majority of cases.

No. As I said just before, it would be the same except the waitqueue is
derived from fd rather than address.

>
> Another important point is that a futex that is not being waited on
> consumes zero kernel resources, while an fd is a limited resource even when
> not used. You can have millions of futexes in userspace and you are
> guaranteed not to exhaust any limit as long as you have memory. That is
> an important feature, and the current userspace is relying on it by
> assuming that creating mutexes and condition variables is cheap.

Is it an important feature? Would 1 byte of kernel memory per uncontended
futex be okay? 10? 100?

I do see that it's very nice that the current design requires no
initialization for uncontended futexes; I'm just asking questions to get
an idea of what constraints we're working with. We have a pretty good API
already which can support unlimited uncontended futexes, so I'm
wondering whether we really need another, very similar API that doesn't
fix the really difficult problems of the existing one.

Thanks,
Nick

> Having a futex fd would be useful in some cases to be able to integrate
> futexes with IO. I did have use cases where I would have liked to have
> FUTEX_FD in the past. These cases arise when you already have a thread
> that operates on fds and you want to avoid having a separate thread that
> blocks on futexes in a similar fashion. But, IMO, that should be an
> optional opt-in feature. The vast majority of futexes don't need an fd.
> For just waiting on multiple futexes, the native support that futex2
> provides is superior.
>
> PS: I'm not asking FUTEX_FD to be implemented as part of futex2 API.
> futex2 would be great even without it.

2021-06-06 13:20:08

by Andrey Semashev

Subject: Re: [PATCH v4 00/15] Add futex2 syscalls

On 6/6/21 2:57 PM, Nicholas Piggin wrote:
> Excerpts from Andrey Semashev's message of June 5, 2021 6:56 pm:
>> On 6/5/21 4:09 AM, Nicholas Piggin wrote:
>>> Excerpts from André Almeida's message of June 5, 2021 6:01 am:
>>>> On 04/06/21 at 08:36, Nicholas Piggin wrote:
>>>
>>>>> I'll be burned at the stake for suggesting it but it would be great if
>>>>> we could use file descriptors. At least for the shared futex, maybe
>>>>> private could use a per-process futex allocator. It solves all of the
>>>>> above, although I'm sure it has many problems of its own. It may not play
>>>>> so nicely with the pthread mutex API because of the whole static
>>>>> initialiser problem, but the first futex proposal did use fds. But it's
>>>>> an example of an alternate API.
>>>>>
>>>>
>>>> FDs and futexes don't play well, because for futex_wait() you need to
>>>> tell the kernel the expected value in the futex address to avoid
>>>> sleeping on a free lock. FD operations (poll, select) don't have this
>>>> `value` argument, so they could sleep forever, but I'm not sure if you
>>>> had taken this into consideration.
>>>
>>> I had. The futex wait API would take an additional fd. The only
>>> difference is the waitqueue that is used when a sleep or wake is
>>> required is derived from the fd, not from an address.
>>>
>>> I think the bigger sticking points would be if it's too heavyweight an
>>> object to use (which could be somewhat mitigated with a simpler ida
>>> allocator, although that's difficult to do for shared futexes), and whether libc
>>> could sanely use them due to the static initialiser problem of pthread
>>> mutexes.
>>
>> The static initialization feature is not the only benefit of the current
>> futex design, and probably not the most important one. You can work
>> around the static initialization in userspace, e.g. by initializing the fd
>> to an invalid value and creating a valid fd upon first use, although
>> that would still incur a performance penalty and add a new source of
>> failure.
>
> Sounds like a serious problem, but maybe it isn't. On the other hand,
> maybe we don't have to support pthread mutexes as they are anyway
> because futex already does that fairly well.
>
>> What is more important is that waiting on an fd always requires a kernel
>> call. This would be terrible for the performance of uncontended locks,
>> which are the majority of cases.
>
> No. As I said just before, it would be the same except the waitqueue is
> derived from fd rather than address.

Sorry, in that case I'm not sure I understand how that would work. You
do need to allocate an fd, don't you?

>> Another important point is that a futex that is not being waited on
>> consumes zero kernel resources, while an fd is a limited resource even when
>> not used. You can have millions of futexes in userspace and you are
>> guaranteed not to exhaust any limit as long as you have memory. That is
>> an important feature, and the current userspace is relying on it by
>> assuming that creating mutexes and condition variables is cheap.
>
> Is it an important feature? Would 1 byte of kernel memory per uncontended
> futex be okay? 10? 100?
>
> I do see that it's very nice that the current design requires no
> initialization for uncontended futexes; I'm just asking questions to get
> an idea of what constraints we're working with. We have a pretty good API
> already which can support unlimited uncontended futexes, so I'm
> wondering whether we really need another, very similar API that doesn't
> fix the really difficult problems of the existing one.

It does provide the very much needed features that are missing in the
current futex, namely more futex sizes and waiting on multiple futexes. So the
argument of "why have two similar APIs" is not quite fair. It would be fair
if there were feature parity with futex.

I believe the low cost of a futex is an important feature, and was one
of the reasons for its original design and introduction. Otherwise we
would be using eventfds in mutexes.

One other feature that I didn't mention earlier and which follows from
its "address in memory" design is the ability to use futexes in
process-shared memory. This is important for process-shared pthread
components, too, but has its own value even without this, if you use
futexes directly. With fds, you can't place the fd in shared memory
since every process needs to have its own fd referring to the same
kernel object, and passing fds cannot be done without a UNIX socket.
This is incompatible with pthreads API design and would require
non-trivial design changes to the applications using futexes directly.
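A sketch of the process-shared case with the existing API, assuming Linux: the futex word lives in a MAP_SHARED anonymous mapping inherited across fork(2), and the non-PRIVATE FUTEX_WAIT/FUTEX_WAKE operations are used so the kernel matches waiters by the underlying page rather than by a per-process address. `shared_futex_demo` is a hypothetical name. No descriptor has to be passed between the processes; sharing the memory is enough.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Non-PRIVATE futex op: works across processes that map the same page. */
static long futex_shared(atomic_uint *addr, int op, unsigned val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Hypothetical demo: a child sleeps on a futex word in shared memory until
 * the parent sets it. Returns 0 on success, -1 on any failure. */
int shared_futex_demo(void)
{
    atomic_uint *flag = mmap(NULL, sizeof *flag, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (flag == MAP_FAILED)
        return -1;
    atomic_store(flag, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(flag, sizeof *flag);
        return -1;
    }
    if (pid == 0) {
        /* Child: the shared address alone identifies the futex. */
        while (atomic_load(flag) == 0)
            futex_shared(flag, FUTEX_WAIT, 0);  /* EAGAIN/EINTR: re-check */
        _exit(0);
    }

    /* Parent: publish the new value first, then wake one waiter. */
    atomic_store(flag, 1);
    futex_shared(flag, FUTEX_WAKE, 1);

    int status = 0;
    pid_t r = waitpid(pid, &status, 0);
    munmap(flag, sizeof *flag);
    if (r != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

Because the child re-checks the flag in a loop and FUTEX_WAIT refuses to sleep once the value is no longer 0, the demo cannot hang regardless of whether the parent's wake happens before or after the child starts waiting.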

2021-06-08 01:27:35

by Nicholas Piggin

Subject: Re: [PATCH v4 00/15] Add futex2 syscalls

Excerpts from Andrey Semashev's message of June 6, 2021 11:15 pm:
> On 6/6/21 2:57 PM, Nicholas Piggin wrote:
>> Excerpts from Andrey Semashev's message of June 5, 2021 6:56 pm:
>>> On 6/5/21 4:09 AM, Nicholas Piggin wrote:
>>>> Excerpts from André Almeida's message of June 5, 2021 6:01 am:
>>>>> On 04/06/21 at 08:36, Nicholas Piggin wrote:
>>>>
>>>>>> I'll be burned at the stake for suggesting it but it would be great if
>>>>>> we could use file descriptors. At least for the shared futex, maybe
>>>>>> private could use a per-process futex allocator. It solves all of the
>>>>>> above, although I'm sure it has many problems of its own. It may not play
>>>>>> so nicely with the pthread mutex API because of the whole static
>>>>>> initialiser problem, but the first futex proposal did use fds. But it's
>>>>>> an example of an alternate API.
>>>>>>
>>>>>
>>>>> FDs and futexes don't play well, because for futex_wait() you need to
>>>>> tell the kernel the expected value in the futex address to avoid
>>>>> sleeping on a free lock. FD operations (poll, select) don't have this
>>>>> `value` argument, so they could sleep forever, but I'm not sure if you
>>>>> had taken this into consideration.
>>>>
>>>> I had. The futex wait API would take an additional fd. The only
>>>> difference is the waitqueue that is used when a sleep or wake is
>>>> required is derived from the fd, not from an address.
>>>>
>>>> I think the bigger sticking points would be if it's too heavyweight an
>>>> object to use (which could be somewhat mitigated with a simpler ida
>>>> allocator, although that's difficult to do for shared futexes), and whether libc
>>>> could sanely use them due to the static initialiser problem of pthread
>>>> mutexes.
>>>
>>> The static initialization feature is not the only benefit of the current
>>> futex design, and probably not the most important one. You can work
>>> around the static initialization in userspace, e.g. by initializing the fd
>>> to an invalid value and creating a valid fd upon first use, although
>>> that would still incur a performance penalty and add a new source of
>>> failure.
>>
>> Sounds like a serious problem, but maybe it isn't. On the other hand,
>> maybe we don't have to support pthread mutexes as they are anyway
>> because futex already does that fairly well.
>>
>>> What is more important is that waiting on an fd always requires a kernel
>>> call. This would be terrible for the performance of uncontended locks,
>>> which are the majority of cases.
>>
>> No. As I said just before, it would be the same except the waitqueue is
>> derived from fd rather than address.
>
> Sorry, in that case I'm not sure I understand how that would work. You
> do need to allocate an fd, don't you?

Yes. As I said, imagine a futex_wait API that also takes an fd. The
wait queue is derived from that fd rather than the hash table.

>>> Another important point is that a futex that is not being waited on
>>> consumes zero kernel resources, while an fd is a limited resource even when
>>> not used. You can have millions of futexes in userspace and you are
>>> guaranteed not to exhaust any limit as long as you have memory. That is
>>> an important feature, and the current userspace is relying on it by
>>> assuming that creating mutexes and condition variables is cheap.
>>
>> Is it an important feature? Would 1 byte of kernel memory per uncontended
>> futex be okay? 10? 100?
>>
>> I do see that it's very nice that the current design requires no
>> initialization for uncontended futexes; I'm just asking questions to get
>> an idea of what constraints we're working with. We have a pretty good API
>> already which can support unlimited uncontended futexes, so I'm
>> wondering whether we really need another, very similar API that doesn't
>> fix the really difficult problems of the existing one.
>
> It does provide the very much needed features that are missing in the
> current futex, namely more futex sizes and waiting on multiple futexes. So the
> argument of "why have two similar APIs" is not quite fair. It would be fair
> if there were feature parity with futex.

It does provide some extra features, sure, with some straightforward
extension of the existing API. The really interesting or tricky part of
the API is left unchanged though.

My line of thinking is that while we're changing the API anyway, we
should see if it can be changed to help those other problems too.

> I believe the low cost of a futex is an important feature, and was one
> of the reasons for its original design and introduction.

It is of course. The first futex proposal did use fds, interestingly.
I didn't look back further into the libc side of that thing, but maybe
I should.

> Otherwise we
> would be using eventfds in mutexes.

I don't think so, not even if eventfd came before the futex syscall.

>
> One other feature that I didn't mention earlier and which follows from
> its "address in memory" design is the ability to use futexes in
> process-shared memory. This is important for process-shared pthread
> components, too, but has its own value even without this, if you use
> futexes directly. With fds, you can't place the fd in shared memory
> since every process needs to have its own fd referring to the same
> kernel object, and passing fds cannot be done without a UNIX socket.
> This is incompatible with pthreads API design and would require
> non-trivial design changes to the applications using futexes directly.
>

That may be true. A file is a natural object for sharing such a resource, but
the means to share the fd is not so easy. OTOH you could also use a
syscall to open the same file and get a new fd.
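For comparison, a sketch of what handing an fd to another process involves today: the SCM_RIGHTS ancillary-data mechanism over an AF_UNIX socket. `send_fd` and `recv_fd` are hypothetical helper names. The receiver gets a new descriptor number that refers to the same open file, which is why an fd value cannot simply be stored in shared memory the way a futex word can. The sketch does not cover the alternative of reopening the object through a separate syscall.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 0;  /* carry one byte of real data alongside the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  /* forces correct alignment of buf */
    } u;
    memset(&u, 0, sizeof u);
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the result is a new descriptor number in this process. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```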

Are shared pthread mutexes, which use existing pthread APIs and are
implemented okay today with the futex1 system call, a good reason to
constrain futex2, I wonder? Or do we have an opportunity to make a bigger
change to the API so it suffers less from non-deterministic latency (for
example)?

I don't want to limit it to just files vs. addresses; fds were just an example
of something that could solve some of the problems.

Thanks,
Nick

2021-06-08 11:06:09

by Andrey Semashev

Subject: Re: [PATCH v4 00/15] Add futex2 syscalls

On 6/8/21 4:25 AM, Nicholas Piggin wrote:
>
> Are shared pthread mutexes, which use existing pthread APIs and are
> implemented okay today with the futex1 system call, a good reason to
> constrain futex2, I wonder? Or do we have an opportunity to make a bigger
> change to the API so it suffers less from non-deterministic latency (for
> example)?

If futex2 is not able to cover futex1 use cases then it cannot be viewed
as a replacement. In the long term this means futex1 cannot be
deprecated and has to be maintained. My impression was that futex1 was
basically unmaintainable(*) and futex2 was an evolution of futex1 so
that users of futex1 could migrate relatively easily and futex1
could eventually be removed. Maybe my impression was wrong, but I would like to
see futex2 as a replacement and extension of futex1, so the latter can
be deprecated at some point.

In any case, creating a new API should consider the requirements of its
potential users. If futex2 is intended to eventually replace futex1 then
all current futex1 users are potential users of futex2. If not, then the
futex2 submission should list its intended users, at least in general
terms, and their requirements that led to the proposed API design.

(*) I use "unmaintainable" in a broad sense here. It exists and works in
newer kernel versions and may receive code changes that are necessary to
keep it working, but maintainers refuse any extensions or modifications
of the code, mostly because of its complexity.

2021-06-08 11:16:21

by Greg Kroah-Hartman

Subject: Re: [PATCH v4 00/15] Add futex2 syscalls

On Tue, Jun 08, 2021 at 02:03:50PM +0300, Andrey Semashev wrote:
> On 6/8/21 4:25 AM, Nicholas Piggin wrote:
> >
> > Are shared pthread mutexes, which use existing pthread APIs and are
> > implemented okay today with the futex1 system call, a good reason to
> > constrain futex2, I wonder? Or do we have an opportunity to make a bigger
> > change to the API so it suffers less from non-deterministic latency (for
> > example)?
>
> If futex2 is not able to cover futex1 use cases then it cannot be viewed as
> a replacement. In the long term this means futex1 cannot be deprecated and
> has to be maintained. My impression was that futex1 was basically
> unmaintainable(*) and futex2 was an evolution of futex1 so that users of
> futex1 could migrate relatively easily and futex1 could eventually be removed. Maybe
> my impression was wrong, but I would like to see futex2 as a replacement and
> extension of futex1, so the latter can be deprecated at some point.

You can never delete a kernel system call, so even if you "deprecate"
it, it still needs to be supported forever.

Best of all would be if internally your "futex2" code would replace the
"futex1" code so that there are not two different code bases. That would
be the only sane way forward; having two code bases to work with is just
insane.

> (*) I use "unmaintainable" in a broad sense here. It exists and works in
> newer kernel versions and may receive code changes that are necessary to
> keep it working, but maintainers refuse any extensions or modifications of
> the code, mostly because of its complexity.

Adding additional complexity for no good reason is not a good idea,
especially if you are asking others to maintain and support that
complexity. Would you want to have to do that work?

So what's keeping the futex2 code from doing all that futex1 does so
that the futex1 code can be deleted internally?

thanks,

greg k-h

2021-06-08 14:17:39

by André Almeida

Subject: Re: [PATCH v4 00/15] Add futex2 syscalls

Hi Greg,

On 08/06/21 at 08:13, Greg KH wrote:
> On Tue, Jun 08, 2021 at 02:03:50PM +0300, Andrey Semashev wrote:
>> On 6/8/21 4:25 AM, Nicholas Piggin wrote:
>>>
>>> Are shared pthread mutexes, which use existing pthread APIs and are
>>> implemented okay today with the futex1 system call, a good reason to
>>> constrain futex2, I wonder? Or do we have an opportunity to make a bigger
>>> change to the API so it suffers less from non-deterministic latency (for
>>> example)?
>>
>> If futex2 is not able to cover futex1 use cases then it cannot be viewed as
>> a replacement. In the long term this means futex1 cannot be deprecated and
>> has to be maintained. My impression was that futex1 was basically
>> unmaintainable(*) and futex2 was an evolution of futex1 so that users of
>> futex1 could migrate relatively easily and futex1 could eventually be removed. Maybe
>> my impression was wrong, but I would like to see futex2 as a replacement and
>> extension of futex1, so the latter can be deprecated at some point.
>
> You can never delete a kernel system call, so even if you "deprecate"
> it, it still needs to be supported forever.
>
> Best of all would be if internally your "futex2" code would replace the
> "futex1" code so that there are not two different code bases. That would
> be the only sane way forward; having two code bases to work with is just
> insane.
>
>> (*) I use "unmaintainable" in a broad sense here. It exists and works in
>> newer kernel versions and may receive code changes that are necessary to
>> keep it working, but maintainers refuse any extensions or modifications of
>> the code, mostly because of its complexity.
>
> Adding additional complexity for no good reason is not a good idea,
> especially if you are asking others to maintain and support that
> complexity. Would you want to have to do that work?
>
> So what's keeping the futex2 code from doing all that futex1 does so
> that the futex1 code can be deleted internally?
>

My very first submission of futex2[0] was just an overlay on top of
futex.c. I didn't get much feedback at that time, but I think this is
what you and Peter are thinking of?

After that, last year at Plumbers' RT MC, I presented a talk called
"futex2: A New Interface", and my conclusion after the discussion of this
talk, plus the responses I got to my FUTEX_WAIT_MULTIPLE patchset[1], was
that this work couldn't be done in futex.c, given how fragile things are
there. futex.c would be in "feature freeze" and no new major changes would
happen there.

This is the context where this new futex2 code base comes from. So,
which one is it? Happy to go either way but I'm getting conflicting
messages here.

Thanks,
André

[0]
https://lore.kernel.org/lkml/[email protected]/

[1]
https://lore.kernel.org/lkml/[email protected]/

> thanks,
>
> greg k-h
>

2021-06-09 00:56:27

by Andrey Semashev

Subject: Re: [PATCH v4 00/15] Add futex2 syscalls

On 6/8/21 2:13 PM, Greg KH wrote:
> On Tue, Jun 08, 2021 at 02:03:50PM +0300, Andrey Semashev wrote:
>> On 6/8/21 4:25 AM, Nicholas Piggin wrote:
>>>
>>> Are shared pthread mutexes, which use existing pthread APIs and are
>>> implemented okay today with the futex1 system call, a good reason to
>>> constrain futex2, I wonder? Or do we have an opportunity to make a bigger
>>> change to the API so it suffers less from non-deterministic latency (for
>>> example)?
>>
>> If futex2 is not able to cover futex1 use cases then it cannot be viewed as
>> a replacement. In the long term this means futex1 cannot be deprecated and
>> has to be maintained. My impression was that futex1 was basically
>> unmaintainable(*) and futex2 was an evolution of futex1 so that users of
>> futex1 could migrate relatively easily and futex1 could eventually be removed. Maybe
>> my impression was wrong, but I would like to see futex2 as a replacement and
>> extension of futex1, so the latter can be deprecated at some point.
>
> You can never delete a kernel system call, so even if you "deprecate"
> it, it still needs to be supported forever.

If I'm not mistaken, some syscalls were dropped from the kernel in the past,
after it was established they are no longer used. So it is not
impossible, though might be more difficult specifically with futex.

> Best of all would be if internally your "futex2" code would replace the
> "futex1" code so that there are not two different code bases. That would
> be the only sane way forward; having two code bases to work with is just
> insane.

Yes, implementing futex1 in terms of futex2 internally is a possible way
forward. Though I'm not sure it is reasonable to require that to be done
in the initial futex2 submission. This requires all of the futex1
functionality to be implemented in futex2 from the start, which I think is
too much to ask. Even with some futex1 features missing, futex2 would be
already very much useful to users, and it is easier to implement the
missing bits incrementally over time.

Also, one other point I'd like to make is that not all futex1 features
might need to be reimplemented if futex2 provides a better alternative.
For example, as a user, I would like to see a different approach to
robust futexes that does not mandate a single user (libc) and allows
robust futexes to be used directly.

>> (*) I use "unmaintainable" in a broad sense here. It exists and works in
>> newer kernel versions and may receive code changes that are necessary to
>> keep it working, but maintainers refuse any extensions or modifications of
>> the code, mostly because of its complexity.
>
> Adding additional complexity for no good reason is not a good idea,
> especially if you are asking others to maintain and support that
> complexity. Would you want to have to do that work?
>
> So what's keeping the futex2 code from doing all that futex1 does so
> that the futex1 code can be deleted internally?

I think André will answer this, but my guess is that, as stated above,
this is a lot of work and time, while the intermediate version is already useful.