Userspace implementations of mutexes (including glibc) in some cases
retries operation without checking error code from syscall futex.
This is good for performance because most errors are impossible when
locking code trusts itself.
Some errors which could came from outer code are handled automatically,
for example invalid address triggers SIGSEGV on atomic fast path.
But one case turns into nasty busy-loop: when address is unaligned.
futex(FUTEX_WAIT) returns EINVAL immediately and loop goes to retry.
Example which loops inside second call rather than hung peacefully:
#include <stdlib.h>
#include <pthread.h>
int main(int argc, char **argv)
{
char buf[sizeof(pthread_mutex_t) + 1];
pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
pthread_mutex_init(mutex, NULL);
pthread_mutex_lock(mutex);
pthread_mutex_lock(mutex);
}
It seems there is no practical usage for calling syscall futex for
unaligned address. This may be only bug in user space. Let's help
and handle this gracefully without adding extra code on fast path.
This patch sends SIGBUS signal to slay task and break busy-loop.
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Reported-by: Maxim Samoylov <[email protected]>
---
kernel/futex.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index b59532862bc0..8a6d35fa56bc 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -508,10 +508,21 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
/*
* The futex address must be "naturally" aligned.
+ * Also send signal to break busy-loop if user-space ignore error.
+ * EFAULT case should trigger SIGSEGV at access from user-space.
*/
key->both.offset = address % PAGE_SIZE;
- if (unlikely((address % sizeof(u32)) != 0))
+ if (unlikely((address % sizeof(u32)) != 0)) {
+ struct kernel_siginfo info;
+
+ clear_siginfo(&info);
+ info.si_signo = SIGBUS;
+ info.si_code = BUS_ADRALN;
+ info.si_addr = uaddr;
+ force_sig_info(&info);
+
return -EINVAL;
+ }
address -= key->both.offset;
if (unlikely(!access_ok(uaddr, sizeof(u32))))
On Fri, May 15, 2020 at 06:36:47PM +0300, Konstantin Khlebnikov wrote:
> Userspace implementations of mutexes (including glibc) in some cases
> retries operation without checking error code from syscall futex.
> This is good for performance because most errors are impossible when
> locking code trusts itself.
>
> Some errors which could came from outer code are handled automatically,
> for example invalid address triggers SIGSEGV on atomic fast path.
>
> But one case turns into nasty busy-loop: when address is unaligned.
> futex(FUTEX_WAIT) returns EINVAL immediately and loop goes to retry.
>
> Example which loops inside second call rather than hung peacefully:
>
> #include <stdlib.h>
> #include <pthread.h>
>
> int main(int argc, char **argv)
> {
> char buf[sizeof(pthread_mutex_t) + 1];
> pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
>
> pthread_mutex_init(mutex, NULL);
> pthread_mutex_lock(mutex);
> pthread_mutex_lock(mutex);
> }
>
> It seems there is no practical usage for calling syscall futex for
> unaligned address. This may be only bug in user space. Let's help
> and handle this gracefully without adding extra code on fast path.
>
> This patch sends SIGBUS signal to slay task and break busy-loop.
>
> Signed-off-by: Konstantin Khlebnikov <[email protected]>
> Reported-by: Maxim Samoylov <[email protected]>
Seems like a sensible idea to me.
Acked-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> kernel/futex.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index b59532862bc0..8a6d35fa56bc 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -508,10 +508,21 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
>
> /*
> * The futex address must be "naturally" aligned.
> + * Also send signal to break busy-loop if user-space ignore error.
> + * EFAULT case should trigger SIGSEGV at access from user-space.
> */
> key->both.offset = address % PAGE_SIZE;
> - if (unlikely((address % sizeof(u32)) != 0))
> + if (unlikely((address % sizeof(u32)) != 0)) {
> + struct kernel_siginfo info;
> +
> + clear_siginfo(&info);
> + info.si_signo = SIGBUS;
> + info.si_code = BUS_ADRALN;
> + info.si_addr = uaddr;
> + force_sig_info(&info);
> +
> return -EINVAL;
> + }
> address -= key->both.offset;
>
> if (unlikely(!access_ok(uaddr, sizeof(u32))))
>
On 5/15/20 12:27 PM, Peter Zijlstra wrote:
> On Fri, May 15, 2020 at 06:36:47PM +0300, Konstantin Khlebnikov wrote:
>> Userspace implementations of mutexes (including glibc) in some cases
>> retries operation without checking error code from syscall futex.
>> This is good for performance because most errors are impossible when
>> locking code trusts itself.
In newer versions of glibc, which won't solve this problem for older
distributions (or newer glibc on older kernels), we've refactored all
of this code into futex-internal.h and do things like this (example
from one of the generic internal interfaces for futex use):
149 case -ETIMEDOUT: /* Cannot have happened as we provided no timeout. */
150 case -EFAULT: /* Must have been caused by a glibc or application bug. */
151 case -EINVAL: /* Either due to wrong alignment or due to the timeout not
152 being normalized. Must have been caused by a glibc or
153 application bug. */
154 case -ENOSYS: /* Must have been caused by a glibc bug. */
155 /* No other errors are documented at this time. */
156 default:
157 futex_fatal_error ();
158 }
Several of the pthread interfaces are using this code so they won't suffer
from "stuck EINVAL loops" like below.
We worked with all the interested parties to get `man 2 futex` updated
with the expected semantics and error return codes.
We don't want to ignore what the kernel is returning and we terminate
the process without propagating that error upwards for the simple
API cases.
Likewise note the "default:" which means if we get new futex error that
is not documented we also terminate the process.
>> Some errors which could came from outer code are handled automatically,
>> for example invalid address triggers SIGSEGV on atomic fast path.
>>
>> But one case turns into nasty busy-loop: when address is unaligned.
>> futex(FUTEX_WAIT) returns EINVAL immediately and loop goes to retry.
>>
>> Example which loops inside second call rather than hung peacefully:
>>
>> #include <stdlib.h>
>> #include <pthread.h>
>>
>> int main(int argc, char **argv)
>> {
>> char buf[sizeof(pthread_mutex_t) + 1];
>> pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
>>
>> pthread_mutex_init(mutex, NULL);
>> pthread_mutex_lock(mutex);
>> pthread_mutex_lock(mutex);
>> }
This isn't fixed because this is the older code in pthread_mutex_lock
which we haven't ported to futex-internal.h yet, otherwise we would abort
the process.
A quick change to use the newer interface (futex_wait_simple), and the
example above fails as expected:
./test
The futex facility returned an unexpected error code.
Aborted (core dumped)
And it does not loop. I'm open to bikeshed on the existing error message
(which has been there since 2014 / commit a2f0363f817).
coredumpctl debug loop-futex/test
Core was generated by `./test'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49 return ret;
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007f1cac0d2872 in __GI_abort () at abort.c:79
#2 0x00007f1cac12a248 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f1cac234a57 "%s")
at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007f1cac12a27a in __GI___libc_fatal (
message=message@entry=0x7f1cac288000 "The futex facility returned an unexpected error code.\n")
at ../sysdeps/posix/libc_fatal.c:164
#4 0x00007f1cac283fdc in futex_fatal_error () at ../sysdeps/nptl/futex-internal.h:157
#5 futex_wait (private=<optimized out>, expected=2, futex_word=0x7f1cac283fdc <__lll_lock_wait+92>)
at ../sysdeps/nptl/futex-internal.h:157
#6 futex_wait_simple (private=<optimized out>, expected=2, futex_word=0x7f1cac283fdc <__lll_lock_wait+92>)
at ../sysdeps/nptl/futex-internal.h:172
#7 __lll_lock_wait (futex=futex@entry=0x7ffdb1f0a2c1, private=<optimized out>) at lowlevellock.c:53
#8 0x00007f1cac27cbf3 in __GI___pthread_mutex_lock (mutex=0x7ffdb1f0a2c1) at ../nptl/pthread_mutex_lock.c:80
#9 0x000000000040117a in main (argc=1, argv=0x7ffdb1f0a3f8) at test.c:11
So semantically the kernel change makes sense, and will terminate the
process for glibc today, and matches what the refactored glibc code
will do in userspace for more of the interfaces in the future.
>> It seems there is no practical usage for calling syscall futex for
>> unaligned address. This may be only bug in user space. Let's help
>> and handle this gracefully without adding extra code on fast path.
The only use case I could see is retroactively adding a futex to the
existing ABI of a structure and wanting to avoid padding. That does
not seem like a common enough use case that we would want to support.
To get efficient cache-line usage you'll want to pack things by hand.
>> This patch sends SIGBUS signal to slay task and break busy-loop.
>>
>> Signed-off-by: Konstantin Khlebnikov <[email protected]>
>> Reported-by: Maxim Samoylov <[email protected]>
>
> Seems like a sensible idea to me.
Please do try to update the linux kernel man pages update to note
the change in behaviour and the version and commit of the released
kernel where this changed.
Please keep `man 2 futex` as accurate as possible for userspace
libc implementations.
Thanks.
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
>
>> ---
>> kernel/futex.c | 13 ++++++++++++-
>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index b59532862bc0..8a6d35fa56bc 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -508,10 +508,21 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
>>
>> /*
>> * The futex address must be "naturally" aligned.
>> + * Also send signal to break busy-loop if user-space ignore error.
>> + * EFAULT case should trigger SIGSEGV at access from user-space.
>> */
>> key->both.offset = address % PAGE_SIZE;
>> - if (unlikely((address % sizeof(u32)) != 0))
>> + if (unlikely((address % sizeof(u32)) != 0)) {
>> + struct kernel_siginfo info;
>> +
>> + clear_siginfo(&info);
>> + info.si_signo = SIGBUS;
>> + info.si_code = BUS_ADRALN;
>> + info.si_addr = uaddr;
>> + force_sig_info(&info);
>> +
>> return -EINVAL;
>> + }
>> address -= key->both.offset;
>>
>> if (unlikely(!access_ok(uaddr, sizeof(u32))))
>>
>
--
Cheers,
Carlos.
On Fri, May 15, 2020 at 1:50 PM Carlos O'Donell <[email protected]> wrote:
> This isn't fixed because this is the older code in pthread_mutex_lock
> which we haven't ported to futex-internal.h yet, otherwise we would abort
> the process.
I filed this upstream as a QoI issue so I don't forget to port the existing code
to the newer internal interfaces for futex handling.
"Bug 25997 - pthread_mutex_lock QoI issue for unaligned futex."
https://sourceware.org/bugzilla/show_bug.cgi?id=25997
I checked that -Wcast-align=strict does warn about this case, but it's
rarely used
in production code that I've worked with. I'm following up with the
compiler people
to see if we can consistently warn in these cases and so avoid this kind of code
existing in the first place.
Cheers,
Carlos.
Konstantin Khlebnikov <[email protected]> writes:
> Userspace implementations of mutexes (including glibc) in some cases
> retries operation without checking error code from syscall futex.
> This is good for performance because most errors are impossible when
> locking code trusts itself.
This argument is blantantly wrong. It's the justification for bad
programming. Code ignoring error returns is simply buggy.
> Some errors which could came from outer code are handled automatically,
> for example invalid address triggers SIGSEGV on atomic fast path.
>
> But one case turns into nasty busy-loop: when address is unaligned.
> futex(FUTEX_WAIT) returns EINVAL immediately and loop goes to retry.
Why is that something the kernel has to care about? The kernel returns
EINVAl as documented and when user space decides to ignore then it goes
to retry for a full timeslice for nothing.
You have to come up with a better argument why we want to send a signal
here.
Along with an argument why SIGBUS is the right thing when a user space
fast path violation results in a SIGSEGV as you stated above.
Plus a patch which documents this change in the futex man page.
> Example which loops inside second call rather than hung peacefully:
>
> #include <stdlib.h>
> #include <pthread.h>
>
> int main(int argc, char **argv)
> {
> char buf[sizeof(pthread_mutex_t) + 1];
> pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
>
> pthread_mutex_init(mutex, NULL);
> pthread_mutex_lock(mutex);
> pthread_mutex_lock(mutex);
> }
And this broken code is a kernel problem because?
> It seems there is no practical usage for calling syscall futex for
> unaligned address. This may be only bug in user space. Let's help and
> handle this gracefully without adding extra code on fast path.
How does that help?
Are you going to stick SIGBUS in _ALL_ syscalls which might be retried
in a loop just because user space fails to evaluate the error code
properly?
You have to come up with a better argument to justify this change.
> This patch sends SIGBUS signal to slay task and break busy-loop.
I'm pretty sure that I asked you to read and follow documentation
before. If I did not:
git grep 'This patch' Documentation/process/
Thanks,
tglx
On 20/05/2020 01.53, Thomas Gleixner wrote:
> Konstantin Khlebnikov <[email protected]> writes:
>
>> Userspace implementations of mutexes (including glibc) in some cases
>> retries operation without checking error code from syscall futex.
>> This is good for performance because most errors are impossible when
>> locking code trusts itself.
>
> This argument is blantantly wrong. It's the justification for bad
> programming. Code ignoring error returns is simply buggy.
I knew you'll love it. =)
>
>> Some errors which could came from outer code are handled automatically,
>> for example invalid address triggers SIGSEGV on atomic fast path.
>>
>> But one case turns into nasty busy-loop: when address is unaligned.
>> futex(FUTEX_WAIT) returns EINVAL immediately and loop goes to retry.
>
> Why is that something the kernel has to care about? The kernel returns
> EINVAl as documented and when user space decides to ignore then it goes
> to retry for a full timeslice for nothing.
>
> You have to come up with a better argument why we want to send a signal
> here.
>
> Along with an argument why SIGBUS is the right thing when a user space
> fast path violation results in a SIGSEGV as you stated above.
SIGSEGV comes only if address points to unmapped area.
This case doesn't need special care unlike to misaligned address.
>
> Plus a patch which documents this change in the futex man page.
Yep, will do.
>
>> Example which loops inside second call rather than hung peacefully:
>>
>> #include <stdlib.h>
>> #include <pthread.h>
>>
>> int main(int argc, char **argv)
>> {
>> char buf[sizeof(pthread_mutex_t) + 1];
>> pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
>>
>> pthread_mutex_init(mutex, NULL);
>> pthread_mutex_lock(mutex);
>> pthread_mutex_lock(mutex);
>> }
>
> And this broken code is a kernel problem because?
That's just distilled example how pthread primitives works
when application passes misaligned pointer.
>
>> It seems there is no practical usage for calling syscall futex for
>> unaligned address. This may be only bug in user space. Let's help and
>> handle this gracefully without adding extra code on fast path.
>
> How does that help?
>
> Are you going to stick SIGBUS in _ALL_ syscalls which might be retried
> in a loop just because user space fails to evaluate the error code
> properly?
>
> You have to come up with a better argument to justify this change.
This particular case is important because buggy code is in glibc.
Definitely most frequently used library ever. Musl is buggy too.
Bug is here for ages and it took another decade to spread update.
In modern containerized setup kernel updates much more frequently.
All this hides bugs: application might recover from loop
if another thread unlock mutex from atomic fast path.
But mutex works as spin-lock wasting cpu time.
So, these few lines of code might prevent significant CO2 emission =)
>
>> This patch sends SIGBUS signal to slay task and break busy-loop.
>
> I'm pretty sure that I asked you to read and follow documentation
> before. If I did not:
>
> git grep 'This patch' Documentation/process/
Force of habit. =/
Definitely should be in checkpatch.pl
>
> Thanks,
>
> tglx