On Wed, Nov 15, 2023 at 01:11:20AM +0000, Edgecombe, Rick P wrote:
> On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote:
> > Urgh, thanks!
> >
> > Confirmed, the below cures things. Although I should probably make
> > that
> > FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree.
> >
> > Let me go do a proper patch.
>
> I saw these fail on the glibc shadow stack branch today, and I also saw
> this one failing:
> FAIL: nptl/tst-robustpi8
tip/locking/urgent (branch with the fix on) gets me:
root@ivb-ep:/usr/local/src/glibc# ./build/nptl/tst-robustpi8
running child
verifying locks
running child
verifying locks
running child
verifying locks
running child
verifying locks
running child
verifying locks
root@ivb-ep:/usr/local/src/glibc#
Which, to my untrained eye, looks like a pass to me.
On Wed, 2023-11-15 at 09:51 +0100, Peter Zijlstra wrote:
> On Wed, Nov 15, 2023 at 01:11:20AM +0000, Edgecombe, Rick P wrote:
> > On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote:
> > > Urgh, thanks!
> > >
> > > Confirmed, the below cures things. Although I should probably
> > > make
> > > that
> > > FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree.
> > >
> > > Let me go do a proper patch.
> >
> > I saw these fail on the glibc shadow stack branch today, and I also
> > saw
> > this one failing:
> > FAIL: nptl/tst-robustpi8
>
> tip/locking/urgent (branch with the fix on) gets me:
>
> root@ivb-ep:/usr/local/src/glibc# ./build/nptl/tst-robustpi8
> running child
> verifying locks
> running child
> verifying locks
> running child
> verifying locks
> running child
> verifying locks
> running child
> verifying locks
> root@ivb-ep:/usr/local/src/glibc#
>
> Which, to my untrained eye, looks like a pass to me.
It bisects to this for me:
fbeb558b0dd0 ("futex/pi: Fix recursive rt_mutex waiter state")
Reading the patch, it's not immediately clear to me what is going on,
but a few comments stood out: "There be dragons here", "What could
possibly go wrong...", "This is a somewhat dangerous proposition".
It seems likely that some race is involved, but it reproduces reliably
on my machine. I haven't dug into debugging it yet. Any pointers?
A bit more info...
The error returned to userspace is originating from:
https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295
'uval' is often zero in that error case, but sometimes just a
mismatching value like: uval=0x567, task_pid_vnr()=0x564
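For context, the check at that spot is essentially a sanity test that
the TID stored in the low FUTEX_TID_MASK bits of the user-space futex
word matches the PID the kernel has recorded for the owner. Very
roughly (an illustrative sketch only, not the actual kernel code, and
the helper name is made up):

/* Illustrative only: if the owner TID encoded in the futex word does
 * not match the task the kernel believes owns the lock, bail out with
 * -EINVAL, which is what user space ends up seeing. */
static int check_pi_owner(u32 uval, struct task_struct *owner)
{
	u32 pid = uval & FUTEX_TID_MASK;

	if (!pid || pid != task_pid_vnr(owner))
		return -EINVAL;
	return 0;
}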
Whether it reproduces depends on the number of CPUs the VM is running
with. When it does reproduce, the newly added path here is taken:
https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185
The path is taken a lot during the test, sometimes >400 times before
the above-linked error is generated during the syscall. When it doesn't
reproduce, I never saw that new path taken.
More print statements make the reproduction less reliable, so there
does seem to be at least something of a race in the mix. Otherwise, I
haven't tried to understand what is going on here with all this
high-wire locking.
Hope it helps.
On 17.11.23 02:22, Edgecombe, Rick P wrote:
> A bit more info...
>
> The error returned to userspace is originating from:
> https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295
>
> 'uval' is often zero in that error case, but sometimes just a
> mismatching value like: uval=0x567, task_pid_vnr()=0x564
>
>
> Whether it reproduces depends on the number of CPUs the VM is running
> with. When it does reproduce, the newly added path here is taken:
> https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185
> The path is taken a lot during the test, sometimes >400 times before
> the above-linked error is generated during the syscall. When it
> doesn't reproduce, I never saw that new path taken.
>
> More print statements make the reproduction less reliable, so there
> does seem to be at least something of a race in the mix. Otherwise, I
> haven't tried to understand what is going on here with all this
> high-wire locking.
>
> Hope it helps.
Hi,
I've also observed failures in the glibc test case nptl/tst-robustpi8
with:
mutex_timedlock of 66 in thread 7 failed with 22
=> pthread_mutex_timedlock returns 22=EINVAL
I've seen it on s390x. There I used a kernel with commit
120d99901eb288f1d21db3976df4ba347b28f9c7 ("s390/vfio-ap: do not reset
queue removed from host config").
But I also saw it on an x86_64 KVM guest with Fedora 39 and a copr
repository with a vanilla kernel:
Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1
SMP PREEMPT_DYNAMIC Sun Jan 7 06:17:30 UTC 2024 x86_64 GNU/Linux
I reported it to libc-alpha ("FAILING nptl/tst-robust8pi",
https://sourceware.org/pipermail/libc-alpha/2024-January/154150.html),
where Florian Weimer pointed me to this thread.
I've reduced the test (see attachment) and now have only one process
with three threads. I use only one mutex, with the same attributes as
the original test case: PTHREAD_MUTEX_ROBUST_NP,
PTHREAD_PROCESS_SHARED, PTHREAD_PRIO_INHERIT.
Every thread loops on pthread_mutex_timedlock(abstime={0,0}) and, if
the lock was acquired, pthread_mutex_unlock.
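In case the attachment is not at hand, the reproducer is roughly this
shape (a sketch reconstructed from the description above; names, output
strings and the ROUNDS default are mine, not the attached code, and
error checking of the pthread calls is omitted):

#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#ifndef ROUNDS
# define ROUNDS 100000000
#endif

static pthread_mutex_t m;

static void *
thread_fct (void *arg)
{
  long id = (long) arg;
  /* abstime = {0,0} is already in the past, so a contended lock returns
     ETIMEDOUT immediately instead of blocking.  */
  struct timespec abstime = { 0, 0 };

  for (long round = 0; round < ROUNDS; round++)
    {
      int ret = pthread_mutex_timedlock (&m, &abstime);
      if (ret == 0)
        pthread_mutex_unlock (&m);
      else if (ret != ETIMEDOUT)
        {
          printf ("#%ld: mutex_timedlock failed with %d (round=%ld)\n",
                  id, ret, round);
          exit (1);
        }
    }
  printf ("#%ld: REACHED round %d. => exit\n", id, ROUNDS);
  return NULL;
}

int
main (void)
{
  pthread_mutexattr_t attr;
  pthread_t t[3];

  printf ("main: start 3 threads.\n");
  /* Same attributes as the original glibc test case: robust, process
     shared, priority inheritance.  */
  pthread_mutexattr_init (&attr);
  pthread_mutexattr_setrobust (&attr, PTHREAD_MUTEX_ROBUST_NP);
  pthread_mutexattr_setpshared (&attr, PTHREAD_PROCESS_SHARED);
  pthread_mutexattr_setprotocol (&attr, PTHREAD_PRIO_INHERIT);
  pthread_mutex_init (&m, &attr);

  for (long i = 0; i < 3; i++)
    pthread_create (&t[i], NULL, thread_fct, (void *) i);
  for (int i = 0; i < 3; i++)
    pthread_join (t[i], NULL);
  printf ("main: end.\n");
  return 0;
}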
I've added some uprobes before and after the futex syscall in
__futex_lock_pi64 (in pthread_mutex_timedlock) and futex_unlock_pi (in
pthread_mutex_unlock). For me, __ASSUME_FUTEX_LOCK_PI2 is not
available, but __ASSUME_TIME64_SYSCALLS is defined.
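(In case someone wants to reproduce the tracing setup: one way to place
such probes is with perf probe, assuming the libc debug info lets these
internal symbols be resolved; on a stripped or heavily inlined build
you may have to probe offsets inside pthread_mutex_timedlock and
pthread_mutex_unlock instead. Paths, event names and the binary name
below are only examples.)

perf probe -x /usr/lib64/libc.so.6 'lock_pi_enter=__futex_lock_pi64'
perf probe -x /usr/lib64/libc.so.6 'lock_pi_exit=__futex_lock_pi64%return $retval'
perf probe -x /usr/lib64/libc.so.6 'unlock_pi_enter=futex_unlock_pi'
perf probe -x /usr/lib64/libc.so.6 'unlock_pi_exit=futex_unlock_pi%return $retval'
perf record -e 'probe_libc:*' -- ./tst-reduced
perf script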
For me it looks like this (simplified uprobes trace):
<thread> <timestamp>: <probe>
t1 4309589.419744: before syscall in __futex_lock_pi64
t3 4309589.419745: before syscall in futex_unlock_pi
t2 4309589.419745: before syscall in __futex_lock_pi64
t3 4309589.419747: after syscall in futex_unlock_pi
t2 4309589.419747: after syscall in __futex_lock_pi64 ret=-22=EINVAL
t1 4309589.419748: after syscall in __futex_lock_pi64 ret=-110=ETIMEDOUT
Can you please have a look again?
Bye,
Stefan Liebler
On Fri, 2024-01-19 at 14:56 +0100, Stefan Liebler wrote:
> I've reduced the test (see attachment) and now have only one process
> with three threads.
This test fails on my setup as well:
main: start 3 threads.
#0: started: fct=1
#1: started: fct=1
#2: started: fct=1
#2: mutex_timedlock failed with 22 (round=28772)
But, after this patch:
https://lore.kernel.org/all/[email protected]/
...the attached test hangs.
However, the glibc test that was failing for me "nptl/tst-robustpi8"
passes with the linked patch applied. So I think that patch fixes the
issue I hit.
What is passing supposed to look like on the attached test?
On 29.01.24 23:23, Edgecombe, Rick P wrote:
> On Fri, 2024-01-19 at 14:56 +0100, Stefan Liebler wrote:
>> I've reduced the test (see attachment) and now have only one process
>> with three threads.
>
> This test fails on my setup as well:
> main: start 3 threads.
> #0: started: fct=1
> #1: started: fct=1
> #2: started: fct=1
> #2: mutex_timedlock failed with 22 (round=28772)
>
> But, after this patch:
> https://lore.kernel.org/all/[email protected]/
>
> ...the attached test hangs.
>
> However, the glibc test that was failing for me "nptl/tst-robustpi8"
> passes with the linked patch applied. So I think that patch fixes the
> issue I hit.
>
> What is passing supposed to look like on the attached test?
Kernel commit e626cb02ee83 ("futex: Prevent the reuse of stale pi_state")
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=e626cb02ee8399fd42c415e542d031d185783903
fixes the issue on s390x.
With this commit, the test runs to the end:
main: start 3 threads.
#0: started: fct=1
#1: started: fct=1
#2: started: fct=1
#2: REACHED round 100000000. => exit
#0: REACHED round 100000000. => exit
#1: REACHED round 100000000. => exit
main: end.
If you want, you can reduce the number of rounds by compiling with
-DROUNDS=XYZ or by manually adjusting the ROUNDS macro definition.
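For example (the file and binary names are just placeholders for the
attached test):
gcc -O2 -pthread -DROUNDS=1000000 tst-reduced.c -o tst-reduced
./tst-reduced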