2023-11-15 08:51:32

by Peter Zijlstra

Subject: Re: Several tst-robust* tests time out with recent Linux kernel

On Wed, Nov 15, 2023 at 01:11:20AM +0000, Edgecombe, Rick P wrote:
> On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote:
> > Urgh, thanks!
> >
> > Confirmed, the below cures things. Although I should probably make
> > that
> > FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree.
> >
> > Let me go do a proper patch.
>
> I saw these fail on the glibc shadow stack branch today, and I also saw
> this one failing:
> FAIL: nptl/tst-robustpi8

tip/locking/urgent (branch with the fix on) gets me:

root@ivb-ep:/usr/local/src/glibc# ./build/nptl/tst-robustpi8
running child
verifying locks
running child
verifying locks
running child
verifying locks
running child
verifying locks
running child
verifying locks
root@ivb-ep:/usr/local/src/glibc#

Which, to my untrained eye, looks like a pass to me.


2023-11-15 23:28:25

by Edgecombe, Rick P

Subject: Re: Several tst-robust* tests time out with recent Linux kernel

On Wed, 2023-11-15 at 09:51 +0100, Peter Zijlstra wrote:
> On Wed, Nov 15, 2023 at 01:11:20AM +0000, Edgecombe, Rick P wrote:
> > On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote:
> > > Urgh, thanks!
> > >
> > > Confirmed, the below cures things. Although I should probably
> > > make
> > > that
> > > FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree.
> > >
> > > Let me go do a proper patch.
> >
> > I saw these fail on the glibc shadow stack branch today, and I also
> > saw
> > this one failing:
> > FAIL: nptl/tst-robustpi8
>
> tip/locking/urgent (branch with the fix on) gets me:
>
> root@ivb-ep:/usr/local/src/glibc# ./build/nptl/tst-robustpi8
> running child
> verifying locks
> running child
> verifying locks
> running child
> verifying locks
> running child
> verifying locks
> running child
> verifying locks
> root@ivb-ep:/usr/local/src/glibc#
>
> Which, to my untrained eye, looks like a pass to me.

It bisects to this for me:
fbeb558b0dd0 ("futex/pi: Fix recursive rt_mutex waiter state")

Reading the patch, it's not immediately clear to me what is going on,
but a few comments stood out: "There be dragons here", "What could
possibly go wrong...", "This is a somewhat dangerous proposition".

It seems likely that some race is involved, but it reproduces reliably
on my machine. I haven't dug into debugging it yet. Any pointers?

2023-11-17 01:25:20

by Edgecombe, Rick P

Subject: Re: Several tst-robust* tests time out with recent Linux kernel

A bit more info...

The error returned to userspace is originating from:
https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295

'uval' is often zero in that error case, but sometimes just a
mismatching value like: uval=0x567, task_pid_vnr()=0x564
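
In case it helps to read those numbers: for a PI futex, the low bits of
the user-space futex word hold the owner's TID (plus a couple of flag
bits), and presumably the mismatch above is between those TID bits and
task_pid_vnr(). A small stand-alone decoder (just a sketch, assuming the
standard uapi <linux/futex.h> definitions) shows what a value like
uval=0x567 encodes:

#include <stdio.h>
#include <stdint.h>
#include <linux/futex.h>   /* FUTEX_TID_MASK, FUTEX_WAITERS, FUTEX_OWNER_DIED */

int main(void)
{
    /* One of the values reported in the error case above. */
    uint32_t uval = 0x567;

    /* Low bits: TID of the lock owner, which the kernel compares
     * against the TID of the task it expects to own the lock. */
    printf("owner TID  : %u\n", (unsigned) (uval & FUTEX_TID_MASK));
    /* High bits: waiter flag and robust "owner died" flag. */
    printf("waiters bit: %s\n", (uval & FUTEX_WAITERS) ? "set" : "clear");
    printf("owner died : %s\n", (uval & FUTEX_OWNER_DIED) ? "set" : "clear");
    return 0;
}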


Depending on the number of CPUs the VM is running on, it reproduces or
not. When it does reproduce, the newly added path here is taken:
https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185
The path is taken a lot during the test, sometimes >400 times, before
the above-linked error is generated during the syscall. When it doesn't
reproduce, I never see that new path taken.

Adding more print statements makes the reproduction less reliable, so
there does at least seem to be a race in the mix. Otherwise, I haven't
tried to understand what is going on here with all this high-wire
locking.

Hope it helps.

2024-01-19 13:57:19

by Stefan Liebler

Subject: Re: Several tst-robust* tests time out with recent Linux kernel

On 17.11.23 02:22, Edgecombe, Rick P wrote:
> A bit more info...
>
> The error returned to userspace is originating from:
> https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295
>
> 'uval' is often zero in that error case, but sometimes just a
> mismatching value like: uval=0x567, task_pid_vnr()=0x564
>
>
> Depending on the number of CPUs the VM is running on, it reproduces or
> not. When it does reproduce, the newly added path here is taken:
> https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185
> The path is taken a lot during the test, sometimes >400 times, before
> the above-linked error is generated during the syscall. When it doesn't
> reproduce, I never see that new path taken.
>
> Adding more print statements makes the reproduction less reliable, so
> there does at least seem to be a race in the mix. Otherwise, I haven't
> tried to understand what is going on here with all this high-wire
> locking.
>
> Hope it helps.

Hi,

I've also observed failures in the glibc test case nptl/tst-robust8pi with:
mutex_timedlock of 66 in thread 7 failed with 22
=> pthread_mutex_timedlock returns 22=EINVAL

I saw it on s390x. There I used a kernel with
commit 120d99901eb288f1d21db3976df4ba347b28f9c7
("s390/vfio-ap: do not reset queue removed from host config").

But I also saw it on an x86_64 KVM guest with Fedora 39 and a copr
repository with a vanilla kernel:
Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1
SMP PREEMPT_DYNAMIC Sun Jan 7 06:17:30 UTC 2024 x86_64 GNU/Linux

And I reported it to libc-alpha ("FAILING nptl/tst-robust8pi",
https://sourceware.org/pipermail/libc-alpha/2024-January/154150.html),
where Florian Weimer pointed me to this thread.

I've reduced the test (see attachment) and now have only one process
with three threads. I only use one mutex with attributes like the
original test case: PTHREAD_MUTEX_ROBUST_NP, PTHREAD_PROCESS_SHARED,
PTHREAD_PRIO_INHERIT.
Every thread loops on pthread_mutex_timedlock(abstime={0,0}) and, if it
acquires the lock, calls pthread_mutex_unlock.
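
For reference, the core of the reduced test looks roughly like the
following sketch (this is not the attached file itself, just a minimal
reconstruction of the loop described above; the round count and the
error reporting are assumptions based on the output quoted later in
the thread):

#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ROUNDS 100000000L   /* round count matching the output quoted later */

static pthread_mutex_t mut;

static void *thread_fn(void *arg)
{
    long id = (long) arg;
    /* abstime = {0, 0}: a deadline in the past, so a contended lock
     * attempt returns ETIMEDOUT immediately instead of blocking. */
    const struct timespec abstime = { 0, 0 };

    for (long round = 0; round < ROUNDS; round++) {
        int ret = pthread_mutex_timedlock(&mut, &abstime);
        if (ret == 0) {
            pthread_mutex_unlock(&mut);
        } else if (ret != ETIMEDOUT) {
            /* This is where the unexpected EINVAL shows up. */
            printf("#%ld: mutex_timedlock failed with %d (round=%ld)\n",
                   id, ret, round);
            exit(1);
        }
    }
    return NULL;
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_t thr[3];

    /* Robust + process-shared + priority-inheritance, the same
     * attribute combination as the original glibc test case. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&mut, &attr);

    for (long i = 0; i < 3; i++)
        pthread_create(&thr[i], NULL, thread_fn, (void *) i);
    for (int i = 0; i < 3; i++)
        pthread_join(thr[i], NULL);

    pthread_mutex_destroy(&mut);
    return 0;
}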

I've added some uprobes before and after the futex syscall in
__futex_lock_pi64 (in pthread_mutex_timedlock) and futex_unlock_pi (in
pthread_mutex_unlock). For me, __ASSUME_FUTEX_LOCK_PI2 is not
available, but __ASSUME_TIME64_SYSCALLS is defined.

For me it looks like this (simplified uprobes trace):
<thread> <timestamp>: <probe>
t1 4309589.419744: before syscall in __futex_lock_pi64

t3 4309589.419745: before syscall in futex_unlock_pi
t2 4309589.419745: before syscall in __futex_lock_pi64

t3 4309589.419747: after syscall in futex_unlock_pi
t2 4309589.419747: after syscall in __futex_lock_pi64 ret=-22=EINVAL

t1 4309589.419748: after syscall in __futex_lock_pi64 ret=-110=ETIMEDOUT

Can you please have a look again?

Bye,
Stefan Liebler


Attachments:
tst-robust8pi-20240118.c (4.06 kB)

2024-01-29 22:24:12

by Edgecombe, Rick P

Subject: Re: Several tst-robust* tests time out with recent Linux kernel

On Fri, 2024-01-19 at 14:56 +0100, Stefan Liebler wrote:
> I've reduced the test (see attachment) and now have only one process
> with three threads.

This test fails on my setup as well:
main: start 3 threads.
#0: started: fct=1
#1: started: fct=1
#2: started: fct=1
#2: mutex_timedlock failed with 22 (round=28772)

But, after this patch:
https://lore.kernel.org/all/[email protected]/

...the attached test hangs.

However, the glibc test that was failing for me "nptl/tst-robustpi8"
passes with the linked patch applied. So I think that patch fixes the
issue I hit.

What is passing supposed to look like on the attached test?

2024-01-30 10:16:04

by Stefan Liebler

Subject: Re: Several tst-robust* tests time out with recent Linux kernel

On 29.01.24 23:23, Edgecombe, Rick P wrote:
> On Fri, 2024-01-19 at 14:56 +0100, Stefan Liebler wrote:
>> I've reduced the test (see attachment) and now have only one process
>> with three threads.
>
> This test fails on my setup as well:
> main: start 3 threads.
> #0: started: fct=1
> #1: started: fct=1
> #2: started: fct=1
> #2: mutex_timedlock failed with 22 (round=28772)
>
> But, after this patch:
> https://lore.kernel.org/all/[email protected]/
>
> ...the attached test hangs.
>
> However, the glibc test that was failing for me "nptl/tst-robustpi8"
> passes with the linked patch applied. So I think that patch fixes the
> issue I hit.
>
> What is passing supposed to look like on the attached test?

Kernel commit "futex: Prevent the reuse of stale pi_state"
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=e626cb02ee8399fd42c415e542d031d185783903

fixes the issue on s390x.

With this commit, the test runs to the end:
main: start 3 threads.
#0: started: fct=1
#1: started: fct=1
#2: started: fct=1
#2: REACHED round 100000000. => exit
#0: REACHED round 100000000. => exit
#1: REACHED round 100000000. => exit
main: end.

If you want, you can reduce the number of rounds by compiling with
-DROUNDS=XYZ or by manually adjusting the ROUNDS macro definition.
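
Assuming the attached file uses the usual pattern of a compile-time
default that -DROUNDS on the compiler command line overrides, that knob
looks roughly like:

/* Default round count; build with e.g. -DROUNDS=100000 for a shorter run. */
#ifndef ROUNDS
# define ROUNDS 100000000L   /* matches the "REACHED round 100000000" output */
#endif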