Hi,
When I'm testing Glibc master branch for upcoming 2.39 release, I
noticed an alarming test failure on mips64el:
FAIL: stdlib/tst-arc4random-thread
I've gathered some info about it and pasted my findings into
https://sourceware.org/glibc/wiki/Testing/Tests/stdlib/tst-arc4random-thread.
Finally I was able to reduce the test case into:
#include <stdlib.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>
void *
test_thread (void *)
{
char buf[16] = {};
int fd = open("/dev/zero", O_RDONLY);
while (1)
{
ssize_t ret = read (fd, buf, 7);
if (ret == -1 && errno == EFAULT)
abort ();
}
}
void *
fork_thread (void *)
{
while (1)
{
if (!fork ())
_exit (0);
}
}
int
main (void)
{
pthread_t test_th;
pthread_t fork_th;
pthread_create (&test_th, NULL, test_thread, NULL);
pthread_create (&fork_th, NULL, fork_thread, NULL);
pthread_join (test_th, NULL);
pthread_join (fork_th, NULL);
}
When running this on the mainline kernel (revision 6.8.0-rc1+-
g7ed2632ec7d72e926b9e8bcc9ad1bb0cd37274bf) it fails in milliseconds.
Some "interesting" aspects:
1. This is related to the size parameter passed to read (). When it's
less than 8 it fails, but when it's 8 or greater there is no failure.
2. This is not related to if "buf" is initialized or not.
Now I'm suspecting this might be a kernel bug. Any pointer to further
triage?
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On Jan 24 2024, Xi Ruoyao wrote:
> Now I'm suspecting this might be a kernel bug. Any pointer to further
> triage?
Is this a regression?
--
Andreas Schwab, SUSE Labs, [email protected]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
On Wed, 2024-01-24 at 12:59 +0100, Andreas Schwab wrote:
> On Jan 24 2024, Xi Ruoyao wrote:
>
> > Now I'm suspecting this might be a kernel bug. Any pointer to further
> > triage?
>
> Is this a regression?
Initially I guessed it was perhaps a Glibc regression related to the
newly introduced clone3 usage on MIPS, but it fails with Glibc-2.35 too.
Not sure if this is a kernel regression, I'll try different kernels in
several hours (once I can physically access the system).
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On Wed, 2024-01-24 at 20:49 +0800, Xi Ruoyao wrote:
> On Wed, 2024-01-24 at 12:59 +0100, Andreas Schwab wrote:
> > On Jan 24 2024, Xi Ruoyao wrote:
> >
> > > Now I'm suspecting this might be a kernel bug. Any pointer to further
> > > triage?
> >
> > Is this a regression?
>
> Initially I guessed it was perhaps a Glibc regression related to the
> newly introduced clone3 usage on MIPS, but it fails with Glibc-2.35 too.
>
> Not sure if this is a kernel regression, I'll try different kernels in
> several hours (once I can physically access the system).
Not happening with kernel 5.18.1. I can do a bisection but it will take
several days, I guess.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On Thu, 2024-01-25 at 00:13 +0800, Xi Ruoyao wrote:
> On Wed, 2024-01-24 at 20:49 +0800, Xi Ruoyao wrote:
> > On Wed, 2024-01-24 at 12:59 +0100, Andreas Schwab wrote:
> > > On Jan 24 2024, Xi Ruoyao wrote:
> > >
> > > > Now I'm suspecting this might be a kernel bug. Any pointer to further
> > > > triage?
> > >
> > > Is this a regression?
> >
> > Initially I guessed it was perhaps a Glibc regression related to the
> > newly introduced clone3 usage on MIPS, but it fails with Glibc-2.35 too.
> >
> > Not sure if this is a kernel regression, I'll try different kernels in
> > several hours (once I can physically access the system).
>
> Not happening with kernel 5.18.1. I can do a bisection but it will take
> several days, I guess.
Hmm, not so time-consuming as I expected.
4bce37a68ff884e821a02a731897a8119e0c37b7 is the first bad commit
commit 4bce37a68ff884e821a02a731897a8119e0c37b7
Author: Ben Hutchings <[email protected]>
Date: Thu Jun 22 18:47:40 2023 +0200
mips/mm: Convert to using lock_mm_and_find_vma()
Re-posting the broken test case for Ben (I also added a waitpid call to
prevent PID exhaustion):
#include <stdlib.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>
void *
test_thread (void *)
{
char buf[16] = {};
int fd = open("/dev/zero", O_RDONLY);
while (1)
{
ssize_t ret = read (fd, buf, 7);
if (ret == -1 && errno == EFAULT)
abort ();
}
}
void *
fork_thread (void *)
{
while (1)
{
pid_t p = fork ();
if (!p)
_exit (0);
waitpid (p, NULL, 0);
}
}
int
main (void)
{
pthread_t test_th;
pthread_t fork_th;
pthread_create (&test_th, NULL, test_thread, NULL);
pthread_create (&fork_th, NULL, fork_thread, NULL);
pthread_join (test_th, NULL);
pthread_join (fork_th, NULL);
}
and the context where this issue was detected:
https://sourceware.org/glibc/wiki/Testing/Tests/stdlib/tst-arc4random-thread
and the "interesting" aspects:
1. If I change the third parameter of "read" to any value >= 8, it no
longer fails. But it fails with any integer in [1, 8).
2. It fails no matter if I initialize buf.
3. It does not fail on arm64 (the only other port using
lock_mm_and_find_vma I have access to).
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On Thu, 2024-01-25 at 05:32 +0800, Xi Ruoyao wrote:
/* snip */
> and the "interesting" aspects:
>
> 1. If I change the third parameter of "read" to any value >= 8, it no
> longer fails. But it fails with any integer in [1, 8).
> 2. It fails no matter if I initialize buf.
> 3. It does not fail on arm64 (the only other port using
> lock_mm_and_find_vma I have access to).
Correction: I'd not realized many ports use lock_mm_and_find_vma even
before this series of changes. I also have access to x86_64 and
loongarch64, and the failure seems specific to MIPS.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On Wed, 24 Jan 2024 at 13:33, Xi Ruoyao <[email protected]> wrote:
>
> Re-posting the broken test case for Ben (I also added a waitpid call to
> prevent PID exhaustion):
Funky, funky.
> ssize_t ret = read (fd, buf, 7);
> if (ret == -1 && errno == EFAULT)
> abort ();
So I think I have a clue:
> and the "interesting" aspects:
>
> 1. If I change the third parameter of "read" to any value >= 8, it no
> longer fails. But it fails with any integer in [1, 8).
One change (the only one, really), is that now that MIPS uses
lock_mm_and_find_vma(), it also has this code:
if (regs && !user_mode(regs)) {
unsigned long ip = instruction_pointer(regs);
if (!search_exception_tables(ip))
return false;
}
in case the mmap trylock fails.
That code protects against the deadlock case of "we hold the mmap
lock, and take a kernel page fault due to a bug, and that page fault
happens to be to user space, and the page fault code then deadlocks on
the mmap lock".
It's a rare bug, but it's so nasty to debug that x86 has had that code
pretty much forever, and the lock_mm_and_find_vma() helper got it that
way. MIPS was clearly expecting kernel debugging to happen on other
platforms ;)
And I think the "fails with any integer in [1, 8)" is because the MIPS
"copy_from_user()" code is likely doing something special for those
small copies.
And I note that the MIPS extable.c code uses
fixup = search_exception_tables(exception_epc(regs));
Note the difference: lock_mm_and_find_vma() uses
instruction_pointer(regs), extable.c uses exception_epc(regs).
The former is just "((regs)->cp0_epc)", while the latter is some
complex mess due to MIPS delay slots and isa16.
My *suspicion* is that instruction_pointer() needs to be fixed to do
the same full exception_epc() thing.
But honestly, I absolutely detest delay slots and refuse to touch
anything MIPS for that reason,.
And there could certainly be something else going on too. But that odd
size limitation, and the fact that it only happens on MIPS, does make
me think the above analysis is right.
I guess you could test it by changing the two cases of
'instruction_pointer(regs)' in mm/memory.c to use exception_epc(regs)
instead. It will only build on MIPS, but for *testing* that theory
out, it's fine.
Over to MIPS people..
Linus
On Wed, 24 Jan 2024 at 13:54, Linus Torvalds
<[email protected]> wrote:
>
>
> And I think the "fails with any integer in [1, 8)" is because the MIPS
> "copy_from_user()" code is likely doing something special for those
> small copies.
Lcopy_bytes_checklen\@: does COPY_BYTE(0) for the first access, which is
#define COPY_BYTE(N) \
LOADB(t0, N(src), .Ll_exc\@); \
SUB len, len, 1; \
beqz len, .Ldone\@; \
STOREB(t0, N(dst), .Ls_exc_p1\@)
so yeah, for 'copy_to_user()" (which is what that "read (fd, buf, 7)"
will do, we have that user space write ("STOREB()") in the branch
delay slot of the length test.
So that matches.
And it only fails when
(a) you're unlucky, and that stack buffer
char buf[16] = {};
happens to be just under the last page that has been accessed, so
you get a page fault
(b) you hit a mmap_sem already being locked, presumably because
another thread is doing that fork().
Anyway, I'm pretty sure this is the bug, now some MIPS person just
needs to fix the MIPS version of "instruction_pointer()" to do what
"exception_epc()" already does.
Linus
On Wed, 2024-01-24 at 14:10 -0800, Linus Torvalds wrote:
> On Wed, 24 Jan 2024 at 13:54, Linus Torvalds
> <[email protected]> wrote:
> >
> >
> > And I think the "fails with any integer in [1, 8)" is because the MIPS
> > "copy_from_user()" code is likely doing something special for those
> > small copies.
>
> .Lcopy_bytes_checklen\@: does COPY_BYTE(0) for the first access, which is
>
> #define COPY_BYTE(N) \
> LOADB(t0, N(src), .Ll_exc\@); \
> SUB len, len, 1; \
> beqz len, .Ldone\@; \
> STOREB(t0, N(dst), .Ls_exc_p1\@)
>
> so yeah, for 'copy_to_user()" (which is what that "read (fd, buf, 7)"
> will do, we have that user space write ("STOREB()") in the branch
> delay slot of the length test.
>
> So that matches.
>
> And it only fails when
>
> (a) you're unlucky, and that stack buffer
>
> char buf[16] = {};
>
> happens to be just under the last page that has been accessed, so
> you get a page fault
>
> (b) you hit a mmap_sem already being locked, presumably because
> another thread is doing that fork().
So I added a stupid hack:
diff --git a/mm/memory.c b/mm/memory.c
index 7e1f4849463a..e663eb517bbf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -92,6 +92,12 @@
#include "internal.h"
#include "swap.h"
+#ifdef __mips__
+#include "asm/branch.h"
+#undef instruction_pointer
+#define instruction_pointer(x) exception_epc(x)
+#endif
+
#if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST)
#warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
#endif
and it indeed "solved" the problem.
> Anyway, I'm pretty sure this is the bug, now some MIPS person just
> needs to fix the MIPS version of "instruction_pointer()" to do what
> "exception_epc()" already does.
Agree.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
在2024年1月24日一月 下午10:10,Linus Torvalds写道:
[...]
>
> Anyway, I'm pretty sure this is the bug, now some MIPS person just
> needs to fix the MIPS version of "instruction_pointer()" to do what
> "exception_epc()" already does.
Hi folks,
Kinda MIPS person here, I looked into the problem, and it's not that
easy to fix.
I inspected some existing usage of "instruction_pointer()", and some
of them do want exception return address (which is always CP0_EPC).
Others like this case they want the precise exception instruction
pointer ("exception_epc()" for MIPS).
I'm planning to make "instruction_pointer()" always point to exception
instructions, and implemented a new function called "return_pc()"or
whatever for "exception_epc()". Do you have a better name in mind?
Besides isa16 stuff do require kernel to read user space fault
instruction to determine delay slot size... I don't think it's always
possible when "instruction_pointer()" is called. MIPS16/microMIPS
is rarely used today though.
Thanks
>
> Linus
--
- Jiaxun
在2024年1月24日一月 上午10:42,Xi Ruoyao写道:
> Hi,
>
> When I'm testing Glibc master branch for upcoming 2.39 release, I
> noticed an alarming test failure on mips64el:
So apparently it should be tracked as a regression.
#regzbot ^introduced 4bce37a68ff884e821a02a731897a8119e0c37b7
Should we revert it for now?
Thanks
- Jiaxun
>
> FAIL: stdlib/tst-arc4random-thread
>
> I've gathered some info about it and pasted my findings into
> https://sourceware.org/glibc/wiki/Testing/Tests/stdlib/tst-arc4random-thread.
>
> Finally I was able to reduce the test case into:
>
> #include <stdlib.h>
> #include <errno.h>
> #include <pthread.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> void *
> test_thread (void *)
> {
> char buf[16] = {};
> int fd = open("/dev/zero", O_RDONLY);
> while (1)
> {
> ssize_t ret = read (fd, buf, 7);
> if (ret == -1 && errno == EFAULT)
> abort ();
> }
> }
>
> void *
> fork_thread (void *)
> {
> while (1)
> {
> if (!fork ())
> _exit (0);
> }
> }
>
> int
> main (void)
> {
> pthread_t test_th;
> pthread_t fork_th;
>
> pthread_create (&test_th, NULL, test_thread, NULL);
> pthread_create (&fork_th, NULL, fork_thread, NULL);
> pthread_join (test_th, NULL);
> pthread_join (fork_th, NULL);
> }
>
> When running this on the mainline kernel (revision 6.8.0-rc1+-
> g7ed2632ec7d72e926b9e8bcc9ad1bb0cd37274bf) it fails in milliseconds.
> Some "interesting" aspects:
>
> 1. This is related to the size parameter passed to read (). When it's
> less than 8 it fails, but when it's 8 or greater there is no failure.
> 2. This is not related to if "buf" is initialized or not.
>
> Now I'm suspecting this might be a kernel bug. Any pointer to further
> triage?
>
> --
> Xi Ruoyao <[email protected]>
> School of Aerospace Science and Technology, Xidian University
--
- Jiaxun
On Fri, 2024-01-26 at 12:33 +0000, Jiaxun Yang wrote:
>
>
> 在2024年1月24日一月 上午10:42,Xi Ruoyao写道:
> > Hi,
> >
> > When I'm testing Glibc master branch for upcoming 2.39 release, I
> > noticed an alarming test failure on mips64el:
>
> So apparently it should be tracked as a regression.
>
> #regzbot ^introduced 4bce37a68ff884e821a02a731897a8119e0c37b7
>
> Should we revert it for now?
I'd say "yes" if we cannot easily patch instruction_pointer() to handle
delay slot. Anyway the reversion will be a MIPS-only change.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University