2005-09-08 03:54:41

by Parag Warudkar

Subject: 2.6.13-mm1 X86_64: All 32bit programs segfault

I am clueless as to what's going on, but I am raising a flag in case this
is a problem that is not yet known.
Thunderbird, 32-bit Sun Java, and Opera are the ones I tried. They all
work fine with the Fedora 2.6.12-x kernel but
consistently segfault with 2.6.13-mm1.

Parag
--------
Sample stack trace for java -
gdb ./java
GNU gdb Red Hat Linux (6.3.0.0-1.21rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
libthread_db library "/lib64/libthread_db.so.1".

(gdb) b main
Breakpoint 1 at 0x80492d3: file ../../../../src/share/bin/java.c, line 163.
(gdb) r
Starting program: /home/paragw/jdk1.6/jdk1.6.0/fastdebug/bin/java
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xffffe000
[Thread debugging using libthread_db enabled]
[New Thread 1431820864 (LWP 6077)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1431820864 (LWP 6077)]
0x00c40471 in __pthread_initialize_minimal_internal ()
from /lib/libpthread.so.0
(gdb) bt
#0 0x00c40471 in __pthread_initialize_minimal_internal ()
from /lib/libpthread.so.0
#1 0x00c40298 in call_initialize_minimal () from /lib/libpthread.so.0
#2 0x00c3fe80 in _init () from /lib/libpthread.so.0
#3 0x00b00dcb in call_init () from /lib/ld-linux.so.2
#4 0x00b00eed in _dl_init_internal () from /lib/ld-linux.so.2
#5 0x00af37cf in _dl_start_user () from /lib/ld-linux.so.2
(gdb) info registers
eax 0x0 0
ecx 0x5557da88 1431820936
edx 0x1 1
ebx 0xffffd198 -11880
esp 0xffffced4 0xffffced4
ebp 0xffffd198 0xffffd198
esi 0x5557da40 1431820864
edi 0x0 0
eip 0xc40471 0xc40471
eflags 0x10202 66050
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb)
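
For readers following the register dump above, the segment register values can be split into their architectural fields. This is only a sketch of the standard x86 selector layout (bits 0-1 requested privilege level, bit 2 table indicator, remaining bits the descriptor index); the helper name decode_selector is made up for illustration, and the snippet makes no claim about which descriptor the kernel placed in each slot.

```python
def decode_selector(sel):
    """Split an x86 segment selector into its architectural fields."""
    return {
        "index": sel >> 3,                       # descriptor table slot
        "table": "LDT" if sel & 0x4 else "GDT",  # table indicator bit
        "rpl": sel & 0x3,                        # requested privilege level
    }

# Values copied from the "info registers" output above.
for name, value in [("cs", 0x23), ("ss", 0x2b), ("fs", 0x0), ("gs", 0x63)]:
    print(name, hex(value), decode_selector(value))
```

Note that fs is the null selector, while gs refers to slot 12 of the GDT at privilege level 3.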


2005-09-08 04:00:47

by Andi Kleen

Subject: Re: 2.6.13-mm1 X86_64: All 32bit programs segfault

On Thursday 08 September 2005 05:54, Parag Warudkar wrote:
> I am clueless as to what's going on but just raising a flag in case it
> is a not yet known problem.
> Thunderbird, 32bit Sun Java and Opera are the ones I tried. They all
> work fine with the Fedora 2.6.12-x kernel but
> consistently seg fault with 2.6.13-mm1.

Hmm - not many x86-64 patches in mm1. 2.6.13 definitely works.

>
> Parag
> --------
> Sample stack trace for java -
> gdb ./java

Last lines of strace -f + the kernel message from dmesg might be useful.

-Andi

2005-09-08 05:10:54

by Parag Warudkar

Subject: Re: 2.6.13-mm1 X86_64: All 32bit programs segfault

Andi Kleen wrote:

>Hmm - not many x86-64 patches in mm1. 2.6.13 definitely works.
>
>
2.6.13-git7 works. So something in -mm has gone bad (if not x86_64, maybe
i386 or arch-independent changes?)
It seems to have something to do with sys_set_tid_address, which is the
last call before the SIGSEGV in the strace output below.
Another thing - if I set LD_ASSUME_KERNEL=2.4 and then run the binary,
it works fine.
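
The LD_ASSUME_KERNEL workaround mentioned above could be scripted roughly as follows. This is a minimal sketch: the helper names env_with_assumed_kernel and run_with_assumed_kernel are hypothetical, and the only thing taken from the report is the variable name and the value 2.4. As far as I understand it, the variable makes the dynamic loader pick library variants built for the stated kernel ABI, which on distributions of that era meant falling back from NPTL to an older threading library.

```python
import os
import subprocess

def env_with_assumed_kernel(version, base=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set."""
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = version
    return env

def run_with_assumed_kernel(argv, version="2.4"):
    """Run argv with LD_ASSUME_KERNEL set and return its return code.

    On POSIX, subprocess reports a child killed by signal N as -N,
    so a segfaulting child would show up as -11 here.
    """
    return subprocess.run(argv, env=env_with_assumed_kernel(version)).returncode

# Hypothetical usage, with the binary from the report:
#   run_with_assumed_kernel(["./thunderbird"])
```

The run helper is deliberately not invoked here, since a current dynamic loader may refuse to load libraries at all when told to assume such an old kernel.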

> Last lines of strace -f + the kernel message from dmesg might be useful.

dmesg
--------------------
Sep 7 00:14:23 localhost kernel: mozilla-xremote[4492]: segfault at
00000000000002e8 rip 0000000000c40471 rsp 00000000ffffd334 error 4
Sep 7 00:14:24 localhost kernel: thunderbird-bin[4500]: segfault at
00000000000002e8 rip 0000000000c40471 rsp 00000000ffffcfe4 error 4
Sep 7 00:15:02 localhost kernel: mozilla-xremote[4518]: segfault at
00000000000002e8 rip 0000000000c40471 rsp 00000000ffffce84 error 4
Sep 7 00:15:02 localhost kernel: thunderbird-bin[4526]: segfault at
00000000000002e8 rip 0000000000c40471 rsp 00000000ffffbbe4 error 4
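
The kernel lines above can be picked apart mechanically. The sketch below is illustrative only: the regular expression and the names parse_segfault and decode_error are not from any tool, and it assumes the error field is printed in hexadecimal. The bit meanings are the standard x86 page-fault error code bits (bit 0: protection violation rather than non-present page, bit 1: write, bit 2: user mode).

```python
import re

SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 1),  # clear: page not present
        "write": bool(code & 2),                 # clear: read access
        "user_mode": bool(code & 4),             # clear: kernel mode
    }

def parse_segfault(line):
    """Parse one unwrapped segfault line; return None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "error": decode_error(int(m.group("err"), 16)),
    }

# One of the lines above, joined back onto a single line.
SAMPLE = (
    "Sep 7 00:14:24 localhost kernel: thunderbird-bin[4500]: segfault at "
    "00000000000002e8 rip 0000000000c40471 rsp 00000000ffffcfe4 error 4"
)
print(parse_segfault(SAMPLE))
```

Read this way, "error 4" is a user-mode read of a page that is not mapped, at address 0x2e8.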

strace -f ./thunderbird
---------------------------
[pid 2888] fstat64(0x3, 0xffffb25c) = 0
[pid 2888] old_mmap(0x12c8c00c7d000, 8804682956805,
PROT_READ|PROT_WRITE, 0xc /* MAP_???
*/|MAP_FIXED|MAP_ANONYMOUS|MAP_POPULATE|MAP_NONBLOCK|MAP_GROWSDOWN|MAP_EXECUTABLE|MAP_LOCKED|0xfffe0000,
1, 0xb071e0ffffb14c) = 0xc7d000
[pid 2888] old_mmap(0x100000c8f000, 8873402433539,
PROT_READ|PROT_WRITE, 0xc /* MAP_???
*/|MAP_FIXED|MAP_ANONYMOUS|MAP_POPULATE|MAP_NONBLOCK|MAP_GROWSDOWN|MAP_EXECUTABLE|MAP_LOCKED|0xfffe0000,
1, 0xb071e0ffffb14c) = 0xc8f000
[pid 2888] close(3) = 0
[pid 2888] old_mmap(0x100000000000, 146028888067,
PROT_READ|PROT_WRITE|PROT_EXEC|PROT_GROWSDOWN|PROT_GROWSUP|0xfcfffff8,
0x4 /* MAP_???
*/|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE|MAP_POPULATE|MAP_GROWSDOWN|MAP_DENYWRITE|MAP_EXECUTABLE|0xb00680,
48, 0x800b071e0) = 0x5599e000
[pid 2888] old_mmap(0x100000000000, 146028888067,
PROT_READ|PROT_WRITE|PROT_EXEC|PROT_GROWSDOWN|PROT_GROWSUP|0xfcfffff8,
0x4 /* MAP_???
*/|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE|MAP_POPULATE|MAP_GROWSDOWN|MAP_DENYWRITE|MAP_EXECUTABLE|0xb00680,
19, 0x800b071e0) = 0x5599f000
[pid 2888] old_mmap(0x100000000000, 146028888067,
PROT_READ|PROT_WRITE|PROT_EXEC|PROT_GROWSDOWN|PROT_GROWSUP|0xfcfffff8,
0x4 /* MAP_???
*/|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE|MAP_POPULATE|MAP_GROWSDOWN|MAP_DENYWRITE|MAP_EXECUTABLE|0xb00680,
2832, 0x2000b033df) = 0x559a0000
[pid 2888] set_thread_area(0xffffce20) = 0
[pid 2888] mprotect(0xc34000, 8192, PROT_READ) = 0
[pid 2888] mprotect(0xc73000, 4096, PROT_READ) = 0
[pid 2888] mprotect(0xc4a000, 4096, PROT_READ) = 0
[pid 2888] mprotect(0xc79000, 4096, PROT_READ) = 0
[pid 2888] mprotect(0xb0d000, 4096, PROT_READ) = 0
[pid 2888] munmap(0x555d8000, 155672) = 0
[pid 2888] set_tid_address(0x559a0608) = 2888
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[pid 2888] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 2883 resumed
Process 2888 detached
[pid 2883] <... chroot resumed> ) = 2888
[ Process PID=2883 runs in 64 bit mode. ]
[pid 2883] fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1),
...}) = 0
[pid 2883] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaada2d000
[pid 2883] open("/usr/share/locale/locale.alias", O_RDONLY) = 3
[pid 2883] fstat(3, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0
[pid 2883] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaada2e000
[pid 2883] read(3, "# Locale name alias data base.\n#"..., 4096) = 2528
[pid 2883] read(3, "", 4096) = 0
[pid 2883] close(3) = 0
[pid 2883] munmap(0x2aaaada2e000, 4096) = 0
[pid 2883] open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2883] open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2883] open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2883] open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2883] open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2883] open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) =
-1 ENOENT (No such file or directory)
[pid 2883] write(2, "./run-mozilla.sh: line 159: 288"...,
76./run-mozilla.sh: line 159: 2888 Segmentation fault "$prog"
${1+"$@"}
) = 76
[pid 2883] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 2883] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 2883] wait4(-1, 0x7fffffe133d4, WNOHANG, NULL) = -1 ECHILD (No
child processes)
[pid 2883] rt_sigreturn(0xffffffffffffffff) = 0
[pid 2883] rt_sigaction(SIGINT, {SIG_DFL}, {0x4312b5, [], SA_RESTORER,
0x3116d2f330}, 8) = 0
[pid 2883] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid 2883] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 2883] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid 2883] stat("core", 0x7fffffe138e0) = -1 ENOENT (No such file or
directory)
[pid 2883] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 2883] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid 2883] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 2883] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid 2883] read(255, "\nexit $exitcode\n", 8192) = 16
[pid 2883] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid 2883] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 2883] munmap(0x2aaaada2d000, 4096) = 0
[pid 2883] exit_group(139) = ?
Process 2870 resumed
Process 2883 detached
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 139}], 0, NULL)
= 2883
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fffffbf54e4, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x4312b5, [], SA_RESTORER,
0x3116d2f330}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
read(255, "exitcode=$?\n\n## Stop addon scrip"..., 6653) = 91
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
open("/home/paragw/.thunderbird/init.d/",
O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOENT (No such file or directory)
open("./init.d/", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
getdents64(3, /* 3 entries */, 4096) = 80
getdents64(3, /* 0 entries */, 4096) = 0
close(3) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
access("/home/paragw/.thunderbird/init.d/K*", X_OK) = -1 ENOENT (No such
file or directory)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
access("./init.d/K*", X_OK) = -1 ENOENT (No such file or
directory)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(139)
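
The exit status 139 seen in the trace above follows the usual shell convention of reporting 128 plus the signal number for a child killed by a signal. A small sketch of that arithmetic, with a made-up helper name describe_exit_status, assuming a Linux host where SIGSEGV is signal 11:

```python
import signal

def describe_exit_status(status):
    """Interpret a shell-style exit status (128 + N means signal N)."""
    if status > 128:
        try:
            return "killed by " + signal.Signals(status - 128).name
        except ValueError:
            # Not a known signal number on this platform.
            pass
    return "exit status %d" % status

print(describe_exit_status(139))
```

So the wrapper script is just passing on the segfault of the 32-bit child (pid 2888) as its own exit code.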

>-Andi
>
>
>


2005-09-08 05:44:15

by Andi Kleen

Subject: Re: 2.6.13-mm1 X86_64: All 32bit programs segfault

On Thursday 08 September 2005 07:10, Parag Warudkar wrote:
> Andi Kleen wrote:
> >Hmm - not many x86-64 patches in mm1. 2.6.13 definitely works.
>
> 2.6.13-git7 works. So something in -mm has gone bad (if not x86_64, may
> be i386 or arch-independent changes?)
> It seems it has got something to do with the sys_set_tid_address as
> evident from the strace output below.

If you catch a crash in gdb and type x/i $pc what do you see?

-Andi

2005-09-08 08:45:06

by Andrew Morton

Subject: Re: 2.6.13-mm1 X86_64: All 32bit programs segfault

Parag Warudkar wrote:
>
> Andi Kleen wrote:
>
> >Hmm - not many x86-64 patches in mm1. 2.6.13 definitely works.
> >
> >
> 2.6.13-git7 works. So something in -mm has gone bad (if not x86_64, may
> be i386 or arch-independent changes?)
> It seems it has got something to do with the sys_set_tid_address as
> evident from the strace output below.
> Another thing - If I set LD_ASSUME_KERNEL=2.4 and then run the binary,
> it works fine.

I can't reproduce this with the current -mm lineup. I compiled up a 32-bit
app on x86 and transferred that across.

Maybe it got fixed. Please test 2.6.13-mm2, which appears to be an hour or
two away. If it still fails then I'd need a recipe (including URLs and
stuff) with which to reproduce it please.

2005-09-08 08:58:33

by Andi Kleen

Subject: Re: 2.6.13-mm1 X86_64: All 32bit programs segfault

On Thursday 08 September 2005 10:44, Andrew Morton wrote:
> Parag Warudkar wrote:
> > Andi Kleen wrote:
> > >Hmm - not many x86-64 patches in mm1. 2.6.13 definitely works.
> >
> > 2.6.13-git7 works. So something in -mm has gone bad (if not x86_64, may
> > be i386 or arch-independent changes?)
> > It seems it has got something to do with the sys_set_tid_address as
> > evident from the strace output below.
> > Another thing - If I set LD_ASSUME_KERNEL=2.4 and then run the binary,
> > it works fine.
>
> I can't reproduce this with the current -mm lineup. I compiled up a 32-bit
> app on x86 and transferred that across.

Did you use a threaded program? It appears to be related to that.

(BTW you can compile 32bit on 64bit too by passing -m32)

-Andi

2005-09-08 12:57:46

by Parag Warudkar

Subject: Re: 2.6.13-mm1 X86_64: All 32bit programs segfault

Andrew Morton wrote:

>> it works fine.
>>
>>
>
>I can't reproduce this with the current -mm lineup. I compiled up a 32-bit
>app on x86 and transferred that across.
>
>Maybe it got fixed. Please test 2.6.13-mm2, which appears to be an hour or
>two away. If it still fails then I'd need a recipe (including URLs and
>stuff) with which to reproduce it please.
>
>
You need a program which uses threads - NPTL ones, I suspect.
Normal programs work fine even on my setup.
Anyway, I will give -mm2 a try and see.

Parag

2005-09-08 13:08:13

by Parag Warudkar

Subject: Re: 2.6.13-mm1 X86_64: All 32bit programs segfault

Andi Kleen wrote:

>If you catch a crash in gdb and type x/i $pc what do you see?
>
>-Andi
>
>
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1431820864 (LWP 2839)]
0x00c40471 in __pthread_initialize_minimal_internal ()
from /lib/libpthread.so.0
(gdb) x/i $pc
0xc40471 <__pthread_initialize_minimal_internal+78>: mov 0x2e8(%eax),%edx
(gdb) info registers
eax 0x0 0
^^^^^^^^^^^^^^^^^^
ecx 0x5557da88 1431820936
edx 0x1 1
ebx 0xffffd4a8 -11096
esp 0xffffd1e4 0xffffd1e4
ebp 0xffffd4a8 0xffffd4a8
esi 0x5557da40 1431820864
edi 0x0 0
eip 0xc40471 0xc40471
eflags 0x10202 66050
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
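
The faulting instruction and the register dump above line up with the "segfault at 00000000000002e8" kernel messages posted earlier in the thread: the load reads from eax plus a displacement of 0x2e8, and eax is 0. A tiny sketch of that arithmetic (effective_address is an illustrative name, and 32-bit wraparound is assumed because the process is a 32-bit one):

```python
def effective_address(base, displacement, bits=32):
    """Address formed by a base register plus displacement, with wraparound."""
    return (base + displacement) & ((1 << bits) - 1)

eax = 0x0  # from "info registers" above
print(hex(effective_address(eax, 0x2e8)))
```

This only shows that the crash is a read through a null base pointer; it does not say why eax ended up as 0 at that point.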