2006-02-12 17:08:26

by Roberto Nibali

Subject: trap int3 problem while porting a user space application and small cleanup patch

Hello,

For a while I've been working on a little tool called mpt-status to be
able to monitor LSI based controllers. The source can be found here:

http://www.drugphish.ch/~ratz/mpt-status/

The issue I'm trying to track down now is why I cannot get it to work on
an x86_64 kernel (Sun Fire V20z with AMD Opteron(tm) Processor 252 on
SLES 9 PL3). I suspect 32/64 bit issues in the ioctl message passing
between user space and kernel space. Unfortunately, when I strace it,
the kernel spits out tons of entries like the following:

mpt-status[16045] trap int3 rip:400acf rsp:7fbfff70b0 error:0
mpt-status[16045] trap int3 rip:4008f1 rsp:7fbfff70a8 error:0
mpt-status[16045] trap int3 rip:400b86 rsp:7fbfff70b0 error:0

I can only guess at what happened because I'm not well versed in x86_64
trap handling, so my question is: How can I best debug and address this
issue in my tool?

I'm pretty sure it has something to do with me including kernel headers
in a user space tool, but no one has done the sanitizing for the LSI
related headers residing in drivers/message/fusion. It works on all
32-bit machines I've tested so far.

Attached is a small code style cleanup patch that resulted from my
skimming through the arch/x86_64/kernel/traps.c code to figure out what
went haywire. If Andi is ok with it, please consider applying.

Signed-off-by: Roberto Nibali <[email protected]>

Best regards,
Roberto Nibali, ratz
--
echo
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc


Attachments:
x86_64_kernel_traps_cleanup-1.diff (1.82 kB)

2006-02-13 00:57:45

by Andi Kleen

Subject: Re: [discuss] trap int3 problem while porting a user space application and small cleanup patch

On Sunday 12 February 2006 18:08, Roberto Nibali wrote:
> Hello,
>
> For a while I've been working on a little tool called mpt-status to be
> able to monitor LSI based controllers. The source can be found here:
>
> http://www.drugphish.ch/~ratz/mpt-status/
>
> The issue I'm trying to track down now is why I cannot get it to work on
> an x86_64 kernel (Sun Fire V20z with AMD Opteron(tm) Processor 252 on
> SLES 9 PL3). I suspect 32/64 bit issues in the ioctl message passing
> between user space and kernel space.

Quite possible. The mpt ioctls would need an ioctl conversion handler
to allow a 32bit program to use the 64bit ioctls. Or just use a 64bit
executable.
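
To illustrate why a 32bit caller would need such a conversion handler,
here is a minimal sketch. The struct names, their fields, and the
DEMO_IOWR macro are hypothetical and are not the real mpt definitions;
the macro only mirrors the usual Linux x86 request-number layout
(direction, size, type, number). Because the pointer field changes
width, the struct size changes, and so does the request number that
encodes that size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command layouts for illustration; NOT the mptctl structs. */
struct demo_cmd32 {
    uint32_t iocnum;
    uint32_t timeout;
    uint32_t buf_ptr;   /* user pointer as a 32bit program lays it out */
    uint32_t buf_len;
};

struct demo_cmd64 {
    uint32_t iocnum;
    uint32_t timeout;
    uint64_t buf_ptr;   /* user pointer as a 64bit program lays it out */
    uint32_t buf_len;
};

/*
 * Assumed request-number layout (common on x86 Linux):
 * direction in bits 30-31, size in bits 16-29, type in 8-15, number in 0-7.
 * Direction 3 means "read and write".
 */
#define DEMO_IOWR(type, nr, size)                          \
    ((3u << 30) | (((uint32_t)(size) & 0x3fffu) << 16) |   \
     ((uint32_t)(type) << 8) | (uint32_t)(nr))

/* Request number a 32bit build of the hypothetical tool would issue. */
uint32_t demo_request32(void)
{
    return DEMO_IOWR('m', 20, sizeof(struct demo_cmd32));
}

/* Request number a 64bit build would issue for the "same" command. */
uint32_t demo_request64(void)
{
    return DEMO_IOWR('m', 20, sizeof(struct demo_cmd64));
}
```

The two request numbers differ only in the size field, which is why a
64bit driver does not recognise the 32bit variant unless something
translates between the two layouts.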

> Unfortunately, when I strace it,
> the kernel spits out tons of entries like the following:

Some kernel versions printed that with strace. I think I fixed it in
mainline, but I can't remember if it was fixed in SLES9 too (apparently not).
It's fairly harmless, just ignore it. If it really bothers you, you can
turn it off with echo 0 > /proc/sys/debug/exception-trace


>
> Attached is a small code style cleanup patch that resulted from my
> skimming through the arch/x86_64/kernel/traps.c code to figure out what
> went haywire. If Andi is ok with it, please consider applying.

Hmm, ok applied.
-Andi

2006-02-13 07:55:52

by Roberto Nibali

Subject: Re: [discuss] trap int3 problem while porting a user space application and small cleanup patch

Hello Andi,

Thanks for your comments.

>> The issue I'm trying to track down now is why I cannot get it to work on
>> an x86_64 kernel (Sun Fire V20z with AMD Opteron(tm) Processor 252 on
>> SLES 9 PL3). I suspect 32/64 bit issues in the ioctl message passing
>> between user space and kernel space.
>
> Quite possible. The mpt ioctls would need an ioctl conversion handler
> to allow a 32bit program to use the 64bit ioctls. Or just use a 64bit
> executable.

It is a 64bit executable:

ratz@cpp9:~/mpt-status-1.1.5-RC3> readelf -h ./mpt-status | grep 64
ELF Header:
Class: ELF64
Machine: Advanced Micro Devices X86-64
Start of program headers: 64 (bytes into file)
Size of this header: 64 (bytes)
Size of section headers: 64 (bytes)
ratz@cpp9:~/mpt-status-1.1.5-RC3> ldd ./mpt-status
libc.so.6 => /lib64/tls/libc.so.6 (0x0000002a9566d000)
/lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)

The strace looks ok with regard to the ioctl though:

cpp9:/home/ratz/mpt-status-1.1.5-RC3 # strace ./mpt-status
execve("./mpt-status", ["./mpt-status"], [/* 44 vars */]) = 0
uname({sys="Linux", node="cpp9", ...}) = 0
brk(0) = 0x503000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2a9556b000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=26878, ...}) = 0
mmap(NULL, 26878, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2a9556c000
close(3) = 0
open("/lib64/tls/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\313\1\0"...,
640) = 640
lseek(3, 624, SEEK_SET) = 624
read(3, "\4\0\0\0\20\0\0\0\1\0\0\0GNU\0\0\0\0\0\2\0\0\0\6\0\0\0"..., 32)
= 32
fstat(3, {st_mode=S_IFREG|0755, st_size=1424617, ...}) = 0
mmap(NULL, 2254664, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x2a9566d000
madvise(0x2a9566d000, 2254664, MADV_SEQUENTIAL|0x1) = 0
mprotect(0x2a95778000, 1161032, PROT_NONE) = 0
mmap(0x2a95877000, 102400, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10a000) = 0x2a95877000
mmap(0x2a95890000, 14152, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2a95890000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2a95894000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2a95895000
arch_prctl(ARCH_SET_FS, 0x2a95894b00) = 0
munmap(0x2a9556c000, 26878) = 0
open("/dev/mptctl", O_RDWR) = 3
brk(0) = 0x503000
brk(0x526000) = 0x526000
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2a9556c000
write(1, "SGE ptr: 0x7fbfffc144\n", 22SGE ptr: 0x7fbfffc144
) = 22
write(1, "conf ptr: 0x7fbfffc124\n", 23conf ptr: 0x7fbfffc124
) = 23
write(1, "dataSgeOffset: 4\n", 17dataSgeOffset: 4
) = 17
ioctl(3, 0xc0486d14, 0x7fbfffc0e0) = 0
ioctl(3, 0xc0486d14, 0x7fbfffc0e0) = 0
write(1, "\nYou seem to have no SCSI disks "..., 139
You seem to have no SCSI disks attached to your HBA or you have
them on a different scsi_id. To get your SCSI id, run:

mpt-status -p
) = 139
write(1, "\n", 1
) = 1
munmap(0x2a9556c000, 4096) = 0
exit_group(1) = ?

My next steps will involve enabling full debug of the mptctl driver to
find out where it gets stuck and sprinkling in a few printk's to see if
the struct has the wrong size or has been packed wrongly. Even the
SuSE provided mpt-status (including the patches) does not work correctly
on this machine. So I reckon I'll try to get my hands on a SLES support
contract and/or maybe ping LSIL.
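
As a first sanity check on the struct size, the request number from the
strace above can be decoded by hand. This is a minimal sketch that
assumes the usual x86 Linux request-number layout; the helper names are
made up for illustration and do not come from any kernel header.

```c
#include <stdint.h>

/*
 * Assumed layout of an ioctl request number on x86 Linux:
 * bits 30-31 direction, bits 16-29 argument size,
 * bits 8-15 type ("magic"), bits 0-7 command number.
 */

/* Direction bits: 1 = write, 2 = read, 3 = both. */
unsigned demo_ioc_dir(uint32_t req)
{
    return (req >> 30) & 0x3u;
}

/* Size in bytes of the argument struct that user space was compiled with. */
unsigned demo_ioc_size(uint32_t req)
{
    return (req >> 16) & 0x3fffu;
}

/* Driver "magic" character. */
unsigned demo_ioc_type(uint32_t req)
{
    return (req >> 8) & 0xffu;
}

/* Command number within that driver. */
unsigned demo_ioc_nr(uint32_t req)
{
    return req & 0xffu;
}
```

For 0xc0486d14 this yields direction 3 (read and write), a size of 72
bytes, type 0x6d ('m') and command number 20. The 72 bytes are what the
user space build believes the struct size to be, which can then be
compared against the size a printk in the driver reports.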

From the looks of the MPI headers one can see that LSIL carefully
thought about the 64bit case and thus I'm really astonished it does not
work.

>> Unfortunately, when I strace it,
>> the kernel spits out tons of entries like the following:
>
> Some kernel versions printed that with strace. I think I fixed it in
> mainline, but I can't remember if it was fixed in SLES9 too (apparently not).
> It's fairly harmless, just ignore it. If it really bothers you, you can
> turn it off with echo 0 > /proc/sys/debug/exception-trace

Nice.

> Hmm, ok applied.

:) I know, not exactly fixing anything, just creating more work for you.

Best regards,
Roberto Nibali, ratz
--
echo
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

2006-02-13 09:25:51

by Andi Kleen

Subject: Re: [discuss] trap int3 problem while porting a user space application and small cleanup patch

On Monday 13 February 2006 08:55, Roberto Nibali wrote:
> Hello Andi,
>
> Thanks for your comments.
>
> >> The issue I'm trying to track down now is why I cannot get it to work on
> >> an x86_64 kernel (Sun Fire V20z with AMD Opteron(tm) Processor 252 on
> >> SLES 9 PL3). I suspect 32/64 bit issues in the ioctl message passing
> >> between user space and kernel space.
> >
> > Quite possible. The mpt ioctls would need an ioctl conversion handler
> > to allow a 32bit program to use the 64bit ioctls. Or just use a 64bit
> > executable.
>
> It is a 64bit executable:

Then whatever problem the program has is not related to 32bit ioctl emulation.
Maybe it has some generic 64bit issues.

Thanks for looking into it.

-Andi