The block that tests the call to umask via syscall
was commented out in this case and the module was
recompiled.
==================================================
page_fault: wrong gs 0 expected ffffffff805fb540
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
printing rip:
ffffffff80110053
PML4 17bb83067 PGD 17bb7d067 PMD 0
Oops: 0002
CPU 1
Pid: 2218, comm: insmod Not tainted
RIP: 0010:[<ffffffff80110053>]{system_call+3}
RSP: 0018:000001017bb85e30 EFLAGS: 00010012
RAX: 000000000000005f RBX: ffffffff8040ed20 RCX: ffffffffa00810cb
RDX: 0000000001000000 RSI: 0000000000000000 RDI: 00000000000001b6
RBP: ffffffffa0081000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000212 R12: 0000000000553f30
R13: 00000000000000b8 R14: 000000000000000c R15: 000001017e95b3c0
FS: 0000002a9557d4c0(0000) GS:ffffffff805fb540(0000) knlGS:ffffffff805fb540
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000018216000 CR4: 00000000000006e0
Call Trace: [<ffffffffa008113c>]{:syscall_test:syscall_test_init+28}
[<ffffffff801256b6>]{sys_init_module+1686} [<ffffffffa00810b8>]
[<ffffffff801100c7>]{system_call+119}
Process insmod (pid: 2218, stackpage=1017bb85000)
Stack: 000001017bb85e30 0000000000000018 ffffffffa008113c 0000000000553f30
ffffffff801256b6 000001017bb30000 000001017e99b440 000001017e99b400
0000000000554126 000001017bb30000 000001017bb32000 ffffff00000e5000
0000000000000246 00000000000000b8 ffffffffa007c000 ffffffffa00810b8
0000000000000608 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000002a958aa6c0 0000000000000000 00000000005538d0 00000000005514e0
Call Trace: [<ffffffffa008113c>]{:syscall_test:syscall_test_init+28}
[<ffffffff801256b6>]{sys_init_module+1686} [<ffffffffa00810b8>]
[<ffffffff801100c7>]{system_call+119}
Code: 65 48 89 24 25 08 00 00 00 65 48 8b 24 25 00 00 00 00 fb 48
Kernel panic: Fatal exception
=========================================
syscall_test_init() dump:
0x0000000000000060 <syscall_test_init+0>: sub $0x8,%rsp
0x0000000000000064 <syscall_test_init+4>: mov $0x0,%rdi
0x000000000000006b <syscall_test_init+11>: xor %eax,%eax
0x000000000000006d <syscall_test_init+13>: callq 0x72 <syscall_test_init+18>
0x0000000000000072 <syscall_test_init+18>: mov $0x1b6,%edi
0x0000000000000077 <syscall_test_init+23>: callq 0x0 <wrapper_umask>
0x000000000000007c <syscall_test_init+28>: mov $0x0,%rdi
0x0000000000000083 <syscall_test_init+35>: mov %rax,%rsi
0x0000000000000086 <syscall_test_init+38>: xor %eax,%eax
0x0000000000000088 <syscall_test_init+40>: callq 0x8d <syscall_test_init+45>
0x000000000000008d <syscall_test_init+45>: xor %eax,%eax
0x000000000000008f <syscall_test_init+47>: add $0x8,%rsp
0x0000000000000093 <syscall_test_init+51>: retq
0x0000000000000094 <syscall_test_init+52>: data16
0x0000000000000095 <syscall_test_init+53>: data16
0x0000000000000096 <syscall_test_init+54>: data16
0x0000000000000097 <syscall_test_init+55>: nop
0x0000000000000098 <syscall_test_init+56>: data16
0x0000000000000099 <syscall_test_init+57>: data16
0x000000000000009a <syscall_test_init+58>: data16
0x000000000000009b <syscall_test_init+59>: nop
0x000000000000009c <syscall_test_init+60>: data16
0x000000000000009d <syscall_test_init+61>: data16
0x000000000000009e <syscall_test_init+62>: data16
0x000000000000009f <syscall_test_init+63>: nop
On Mon, Sep 13, 2004 at 05:04:17PM +0300, Constantine Gavrilov wrote:
> Hello:
>
> We have a piece of kernel code that calls some system calls in kernel
> context (
Which you shouldn't do in the first place.
On Monday, 13 September 2004 16:04, Constantine Gavrilov wrote:
> We have a piece of kernel code that calls some system calls in kernel
> context (from a process with mm and a daemonized kernel thread that does
> not have mm). This works fine on IA64 and i386 architectures.
You can find the list of system calls that are supposed to work
from kernel space in asm/unistd.h inside #ifdef __KERNEL_SYSCALLS__.
On current kernels, that list only contains execve(), which should
be avoided as well in favor of call_usermodehelper. Other calls
might work on some architectures but that is not a supported
interface any more.
You could call the sys_* functions directly if they are exported,
but it is unlikely that such code gets integrated in the mainline
kernel.
The real answer for your problem highly depends on which syscalls
you want to use.
Arnd <><
Constantine Gavrilov wrote:
> Hello:
>
> We have a piece of kernel code that calls some system calls in kernel
> context (from a process with mm and a daemonized kernel thread that does
> not have mm). This works fine on IA64 and i386 architectures.
>
> When I try this on x86-64 kernel on Opteron machines, it results in
> immediate crash. I have tried standard _syscall() macros from
> asm/unistd.h. The system panics when returning from the system call.
> The disassembled code shows that gcc often has a hard time deciding
> which registers (32-bit or 64-bit) it will use. For example, it puts the
> system call number to eax, while it should put it to rax. However, this
> register thing is not a problem. I have tried my own gcc hand-crafted
> inline assembly and glibc inline syscall assembly that results in
> "correct" disassembled code. The result is always the same -- kernel
> crash when calling a function defined by _syscall() macros or when using
> an "inline" block defined by glibc macros.
>
> Attached please find a test module that tries to call umask() (JUST
> TO DEMONSTRATE the problem) via the syscall mechanism. Both methods (the
> _syscall1() macro and the glibc INLINE_SYSCALL() macro) were used.
>
> The assembly dump of umask() called via _syscall1() and via
> INLINE_SYSCALL(), as well as the disassembly of umask() from glibc, are
> provided in a separate attachment. The crash dump (captured with a
> serial console) is provided along with the disassembly of the main module
> function.
>
> It seems that segmentation is changed during the syscall and not
> restored properly, or some other REALLY BAD THING happens. The entry.S
> for x86_64 architecture is very informative, but I am not an expert in
> Opteron architecture and I do not know how the syscall instruction is
> supposed to work.
>
> Can someone explain the reason for the crash? Can you think of a
> workaround? Comments and ideas are very welcome (except those of the kind
> that it can be implemented in user space or with the help of a user proxy
> process).
You should never use the unistd.h macros from kernel space. Call
sys_foo() directly. This may mean you have to export it. The reason it
crashes is that the "syscall" opcode used by the x86-64 macros (unlike
the "int $0x80" for i386) causes a fault when already running in kernel
space.
--
Brian Gerst
Christoph Hellwig wrote:
>On Mon, Sep 13, 2004 at 05:04:17PM +0300, Constantine Gavrilov wrote:
>
>
>>Hello:
>>
>>We have a piece of kernel code that calls some system calls in kernel
>>context (
>>
>>
>
>Which you shouldn't do in the first place.
>
>
Function kernel_thread() on i386 is implemented by putting the args to
appropriate regs and calling int 0x80, resulting in a system call
clone() on i386.
I have also found the "syscall" instruction in x86-64 kernel specific
code (it does not call _syscall() macros directly, though). So,
"shouldn't do" is a bit too strong.
What I am writing is an application, not an interface. As such, its
requirements are not much different from those of a user-space application.
If a user-space application may make system calls, why can't a kernel-space
application?
And BTW, kernel-space applications have their own place even if the
concept seems foreign to you.
--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Qlusters Software Ltd
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6081977
Fax: +972-3-6081841
----------------------------------------
Brian Gerst wrote:
>
> You should never use the unistd.h macros from kernel space. Call
> sys_foo() directly. This may mean you have to export it. The reason
> it crashes is that the "syscall" opcode used by the x86-64 macros
> (unlike the "int $0x80" for i386) causes a fault when already running
> in kernel space.
>
> --
> Brian Gerst
I can see from the crash report that the fault happens. I want to
understand why.
I can use workarounds. (Calling sys_foo() directly from a module can be a
problem -- I would have to know the "versioned" function name or the
address of the function within the kernel space. Calling an entry from
the syscall table is much easier.)
--
Constantine Gavrilov
Arnd Bergmann wrote:
>On Monday, 13 September 2004 16:04, Constantine Gavrilov wrote:
>
>
>>We have a piece of kernel code that calls some system calls in kernel
>>context (from a process with mm and a daemonized kernel thread that does
>>not have mm). This works fine on IA64 and i386 architectures.
>>
>>
>
>You can find the list of system calls that are supposed to work
>from kernel space in asm/unistd.h inside #ifdef __KERNEL_SYSCALLS__.
>On current kernels, that list only contains execve(), which should
>be avoided as well in favor of call_usermodehelper. Other calls
>might work on some architectures but that is not a supported
>interface any more.
>
>You could call the sys_* functions directly if they are exported,
>but it is unlikely that such code gets integrated in the mainline
>kernel.
>
>The real answer for your problem highly depends on which syscalls
>you want to use.
>
> Arnd <><
>
>
I can implement what I want differently, though it will be somewhat
kludgy and kernel dependent (it depends on the version and distribution). I
wanted to avoid that. Since what I am writing is really an application and
not an interface, it felt very "native" to use the application syscall approach.
My real problem is not how to implement it. I want to understand this
specific x86_64 problem.
--
Constantine Gavrilov
Hi Constantine,
On Mon, Sep 13, 2004 at 06:05:52PM +0300, Constantine Gavrilov wrote:
> And BTW, kernel-space applications have their own place even if the
> concept seems foreign to you.
I avoided doing what i386 does, which inefficiently calls int 0x80 when you
can call sys_read/sys_write etc. by hand.
The syscall is only meaningful if you're not in kernel space. Once
you're in kernel space, if you ever try to invoke a syscall again (either
via int 0x80, syscall, sysenter, call gate, whatever) then you're just
going slower than you should for no good reason.
The only point of calling int 0x80 and friends is to change mode from
user space to kernel space, and you're in kernel space already, so you
should just call sys_read/sys_write etc. by hand, which will not waste
precious cycles and will be a lot simpler too.
Note also that int 0x80 will bring you into the 32bit emulation layer;
the 64bit entry point is reachable only via syscall.
Hope this helps.
On Mon, Sep 13, 2004 at 06:05:52PM +0300, Constantine Gavrilov wrote:
> What I am writing is an application, not an interface. As such, its
> requirements are not much different from those of a user-space application.
> If a user-space application may make system calls, why can't a kernel-space
> application?
>
> And BTW, kernel-space applications have their own place even if the
> concept seems foreign to you.
What kind of application is this?
And do you have a link to your source code available?
thanks,
greg k-h
On Mon, 13 Sep 2004 18:17:36 +0200
Andrea Arcangeli wrote:
> Hi Constantine,
>
> On Mon, Sep 13, 2004 at 06:05:52PM +0300, Constantine Gavrilov wrote:
> > And BTW, kernel-space applications have their own place even if the
> > concept seems foreign to you.
>
> I avoided doing what i386 does, which inefficiently calls int 0x80 when you
> can call sys_read/sys_write etc. by hand.
>
> The syscall is only meaningful if you're not in kernel space. Once
> you're in kernel space, if you ever try to invoke a syscall again (either
> via int 0x80, syscall, sysenter, call gate, whatever) then you're just
> going slower than you should for no good reason.
>
> The only point of calling int 0x80 and friends is to change mode from
> user space to kernel space, and you're in kernel space already, so you
> should just call sys_read/sys_write etc. by hand, which will not waste
> precious cycles and will be a lot simpler too.
>
> Note also that int 0x80 will bring you into the 32bit emulation layer;
> the 64bit entry point is reachable only via syscall.
>
> Hope this helps.
Actually, the fact that system calls work in kernel space I would consider
a BUG. The int 0x80 handler should oops or at least kill the offending
thread for security and robustness reasons.
--
Stephen Hemminger
Open Source Development Lab http://developer.osdl.org/shemminger
Constantine Gavrilov wrote:
> Christoph Hellwig wrote:
>
>> On Mon, Sep 13, 2004 at 05:04:17PM +0300, Constantine Gavrilov wrote:
>>
>>
>>> Hello:
>>>
>>> We have a piece of kernel code that calls some system calls in kernel
>>> context (
>>>
>>
>>
>> Which you shouldn't do in the first place.
>>
>>
>
> Function kernel_thread() on i386 is implemented by putting the args to
> appropriate regs and calling int 0x80, resulting in a system call
> clone() on i386.
It's gone in 2.6, in favor of calling do_fork() directly.
> I have also found the "syscall" instruction in x86-64 kernel specific
> code (it does not call _syscall() macros directly, though). So,
> "shouldn't do" is a bit too strong.
>
> What I am writing is an application, not an interface. As such, its
> requirements are not much different from those of a user-space application.
> If a user-space application may make system calls, why can't a kernel-space
> application?
>
> And BTW, kernel-space applications have their own place even if the
> concept seems foreign to you.
What are you trying to do that can't be done in user space? The only
possible reason for a kernel space app is for performance (like knfsd),
at the cost of risking system stability and security.
--
Brian Gerst
By author: Constantine Gavrilov
In newsgroup: linux.dev.kernel
>
> I can implement what I want differently, though it will be somewhat
> kludgy and kernel dependent (it depends on the version and distribution). I
> wanted to avoid that. Since what I am writing is really an application and
> not an interface, it felt very "native" to use the application syscall approach.
>
> My real problem is not how to implement it. I want to understand this
> specific x86_64 problem.
>
Put it in userspace. Really.
-hpa
On Mon, Sep 13, 2004 at 09:41:48AM -0700, Stephen Hemminger wrote:
> Actually, the fact that system calls work in kernel space I would consider
> a BUG. The int 0x80 handler should oops or at least kill the offending
> thread for security and robustness reasons.
kernel_thread uses int 0x80 on x86, and yes, it would be better
implemented without it (as we did on x86-64).
Christoph Hellwig wrote:
>> Which you shouldn't do in the first place.
On Mon, Sep 13, 2004 at 06:05:52PM +0300, Constantine Gavrilov wrote:
> Function kernel_thread() on i386 is implemented by putting the args to
> appropriate regs and calling int 0x80, resulting in a system call
> clone() on i386.
> I have also found the "syscall" instruction in x86-64 kernel specific
> code (it does not call _syscall() macros directly, though). So,
> "shouldn't do" is a bit too strong.
> What I am writing is an application, not an interface. As such, its
> requirements are not much different from those of a user-space application.
> If a user-space application may make system calls, why can't a kernel-space
> application?
> And BTW, kernel-space applications have their own place even if the
> concept seems foreign to you.
This is not something we particularly endorse, but when making syscalls
from the kernel, direct function calls to sys_foo() suffice. Also, ia32 does
not use syscall traps for kernel_thread() in current 2.6.x.
-- wli