2006-08-19 07:30:34

by Björn Steinbrink

Subject: [PATCH] Return real errno from execve in ____call_usermodehelper

If execve fails in ____call_usermodehelper we treat its return value as
error code, but as execve is a syscall, we actually want -errno there.

Signed-off-by: Björn Steinbrink <[email protected]>

--

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 1d32def..865abc0 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -149,8 +149,10 @@ static int ____call_usermodehelper(void
set_cpus_allowed(current, CPU_MASK_ALL);

retval = -EPERM;
- if (current->fs->root)
- retval = execve(sub_info->path, sub_info->argv,sub_info->envp);
+ if (current->fs->root) {
+ execve(sub_info->path, sub_info->argv, sub_info->envp);
+ retval = -errno;
+ }

/* Exec failed? */
sub_info->retval = retval;


2006-08-19 08:14:39

by Andrew Morton

Subject: Re: [PATCH] Return real errno from execve in ____call_usermodehelper

On Sat, 19 Aug 2006 09:30:31 +0200
Björn Steinbrink <[email protected]> wrote:

> If execve fails in ____call_usermodehelper we treat its return value as
> error code, but as execve is a syscall, we actually want -errno there.
>
> Signed-off-by: Björn Steinbrink <[email protected]>
>
> --
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 1d32def..865abc0 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -149,8 +149,10 @@ static int ____call_usermodehelper(void
> set_cpus_allowed(current, CPU_MASK_ALL);
>
> retval = -EPERM;
> - if (current->fs->root)
> - retval = execve(sub_info->path, sub_info->argv,sub_info->envp);
> + if (current->fs->root) {
> + execve(sub_info->path, sub_info->argv, sub_info->envp);
> + retval = -errno;
> + }
>
> /* Exec failed? */
> sub_info->retval = retval;

ug. I wish we could find some way of using do_execve() here. Or hoist
sys_execve() out of the architectures.

2006-08-19 08:42:43

by Russell King

Subject: Re: [PATCH] Return real errno from execve in ____call_usermodehelper

On Sat, Aug 19, 2006 at 01:14:28AM -0700, Andrew Morton wrote:
> On Sat, 19 Aug 2006 09:30:31 +0200
> Björn Steinbrink <[email protected]> wrote:
>
> > If execve fails in ____call_usermodehelper we treat its return value as
> > error code, but as execve is a syscall, we actually want -errno there.
> >
> > Signed-off-by: Björn Steinbrink <[email protected]>
> >
> > --
> >
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index 1d32def..865abc0 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -149,8 +149,10 @@ static int ____call_usermodehelper(void
> > set_cpus_allowed(current, CPU_MASK_ALL);
> >
> > retval = -EPERM;
> > - if (current->fs->root)
> > - retval = execve(sub_info->path, sub_info->argv,sub_info->envp);
> > + if (current->fs->root) {
> > + execve(sub_info->path, sub_info->argv, sub_info->envp);
> > + retval = -errno;
> > + }
> >
> > /* Exec failed? */
> > sub_info->retval = retval;
>
> ug. I wish we could find some way of using do_execve() here. Or hoist
> sys_execve() out of the architectures.

Some architectures implement their own special execve() function, and
some of these are written to match this code as it currently stands
(iow, they return the error code directly instead of setting errno),
so they are probably buggy with respect to the errno convention.

Maybe what we should be thinking of doing is changing execve() calls
to kernel_execve() which returns the error code.

This way, architectures are free to implement execve() whatever way
they wish - and if they're concerned about using errno, that's their
own implementation specific detail.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-08-20 13:01:44

by Arnd Bergmann

Subject: Re: [PATCH] Return real errno from execve in ____call_usermodehelper

On Saturday 19 August 2006 10:42, Russell King wrote:
> Maybe what we should be thinking of doing is changing execve() calls
> to kernel_execve() which returns the error code.
>
> This way, architectures are free to implement execve() whatever way
> they wish - and if they're concerned about using errno, that's their
> own implementation specific detail.

Sounds good, it means we could finally kill __KERNEL_SYSCALLS__ along
with lib/errno.c.

I guess a fallback for those that haven't yet done kernel_execve could be

#ifdef CONFIG_ARCH_KERNEL_EXECVE
extern int kernel_execve(const char *filename,
char *const argv[], char *const envp[]);
#else
static inline int kernel_execve(const char *filename,
char *const argv[], char *const envp[])
{
int errno;
mm_segment_t old_fs = get_fs();
set_fs(KERNEL_DS);
/* the kernel syscall macro modifies errno */
execve(filename, argv, envp);
set_fs(old_fs);
return errno;
}
#endif

With that in place, we can remove the global errno right away, and the
kernel syscalls for any architecture that implements its own kernel_execve.

Arnd <><

2006-08-20 13:47:51

by Björn Steinbrink

Subject: Re: [PATCH] Return real errno from execve in ____call_usermodehelper

On 2006.08.20 15:01:28 +0200, Arnd Bergmann wrote:
> On Saturday 19 August 2006 10:42, Russell King wrote:
> > Maybe what we should be thinking of doing is changing execve() calls
> > to kernel_execve() which returns the error code.
> >
> > This way, architectures are free to implement execve() whatever way
> > they wish - and if they're concerned about using errno, that's their
> > own implementation specific detail.
>
> Sounds good, it means we could finally kill __KERNEL_SYSCALLS__ along
> with lib/errno.c.
>
> I guess a fallback for those that haven't yet done kernel_execve could be
>
> #ifdef CONFIG_ARCH_KERNEL_EXECVE
> extern int kernel_execve(const char *filename,
> char *const argv[], char *const envp[]);
> #else
> static inline int kernel_execve(const char *filename,
> char *const argv[], char *const envp[])
> {
> int errno;
> mm_segment_t old_fs = get_fs();
> set_fs(KERNEL_DS);
> /* the kernel syscall macro modifies errno */
> execve(filename, argv, envp);
> set_fs(old_fs);
> return errno;
> }
> #endif
>
> With that in place, we can remove the global errno right away, and the
> kernel syscalls for any architecture that implements its own kernel_execve.

How is execve() supposed to use the local errno? The kernel syscall
macro only "creates" a function, so you still need a global errno for
that code, don't you?

And (because I'm clueless ;) I wonder about the calls to set_fs(): why
do we need them? The current code does not seem to make them. Or is there
something special about kernel_execve that I'm missing? cscope and grep
didn't turn up anything, and Google had only a few useless results for
kernel_execve.

Björn

2006-08-20 17:13:58

by Arnd Bergmann

Subject: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Sunday 20 August 2006 15:47, Björn Steinbrink wrote:
> How is execve() supposed to use the local errno? The kernel syscall
> macro only "creates" a function, so you still need a global errno for
> that code, don't you?

Right, I got confused by the macro referencing it. As an alternative,
you can have a static errno variable in the source file that defines
kernel_execve.

> And (because I'm clueless ;) I wonder about the calls to set_fs(): why
> do we need them? The current code does not seem to make them. Or is there
> something special about kernel_execve that I'm missing? cscope and grep
> didn't turn up anything, and Google had only a few useless results for
> kernel_execve.

You need to call set_fs if you want to pass a kernel pointer to a function
expecting a __user pointer (like 'char __user *argv[]'). I guess every
place in the kernel where we call execve is actually running in a
set_fs(KERNEL_DS) environment already, and anything else would not
make much sense. So maybe it is better to add a small check there to
make sure we really are running with KERNEL_DS.

---

It turned out that most of the architectures that implement their
own execve() call, rather than generating it with the _syscall3
macro, pass the return value of sys_execve straight back instead
of setting errno.

The patch below converts those functions to a new kernel_execve
variant and provides a lib/execve.c file with an alternative
implementation for the architectures that are using the traditional
__KERNEL_SYSCALLS__ mechanism for it. It also removes the kernel
syscalls implementation on the architectures that no longer need
it.

The architectures that this patch doesn't touch should ideally
introduce their own kernel_execve() function to get rid of
__KERNEL_SYSCALLS__ as well.

Signed-off-by: Arnd Bergmann <[email protected]>
---
Tested-on: i386
Compiled-on: i386, powerpc

arch/alpha/Kconfig | 3 +
arch/alpha/kernel/entry.S | 10 ++--
arch/arm/Kconfig | 3 +
arch/arm/kernel/sys_arm.c | 4 -
arch/arm26/Kconfig | 3 +
arch/arm26/kernel/sys_arm.c | 2
arch/ia64/Kconfig | 3 +
arch/ia64/kernel/entry.S | 4 -
arch/parisc/Kconfig | 3 +
arch/parisc/kernel/process.c | 9 +++-
arch/powerpc/Kconfig | 3 +
arch/powerpc/kernel/misc_32.S | 2
arch/powerpc/kernel/misc_64.S | 2
arch/sparc64/kernel/power.c | 5 --
arch/um/Kconfig | 3 +
arch/um/kernel/syscall.c | 13 ++++++
arch/x86_64/Kconfig | 3 +
arch/x86_64/kernel/entry.S | 4 -
drivers/sbus/char/bbc_envctrl.c | 5 --
drivers/sbus/char/envctrl.c | 5 --
include/asm-alpha/unistd.h | 69 ----------------------------------
include/asm-arm/unistd.h | 24 -----------
include/asm-arm26/unistd.h | 24 -----------
include/asm-ia64/unistd.h | 72 -----------------------------------
include/asm-parisc/unistd.h | 79 ---------------------------------------
include/asm-powerpc/unistd.h | 7 ---
include/asm-um/unistd.h | 27 -------------
include/asm-x86_64/unistd.h | 81 ----------------------------------------
include/linux/syscalls.h | 2
init/do_mounts_initrd.c | 3 -
init/main.c | 4 -
kernel/kmod.c | 5 --
lib/Makefile | 4 +
lib/execve.c | 19 +++++++++
34 files changed, 92 insertions(+), 417 deletions(-)

Index: linux-cg/init/do_mounts_initrd.c
===================================================================
--- linux-cg.orig/init/do_mounts_initrd.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/init/do_mounts_initrd.c 2006-08-20 19:06:00.000000000 +0200
@@ -1,4 +1,3 @@
-#define __KERNEL_SYSCALLS__
#include <linux/unistd.h>
#include <linux/kernel.h>
#include <linux/fs.h>
@@ -35,7 +34,7 @@
(void) sys_open("/dev/console",O_RDWR,0);
(void) sys_dup(0);
(void) sys_dup(0);
- return execve(shell, argv, envp_init);
+ return kernel_execve(shell, argv, envp_init);
}

static void __init handle_initrd(void)
Index: linux-cg/kernel/kmod.c
===================================================================
--- linux-cg.orig/kernel/kmod.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/kernel/kmod.c 2006-08-20 19:06:00.000000000 +0200
@@ -18,8 +18,6 @@
call_usermodehelper wait flag, and remove exec_usermodehelper.
Rusty Russell <[email protected]> Jan 2003
*/
-#define __KERNEL_SYSCALLS__
-
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/syscalls.h>
@@ -150,7 +148,8 @@

retval = -EPERM;
if (current->fs->root)
- retval = execve(sub_info->path, sub_info->argv,sub_info->envp);
+ retval = kernel_execve(sub_info->path,
+ sub_info->argv, sub_info->envp);

/* Exec failed? */
sub_info->retval = retval;
Index: linux-cg/lib/Makefile
===================================================================
--- linux-cg.orig/lib/Makefile 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/lib/Makefile 2006-08-20 19:06:00.000000000 +0200
@@ -33,6 +33,10 @@
lib-y += dec_and_lock.o
endif

+ifneq ($(CONFIG_HAVE_KERNEL_EXECVE),y)
+ lib-y += execve.o
+endif
+
obj-$(CONFIG_CRC_CCITT) += crc-ccitt.o
obj-$(CONFIG_CRC16) += crc16.o
obj-$(CONFIG_CRC32) += crc32.o
Index: linux-cg/lib/execve.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-cg/lib/execve.c 2006-08-20 19:06:00.000000000 +0200
@@ -0,0 +1,19 @@
+#include <asm/bug.h>
+#include <asm/uaccess.h>
+
+#define __KERNEL_SYSCALLS__
+static int errno;
+#include <asm/unistd.h>
+
+int kernel_execve(const char *filename, char *const argv[], char *const envp[])
+{
+ mm_segment_t fs = get_fs();
+ int ret;
+
+ WARN_ON(segment_eq(fs, USER_DS));
+ ret = execve(filename, (char **)argv, (char **)envp);
+ if (ret)
+ ret = -errno;
+
+ return ret;
+}
Index: linux-cg/init/main.c
===================================================================
--- linux-cg.orig/init/main.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/init/main.c 2006-08-20 19:06:00.000000000 +0200
@@ -9,8 +9,6 @@
* Simplified starting of init: Michael A. Griffith <[email protected]>
*/

-#define __KERNEL_SYSCALLS__
-
#include <linux/types.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
@@ -679,7 +677,7 @@
static void run_init_process(char *init_filename)
{
argv_init[0] = init_filename;
- execve(init_filename, argv_init, envp_init);
+ kernel_execve(init_filename, argv_init, envp_init);
}

static int init(void * unused)
Index: linux-cg/arch/sparc64/kernel/power.c
===================================================================
--- linux-cg.orig/arch/sparc64/kernel/power.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/sparc64/kernel/power.c 2006-08-20 19:06:00.000000000 +0200
@@ -4,8 +4,6 @@
* Copyright (C) 1999 David S. Miller ([email protected])
*/

-#define __KERNEL_SYSCALLS__
-
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
@@ -14,6 +12,7 @@
#include <linux/delay.h>
#include <linux/interrupt.h>
#include <linux/pm.h>
+#include <linux/syscalls.h>

#include <asm/system.h>
#include <asm/auxio.h>
@@ -98,7 +97,7 @@

/* Ok, down we go... */
button_pressed = 0;
- if (execve("/sbin/shutdown", argv, envp) < 0) {
+ if (kernel_execve("/sbin/shutdown", argv, envp) < 0) {
printk("powerd: shutdown execution failed\n");
add_wait_queue(&powerd_wait, &wait);
goto again;
Index: linux-cg/arch/x86_64/kernel/entry.S
===================================================================
--- linux-cg.orig/arch/x86_64/kernel/entry.S 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/x86_64/kernel/entry.S 2006-08-20 19:06:00.000000000 +0200
@@ -1000,7 +1000,7 @@
* do_sys_execve asm fallback arguments:
* rdi: name, rsi: argv, rdx: envp, fake frame on the stack
*/
-ENTRY(execve)
+ENTRY(kernel_execve)
CFI_STARTPROC
FAKE_STACK_FRAME $0
SAVE_ALL
@@ -1013,7 +1013,7 @@
UNFAKE_STACK_FRAME
ret
CFI_ENDPROC
-ENDPROC(execve)
+ENDPROC(kernel_execve)

KPROBE_ENTRY(page_fault)
errorentry do_page_fault
Index: linux-cg/drivers/sbus/char/bbc_envctrl.c
===================================================================
--- linux-cg.orig/drivers/sbus/char/bbc_envctrl.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/drivers/sbus/char/bbc_envctrl.c 2006-08-20 19:06:00.000000000 +0200
@@ -4,9 +4,6 @@
* Copyright (C) 2001 David S. Miller ([email protected])
*/

-#define __KERNEL_SYSCALLS__
-static int errno;
-
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/sched.h>
@@ -200,7 +197,7 @@
printk(KERN_CRIT "kenvctrld: Shutting down the system now.\n");

shutting_down = 1;
- if (execve("/sbin/shutdown", argv, envp) < 0)
+ if (kernel_execve("/sbin/shutdown", argv, envp) < 0)
printk(KERN_CRIT "envctrl: shutdown execution failed\n");
}

Index: linux-cg/drivers/sbus/char/envctrl.c
===================================================================
--- linux-cg.orig/drivers/sbus/char/envctrl.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/drivers/sbus/char/envctrl.c 2006-08-20 19:06:00.000000000 +0200
@@ -19,9 +19,6 @@
* Daniele Bellucci <[email protected]>
*/

-#define __KERNEL_SYSCALLS__
-static int errno;
-
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/kthread.h>
@@ -982,7 +979,7 @@

inprog = 1;
printk(KERN_CRIT "kenvctrld: WARNING: Shutting down the system now.\n");
- if (0 > execve("/sbin/shutdown", argv, envp)) {
+ if (0 > kernel_execve("/sbin/shutdown", argv, envp)) {
printk(KERN_CRIT "kenvctrld: WARNING: system shutdown failed!\n");
inprog = 0; /* unlikely to succeed, but we could try again */
}
Index: linux-cg/arch/x86_64/Kconfig
===================================================================
--- linux-cg.orig/arch/x86_64/Kconfig 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/x86_64/Kconfig 2006-08-20 19:07:12.000000000 +0200
@@ -61,6 +61,9 @@
bool
default y

+config HAVE_KERNEL_EXECVE
+ def_bool y
+
config X86_CMPXCHG
bool
default y
Index: linux-cg/include/asm-x86_64/unistd.h
===================================================================
--- linux-cg.orig/include/asm-x86_64/unistd.h 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/include/asm-x86_64/unistd.h 2006-08-20 19:06:00.000000000 +0200
@@ -661,8 +661,6 @@
#define __ARCH_WANT_SYS_TIME
#define __ARCH_WANT_COMPAT_SYS_TIME

-#ifndef __KERNEL_SYSCALLS__
-
#define __syscall "syscall"

#define _syscall0(type,name) \
@@ -744,85 +742,6 @@
__syscall_return(type,__res); \
}

-#else /* __KERNEL_SYSCALLS__ */
-
-#include <linux/syscalls.h>
-#include <asm/ptrace.h>
-
-/*
- * we need this inline - forking from kernel space will result
- * in NO COPY ON WRITE (!!!), until an execve is executed. This
- * is no problem, but for the stack. This is handled by not letting
- * main() use the stack at all after fork(). Thus, no function
- * calls - which means inline code for fork too, as otherwise we
- * would use the stack upon exit from 'fork()'.
- *
- * Actually only pause and fork are needed inline, so that there
- * won't be any messing with the stack from main(), but we define
- * some others too.
- */
-#define __NR__exit __NR_exit
-
-static inline pid_t setsid(void)
-{
- return sys_setsid();
-}
-
-static inline ssize_t write(unsigned int fd, char * buf, size_t count)
-{
- return sys_write(fd, buf, count);
-}
-
-static inline ssize_t read(unsigned int fd, char * buf, size_t count)
-{
- return sys_read(fd, buf, count);
-}
-
-static inline off_t lseek(unsigned int fd, off_t offset, unsigned int origin)
-{
- return sys_lseek(fd, offset, origin);
-}
-
-static inline long dup(unsigned int fd)
-{
- return sys_dup(fd);
-}
-
-/* implemented in asm in arch/x86_64/kernel/entry.S */
-extern int execve(const char *, char * const *, char * const *);
-
-static inline long open(const char * filename, int flags, int mode)
-{
- return sys_open(filename, flags, mode);
-}
-
-static inline long close(unsigned int fd)
-{
- return sys_close(fd);
-}
-
-static inline pid_t waitpid(int pid, int * wait_stat, int flags)
-{
- return sys_wait4(pid, wait_stat, flags, NULL);
-}
-
-extern long sys_mmap(unsigned long addr, unsigned long len,
- unsigned long prot, unsigned long flags,
- unsigned long fd, unsigned long off);
-
-extern int sys_modify_ldt(int func, void *ptr, unsigned long bytecount);
-
-asmlinkage long sys_execve(char *name, char **argv, char **envp,
- struct pt_regs regs);
-asmlinkage long sys_clone(unsigned long clone_flags, unsigned long newsp,
- void *parent_tid, void *child_tid,
- struct pt_regs regs);
-asmlinkage long sys_fork(struct pt_regs regs);
-asmlinkage long sys_vfork(struct pt_regs regs);
-asmlinkage long sys_pipe(int *fildes);
-
-#endif /* __KERNEL_SYSCALLS__ */
-
#ifndef __ASSEMBLY__

#include <linux/linkage.h>
Index: linux-cg/arch/alpha/kernel/entry.S
===================================================================
--- linux-cg.orig/arch/alpha/kernel/entry.S 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/alpha/kernel/entry.S 2006-08-20 19:06:00.000000000 +0200
@@ -655,12 +655,12 @@
.end kernel_thread

/*
- * execve(path, argv, envp)
+ * kernel_execve(path, argv, envp)
*/
.align 4
- .globl execve
- .ent execve
-execve:
+ .globl kernel_execve
+ .ent kernel_execve
+kernel_execve:
/* We can be called from a module. */
ldgp $gp, 0($27)
lda $sp, -(32+SIZEOF_PT_REGS+8)($sp)
@@ -704,7 +704,7 @@

1: lda $sp, 32+SIZEOF_PT_REGS+8($sp)
ret
-.end execve
+.end kernel_execve


/*
Index: linux-cg/arch/arm/kernel/sys_arm.c
===================================================================
--- linux-cg.orig/arch/arm/kernel/sys_arm.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/arm/kernel/sys_arm.c 2006-08-20 19:06:00.000000000 +0200
@@ -279,7 +279,7 @@
return error;
}

-long execve(const char *filename, char **argv, char **envp)
+int kernel_execve(const char *filename, char *const argv[], char *const envp[])
{
struct pt_regs regs;
int ret;
@@ -317,7 +317,7 @@
out:
return ret;
}
-EXPORT_SYMBOL(execve);
+EXPORT_SYMBOL(kernel_execve);

/*
* Since loff_t is a 64 bit type we avoid a lot of ABI hastle
Index: linux-cg/arch/arm26/kernel/sys_arm.c
===================================================================
--- linux-cg.orig/arch/arm26/kernel/sys_arm.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/arm26/kernel/sys_arm.c 2006-08-20 19:08:32.000000000 +0200
@@ -283,7 +283,7 @@
}

/* FIXME - see if this is correct for arm26 */
-long execve(const char *filename, char **argv, char **envp)
+int kernel_execve(const char *filename, char *const argv[], char *const envp[])
{
struct pt_regs regs;
int ret;
@@ -320,4 +320,4 @@
return ret;
}

-EXPORT_SYMBOL(execve);
+EXPORT_SYMBOL(kernel_execve);
Index: linux-cg/arch/powerpc/kernel/misc_32.S
===================================================================
--- linux-cg.orig/arch/powerpc/kernel/misc_32.S 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/powerpc/kernel/misc_32.S 2006-08-20 19:06:00.000000000 +0200
@@ -843,7 +843,7 @@
addi r1,r1,16
blr

-_GLOBAL(execve)
+_GLOBAL(kernel_execve)
li r0,__NR_execve
sc
bnslr
Index: linux-cg/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-cg.orig/arch/powerpc/kernel/misc_64.S 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/powerpc/kernel/misc_64.S 2006-08-20 19:06:00.000000000 +0200
@@ -556,7 +556,7 @@

#endif /* CONFIG_ALTIVEC */

-_GLOBAL(execve)
+_GLOBAL(kernel_execve)
li r0,__NR_execve
sc
bnslr
Index: linux-cg/arch/um/Kconfig
===================================================================
--- linux-cg.orig/arch/um/Kconfig 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/um/Kconfig 2006-08-20 19:06:00.000000000 +0200
@@ -29,6 +29,9 @@
bool
default y

+config HAVE_KERNEL_EXECVE
+ def_bool y
+
# Used in kernel/irq/manage.c and include/linux/irq.h
config IRQ_RELEASE_METHOD
bool
Index: linux-cg/include/asm-alpha/unistd.h
===================================================================
--- linux-cg.orig/include/asm-alpha/unistd.h 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/include/asm-alpha/unistd.h 2006-08-20 19:06:00.000000000 +0200
@@ -580,75 +580,6 @@
#define __ARCH_WANT_SYS_OLDUMOUNT
#define __ARCH_WANT_SYS_SIGPENDING

-#ifdef __KERNEL_SYSCALLS__
-
-#include <linux/compiler.h>
-#include <linux/types.h>
-#include <linux/string.h>
-#include <linux/signal.h>
-#include <linux/syscalls.h>
-#include <asm/ptrace.h>
-
-static inline long open(const char * name, int mode, int flags)
-{
- return sys_open(name, mode, flags);
-}
-
-static inline long dup(int fd)
-{
- return sys_dup(fd);
-}
-
-static inline long close(int fd)
-{
- return sys_close(fd);
-}
-
-static inline off_t lseek(int fd, off_t off, int whence)
-{
- return sys_lseek(fd, off, whence);
-}
-
-static inline void _exit(int value)
-{
- sys_exit(value);
-}
-
-#define exit(x) _exit(x)
-
-static inline long write(int fd, const char * buf, size_t nr)
-{
- return sys_write(fd, buf, nr);
-}
-
-static inline long read(int fd, char * buf, size_t nr)
-{
- return sys_read(fd, buf, nr);
-}
-
-extern int execve(char *, char **, char **);
-
-static inline long setsid(void)
-{
- return sys_setsid();
-}
-
-static inline pid_t waitpid(int pid, int * wait_stat, int flags)
-{
- return sys_wait4(pid, wait_stat, flags, NULL);
-}
-
-asmlinkage int sys_execve(char *ufilename, char **argv, char **envp,
- unsigned long a3, unsigned long a4, unsigned long a5,
- struct pt_regs regs);
-asmlinkage long sys_rt_sigaction(int sig,
- const struct sigaction __user *act,
- struct sigaction __user *oact,
- size_t sigsetsize,
- void *restorer);
-
-#endif /* __KERNEL_SYSCALLS__ */
-
/* "Conditional" syscalls. What we want is

__attribute__((weak,alias("sys_ni_syscall")))
Index: linux-cg/include/asm-arm/unistd.h
===================================================================
--- linux-cg.orig/include/asm-arm/unistd.h 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/include/asm-arm/unistd.h 2006-08-20 19:06:00.000000000 +0200
@@ -548,30 +548,6 @@
#define __ARCH_WANT_SYS_SOCKETCALL
#endif

-#ifdef __KERNEL_SYSCALLS__
-
-#include <linux/compiler.h>
-#include <linux/types.h>
-#include <linux/syscalls.h>
-
-extern long execve(const char *file, char **argv, char **envp);
-
-struct pt_regs;
-asmlinkage int sys_execve(char *filenamei, char **argv, char **envp,
- struct pt_regs *regs);
-asmlinkage int sys_clone(unsigned long clone_flags, unsigned long newsp,
- struct pt_regs *regs);
-asmlinkage int sys_fork(struct pt_regs *regs);
-asmlinkage int sys_vfork(struct pt_regs *regs);
-asmlinkage int sys_pipe(unsigned long *fildes);
-struct sigaction;
-asmlinkage long sys_rt_sigaction(int sig,
- const struct sigaction __user *act,
- struct sigaction __user *oact,
- size_t sigsetsize);
-
-#endif /* __KERNEL_SYSCALLS__ */
-
/*
* "Conditional" syscalls
*
Index: linux-cg/include/asm-arm26/unistd.h
===================================================================
--- linux-cg.orig/include/asm-arm26/unistd.h 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/include/asm-arm26/unistd.h 2006-08-20 19:06:00.000000000 +0200
@@ -463,30 +463,6 @@
#define __ARCH_WANT_SYS_SIGPROCMASK
#define __ARCH_WANT_SYS_RT_SIGACTION

-#ifdef __KERNEL_SYSCALLS__
-
-#include <linux/compiler.h>
-#include <linux/types.h>
-#include <linux/syscalls.h>
-
-extern long execve(const char *file, char **argv, char **envp);
-
-struct pt_regs;
-asmlinkage int sys_execve(char *filenamei, char **argv, char **envp,
- struct pt_regs *regs);
-asmlinkage int sys_clone(unsigned long clone_flags, unsigned long newsp,
- struct pt_regs *regs);
-asmlinkage int sys_fork(struct pt_regs *regs);
-asmlinkage int sys_vfork(struct pt_regs *regs);
-asmlinkage int sys_pipe(unsigned long *fildes);
-struct sigaction;
-asmlinkage long sys_rt_sigaction(int sig,
- const struct sigaction __user *act,
- struct sigaction __user *oact,
- size_t sigsetsize);
-
-#endif /* __KERNEL_SYSCALLS__ */
-
/*
* "Conditional" syscalls
*
Index: linux-cg/include/asm-parisc/unistd.h
===================================================================
--- linux-cg.orig/include/asm-parisc/unistd.h 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/include/asm-parisc/unistd.h 2006-08-20 19:06:00.000000000 +0200
@@ -959,85 +959,6 @@
return K_INLINE_SYSCALL(name, 6, arg1, arg2, arg3, arg4, arg5, arg6); \
}

-#ifdef __KERNEL_SYSCALLS__
-
-#include <asm/current.h>
-#include <linux/compiler.h>
-#include <linux/types.h>
-#include <linux/syscalls.h>
-
-static inline pid_t setsid(void)
-{
- return sys_setsid();
-}
-
-static inline int write(int fd, const char *buf, off_t count)
-{
- return sys_write(fd, buf, count);
-}
-
-static inline int read(int fd, char *buf, off_t count)
-{
- return sys_read(fd, buf, count);
-}
-
-static inline off_t lseek(int fd, off_t offset, int count)
-{
- return sys_lseek(fd, offset, count);
-}
-
-static inline int dup(int fd)
-{
- return sys_dup(fd);
-}
-
-static inline int execve(char *filename, char * argv [],
- char * envp[])
-{
- extern int __execve(char *, char **, char **, struct task_struct *);
- return __execve(filename, argv, envp, current);
-}
-
-static inline int open(const char *file, int flag, int mode)
-{
- return sys_open(file, flag, mode);
-}
-
-static inline int close(int fd)
-{
- return sys_close(fd);
-}
-
-static inline void _exit(int exitcode)
-{
- sys_exit(exitcode);
-}
-
-static inline pid_t waitpid(pid_t pid, int *wait_stat, int options)
-{
- return sys_wait4(pid, wait_stat, options, NULL);
-}
-
-asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
- unsigned long prot, unsigned long flags,
- unsigned long fd, unsigned long offset);
-asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len,
- unsigned long prot, unsigned long flags,
- unsigned long fd, unsigned long pgoff);
-struct pt_regs;
-asmlinkage int sys_execve(struct pt_regs *regs);
-int sys_clone(unsigned long clone_flags, unsigned long usp,
- struct pt_regs *regs);
-int sys_vfork(struct pt_regs *regs);
-int sys_pipe(int *fildes);
-struct sigaction;
-asmlinkage long sys_rt_sigaction(int sig,
- const struct sigaction __user *act,
- struct sigaction __user *oact,
- size_t sigsetsize);
-
-#endif /* __KERNEL_SYSCALLS__ */
-
#endif /* __ASSEMBLY__ */

#undef STR
Index: linux-cg/include/asm-powerpc/unistd.h
===================================================================
--- linux-cg.orig/include/asm-powerpc/unistd.h 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/include/asm-powerpc/unistd.h 2006-08-20 19:06:00.000000000 +0200
@@ -479,13 +479,6 @@
#endif

/*
- * System call prototypes.
- */
-#ifdef __KERNEL_SYSCALLS__
-extern int execve(const char *file, char **argv, char **envp);
-#endif /* __KERNEL_SYSCALLS__ */
-
-/*
* "Conditional" syscalls
*
* What we want is __attribute__((weak,alias("sys_ni_syscall"))),
Index: linux-cg/include/asm-um/unistd.h
===================================================================
--- linux-cg.orig/include/asm-um/unistd.h 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/include/asm-um/unistd.h 2006-08-20 19:06:00.000000000 +0200
@@ -37,33 +37,6 @@
#define __ARCH_WANT_SYS_RT_SIGSUSPEND
#endif

-#ifdef __KERNEL_SYSCALLS__
-
-#include <linux/compiler.h>
-#include <linux/types.h>
-
-static inline int execve(const char *filename, char *const argv[],
- char *const envp[])
-{
- mm_segment_t fs;
- int ret;
-
- fs = get_fs();
- set_fs(KERNEL_DS);
- ret = um_execve(filename, argv, envp);
- set_fs(fs);
-
- if (ret >= 0)
- return ret;
-
- errno = -(long)ret;
- return -1;
-}
-
-int sys_execve(char *file, char **argv, char **env);
-
-#endif /* __KERNEL_SYSCALLS__ */
-
#undef __KERNEL_SYSCALLS__
#include "asm/arch/unistd.h"

Index: linux-cg/include/linux/syscalls.h
===================================================================
--- linux-cg.orig/include/linux/syscalls.h 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/include/linux/syscalls.h 2006-08-20 19:06:00.000000000 +0200
@@ -597,4 +597,6 @@
asmlinkage long sys_set_robust_list(struct robust_list_head __user *head,
size_t len);

+int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
+
#endif
Index: linux-cg/arch/ia64/kernel/entry.S
===================================================================
--- linux-cg.orig/arch/ia64/kernel/entry.S 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/ia64/kernel/entry.S 2006-08-20 19:06:00.000000000 +0200
@@ -492,11 +492,11 @@
br.ret.sptk.many rp
END(prefetch_stack)

-GLOBAL_ENTRY(execve)
+GLOBAL_ENTRY(kernel_execve)
mov r15=__NR_execve // put syscall number in place
break __BREAK_SYSCALL
br.ret.sptk.many rp
-END(execve)
+END(kernel_execve)

GLOBAL_ENTRY(clone)
mov r15=__NR_clone // put syscall number in place
Index: linux-cg/arch/parisc/kernel/process.c
===================================================================
--- linux-cg.orig/arch/parisc/kernel/process.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/parisc/kernel/process.c 2006-08-20 19:06:00.000000000 +0200
@@ -368,7 +368,14 @@
return error;
}

-unsigned long
+extern int __execve(const char *filename, char *const argv[],
+ char *const envp[], struct task_struct *task);
+int kernel_execve(const char *filename, char *const argv[], char *const envp[])
+{
+ return __execve(filename, argv, envp, current);
+}
+
+unsigned long
get_wchan(struct task_struct *p)
{
struct unwind_frame_info info;
Index: linux-cg/arch/um/kernel/syscall.c
===================================================================
--- linux-cg.orig/arch/um/kernel/syscall.c 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/arch/um/kernel/syscall.c 2006-08-20 19:06:00.000000000 +0200
@@ -164,3 +164,16 @@
spin_unlock(&syscall_lock);
return(ret);
}
+
+int kernel_execve(const char *filename, char *const argv[], char *const envp[])
+{
+ mm_segment_t fs;
+ int ret;
+
+ fs = get_fs();
+ set_fs(KERNEL_DS);
+ ret = um_execve(filename, argv, envp);
+ set_fs(fs);
+
+ return ret;
+}
Index: linux-cg/include/asm-ia64/unistd.h
===================================================================
--- linux-cg.orig/include/asm-ia64/unistd.h 2006-08-20 19:05:53.000000000 +0200
+++ linux-cg/include/asm-ia64/unistd.h 2006-08-20 19:06:00.000000000 +0200
@@ -319,78 +319,6 @@

extern long __ia64_syscall (long a0, long a1, long a2, long a3, long a4, long nr);

-#ifdef __KERNEL_SYSCALLS__
-
-#include <linux/compiler.h>
-#include <linux/string.h>
-#include <linux/signal.h>
-#include <asm/ptrace.h>
-#include <linux/stringify.h>
-#include <linux/syscalls.h>
-
-static inline long
-open (const char * name, int mode, int flags)
-{
- return sys_open(name, mode, flags);
-}
-
-static inline long
-dup (int fd)
-{
- return sys_dup(fd);
-}
-
-static inline long
-close (int fd)
-{
- return sys_close(fd);
-}
-
-static inline off_t
-lseek (int fd, off_t off, int whence)
-{
- return sys_lseek(fd, off, whence);
-}
-
-static inline void
-_exit (int value)
-{
- sys_exit(value);
-}
-
-#define exit(x) _exit(x)
-
-static inline long
-write (int fd, const char * buf, size_t nr)
-{
- return sys_write(fd, buf, nr);
-}
-
-static inline long
-read (int fd, char * buf, size_t nr)
-{
- return sys_read(fd, buf, nr);
-}
-
-
-static inline long
-setsid (void)
-{
- return sys_setsid();
-}
-
-static inline pid_t
-waitpid (int pid, int * wait_stat, int flags)
-{
- return sys_wait4(pid, wait_stat, flags, NULL);
-}
-
-
-extern int execve (const char *filename, char *const av[], char *const ep[]);
-extern pid_t clone (unsigned long flags, void *sp);
-
-#endif /* __KERNEL_SYSCALLS__ */
-
asmlinkage unsigned long sys_mmap(
unsigned long addr, unsigned long len,
int prot, int flags,
Index: linux-cg/arch/alpha/Kconfig
===================================================================
--- linux-cg.orig/arch/alpha/Kconfig 2006-08-20 19:06:01.000000000 +0200
+++ linux-cg/arch/alpha/Kconfig 2006-08-20 19:06:02.000000000 +0200
@@ -524,6 +524,9 @@
depends on SMP
default y

+config HAVE_KERNEL_EXECVE
+ def_bool y
+
config NR_CPUS
int "Maximum number of CPUs (2-64)"
range 2 64
Index: linux-cg/arch/arm/Kconfig
===================================================================
--- linux-cg.orig/arch/arm/Kconfig 2006-08-20 19:06:01.000000000 +0200
+++ linux-cg/arch/arm/Kconfig 2006-08-20 19:06:02.000000000 +0200
@@ -77,6 +77,9 @@
config GENERIC_BUST_SPINLOCK
bool

+config HAVE_KERNEL_EXECVE
+ def_bool y
+
config ARCH_MAY_HAVE_PC_FDC
bool

Index: linux-cg/arch/arm26/Kconfig
===================================================================
--- linux-cg.orig/arch/arm26/Kconfig 2006-08-20 19:06:01.000000000 +0200
+++ linux-cg/arch/arm26/Kconfig 2006-08-20 19:06:02.000000000 +0200
@@ -52,6 +52,9 @@
config GENERIC_BUST_SPINLOCK
bool

+config HAVE_KERNEL_EXECVE
+ def_bool y
+
config GENERIC_ISA_DMA
bool

Index: linux-cg/arch/ia64/Kconfig
===================================================================
--- linux-cg.orig/arch/ia64/Kconfig 2006-08-20 19:06:01.000000000 +0200
+++ linux-cg/arch/ia64/Kconfig 2006-08-20 19:06:02.000000000 +0200
@@ -54,6 +54,9 @@
bool
default y

+config HAVE_KERNEL_EXECVE
+ def_bool y
+
config GENERIC_IOMAP
bool
default y
Index: linux-cg/arch/parisc/Kconfig
===================================================================
--- linux-cg.orig/arch/parisc/Kconfig 2006-08-20 19:06:01.000000000 +0200
+++ linux-cg/arch/parisc/Kconfig 2006-08-20 19:06:02.000000000 +0200
@@ -37,6 +37,9 @@
bool
default y

+config HAVE_KERNEL_EXECVE
+ def_bool y
+
config TIME_LOW_RES
bool
depends on SMP
Index: linux-cg/arch/powerpc/Kconfig
===================================================================
--- linux-cg.orig/arch/powerpc/Kconfig 2006-08-20 19:06:01.000000000 +0200
+++ linux-cg/arch/powerpc/Kconfig 2006-08-20 19:06:02.000000000 +0200
@@ -53,6 +53,9 @@
bool
default y

+config HAVE_KERNEL_EXECVE
+ def_bool y
+
config PPC
bool
default y
Index: linux-cg/arch/alpha/kernel/alpha_ksyms.c
===================================================================
--- linux-cg.orig/arch/alpha/kernel/alpha_ksyms.c 2006-08-20 19:09:47.000000000 +0200
+++ linux-cg/arch/alpha/kernel/alpha_ksyms.c 2006-08-20 19:09:48.000000000 +0200
@@ -116,7 +116,7 @@
EXPORT_SYMBOL(sys_exit);
EXPORT_SYMBOL(sys_write);
EXPORT_SYMBOL(sys_lseek);
-EXPORT_SYMBOL(execve);
+EXPORT_SYMBOL(kernel_execve);
EXPORT_SYMBOL(sys_setsid);
EXPORT_SYMBOL(sys_wait4);

2006-08-20 17:37:19

by Chase Venters

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Sunday 20 August 2006 12:13, Arnd Bergmann wrote:

> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-cg/lib/execve.c 2006-08-20 19:06:00.000000000 +0200
> @@ -0,0 +1,19 @@
> +#include <asm/bug.h>
> +#include <asm/uaccess.h>
> +
> +#define __KERNEL_SYSCALLS__
> +static int errno;
> +#include <asm/unistd.h>
> +
> +int kernel_execve(const char *filename, char *const argv[], char *const envp[])
> +{
> + mm_segment_t fs = get_fs();
> + int ret;
> +
> + WARN_ON(segment_eq(fs, USER_DS));
> + ret = execve(filename, (char **)argv, (char **)envp);
> + if (ret)
> + ret = errno;
> +
> + return ret;
> +}

I noticed this global errno in lib/errno.c a while ago and was wondering what
the right way to clean it up is. From what I remember, no one actually uses
errno in the kernel (unless it's an "errno" they've defined locally). The
only other place errno gets used is by all of the syscall macros.

Unless there's some TLS kernel magic that I've totally missed, using errno in
this manner is totally unsafe anyway. So I would NAK the above because your
kernel_execve() function gives an unsafe errno value significance it should
not have by turning it into a return value. (As an aside, shouldn't that have
read [ ret = -errno; ] anyway?)
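
Chase's concern about one errno shared by the whole kernel can be illustrated with a small, deterministic C model. The sketch below is not kernel code: `shared_errno`, `old_style_call()` and `interleaved_error_for_task_a()` are hypothetical names, and the "concurrency" is simulated by running task B's failing call before task A reads the variable. It uses the `-errno` sign convention Chase mentions.

```c
#include <errno.h>

/* Hypothetical stand-in for a single kernel-wide 'errno' like the one
 * in lib/errno.c.  This is NOT libc's thread-local errno. */
static int shared_errno;

/* Old-style syscall stub (hypothetical): on failure it returns -1
 * and stores the positive error code in the shared variable. */
static int old_style_call(int fail_with)
{
	if (fail_with) {
		shared_errno = fail_with;
		return -1;
	}
	return 0;
}

/* Model task A and task B failing back to back, with task B's call
 * landing before task A reads shared_errno.  Returns the negative
 * error code that task A ends up reporting. */
int interleaved_error_for_task_a(void)
{
	int ret_a = old_style_call(ENOENT);	/* task A fails: ENOENT */

	(void)old_style_call(EACCES);		/* task B fails: EACCES */

	/* Task A now turns "errno" into a return value, but the shared
	 * variable already holds task B's error. */
	return ret_a < 0 ? -shared_errno : 0;
}
```

Task A reports `-EACCES` although its own call failed with `ENOENT`. On SMP or with preemption an equivalent overwrite could happen between the failing call and the read, which is why a per-call return value is the safer channel.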

Unless 'errno' has some significant reason to live on in the kernel, I think
it would be better to kill it and write kernel syscall macros that don't muck
with it.

Thanks,
Chase

2006-08-20 18:25:44

by Andrew Morton

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Sun, 20 Aug 2006 12:36:49 -0500
Chase Venters <[email protected]> wrote:

> Unless 'errno' has some significant reason to live on in the kernel, I think
> it would be better to kill it and write kernel syscall macros that don't muck
> with it.

We have been working in that direction. It's certainly something we'd like
to kill off.

2006-08-20 18:33:07

by Chase Venters

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Sunday 20 August 2006 13:25, Andrew Morton wrote:
> On Sun, 20 Aug 2006 12:36:49 -0500
>
> Chase Venters <[email protected]> wrote:
> > Unless 'errno' has some significant reason to live on in the kernel, I
> > think it would be better to kill it and write kernel syscall macros that
> > don't muck with it.
>
> We have been working in that direction. It's certainly something we'd like
> to kill off.

Perhaps Arnd's patch is a good step in that direction then. A secondary
suggestion is to put a big comment there that explains "Yes, we know this is
ugly, it's going to die soon."

I'd also consider going so far as just returning -1 if we failed, since we
can't quite trust errno anyway.

Thanks,
Chase

2006-08-20 19:31:58

by Arnd Bergmann

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Sunday 20 August 2006 19:36, Chase Venters wrote:

> Unless there's some TLS kernel magic that I've totally missed, using errno in
> this manner is totally unsafe anyway. So I would NAK the above because your
> kernel_execve() function gives an unsafe errno value significance it should
> not have by turning it into a return value.

It has always resulted in an unsafe errno value; my patch just fixes it on
a few architectures and makes it safe there. Note that nobody even noticed
execve returning -1 on some architectures and -errno on others, and if
execve succeeds, errno is never assigned anyway.

> (As an aside, shouldn't that have read [ ret = -errno; ] anyway?)

Right, thanks for pointing this out.

> Unless 'errno' has some significant reason to live on in the kernel, I think
> it would be better to kill it and write kernel syscall macros that don't muck
> with it.

The direction in which this patch goes is to kill off kernel syscalls
entirely. The main problem there is that kernel_execve needs an architecture
specific implementation (calling sys_execve does the wrong thing), so doing
it all in one step would require knowing how to do it on all 20 architectures.
Once the execve kernel syscall is gone, errno can die with it.

Arnd <><

2006-08-20 19:45:59

by Björn Steinbrink

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On 2006.08.20 13:32:39 -0500, Chase Venters wrote:
> On Sunday 20 August 2006 13:25, Andrew Morton wrote:
> > On Sun, 20 Aug 2006 12:36:49 -0500
> >
> > Chase Venters <[email protected]> wrote:
> > > Unless 'errno' has some significant reason to live on in the kernel, I
> > > think it would be better to kill it and write kernel syscall macros that
> > > don't muck with it.
> >
> > We have been working in that direction. It's certainly something we'd like
> > to kill off.
>
> Perhaps Arnd's patch is a good step in that direction then. A secondary
> suggestion is to put a big comment there that explains "Yes, we know this is
> ugly, it's going to die soon."
>
> I'd also consider going so far as just returning -1 if we failed, since we
> can't quite trust errno anyway.

Could we rename __syscall_return to IS_SYS_ERR (or whatever) and force
kernel syscall users to do the check? That way we could eliminate errno
and still provide the real error code to the code using the syscall.
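
A minimal sketch of what such an `IS_SYS_ERR()` check might look like, assuming the common convention that a raw syscall result between `-MAX_ERRNO` and `-1` is a negated error code. `IS_SYS_ERR`, `MAX_ERRNO = 4095` and `sys_error_code()` are assumptions for illustration; 4095 is the value later kernels use in `<linux/err.h>`, and the per-architecture `__syscall_return` macros of this period may use a different bound.

```c
#include <errno.h>

/* Assumed bound: raw results in [-MAX_ERRNO, -1] are negated error
 * codes.  The exact value is an assumption for this sketch. */
#define MAX_ERRNO 4095

/* Hypothetical IS_SYS_ERR(): does a raw syscall return value encode
 * an error?  Comparing as unsigned means only the top MAX_ERRNO
 * values count, so e.g. a high address returned by mmap2 on a
 * 32-bit machine is not mistaken for an error. */
#define IS_SYS_ERR(x) \
	((unsigned long)(long)(x) >= (unsigned long)-MAX_ERRNO)

/* Caller-side check with no errno involved: return the positive
 * error code carried by 'raw', or 0 if the call succeeded. */
int sys_error_code(long raw)
{
	return IS_SYS_ERR(raw) ? (int)-raw : 0;
}
```

The unsigned comparison is the important design choice: it confines the error range to the top few thousand values, so legitimate results that look negative when read as signed, such as high mmap addresses on 32-bit machines, still count as success.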

Björn

2006-08-20 19:51:27

by Arjan van de Ven

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

> Could we rename __syscall_return to IS_SYS_ERR (or whatever) and force
> kernel syscall users to do the check? That way we could eliminate errno

s/users/user/ .. there's one left that should die out soon ;)


2006-08-20 20:11:22

by Björn Steinbrink

[permalink] [raw]
Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On 2006.08.20 21:50:46 +0200, Arjan van de Ven wrote:
> > Could we rename __syscall_return to IS_SYS_ERR (or whatever) and force
> > kernel syscall users to do the check? That way we could eliminate errno
>
> s/users/user/ .. there's one left that should die out soon ;)
>

Only one in unistd.h, but throughout the kernel there are quite a few
unless I'm missing something here:
doener@atjola:~/src/kernel/linux-2.6$ grep \ _syscall * -R | \
> grep -v define\\\|undef\\\|clobber | wc -l
116

Are these just going to be replaced by calls to sys_whatever?

Björn

2006-08-20 20:20:40

by Arjan van de Ven

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Sun, 2006-08-20 at 22:11 +0200, Björn Steinbrink wrote:
> On 2006.08.20 21:50:46 +0200, Arjan van de Ven wrote:
> > \
> > > Could we rename __syscall_return to IS_SYS_ERR (or whatever) and force
> > > kernel syscall users to do the check? That way we could eliminate errno
> >
> > s/users/user/ .. there's one left that should die out soon ;)
> >
>
> Only one in unistd.h, but throughout the kernel there are quite a few
> unless I'm missing something here:
> doener@atjola:~/src/kernel/linux-2.6$ grep \ _syscall * -R | \
> > grep -v define\\\|undef\\\|clobber | wc -l
> 116
>
> Are these just going to be replaced by calls to sys_whatever?

they're not the users of this, they're the definitions... ;)


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com

2006-08-20 20:33:58

by Arnd Bergmann

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Sunday 20 August 2006 22:11, Björn Steinbrink wrote:
> Only one in unistd.h, but throughout the kernel there are quite a few
> unless I'm missing something here:
> doener@atjola:~/src/kernel/linux-2.6$ grep \ _syscall * -R | \
> > grep -v define\\\|undef\\\|clobber | wc -l
> 116

there are only a few direct calls that managed to sneak in after we removed
them all some time ago:

| arch/sh64/kernel/process.c: _syscall0(int, getpid)
| arch/sh64/kernel/process.c: _syscall1(int, getpgid, int, pid)
| arch/sh64/kernel/process.c:static __inline__ _syscall2(int,clone,unsigned long,flags,unsigned long,newsp)
| arch/sh64/kernel/process.c:static __inline__ _syscall1(int,exit,int,ret)

These should be replaced with calls to sys_*, or whatever the other
architectures do in order to implement the respective functions.

| arch/um/os-Linux/sys-i386/tls.c:static _syscall1(int, get_thread_area, user_desc_t *, u_info);
| arch/um/os-Linux/process.c:inline _syscall0(pid_t, getpid)
| arch/um/os-Linux/tls.c:static _syscall1(int, get_thread_area, user_desc_t *, u_info);
| arch/um/os-Linux/tls.c:static _syscall1(int, set_thread_area, user_desc_t *, u_info);
| arch/um/sys-i386/unmap.c:static inline _syscall2(int,munmap,void *,start,size_t,len)
| arch/um/sys-i386/unmap.c:static inline _syscall6(void *,mmap2,void *,addr,size_t,len,int,prot,int,flags,int,fd,off_t,offset)
| arch/um/sys-x86_64/unmap.c:static inline _syscall2(int,munmap,void *,start,size_t,len)
| arch/um/sys-x86_64/unmap.c:static inline _syscall6(void *,mmap,void *,addr,size_t,len,int,prot,int,flags,int,fd,off_t,offset)

UML is special; there may be a good reason to use them if they are not
actually kernel syscalls but calls to the host OS.

Arnd <><

2006-08-20 20:36:12

by Björn Steinbrink

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On 2006.08.20 22:20:28 +0200, Arjan van de Ven wrote:
> On Sun, 2006-08-20 at 22:11 +0200, Björn Steinbrink wrote:
> > On 2006.08.20 21:50:46 +0200, Arjan van de Ven wrote:
> > > > Could we rename __syscall_return to IS_SYS_ERR (or whatever) and force
> > > > kernel syscall users to do the check? That way we could eliminate errno
> > >
> > > s/users/user/ .. there's one left that should die out soon ;)
> > >
> >
> > Only one in unistd.h, but throughout the kernel there are quite a few
> > unless I'm missing something here:
> > doener@atjola:~/src/kernel/linux-2.6$ grep \ _syscall * -R | \
> > > grep -v define\\\|undef\\\|clobber | wc -l
> > 116
> >
> > Are these just going to be replaced by calls to sys_whatever?
>
> they're not the users of this, they're the definitions... ;)

Well, I assume that if some code defines a syscall, it will actually use
it. Of course I meant to ask if the users of those definitions are going
to just call sys_whatever.
For example check_host_supports_tls in arch/um/os-Linux/sys-i386/tls.c
which even uses the global errno (although in that case the whole
else part could probably be just removed).

Björn

2006-08-20 20:41:15

by Arjan van de Ven

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Sun, 2006-08-20 at 22:36 +0200, Björn Steinbrink wrote:
> On 2006.08.20 22:20:28 +0200, Arjan van de Ven wrote:
> > On Sun, 2006-08-20 at 22:11 +0200, Björn Steinbrink wrote:
> > > On 2006.08.20 21:50:46 +0200, Arjan van de Ven wrote:
> > > > > Could we rename __syscall_return to IS_SYS_ERR (or whatever) and force
> > > > > kernel syscall users to do the check? That way we could eliminate errno
> > > >
> > > > s/users/user/ .. there's one left that should die out soon ;)
> > > >
> > >
> > > Only one in unistd.h, but throughout the kernel there are quite a few
> > > unless I'm missing something here:
> > > doener@atjola:~/src/kernel/linux-2.6$ grep \ _syscall * -R | \
> > > > grep -v define\\\|undef\\\|clobber | wc -l
> > > 116
> > >
> > > Are these just going to be replaced by calls to sys_whatever?
> >
> > they're not the users of this, they're the definitions... ;)
>
> Well, I assume that if some code defines a syscall, it will actually use
> it. Of course I meant to ask if the users of those definitions are going
> to just call sys_whatever.
> For example check_host_supports_tls in arch/um/os-Linux/sys-i386/tls.c
> which even uses the global errno (although in that case the whole
> else part could probably be just removed).

um uses glibc, and is thus special... let's ignore that ;)
(really, it's an entirely different beast in this regard)

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com

2006-08-21 00:37:03

by Paul Mackerras

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

Arnd Bergmann writes:

> It turned out most of the architectures that already implement
> their own execve() call instead of using the _syscall3 function
> for it end up passing the return value of sys_execve down,
> instead of setting errno.

I really don't like having an "errno" variable in the kernel. What if
two processes are doing an execve concurrently?

Anyway, your patch returns the (positive) errno value here:

> + WARN_ON(segment_eq(fs, USER_DS));
> + ret = execve(filename, (char **)argv, (char **)envp);
> + if (ret)
> + ret = errno;
> +
> + return ret;

but here we are testing for a negative value to mean error:

> - if (execve("/sbin/shutdown", argv, envp) < 0) {
> + if (kernel_execve("/sbin/shutdown", argv, envp) < 0) {

Paul.

2006-08-21 01:57:09

by Jeff Dike

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Sun, Aug 20, 2006 at 10:36:04PM +0200, Björn Steinbrink wrote:
> For example check_host_supports_tls in arch/um/os-Linux/sys-i386/tls.c
> which even uses the global errno (although in that case the whole
> else part could probably be just removed).

UML is different. It uses errno extensively (as it must) on the glibc side
of things. On the kernel side, there are no uses of errno that I'm aware of.

Jeff

2006-08-21 15:12:32

by Arnd Bergmann

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Monday 21 August 2006 02:36, Paul Mackerras wrote:
> > It turned out most of the architectures that already implement
> > their own execve() call instead of using the _syscall3 function
> > for it end up passing the return value of sys_execve down,
> > instead of setting errno.
>
> I really don't like having an "errno" variable in the kernel. What if
> two processes are doing an execve concurrently?

The point is that we have two different schemes in the kernel that
conflict:

alpha, arm{,26}, ia64, parisc, powerpc and x86_64 pass the error
code from execve, all others pass -1 and set the global errno.

So the caller does not really have a chance to get the correct error
value at all. Bjoern's first patch changed one caller from looking
at the return value to looking at errno in case of an error, which
shifts the problem to other architectures.
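
The two conventions can be modelled with two hypothetical stubs, where `err` stands for the positive error code the real sys_execve() would have produced. This is a sketch of the calling conventions only; `execve_passes_code()`, `execve_sets_errno()` and `global_errno` are not real kernel symbols.

```c
#include <errno.h>

/* alpha, arm{,26}, ia64, parisc, powerpc, x86_64: pass the error
 * code straight back as a negative return value (0 = success). */
int execve_passes_code(int err)
{
	return err ? -err : 0;
}

/* All other architectures: return -1 and set a global errno. */
int global_errno;

int execve_sets_errno(int err)
{
	if (err) {
		global_errno = err;
		return -1;
	}
	return 0;
}
```

A caller that simply stores the return value gets `-ENOENT` on the first group of architectures but `-1` on the second. On Linux `EPERM` is 1, so `-1` is indistinguishable from `-EPERM` and the real cause of the failure is lost.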

My patch makes the errno variable local to execve, which helps
slightly, and makes it easier to get it completely right
by doing the same as powerpc or parisc.

Now, we could do something truly evil involving a nested function, like

#include <asm/bug.h>
#include <asm/uaccess.h>
#define __KERNEL_SYSCALLS__
#include <linux/unistd.h>
int kernel_execve(const char *filename, char *const argv[], char *const envp[])
{
mm_segment_t fs = get_fs();
int errno;
int ret;
_syscall3(int,execve,const char *,file,char *const*,argv,char *const*,envp)
WARN_ON(segment_eq(fs, USER_DS));
ret = execve(filename, argv, envp);
if (ret)
ret = -errno;
return ret;
}

That would solve the problem of races on the errno variable,
but set a bad example to other hackers.

> Anyway, your patch returns the (positive) errno value here:
>
> > + WARN_ON(segment_eq(fs, USER_DS));
> > + ret = execve(filename, (char **)argv, (char **)envp);
> > + if (ret)
> > + ret = errno;
> > +
> > + return ret;
>
> but here we are testing for a negative value to mean error:
>
> > - if (execve("/sbin/shutdown", argv, envp) < 0) {
> > + if (kernel_execve("/sbin/shutdown", argv, envp) < 0) {

Yes, Chase Venters already noticed that bug. It obviously needs
to be 'ret = -errno;'.

Arnd <><

2006-08-21 15:17:31

by Russell King

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Mon, Aug 21, 2006 at 05:12:17PM +0200, Arnd Bergmann wrote:
> On Monday 21 August 2006 02:36, Paul Mackerras wrote:
> > > It turned out most of the architectures that already implement
> > > their own execve() call instead of using the _syscall3 function
> > > for it end up passing the return value of sys_execve down,
> > > instead of setting errno.
> >
> > I really don't like having an "errno" variable in the kernel. What if
> > two processes are doing an execve concurrently?
>
> The point is that we have two different schemes in the kernel that
> conflict:
>
> alpha, arm{,26}, ia64, parisc, powerpc and x86_64 pass the error
> code from execve, all others pass -1 and set the global errno.

Indeed, and rather than fixing execve() for one set of architectures
and by doing that breaking the other set, the point of this change is
to fix _all_ architectures in the most expedient way.

At a later date, those architectures who are using the global errno
can have that _separate_ bug fixed.

Let's fix one bug at a time. Especially as this probably needs to go
in to -rc.

Arnd - thanks for taking this on.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-08-22 07:30:27

by Benjamin Herrenschmidt

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Mon, 2006-08-21 at 17:12 +0200, Arnd Bergmann wrote:
> On Monday 21 August 2006 02:36, Paul Mackerras wrote:
> > > It turned out most of the architectures that already implement
> > > their own execve() call instead of using the _syscall3 function
> > > for it end up passing the return value of sys_execve down,
> > > instead of setting errno.
> >
> > I really don't like having an "errno" variable in the kernel. What if
> > two processes are doing an execve concurrently?
>
> The point is that we have two different schemes in the kernel that
> conflict:
>
> alpha, arm{,26}, ia64, parisc, powerpc and x86_64 pass the error
> code from execve, all others pass -1 and set the global errno.

All others need to be fixed then... having an errno is just plain wrong.


2006-08-22 08:00:50

by Björn Steinbrink

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On 2006.08.22 17:29:02 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2006-08-21 at 17:12 +0200, Arnd Bergmann wrote:
> > On Monday 21 August 2006 02:36, Paul Mackerras wrote:
> > > > It turned out most of the architectures that already implement
> > > > their own execve() call instead of using the _syscall3 function
> > > > for it end up passing the return value of sys_execve down,
> > > > instead of setting errno.
> > >
> > > I really don't like having an "errno" variable in the kernel. What if
> > > two processes are doing an execve concurrently?
> >
> > The point is that we have two different schemes in the kernel that
> > conflict:
> >
> > alpha, arm{,26}, ia64, parisc, powerpc and x86_64 pass the error
> > code from execve, all others pass -1 and set the global errno.
>
> All others need to be fixed then... having an errno is just plain wrong.

I'm working on a patch loosely based on Arnd's that changes the
in-kernel syscall macros to directly return the error codes. Once
kernel_execve is implemented for each arch, only um should remain as a
user and I found only two calls there that care about the exact
non-zero return value, both are simple to adapt.
That should allow us to get rid of errno completely. If someone knows a
reason why this is destined to fail (maybe syscalls returning char?!),
please let me know before I waste too much time on it ;)

Björn

2006-08-22 10:07:58

by Arnd Bergmann

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Tuesday 22 August 2006 10:00, Björn Steinbrink wrote:
> I'm working on a patch loosely based on Arnd's that changes the
> in-kernel syscall macros to directly return the error codes.

I think that is still going in the wrong direction. Traditionally,
the macros in unistd.h were meant for user space, but we're now
discouraging that strongly (i.e. they are inside of #ifdef __KERNEL__).
The only in-kernel users of the _syscall macros used to be the
__KERNEL_SYSCALLS__ users that we're trying to kill.

The logical consequence should be that we remove the _syscall macros
entirely, for all architectures.
UML can be converted to use the syscall function provided by libc
in order to call the host OS.

Arnd <><

2006-08-22 13:41:27

by Jeff Dike

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Tue, Aug 22, 2006 at 12:06:59PM +0200, Arnd Bergmann wrote:
> UML can be converted to use the syscall function provided by libc
> in order to call the host OS.

You're contemplating changing UML to do, e.g.
syscall(NR_write, fd, buf, len)
instead of the current
write(fd, buf,len)
?

That hardly seems like an improvement and it seems fairly unnecessary.

Jeff

2006-08-22 15:13:50

by Arnd Bergmann

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Tuesday 22 August 2006 15:39, Jeff Dike wrote:
> You're contemplating changing UML to do, e.g.
> syscall(NR_write, fd, buf, len)
> instead of the current
> write(fd, buf,len)
> ?
>
> That hardly seems like an improvement and it seems fairly unnecessary.
>
No, that's not what I was referring to. I was thinking of the calls:

arch/um/os-Linux/process.c:inline _syscall0(pid_t, getpid)
arch/um/os-Linux/sys-i386/tls.c:static _syscall1(int, get_thread_area, user_desc_t *, u_info);
arch/um/os-Linux/tls.c:static _syscall1(int, get_thread_area, user_desc_t *, u_info);
arch/um/os-Linux/tls.c:static _syscall1(int, set_thread_area, user_desc_t *, u_info);
arch/um/sys-i386/unmap.c:static inline _syscall2(int,munmap,void *,start,size_t,len)
arch/um/sys-i386/unmap.c:static inline _syscall6(void *,mmap2,void *,addr,size_t,len,int,prot,int,flags,int,fd,off_t,offset)
arch/um/sys-x86_64/unmap.c:static inline _syscall2(int,munmap,void *,start,size_t,len)
arch/um/sys-x86_64/unmap.c:static inline _syscall6(void *,mmap,void *,addr,size_t,len,int,prot,int,flags,int,fd,off_t,offset)

Are these for calling the host OS or calling the UML kernel?
If they are for the host, they can be implemented using syscall(),
otherwise by calling the sys_* functions directly.
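
For the host-side calls in the list above, glibc's generic `syscall()` wrapper (declared in `<unistd.h>`, with the `SYS_*` numbers from `<sys/syscall.h>`) can take the place of the `_syscallN` macros. A minimal sketch for the `getpid` and `munmap` cases follows; `host_getpid()` and `host_munmap()` are hypothetical names, and on failure `syscall()` returns -1 and sets libc's thread-local errno rather than any kernel variable.

```c
#define _GNU_SOURCE		/* make glibc declare syscall() */
#include <sys/syscall.h>	/* SYS_getpid, SYS_munmap */
#include <unistd.h>		/* syscall(), pid_t, size_t */

/* Host-side replacement for "_syscall0(pid_t, getpid)". */
pid_t host_getpid(void)
{
	return (pid_t) syscall(SYS_getpid);
}

/* Host-side replacement for
 * "_syscall2(int,munmap,void *,start,size_t,len)".
 * Returns 0 on success, -1 with libc errno set on failure. */
int host_munmap(void *start, size_t len)
{
	return (int) syscall(SYS_munmap, start, len);
}
```

The same pattern would cover the other entries, with the caveat that syscall numbers are architecture-specific: i386 would use `SYS_mmap2` where x86_64 uses `SYS_mmap`, matching the split between the two unmap.c files.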

Arnd <><

2006-08-22 15:38:31

by Jeff Dike

Subject: Re: [PATCH] introduce kernel_execve function to replace __KERNEL_SYSCALLS__

On Tue, Aug 22, 2006 at 05:13:39PM +0200, Arnd Bergmann wrote:
> No, that's not what I was referring to. I was thinking of the calls:
>
> arch/um/os-Linux/process.c:inline _syscall0(pid_t, getpid)
> arch/um/os-Linux/sys-i386/tls.c:static _syscall1(int, get_thread_area, user_desc_t *, u_info);
> arch/um/os-Linux/tls.c:static _syscall1(int, get_thread_area, user_desc_t *, u_info);
> arch/um/os-Linux/tls.c:static _syscall1(int, set_thread_area, user_desc_t *, u_info);
> arch/um/sys-i386/unmap.c:static inline _syscall2(int,munmap,void *,start,size_t,len)
> arch/um/sys-i386/unmap.c:static inline _syscall6(void *,mmap2,void *,addr,size_t,len,int,prot,int,flags,int,fd,off_t,offset)
> arch/um/sys-x86_64/unmap.c:static inline _syscall2(int,munmap,void *,start,size_t,len)
> arch/um/sys-x86_64/unmap.c:static inline _syscall6(void *,mmap,void *,addr,size_t,len,int,prot,int,flags,int,fd,off_t,offset)
>
> Are these for calling the host OS or calling the UML kernel?
> If they are for the host, they can be implemented using syscall(),
> otherwise by calling the sys_* functions directly.

OK, these are all calling the host, and using syscall() instead sounds
reasonable.

Jeff