2002-12-04 06:56:32

by Stephen Rothwell

Subject: [PATCH] compatibility syscall layer (let's try again)

Hi Linus,

Below is the generic part of the start of the compatibility syscall layer.
I think I have made it generic enough that each architecture can define
what compatibility means.

To use this, an architecture must create asm/compat.h and provide (currently)
typedefs for compat_time_t and compat_suseconds_t, plus a definition of
struct compat_timespec.
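
As a rough illustration of that requirement, here is a minimal userspace
sketch, assuming a 32-bit compat ABI where both types are signed 32-bit
integers (as in the PPC64 asm/compat.h in the follow-up patch). The helper names
compat_to_native() and native_to_compat() are hypothetical and do not appear
in the patch; they only mirror the field-by-field widening and narrowing that
compat_sys_nanosleep() does inline.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Assumed stand-ins for what an architecture's asm/compat.h might provide. */
typedef int32_t compat_time_t;
typedef int32_t compat_suseconds_t;

struct compat_timespec {
	compat_time_t	tv_sec;
	int32_t		tv_nsec;
};

/* The compat layout must match what 32-bit user space expects. */
static_assert(sizeof(struct compat_timespec) == 8,
	      "compat_timespec should be two 32-bit fields");

/* Hypothetical helper: widen a 32-bit timespec to the native one. */
static void compat_to_native(const struct compat_timespec *ct,
			     struct timespec *t)
{
	t->tv_sec = ct->tv_sec;
	t->tv_nsec = ct->tv_nsec;
}

/* Hypothetical helper: narrow a native timespec back to the 32-bit form. */
static void native_to_compat(const struct timespec *t,
			     struct compat_timespec *ct)
{
	ct->tv_sec = (compat_time_t)t->tv_sec;
	ct->tv_nsec = (int32_t)t->tv_nsec;
}
```

In the kernel the two structures live on opposite sides of the user/kernel
boundary, so the copies go through copy_from_user()/copy_to_user() rather
than plain assignment between pointers.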

Hopefully, this is what you had in mind - otherwise back to the drawing
board.

I will follow this posting with the architecture specific patches that I
have done but not tested.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

diff -ruN 2.5.50-BK.2/fs/open.c 2.5.50-BK.2-32bit.1/fs/open.c
--- 2.5.50-BK.2/fs/open.c 2002-12-04 12:07:36.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/fs/open.c 2002-12-04 12:01:36.000000000 +1100
@@ -280,7 +280,7 @@
* must be owner or have write permission.
* Else, update from *times, must be owner or super user.
*/
-asmlinkage long sys_utimes(char * filename, struct timeval * utimes)
+long do_utimes(char * filename, struct timeval * times)
{
int error;
struct nameidata nd;
@@ -299,11 +299,7 @@

/* Don't worry, the checks are done in inode_change_ok() */
newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
- if (utimes) {
- struct timeval times[2];
- error = -EFAULT;
- if (copy_from_user(&times, utimes, sizeof(times)))
- goto dput_and_out;
+ if (times) {
newattrs.ia_atime.tv_sec = times[0].tv_sec;
newattrs.ia_atime.tv_nsec = times[0].tv_usec * 1000;
newattrs.ia_mtime.tv_sec = times[1].tv_sec;
@@ -323,6 +319,16 @@
return error;
}

+asmlinkage long sys_utimes(char * filename, struct timeval * utimes)
+{
+ struct timeval times[2];
+
+ if (utimes && copy_from_user(&times, utimes, sizeof(times)))
+ return -EFAULT;
+ return do_utimes(filename, utimes ? times : NULL);
+}
+
+
/*
* access() needs to use the real uid/gid, not the effective uid/gid.
* We do this by temporarily clearing all FS-related capabilities and
diff -ruN 2.5.50-BK.2/include/linux/compat.h 2.5.50-BK.2-32bit.1/include/linux/compat.h
--- 2.5.50-BK.2/include/linux/compat.h 1970-01-01 10:00:00.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/linux/compat.h 2002-12-04 15:42:36.000000000 +1100
@@ -0,0 +1,29 @@
+#ifndef _LINUX_COMPAT_H
+#define _LINUX_COMPAT_H
+/*
+ * These are the type definitions for the architecture specific
+ * compatibility layer.
+ */
+#include <linux/config.h>
+
+#ifdef CONFIG_COMPAT
+
+#include <asm/compat.h>
+
+struct compat_timeval {
+ compat_time_t tv_sec;
+ compat_suseconds_t tv_usec;
+};
+
+struct compat_utimbuf {
+ compat_time_t actime;
+ compat_time_t modtime;
+};
+
+struct compat_itimerval {
+ struct compat_timeval it_interval;
+ struct compat_timeval it_value;
+};
+
+#endif /* CONFIG_COMPAT */
+#endif /* _LINUX_COMPAT_H */
diff -ruN 2.5.50-BK.2/include/linux/time.h 2.5.50-BK.2-32bit.1/include/linux/time.h
--- 2.5.50-BK.2/include/linux/time.h 2002-11-18 15:47:56.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/include/linux/time.h 2002-12-03 15:47:26.000000000 +1100
@@ -138,6 +138,8 @@
#ifdef __KERNEL__
extern void do_gettimeofday(struct timeval *tv);
extern void do_settimeofday(struct timeval *tv);
+extern long do_nanosleep(struct timespec *t);
+extern long do_utimes(char * filename, struct timeval * times);
#endif

#define FD_SETSIZE __FD_SETSIZE
diff -ruN 2.5.50-BK.2/kernel/Makefile 2.5.50-BK.2-32bit.1/kernel/Makefile
--- 2.5.50-BK.2/kernel/Makefile 2002-11-28 10:34:59.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/kernel/Makefile 2002-12-03 15:42:28.000000000 +1100
@@ -21,6 +21,7 @@
obj-$(CONFIG_CPU_FREQ) += cpufreq.o
obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o
+obj-$(CONFIG_COMPAT) += compat.o

ifneq ($(CONFIG_IA64),y)
# According to Alan Modra <[email protected]>, the -fno-omit-frame-pointer is
diff -ruN 2.5.50-BK.2/kernel/compat.c 2.5.50-BK.2-32bit.1/kernel/compat.c
--- 2.5.50-BK.2/kernel/compat.c 1970-01-01 10:00:00.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/kernel/compat.c 2002-12-04 17:40:08.000000000 +1100
@@ -0,0 +1,114 @@
+/*
+ * linux/kernel/compat.c
+ *
+ * Kernel compatibility routines for e.g. 32 bit syscall support
+ * on 64 bit kernels.
+ *
+ * Copyright (C) 2002 Stephen Rothwell, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <linux/compat.h>
+#include <linux/errno.h>
+#include <linux/time.h>
+
+#include <asm/uaccess.h>
+
+asmlinkage long compat_sys_nanosleep(struct compat_timespec *rqtp,
+ struct compat_timespec *rmtp)
+{
+ struct timespec t;
+ struct compat_timespec ct;
+ s32 ret;
+
+ if (copy_from_user(&ct, rqtp, sizeof(ct)))
+ return -EFAULT;
+ t.tv_sec = ct.tv_sec;
+ t.tv_nsec = ct.tv_nsec;
+ ret = do_nanosleep(&t);
+ if (rmtp && (ret == -EINTR)) {
+ ct.tv_sec = t.tv_sec;
+ ct.tv_nsec = t.tv_nsec;
+ if (copy_to_user(rmtp, &ct, sizeof(ct)))
+ return -EFAULT;
+ }
+ return ret;
+}
+
+/*
+ * Not all architectures have sys_utime, so implement this in terms
+ * of sys_utimes.
+ */
+asmlinkage long compat_sys_utime(char *filename, struct compat_utimbuf *t)
+{
+ struct timeval tv[2];
+
+ if (t) {
+ if (get_user(tv[0].tv_sec, &t->actime) ||
+ get_user(tv[1].tv_sec, &t->modtime))
+ return -EFAULT;
+ tv[0].tv_usec = 0;
+ tv[1].tv_usec = 0;
+ }
+ return do_utimes(filename, t ? tv : NULL);
+}
+
+
+static inline long get_compat_itimerval(struct itimerval *o,
+ struct compat_itimerval *i)
+{
+ return (!access_ok(VERIFY_READ, i, sizeof(*i)) ||
+ (__get_user(o->it_interval.tv_sec, &i->it_interval.tv_sec) |
+ __get_user(o->it_interval.tv_usec, &i->it_interval.tv_usec) |
+ __get_user(o->it_value.tv_sec, &i->it_value.tv_sec) |
+ __get_user(o->it_value.tv_usec, &i->it_value.tv_usec)));
+}
+
+static inline long put_compat_itimerval(struct compat_itimerval *o,
+ struct itimerval *i)
+{
+ return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
+ (__put_user(i->it_interval.tv_sec, &o->it_interval.tv_sec) |
+ __put_user(i->it_interval.tv_usec, &o->it_interval.tv_usec) |
+ __put_user(i->it_value.tv_sec, &o->it_value.tv_sec) |
+ __put_user(i->it_value.tv_usec, &o->it_value.tv_usec)));
+}
+
+extern int do_getitimer(int which, struct itimerval *value);
+
+asmlinkage long compat_sys_getitimer(int which, struct compat_itimerval *it)
+{
+ struct itimerval kit;
+ int error;
+
+ error = do_getitimer(which, &kit);
+ if (!error && put_compat_itimerval(it, &kit))
+ error = -EFAULT;
+ return error;
+}
+
+extern int do_setitimer(int which, struct itimerval *, struct itimerval *);
+
+asmlinkage long compat_sys_setitimer(int which, struct compat_itimerval *in,
+ struct compat_itimerval *out)
+{
+ struct itimerval kin, kout;
+ int error;
+
+ if (in) {
+ if (get_compat_itimerval(&kin, in))
+ return -EFAULT;
+ } else
+ memset(&kin, 0, sizeof(kin));
+
+ error = do_setitimer(which, &kin, out ? &kout : NULL);
+ if (error || !out)
+ return error;
+ if (put_compat_itimerval(out, &kout))
+ return -EFAULT;
+ return 0;
+}
diff -ruN 2.5.50-BK.2/kernel/timer.c 2.5.50-BK.2-32bit.1/kernel/timer.c
--- 2.5.50-BK.2/kernel/timer.c 2002-12-04 12:07:39.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/kernel/timer.c 2002-12-04 12:03:23.000000000 +1100
@@ -1020,33 +1020,41 @@
return current->pid;
}

-asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
+long do_nanosleep(struct timespec *t)
{
- struct timespec t;
unsigned long expire;

- if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
- return -EFAULT;
-
- if (t.tv_nsec >= 1000000000L || t.tv_nsec < 0 || t.tv_sec < 0)
+ if ((t->tv_nsec >= 1000000000L) || (t->tv_nsec < 0) || (t->tv_sec < 0))
return -EINVAL;

- expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
+ expire = timespec_to_jiffies(t) + (t->tv_sec || t->tv_nsec);

current->state = TASK_INTERRUPTIBLE;
expire = schedule_timeout(expire);

if (expire) {
- if (rmtp) {
- jiffies_to_timespec(expire, &t);
- if (copy_to_user(rmtp, &t, sizeof(struct timespec)))
- return -EFAULT;
- }
+ jiffies_to_timespec(expire, t);
return -EINTR;
}
return 0;
}

+asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
+{
+ struct timespec t;
+ long ret;
+
+ if (copy_from_user(&t, rqtp, sizeof(t)))
+ return -EFAULT;
+
+ ret = do_nanosleep(&t);
+ if (rmtp && (ret == -EINTR)) {
+ if (copy_to_user(rmtp, &t, sizeof(t)))
+ return -EFAULT;
+ }
+ return ret;
+}
+
/*
* sys_sysinfo - fill in sysinfo struct
*/


2002-12-04 07:00:21

by Stephen Rothwell

Subject: Re: [PATCH] compatibility syscall layer - PPC64

Hi Anton, Linus,

This is the PPC64 specific patch. It goes slightly further than necessary
by defining compat_size_t and compat_ssize_t.
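
To show where the two extra types matter, here is a small userspace sketch,
assuming the same u32/s32 definitions as the asm-ppc64/compat.h in this patch.
The function compat_total_len() is hypothetical; it is loosely modelled on the
length checks in do_readv_writev32() below, which reject a length that does
not fit in a compat_ssize_t and a running total that overflows. Note that the
sketch tests for overflow before adding, which is a substitution on my part:
the kernel code adds first and then compares against the previous total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the PPC64 typedefs in this patch. */
typedef uint32_t compat_size_t;
typedef int32_t compat_ssize_t;

static_assert(sizeof(compat_size_t) == 4, "compat_size_t must be 32 bits");

/*
 * Hypothetical helper: sum n 32-bit lengths into a compat_ssize_t.
 * Returns the total, or -1 if a length or the total does not fit.
 */
static compat_ssize_t compat_total_len(const compat_size_t *lens, size_t n)
{
	compat_ssize_t tot = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		compat_ssize_t len;

		/* A size_t value that does not fit a compat_ssize_t. */
		if (lens[i] > (compat_size_t)INT32_MAX)
			return -1;
		len = (compat_ssize_t)lens[i];

		/* The running total would overflow a compat_ssize_t. */
		if (len > INT32_MAX - tot)
			return -1;
		tot += len;
	}
	return tot;
}
```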

This builds.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

diff -ruN 2.5.50-BK.2/arch/ppc64/Kconfig 2.5.50-BK.2-32bit.1/arch/ppc64/Kconfig
--- 2.5.50-BK.2/arch/ppc64/Kconfig 2002-12-04 12:07:31.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/ppc64/Kconfig 2002-12-04 12:18:35.000000000 +1100
@@ -33,6 +33,10 @@
bool
default y

+config COMPAT
+ bool
+ default y
+
source "init/Kconfig"


diff -ruN 2.5.50-BK.2/arch/ppc64/kernel/binfmt_elf32.c 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/binfmt_elf32.c
--- 2.5.50-BK.2/arch/ppc64/kernel/binfmt_elf32.c 2002-07-25 10:42:55.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/binfmt_elf32.c 2002-12-04 15:33:43.000000000 +1100
@@ -21,11 +21,7 @@
#include <linux/module.h>
#include <linux/config.h>
#include <linux/elfcore.h>
-
-struct timeval32
-{
- int tv_sec, tv_usec;
-};
+#include <linux/compat.h>

#define elf_prstatus elf_prstatus32
struct elf_prstatus32
@@ -38,10 +34,10 @@
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
- struct timeval32 pr_utime; /* User time */
- struct timeval32 pr_stime; /* System time */
- struct timeval32 pr_cutime; /* Cumulative user time */
- struct timeval32 pr_cstime; /* Cumulative system time */
+ struct compat_timeval pr_utime; /* User time */
+ struct compat_timeval pr_stime; /* System time */
+ struct compat_timeval pr_cutime; /* Cumulative user time */
+ struct compat_timeval pr_cstime; /* Cumulative system time */
elf_gregset_t pr_reg; /* General purpose registers. */
int pr_fpvalid; /* True if math co-processor being used. */
};
@@ -64,9 +60,9 @@

#include <linux/time.h>

-#define jiffies_to_timeval jiffies_to_timeval32
+#define jiffies_to_timeval jiffies_to_compat_timeval
static __inline__ void
-jiffies_to_timeval32(unsigned long jiffies, struct timeval32 *value)
+jiffies_to_compat_timeval(unsigned long jiffies, struct compat_timeval *value)
{
value->tv_usec = (jiffies % HZ) * (1000000L / HZ);
value->tv_sec = jiffies / HZ;
diff -ruN 2.5.50-BK.2/arch/ppc64/kernel/ioctl32.c 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/ioctl32.c
--- 2.5.50-BK.2/arch/ppc64/kernel/ioctl32.c 2002-11-11 14:55:28.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/ioctl32.c 2002-12-04 15:34:07.000000000 +1100
@@ -22,6 +22,7 @@

#include <linux/config.h>
#include <linux/types.h>
+#include <linux/compat.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/smp.h>
@@ -397,14 +398,9 @@
return err;
}

-struct timeval32 {
- int tv_sec;
- int tv_usec;
-};
-
static int do_siocgstamp(unsigned int fd, unsigned int cmd, unsigned long arg)
{
- struct timeval32 *up = (struct timeval32 *)arg;
+ struct compat_timeval *up = (struct compat_timeval *)arg;
struct timeval ktv;
mm_segment_t old_fs = get_fs();
int err;
@@ -1424,8 +1420,8 @@
#define PPPIOCSCOMPRESS32 _IOW('t', 77, struct ppp_option_data32)

struct ppp_idle32 {
- __kernel_time_t32 xmit_idle;
- __kernel_time_t32 recv_idle;
+ compat_time_t xmit_idle;
+ compat_time_t recv_idle;
};
#define PPPIOCGIDLE32 _IOR('t', 63, struct ppp_idle32)

diff -ruN 2.5.50-BK.2/arch/ppc64/kernel/misc.S 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/misc.S
--- 2.5.50-BK.2/arch/ppc64/kernel/misc.S 2002-12-04 12:07:31.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/misc.S 2002-12-04 17:40:50.000000000 +1100
@@ -538,7 +538,7 @@
.llong .sys_alarm
.llong .sys_ni_syscall /* old fstat syscall */
.llong .sys32_pause
- .llong .sys32_utime /* 30 */
+ .llong .compat_sys_utime /* 30 */
.llong .sys_ni_syscall /* old stty syscall */
.llong .sys_ni_syscall /* old gtty syscall */
.llong .sys32_access
@@ -612,8 +612,8 @@
.llong .sys_ioperm
.llong .sys32_socketcall
.llong .sys32_syslog
- .llong .sys32_setitimer
- .llong .sys32_getitimer /* 105 */
+ .llong .compat_sys_setitimer
+ .llong .compat_sys_getitimer /* 105 */
.llong .sys32_newstat
.llong .sys32_newlstat
.llong .sys32_newfstat
@@ -670,7 +670,7 @@
.llong .sys32_sched_get_priority_max
.llong .sys32_sched_get_priority_min /* 160 */
.llong .sys32_sched_rr_get_interval
- .llong .sys32_nanosleep
+ .llong .compat_sys_nanosleep
.llong .sys32_mremap
.llong .sys_setresuid
.llong .sys_getresuid /* 165 */
diff -ruN 2.5.50-BK.2/arch/ppc64/kernel/signal32.c 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/signal32.c
--- 2.5.50-BK.2/arch/ppc64/kernel/signal32.c 2002-10-21 01:02:45.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/signal32.c 2002-12-04 14:40:20.000000000 +1100
@@ -22,6 +22,7 @@
#include <linux/signal.h>
#include <linux/errno.h>
#include <linux/elf.h>
+#include <linux/compat.h>
#include <asm/ppc32.h>
#include <asm/uaccess.h>
#include <asm/ppcdebug.h>
@@ -53,11 +54,6 @@
#define MSR_USERCHANGE 0
#endif

-struct timespec32 {
- s32 tv_sec;
- s32 tv_nsec;
-};
-
struct sigregs32 {
/*
* the gp_regs array is 32 bit representation of the pt_regs
@@ -635,8 +631,7 @@
extern long sys_rt_sigpending(sigset_t *set, size_t sigsetsize);


-long sys32_rt_sigpending(sigset32_t *set,
- __kernel_size_t32 sigsetsize)
+long sys32_rt_sigpending(sigset32_t *set, compat_size_t sigsetsize)
{
sigset_t s;
sigset32_t s32;
@@ -708,7 +703,7 @@
size_t sigsetsize);

long sys32_rt_sigtimedwait(sigset32_t *uthese, siginfo_t32 *uinfo,
- struct timespec32 *uts, __kernel_size_t32 sigsetsize)
+ struct compat_timespec *uts, compat_size_t sigsetsize)
{
sigset_t s;
sigset32_t s32;
diff -ruN 2.5.50-BK.2/arch/ppc64/kernel/sys32.S 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/sys32.S
--- 2.5.50-BK.2/arch/ppc64/kernel/sys32.S 2002-12-04 12:07:32.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/sys32.S 2002-12-04 14:40:30.000000000 +1100
@@ -134,7 +134,7 @@
lwz r6,12(r10)
b .sys_recv

-_STATIC(do_sys_sendto) /* sys32_sendto(int, u32, __kernel_size_t32, unsigned int, u32, int) */
+_STATIC(do_sys_sendto) /* sys32_sendto(int, u32, compat_size_t, unsigned int, u32, int) */
mr r10,r4
lwa r3,0(r10)
lwz r4,4(r10)
@@ -144,7 +144,7 @@
lwa r8,20(r10)
b .sys32_sendto

-_STATIC(do_sys_recvfrom) /* sys32_recvfrom(int, u32, __kernel_size_t32, unsigned int, u32, u32) */
+_STATIC(do_sys_recvfrom) /* sys32_recvfrom(int, u32, compat_size_t, unsigned int, u32, u32) */
mr r10,r4
lwa r3,0(r10)
lwz r4,4(r10)
diff -ruN 2.5.50-BK.2/arch/ppc64/kernel/sys_ppc32.c 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/sys_ppc32.c
--- 2.5.50-BK.2/arch/ppc64/kernel/sys_ppc32.c 2002-12-04 12:07:32.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/ppc64/kernel/sys_ppc32.c 2002-12-04 16:26:18.000000000 +1100
@@ -22,7 +22,6 @@
#include <linux/mm.h>
#include <linux/file.h>
#include <linux/signal.h>
-#include <linux/utime.h>
#include <linux/resource.h>
#include <linux/times.h>
#include <linux/utsname.h>
@@ -58,6 +57,7 @@
#include <linux/binfmts.h>
#include <linux/dnotify.h>
#include <linux/security.h>
+#include <linux/compat.h>

#include <asm/types.h>
#include <asm/ipc.h>
@@ -73,38 +73,7 @@
#include <asm/ppc32.h>
#include <asm/mmu_context.h>

-extern asmlinkage long sys_utime(char * filename, struct utimbuf * times);
-
-struct utimbuf32 {
- __kernel_time_t32 actime, modtime;
-};
-
-asmlinkage long sys32_utime(char * filename, struct utimbuf32 *times)
-{
- struct utimbuf t;
- mm_segment_t old_fs;
- int ret;
- char *filenam;
-
- if (!times)
- return sys_utime(filename, NULL);
- if (get_user(t.actime, &times->actime) ||
- __get_user(t.modtime, &times->modtime))
- return -EFAULT;
- filenam = getname(filename);
- ret = PTR_ERR(filenam);
- if (!IS_ERR(filenam)) {
- old_fs = get_fs();
- set_fs (KERNEL_DS);
- ret = sys_utime(filenam, &t);
- set_fs (old_fs);
- putname (filenam);
- }
-
- return ret;
-}
-
-struct iovec32 { u32 iov_base; __kernel_size_t32 iov_len; };
+struct iovec32 { u32 iov_base; compat_size_t iov_len; };

typedef ssize_t (*io_fn_t)(struct file *, char *, size_t, loff_t *);
typedef ssize_t (*iov_fn_t)(struct file *, const struct iovec *, unsigned long, loff_t *);
@@ -112,7 +81,7 @@
static long do_readv_writev32(int type, struct file *file,
const struct iovec32 *vector, u32 count)
{
- __kernel_ssize_t32 tot_len;
+ compat_ssize_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
struct iovec *iov=iovstack, *ivp;
struct inode *inode;
@@ -159,8 +128,8 @@
ivp = iov;
retval = -EINVAL;
while(i > 0) {
- __kernel_ssize_t32 tmp = tot_len;
- __kernel_ssize_t32 len;
+ compat_ssize_t tmp = tot_len;
+ compat_ssize_t len;
u32 buf;

if (__get_user(len, &vector->iov_len) ||
@@ -168,10 +137,10 @@
retval = -EFAULT;
goto out;
}
- if (len < 0) /* size_t not fitting an ssize_t32 .. */
+ if (len < 0) /* size_t not fitting a compat_ssize_t .. */
goto out;
tot_len += len;
- if (tot_len < tmp) /* maths overflow on the ssize_t32 */
+ if (tot_len < tmp) /* maths overflow on the compat_ssize_t */
goto out;
ivp->iov_base = (void *)A(buf);
ivp->iov_len = (__kernel_size_t) len;
@@ -664,20 +633,6 @@

/* 32-bit timeval and related flotsam. */

-struct timeval32
-{
- int tv_sec, tv_usec;
-};
-
-struct itimerval32
-{
- struct timeval32 it_interval;
- struct timeval32 it_value;
-};
-
-
-
-
/*
* Ooo, nasty. We need here to frob 32-bit unsigned longs to
* 64-bit unsigned longs.
@@ -743,7 +698,7 @@
asmlinkage long sys32_select(int n, u32 *inp, u32 *outp, u32 *exp, u32 tvp_x)
{
fd_set_bits fds;
- struct timeval32 *tvp = (struct timeval32 *)AA(tvp_x);
+ struct compat_timeval *tvp = (struct compat_timeval *)AA(tvp_x);
char *bits;
unsigned long nn;
long timeout;
@@ -1021,7 +976,7 @@
u32 modes;
s32 offset, freq, maxerror, esterror;
s32 status, constant, precision, tolerance;
- struct timeval32 time;
+ struct compat_timeval time;
s32 tick;
s32 ppsfreq, jitter, shift, stabil;
s32 jitcnt, calcnt, errcnt, stbcnt;
@@ -1098,7 +1053,7 @@

extern asmlinkage unsigned long sys_create_module(const char *name_user, size_t size);

-asmlinkage unsigned long sys32_create_module(const char *name_user, __kernel_size_t32 size)
+asmlinkage unsigned long sys32_create_module(const char *name_user, compat_size_t size)
{
return sys_create_module(name_user, (size_t)size);
}
@@ -1181,7 +1136,7 @@
}

static int
-qm_modules(char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_modules(char *buf, size_t bufsize, compat_size_t *ret)
{
struct module *mod;
size_t nmod, space, len;
@@ -1216,7 +1171,7 @@
}

static int
-qm_deps(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_deps(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
size_t i, space, len;

@@ -1253,7 +1208,7 @@
}

static int
-qm_refs(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_refs(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
size_t nrefs, space, len;
struct module_ref *ref;
@@ -1297,7 +1252,7 @@
}

static inline int
-qm_symbols(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_symbols(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
size_t i, space, len;
struct module_symbol *s;
@@ -1356,7 +1311,7 @@
}

static inline int
-qm_info(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_info(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
int error = 0;

@@ -1389,7 +1344,7 @@
* proper conversion (sign extension) between the register representation of a signed int (msr in 32-bit mode)
* and the register representation of a signed int (msr in 64-bit mode) is performed.
*/
-asmlinkage long sys32_query_module(char *name_user, u32 which, char *buf, __kernel_size_t32 bufsize, u32 ret)
+asmlinkage long sys32_query_module(char *name_user, u32 which, char *buf, compat_size_t bufsize, u32 ret)
{
struct module *mod;
int err;
@@ -1425,19 +1380,19 @@
err = 0;
break;
case QM_MODULES:
- err = qm_modules(buf, bufsize, (__kernel_size_t32 *)AA(ret));
+ err = qm_modules(buf, bufsize, (compat_size_t *)AA(ret));
break;
case QM_DEPS:
- err = qm_deps(mod, buf, bufsize, (__kernel_size_t32 *)AA(ret));
+ err = qm_deps(mod, buf, bufsize, (compat_size_t *)AA(ret));
break;
case QM_REFS:
- err = qm_refs(mod, buf, bufsize, (__kernel_size_t32 *)AA(ret));
+ err = qm_refs(mod, buf, bufsize, (compat_size_t *)AA(ret));
break;
case QM_SYMBOLS:
- err = qm_symbols(mod, buf, bufsize, (__kernel_size_t32 *)AA(ret));
+ err = qm_symbols(mod, buf, bufsize, (compat_size_t *)AA(ret));
break;
case QM_INFO:
- err = qm_info(mod, buf, bufsize, (__kernel_size_t32 *)AA(ret));
+ err = qm_info(mod, buf, bufsize, (compat_size_t *)AA(ret));
break;
default:
err = -EINVAL;
@@ -1863,37 +1818,6 @@



-struct timespec32 {
- s32 tv_sec;
- s32 tv_nsec;
-};
-
-extern asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp);
-
-asmlinkage long sys32_nanosleep(struct timespec32 *rqtp, struct timespec32 *rmtp)
-{
- struct timespec t;
- int ret;
- mm_segment_t old_fs = get_fs ();
-
- if (get_user (t.tv_sec, &rqtp->tv_sec) ||
- __get_user (t.tv_nsec, &rqtp->tv_nsec))
- return -EFAULT;
- set_fs (KERNEL_DS);
- ret = sys_nanosleep(&t, rmtp ? &t : NULL);
- set_fs (old_fs);
- if (rmtp && ret == -EINTR) {
- if (__put_user (t.tv_sec, &rmtp->tv_sec) ||
- __put_user (t.tv_nsec, &rmtp->tv_nsec))
- return -EFAULT;
- }
-
- return ret;
-}
-
-
-
-
/* These are here just in case some old sparc32 binary calls it. */
asmlinkage long sys32_pause(void)
{
@@ -1905,32 +1829,14 @@



-static inline long get_it32(struct itimerval *o, struct itimerval32 *i)
-{
- return (!access_ok(VERIFY_READ, i, sizeof(*i)) ||
- (__get_user(o->it_interval.tv_sec, &i->it_interval.tv_sec) |
- __get_user(o->it_interval.tv_usec, &i->it_interval.tv_usec) |
- __get_user(o->it_value.tv_sec, &i->it_value.tv_sec) |
- __get_user(o->it_value.tv_usec, &i->it_value.tv_usec)));
-}
-
-static inline long put_it32(struct itimerval32 *o, struct itimerval *i)
-{
- return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
- (__put_user(i->it_interval.tv_sec, &o->it_interval.tv_sec) |
- __put_user(i->it_interval.tv_usec, &o->it_interval.tv_usec) |
- __put_user(i->it_value.tv_sec, &o->it_value.tv_sec) |
- __put_user(i->it_value.tv_usec, &o->it_value.tv_usec)));
-}
-
-static inline long get_tv32(struct timeval *o, struct timeval32 *i)
+static inline long get_tv32(struct timeval *o, struct compat_timeval *i)
{
return (!access_ok(VERIFY_READ, i, sizeof(*i)) ||
(__get_user(o->tv_sec, &i->tv_sec) |
__get_user(o->tv_usec, &i->tv_usec)));
}

-static inline long put_tv32(struct timeval32 *o, struct timeval *i)
+static inline long put_tv32(struct compat_timeval *o, struct timeval *i)
{
return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
(__put_user(i->tv_sec, &o->tv_sec) |
@@ -1940,54 +1846,6 @@



-extern int do_getitimer(int which, struct itimerval *value);
-
-/* Note: it is necessary to treat which as an unsigned int,
- * with the corresponding cast to a signed int to insure that the
- * proper conversion (sign extension) between the register representation of a signed int (msr in 32-bit mode)
- * and the register representation of a signed int (msr in 64-bit mode) is performed.
- */
-asmlinkage long sys32_getitimer(u32 which, struct itimerval32 *it)
-{
- struct itimerval kit;
- int error;
-
- error = do_getitimer((int)which, &kit);
- if (!error && put_it32(it, &kit))
- error = -EFAULT;
-
- return error;
-}
-
-
-
-extern int do_setitimer(int which, struct itimerval *, struct itimerval *);
-
-/* Note: it is necessary to treat which as an unsigned int,
- * with the corresponding cast to a signed int to insure that the
- * proper conversion (sign extension) between the register representation of a signed int (msr in 32-bit mode)
- * and the register representation of a signed int (msr in 64-bit mode) is performed.
- */
-asmlinkage long sys32_setitimer(u32 which, struct itimerval32 *in, struct itimerval32 *out)
-{
- struct itimerval kin, kout;
- int error;
-
- if (in) {
- if (get_it32(&kin, in))
- return -EFAULT;
- } else
- memset(&kin, 0, sizeof(kin));
-
- error = do_setitimer((int)which, &kin, out ? &kout : NULL);
- if (error || !out)
- return error;
- if (put_it32(out, &kout))
- return -EFAULT;
-
- return 0;
-}
-
#define RLIM_INFINITY32 0xffffffff
#define RESOURCE32(x) ((x > RLIM_INFINITY32) ? RLIM_INFINITY32 : x)

@@ -2062,8 +1920,8 @@


struct rusage32 {
- struct timeval32 ru_utime;
- struct timeval32 ru_stime;
+ struct compat_timeval ru_utime;
+ struct compat_timeval ru_stime;
s32 ru_maxrss;
s32 ru_ixrss;
s32 ru_idrss;
@@ -2180,7 +2038,7 @@
extern struct timezone sys_tz;
extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);

-asmlinkage long sys32_gettimeofday(struct timeval32 *tv, struct timezone *tz)
+asmlinkage long sys32_gettimeofday(struct compat_timeval *tv, struct timezone *tz)
{
if (tv) {
struct timeval ktv;
@@ -2198,7 +2056,7 @@



-asmlinkage long sys32_settimeofday(struct timeval32 *tv, struct timezone *tz)
+asmlinkage long sys32_settimeofday(struct compat_timeval *tv, struct timezone *tz)
{
struct timeval ktv;
struct timezone ktz;
@@ -2251,8 +2109,8 @@

struct semid_ds32 {
struct ipc_perm sem_perm;
- __kernel_time_t32 sem_otime;
- __kernel_time_t32 sem_ctime;
+ compat_time_t sem_otime;
+ compat_time_t sem_ctime;
u32 sem_base;
u32 sem_pending;
u32 sem_pending_last;
@@ -2263,9 +2121,9 @@
struct semid64_ds32 {
struct ipc64_perm sem_perm;
unsigned int __unused1;
- __kernel_time_t32 sem_otime;
+ compat_time_t sem_otime;
unsigned int __unused2;
- __kernel_time_t32 sem_ctime;
+ compat_time_t sem_ctime;
u32 sem_nsems;
u32 __unused3;
u32 __unused4;
@@ -2276,9 +2134,9 @@
struct ipc_perm msg_perm;
u32 msg_first;
u32 msg_last;
- __kernel_time_t32 msg_stime;
- __kernel_time_t32 msg_rtime;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_stime;
+ compat_time_t msg_rtime;
+ compat_time_t msg_ctime;
u32 msg_lcbytes;
u32 msg_lqbytes;
unsigned short msg_cbytes;
@@ -2291,11 +2149,11 @@
struct msqid64_ds32 {
struct ipc64_perm msg_perm;
unsigned int __unused1;
- __kernel_time_t32 msg_stime;
+ compat_time_t msg_stime;
unsigned int __unused2;
- __kernel_time_t32 msg_rtime;
+ compat_time_t msg_rtime;
unsigned int __unused3;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_ctime;
unsigned int msg_cbytes;
unsigned int msg_qnum;
unsigned int msg_qbytes;
@@ -2308,9 +2166,9 @@
struct shmid_ds32 {
struct ipc_perm shm_perm;
int shm_segsz;
- __kernel_time_t32 shm_atime;
- __kernel_time_t32 shm_dtime;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_atime;
+ compat_time_t shm_dtime;
+ compat_time_t shm_ctime;
__kernel_ipc_pid_t32 shm_cpid;
__kernel_ipc_pid_t32 shm_lpid;
unsigned short shm_nattch;
@@ -2322,13 +2180,13 @@
struct shmid64_ds32 {
struct ipc64_perm shm_perm;
unsigned int __unused1;
- __kernel_time_t32 shm_atime;
+ compat_time_t shm_atime;
unsigned int __unused2;
- __kernel_time_t32 shm_dtime;
+ compat_time_t shm_dtime;
unsigned int __unused3;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_ctime;
unsigned int __unused4;
- __kernel_size_t32 shm_segsz;
+ compat_size_t shm_segsz;
__kernel_pid_t32 shm_cpid;
__kernel_pid_t32 shm_lpid;
unsigned int shm_nattch;
@@ -2966,7 +2824,7 @@

static int do_set_sock_timeout(int fd, int level, int optname, char *optval, int optlen)
{
- struct timeval32 *up = (struct timeval32 *) optval;
+ struct compat_timeval *up = (struct compat_timeval *) optval;
struct timeval ktime;
mm_segment_t old_fs;
int err;
@@ -3003,7 +2861,7 @@

static int do_get_sock_timeout(int fd, int level, int optname, char *optval, int *optlen)
{
- struct timeval32 *up = (struct timeval32 *) optval;
+ struct compat_timeval *up = (struct compat_timeval *) optval;
struct timeval ktime;
mm_segment_t old_fs;
int len, err;
@@ -3054,15 +2912,15 @@
u32 msg_name;
int msg_namelen;
u32 msg_iov;
- __kernel_size_t32 msg_iovlen;
+ compat_size_t msg_iovlen;
u32 msg_control;
- __kernel_size_t32 msg_controllen;
+ compat_size_t msg_controllen;
unsigned msg_flags;
};

struct cmsghdr32
{
- __kernel_size_t32 cmsg_len;
+ compat_size_t cmsg_len;
int cmsg_level;
int cmsg_type;
};
@@ -3180,7 +3038,7 @@
{
struct cmsghdr32 *ucmsg;
struct cmsghdr *kcmsg, *kcmsg_base;
- __kernel_size_t32 ucmlen;
+ compat_size_t ucmlen;
__kernel_size_t kcmlen, tmp;

kcmlen = 0;
@@ -3447,12 +3305,12 @@
* from 64-bit time values to 32-bit time values
*/
case SO_TIMESTAMP: {
- __kernel_time_t32* ptr_time32 = CMSG32_DATA(kcmsg32);
+ compat_time_t* ptr_time32 = CMSG32_DATA(kcmsg32);
__kernel_time_t* ptr_time = CMSG_DATA(ucmsg);
*ptr_time32 = *ptr_time;
*(ptr_time32+1) = *(ptr_time+1);
kcmsg32->cmsg_len -= 2*(sizeof(__kernel_time_t) -
- sizeof(__kernel_time_t32));
+ sizeof(compat_time_t));
}
default:;
}
@@ -3563,7 +3421,7 @@
err = move_addr_to_user(addr, kern_msg.msg_namelen, uaddr, uaddr_len);
if(cmsg_ptr != 0 && err >= 0) {
unsigned long ucmsg_ptr = ((unsigned long)kern_msg.msg_control);
- __kernel_size_t32 uclen = (__kernel_size_t32) (ucmsg_ptr - cmsg_ptr);
+ compat_size_t uclen = (compat_size_t) (ucmsg_ptr - cmsg_ptr);
err |= __put_user(uclen, &user_msg->msg_controllen);
}
if(err >= 0)
@@ -3821,7 +3679,7 @@
* proper conversion (sign extension) between the register representation of a signed int (msr in 32-bit mode)
* and the register representation of a signed int (msr in 64-bit mode) is performed.
*/
-asmlinkage int sys32_sched_rr_get_interval(u32 pid, struct timespec32 *interval)
+asmlinkage int sys32_sched_rr_get_interval(u32 pid, struct compat_timespec *interval)
{
struct timespec t;
int ret;
@@ -4323,15 +4181,13 @@
extern ssize_t sys_pwrite64(unsigned int fd, const char *buf, size_t count,
loff_t pos);

-typedef __kernel_ssize_t32 ssize_t32;
-
-ssize_t32 sys32_pread64(unsigned int fd, char *ubuf, __kernel_size_t32 count,
+compat_ssize_t sys32_pread64(unsigned int fd, char *ubuf, compat_size_t count,
u32 reg6, u32 poshi, u32 poslo)
{
return sys_pread64(fd, ubuf, count, ((loff_t)AA(poshi) << 32) | AA(poslo));
}

-ssize_t32 sys32_pwrite64(unsigned int fd, char *ubuf, __kernel_size_t32 count,
+compat_ssize_t sys32_pwrite64(unsigned int fd, char *ubuf, compat_size_t count,
u32 reg6, u32 poshi, u32 poslo)
{
return sys_pwrite64(fd, ubuf, count, ((loff_t)AA(poshi) << 32) | AA(poslo));
@@ -4339,7 +4195,7 @@

extern ssize_t sys_readahead(int fd, loff_t offset, size_t count);

-ssize_t32 sys32_readahead(int fd, u32 r4, u32 offhi, u32 offlo, u32 count)
+compat_ssize_t sys32_readahead(int fd, u32 r4, u32 offhi, u32 offlo, u32 count)
{
return sys_readahead(fd, ((loff_t)offhi << 32) | offlo, AA(count));
}
@@ -4418,9 +4274,9 @@
return error;
}

-asmlinkage long sys32_time(__kernel_time_t32* tloc)
+asmlinkage long sys32_time(compat_time_t* tloc)
{
- __kernel_time_t32 secs;
+ compat_time_t secs;

struct timeval tv;

diff -ruN 2.5.50-BK.2/include/asm-ppc64/compat.h 2.5.50-BK.2-32bit.1/include/asm-ppc64/compat.h
--- 2.5.50-BK.2/include/asm-ppc64/compat.h 1970-01-01 10:00:00.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-ppc64/compat.h 2002-12-04 15:14:56.000000000 +1100
@@ -0,0 +1,18 @@
+#ifndef _ASM_PPC64_COMPAT_H
+#define _ASM_PPC64_COMPAT_H
+/*
+ * Architecture specific compatibility types
+ */
+#include <linux/types.h>
+
+typedef u32 compat_size_t;
+typedef s32 compat_ssize_t;
+typedef s32 compat_time_t;
+typedef s32 compat_suseconds_t;
+
+struct compat_timespec {
+ compat_time_t tv_sec;
+ s32 tv_nsec;
+};
+
+#endif /* _ASM_PPC64_COMPAT_H */
diff -ruN 2.5.50-BK.2/include/asm-ppc64/ppc32.h 2.5.50-BK.2-32bit.1/include/asm-ppc64/ppc32.h
--- 2.5.50-BK.2/include/asm-ppc64/ppc32.h 2002-12-04 12:07:39.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/include/asm-ppc64/ppc32.h 2002-12-04 14:46:15.000000000 +1100
@@ -1,6 +1,7 @@
#ifndef _PPC64_PPC32_H
#define _PPC64_PPC32_H

+#include <linux/compat.h>
#include <asm/siginfo.h>
#include <asm/signal.h>

@@ -43,10 +44,7 @@
})

/* These are here to support 32-bit syscalls on a 64-bit kernel. */
-typedef unsigned int __kernel_size_t32;
-typedef int __kernel_ssize_t32;
typedef int __kernel_ptrdiff_t32;
-typedef int __kernel_time_t32;
typedef int __kernel_clock_t32;
typedef int __kernel_pid_t32;
typedef unsigned short __kernel_ipc_pid_t32;
@@ -160,7 +158,7 @@
typedef struct sigaltstack_32 {
unsigned int ss_sp;
int ss_flags;
- __kernel_size_t32 ss_size;
+ compat_size_t ss_size;
} stack_32_t;

struct flock32 {
@@ -183,11 +181,11 @@
__kernel_off_t32 st_size; /* 4 */
__kernel_off_t32 st_blksize; /* 4 */
__kernel_off_t32 st_blocks; /* 4 */
- __kernel_time_t32 st_atime; /* 4 */
+ compat_time_t st_atime; /* 4 */
unsigned int __unused1; /* 4 */
- __kernel_time_t32 st_mtime; /* 4 */
+ compat_time_t st_mtime; /* 4 */
unsigned int __unused2; /* 4 */
- __kernel_time_t32 st_ctime; /* 4 */
+ compat_time_t st_ctime; /* 4 */
unsigned int __unused3; /* 4 */
unsigned int __unused4[2]; /* 2*4 */
};
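
A minimal userspace sketch of what the new compat types are for, assuming fixed-width stand-ins for the kernel typedefs. `struct native_timespec`, `compat_to_native()` and `native_to_compat()` are hypothetical names used only for illustration; they are not part of the patch, and the range check on the way back is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins mirroring the typedefs in asm-ppc64/compat.h above. */
typedef int32_t compat_time_t;

struct compat_timespec {
	compat_time_t tv_sec;
	int32_t tv_nsec;
};

/* Hypothetical stand-in for the 64-bit kernel's native struct timespec. */
struct native_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* The 32-bit ABI layout is two 4-byte fields with no padding. */
static_assert(sizeof(struct compat_timespec) == 8,
	      "compat_timespec must match the 32-bit layout");

/* Widen a 32-bit timespec; both fields are sign-extended. */
static void compat_to_native(const struct compat_timespec *in,
			     struct native_timespec *out)
{
	out->tv_sec = in->tv_sec;
	out->tv_nsec = in->tv_nsec;
}

/*
 * Narrow a native timespec for a 32-bit caller.
 * Returns 0 on success, -1 if a field does not fit in 32 bits
 * (this check is an assumption of the sketch).
 */
static int native_to_compat(const struct native_timespec *in,
			    struct compat_timespec *out)
{
	if (in->tv_sec > INT32_MAX || in->tv_sec < INT32_MIN)
		return -1;
	if (in->tv_nsec > INT32_MAX || in->tv_nsec < INT32_MIN)
		return -1;
	out->tv_sec = (compat_time_t)in->tv_sec;
	out->tv_nsec = (int32_t)in->tv_nsec;
	return 0;
}
```

Keeping the 32-bit layout in one architecture header means each wrapper only has to widen on entry and narrow on exit, instead of declaring its own `timespec32`.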

2002-12-04 07:09:55

by Stephen Rothwell

Subject: Re: [PATCH] compatibility syscall layer - SPARC64

Hi Dave, Linus,

This is the Sparc64-specific patch.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

diff -ruN 2.5.50-BK.2/arch/sparc64/Kconfig 2.5.50-BK.2-32bit.1/arch/sparc64/Kconfig
--- 2.5.50-BK.2/arch/sparc64/Kconfig 2002-11-28 10:35:37.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/Kconfig 2002-12-03 17:00:37.000000000 +1100
@@ -352,6 +352,11 @@
This allows you to run 32-bit binaries on your Ultra.
Everybody wants this; say Y.

+config COMPAT
+ bool
+ depends on SPARC32_COMPAT
+ default y
+
config BINFMT_ELF32
tristate "Kernel support for 32-bit ELF binaries"
depends on SPARC32_COMPAT
diff -ruN 2.5.50-BK.2/arch/sparc64/kernel/binfmt_elf32.c 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/binfmt_elf32.c
--- 2.5.50-BK.2/arch/sparc64/kernel/binfmt_elf32.c 2002-11-28 10:35:37.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/binfmt_elf32.c 2002-12-04 15:29:39.000000000 +1100
@@ -86,11 +86,7 @@
#include <linux/module.h>
#include <linux/config.h>
#include <linux/elfcore.h>
-
-struct timeval32
-{
- int tv_sec, tv_usec;
-};
+#include <linux/compat.h>

#define elf_prstatus elf_prstatus32
struct elf_prstatus32
@@ -103,10 +99,10 @@
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
- struct timeval32 pr_utime; /* User time */
- struct timeval32 pr_stime; /* System time */
- struct timeval32 pr_cutime; /* Cumulative user time */
- struct timeval32 pr_cstime; /* Cumulative system time */
+ struct compat_timeval pr_utime; /* User time */
+ struct compat_timeval pr_stime; /* System time */
+ struct compat_timeval pr_cutime; /* Cumulative user time */
+ struct compat_timeval pr_cstime; /* Cumulative system time */
elf_gregset_t pr_reg; /* GP registers */
int pr_fpvalid; /* True if math co-processor being used. */
};
@@ -136,9 +132,9 @@

#include <linux/time.h>

-#define jiffies_to_timeval jiffies_to_timeval32
+#define jiffies_to_timeval jiffies_to_compat_timeval
static __inline__ void
-jiffies_to_timeval32(unsigned long jiffies, struct timeval32 *value)
+jiffies_to_compat_timeval(unsigned long jiffies, struct compat_timeval *value)
{
value->tv_usec = (jiffies % HZ) * (1000000L / HZ);
value->tv_sec = jiffies / HZ;
diff -ruN 2.5.50-BK.2/arch/sparc64/kernel/ioctl32.c 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/ioctl32.c
--- 2.5.50-BK.2/arch/sparc64/kernel/ioctl32.c 2002-11-18 15:47:41.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/ioctl32.c 2002-12-04 15:30:07.000000000 +1100
@@ -10,6 +10,7 @@

#include <linux/config.h>
#include <linux/types.h>
+#include <linux/compat.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/smp.h>
@@ -405,14 +406,9 @@
return err;
}

-struct timeval32 {
- int tv_sec;
- int tv_usec;
-};
-
static int do_siocgstamp(unsigned int fd, unsigned int cmd, unsigned long arg)
{
- struct timeval32 *up = (struct timeval32 *)arg;
+ struct compat_timeval *up = (struct compat_timeval *)arg;
struct timeval ktv;
mm_segment_t old_fs = get_fs();
int err;
@@ -1743,8 +1739,8 @@
#define PPPIOCSCOMPRESS32 _IOW('t', 77, struct ppp_option_data32)

struct ppp_idle32 {
- __kernel_time_t32 xmit_idle;
- __kernel_time_t32 recv_idle;
+ compat_time_t xmit_idle;
+ compat_time_t recv_idle;
};
#define PPPIOCGIDLE32 _IOR('t', 63, struct ppp_idle32)

diff -ruN 2.5.50-BK.2/arch/sparc64/kernel/signal32.c 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/signal32.c
--- 2.5.50-BK.2/arch/sparc64/kernel/signal32.c 2002-11-28 10:35:37.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/signal32.c 2002-12-04 14:37:20.000000000 +1100
@@ -19,6 +19,7 @@
#include <linux/tty.h>
#include <linux/smp_lock.h>
#include <linux/binfmts.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/bitops.h>
@@ -181,7 +182,7 @@
sigset_t32 set32;

/* XXX: Don't preclude handling different sized sigset_t's. */
- if (((__kernel_size_t32)sigsetsize) != sizeof(sigset_t)) {
+ if (((compat_size_t)sigsetsize) != sizeof(sigset_t)) {
regs->tstate |= TSTATE_ICARRY;
regs->u_regs[UREG_I0] = EINVAL;
return;
diff -ruN 2.5.50-BK.2/arch/sparc64/kernel/sys32.S 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/sys32.S
--- 2.5.50-BK.2/arch/sparc64/kernel/sys32.S 2002-11-11 14:55:28.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/sys32.S 2002-12-04 14:36:56.000000000 +1100
@@ -175,7 +175,7 @@
lduwa [%o1 + 0x4] %asi, %o1
nop
nop
-do_sys_sendto: /* sys32_sendto(int, u32, __kernel_size_t32, unsigned int, u32, int) */
+do_sys_sendto: /* sys32_sendto(int, u32, compat_size_t, unsigned int, u32, int) */
ldswa [%o1 + 0x0] %asi, %o0
sethi %hi(sys32_sendto), %g1
lduwa [%o1 + 0x8] %asi, %o2
@@ -184,7 +184,7 @@
ldswa [%o1 + 0x14] %asi, %o5
jmpl %g1 + %lo(sys32_sendto), %g0
lduwa [%o1 + 0x4] %asi, %o1
-do_sys_recvfrom: /* sys32_recvfrom(int, u32, __kernel_size_t32, unsigned int, u32, u32) */
+do_sys_recvfrom: /* sys32_recvfrom(int, u32, compat_size_t, unsigned int, u32, u32) */
ldswa [%o1 + 0x0] %asi, %o0
sethi %hi(sys32_recvfrom), %g1
lduwa [%o1 + 0x8] %asi, %o2
diff -ruN 2.5.50-BK.2/arch/sparc64/kernel/sys_sparc32.c 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/sys_sparc32.c
--- 2.5.50-BK.2/arch/sparc64/kernel/sys_sparc32.c 2002-12-04 12:07:33.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/sys_sparc32.c 2002-12-04 16:24:44.000000000 +1100
@@ -15,7 +15,6 @@
#include <linux/mm.h>
#include <linux/file.h>
#include <linux/signal.h>
-#include <linux/utime.h>
#include <linux/resource.h>
#include <linux/times.h>
#include <linux/utsname.h>
@@ -52,6 +51,7 @@
#include <linux/binfmts.h>
#include <linux/dnotify.h>
#include <linux/security.h>
+#include <linux/compat.h>

#include <asm/types.h>
#include <asm/ipc.h>
@@ -263,49 +263,20 @@

/* 32-bit timeval and related flotsam. */

-struct timeval32
-{
- int tv_sec, tv_usec;
-};
-
-struct itimerval32
-{
- struct timeval32 it_interval;
- struct timeval32 it_value;
-};
-
-static long get_tv32(struct timeval *o, struct timeval32 *i)
+static long get_tv32(struct timeval *o, struct compat_timeval *i)
{
return (!access_ok(VERIFY_READ, tv32, sizeof(*tv32)) ||
(__get_user(o->tv_sec, &i->tv_sec) |
__get_user(o->tv_usec, &i->tv_usec)));
}

-static inline long put_tv32(struct timeval32 *o, struct timeval *i)
+static inline long put_tv32(struct compat_timeval *o, struct timeval *i)
{
return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
(__put_user(i->tv_sec, &o->tv_sec) |
__put_user(i->tv_usec, &o->tv_usec)));
}

-static inline long get_it32(struct itimerval *o, struct itimerval32 *i)
-{
- return (!access_ok(VERIFY_READ, i32, sizeof(*i32)) ||
- (__get_user(o->it_interval.tv_sec, &i->it_interval.tv_sec) |
- __get_user(o->it_interval.tv_usec, &i->it_interval.tv_usec) |
- __get_user(o->it_value.tv_sec, &i->it_value.tv_sec) |
- __get_user(o->it_value.tv_usec, &i->it_value.tv_usec)));
-}
-
-static long put_it32(struct itimerval32 *o, struct itimerval *i)
-{
- return (!access_ok(VERIFY_WRITE, i32, sizeof(*i32)) ||
- (__put_user(i->it_interval.tv_sec, &o->it_interval.tv_sec) |
- __put_user(i->it_interval.tv_usec, &o->it_interval.tv_usec) |
- __put_user(i->it_value.tv_sec, &o->it_value.tv_sec) |
- __put_user(i->it_value.tv_usec, &o->it_value.tv_usec)));
-}
-
extern asmlinkage int sys_ioperm(unsigned long from, unsigned long num, int on);

asmlinkage int sys32_ioperm(u32 from, u32 num, int on)
@@ -328,8 +299,8 @@

struct semid_ds32 {
struct ipc_perm32 sem_perm; /* permissions .. see ipc.h */
- __kernel_time_t32 sem_otime; /* last semop time */
- __kernel_time_t32 sem_ctime; /* last change time */
+ compat_time_t sem_otime; /* last semop time */
+ compat_time_t sem_ctime; /* last change time */
u32 sem_base; /* ptr to first semaphore in array */
u32 sem_pending; /* pending operations to be processed */
u32 sem_pending_last; /* last pending operation */
@@ -340,9 +311,9 @@
struct semid64_ds32 {
struct ipc64_perm sem_perm; /* this structure is the same on sparc32 and sparc64 */
unsigned int __pad1;
- __kernel_time_t32 sem_otime;
+ compat_time_t sem_otime;
unsigned int __pad2;
- __kernel_time_t32 sem_ctime;
+ compat_time_t sem_ctime;
u32 sem_nsems;
u32 __unused1;
u32 __unused2;
@@ -353,9 +324,9 @@
struct ipc_perm32 msg_perm;
u32 msg_first;
u32 msg_last;
- __kernel_time_t32 msg_stime;
- __kernel_time_t32 msg_rtime;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_stime;
+ compat_time_t msg_rtime;
+ compat_time_t msg_ctime;
u32 wwait;
u32 rwait;
unsigned short msg_cbytes;
@@ -368,11 +339,11 @@
struct msqid64_ds32 {
struct ipc64_perm msg_perm;
unsigned int __pad1;
- __kernel_time_t32 msg_stime;
+ compat_time_t msg_stime;
unsigned int __pad2;
- __kernel_time_t32 msg_rtime;
+ compat_time_t msg_rtime;
unsigned int __pad3;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_ctime;
unsigned int msg_cbytes;
unsigned int msg_qnum;
unsigned int msg_qbytes;
@@ -386,9 +357,9 @@
struct shmid_ds32 {
struct ipc_perm32 shm_perm;
int shm_segsz;
- __kernel_time_t32 shm_atime;
- __kernel_time_t32 shm_dtime;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_atime;
+ compat_time_t shm_dtime;
+ compat_time_t shm_ctime;
__kernel_ipc_pid_t32 shm_cpid;
__kernel_ipc_pid_t32 shm_lpid;
unsigned short shm_nattch;
@@ -397,12 +368,12 @@
struct shmid64_ds32 {
struct ipc64_perm shm_perm;
unsigned int __pad1;
- __kernel_time_t32 shm_atime;
+ compat_time_t shm_atime;
unsigned int __pad2;
- __kernel_time_t32 shm_dtime;
+ compat_time_t shm_dtime;
unsigned int __pad3;
- __kernel_time_t32 shm_ctime;
- __kernel_size_t32 shm_segsz;
+ compat_time_t shm_ctime;
+ compat_size_t shm_segsz;
__kernel_pid_t32 shm_cpid;
__kernel_pid_t32 shm_lpid;
unsigned int shm_nattch;
@@ -965,37 +936,7 @@
return sys_ftruncate(fd, (high << 32) | low);
}

-extern asmlinkage int sys_utime(char * filename, struct utimbuf * times);
-
-struct utimbuf32 {
- __kernel_time_t32 actime, modtime;
-};
-
-asmlinkage int sys32_utime(char * filename, struct utimbuf32 *times)
-{
- struct utimbuf t;
- mm_segment_t old_fs;
- int ret;
- char *filenam;
-
- if (!times)
- return sys_utime(filename, NULL);
- if (get_user (t.actime, &times->actime) ||
- __get_user (t.modtime, &times->modtime))
- return -EFAULT;
- filenam = getname (filename);
- ret = PTR_ERR(filenam);
- if (!IS_ERR(filenam)) {
- old_fs = get_fs();
- set_fs (KERNEL_DS);
- ret = sys_utime(filenam, &t);
- set_fs (old_fs);
- putname (filenam);
- }
- return ret;
-}
-
-struct iovec32 { u32 iov_base; __kernel_size_t32 iov_len; };
+struct iovec32 { u32 iov_base; compat_size_t iov_len; };

typedef ssize_t (*io_fn_t)(struct file *, char *, size_t, loff_t *);
typedef ssize_t (*iov_fn_t)(struct file *, const struct iovec *, unsigned long, loff_t *);
@@ -1003,7 +944,7 @@
static long do_readv_writev32(int type, struct file *file,
const struct iovec32 *vector, u32 count)
{
- __kernel_ssize_t32 tot_len;
+ compat_ssize_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
struct iovec *iov=iovstack, *ivp;
struct inode *inode;
@@ -1035,16 +976,16 @@
ivp = iov;
retval = -EINVAL;
while(i > 0) {
- __kernel_ssize_t32 tmp = tot_len;
- __kernel_ssize_t32 len;
+ compat_ssize_t tmp = tot_len;
+ compat_ssize_t len;
u32 buf;

__get_user(len, &vector->iov_len);
__get_user(buf, &vector->iov_base);
- if (len < 0) /* size_t not fittina an ssize_t32 .. */
+ if (len < 0) /* size_t not fittina an compat_ssize_t .. */
goto out;
tot_len += len;
- if (tot_len < tmp) /* maths overflow on the ssize_t32 */
+ if (tot_len < tmp) /* maths overflow on the compat_ssize_t */
goto out;
ivp->iov_base = (void *)A(buf);
ivp->iov_len = (__kernel_size_t) len;
@@ -1331,7 +1272,7 @@
asmlinkage int sys32_select(int n, u32 *inp, u32 *outp, u32 *exp, u32 tvp_x)
{
fd_set_bits fds;
- struct timeval32 *tvp = (struct timeval32 *)AA(tvp_x);
+ struct compat_timeval *tvp = (struct compat_timeval *)AA(tvp_x);
char *bits;
unsigned long nn;
long timeout;
@@ -1692,8 +1633,8 @@
}

struct rusage32 {
- struct timeval32 ru_utime;
- struct timeval32 ru_stime;
+ struct compat_timeval ru_utime;
+ struct compat_timeval ru_stime;
s32 ru_maxrss;
s32 ru_ixrss;
s32 ru_idrss;
@@ -1795,14 +1736,9 @@
return ret;
}

-struct timespec32 {
- s32 tv_sec;
- s32 tv_nsec;
-};
-
extern asmlinkage int sys_sched_rr_get_interval(pid_t pid, struct timespec *interval);

-asmlinkage int sys32_sched_rr_get_interval(__kernel_pid_t32 pid, struct timespec32 *interval)
+asmlinkage int sys32_sched_rr_get_interval(__kernel_pid_t32 pid, struct compat_timespec *interval)
{
struct timespec t;
int ret;
@@ -1817,28 +1753,6 @@
return ret;
}

-extern asmlinkage int sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp);
-
-asmlinkage int sys32_nanosleep(struct timespec32 *rqtp, struct timespec32 *rmtp)
-{
- struct timespec t;
- int ret;
- mm_segment_t old_fs = get_fs ();
-
- if (get_user (t.tv_sec, &rqtp->tv_sec) ||
- __get_user (t.tv_nsec, &rqtp->tv_nsec))
- return -EFAULT;
- set_fs (KERNEL_DS);
- ret = sys_nanosleep(&t, rmtp ? &t : NULL);
- set_fs (old_fs);
- if (rmtp && ret == -EINTR) {
- if (__put_user (t.tv_sec, &rmtp->tv_sec) ||
- __put_user (t.tv_nsec, &rmtp->tv_nsec))
- return -EFAULT;
- }
- return ret;
-}
-
extern asmlinkage int sys_sigprocmask(int how, old_sigset_t *set, old_sigset_t *oset);

asmlinkage int sys32_sigprocmask(int how, old_sigset_t32 *set, old_sigset_t32 *oset)
@@ -1858,7 +1772,7 @@

extern asmlinkage int sys_rt_sigprocmask(int how, sigset_t *set, sigset_t *oset, size_t sigsetsize);

-asmlinkage int sys32_rt_sigprocmask(int how, sigset_t32 *set, sigset_t32 *oset, __kernel_size_t32 sigsetsize)
+asmlinkage int sys32_rt_sigprocmask(int how, sigset_t32 *set, sigset_t32 *oset, compat_size_t sigsetsize)
{
sigset_t s;
sigset_t32 s32;
@@ -1909,7 +1823,7 @@

extern asmlinkage int sys_rt_sigpending(sigset_t *set, size_t sigsetsize);

-asmlinkage int sys32_rt_sigpending(sigset_t32 *set, __kernel_size_t32 sigsetsize)
+asmlinkage int sys32_rt_sigpending(sigset_t32 *set, compat_size_t sigsetsize)
{
sigset_t s;
sigset_t32 s32;
@@ -1934,7 +1848,7 @@

asmlinkage int
sys32_rt_sigtimedwait(sigset_t32 *uthese, siginfo_t32 *uinfo,
- struct timespec32 *uts, __kernel_size_t32 sigsetsize)
+ struct compat_timespec *uts, compat_size_t sigsetsize)
{
int ret, sig;
sigset_t these;
@@ -2139,14 +2053,14 @@
u32 msg_name;
int msg_namelen;
u32 msg_iov;
- __kernel_size_t32 msg_iovlen;
+ compat_size_t msg_iovlen;
u32 msg_control;
- __kernel_size_t32 msg_controllen;
+ compat_size_t msg_controllen;
unsigned msg_flags;
};

struct cmsghdr32 {
- __kernel_size_t32 cmsg_len;
+ compat_size_t cmsg_len;
int cmsg_level;
int cmsg_type;
};
@@ -2280,7 +2194,7 @@
{
struct cmsghdr32 *ucmsg;
struct cmsghdr *kcmsg, *kcmsg_base;
- __kernel_size_t32 ucmlen;
+ compat_size_t ucmlen;
__kernel_size_t kcmlen, tmp;

kcmlen = 0;
@@ -2646,7 +2560,7 @@
err = move_addr_to_user(addr, kern_msg.msg_namelen, uaddr, uaddr_len);
if(cmsg_ptr != 0 && err >= 0) {
unsigned long ucmsg_ptr = ((unsigned long)kern_msg.msg_control);
- __kernel_size_t32 uclen = (__kernel_size_t32) (ucmsg_ptr - cmsg_ptr);
+ compat_size_t uclen = (compat_size_t) (ucmsg_ptr - cmsg_ptr);
err |= __put_user(uclen, &user_msg->msg_controllen);
}
if(err >= 0)
@@ -2734,7 +2648,7 @@

static int do_set_sock_timeout(int fd, int level, int optname, char *optval, int optlen)
{
- struct timeval32 *up = (struct timeval32 *) optval;
+ struct compat_timeval *up = (struct compat_timeval *) optval;
struct timeval ktime;
mm_segment_t old_fs;
int err;
@@ -2772,7 +2686,7 @@

static int do_get_sock_timeout(int fd, int level, int optname, char *optval, int *optlen)
{
- struct timeval32 *up = (struct timeval32 *) optval;
+ struct compat_timeval *up = (struct compat_timeval *) optval;
struct timeval ktime;
mm_segment_t old_fs;
int len, err;
@@ -2843,7 +2757,7 @@

asmlinkage int
sys32_rt_sigaction(int sig, struct sigaction32 *act, struct sigaction32 *oact,
- void *restorer, __kernel_size_t32 sigsetsize)
+ void *restorer, compat_size_t sigsetsize)
{
struct k_sigaction new_ka, old_ka;
int ret;
@@ -3481,7 +3395,7 @@
extern struct timezone sys_tz;
extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);

-asmlinkage int sys32_gettimeofday(struct timeval32 *tv, struct timezone *tz)
+asmlinkage int sys32_gettimeofday(struct compat_timeval *tv, struct timezone *tz)
{
if (tv) {
struct timeval ktv;
@@ -3496,7 +3410,7 @@
return 0;
}

-asmlinkage int sys32_settimeofday(struct timeval32 *tv, struct timezone *tz)
+asmlinkage int sys32_settimeofday(struct compat_timeval *tv, struct timezone *tz)
{
struct timeval ktv;
struct timezone ktz;
@@ -3513,46 +3427,9 @@
return do_sys_settimeofday(tv ? &ktv : NULL, tz ? &ktz : NULL);
}

-extern int do_getitimer(int which, struct itimerval *value);
-
-asmlinkage int sys32_getitimer(int which, struct itimerval32 *it)
-{
- struct itimerval kit;
- int error;
-
- error = do_getitimer(which, &kit);
- if (!error && put_it32(it, &kit))
- error = -EFAULT;
-
- return error;
-}
-
-extern int do_setitimer(int which, struct itimerval *, struct itimerval *);
-
-asmlinkage int sys32_setitimer(int which, struct itimerval32 *in, struct itimerval32 *out)
-{
- struct itimerval kin, kout;
- int error;
-
- if (in) {
- if (get_it32(&kin, in))
- return -EFAULT;
- } else
- memset(&kin, 0, sizeof(kin));
-
- error = do_setitimer(which, &kin, out ? &kout : NULL);
- if (error || !out)
- return error;
- if (put_it32(out, &kout))
- return -EFAULT;
-
- return 0;
-
-}
-
asmlinkage int sys_utimes(char *, struct timeval *);

-asmlinkage int sys32_utimes(char *filename, struct timeval32 *tvs)
+asmlinkage int sys32_utimes(char *filename, struct compat_timeval *tvs)
{
char *kfilename;
struct timeval ktvs[2];
@@ -3636,23 +3513,21 @@
extern asmlinkage ssize_t sys_pwrite64(unsigned int fd, const char * buf,
size_t count, loff_t pos);

-typedef __kernel_ssize_t32 ssize_t32;
-
-asmlinkage ssize_t32 sys32_pread64(unsigned int fd, char *ubuf,
- __kernel_size_t32 count, u32 poshi, u32 poslo)
+asmlinkage compat_ssize_t sys32_pread64(unsigned int fd, char *ubuf,
+ compat_size_t count, u32 poshi, u32 poslo)
{
return sys_pread64(fd, ubuf, count, ((loff_t)AA(poshi) << 32) | AA(poslo));
}

-asmlinkage ssize_t32 sys32_pwrite64(unsigned int fd, char *ubuf,
- __kernel_size_t32 count, u32 poshi, u32 poslo)
+asmlinkage compat_ssize_t sys32_pwrite64(unsigned int fd, char *ubuf,
+ compat_size_t count, u32 poshi, u32 poslo)
{
return sys_pwrite64(fd, ubuf, count, ((loff_t)AA(poshi) << 32) | AA(poslo));
}

extern asmlinkage ssize_t sys_readahead(int fd, loff_t offset, size_t count);

-asmlinkage ssize_t32 sys32_readahead(int fd, u32 offhi, u32 offlo, s32 count)
+asmlinkage compat_ssize_t sys32_readahead(int fd, u32 offhi, u32 offlo, s32 count)
{
return sys_readahead(fd, ((loff_t)AA(offhi) << 32) | AA(offlo), count);
}
@@ -3705,7 +3580,7 @@
u32 modes;
s32 offset, freq, maxerror, esterror;
s32 status, constant, precision, tolerance;
- struct timeval32 time;
+ struct compat_timeval time;
s32 tick;
s32 ppsfreq, jitter, shift, stabil;
s32 jitcnt, calcnt, errcnt, stbcnt;
diff -ruN 2.5.50-BK.2/arch/sparc64/kernel/sys_sunos32.c 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/sys_sunos32.c
--- 2.5.50-BK.2/arch/sparc64/kernel/sys_sunos32.c 2002-11-28 10:34:43.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/sys_sunos32.c 2002-12-04 15:28:36.000000000 +1100
@@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/types.h>
+#include <linux/compat.h>
#include <linux/mman.h>
#include <linux/mm.h>
#include <linux/swap.h>
@@ -528,11 +529,6 @@
extern asmlinkage int
sys32_select(int n, u32 inp, u32 outp, u32 exp, u32 tvp);

-struct timeval32
-{
- int tv_sec, tv_usec;
-};
-
asmlinkage int sunos_select(int width, u32 inp, u32 outp, u32 exp, u32 tvp_x)
{
int ret;
@@ -540,7 +536,7 @@
/* SunOS binaries expect that select won't change the tvp contents */
ret = sys32_select (width, inp, outp, exp, tvp_x);
if (ret == -EINTR && tvp_x) {
- struct timeval32 *tvp = (struct timeval32 *)A(tvp_x);
+ struct compat_timeval *tvp = (struct compat_timeval *)A(tvp_x);
time_t sec, usec;

__get_user(sec, &tvp->tv_sec);
@@ -948,9 +944,9 @@
struct ipc_perm32 msg_perm;
u32 msg_first;
u32 msg_last;
- __kernel_time_t32 msg_stime;
- __kernel_time_t32 msg_rtime;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_stime;
+ compat_time_t msg_rtime;
+ compat_time_t msg_ctime;
u32 wwait;
u32 rwait;
unsigned short msg_cbytes;
@@ -1085,9 +1081,9 @@
struct shmid_ds32 {
struct ipc_perm32 shm_perm;
int shm_segsz;
- __kernel_time_t32 shm_atime;
- __kernel_time_t32 shm_dtime;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_atime;
+ compat_time_t shm_dtime;
+ compat_time_t shm_ctime;
__kernel_ipc_pid_t32 shm_cpid;
__kernel_ipc_pid_t32 shm_lpid;
unsigned short shm_nattch;
diff -ruN 2.5.50-BK.2/arch/sparc64/kernel/systbls.S 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/systbls.S
--- 2.5.50-BK.2/arch/sparc64/kernel/systbls.S 2002-11-28 10:35:37.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/kernel/systbls.S 2002-12-04 17:40:44.000000000 +1100
@@ -25,7 +25,7 @@
/*15*/ .word sys32_chmod, sys32_lchown16, sparc_brk, sys_perfctr, sys32_lseek
/*20*/ .word sys_getpid, sys_capget, sys_capset, sys32_setuid16, sys32_getuid16
/*25*/ .word sys_time, sys_ptrace, sys_alarm, sys32_sigaltstack, sys32_pause
-/*30*/ .word sys32_utime, sys_lchown, sys_fchown, sys_access, sys_nice
+/*30*/ .word compat_sys_utime, sys_lchown, sys_fchown, sys_access, sys_nice
.word sys_chown, sys_sync, sys_kill, sys32_newstat, sys32_sendfile
/*40*/ .word sys32_newlstat, sys_dup, sys_pipe, sys32_times, sys_getuid
.word sys_umount, sys32_setgid16, sys32_getgid16, sys_signal, sys32_geteuid16
@@ -35,8 +35,8 @@
.word sys_msync, sys_vfork, sys32_pread64, sys32_pwrite64, sys_geteuid
/*70*/ .word sys_getegid, sys32_mmap, sys_setreuid, sys_munmap, sys_mprotect
.word sys_madvise, sys_vhangup, sys32_truncate64, sys_mincore, sys32_getgroups16
-/*80*/ .word sys32_setgroups16, sys_getpgrp, sys_setgroups, sys32_setitimer, sys32_ftruncate64
- .word sys_swapon, sys32_getitimer, sys_setuid, sys_sethostname, sys_setgid
+/*80*/ .word sys32_setgroups16, sys_getpgrp, sys_setgroups, compat_sys_setitimer, sys32_ftruncate64
+ .word sys_swapon, compat_sys_getitimer, sys_setuid, sys_sethostname, sys_setgid
/*90*/ .word sys_dup2, sys_setfsuid, sys32_fcntl, sys32_select, sys_setfsgid
.word sys_fsync, sys_setpriority32, sys_nis_syscall, sys_nis_syscall, sys_nis_syscall
/*100*/ .word sys_getpriority, sys32_rt_sigreturn, sys32_rt_sigaction, sys32_rt_sigprocmask, sys32_rt_sigpending
@@ -68,7 +68,7 @@
/*230*/ .word sys32_select, sys_time, sys_nis_syscall, sys_stime, sys_alloc_hugepages
.word sys_free_hugepages, sys_llseek, sys_mlock, sys_munlock, sys_mlockall
/*240*/ .word sys_munlockall, sys_sched_setparam, sys_sched_getparam, sys_sched_setscheduler, sys_sched_getscheduler
- .word sys_sched_yield, sys_sched_get_priority_max, sys_sched_get_priority_min, sys32_sched_rr_get_interval, sys32_nanosleep
+ .word sys_sched_yield, sys_sched_get_priority_max, sys_sched_get_priority_min, sys32_sched_rr_get_interval, compat_sys_nanosleep
/*250*/ .word sys32_mremap, sys32_sysctl, sys_getsid, sys_fdatasync, sys32_nfsservctl
.word sys_aplib

@@ -166,8 +166,8 @@
.word sys_mprotect, sys_madvise, sys_vhangup
.word sunos_nosys, sys_mincore, sys32_getgroups16
.word sys32_setgroups16, sys_getpgrp, sunos_setpgrp
- .word sys32_setitimer, sunos_nosys, sys_swapon
- .word sys32_getitimer, sys_gethostname, sys_sethostname
+ .word compat_sys_setitimer, sunos_nosys, sys_swapon
+ .word compat_sys_getitimer, sys_gethostname, sys_sethostname
.word sunos_getdtablesize, sys_dup2, sunos_nop
.word sys32_fcntl, sunos_select, sunos_nop
.word sys_fsync, sys_setpriority32, sunos_socket
diff -ruN 2.5.50-BK.2/arch/sparc64/solaris/misc.c 2.5.50-BK.2-32bit.1/arch/sparc64/solaris/misc.c
--- 2.5.50-BK.2/arch/sparc64/solaris/misc.c 2002-10-14 18:17:30.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/solaris/misc.c 2002-12-04 15:32:41.000000000 +1100
@@ -16,6 +16,7 @@
#include <linux/file.h>
#include <linux/timex.h>
#include <linux/major.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/string.h>
@@ -597,12 +598,8 @@
return ret;
}

-struct timeval32 {
- int tv_sec, tv_usec;
-};
-
struct sol_ntptimeval {
- struct timeval32 time;
+ struct compat_timeval time;
s32 maxerror;
s32 esterror;
};
diff -ruN 2.5.50-BK.2/arch/sparc64/solaris/socket.c 2.5.50-BK.2-32bit.1/arch/sparc64/solaris/socket.c
--- 2.5.50-BK.2/arch/sparc64/solaris/socket.c 2002-11-28 10:34:43.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/sparc64/solaris/socket.c 2002-12-04 14:39:38.000000000 +1100
@@ -14,6 +14,7 @@
#include <linux/socket.h>
#include <linux/file.h>
#include <linux/net.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/string.h>
@@ -378,7 +379,7 @@
if(kern_msg.msg_controllen) {
struct sol_cmsghdr *ucmsg = (struct sol_cmsghdr *)kern_msg.msg_control;
unsigned long *kcmsg;
- __kernel_size_t32 cmlen;
+ compat_size_t cmlen;

if(kern_msg.msg_controllen > sizeof(ctl) &&
kern_msg.msg_controllen <= 256) {
@@ -392,7 +393,7 @@
*kcmsg++ = (unsigned long)cmlen;
err = -EFAULT;
if(copy_from_user(kcmsg, &ucmsg->cmsg_level,
- kern_msg.msg_controllen - sizeof(__kernel_size_t32)))
+ kern_msg.msg_controllen - sizeof(compat_size_t)))
goto out_freectl;
kern_msg.msg_control = ctl_buf;
}
diff -ruN 2.5.50-BK.2/include/asm-sparc64/compat.h 2.5.50-BK.2-32bit.1/include/asm-sparc64/compat.h
--- 2.5.50-BK.2/include/asm-sparc64/compat.h 1970-01-01 10:00:00.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-sparc64/compat.h 2002-12-04 15:16:20.000000000 +1100
@@ -0,0 +1,18 @@
+#ifndef _ASM_SPARC64_COMPAT_H
+#define _ASM_SPARC64_COMPAT_H
+/*
+ * Architecture specific compatibility types
+ */
+#include <linux/types.h>
+
+typedef u32 compat_size_t;
+typedef s32 compat_ssize_t;
+typedef s32 compat_time_t;
+typedef s32 compat_suseconds_t;
+
+struct compat_timespec {
+ compat_time_t tv_sec;
+ s32 tv_nsec;
+};
+
+#endif /* _ASM_SPARC64_COMPAT_H */
diff -ruN 2.5.50-BK.2/include/asm-sparc64/posix_types.h 2.5.50-BK.2-32bit.1/include/asm-sparc64/posix_types.h
--- 2.5.50-BK.2/include/asm-sparc64/posix_types.h 2000-10-28 04:55:01.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/include/asm-sparc64/posix_types.h 2002-12-04 14:46:11.000000000 +1100
@@ -48,10 +48,7 @@
} __kernel_fsid_t;

/* Now 32bit compatibility types */
-typedef unsigned int __kernel_size_t32;
-typedef int __kernel_ssize_t32;
typedef int __kernel_ptrdiff_t32;
-typedef int __kernel_time_t32;
typedef int __kernel_clock_t32;
typedef int __kernel_pid_t32;
typedef unsigned short __kernel_ipc_pid_t32;
diff -ruN 2.5.50-BK.2/include/asm-sparc64/signal.h 2.5.50-BK.2-32bit.1/include/asm-sparc64/signal.h
--- 2.5.50-BK.2/include/asm-sparc64/signal.h 2002-06-21 10:22:39.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-sparc64/signal.h 2002-12-04 14:29:24.000000000 +1100
@@ -8,6 +8,7 @@
#ifndef __ASSEMBLY__
#include <linux/personality.h>
#include <linux/types.h>
+#include <linux/compat.h>
#endif
#endif

@@ -250,7 +251,7 @@
typedef struct sigaltstack32 {
u32 ss_sp;
int ss_flags;
- __kernel_size_t32 ss_size;
+ compat_size_t ss_size;
} stack_t32;

#define HAVE_ARCH_GET_SIGNAL_TO_DELIVER
diff -ruN 2.5.50-BK.2/include/asm-sparc64/stat.h 2.5.50-BK.2-32bit.1/include/asm-sparc64/stat.h
--- 2.5.50-BK.2/include/asm-sparc64/stat.h 2002-11-18 15:47:55.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/include/asm-sparc64/stat.h 2002-12-03 17:05:52.000000000 +1100
@@ -3,6 +3,7 @@
#define _SPARC64_STAT_H

#include <linux/types.h>
+#include <linux/compat.h>
#include <linux/time.h>

struct stat32 {
@@ -14,11 +15,11 @@
__kernel_gid_t32 st_gid;
__kernel_dev_t32 st_rdev;
__kernel_off_t32 st_size;
- __kernel_time_t32 st_atime;
+ compat_time_t st_atime;
unsigned int __unused1;
- __kernel_time_t32 st_mtime;
+ compat_time_t st_mtime;
unsigned int __unused2;
- __kernel_time_t32 st_ctime;
+ compat_time_t st_ctime;
unsigned int __unused3;
__kernel_off_t32 st_blksize;
__kernel_off_t32 st_blocks;
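
The tick conversion that the sparc64 patch renames to `jiffies_to_compat_timeval` can be tried outside the kernel. The following is a sketch under stated assumptions: `struct compat_timeval_sketch` is a hypothetical stand-in with two 32-bit fields (matching the `struct timeval32` the patch removes), and the tick rate is passed as a parameter `hz` instead of the kernel's `HZ` constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct compat_timeval: two 32-bit fields. */
struct compat_timeval_sketch {
	int32_t tv_sec;
	int32_t tv_usec;
};

/*
 * Same arithmetic as the helper in binfmt_elf32.c:
 *   tv_usec = (ticks % HZ) * (1000000 / HZ)
 *   tv_sec  = ticks / HZ
 * tv_usec is always below 1000000, so it fits in 32 bits.
 * tv_sec is narrowed to 32 bits, so very large tick counts are truncated.
 */
static void ticks_to_compat_timeval(unsigned long ticks, unsigned long hz,
				    struct compat_timeval_sketch *value)
{
	assert(hz > 0 && hz <= 1000000UL);
	value->tv_usec = (int32_t)((ticks % hz) * (1000000UL / hz));
	value->tv_sec = (int32_t)(ticks / hz);
}
```

For example, 250 ticks at 100 ticks per second give `tv_sec == 2` and `tv_usec == 500000`.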

2002-12-04 07:11:50

by Stephen Rothwell

Subject: Re: [PATCH] compatibility syscall layer - X86_64

Hi Andi, Linus,

Here is the x86_64-specific patch.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

diff -ruN 2.5.50-BK.2/arch/x86_64/Kconfig 2.5.50-BK.2-32bit.1/arch/x86_64/Kconfig
--- 2.5.50-BK.2/arch/x86_64/Kconfig 2002-11-28 10:34:43.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/x86_64/Kconfig 2002-12-03 17:02:32.000000000 +1100
@@ -425,6 +425,11 @@
turn this on, unless you're 100% sure that you don't have any 32bit programs
left.

+config COMPAT
+ bool
+ depends on IA32_EMULATION
+ default y
+
endmenu

source "drivers/mtd/Kconfig"
diff -ruN 2.5.50-BK.2/arch/x86_64/ia32/ia32_binfmt.c 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/ia32_binfmt.c
--- 2.5.50-BK.2/arch/x86_64/ia32/ia32_binfmt.c 2002-10-21 01:02:47.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/ia32_binfmt.c 2002-12-04 15:35:28.000000000 +1100
@@ -6,6 +6,7 @@
* of ugly preprocessor tricks. Talk about very very poor man's inheritance.
*/
#include <linux/types.h>
+#include <linux/compat.h>
#include <linux/config.h>
#include <linux/stddef.h>
#include <linux/module.h>
@@ -53,11 +54,6 @@
int si_errno; /* errno */
};

-struct timeval32
-{
- int tv_sec, tv_usec;
-};
-
#define jiffies_to_timeval(a,b) do { (b)->tv_usec = 0; (b)->tv_sec = (a)/HZ; }while(0)

struct elf_prstatus
@@ -70,10 +66,10 @@
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
- struct timeval32 pr_utime; /* User time */
- struct timeval32 pr_stime; /* System time */
- struct timeval32 pr_cutime; /* Cumulative user time */
- struct timeval32 pr_cstime; /* Cumulative system time */
+ struct compat_timeval pr_utime; /* User time */
+ struct compat_timeval pr_stime; /* System time */
+ struct compat_timeval pr_cutime; /* Cumulative user time */
+ struct compat_timeval pr_cstime; /* Cumulative system time */
elf_gregset_t pr_reg; /* GP registers */
int pr_fpvalid; /* True if math co-processor being used. */
};
diff -ruN 2.5.50-BK.2/arch/x86_64/ia32/ia32_ioctl.c 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/ia32_ioctl.c
--- 2.5.50-BK.2/arch/x86_64/ia32/ia32_ioctl.c 2002-10-21 01:02:47.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/ia32_ioctl.c 2002-12-04 15:35:52.000000000 +1100
@@ -11,6 +11,7 @@

#include <linux/config.h>
#include <linux/types.h>
+#include <linux/compat.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/smp.h>
@@ -405,14 +406,9 @@
return err;
}

-struct timeval32 {
- int tv_sec;
- int tv_usec;
-};
-
static int do_siocgstamp(unsigned int fd, unsigned int cmd, unsigned long arg)
{
- struct timeval32 *up = (struct timeval32 *)arg;
+ struct compat_timeval *up = (struct compat_timeval *)arg;
struct timeval ktv;
mm_segment_t old_fs = get_fs();
int err;
@@ -1611,8 +1607,8 @@
#define PPPIOCSCOMPRESS32 _IOW('t', 77, struct ppp_option_data32)

struct ppp_idle32 {
- __kernel_time_t32 xmit_idle;
- __kernel_time_t32 recv_idle;
+ compat_time_t xmit_idle;
+ compat_time_t recv_idle;
};
#define PPPIOCGIDLE32 _IOR('t', 63, struct ppp_idle32)

diff -ruN 2.5.50-BK.2/arch/x86_64/ia32/ia32entry.S 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/ia32entry.S
--- 2.5.50-BK.2/arch/x86_64/ia32/ia32entry.S 2002-10-21 01:02:47.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/ia32entry.S 2002-12-04 17:40:54.000000000 +1100
@@ -151,7 +151,7 @@
.quad sys_alarm /* XXX sign extension??? */
.quad ni_syscall /* (old)fstat */
.quad sys_pause
- .quad sys32_utime /* 30 */
+ .quad compat_sys_utime /* 30 */
.quad ni_syscall /* old stty syscall holder */
.quad ni_syscall /* old gtty syscall holder */
.quad sys_access
@@ -225,8 +225,8 @@
.quad sys_ioperm
.quad sys32_socketcall
.quad sys_syslog
- .quad sys32_setitimer
- .quad sys32_getitimer /* 105 */
+ .quad compat_sys_setitimer
+ .quad compat_sys_getitimer /* 105 */
.quad sys32_newstat
.quad sys32_newlstat
.quad sys32_newfstat
@@ -283,7 +283,7 @@
.quad sys_sched_get_priority_max
.quad sys_sched_get_priority_min /* 160 */
.quad sys_sched_rr_get_interval
- .quad sys32_nanosleep
+ .quad compat_sys_nanosleep
.quad sys_mremap
.quad sys_setresuid16
.quad sys_getresuid16 /* 165 */
diff -ruN 2.5.50-BK.2/arch/x86_64/ia32/ipc32.c 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/ipc32.c
--- 2.5.50-BK.2/arch/x86_64/ia32/ipc32.c 2002-10-21 01:02:47.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/ipc32.c 2002-12-04 14:42:24.000000000 +1100
@@ -8,6 +8,7 @@
#include <linux/shm.h>
#include <linux/slab.h>
#include <linux/ipc.h>
+#include <linux/compat.h>
#include <asm/mman.h>
#include <asm/types.h>
#include <asm/uaccess.h>
@@ -53,8 +54,8 @@

struct semid_ds32 {
struct ipc_perm32 sem_perm; /* permissions .. see ipc.h */
- __kernel_time_t32 sem_otime; /* last semop time */
- __kernel_time_t32 sem_ctime; /* last change time */
+ compat_time_t sem_otime; /* last semop time */
+ compat_time_t sem_ctime; /* last change time */
u32 sem_base; /* ptr to first semaphore in array */
u32 sem_pending; /* pending operations to be processed */
u32 sem_pending_last; /* last pending operation */
@@ -64,9 +65,9 @@

struct semid64_ds32 {
struct ipc64_perm32 sem_perm;
- __kernel_time_t32 sem_otime;
+ compat_time_t sem_otime;
unsigned int __unused1;
- __kernel_time_t32 sem_ctime;
+ compat_time_t sem_ctime;
unsigned int __unused2;
unsigned int sem_nsems;
unsigned int __unused3;
@@ -77,9 +78,9 @@
struct ipc_perm32 msg_perm;
u32 msg_first;
u32 msg_last;
- __kernel_time_t32 msg_stime;
- __kernel_time_t32 msg_rtime;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_stime;
+ compat_time_t msg_rtime;
+ compat_time_t msg_ctime;
u32 wwait;
u32 rwait;
unsigned short msg_cbytes;
@@ -91,11 +92,11 @@

struct msqid64_ds32 {
struct ipc64_perm32 msg_perm;
- __kernel_time_t32 msg_stime;
+ compat_time_t msg_stime;
unsigned int __unused1;
- __kernel_time_t32 msg_rtime;
+ compat_time_t msg_rtime;
unsigned int __unused2;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_ctime;
unsigned int __unused3;
unsigned int msg_cbytes;
unsigned int msg_qnum;
@@ -109,9 +110,9 @@
struct shmid_ds32 {
struct ipc_perm32 shm_perm;
int shm_segsz;
- __kernel_time_t32 shm_atime;
- __kernel_time_t32 shm_dtime;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_atime;
+ compat_time_t shm_dtime;
+ compat_time_t shm_ctime;
__kernel_ipc_pid_t32 shm_cpid;
__kernel_ipc_pid_t32 shm_lpid;
unsigned short shm_nattch;
@@ -119,12 +120,12 @@

struct shmid64_ds32 {
struct ipc64_perm32 shm_perm;
- __kernel_size_t32 shm_segsz;
- __kernel_time_t32 shm_atime;
+ compat_size_t shm_segsz;
+ compat_time_t shm_atime;
unsigned int __unused1;
- __kernel_time_t32 shm_dtime;
+ compat_time_t shm_dtime;
unsigned int __unused2;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_ctime;
unsigned int __unused3;
__kernel_pid_t32 shm_cpid;
__kernel_pid_t32 shm_lpid;
diff -ruN 2.5.50-BK.2/arch/x86_64/ia32/socket32.c 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/socket32.c
--- 2.5.50-BK.2/arch/x86_64/ia32/socket32.c 2002-10-16 14:51:16.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/socket32.c 2002-12-04 14:41:55.000000000 +1100
@@ -19,6 +19,7 @@
#include <linux/icmpv6.h>
#include <linux/socket.h>
#include <linux/filter.h>
+#include <linux/compat.h>

#include <net/scm.h>
#include <net/sock.h>
@@ -123,7 +124,7 @@
{
struct cmsghdr32 *ucmsg;
struct cmsghdr *kcmsg, *kcmsg_base;
- __kernel_size_t32 ucmlen;
+ compat_size_t ucmlen;
__kernel_size_t kcmlen, tmp;

kcmlen = 0;
@@ -489,7 +490,7 @@
err = move_addr_to_user(addr, kern_msg.msg_namelen, uaddr, uaddr_len);
if(cmsg_ptr != 0 && err >= 0) {
unsigned long ucmsg_ptr = ((unsigned long)kern_msg.msg_control);
- __kernel_size_t32 uclen = (__kernel_size_t32) (ucmsg_ptr - cmsg_ptr);
+ compat_size_t uclen = (compat_size_t) (ucmsg_ptr - cmsg_ptr);
err |= __put_user(uclen, &user_msg->msg_controllen);
}
if(err >= 0)
@@ -606,10 +607,10 @@
extern asmlinkage long sys_getpeername(int fd, struct sockaddr *usockaddr,
int *usockaddr_len);
extern asmlinkage long sys_send(int fd, void *buff, size_t len, unsigned flags);
-extern asmlinkage long sys_sendto(int fd, u32 buff, __kernel_size_t32 len,
+extern asmlinkage long sys_sendto(int fd, u32 buff, compat_size_t len,
unsigned flags, u32 addr, int addr_len);
extern asmlinkage long sys_recv(int fd, void *ubuf, size_t size, unsigned flags);
-extern asmlinkage long sys_recvfrom(int fd, u32 ubuf, __kernel_size_t32 size,
+extern asmlinkage long sys_recvfrom(int fd, u32 ubuf, compat_size_t size,
unsigned flags, u32 addr, u32 addr_len);
extern asmlinkage long sys_getsockopt(int fd, int level, int optname,
u32 optval, u32 optlen);
diff -ruN 2.5.50-BK.2/arch/x86_64/ia32/sys_ia32.c 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/sys_ia32.c
--- 2.5.50-BK.2/arch/x86_64/ia32/sys_ia32.c 2002-11-18 15:47:41.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/x86_64/ia32/sys_ia32.c 2002-12-04 16:27:36.000000000 +1100
@@ -26,7 +26,6 @@
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/signal.h>
-#include <linux/utime.h>
#include <linux/resource.h>
#include <linux/times.h>
#include <linux/utsname.h>
@@ -58,6 +57,7 @@
#include <linux/binfmts.h>
#include <linux/init.h>
#include <linux/aio_abi.h>
+#include <linux/compat.h>
#include <asm/mman.h>
#include <asm/types.h>
#include <asm/uaccess.h>
@@ -498,19 +498,8 @@
return ret;
}

-struct timeval32
-{
- int tv_sec, tv_usec;
-};
-
-struct itimerval32
-{
- struct timeval32 it_interval;
- struct timeval32 it_value;
-};
-
static inline long
-get_tv32(struct timeval *o, struct timeval32 *i)
+get_tv32(struct timeval *o, struct compat_timeval *i)
{
int err = -EFAULT;
if (access_ok(VERIFY_READ, i, sizeof(*i))) {
@@ -521,7 +510,7 @@
}

static inline long
-put_tv32(struct timeval32 *o, struct timeval *i)
+put_tv32(struct compat_timeval *o, struct timeval *i)
{
int err = -EFAULT;
if (access_ok(VERIFY_WRITE, o, sizeof(*o))) {
@@ -531,70 +520,6 @@
return err;
}

-static inline long
-get_it32(struct itimerval *o, struct itimerval32 *i)
-{
- int err = -EFAULT;
- if (access_ok(VERIFY_READ, i, sizeof(*i))) {
- err = __get_user(o->it_interval.tv_sec, &i->it_interval.tv_sec);
- err |= __get_user(o->it_interval.tv_usec, &i->it_interval.tv_usec);
- err |= __get_user(o->it_value.tv_sec, &i->it_value.tv_sec);
- err |= __get_user(o->it_value.tv_usec, &i->it_value.tv_usec);
- }
- return err;
-}
-
-static inline long
-put_it32(struct itimerval32 *o, struct itimerval *i)
-{
- int err = -EFAULT;
- if (access_ok(VERIFY_WRITE, o, sizeof(*o))) {
- err = __put_user(i->it_interval.tv_sec, &o->it_interval.tv_sec);
- err |= __put_user(i->it_interval.tv_usec, &o->it_interval.tv_usec);
- err |= __put_user(i->it_value.tv_sec, &o->it_value.tv_sec);
- err |= __put_user(i->it_value.tv_usec, &o->it_value.tv_usec);
- }
- return err;
-}
-
-extern int do_getitimer(int which, struct itimerval *value);
-
-asmlinkage long
-sys32_getitimer(int which, struct itimerval32 *it)
-{
- struct itimerval kit;
- int error;
-
- error = do_getitimer(which, &kit);
- if (!error && put_it32(it, &kit))
- error = -EFAULT;
-
- return error;
-}
-
-extern int do_setitimer(int which, struct itimerval *, struct itimerval *);
-
-asmlinkage long
-sys32_setitimer(int which, struct itimerval32 *in, struct itimerval32 *out)
-{
- struct itimerval kin, kout;
- int error;
-
- if (in) {
- if (get_it32(&kin, in))
- return -EFAULT;
- } else
- memset(&kin, 0, sizeof(kin));
-
- error = do_setitimer(which, &kin, out ? &kout : NULL);
- if (error || !out)
- return error;
- if (put_it32(out, &kout))
- return -EFAULT;
-
- return 0;
-
-}
asmlinkage unsigned long
sys32_alarm(unsigned int seconds)
{
@@ -616,45 +541,11 @@
/* Translations due to time_t size differences. Which affects all
sorts of things, like timeval and itimerval. */

-struct utimbuf_32 {
- int atime;
- int mtime;
-};
-
-extern asmlinkage long sys_utimes(char * filename, struct timeval * utimes);
-extern asmlinkage long sys_gettimeofday (struct timeval *tv, struct timezone *tz);
-
-asmlinkage long
-ia32_utime(char * filename, struct utimbuf_32 *times32)
-{
- mm_segment_t old_fs = get_fs();
- struct timeval tv[2];
- long ret;
-
- if (times32) {
- get_user(tv[0].tv_sec, &times32->atime);
- tv[0].tv_usec = 0;
- get_user(tv[1].tv_sec, &times32->mtime);
- tv[1].tv_usec = 0;
- set_fs (KERNEL_DS);
- } else {
- set_fs (KERNEL_DS);
- ret = sys_gettimeofday(&tv[0], 0);
- if (ret < 0)
- goto out;
- tv[1] = tv[0];
- }
- ret = sys_utimes(filename, tv);
- out:
- set_fs (old_fs);
- return ret;
-}
-
extern struct timezone sys_tz;
extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);

asmlinkage long
-sys32_gettimeofday(struct timeval32 *tv, struct timezone *tz)
+sys32_gettimeofday(struct compat_timeval *tv, struct timezone *tz)
{
if (tv) {
struct timeval ktv;
@@ -670,7 +561,7 @@
}

asmlinkage long
-sys32_settimeofday(struct timeval32 *tv, struct timezone *tz)
+sys32_settimeofday(struct compat_timeval *tv, struct timezone *tz)
{
struct timeval ktv;
struct timezone ktz;
@@ -827,7 +718,7 @@
#define ROUND_UP_TIME(x,y) (((x)+(y)-1)/(y))

asmlinkage long
-sys32_select(int n, fd_set *inp, fd_set *outp, fd_set *exp, struct timeval32 *tvp32)
+sys32_select(int n, fd_set *inp, fd_set *outp, fd_set *exp, struct compat_timeval *tvp32)
{
fd_set_bits fds;
char *bits;
@@ -931,37 +822,7 @@
if (copy_from_user(&a, arg, sizeof(a)))
return -EFAULT;
return sys32_select(a.n, (fd_set *)A(a.inp), (fd_set *)A(a.outp), (fd_set *)A(a.exp),
- (struct timeval32 *)A(a.tvp));
-}
-
-struct timespec32 {
- int tv_sec;
- int tv_nsec;
-};
-
-extern asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp);
-
-asmlinkage long
-sys32_nanosleep(struct timespec32 *rqtp, struct timespec32 *rmtp)
-{
- struct timespec t;
- int ret;
- mm_segment_t old_fs = get_fs ();
-
- if (verify_area(VERIFY_READ, rqtp, sizeof(struct timespec32)) ||
- __get_user (t.tv_sec, &rqtp->tv_sec) ||
- __get_user (t.tv_nsec, &rqtp->tv_nsec))
- return -EFAULT;
- set_fs (KERNEL_DS);
- ret = sys_nanosleep(&t, rmtp ? &t : NULL);
- set_fs (old_fs);
- if (rmtp && ret == -EINTR) {
- if (verify_area(VERIFY_WRITE, rmtp, sizeof(struct timespec32)) ||
- __put_user (t.tv_sec, &rmtp->tv_sec) ||
- __put_user (t.tv_nsec, &rmtp->tv_nsec))
- return -EFAULT;
- }
- return ret;
+ (struct compat_timeval *)A(a.tvp));
}

asmlinkage ssize_t sys_readv(unsigned long,const struct iovec *,unsigned long);
@@ -1153,8 +1014,8 @@
}

struct rusage32 {
- struct timeval32 ru_utime;
- struct timeval32 ru_stime;
+ struct compat_timeval ru_utime;
+ struct compat_timeval ru_stime;
int ru_maxrss;
int ru_ixrss;
int ru_idrss;
@@ -1406,38 +1267,6 @@

/* 32-bit timeval and related flotsam. */

-extern asmlinkage long sys_utime(char * filename, struct utimbuf * times);
-
-struct utimbuf32 {
- __kernel_time_t32 actime, modtime;
-};
-
-asmlinkage long
-sys32_utime(char * filename, struct utimbuf32 *times)
-{
- struct utimbuf t;
- mm_segment_t old_fs;
- int ret;
- char *filenam;
-
- if (!times)
- return sys_utime(filename, NULL);
- if (verify_area(VERIFY_READ, times, sizeof(struct utimbuf32)) ||
- __get_user (t.actime, &times->actime) ||
- __get_user (t.modtime, &times->modtime))
- return -EFAULT;
- filenam = getname (filename);
- ret = PTR_ERR(filenam);
- if (!IS_ERR(filenam)) {
- old_fs = get_fs();
- set_fs (KERNEL_DS);
- ret = sys_utime(filenam, &t);
- set_fs (old_fs);
- putname(filenam);
- }
- return ret;
-}
-
extern asmlinkage long sys_sysfs(int option, unsigned long arg1,
unsigned long arg2);

@@ -1528,7 +1357,7 @@
struct timespec *interval);

asmlinkage long
-sys32_sched_rr_get_interval(__kernel_pid_t32 pid, struct timespec32 *interval)
+sys32_sched_rr_get_interval(__kernel_pid_t32 pid, struct compat_timespec *interval)
{
struct timespec t;
int ret;
@@ -1537,7 +1366,7 @@
set_fs (KERNEL_DS);
ret = sys_sched_rr_get_interval(pid, &t);
set_fs (old_fs);
- if (verify_area(VERIFY_WRITE, interval, sizeof(struct timespec32)) ||
+ if (verify_area(VERIFY_WRITE, interval, sizeof(struct compat_timespec)) ||
__put_user (t.tv_sec, &interval->tv_sec) ||
__put_user (t.tv_nsec, &interval->tv_nsec))
return -EFAULT;
@@ -1582,7 +1411,7 @@
extern asmlinkage long sys_rt_sigpending(sigset_t *set, size_t sigsetsize);

asmlinkage long
-sys32_rt_sigpending(sigset32_t *set, __kernel_size_t32 sigsetsize)
+sys32_rt_sigpending(sigset32_t *set, compat_size_t sigsetsize)
{
sigset_t s;
sigset32_t s32;
@@ -1688,7 +1517,7 @@

asmlinkage long
sys32_rt_sigtimedwait(sigset32_t *uthese, siginfo_t32 *uinfo,
- struct timespec32 *uts, __kernel_size_t32 sigsetsize)
+ struct compat_timespec *uts, compat_size_t sigsetsize)
{
sigset_t s;
sigset32_t s32;
@@ -1707,7 +1536,7 @@
case 1: s.sig[0] = s32.sig[0] | (((long)s32.sig[1]) << 32);
}
if (uts) {
- if (verify_area(VERIFY_READ, uts, sizeof(struct timespec32)) ||
+ if (verify_area(VERIFY_READ, uts, sizeof(struct compat_timespec)) ||
__get_user (t.tv_sec, &uts->tv_sec) ||
__get_user (t.tv_nsec, &uts->tv_nsec))
return -EFAULT;
@@ -1749,7 +1578,7 @@
asmlinkage long sys_utimes(char *, struct timeval *);

asmlinkage long
-sys32_utimes(char *filename, struct timeval32 *tvs)
+sys32_utimes(char *filename, struct compat_timeval *tvs)
{
char *kfilename;
struct timeval ktvs[2];
@@ -1851,20 +1680,17 @@
extern asmlinkage ssize_t sys_pwrite64(unsigned int fd, const char * buf,
size_t count, loff_t pos);

-typedef __kernel_ssize_t32 ssize_t32;
-
-
/* warning: next two assume little endian */
-asmlinkage ssize_t32
-sys32_pread(unsigned int fd, char *ubuf, __kernel_size_t32 count,
+asmlinkage compat_size_t
+sys32_pread(unsigned int fd, char *ubuf, compat_size_t count,
u32 poslo, u32 poshi)
{
return sys_pread64(fd, ubuf, count,
((loff_t)AA(poshi) << 32) | AA(poslo));
}

-asmlinkage ssize_t32
-sys32_pwrite(unsigned int fd, char *ubuf, __kernel_size_t32 count,
+asmlinkage compat_size_t
+sys32_pwrite(unsigned int fd, char *ubuf, compat_size_t count,
u32 poslo, u32 poshi)
{
return sys_pwrite64(fd, ubuf, count,
@@ -1916,7 +1742,7 @@
u32 modes;
s32 offset, freq, maxerror, esterror;
s32 status, constant, precision, tolerance;
- struct timeval32 time;
+ struct compat_timeval time;
s32 tick;
s32 ppsfreq, jitter, shift, stabil;
s32 jitcnt, calcnt, errcnt, stbcnt;
diff -ruN 2.5.50-BK.2/include/asm-x86_64/compat.h 2.5.50-BK.2-32bit.1/include/asm-x86_64/compat.h
--- 2.5.50-BK.2/include/asm-x86_64/compat.h 1970-01-01 10:00:00.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-x86_64/compat.h 2002-12-04 15:17:13.000000000 +1100
@@ -0,0 +1,18 @@
+#ifndef _ASM_X86_64_COMPAT_H
+#define _ASM_X86_64_COMPAT_H
+/*
+ * Architecture specific compatibility types
+ */
+#include <linux/types.h>
+
+typedef u32 compat_size_t;
+typedef s32 compat_ssize_t;
+typedef s32 compat_time_t;
+typedef s32 compat_suseconds_t;
+
+struct compat_timespec {
+ compat_time_t tv_sec;
+ s32 tv_nsec;
+};
+
+#endif /* _ASM_X86_64_COMPAT_H */
diff -ruN 2.5.50-BK.2/include/asm-x86_64/ia32.h 2.5.50-BK.2-32bit.1/include/asm-x86_64/ia32.h
--- 2.5.50-BK.2/include/asm-x86_64/ia32.h 2002-10-21 13:35:27.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-x86_64/ia32.h 2002-12-04 14:46:21.000000000 +1100
@@ -10,10 +10,7 @@
*/

/* 32bit compatibility types */
-typedef unsigned int __kernel_size_t32;
-typedef int __kernel_ssize_t32;
typedef int __kernel_ptrdiff_t32;
-typedef int __kernel_time_t32;
typedef int __kernel_clock_t32;
typedef int __kernel_pid_t32;
typedef unsigned short __kernel_ipc_pid_t32;
diff -ruN 2.5.50-BK.2/include/asm-x86_64/socket32.h 2.5.50-BK.2-32bit.1/include/asm-x86_64/socket32.h
--- 2.5.50-BK.2/include/asm-x86_64/socket32.h 2002-10-21 01:02:53.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-x86_64/socket32.h 2002-12-04 14:31:17.000000000 +1100
@@ -1,6 +1,8 @@
#ifndef SOCKET32_H
#define SOCKET32_H 1

+#include <linux/compat.h>
+
/* XXX This really belongs in some header file... -DaveM */
#define MAX_SOCK_ADDR 128 /* 108 for Unix domain -
16 for IP, 16 for IPX,
@@ -11,14 +13,14 @@
u32 msg_name;
int msg_namelen;
u32 msg_iov;
- __kernel_size_t32 msg_iovlen;
+ compat_size_t msg_iovlen;
u32 msg_control;
- __kernel_size_t32 msg_controllen;
+ compat_size_t msg_controllen;
unsigned msg_flags;
};

struct cmsghdr32 {
- __kernel_size_t32 cmsg_len;
+ compat_size_t cmsg_len;
int cmsg_level;
int cmsg_type;
};
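
For readers following the type changes above, here is a minimal user-space sketch of what the shared compat types provide: one fixed 32-bit timespec layout, copied field by field into the native one, much as the removed sys32_nanosleep did by hand with __get_user/__put_user. The typedefs mirror the asm-x86_64/compat.h hunk. The helper names compat_to_native_ts and native_to_compat_ts are hypothetical and not part of the patch, plain assignments stand in for the kernel's user-space accessors, and the range check on narrowing is an addition for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Mirrors the typedef added in asm-x86_64/compat.h, using <stdint.h> names. */
typedef int32_t compat_time_t;

struct compat_timespec {
	compat_time_t tv_sec;
	int32_t tv_nsec;
};

/* The 32-bit layout is fixed no matter what the native ABI looks like. */
_Static_assert(sizeof(struct compat_timespec) == 8,
	       "compat_timespec must be two 32-bit fields");

/* Hypothetical helper: widen a 32-bit timespec into the native one. */
static void compat_to_native_ts(struct timespec *native,
				const struct compat_timespec *compat)
{
	assert(native && compat);
	native->tv_sec = compat->tv_sec;	/* sign-extends on LP64 */
	native->tv_nsec = compat->tv_nsec;
}

/*
 * Hypothetical helper: narrow a native timespec to the 32-bit layout.
 * Returns -1 if the seconds value does not fit (illustrative check only).
 */
static int native_to_compat_ts(struct compat_timespec *compat,
			       const struct timespec *native)
{
	assert(native && compat);
	if (native->tv_sec > INT32_MAX || native->tv_sec < INT32_MIN)
		return -1;
	compat->tv_sec = (compat_time_t)native->tv_sec;
	compat->tv_nsec = (int32_t)native->tv_nsec;
	return 0;
}
```

With a single definition of the 32-bit layout, conversions like these need to be written only once rather than once per architecture, which appears to be the point of moving sys32_nanosleep and friends to compat_sys_* entry points.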

2002-12-04 07:19:10

by Stephen Rothwell

Subject: Re: [PATCH] compatibility syscall layer - IA64

Hi David, Linus,

This is the IA64 specific patch.
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

diff -ruN 2.5.50-BK.2/arch/ia64/Kconfig 2.5.50-BK.2-32bit.1/arch/ia64/Kconfig
--- 2.5.50-BK.2/arch/ia64/Kconfig 2002-11-28 10:34:41.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/ia64/Kconfig 2002-12-03 16:52:09.000000000 +1100
@@ -397,6 +397,11 @@
run IA-32 Linux binaries on an IA-64 Linux system.
If in doubt, say Y.

+config COMPAT
+ bool
+ depends on IA32_SUPPORT
+ default y
+
config PERFMON
bool "Performance monitor support"
help
diff -ruN 2.5.50-BK.2/arch/ia64/ia32/ia32_entry.S 2.5.50-BK.2-32bit.1/arch/ia64/ia32/ia32_entry.S
--- 2.5.50-BK.2/arch/ia64/ia32/ia32_entry.S 2002-05-30 05:12:20.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/ia64/ia32/ia32_entry.S 2002-12-04 17:40:17.000000000 +1100
@@ -221,7 +221,7 @@
data8 sys32_alarm
data8 sys32_ni_syscall
data8 sys32_pause
- data8 sys32_utime /* 30 */
+ data8 compat_sys_utime /* 30 */
data8 sys32_ni_syscall /* old stty syscall holder */
data8 sys32_ni_syscall /* old gtty syscall holder */
data8 sys_access
@@ -295,8 +295,8 @@
data8 sys32_ioperm
data8 sys32_socketcall
data8 sys_syslog
- data8 sys32_setitimer
- data8 sys32_getitimer /* 105 */
+ data8 compat_sys_setitimer
+ data8 compat_sys_getitimer /* 105 */
data8 sys32_newstat
data8 sys32_newlstat
data8 sys32_newfstat
@@ -353,7 +353,7 @@
data8 sys_sched_get_priority_max
data8 sys_sched_get_priority_min /* 160 */
data8 sys32_sched_rr_get_interval
- data8 sys32_nanosleep
+ data8 compat_sys_nanosleep
data8 sys_mremap
data8 sys_setresuid /* 16-bit version */
data8 sys32_getresuid16 /* 16-bit version */ /* 165 */
diff -ruN 2.5.50-BK.2/arch/ia64/ia32/ia32_signal.c 2.5.50-BK.2-32bit.1/arch/ia64/ia32/ia32_signal.c
--- 2.5.50-BK.2/arch/ia64/ia32/ia32_signal.c 2002-10-31 14:05:10.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/ia64/ia32/ia32_signal.c 2002-12-03 16:50:05.000000000 +1100
@@ -22,6 +22,7 @@
#include <linux/stddef.h>
#include <linux/unistd.h>
#include <linux/wait.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/rse.h>
@@ -570,8 +571,8 @@
}

asmlinkage long
-sys32_rt_sigtimedwait (sigset32_t *uthese, siginfo_t32 *uinfo, struct timespec32 *uts,
- unsigned int sigsetsize)
+sys32_rt_sigtimedwait (sigset32_t *uthese, siginfo_t32 *uinfo,
+ struct compat_timespec *uts, unsigned int sigsetsize)
{
extern asmlinkage long sys_rt_sigtimedwait (const sigset_t *, siginfo_t *,
const struct timespec *, size_t);
diff -ruN 2.5.50-BK.2/arch/ia64/ia32/sys_ia32.c 2.5.50-BK.2-32bit.1/arch/ia64/ia32/sys_ia32.c
--- 2.5.50-BK.2/arch/ia64/ia32/sys_ia32.c 2002-11-18 15:47:40.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/ia64/ia32/sys_ia32.c 2002-12-04 16:16:19.000000000 +1100
@@ -20,7 +20,6 @@
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/signal.h>
-#include <linux/utime.h>
#include <linux/resource.h>
#include <linux/times.h>
#include <linux/utsname.h>
@@ -49,6 +48,7 @@
#include <linux/ptrace.h>
#include <linux/stat.h>
#include <linux/ipc.h>
+#include <linux/compat.h>

#include <asm/types.h>
#include <asm/uaccess.h>
@@ -697,90 +697,20 @@
return ret;
}

-struct timeval32
-{
- int tv_sec, tv_usec;
-};
-
-struct itimerval32
-{
- struct timeval32 it_interval;
- struct timeval32 it_value;
-};
-
static inline long
-get_tv32 (struct timeval *o, struct timeval32 *i)
+get_tv32 (struct timeval *o, struct compat_timeval *i)
{
return (!access_ok(VERIFY_READ, i, sizeof(*i)) ||
(__get_user(o->tv_sec, &i->tv_sec) | __get_user(o->tv_usec, &i->tv_usec)));
}

static inline long
-put_tv32 (struct timeval32 *o, struct timeval *i)
+put_tv32 (struct compat_timeval *o, struct timeval *i)
{
return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
(__put_user(i->tv_sec, &o->tv_sec) | __put_user(i->tv_usec, &o->tv_usec)));
}

-static inline long
-get_it32 (struct itimerval *o, struct itimerval32 *i)
-{
- return (!access_ok(VERIFY_READ, i, sizeof(*i)) ||
- (__get_user(o->it_interval.tv_sec, &i->it_interval.tv_sec) |
- __get_user(o->it_interval.tv_usec, &i->it_interval.tv_usec) |
- __get_user(o->it_value.tv_sec, &i->it_value.tv_sec) |
- __get_user(o->it_value.tv_usec, &i->it_value.tv_usec)));
-}
-
-static inline long
-put_it32 (struct itimerval32 *o, struct itimerval *i)
-{
- return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
- (__put_user(i->it_interval.tv_sec, &o->it_interval.tv_sec) |
- __put_user(i->it_interval.tv_usec, &o->it_interval.tv_usec) |
- __put_user(i->it_value.tv_sec, &o->it_value.tv_sec) |
- __put_user(i->it_value.tv_usec, &o->it_value.tv_usec)));
-}
-
-extern int do_getitimer (int which, struct itimerval *value);
-
-asmlinkage long
-sys32_getitimer (int which, struct itimerval32 *it)
-{
- struct itimerval kit;
- int error;
-
- error = do_getitimer(which, &kit);
- if (!error && put_it32(it, &kit))
- error = -EFAULT;
-
- return error;
-}
-
-extern int do_setitimer (int which, struct itimerval *, struct itimerval *);
-
-asmlinkage long
-sys32_setitimer (int which, struct itimerval32 *in, struct itimerval32 *out)
-{
- struct itimerval kin, kout;
- int error;
-
- if (in) {
- if (get_it32(&kin, in))
- return -EFAULT;
- } else
- memset(&kin, 0, sizeof(kin));
-
- error = do_setitimer(which, &kin, out ? &kout : NULL);
- if (error || !out)
- return error;
- if (put_it32(out, &kout))
- return -EFAULT;
-
- return 0;
-
-}
-
asmlinkage unsigned long
sys32_alarm (unsigned int seconds)
{
@@ -802,42 +732,11 @@
/* Translations due to time_t size differences. Which affects all
sorts of things, like timeval and itimerval. */

-struct utimbuf_32 {
- int atime;
- int mtime;
-};
-
-extern asmlinkage long sys_utimes(char * filename, struct timeval * utimes);
-extern asmlinkage long sys_gettimeofday (struct timeval *tv, struct timezone *tz);
-
-asmlinkage long
-sys32_utime (char *filename, struct utimbuf_32 *times32)
-{
- mm_segment_t old_fs = get_fs();
- struct timeval tv[2], *tvp;
- long ret;
-
- if (times32) {
- if (get_user(tv[0].tv_sec, &times32->atime))
- return -EFAULT;
- tv[0].tv_usec = 0;
- if (get_user(tv[1].tv_sec, &times32->mtime))
- return -EFAULT;
- tv[1].tv_usec = 0;
- set_fs(KERNEL_DS);
- tvp = tv;
- } else
- tvp = NULL;
- ret = sys_utimes(filename, tvp);
- set_fs(old_fs);
- return ret;
-}
-
extern struct timezone sys_tz;
extern int do_sys_settimeofday (struct timeval *tv, struct timezone *tz);

asmlinkage long
-sys32_gettimeofday (struct timeval32 *tv, struct timezone *tz)
+sys32_gettimeofday (struct compat_timeval *tv, struct timezone *tz)
{
if (tv) {
struct timeval ktv;
@@ -853,7 +752,7 @@
}

asmlinkage long
-sys32_settimeofday (struct timeval32 *tv, struct timezone *tz)
+sys32_settimeofday (struct compat_timeval *tv, struct timezone *tz)
{
struct timeval ktv;
struct timezone ktz;
@@ -1003,7 +902,7 @@
#define ROUND_UP_TIME(x,y) (((x)+(y)-1)/(y))

asmlinkage long
-sys32_select (int n, fd_set *inp, fd_set *outp, fd_set *exp, struct timeval32 *tvp32)
+sys32_select (int n, fd_set *inp, fd_set *outp, fd_set *exp, struct compat_timeval *tvp32)
{
fd_set_bits fds;
char *bits;
@@ -1110,28 +1009,7 @@
if (copy_from_user(&a, arg, sizeof(a)))
return -EFAULT;
return sys32_select(a.n, (fd_set *) A(a.inp), (fd_set *) A(a.outp), (fd_set *) A(a.exp),
- (struct timeval32 *) A(a.tvp));
-}
-
-extern asmlinkage long sys_nanosleep (struct timespec *rqtp, struct timespec *rmtp);
-
-asmlinkage long
-sys32_nanosleep (struct timespec32 *rqtp, struct timespec32 *rmtp)
-{
- struct timespec t;
- int ret;
- mm_segment_t old_fs = get_fs();
-
- if (get_user (t.tv_sec, &rqtp->tv_sec) || get_user (t.tv_nsec, &rqtp->tv_nsec))
- return -EFAULT;
- set_fs(KERNEL_DS);
- ret = sys_nanosleep(&t, rmtp ? &t : NULL);
- set_fs(old_fs);
- if (rmtp && ret == -EINTR) {
- if (put_user(t.tv_sec, &rmtp->tv_sec) || put_user(t.tv_nsec, &rmtp->tv_nsec))
- return -EFAULT;
- }
- return ret;
+ (struct compat_timeval *) A(a.tvp));
}

struct iovec32 { unsigned int iov_base; int iov_len; };
@@ -1304,7 +1182,7 @@
};

struct cmsghdr32 {
- __kernel_size_t32 cmsg_len;
+ compat_size_t cmsg_len;
int cmsg_level;
int cmsg_type;
};
@@ -1369,7 +1247,7 @@
{
struct cmsghdr *kcmsg, *kcmsg_base;
__kernel_size_t kcmlen, tmp;
- __kernel_size_t32 ucmlen;
+ compat_size_t ucmlen;
struct cmsghdr32 *ucmsg;
long err;

@@ -1893,10 +1771,10 @@
extern asmlinkage long sys_getpeername(int fd, struct sockaddr *usockaddr,
int *usockaddr_len);
extern asmlinkage long sys_send(int fd, void *buff, size_t len, unsigned flags);
-extern asmlinkage long sys_sendto(int fd, u32 buff, __kernel_size_t32 len,
+extern asmlinkage long sys_sendto(int fd, u32 buff, compat_size_t len,
unsigned flags, u32 addr, int addr_len);
extern asmlinkage long sys_recv(int fd, void *ubuf, size_t size, unsigned flags);
-extern asmlinkage long sys_recvfrom(int fd, u32 ubuf, __kernel_size_t32 size,
+extern asmlinkage long sys_recvfrom(int fd, u32 ubuf, compat_size_t size,
unsigned flags, u32 addr, u32 addr_len);
extern asmlinkage long sys_setsockopt(int fd, int level, int optname,
char *optval, int optlen);
@@ -2018,8 +1896,8 @@

struct semid_ds32 {
struct ipc_perm32 sem_perm; /* permissions .. see ipc.h */
- __kernel_time_t32 sem_otime; /* last semop time */
- __kernel_time_t32 sem_ctime; /* last change time */
+ compat_time_t sem_otime; /* last semop time */
+ compat_time_t sem_ctime; /* last change time */
u32 sem_base; /* ptr to first semaphore in array */
u32 sem_pending; /* pending operations to be processed */
u32 sem_pending_last; /* last pending operation */
@@ -2029,9 +1907,9 @@

struct semid64_ds32 {
struct ipc64_perm32 sem_perm;
- __kernel_time_t32 sem_otime;
+ compat_time_t sem_otime;
unsigned int __unused1;
- __kernel_time_t32 sem_ctime;
+ compat_time_t sem_ctime;
unsigned int __unused2;
unsigned int sem_nsems;
unsigned int __unused3;
@@ -2042,9 +1920,9 @@
struct ipc_perm32 msg_perm;
u32 msg_first;
u32 msg_last;
- __kernel_time_t32 msg_stime;
- __kernel_time_t32 msg_rtime;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_stime;
+ compat_time_t msg_rtime;
+ compat_time_t msg_ctime;
u32 wwait;
u32 rwait;
unsigned short msg_cbytes;
@@ -2056,11 +1934,11 @@

struct msqid64_ds32 {
struct ipc64_perm32 msg_perm;
- __kernel_time_t32 msg_stime;
+ compat_time_t msg_stime;
unsigned int __unused1;
- __kernel_time_t32 msg_rtime;
+ compat_time_t msg_rtime;
unsigned int __unused2;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_ctime;
unsigned int __unused3;
unsigned int msg_cbytes;
unsigned int msg_qnum;
@@ -2074,9 +1952,9 @@
struct shmid_ds32 {
struct ipc_perm32 shm_perm;
int shm_segsz;
- __kernel_time_t32 shm_atime;
- __kernel_time_t32 shm_dtime;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_atime;
+ compat_time_t shm_dtime;
+ compat_time_t shm_ctime;
__kernel_ipc_pid_t32 shm_cpid;
__kernel_ipc_pid_t32 shm_lpid;
unsigned short shm_nattch;
@@ -2084,12 +1962,12 @@

struct shmid64_ds32 {
struct ipc64_perm shm_perm;
- __kernel_size_t32 shm_segsz;
- __kernel_time_t32 shm_atime;
+ compat_size_t shm_segsz;
+ compat_time_t shm_atime;
unsigned int __unused1;
- __kernel_time_t32 shm_dtime;
+ compat_time_t shm_dtime;
unsigned int __unused2;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_ctime;
unsigned int __unused3;
__kernel_pid_t32 shm_cpid;
__kernel_pid_t32 shm_lpid;
@@ -2614,8 +2492,8 @@
}

struct rusage32 {
- struct timeval32 ru_utime;
- struct timeval32 ru_stime;
+ struct compat_timeval ru_utime;
+ struct compat_timeval ru_stime;
int ru_maxrss;
int ru_ixrss;
int ru_idrss;
@@ -3623,7 +3501,7 @@
}

asmlinkage long
-sys32_sched_rr_get_interval (pid_t pid, struct timespec32 *interval)
+sys32_sched_rr_get_interval (pid_t pid, struct compat_timespec *interval)
{
extern asmlinkage long sys_sched_rr_get_interval (pid_t, struct timespec *);
mm_segment_t old_fs = get_fs();
@@ -4192,7 +4070,7 @@
u32 modes;
s32 offset, freq, maxerror, esterror;
s32 status, constant, precision, tolerance;
- struct timeval32 time;
+ struct compat_timeval time;
s32 tick;
s32 ppsfreq, jitter, shift, stabil;
s32 jitcnt, calcnt, errcnt, stbcnt;
diff -ruN 2.5.50-BK.2/include/asm-ia64/compat.h 2.5.50-BK.2-32bit.1/include/asm-ia64/compat.h
--- 2.5.50-BK.2/include/asm-ia64/compat.h 1970-01-01 10:00:00.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-ia64/compat.h 2002-12-04 15:05:49.000000000 +1100
@@ -0,0 +1,19 @@
+#ifndef _ASM_IA64_COMPAT_H
+#define _ASM_IA64_COMPAT_H
+/*
+ * Architecture specific compatibility types
+ */
+
+#include <linux/types.h>
+
+typedef u32 compat_size_t;
+typedef s32 compat_ssize_t;
+typedef s32 compat_time_t;
+typedef s32 compat_suseconds_t;
+
+struct compat_timespec {
+ compat_time_t tv_sec;
+ s32 tv_nsec;
+};
+
+#endif /* _ASM_IA64_COMPAT_H */
diff -ruN 2.5.50-BK.2/include/asm-ia64/ia32.h 2.5.50-BK.2-32bit.1/include/asm-ia64/ia32.h
--- 2.5.50-BK.2/include/asm-ia64/ia32.h 2002-10-31 14:06:05.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/include/asm-ia64/ia32.h 2002-12-04 14:45:54.000000000 +1100
@@ -12,10 +12,7 @@
*/

/* 32bit compatibility types */
-typedef unsigned int __kernel_size_t32;
-typedef int __kernel_ssize_t32;
typedef int __kernel_ptrdiff_t32;
-typedef int __kernel_time_t32;
typedef int __kernel_clock_t32;
typedef int __kernel_pid_t32;
typedef unsigned short __kernel_ipc_pid_t32;
@@ -41,11 +38,6 @@
#define IA32_CLOCKS_PER_SEC 100 /* Cast in stone for IA32 Linux */
#define IA32_TICK(tick) ((unsigned long long)(tick) * IA32_CLOCKS_PER_SEC / CLOCKS_PER_SEC)

-struct timespec32 {
- int tv_sec;
- int tv_nsec;
-};
-
/* fcntl.h */
struct flock32 {
short l_type;
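
As a rough illustration of what the removed get_it32/put_it32 helpers did, the sketch below converts a 32-bit itimerval into the native one in user space. It assumes struct compat_timeval is built from compat_time_t and compat_suseconds_t (its definition is not in this patch, so the layout here is a guess based on the typedefs in asm-ia64/compat.h). The names compat_itimerval, compat_to_native_tv and compat_to_native_it are hypothetical, and plain assignments replace the kernel's __get_user calls.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Mirrors the typedefs in asm-ia64/compat.h, using <stdint.h> names. */
typedef int32_t compat_time_t;
typedef int32_t compat_suseconds_t;

/* Assumed layout: the generic definition is not shown in this patch. */
struct compat_timeval {
	compat_time_t tv_sec;
	compat_suseconds_t tv_usec;
};

/* Hypothetical 32-bit counterpart of struct itimerval. */
struct compat_itimerval {
	struct compat_timeval it_interval;
	struct compat_timeval it_value;
};

/* Hypothetical helper: widen one 32-bit timeval. */
static void compat_to_native_tv(struct timeval *o,
				const struct compat_timeval *i)
{
	o->tv_sec = i->tv_sec;
	o->tv_usec = i->tv_usec;
}

/* Hypothetical helper: widen both halves of an itimerval. */
static void compat_to_native_it(struct itimerval *o,
				const struct compat_itimerval *i)
{
	assert(o && i);
	compat_to_native_tv(&o->it_interval, &i->it_interval);
	compat_to_native_tv(&o->it_value, &i->it_value);
}
```

The compat structure is 16 bytes wherever it is compiled, while the native struct itimerval is typically 32 bytes on an LP64 system, which is why the fields have to be copied one at a time.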

2002-12-04 07:22:20

by Stephen Rothwell

Subject: Re: [PATCH] compatibility syscall layer - PARISC

Hi Willy, Linus,

This is the PARISC specific patch.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

diff -ruN 2.5.50-BK.2/arch/parisc/Kconfig 2.5.50-BK.2-32bit.1/arch/parisc/Kconfig
--- 2.5.50-BK.2/arch/parisc/Kconfig 2002-11-18 15:47:40.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/parisc/Kconfig 2002-12-03 16:56:52.000000000 +1100
@@ -107,6 +107,11 @@
enable this option otherwise. The 64bit kernel is significantly bigger
and slower than the 32bit one.

+config COMPAT
+ bool
+ depends PARISC64
+ default y
+
config PDC_NARROW
bool "32-bit firmware"
depends on PARISC64
diff -ruN 2.5.50-BK.2/arch/parisc/kernel/binfmt_elf32.c 2.5.50-BK.2-32bit.1/arch/parisc/kernel/binfmt_elf32.c
--- 2.5.50-BK.2/arch/parisc/kernel/binfmt_elf32.c 2002-10-31 14:05:12.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/parisc/kernel/binfmt_elf32.c 2002-12-04 15:24:51.000000000 +1100
@@ -19,7 +19,7 @@
#include <linux/module.h>
#include <linux/config.h>
#include <linux/elfcore.h>
-#include "sys32.h" /* struct timeval32 */
+#include <linux/compat.h> /* struct compat_timeval */

#define elf_prstatus elf_prstatus32
struct elf_prstatus32
@@ -32,10 +32,10 @@
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
- struct timeval32 pr_utime; /* User time */
- struct timeval32 pr_stime; /* System time */
- struct timeval32 pr_cutime; /* Cumulative user time */
- struct timeval32 pr_cstime; /* Cumulative system time */
+ struct compat_timeval pr_utime; /* User time */
+ struct compat_timeval pr_stime; /* System time */
+ struct compat_timeval pr_cutime; /* Cumulative user time */
+ struct compat_timeval pr_cstime; /* Cumulative system time */
elf_gregset_t pr_reg; /* GP registers */
int pr_fpvalid; /* True if math co-processor being used. */
};
diff -ruN 2.5.50-BK.2/arch/parisc/kernel/ioctl32.c 2.5.50-BK.2-32bit.1/arch/parisc/kernel/ioctl32.c
--- 2.5.50-BK.2/arch/parisc/kernel/ioctl32.c 2002-10-31 14:05:12.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/parisc/kernel/ioctl32.c 2002-12-04 15:25:09.000000000 +1100
@@ -10,6 +10,7 @@

#include <linux/config.h>
#include <linux/types.h>
+#include <linux/compat.h>
#include "sys32.h"
#include <linux/kernel.h>
#include <linux/sched.h>
@@ -164,7 +165,7 @@

static int do_siocgstamp(unsigned int fd, unsigned int cmd, unsigned long arg)
{
- struct timeval32 *up = (struct timeval32 *)arg;
+ struct compat_timeval *up = (struct compat_timeval *)arg;
struct timeval ktv;
mm_segment_t old_fs = get_fs();
int err;
@@ -1060,8 +1061,8 @@
#define PPPIOCSCOMPRESS32 _IOW('t', 77, struct ppp_option_data32)

struct ppp_idle32 {
- __kernel_time_t32 xmit_idle;
- __kernel_time_t32 recv_idle;
+ compat_time_t xmit_idle;
+ compat_time_t recv_idle;
};
#define PPPIOCGIDLE32 _IOR('t', 63, struct ppp_idle32)

diff -ruN 2.5.50-BK.2/arch/parisc/kernel/signal32.c 2.5.50-BK.2-32bit.1/arch/parisc/kernel/signal32.c
--- 2.5.50-BK.2/arch/parisc/kernel/signal32.c 2002-10-31 14:05:13.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/parisc/kernel/signal32.c 2002-12-04 14:34:27.000000000 +1100
@@ -8,6 +8,7 @@
#include <linux/sched.h>
#include <linux/types.h>
#include <linux/errno.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include "sys32.h"
@@ -175,7 +176,7 @@
typedef struct {
unsigned int ss_sp;
int ss_flags;
- __kernel_size_t32 ss_size;
+ compat_size_t ss_size;
} stack_t32;

int
diff -ruN 2.5.50-BK.2/arch/parisc/kernel/sys32.h 2.5.50-BK.2-32bit.1/arch/parisc/kernel/sys32.h
--- 2.5.50-BK.2/arch/parisc/kernel/sys32.h 2002-10-31 14:05:13.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/parisc/kernel/sys32.h 2002-12-04 15:25:32.000000000 +1100
@@ -12,11 +12,6 @@
set_fs (old_fs); \
}

-struct timeval32 {
- int tv_sec;
- int tv_usec;
-};
-
typedef __u32 __sighandler_t32;

#include <linux/signal.h>
diff -ruN 2.5.50-BK.2/arch/parisc/kernel/sys_parisc32.c 2.5.50-BK.2-32bit.1/arch/parisc/kernel/sys_parisc32.c
--- 2.5.50-BK.2/arch/parisc/kernel/sys_parisc32.c 2002-11-18 15:47:40.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/parisc/kernel/sys_parisc32.c 2002-12-04 16:18:23.000000000 +1100
@@ -16,7 +16,6 @@
#include <linux/mm.h>
#include <linux/file.h>
#include <linux/signal.h>
-#include <linux/utime.h>
#include <linux/resource.h>
#include <linux/times.h>
#include <linux/utsname.h>
@@ -52,6 +51,7 @@
#include <linux/mman.h>
#include <linux/binfmts.h>
#include <linux/namei.h>
+#include <linux/compat.h>

#include <asm/types.h>
#include <asm/uaccess.h>
@@ -386,42 +386,6 @@
* code available in case it's useful to others. -PB
*/

-/* from utime.h */
-struct utimbuf32 {
- __kernel_time_t32 actime;
- __kernel_time_t32 modtime;
-};
-
-asmlinkage long sys32_utime(char *filename, struct utimbuf32 *times)
-{
- struct utimbuf32 times32;
- struct utimbuf times64;
- extern long sys_utime(char *filename, struct utimbuf *times);
- char *fname;
- long ret;
-
- if (!times)
- return sys_utime(filename, NULL);
-
- /* get the 32-bit struct from user space */
- if (copy_from_user(&times32, times, sizeof times32))
- return -EFAULT;
-
- /* convert it into the 64-bit one */
- times64.actime = times32.actime;
- times64.modtime = times32.modtime;
-
- /* grab the file name */
- fname = getname(filename);
-
- KERNEL_SYSCALL(ret, sys_utime, fname, &times64);
-
- /* free the file name */
- putname(fname);
-
- return ret;
-}
-
struct tms32 {
__kernel_clock_t32 tms_utime;
__kernel_clock_t32 tms_stime;
@@ -584,71 +548,42 @@
}
#endif /* CONFIG_SYSCTL */

-struct timespec32 {
- s32 tv_sec;
- s32 tv_nsec;
-};
-
static int
-put_timespec32(struct timespec32 *u, struct timespec *t)
+put_compat_timespec(struct compat_timespec *u, struct timespec *t)
{
- struct timespec32 t32;
+ struct compat_timespec t32;
t32.tv_sec = t->tv_sec;
t32.tv_nsec = t->tv_nsec;
return copy_to_user(u, &t32, sizeof t32);
}

-asmlinkage int sys32_nanosleep(struct timespec32 *rqtp, struct timespec32 *rmtp)
-{
- struct timespec t;
- struct timespec32 t32;
- int ret;
- extern asmlinkage int sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp);
-
- if (copy_from_user(&t32, rqtp, sizeof t32))
- return -EFAULT;
- t.tv_sec = t32.tv_sec;
- t.tv_nsec = t32.tv_nsec;
-
- DBG(("sys32_nanosleep({%d, %d})\n", t32.tv_sec, t32.tv_nsec));
-
- KERNEL_SYSCALL(ret, sys_nanosleep, &t, rmtp ? &t : NULL);
- if (rmtp && ret == -EINTR) {
- if (put_timespec32(rmtp, &t))
- return -EFAULT;
- }
- return ret;
-}
-
asmlinkage long sys32_sched_rr_get_interval(pid_t pid,
- struct timespec32 *interval)
+ struct compat_timespec *interval)
{
struct timespec t;
int ret;
extern asmlinkage long sys_sched_rr_get_interval(pid_t pid, struct timespec *interval);

KERNEL_SYSCALL(ret, sys_sched_rr_get_interval, pid, &t);
- if (put_timespec32(interval, &t))
+ if (put_compat_timespec(interval, &t))
return -EFAULT;
return ret;
}

-typedef __kernel_time_t32 time_t32;
-
static int
-put_timeval32(struct timeval32 *u, struct timeval *t)
+put_compat_timeval(struct compat_timeval *u, struct timeval *t)
{
- struct timeval32 t32;
+ struct compat_timeval t32;
t32.tv_sec = t->tv_sec;
t32.tv_usec = t->tv_usec;
return copy_to_user(u, &t32, sizeof t32);
}

static int
-get_timeval32(struct timeval32 *u, struct timeval *t)
+get_compat_timeval(struct compat_timeval *u, struct timeval *t)
{
int err;
- struct timeval32 t32;
+ struct compat_timeval t32;

if ((err = copy_from_user(&t32, u, sizeof t32)) == 0)
{
@@ -658,10 +593,10 @@
return err;
}

-asmlinkage long sys32_time(time_t32 *tloc)
+asmlinkage long sys32_time(compat_time_t *tloc)
{
time_t now = get_seconds();
- time_t32 now32 = now;
+ compat_time_t now32 = now;

if (tloc)
if (put_user(now32, tloc))
@@ -671,14 +606,14 @@
}

asmlinkage int
-sys32_gettimeofday(struct timeval32 *tv, struct timezone *tz)
+sys32_gettimeofday(struct compat_timeval *tv, struct timezone *tz)
{
extern void do_gettimeofday(struct timeval *tv);

if (tv) {
struct timeval ktv;
do_gettimeofday(&ktv);
- if (put_timeval32(tv, &ktv))
+ if (put_compat_timeval(tv, &ktv))
return -EFAULT;
}
if (tz) {
@@ -690,14 +625,14 @@
}

asmlinkage int
-sys32_settimeofday(struct timeval32 *tv, struct timezone *tz)
+sys32_settimeofday(struct compat_timeval *tv, struct timezone *tz)
{
struct timeval ktv;
struct timezone ktz;
extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);

if (tv) {
- if (get_timeval32(tv, &ktv))
+ if (get_compat_timeval(tv, &ktv))
return -EFAULT;
}
if (tz) {
@@ -708,67 +643,9 @@
return do_sys_settimeofday(tv ? &ktv : NULL, tz ? &ktz : NULL);
}

-struct itimerval32 {
- struct timeval32 it_interval; /* timer interval */
- struct timeval32 it_value; /* current value */
-};
-
-asmlinkage long sys32_getitimer(int which, struct itimerval32 *ov32)
-{
- int error = -EFAULT;
- struct itimerval get_buffer;
- extern int do_getitimer(int which, struct itimerval *value);
-
- if (ov32) {
- error = do_getitimer(which, &get_buffer);
- if (!error) {
- struct itimerval32 gb32;
- gb32.it_interval.tv_sec = get_buffer.it_interval.tv_sec;
- gb32.it_interval.tv_usec = get_buffer.it_interval.tv_usec;
- gb32.it_value.tv_sec = get_buffer.it_value.tv_sec;
- gb32.it_value.tv_usec = get_buffer.it_value.tv_usec;
- if (copy_to_user(ov32, &gb32, sizeof(gb32)))
- error = -EFAULT;
- }
- }
- return error;
-}
-
-asmlinkage long sys32_setitimer(int which, struct itimerval32 *v32,
- struct itimerval32 *ov32)
-{
- struct itimerval set_buffer, get_buffer;
- struct itimerval32 sb32, gb32;
- extern int do_setitimer(int which, struct itimerval *value, struct itimerval *ov32);
- int error;
-
- if (v32) {
- if(copy_from_user(&sb32, v32, sizeof(sb32)))
- return -EFAULT;
-
- set_buffer.it_interval.tv_sec = sb32.it_interval.tv_sec;
- set_buffer.it_interval.tv_usec = sb32.it_interval.tv_usec;
- set_buffer.it_value.tv_sec = sb32.it_value.tv_sec;
- set_buffer.it_value.tv_usec = sb32.it_value.tv_usec;
- } else
- memset((char *) &set_buffer, 0, sizeof(set_buffer));
-
- error = do_setitimer(which, &set_buffer, ov32 ? &get_buffer : 0);
- if (error || !ov32)
- return error;
-
- gb32.it_interval.tv_sec = get_buffer.it_interval.tv_sec;
- gb32.it_interval.tv_usec = get_buffer.it_interval.tv_usec;
- gb32.it_value.tv_sec = get_buffer.it_value.tv_sec;
- gb32.it_value.tv_usec = get_buffer.it_value.tv_usec;
- if (copy_to_user(ov32, &gb32, sizeof(gb32)))
- return -EFAULT;
- return 0;
-}
-
struct rusage32 {
- struct timeval32 ru_utime;
- struct timeval32 ru_stime;
+ struct compat_timeval ru_utime;
+ struct compat_timeval ru_stime;
int ru_maxrss;
int ru_ixrss;
int ru_idrss;
@@ -850,11 +727,11 @@
unsigned short st_reserved2; /* old st_gid */
__kernel_dev_t32 st_rdev;
__kernel_off_t32 st_size;
- __kernel_time_t32 st_atime;
+ compat_time_t st_atime;
unsigned int st_spare1;
- __kernel_time_t32 st_mtime;
+ compat_time_t st_mtime;
unsigned int st_spare2;
- __kernel_time_t32 st_ctime;
+ compat_time_t st_ctime;
unsigned int st_spare3;
int st_blksize;
int st_blocks;
@@ -1302,7 +1179,7 @@
}

static int
-qm_modules(char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_modules(char *buf, size_t bufsize, compat_size_t *ret)
{
struct module *mod;
size_t nmod, space, len;
@@ -1337,7 +1214,7 @@
}

static int
-qm_deps(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_deps(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
size_t i, space, len;

@@ -1374,7 +1251,7 @@
}

static int
-qm_refs(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_refs(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
size_t nrefs, space, len;
struct module_ref *ref;
@@ -1418,7 +1295,7 @@
}

static inline int
-qm_symbols(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_symbols(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
size_t i, space, len;
struct module_symbol *s;
@@ -1477,7 +1354,7 @@
}

static inline int
-qm_info(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_info(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
int error = 0;

@@ -1505,7 +1382,7 @@
return error;
}

-asmlinkage int sys32_query_module(char *name_user, int which, char *buf, __kernel_size_t32 bufsize, __kernel_size_t32 *ret)
+asmlinkage int sys32_query_module(char *name_user, int which, char *buf, compat_size_t bufsize, compat_size_t *ret)
{
struct module *mod;
int err;
@@ -1776,14 +1653,14 @@
u32 msg_name;
int msg_namelen;
u32 msg_iov;
- __kernel_size_t32 msg_iovlen;
+ compat_size_t msg_iovlen;
u32 msg_control;
- __kernel_size_t32 msg_controllen;
+ compat_size_t msg_controllen;
unsigned msg_flags;
};

struct cmsghdr32 {
- __kernel_size_t32 cmsg_len;
+ compat_size_t cmsg_len;
int cmsg_level;
int cmsg_type;
};
@@ -1917,7 +1794,7 @@
{
struct cmsghdr32 *ucmsg;
struct cmsghdr *kcmsg, *kcmsg_base;
- __kernel_size_t32 ucmlen;
+ compat_size_t ucmlen;
__kernel_size_t kcmlen, tmp;

kcmlen = 0;
@@ -2283,7 +2160,7 @@
err = move_addr_to_user(addr, kern_msg.msg_namelen, uaddr, uaddr_len);
if(cmsg_ptr != 0 && err >= 0) {
unsigned long ucmsg_ptr = ((unsigned long)kern_msg.msg_control);
- __kernel_size_t32 uclen = (__kernel_size_t32) (ucmsg_ptr - cmsg_ptr);
+ compat_size_t uclen = (compat_size_t) (ucmsg_ptr - cmsg_ptr);
err |= __put_user(uclen, &user_msg->msg_controllen);
}
if(err >= 0)
@@ -2590,7 +2467,7 @@
#define DIVIDE_ROUND_UP(x,y) (((x)+(y)-1)/(y))

asmlinkage long
-sys32_select(int n, u32 *inp, u32 *outp, u32 *exp, struct timeval32 *tvp)
+sys32_select(int n, u32 *inp, u32 *outp, u32 *exp, struct compat_timeval *tvp)
{
fd_set_bits fds;
char *bits;
@@ -2599,7 +2476,7 @@

timeout = MAX_SCHEDULE_TIMEOUT;
if (tvp) {
- struct timeval32 tv32;
+ struct compat_timeval tv32;
time_t sec, usec;

if ((ret = copy_from_user(&tv32, tvp, sizeof tv32)))
@@ -2903,8 +2780,8 @@
__u32 dqb_ihardlimit;
__u32 dqb_isoftlimit;
__u32 dqb_curinodes;
- __kernel_time_t32 dqb_btime;
- __kernel_time_t32 dqb_itime;
+ compat_time_t dqb_btime;
+ compat_time_t dqb_itime;
};


@@ -2965,7 +2842,7 @@
int tolerance; /* clock frequency tolerance (ppm)
* (read only)
*/
- struct timeval32 time; /* (read only) */
+ struct compat_timeval time; /* (read only) */
int tick; /* (modified) usecs between clock ticks */

int ppsfreq; /* pps frequency (scaled ppm) (ro) */
diff -ruN 2.5.50-BK.2/include/asm-parisc/compat.h 2.5.50-BK.2-32bit.1/include/asm-parisc/compat.h
--- 2.5.50-BK.2/include/asm-parisc/compat.h 1970-01-01 10:00:00.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-parisc/compat.h 2002-12-04 15:14:16.000000000 +1100
@@ -0,0 +1,18 @@
+#ifndef _ASM_PARISC_COMPAT_H
+#define _ASM_PARISC_COMPAT_H
+/*
+ * Architecture specific compatibility types
+ */
+#include <linux/types.h>
+
+typedef u32 compat_size_t;
+typedef s32 compat_ssize_t;
+typedef s32 compat_time_t;
+typedef s32 compat_suseconds_t;
+
+struct compat_timespec {
+ compat_time_t tv_sec;
+ s32 tv_nsec;
+};
+
+#endif /* _ASM_PARISC_COMPAT_H */
diff -ruN 2.5.50-BK.2/include/asm-parisc/posix_types.h 2.5.50-BK.2-32bit.1/include/asm-parisc/posix_types.h
--- 2.5.50-BK.2/include/asm-parisc/posix_types.h 2002-10-31 14:06:07.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/include/asm-parisc/posix_types.h 2002-12-04 14:46:06.000000000 +1100
@@ -66,10 +66,7 @@
typedef unsigned short __kernel_ipc_pid_t32;
typedef unsigned int __kernel_uid_t32;
typedef unsigned int __kernel_gid_t32;
-typedef unsigned int __kernel_size_t32;
-typedef int __kernel_ssize_t32;
typedef int __kernel_ptrdiff_t32;
-typedef int __kernel_time_t32;
typedef int __kernel_suseconds_t32;
typedef int __kernel_clock_t32;
typedef int __kernel_daddr_t32;
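
The conversion that put_compat_timespec() performs in the patch above can be tried outside the kernel. The following is only a user-space sketch: `native_timespec`, `narrow_timespec` and `widen_timespec` are hypothetical names standing in for the 64-bit kernel `struct timespec` and its helpers, and the copy_to_user() step is left out entirely. The compat typedefs mirror the asm-parisc/compat.h added by the patch.

```c
#include <stdint.h>

/* Mirrors the typedefs in the new asm-parisc/compat.h */
typedef int32_t compat_time_t;

struct compat_timespec {
	compat_time_t tv_sec;
	int32_t tv_nsec;
};

/* Hypothetical stand-in for the native 64-bit struct timespec */
struct native_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Narrow a native value to the 32-bit layout, field by field,
 * as put_compat_timespec() does before copying out to user space. */
static struct compat_timespec narrow_timespec(const struct native_timespec *t)
{
	struct compat_timespec c;

	c.tv_sec = (compat_time_t)t->tv_sec;
	c.tv_nsec = (int32_t)t->tv_nsec;
	return c;
}

/* Widen a 32-bit value back to the native layout (sign-extending). */
static struct native_timespec widen_timespec(const struct compat_timespec *c)
{
	struct native_timespec t;

	t.tv_sec = c->tv_sec;
	t.tv_nsec = c->tv_nsec;
	return t;
}
```

Because both compat fields are fixed-width, the structure is 8 bytes regardless of the host word size, which is the property the 32-bit user-space ABI relies on.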

2002-12-04 07:20:55

by Stephen Rothwell

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Hi Ralf, Linus,

This is the MIPS64 specific patch.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

diff -ruN 2.5.50-BK.2/arch/mips64/Kconfig 2.5.50-BK.2-32bit.1/arch/mips64/Kconfig
--- 2.5.50-BK.2/arch/mips64/Kconfig 2002-11-28 10:35:37.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/mips64/Kconfig 2002-12-03 16:53:57.000000000 +1100
@@ -371,6 +371,11 @@
compatibility. Since all software available for Linux/MIPS is
currently 32-bit you should say Y here.

+config COMPAT
+ bool
+ depends on MIPS32_COMPAT
+ default y
+
config BINFMT_ELF32
bool
depends on MIPS32_COMPAT
diff -ruN 2.5.50-BK.2/arch/mips64/kernel/ioctl32.c 2.5.50-BK.2-32bit.1/arch/mips64/kernel/ioctl32.c
--- 2.5.50-BK.2/arch/mips64/kernel/ioctl32.c 2002-11-05 10:50:55.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/mips64/kernel/ioctl32.c 2002-12-04 15:21:53.000000000 +1100
@@ -9,6 +9,7 @@
*/
#include <linux/config.h>
#include <linux/types.h>
+#include <linux/compat.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/sched.h>
@@ -65,14 +66,9 @@

#define A(__x) ((unsigned long)(__x))

-struct timeval32 {
- int tv_sec;
- int tv_usec;
-};
-
static int do_siocgstamp(unsigned int fd, unsigned int cmd, unsigned long arg)
{
- struct timeval32 *up = (struct timeval32 *)arg;
+ struct compat_timeval *up = (struct compat_timeval *)arg;
struct timeval ktv;
mm_segment_t old_fs = get_fs();
int err;
diff -ruN 2.5.50-BK.2/arch/mips64/kernel/linux32.c 2.5.50-BK.2-32bit.1/arch/mips64/kernel/linux32.c
--- 2.5.50-BK.2/arch/mips64/kernel/linux32.c 2002-10-21 01:02:44.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/mips64/kernel/linux32.c 2002-12-04 16:17:27.000000000 +1100
@@ -22,11 +22,11 @@
#include <linux/sem.h>
#include <linux/msg.h>
#include <linux/sysctl.h>
-#include <linux/utime.h>
#include <linux/utsname.h>
#include <linux/personality.h>
#include <linux/timex.h>
#include <linux/dnotify.h>
+#include <linux/compat.h>
#include <net/sock.h>

#include <asm/uaccess.h>
@@ -116,36 +116,6 @@
return sys_ftruncate(fd, ((long) high << 32) | low);
}

-extern asmlinkage int sys_utime(char * filename, struct utimbuf * times);
-
-struct utimbuf32 {
- __kernel_time_t32 actime, modtime;
-};
-
-asmlinkage int sys32_utime(char * filename, struct utimbuf32 *times)
-{
- struct utimbuf t;
- mm_segment_t old_fs;
- int ret;
- char *filenam;
-
- if (!times)
- return sys_utime(filename, NULL);
- if (get_user (t.actime, &times->actime) ||
- __get_user (t.modtime, &times->modtime))
- return -EFAULT;
- filenam = getname (filename);
- ret = PTR_ERR(filenam);
- if (!IS_ERR(filenam)) {
- old_fs = get_fs();
- set_fs (KERNEL_DS);
- ret = sys_utime(filenam, &t);
- set_fs (old_fs);
- putname (filenam);
- }
- return ret;
-}
-
#if 0
/*
* count32() counts the number of arguments/envelopes
@@ -463,20 +433,9 @@
return(n);
}

-struct timeval32
-{
- int tv_sec, tv_usec;
-};
-
-struct itimerval32
-{
- struct timeval32 it_interval;
- struct timeval32 it_value;
-};
-
struct rusage32 {
- struct timeval32 ru_utime;
- struct timeval32 ru_stime;
+ struct compat_timeval ru_utime;
+ struct compat_timeval ru_stime;
int ru_maxrss;
int ru_ixrss;
int ru_idrss;
@@ -683,7 +642,7 @@
}

static inline long
-get_tv32(struct timeval *o, struct timeval32 *i)
+get_tv32(struct timeval *o, struct compat_timeval *i)
{
return (!access_ok(VERIFY_READ, i, sizeof(*i)) ||
(__get_user(o->tv_sec, &i->tv_sec) |
@@ -691,72 +650,13 @@
}

static inline long
-get_it32(struct itimerval *o, struct itimerval32 *i)
-{
- return (!access_ok(VERIFY_READ, i, sizeof(*i)) ||
- (__get_user(o->it_interval.tv_sec, &i->it_interval.tv_sec) |
- __get_user(o->it_interval.tv_usec, &i->it_interval.tv_usec) |
- __get_user(o->it_value.tv_sec, &i->it_value.tv_sec) |
- __get_user(o->it_value.tv_usec, &i->it_value.tv_usec)));
-}
-
-static inline long
-put_tv32(struct timeval32 *o, struct timeval *i)
+put_tv32(struct compat_timeval *o, struct timeval *i)
{
return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
(__put_user(i->tv_sec, &o->tv_sec) |
__put_user(i->tv_usec, &o->tv_usec)));
}

-static inline long
-put_it32(struct itimerval32 *o, struct itimerval *i)
-{
- return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
- (__put_user(i->it_interval.tv_sec, &o->it_interval.tv_sec) |
- __put_user(i->it_interval.tv_usec, &o->it_interval.tv_usec) |
- __put_user(i->it_value.tv_sec, &o->it_value.tv_sec) |
- __put_user(i->it_value.tv_usec, &o->it_value.tv_usec)));
-}
-
-extern int do_getitimer(int which, struct itimerval *value);
-
-asmlinkage int
-sys32_getitimer(int which, struct itimerval32 *it)
-{
- struct itimerval kit;
- int error;
-
- error = do_getitimer(which, &kit);
- if (!error && put_it32(it, &kit))
- error = -EFAULT;
-
- return error;
-}
-
-extern int do_setitimer(int which, struct itimerval *, struct itimerval *);
-
-
-asmlinkage int
-sys32_setitimer(int which, struct itimerval32 *in, struct itimerval32 *out)
-{
- struct itimerval kin, kout;
- int error;
-
- if (in) {
- if (get_it32(&kin, in))
- return -EFAULT;
- } else
- memset(&kin, 0, sizeof(kin));
-
- error = do_setitimer(which, &kin, out ? &kout : NULL);
- if (error || !out)
- return error;
- if (put_it32(out, &kout))
- return -EFAULT;
-
- return 0;
-
-}
asmlinkage unsigned long
sys32_alarm(unsigned int seconds)
{
@@ -784,7 +684,7 @@
extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);

asmlinkage int
-sys32_gettimeofday(struct timeval32 *tv, struct timezone *tz)
+sys32_gettimeofday(struct compat_timeval *tv, struct timezone *tz)
{
if (tv) {
struct timeval ktv;
@@ -800,7 +700,7 @@
}

asmlinkage int
-sys32_settimeofday(struct timeval32 *tv, struct timezone *tz)
+sys32_settimeofday(struct compat_timeval *tv, struct timezone *tz)
{
struct timeval ktv;
struct timezone ktz;
@@ -1112,7 +1012,7 @@
#define MAX_SELECT_SECONDS \
((unsigned long) (MAX_SCHEDULE_TIMEOUT / HZ)-1)

-asmlinkage int sys32_select(int n, u32 *inp, u32 *outp, u32 *exp, struct timeval32 *tvp)
+asmlinkage int sys32_select(int n, u32 *inp, u32 *outp, u32 *exp, struct compat_timeval *tvp)
{
fd_set_bits fds;
char *bits;
@@ -1205,16 +1105,11 @@



-struct timespec32 {
- int tv_sec;
- int tv_nsec;
-};
-
extern asmlinkage int sys_sched_rr_get_interval(pid_t pid,
struct timespec *interval);

asmlinkage int
-sys32_sched_rr_get_interval(__kernel_pid_t32 pid, struct timespec32 *interval)
+sys32_sched_rr_get_interval(__kernel_pid_t32 pid, struct compat_timespec *interval)
{
struct timespec t;
int ret;
@@ -1230,31 +1125,6 @@
}


-extern asmlinkage int sys_nanosleep(struct timespec *rqtp,
- struct timespec *rmtp);
-
-asmlinkage int
-sys32_nanosleep(struct timespec32 *rqtp, struct timespec32 *rmtp)
-{
- struct timespec t;
- int ret;
- mm_segment_t old_fs = get_fs ();
-
- if (get_user (t.tv_sec, &rqtp->tv_sec) ||
- __get_user (t.tv_nsec, &rqtp->tv_nsec))
- return -EFAULT;
-
- set_fs (KERNEL_DS);
- ret = sys_nanosleep(&t, rmtp ? &t : NULL);
- set_fs (old_fs);
- if (rmtp && ret == -EINTR) {
- if (__put_user (t.tv_sec, &rmtp->tv_sec) ||
- __put_user (t.tv_nsec, &rmtp->tv_nsec))
- return -EFAULT;
- }
- return ret;
-}
-
struct tms32 {
int tms_utime;
int tms_stime;
@@ -1418,8 +1288,8 @@

struct semid_ds32 {
struct ipc_perm32 sem_perm; /* permissions .. see ipc.h */
- __kernel_time_t32 sem_otime; /* last semop time */
- __kernel_time_t32 sem_ctime; /* last change time */
+ compat_time_t sem_otime; /* last semop time */
+ compat_time_t sem_ctime; /* last change time */
u32 sem_base; /* ptr to first semaphore in array */
u32 sem_pending; /* pending operations to be processed */
u32 sem_pending_last; /* last pending operation */
@@ -1432,9 +1302,9 @@
struct ipc_perm32 msg_perm;
u32 msg_first;
u32 msg_last;
- __kernel_time_t32 msg_stime;
- __kernel_time_t32 msg_rtime;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_stime;
+ compat_time_t msg_rtime;
+ compat_time_t msg_ctime;
u32 wwait;
u32 rwait;
unsigned short msg_cbytes;
@@ -1447,9 +1317,9 @@
struct shmid_ds32 {
struct ipc_perm32 shm_perm;
int shm_segsz;
- __kernel_time_t32 shm_atime;
- __kernel_time_t32 shm_dtime;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_atime;
+ compat_time_t shm_dtime;
+ compat_time_t shm_ctime;
__kernel_ipc_pid_t32 shm_cpid;
__kernel_ipc_pid_t32 shm_lpid;
unsigned short shm_nattch;
@@ -1819,7 +1689,7 @@
__kernel_caddr_t32 oldval;
__kernel_caddr_t32 oldlenp;
__kernel_caddr_t32 newval;
- __kernel_size_t32 newlen;
+ compat_size_t newlen;
unsigned int __unused[4];
};

@@ -1935,7 +1805,7 @@
u32 modes;
s32 offset, freq, maxerror, esterror;
s32 status, constant, precision, tolerance;
- struct timeval32 time;
+ struct compat_timeval time;
s32 tick;
s32 ppsfreq, jitter, shift, stabil;
s32 jitcnt, calcnt, errcnt, stbcnt;
diff -ruN 2.5.50-BK.2/arch/mips64/kernel/scall_o32.S 2.5.50-BK.2-32bit.1/arch/mips64/kernel/scall_o32.S
--- 2.5.50-BK.2/arch/mips64/kernel/scall_o32.S 2002-02-11 15:12:25.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/mips64/kernel/scall_o32.S 2002-12-04 17:40:22.000000000 +1100
@@ -263,7 +263,7 @@
sys sys32_alarm 1
sys sys_fstat 2
sys sys_pause 0
- sys sys32_utime 2 /* 4030 */
+ sys compat_sys_utime 2 /* 4030 */
sys sys_ni_syscall 0
sys sys_ni_syscall 0
sys sys_access 2
@@ -337,8 +337,8 @@
sys sys_ni_syscall 0 /* sys_ioperm */
sys sys_socketcall 2
sys sys_syslog 3
- sys sys32_setitimer 3
- sys sys32_getitimer 2 /* 4105 */
+ sys compat_sys_setitimer 3
+ sys compat_sys_getitimer 2 /* 4105 */
sys sys32_newstat 2
sys sys32_newlstat 2
sys sys32_newfstat 2
@@ -399,7 +399,7 @@
sys sys_sched_get_priority_max 1
sys sys_sched_get_priority_min 1
sys sys32_sched_rr_get_interval 2 /* 4165 */
- sys sys32_nanosleep 2
+ sys compat_sys_nanosleep 2
sys sys_mremap 4
sys sys_accept 3
sys sys_bind 3
diff -ruN 2.5.50-BK.2/arch/mips64/kernel/signal32.c 2.5.50-BK.2-32bit.1/arch/mips64/kernel/signal32.c
--- 2.5.50-BK.2/arch/mips64/kernel/signal32.c 2002-05-30 05:12:21.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/mips64/kernel/signal32.c 2002-12-04 14:33:00.000000000 +1100
@@ -17,6 +17,7 @@
#include <linux/wait.h>
#include <linux/ptrace.h>
#include <linux/unistd.h>
+#include <linux/compat.h>

#include <asm/asm.h>
#include <asm/bitops.h>
@@ -59,7 +60,7 @@
/* IRIX compatible stack_t */
typedef struct sigaltstack32 {
s32 ss_sp;
- __kernel_size_t32 ss_size;
+ compat_size_t ss_size;
int ss_flags;
} stack32_t;

diff -ruN 2.5.50-BK.2/include/asm-mips64/compat.h 2.5.50-BK.2-32bit.1/include/asm-mips64/compat.h
--- 2.5.50-BK.2/include/asm-mips64/compat.h 1970-01-01 10:00:00.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-mips64/compat.h 2002-12-04 15:12:07.000000000 +1100
@@ -0,0 +1,18 @@
+#ifndef _ASM_MIPS64_COMPAT_H
+#define _ASM_MIPS64_COMPAT_H
+/*
+ * Architecture specific compatibility types
+ */
+#include <linux/types.h>
+
+typedef u32 compat_size_t;
+typedef s32 compat_ssize_t;
+typedef s32 compat_time_t;
+typedef s32 compat_suseconds_t;
+
+struct compat_timespec {
+ compat_time_t tv_sec;
+ s32 tv_nsec;
+};
+
+#endif /* _ASM_MIPS64_COMPAT_H */
diff -ruN 2.5.50-BK.2/include/asm-mips64/posix_types.h 2.5.50-BK.2-32bit.1/include/asm-mips64/posix_types.h
--- 2.5.50-BK.2/include/asm-mips64/posix_types.h 2000-07-10 15:18:15.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-mips64/posix_types.h 2002-12-04 14:46:01.000000000 +1100
@@ -58,10 +58,7 @@
typedef int __kernel_ipc_pid_t32;
typedef int __kernel_uid_t32;
typedef int __kernel_gid_t32;
-typedef unsigned int __kernel_size_t32;
-typedef int __kernel_ssize_t32;
typedef int __kernel_ptrdiff_t32;
-typedef int __kernel_time_t32;
typedef int __kernel_suseconds_t32;
typedef int __kernel_clock_t32;
typedef int __kernel_daddr_t32;
diff -ruN 2.5.50-BK.2/include/asm-mips64/stat.h 2.5.50-BK.2-32bit.1/include/asm-mips64/stat.h
--- 2.5.50-BK.2/include/asm-mips64/stat.h 2002-11-18 15:47:55.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/include/asm-mips64/stat.h 2002-12-03 17:05:07.000000000 +1100
@@ -10,6 +10,7 @@
#define _ASM_STAT_H

#include <linux/types.h>
+#include <linux/compat.h>

struct __old_kernel_stat {
unsigned int st_dev;
@@ -40,11 +41,11 @@
int st_pad2[2];
__kernel_off_t32 st_size;
int st_pad3;
- __kernel_time_t32 st_atime;
+ compat_time_t st_atime;
int reserved0;
- __kernel_time_t32 st_mtime;
+ compat_time_t st_mtime;
int reserved1;
- __kernel_time_t32 st_ctime;
+ compat_time_t st_ctime;
int reserved2;
int st_blksize;
int st_blocks;
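
The sys32_utime() removed above had to switch to KERNEL_DS so that sys_utime() would accept a kernel buffer. The generic patch splits sys_utimes() into a copy-in wrapper and do_utimes(), which takes kernel pointers, so a compat entry point can presumably convert its arguments and call the helper directly. A minimal user-space sketch of that shape follows; every name ending in `_sketch` is hypothetical, and the real compat_sys_utime() in kernel/compat.c is not shown in this mail and may differ.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t compat_time_t;

/* Hypothetical 32-bit utimbuf layout */
struct compat_utimbuf_sketch {
	compat_time_t actime;
	compat_time_t modtime;
};

/* Hypothetical stand-in for the native struct timeval */
struct native_timeval_sketch {
	int64_t tv_sec;
	int64_t tv_usec;
};

/* Records what the helper received so the behaviour can be checked. */
static struct native_timeval_sketch last_times[2];
static int last_times_was_null;

/* Stand-in for do_utimes(): takes already-converted kernel data. */
static long do_utimes_sketch(const char *filename,
			     const struct native_timeval_sketch *times)
{
	(void)filename;
	last_times_was_null = (times == NULL);
	if (times) {
		last_times[0] = times[0];
		last_times[1] = times[1];
	}
	return 0;
}

/* Compat wrapper: widen the 32-bit fields, then call the helper.
 * No address-space switching is needed because the helper never
 * touches user memory. */
static long compat_utime_sketch(const char *filename,
				const struct compat_utimbuf_sketch *t)
{
	struct native_timeval_sketch tv[2];

	if (t) {
		tv[0].tv_sec = t->actime;
		tv[0].tv_usec = 0;
		tv[1].tv_sec = t->modtime;
		tv[1].tv_usec = 0;
	}
	return do_utimes_sketch(filename, t ? tv : NULL);
}
```

A NULL argument is passed through unchanged, matching the "set to current time" case that both the old and new code preserve.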

2002-12-04 07:27:12

by Stephen Rothwell

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

On Wed, 4 Dec 2002 18:02:24 +1100 Stephen Rothwell <[email protected]> wrote:
>
> Below is the generic part of the start of the compatibility syscall layer.
> I think I have made it generic enough that each architecture can define
> what compatibility means.

Across all the architectures, the diffstat of the patch so far looks
like this:

arch/ia64/Kconfig | 5
arch/ia64/ia32/ia32_entry.S | 8 -
arch/ia64/ia32/ia32_signal.c | 5
arch/ia64/ia32/sys_ia32.c | 186 ++++----------------------
arch/mips64/Kconfig | 5
arch/mips64/kernel/ioctl32.c | 8 -
arch/mips64/kernel/linux32.c | 168 ++---------------------
arch/mips64/kernel/scall_o32.S | 8 -
arch/mips64/kernel/signal32.c | 3
arch/parisc/Kconfig | 5
arch/parisc/kernel/binfmt_elf32.c | 10 -
arch/parisc/kernel/ioctl32.c | 7
arch/parisc/kernel/signal32.c | 3
arch/parisc/kernel/sys32.h | 5
arch/parisc/kernel/sys_parisc32.c | 195 +++++----------------------
arch/ppc64/Kconfig | 4
arch/ppc64/kernel/binfmt_elf32.c | 18 --
arch/ppc64/kernel/ioctl32.c | 12 -
arch/ppc64/kernel/misc.S | 8 -
arch/ppc64/kernel/signal32.c | 11 -
arch/ppc64/kernel/sys32.S | 4
arch/ppc64/kernel/sys_ppc32.c | 262 ++++++++-----------------------------
arch/s390x/Kconfig | 5
arch/s390x/kernel/binfmt_elf32.c | 14 -
arch/s390x/kernel/entry.S | 8 -
arch/s390x/kernel/ioctl32.c | 5
arch/s390x/kernel/linux32.c | 240 ++++++++-------------------------
arch/s390x/kernel/linux32.h | 6
arch/s390x/kernel/wrapper32.S | 34 ++--
arch/sparc64/Kconfig | 5
arch/sparc64/kernel/binfmt_elf32.c | 18 --
arch/sparc64/kernel/ioctl32.c | 12 -
arch/sparc64/kernel/signal32.c | 3
arch/sparc64/kernel/sys32.S | 4
arch/sparc64/kernel/sys_sparc32.c | 225 +++++++------------------------
arch/sparc64/kernel/sys_sunos32.c | 20 +-
arch/sparc64/kernel/systbls.S | 12 -
arch/sparc64/solaris/misc.c | 7
arch/sparc64/solaris/socket.c | 5
arch/x86_64/Kconfig | 5
arch/x86_64/ia32/ia32_binfmt.c | 14 -
arch/x86_64/ia32/ia32_ioctl.c | 12 -
arch/x86_64/ia32/ia32entry.S | 8 -
arch/x86_64/ia32/ipc32.c | 35 ++--
arch/x86_64/ia32/socket32.c | 9 -
arch/x86_64/ia32/sys_ia32.c | 214 ++----------------------------
fs/open.c | 18 +-
include/asm-ia64/compat.h | 19 ++
include/asm-ia64/ia32.h | 8 -
include/asm-mips64/compat.h | 18 ++
include/asm-mips64/posix_types.h | 3
include/asm-mips64/stat.h | 7
include/asm-parisc/compat.h | 18 ++
include/asm-parisc/posix_types.h | 3
include/asm-ppc64/compat.h | 18 ++
include/asm-ppc64/ppc32.h | 12 -
include/asm-s390x/compat.h | 18 ++
include/asm-sparc64/compat.h | 18 ++
include/asm-sparc64/posix_types.h | 3
include/asm-sparc64/signal.h | 3
include/asm-sparc64/stat.h | 7
include/asm-x86_64/compat.h | 18 ++
include/asm-x86_64/ia32.h | 3
include/asm-x86_64/socket32.h | 8 -
include/linux/compat.h | 29 ++++
include/linux/time.h | 2
kernel/Makefile | 1
kernel/compat.c | 114 ++++++++++++++++
kernel/timer.c | 32 ++--
69 files changed, 777 insertions(+), 1463 deletions(-)

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

2002-12-04 07:23:48

by Stephen Rothwell

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Hi Martin, Linus,

This is the S390x specific patch.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

diff -ruN 2.5.50-BK.2/arch/s390x/Kconfig 2.5.50-BK.2-32bit.1/arch/s390x/Kconfig
--- 2.5.50-BK.2/arch/s390x/Kconfig 2002-11-28 10:34:42.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/s390x/Kconfig 2002-12-03 16:58:25.000000000 +1100
@@ -92,6 +92,11 @@
(and some other stuff like libraries and such) is needed for
executing 31 bit applications. It is safe to say "Y".

+config COMPAT
+ bool
+ depends on S390_SUPPORT
+ default y
+
config BINFMT_ELF32
tristate "Kernel support for 31 bit ELF binaries"
depends on S390_SUPPORT
diff -ruN 2.5.50-BK.2/arch/s390x/kernel/binfmt_elf32.c 2.5.50-BK.2-32bit.1/arch/s390x/kernel/binfmt_elf32.c
--- 2.5.50-BK.2/arch/s390x/kernel/binfmt_elf32.c 2002-09-01 12:00:02.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/s390x/kernel/binfmt_elf32.c 2002-12-04 15:27:08.000000000 +1100
@@ -115,14 +115,10 @@
#include <linux/config.h>
#include <linux/elfcore.h>
#include <linux/binfmts.h>
+#include <linux/compat.h>

int setup_arg_pages32(struct linux_binprm *bprm);

-struct timeval32
-{
- int tv_sec, tv_usec;
-};
-
#define elf_prstatus elf_prstatus32
struct elf_prstatus32
{
@@ -134,10 +130,10 @@
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
- struct timeval32 pr_utime; /* User time */
- struct timeval32 pr_stime; /* System time */
- struct timeval32 pr_cutime; /* Cumulative user time */
- struct timeval32 pr_cstime; /* Cumulative system time */
+ struct compat_timeval pr_utime; /* User time */
+ struct compat_timeval pr_stime; /* System time */
+ struct compat_timeval pr_cutime; /* Cumulative user time */
+ struct compat_timeval pr_cstime; /* Cumulative system time */
elf_gregset_t pr_reg; /* GP registers */
int pr_fpvalid; /* True if math co-processor being used. */
};
diff -ruN 2.5.50-BK.2/arch/s390x/kernel/entry.S 2.5.50-BK.2-32bit.1/arch/s390x/kernel/entry.S
--- 2.5.50-BK.2/arch/s390x/kernel/entry.S 2002-11-28 10:35:37.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/s390x/kernel/entry.S 2002-12-04 17:40:35.000000000 +1100
@@ -406,7 +406,7 @@
.long SYSCALL(sys_alarm,sys32_alarm_wrapper)
.long SYSCALL(sys_ni_syscall,sys_ni_syscall) /* old fstat syscall */
.long SYSCALL(sys_pause,sys32_pause)
- .long SYSCALL(sys_utime,sys32_utime_wrapper) /* 30 */
+ .long SYSCALL(sys_utime,compat_sys_utime_wrapper) /* 30 */
.long SYSCALL(sys_ni_syscall,sys_ni_syscall) /* old stty syscall */
.long SYSCALL(sys_ni_syscall,sys_ni_syscall) /* old gtty syscall */
.long SYSCALL(sys_access,sys32_access_wrapper)
@@ -480,8 +480,8 @@
.long SYSCALL(sys_ni_syscall,sys_ni_syscall)
.long SYSCALL(sys_socketcall,sys32_socketcall_wrapper)
.long SYSCALL(sys_syslog,sys32_syslog_wrapper)
- .long SYSCALL(sys_setitimer,sys32_setitimer_wrapper)
- .long SYSCALL(sys_getitimer,sys32_getitimer_wrapper) /* 105 */
+ .long SYSCALL(sys_setitimer,compat_sys_setitimer_wrapper)
+ .long SYSCALL(sys_getitimer,compat_sys_getitimer_wrapper) /* 105 */
.long SYSCALL(sys_newstat,sys32_newstat_wrapper)
.long SYSCALL(sys_newlstat,sys32_newlstat_wrapper)
.long SYSCALL(sys_newfstat,sys32_newfstat_wrapper)
@@ -538,7 +538,7 @@
.long SYSCALL(sys_sched_get_priority_max,sys32_sched_get_priority_max_wrapper)
.long SYSCALL(sys_sched_get_priority_min,sys32_sched_get_priority_min_wrapper)
.long SYSCALL(sys_sched_rr_get_interval,sys32_sched_rr_get_interval_wrapper)
- .long SYSCALL(sys_nanosleep,sys32_nanosleep_wrapper)
+ .long SYSCALL(sys_nanosleep,compat_sys_nanosleep_wrapper)
.long SYSCALL(sys_mremap,sys32_mremap_wrapper)
.long SYSCALL(sys_ni_syscall,sys32_setresuid16_wrapper) /* old setresuid16 syscall */
.long SYSCALL(sys_ni_syscall,sys32_getresuid16_wrapper) /* old getresuid16 syscall */
diff -ruN 2.5.50-BK.2/arch/s390x/kernel/ioctl32.c 2.5.50-BK.2-32bit.1/arch/s390x/kernel/ioctl32.c
--- 2.5.50-BK.2/arch/s390x/kernel/ioctl32.c 2002-11-28 10:34:42.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/s390x/kernel/ioctl32.c 2002-12-04 15:26:13.000000000 +1100
@@ -70,11 +70,6 @@
return ret;
}

-struct timeval32 {
- int tv_sec;
- int tv_usec;
-};
-
#define EXT2_IOC32_GETFLAGS _IOR('f', 1, int)
#define EXT2_IOC32_SETFLAGS _IOW('f', 2, int)
#define EXT2_IOC32_GETVERSION _IOR('v', 1, int)
diff -ruN 2.5.50-BK.2/arch/s390x/kernel/linux32.c 2.5.50-BK.2-32bit.1/arch/s390x/kernel/linux32.c
--- 2.5.50-BK.2/arch/s390x/kernel/linux32.c 2002-11-28 10:34:42.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/s390x/kernel/linux32.c 2002-12-04 16:23:19.000000000 +1100
@@ -22,7 +22,6 @@
#include <linux/mm.h>
#include <linux/file.h>
#include <linux/signal.h>
-#include <linux/utime.h>
#include <linux/resource.h>
#include <linux/times.h>
#include <linux/utsname.h>
@@ -57,6 +56,7 @@
#include <linux/icmpv6.h>
#include <linux/sysctl.h>
#include <linux/binfmts.h>
+#include <linux/compat.h>

#include <asm/types.h>
#include <asm/ipc.h>
@@ -245,49 +245,20 @@

/* 32-bit timeval and related flotsam. */

-struct timeval32
-{
- int tv_sec, tv_usec;
-};
-
-struct itimerval32
-{
- struct timeval32 it_interval;
- struct timeval32 it_value;
-};
-
-static inline long get_tv32(struct timeval *o, struct timeval32 *i)
+static inline long get_tv32(struct timeval *o, struct compat_timeval *i)
{
return (!access_ok(VERIFY_READ, tv32, sizeof(*tv32)) ||
(__get_user(o->tv_sec, &i->tv_sec) |
__get_user(o->tv_usec, &i->tv_usec)));
}

-static inline long put_tv32(struct timeval32 *o, struct timeval *i)
+static inline long put_tv32(struct compat_timeval *o, struct timeval *i)
{
return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
(__put_user(i->tv_sec, &o->tv_sec) |
__put_user(i->tv_usec, &o->tv_usec)));
}

-static inline long get_it32(struct itimerval *o, struct itimerval32 *i)
-{
- return (!access_ok(VERIFY_READ, i32, sizeof(*i32)) ||
- (__get_user(o->it_interval.tv_sec, &i->it_interval.tv_sec) |
- __get_user(o->it_interval.tv_usec, &i->it_interval.tv_usec) |
- __get_user(o->it_value.tv_sec, &i->it_value.tv_sec) |
- __get_user(o->it_value.tv_usec, &i->it_value.tv_usec)));
-}
-
-static inline long put_it32(struct itimerval32 *o, struct itimerval *i)
-{
- return (!access_ok(VERIFY_WRITE, i32, sizeof(*i32)) ||
- (__put_user(i->it_interval.tv_sec, &o->it_interval.tv_sec) |
- __put_user(i->it_interval.tv_usec, &o->it_interval.tv_usec) |
- __put_user(i->it_value.tv_sec, &o->it_value.tv_sec) |
- __put_user(i->it_value.tv_usec, &o->it_value.tv_usec)));
-}
-
struct msgbuf32 { s32 mtype; char mtext[1]; };

struct ipc64_perm_ds32
@@ -318,8 +289,8 @@

struct semid_ds32 {
struct ipc_perm32 sem_perm; /* permissions .. see ipc.h */
- __kernel_time_t32 sem_otime; /* last semop time */
- __kernel_time_t32 sem_ctime; /* last change time */
+ compat_time_t sem_otime; /* last semop time */
+ compat_time_t sem_ctime; /* last change time */
u32 sem_base; /* ptr to first semaphore in array */
u32 sem_pending; /* pending operations to be processed */
u32 sem_pending_last; /* last pending operation */
@@ -330,9 +301,9 @@
struct semid64_ds32 {
struct ipc64_perm_ds32 sem_perm;
unsigned int __pad1;
- __kernel_time_t32 sem_otime;
+ compat_time_t sem_otime;
unsigned int __pad2;
- __kernel_time_t32 sem_ctime;
+ compat_time_t sem_ctime;
u32 sem_nsems;
u32 __unused1;
u32 __unused2;
@@ -343,9 +314,9 @@
struct ipc_perm32 msg_perm;
u32 msg_first;
u32 msg_last;
- __kernel_time_t32 msg_stime;
- __kernel_time_t32 msg_rtime;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_stime;
+ compat_time_t msg_rtime;
+ compat_time_t msg_ctime;
u32 wwait;
u32 rwait;
unsigned short msg_cbytes;
@@ -358,11 +329,11 @@
struct msqid64_ds32 {
struct ipc64_perm_ds32 msg_perm;
unsigned int __pad1;
- __kernel_time_t32 msg_stime;
+ compat_time_t msg_stime;
unsigned int __pad2;
- __kernel_time_t32 msg_rtime;
+ compat_time_t msg_rtime;
unsigned int __pad3;
- __kernel_time_t32 msg_ctime;
+ compat_time_t msg_ctime;
unsigned int msg_cbytes;
unsigned int msg_qnum;
unsigned int msg_qbytes;
@@ -376,9 +347,9 @@
struct shmid_ds32 {
struct ipc_perm32 shm_perm;
int shm_segsz;
- __kernel_time_t32 shm_atime;
- __kernel_time_t32 shm_dtime;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_atime;
+ compat_time_t shm_dtime;
+ compat_time_t shm_ctime;
__kernel_ipc_pid_t32 shm_cpid;
__kernel_ipc_pid_t32 shm_lpid;
unsigned short shm_nattch;
@@ -386,12 +357,12 @@

struct shmid64_ds32 {
struct ipc64_perm_ds32 shm_perm;
- __kernel_size_t32 shm_segsz;
- __kernel_time_t32 shm_atime;
+ compat_size_t shm_segsz;
+ compat_time_t shm_atime;
unsigned int __unused1;
- __kernel_time_t32 shm_dtime;
+ compat_time_t shm_dtime;
unsigned int __unused2;
- __kernel_time_t32 shm_ctime;
+ compat_time_t shm_ctime;
unsigned int __unused3;
__kernel_pid_t32 shm_cpid;
__kernel_pid_t32 shm_lpid;
@@ -1010,37 +981,7 @@
return sys_ftruncate(fd, (high << 32) | low);
}

-extern asmlinkage int sys_utime(char * filename, struct utimbuf * times);
-
-struct utimbuf32 {
- __kernel_time_t32 actime, modtime;
-};
-
-asmlinkage int sys32_utime(char * filename, struct utimbuf32 *times)
-{
- struct utimbuf t;
- mm_segment_t old_fs;
- int ret;
- char *filenam;
-
- if (!times)
- return sys_utime(filename, NULL);
- if (get_user (t.actime, &times->actime) ||
- __get_user (t.modtime, &times->modtime))
- return -EFAULT;
- filenam = getname (filename);
- ret = PTR_ERR(filenam);
- if (!IS_ERR(filenam)) {
- old_fs = get_fs();
- set_fs (KERNEL_DS);
- ret = sys_utime(filenam, &t);
- set_fs (old_fs);
- putname (filenam);
- }
- return ret;
-}
-
-struct iovec32 { u32 iov_base; __kernel_size_t32 iov_len; };
+struct iovec32 { u32 iov_base; compat_size_t iov_len; };

typedef ssize_t (*io_fn_t)(struct file *, char *, size_t, loff_t *);
typedef ssize_t (*iov_fn_t)(struct file *, const struct iovec *, unsigned long, loff_t *);
@@ -1363,7 +1304,7 @@
asmlinkage int sys32_select(int n, u32 *inp, u32 *outp, u32 *exp, u32 tvp_x)
{
fd_set_bits fds;
- struct timeval32 *tvp = (struct timeval32 *)AA(tvp_x);
+ struct compat_timeval *tvp = (struct compat_timeval *)AA(tvp_x);
char *bits;
unsigned long nn;
long timeout;
@@ -1671,8 +1612,8 @@
}

struct rusage32 {
- struct timeval32 ru_utime;
- struct timeval32 ru_stime;
+ struct compat_timeval ru_utime;
+ struct compat_timeval ru_stime;
s32 ru_maxrss;
s32 ru_ixrss;
s32 ru_idrss;
@@ -1774,14 +1715,10 @@
return ret;
}

-struct timespec32 {
- s32 tv_sec;
- s32 tv_nsec;
-};
-
extern asmlinkage int sys_sched_rr_get_interval(pid_t pid, struct timespec *interval);

-asmlinkage int sys32_sched_rr_get_interval(__kernel_pid_t32 pid, struct timespec32 *interval)
+asmlinkage int sys32_sched_rr_get_interval(__kernel_pid_t32 pid,
+ struct compat_timespec *interval)
{
struct timespec t;
int ret;
@@ -1796,28 +1733,6 @@
return ret;
}

-extern asmlinkage int sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp);
-
-asmlinkage int sys32_nanosleep(struct timespec32 *rqtp, struct timespec32 *rmtp)
-{
- struct timespec t;
- int ret;
- mm_segment_t old_fs = get_fs ();
-
- if (get_user (t.tv_sec, &rqtp->tv_sec) ||
- __get_user (t.tv_nsec, &rqtp->tv_nsec))
- return -EFAULT;
- set_fs (KERNEL_DS);
- ret = sys_nanosleep(&t, rmtp ? &t : NULL);
- set_fs (old_fs);
- if (rmtp && ret == -EINTR) {
- if (__put_user (t.tv_sec, &rmtp->tv_sec) ||
- __put_user (t.tv_nsec, &rmtp->tv_nsec))
- return -EFAULT;
- }
- return ret;
-}
-
extern asmlinkage int sys_sigprocmask(int how, old_sigset_t *set, old_sigset_t *oset);

asmlinkage int sys32_sigprocmask(int how, old_sigset_t32 *set, old_sigset_t32 *oset)
@@ -1837,7 +1752,7 @@

extern asmlinkage int sys_rt_sigprocmask(int how, sigset_t *set, sigset_t *oset, size_t sigsetsize);

-asmlinkage int sys32_rt_sigprocmask(int how, sigset_t32 *set, sigset_t32 *oset, __kernel_size_t32 sigsetsize)
+asmlinkage int sys32_rt_sigprocmask(int how, sigset_t32 *set, sigset_t32 *oset, compat_size_t sigsetsize)
{
sigset_t s;
sigset_t32 s32;
@@ -1888,7 +1803,7 @@

extern asmlinkage int sys_rt_sigpending(sigset_t *set, size_t sigsetsize);

-asmlinkage int sys32_rt_sigpending(sigset_t32 *set, __kernel_size_t32 sigsetsize)
+asmlinkage int sys32_rt_sigpending(sigset_t32 *set, compat_size_t sigsetsize)
{
sigset_t s;
sigset_t32 s32;
@@ -1916,7 +1831,7 @@

asmlinkage int
sys32_rt_sigtimedwait(sigset_t32 *uthese, siginfo_t32 *uinfo,
- struct timespec32 *uts, __kernel_size_t32 sigsetsize)
+ struct compat_timespec *uts, compat_size_t sigsetsize)
{
int ret, sig;
sigset_t these;
@@ -2136,14 +2051,14 @@
u32 msg_name;
int msg_namelen;
u32 msg_iov;
- __kernel_size_t32 msg_iovlen;
+ compat_size_t msg_iovlen;
u32 msg_control;
- __kernel_size_t32 msg_controllen;
+ compat_size_t msg_controllen;
unsigned msg_flags;
};

struct cmsghdr32 {
- __kernel_size_t32 cmsg_len;
+ compat_size_t cmsg_len;
int cmsg_level;
int cmsg_type;
};
@@ -2277,7 +2192,7 @@
{
struct cmsghdr32 *ucmsg;
struct cmsghdr *kcmsg, *kcmsg_base;
- __kernel_size_t32 ucmlen;
+ compat_size_t ucmlen;
__kernel_size_t kcmlen, tmp;

kcmlen = 0;
@@ -2498,12 +2413,12 @@
* from 64-bit time values to 32-bit time values
*/
case SO_TIMESTAMP: {
- __kernel_time_t32* ptr_time32 = CMSG32_DATA(kcmsg32);
+ compat_time_t* ptr_time32 = CMSG32_DATA(kcmsg32);
__kernel_time_t* ptr_time = CMSG_DATA(ucmsg);
get_user(*ptr_time32, ptr_time);
get_user(*(ptr_time32+1), ptr_time+1);
kcmsg32->cmsg_len -= 2*(sizeof(__kernel_time_t) -
- sizeof(__kernel_time_t32));
+ sizeof(compat_time_t));
}
default:;
}
@@ -2746,7 +2661,7 @@
err = __put_user(msg_sys.msg_flags, &msg->msg_flags);
if (err)
goto out_freeiov;
- err = __put_user((__kernel_size_t32) ((unsigned long)msg_sys.msg_control - cmsg_ptr), &msg->msg_controllen);
+ err = __put_user((compat_size_t) ((unsigned long)msg_sys.msg_control - cmsg_ptr), &msg->msg_controllen);
if (err)
goto out_freeiov;
err = len;
@@ -2848,7 +2763,7 @@
struct timeval tmp;
mm_segment_t old_fs;

- if (get_tv32(&tmp, (struct timeval32 *)optval ))
+ if (get_tv32(&tmp, (struct compat_timeval *)optval ))
return -EFAULT;
old_fs = get_fs();
set_fs(KERNEL_DS);
@@ -3126,7 +3041,7 @@
}

static int
-qm_modules(char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_modules(char *buf, size_t bufsize, compat_size_t *ret)
{
struct module *mod;
size_t nmod, space, len;
@@ -3161,7 +3076,7 @@
}

static int
-qm_deps(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_deps(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
size_t i, space, len;

@@ -3198,7 +3113,7 @@
}

static int
-qm_refs(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_refs(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
size_t nrefs, space, len;
struct module_ref *ref;
@@ -3242,7 +3157,7 @@
}

static inline int
-qm_symbols(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_symbols(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
size_t i, space, len;
struct module_symbol *s;
@@ -3301,7 +3216,7 @@
}

static inline int
-qm_info(struct module *mod, char *buf, size_t bufsize, __kernel_size_t32 *ret)
+qm_info(struct module *mod, char *buf, size_t bufsize, compat_size_t *ret)
{
int error = 0;

@@ -3683,7 +3598,7 @@
extern struct timezone sys_tz;
extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);

-asmlinkage int sys32_gettimeofday(struct timeval32 *tv, struct timezone *tz)
+asmlinkage int sys32_gettimeofday(struct compat_timeval *tv, struct timezone *tz)
{
if (tv) {
struct timeval ktv;
@@ -3698,7 +3613,7 @@
return 0;
}

-asmlinkage int sys32_settimeofday(struct timeval32 *tv, struct timezone *tz)
+asmlinkage int sys32_settimeofday(struct compat_timeval *tv, struct timezone *tz)
{
struct timeval ktv;
struct timezone ktz;
@@ -3715,46 +3630,9 @@
return do_sys_settimeofday(tv ? &ktv : NULL, tz ? &ktz : NULL);
}

-extern int do_getitimer(int which, struct itimerval *value);
-
-asmlinkage int sys32_getitimer(int which, struct itimerval32 *it)
-{
- struct itimerval kit;
- int error;
-
- error = do_getitimer(which, &kit);
- if (!error && put_it32(it, &kit))
- error = -EFAULT;
-
- return error;
-}
-
-extern int do_setitimer(int which, struct itimerval *, struct itimerval *);
-
-asmlinkage int sys32_setitimer(int which, struct itimerval32 *in, struct itimerval32 *out)
-{
- struct itimerval kin, kout;
- int error;
-
- if (in) {
- if (get_it32(&kin, in))
- return -EFAULT;
- } else
- memset(&kin, 0, sizeof(kin));
-
- error = do_setitimer(which, &kin, out ? &kout : NULL);
- if (error || !out)
- return error;
- if (put_it32(out, &kout))
- return -EFAULT;
-
- return 0;
-
-}
-
asmlinkage int sys_utimes(char *, struct timeval *);

-asmlinkage int sys32_utimes(char *filename, struct timeval32 *tvs)
+asmlinkage int sys32_utimes(char *filename, struct compat_timeval *tvs)
{
char *kfilename;
struct timeval ktvs[2];
@@ -3807,27 +3685,25 @@
extern asmlinkage ssize_t sys_pwrite64(unsigned int fd, const char * buf,
size_t count, loff_t pos);

-typedef __kernel_ssize_t32 ssize_t32;
-
-asmlinkage ssize_t32 sys32_pread64(unsigned int fd, char *ubuf,
- __kernel_size_t32 count, u32 poshi, u32 poslo)
+asmlinkage compat_ssize_t sys32_pread64(unsigned int fd, char *ubuf,
+ compat_size_t count, u32 poshi, u32 poslo)
{
- if ((ssize_t32) count < 0)
+ if ((compat_ssize_t) count < 0)
return -EINVAL;
return sys_pread64(fd, ubuf, count, ((loff_t)AA(poshi) << 32) | AA(poslo));
}

-asmlinkage ssize_t32 sys32_pwrite64(unsigned int fd, char *ubuf,
- __kernel_size_t32 count, u32 poshi, u32 poslo)
+asmlinkage compat_ssize_t sys32_pwrite64(unsigned int fd, char *ubuf,
+ compat_size_t count, u32 poshi, u32 poslo)
{
- if ((ssize_t32) count < 0)
+ if ((compat_ssize_t) count < 0)
return -EINVAL;
return sys_pwrite64(fd, ubuf, count, ((loff_t)AA(poshi) << 32) | AA(poslo));
}

extern asmlinkage ssize_t sys_readahead(int fd, loff_t offset, size_t count);

-asmlinkage ssize_t32 sys32_readahead(int fd, u32 offhi, u32 offlo, s32 count)
+asmlinkage compat_ssize_t sys32_readahead(int fd, u32 offhi, u32 offlo, s32 count)
{
return sys_readahead(fd, ((loff_t)AA(offhi) << 32) | AA(offlo), count);
}
@@ -3882,7 +3758,7 @@
u32 modes;
s32 offset, freq, maxerror, esterror;
s32 status, constant, precision, tolerance;
- struct timeval32 time;
+ struct compat_timeval time;
s32 tick;
s32 ppsfreq, jitter, shift, stabil;
s32 jitcnt, calcnt, errcnt, stbcnt;
@@ -4353,7 +4229,7 @@

asmlinkage int
sys32_futex(void *uaddr, int op, int val,
- struct timespec32 *timeout32)
+ struct compat_timespec *timeout32)
{
struct timespec tmp;
mm_segment_t old_fs;
@@ -4373,9 +4249,9 @@

asmlinkage ssize_t sys_read(unsigned int fd, char * buf, size_t count);

-asmlinkage ssize_t32 sys32_read(unsigned int fd, char * buf, size_t count)
+asmlinkage compat_ssize_t sys32_read(unsigned int fd, char * buf, size_t count)
{
- if ((ssize_t32) count < 0)
+ if ((compat_ssize_t) count < 0)
return -EINVAL;

return sys_read(fd, buf, count);
@@ -4383,9 +4259,9 @@

asmlinkage ssize_t sys_write(unsigned int fd, const char * buf, size_t count);

-asmlinkage ssize_t32 sys32_write(unsigned int fd, char * buf, size_t count)
+asmlinkage compat_ssize_t sys32_write(unsigned int fd, char * buf, size_t count)
{
- if ((ssize_t32) count < 0)
+ if ((compat_ssize_t) count < 0)
return -EINVAL;

return sys_write(fd, buf, count);
diff -ruN 2.5.50-BK.2/arch/s390x/kernel/linux32.h 2.5.50-BK.2-32bit.1/arch/s390x/kernel/linux32.h
--- 2.5.50-BK.2/arch/s390x/kernel/linux32.h 2002-10-08 12:02:40.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/arch/s390x/kernel/linux32.h 2002-12-04 14:46:24.000000000 +1100
@@ -2,6 +2,7 @@
#define _ASM_S390X_S390_H

#include <linux/config.h>
+#include <linux/compat.h>
#include <linux/socket.h>
#include <linux/nfs_fs.h>
#include <linux/sunrpc/svc.h>
@@ -15,10 +16,7 @@
((unsigned long)(__x))

/* Now 32bit compatibility types */
-typedef unsigned int __kernel_size_t32;
-typedef int __kernel_ssize_t32;
typedef int __kernel_ptrdiff_t32;
-typedef int __kernel_time_t32;
typedef int __kernel_clock_t32;
typedef int __kernel_pid_t32;
typedef unsigned short __kernel_ipc_pid_t32;
@@ -253,7 +251,7 @@
typedef struct {
__u32 ss_sp; /* pointer */
int ss_flags;
- __kernel_size_t32 ss_size;
+ compat_size_t ss_size;
} stack_t32;

/* asm/ucontext.h */
diff -ruN 2.5.50-BK.2/arch/s390x/kernel/wrapper32.S 2.5.50-BK.2-32bit.1/arch/s390x/kernel/wrapper32.S
--- 2.5.50-BK.2/arch/s390x/kernel/wrapper32.S 2002-11-28 10:34:42.000000000 +1100
+++ 2.5.50-BK.2-32bit.1/arch/s390x/kernel/wrapper32.S 2002-12-04 17:40:29.000000000 +1100
@@ -130,11 +130,11 @@

#sys32_pause_wrapper # void

- .globl sys32_utime_wrapper
-sys32_utime_wrapper:
+ .globl compat_sys_utime_wrapper
+compat_sys_utime_wrapper:
llgtr %r2,%r2 # char *
- llgtr %r3,%r3 # struct utimbuf_emu31 *
- jg sys32_utime # branch to system call
+ llgtr %r3,%r3 # struct compat_utimbuf *
+ jg compat_sys_utime # branch to system call

.globl sys32_access_wrapper
sys32_access_wrapper:
@@ -465,18 +465,18 @@
lgfr %r4,%r4 # int
jg sys_syslog # branch to system call

- .globl sys32_setitimer_wrapper
-sys32_setitimer_wrapper:
+ .globl compat_sys_setitimer_wrapper
+compat_sys_setitimer_wrapper:
lgfr %r2,%r2 # int
llgtr %r3,%r3 # struct itimerval_emu31 *
llgtr %r4,%r4 # struct itimerval_emu31 *
- jg sys32_setitimer # branch to system call
+ jg compat_sys_setitimer # branch to system call

- .globl sys32_getitimer_wrapper
-sys32_getitimer_wrapper:
+ .globl compat_sys_getitimer_wrapper
+compat_sys_getitimer_wrapper:
lgfr %r2,%r2 # int
llgtr %r3,%r3 # struct itimerval_emu31 *
- jg sys32_getitimer # branch to system call
+ jg compat_sys_getitimer # branch to system call

.globl sys32_newstat_wrapper
sys32_newstat_wrapper:
@@ -743,14 +743,14 @@
.globl sys32_sched_rr_get_interval_wrapper
sys32_sched_rr_get_interval_wrapper:
lgfr %r2,%r2 # pid_t
- llgtr %r3,%r3 # struct timespec_emu31 *
+ llgtr %r3,%r3 # struct compat_timespec *
jg sys32_sched_rr_get_interval # branch to system call

- .globl sys32_nanosleep_wrapper
-sys32_nanosleep_wrapper:
- llgtr %r2,%r2 # struct timespec_emu31 *
- llgtr %r3,%r3 # struct timespec_emu31 *
- jg sys32_nanosleep # branch to system call
+ .globl compat_sys_nanosleep_wrapper
+compat_sys_nanosleep_wrapper:
+ llgtr %r2,%r2 # struct compat_timespec *
+ llgtr %r3,%r3 # struct compat_timespec *
+ jg compat_sys_nanosleep # branch to system call

.globl sys32_mremap_wrapper
sys32_mremap_wrapper:
@@ -839,7 +839,7 @@
sys32_rt_sigtimedwait_wrapper:
llgtr %r2,%r2 # const sigset_emu31_t *
llgtr %r3,%r3 # siginfo_emu31_t *
- llgtr %r4,%r4 # const struct timespec_emu31 *
+ llgtr %r4,%r4 # const struct compat_timespec *
llgfr %r5,%r5 # size_t
jg sys32_rt_sigtimedwait # branch to system call

diff -ruN 2.5.50-BK.2/include/asm-s390x/compat.h 2.5.50-BK.2-32bit.1/include/asm-s390x/compat.h
--- 2.5.50-BK.2/include/asm-s390x/compat.h 1970-01-01 10:00:00.000000000 +1000
+++ 2.5.50-BK.2-32bit.1/include/asm-s390x/compat.h 2002-12-04 15:15:45.000000000 +1100
@@ -0,0 +1,18 @@
+#ifndef _ASM_S390X_COMPAT_H
+#define _ASM_S390X_COMPAT_H
+/*
+ * Architecture specific compatibility types
+ */
+#include <linux/types.h>
+
+typedef u32 compat_size_t;
+typedef s32 compat_ssize_t;
+typedef s32 compat_time_t;
+typedef s32 compat_suseconds_t;
+
+struct compat_timespec {
+ compat_time_t tv_sec;
+ s32 tv_nsec;
+};
+
+#endif /* _ASM_S390X_COMPAT_H */

2002-12-04 07:29:49

by David Mosberger

Subject: Re: [PATCH] compatibility syscall layer - IA64

>>>>> On Wed, 4 Dec 2002 18:26:27 +1100, Stephen Rothwell <[email protected]> said:

Stephen> Hi David, Linus, This is the IA64 specific patch.

Looks good to me. I'd be happy to apply it if Linus accepts the
platform-independent portion of the patch.

Thanks,

--david

2002-12-04 11:22:52

by Andi Kleen

Subject: Re: [PATCH] compatibility syscall layer - X86_64

On Wed, Dec 04, 2002 at 08:18:50AM +0100, Stephen Rothwell wrote:
> Hi Andi, Linus,
>
> Here is the x86_64 specific patch.

Thanks, looks good.

I'll apply it when Linus takes the architecture independent part that
it relies on.

-Andi

P.S.: Thank you for doing this work. It was long overdue.

2002-12-04 11:51:10

by Pavel Machek

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Hi!

> Below is the generic part of the start of the compatibility syscall layer.
> I think I have made it generic enough that each architecture can define
> what compatibility means.
>
> To use this, an architecture must create asm/compat.h and provide typedefs
> for (currently) compat_time_t, compat_suseconds_t, struct compat_timespec.
                                        ~~~~
Maybe we need better name?
This is too easy to misparse ;-).
Pavel
--
Worst form of spam? Adding advertisement signatures ala sourceforge.net.
What goes next? Inserting advertisement *into* email?

2002-12-04 16:46:18

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Wed, 4 Dec 2002, Stephen Rothwell wrote:
>
> To use this, an architecture must create asm/compat.h and provide typedefs
> for (currently) compat_time_t, compat_suseconds_t, struct compat_timespec.

Ok, this looks fine to me. At this point my only issues are purely
cosmetic, namely that "suseconds" thing made me go "wtf?". I have _never_
actually seen it used, it's one of those stupid typedefs that have no
point to them.

Since the only use of "suseconds" is in the "compat_timeval" definition,
and since you already have a "compat_timespec", my reaction is to ask why
we don't just make "compat_timeval" be the arch-supplied typedef, and drop
that strange "suseconds" thing entirely?

That avoids a very awkward name, _and_ it looks more natural to pair
compat_timeval and compat_timespec anyway.

(Yeah, I know we use suseconds_t in the "real" timeval declaration, and I
think that's strange too.)

I'll do that change by hand, and apply this to get the ball rolling, ok?

Linus

2002-12-04 16:49:33

by David Miller

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

   From: Linus Torvalds <[email protected]>
   Date: Wed, 4 Dec 2002 08:54:47 -0800 (PST)

   I'll do that change by hand, and apply this to get the ball rolling, ok?

"Just do it." :-)

2002-12-04 16:56:46

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Wed, 4 Dec 2002, David S. Miller wrote:
>
> "Just do it." :-)

Pushing the result out now (after having done a quick test that it would
seem to work on x86 if it had a compat layer).

Linus

2002-12-04 19:51:00

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Stephen Rothwell wrote:
>
> Hi Linus,
>
> Below is the generic part of the start of the compatibility syscall layer.
> I think I have made it generic enough that each architecture can define
> what compatibility means.
>
> To use this, an architecture must create asm/compat.h and provide typedefs
> for (currently) compat_time_t, compat_suseconds_t, struct compat_timespec.
>
> Hopefully, this is what you had in mind - otherwise back to the drawing
> board.
>
> I will follow this posting with the architecture specific patches that I
> have done but not tested.

The POSIX standard for nanosleep() says that it should not
return on a signal UNLESS the signal is delivered to the
user (i.e. not on SIGSTOP, SIGTRACE, etc., or any signal
that does not invoke a user handler). To support this sort
of thing, do_signal() returns true in this case and false
otherwise. The reason this is not in nanosleep() today
seems to be that do_signal() requires &regs, which differs
from arch to arch in how it is passed to a system call
handler. The common code does not need or use regs except
to pass it to do_signal().

As a suggestion for a solution: is it true that regs, on a
system call, will ALWAYS be at the end of the stack? Let's
assume, at least for nanosleep(), that it will not be called
from the kernel. Even if it is, what should the signal
behavior be in this case? Should it not use the user regs?

Here is a suggested function to get the regs:

struct pt_regs *get_task_registers(struct task_struct *task)
{
	return (struct pt_regs *)((unsigned char *)task->thread.esp0 -
				  sizeof(struct pt_regs));
}

comments?

>
> --
> Cheers,
> Stephen Rothwell [email protected]
> http://www.canb.auug.org.au/~sfr/
>
> diff -ruN 2.5.50-BK.2/fs/open.c 2.5.50-BK.2-32bit.1/fs/open.c
> --- 2.5.50-BK.2/fs/open.c 2002-12-04 12:07:36.000000000 +1100
> +++ 2.5.50-BK.2-32bit.1/fs/open.c 2002-12-04 12:01:36.000000000 +1100
> @@ -280,7 +280,7 @@
> * must be owner or have write permission.
> * Else, update from *times, must be owner or super user.
> */
> -asmlinkage long sys_utimes(char * filename, struct timeval * utimes)
> +long do_utimes(char * filename, struct timeval * times)
> {
> int error;
> struct nameidata nd;
> @@ -299,11 +299,7 @@
>
> /* Don't worry, the checks are done in inode_change_ok() */
> newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
> - if (utimes) {
> - struct timeval times[2];
> - error = -EFAULT;
> - if (copy_from_user(&times, utimes, sizeof(times)))
> - goto dput_and_out;
> + if (times) {
> newattrs.ia_atime.tv_sec = times[0].tv_sec;
> newattrs.ia_atime.tv_nsec = times[0].tv_usec * 1000;
> newattrs.ia_mtime.tv_sec = times[1].tv_sec;
> @@ -323,6 +319,16 @@
> return error;
> }
>
> +asmlinkage long sys_utimes(char * filename, struct timeval * utimes)
> +{
> + struct timeval times[2];
> +
> + if (utimes && copy_from_user(&times, utimes, sizeof(times)))
> + return -EFAULT;
> + return do_utimes(filename, utimes ? times : NULL);
> +}
> +
> +
> /*
> * access() needs to use the real uid/gid, not the effective uid/gid.
> * We do this by temporarily clearing all FS-related capabilities and
> diff -ruN 2.5.50-BK.2/include/linux/compat.h 2.5.50-BK.2-32bit.1/include/linux/compat.h
> --- 2.5.50-BK.2/include/linux/compat.h 1970-01-01 10:00:00.000000000 +1000
> +++ 2.5.50-BK.2-32bit.1/include/linux/compat.h 2002-12-04 15:42:36.000000000 +1100
> @@ -0,0 +1,29 @@
> +#ifndef _LINUX_COMPAT_H
> +#define _LINUX_COMPAT_H
> +/*
> + * These are the type definitions for the arhitecure sepcific
> + * compatibility layer.
> + */
> +#include <linux/config.h>
> +
> +#ifdef CONFIG_COMPAT
> +
> +#include <asm/compat.h>
> +
> +struct compat_timeval {
> + compat_time_t tv_sec;
> + compat_suseconds_t tv_usec;
> +};
> +
> +struct compat_utimbuf {
> + compat_time_t actime;
> + compat_time_t modtime;
> +};
> +
> +struct compat_itimerval {
> + struct compat_timeval it_interval;
> + struct compat_timeval it_value;
> +};
> +
> +#endif /* CONFIG_COMPAT */
> +#endif /* _LINUX_COMPAT_H */
> diff -ruN 2.5.50-BK.2/include/linux/time.h 2.5.50-BK.2-32bit.1/include/linux/time.h
> --- 2.5.50-BK.2/include/linux/time.h 2002-11-18 15:47:56.000000000 +1100
> +++ 2.5.50-BK.2-32bit.1/include/linux/time.h 2002-12-03 15:47:26.000000000 +1100
> @@ -138,6 +138,8 @@
> #ifdef __KERNEL__
> extern void do_gettimeofday(struct timeval *tv);
> extern void do_settimeofday(struct timeval *tv);
> +extern long do_nanosleep(struct timespec *t);
> +extern long do_utimes(char * filename, struct timeval * times);
> #endif
>
> #define FD_SETSIZE __FD_SETSIZE
> diff -ruN 2.5.50-BK.2/kernel/Makefile 2.5.50-BK.2-32bit.1/kernel/Makefile
> --- 2.5.50-BK.2/kernel/Makefile 2002-11-28 10:34:59.000000000 +1100
> +++ 2.5.50-BK.2-32bit.1/kernel/Makefile 2002-12-03 15:42:28.000000000 +1100
> @@ -21,6 +21,7 @@
> obj-$(CONFIG_CPU_FREQ) += cpufreq.o
> obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
> obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o
> +obj-$(CONFIG_COMPAT) += compat.o
>
> ifneq ($(CONFIG_IA64),y)
> # According to Alan Modra <[email protected]>, the -fno-omit-frame-pointer is
> diff -ruN 2.5.50-BK.2/kernel/compat.c 2.5.50-BK.2-32bit.1/kernel/compat.c
> --- 2.5.50-BK.2/kernel/compat.c 1970-01-01 10:00:00.000000000 +1000
> +++ 2.5.50-BK.2-32bit.1/kernel/compat.c 2002-12-04 17:40:08.000000000 +1100
> @@ -0,0 +1,114 @@
> +/*
> + * linux/kernel/compat.c
> + *
> + * Kernel compatibililty routines for e.g. 32 bit syscall support
> + * on 64 bit kernels.
> + *
> + * Copyright (C) 2002 Stephen Rothwell, IBM Corporation
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/linkage.h>
> +#include <linux/compat.h>
> +#include <linux/errno.h>
> +#include <linux/time.h>
> +
> +#include <asm/uaccess.h>
> +
> +asmlinkage long compat_sys_nanosleep(struct compat_timespec *rqtp,
> + struct compat_timespec *rmtp)
> +{
> + struct timespec t;
> + struct compat_timespec ct;
> + s32 ret;
> +
> + if (copy_from_user(&ct, rqtp, sizeof(ct)))
> + return -EFAULT;
> + t.tv_sec = ct.tv_sec;
> + t.tv_nsec = ct.tv_nsec;
> + ret = do_nanosleep(&t);
> + if (rmtp && (ret == -EINTR)) {
> + ct.tv_sec = t.tv_sec;
> + ct.tv_nsec = t.tv_nsec;
> + if (copy_to_user(rmtp, &ct, sizeof(ct)))
> + return -EFAULT;
> + }
> + return ret;
> +}
> +
> +/*
> + * Not all architectures have sys_utime, so implement this in terms
> + * of sys_utimes.
> + */
> +asmlinkage long compat_sys_utime(char *filename, struct compat_utimbuf *t)
> +{
> + struct timeval tv[2];
> +
> + if (t) {
> + if (get_user(tv[0].tv_sec, &t->actime) ||
> + get_user(tv[1].tv_sec, &t->modtime))
> + return -EFAULT;
> + tv[0].tv_usec = 0;
> + tv[1].tv_usec = 0;
> + }
> + return do_utimes(filename, t ? tv : NULL);
> +}
> +
> +
> +static inline long get_compat_itimerval(struct itimerval *o,
> + struct compat_itimerval *i)
> +{
> + return (!access_ok(VERIFY_READ, i, sizeof(*i)) ||
> + (__get_user(o->it_interval.tv_sec, &i->it_interval.tv_sec) |
> + __get_user(o->it_interval.tv_usec, &i->it_interval.tv_usec) |
> + __get_user(o->it_value.tv_sec, &i->it_value.tv_sec) |
> + __get_user(o->it_value.tv_usec, &i->it_value.tv_usec)));
> +}
> +
> +static inline long put_compat_itimerval(struct compat_itimerval *o,
> + struct itimerval *i)
> +{
> + return (!access_ok(VERIFY_WRITE, o, sizeof(*o)) ||
> + (__put_user(i->it_interval.tv_sec, &o->it_interval.tv_sec) |
> + __put_user(i->it_interval.tv_usec, &o->it_interval.tv_usec) |
> + __put_user(i->it_value.tv_sec, &o->it_value.tv_sec) |
> + __put_user(i->it_value.tv_usec, &o->it_value.tv_usec)));
> +}
> +
> +extern int do_getitimer(int which, struct itimerval *value);
> +
> +asmlinkage long compat_sys_getitimer(int which, struct compat_itimerval *it)
> +{
> + struct itimerval kit;
> + int error;
> +
> + error = do_getitimer(which, &kit);
> + if (!error && put_compat_itimerval(it, &kit))
> + error = -EFAULT;
> + return error;
> +}
> +
> +extern int do_setitimer(int which, struct itimerval *, struct itimerval *);
> +
> +asmlinkage long compat_sys_setitimer(int which, struct compat_itimerval *in,
> + struct compat_itimerval *out)
> +{
> + struct itimerval kin, kout;
> + int error;
> +
> + if (in) {
> + if (get_compat_itimerval(&kin, in))
> + return -EFAULT;
> + } else
> + memset(&kin, 0, sizeof(kin));
> +
> + error = do_setitimer(which, &kin, out ? &kout : NULL);
> + if (error || !out)
> + return error;
> + if (put_compat_itimerval(out, &kout))
> + return -EFAULT;
> + return 0;
> +}
> diff -ruN 2.5.50-BK.2/kernel/timer.c 2.5.50-BK.2-32bit.1/kernel/timer.c
> --- 2.5.50-BK.2/kernel/timer.c 2002-12-04 12:07:39.000000000 +1100
> +++ 2.5.50-BK.2-32bit.1/kernel/timer.c 2002-12-04 12:03:23.000000000 +1100
> @@ -1020,33 +1020,41 @@
> return current->pid;
> }
>
> -asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
> +long do_nanosleep(struct timespec *t)
> {
> - struct timespec t;
> unsigned long expire;
>
> - if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
> - return -EFAULT;
> -
> - if (t.tv_nsec >= 1000000000L || t.tv_nsec < 0 || t.tv_sec < 0)
> + if ((t->tv_nsec >= 1000000000L) || (t->tv_nsec < 0) || (t->tv_sec < 0))
> return -EINVAL;
>
> - expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
> + expire = timespec_to_jiffies(t) + (t->tv_sec || t->tv_nsec);
>
> current->state = TASK_INTERRUPTIBLE;
> expire = schedule_timeout(expire);
>
> if (expire) {
> - if (rmtp) {
> - jiffies_to_timespec(expire, &t);
> - if (copy_to_user(rmtp, &t, sizeof(struct timespec)))
> - return -EFAULT;
> - }
> + jiffies_to_timespec(expire, t);
> return -EINTR;
> }
> return 0;
> }
>
> +asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
> +{
> + struct timespec t;
> + long ret;
> +
> + if (copy_from_user(&t, rqtp, sizeof(t)))
> + return -EFAULT;
> +
> + ret = do_nanosleep(&t);
> + if (rmtp && (ret == -EINTR)) {
> + if (copy_to_user(rmtp, &t, sizeof(t)))
> + return -EFAULT;
> + }
> + return ret;
> +}
> +
> /*
> * sys_sysinfo - fill in sysinfo struct
> */

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-04 20:01:23

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Wed, 4 Dec 2002, george anzinger wrote:
>
> As a suggestion for a solution for this, is it true that
> regs, on a system call, will ALWAYS be at the end of the
> stack?

No. Some architectures do not save enough state on the stack by default,
and need to do more to use do_signal(). Look at alpha, for example - the
default kernel stack doesn't contain all the registers needed, and
the alpha do_signal() calling convention is different.

If you want to handle do_signal(), then you need to do _all_ of this in
architecture-specific files. You simply cannot do what you want to do in a
generic way.

Linus

2002-12-04 20:48:34

by Daniel Jacobowitz

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

On Wed, Dec 04, 2002 at 12:07:11PM -0800, Linus Torvalds wrote:
>
> On Wed, 4 Dec 2002, george anzinger wrote:
> >
> > As a suggestion for a solution for this, is it true that
> > regs, on a system call, will ALWAYS be at the end of the
> > stack?
>
> No. Some architectures do not save enough state on the stack by default,
> and need to do more to use do_signal(). Look at alpha, for example - the
> default kernel stack doesn't contain all the registers needed, and
> the alpha do_signal() calling convention is different.
>
> If you want to handle do_signal(), then you need to do _all_ of this in
> architecture-specific files. You simply cannot do what you want to do in a
> generic way.

I think you should be able to call do_signal or a wrapper in some
platform-independent way. Is the necessary information recoverable in
Alpha et al.? What do you think of adding a standard wrapper function
so that system calls can process a signal if necessary?

Not only did George need this for POSIX conformance, I've seen a lot of
complaints about GDB interrupting sys_nanosleep even on cancelled
signals.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2002-12-04 22:05:14

by David Miller

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

From: Daniel Jacobowitz <[email protected]>
Date: Wed, 4 Dec 2002 15:56:09 -0500

Is the necessary information recoverable in
Alpha et al.?

No, and Sparc is the same. It's kept in local registers
in the assembler of the trap return path.

2002-12-04 22:25:27

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

"David S. Miller" wrote:
>
> From: Daniel Jacobowitz <[email protected]>
> Date: Wed, 4 Dec 2002 15:56:09 -0500
>
> Is the necessary information recoverable in
> Alpha et al.?
>
> No, and Sparc is the same. It's kept in local registers
> in the assembler of the trap return path.

One solution would then appear to be that we need arch
wrappers for nano_sleep and clock_nanosleep (when and if).

On the PARISC I did this (a long time ago in a far away
place) by unwinding the stack to pick up the registers that
were saved along the way. Is this at all feasible?

It might help to understand just what registers do_signal
needs. It doesn't need them all, I suspect.

Yet another idea, do_signal does not actually call the user
handler (the only case where it needs the regs) but sets up
the stack to make it happen when the system call returns.
If there were a function that could be called to find out if
a signal was going to be delivered, the right thing could be
done in nano_sleep() and the actual do_signal call could
come from the system call return path as it does now.

Yes, I like that...
--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-04 22:34:27

by David Miller

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

From: george anzinger <[email protected]>
Date: Wed, 04 Dec 2002 14:31:09 -0800

It might help to understand just what registers do_signal
needs. It doesn't need them all, I suspect.

Some of the original register values the process had before
the system call was processed.

I think it's best just to abstract this away properly, as
Linus said to begin with I'd like to note, rather than trying
to come up with a "portable way to call do_signal()".

do_signal is magic and this allows all sorts of great optimizations
and simplifications, please don't add any constraints or complexity
to it.

2002-12-04 22:35:30

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Wed, 4 Dec 2002, george anzinger wrote:
>
> On the PARISC I did this (a long time ago in a far away
> place) by unwinding the stack to pick up the registers that
> were saved along the way. Is this at all feasible?

No. Alpha (and apparently sparc) simply do not save the registers that the
signal handling wants on the stack _at_all_. There are too many registers
to save at each system call entry point, and 99% of all system calls never
need it.

The system call return that checks for signals anyway will end up saving a
special stack frame when needed. As will the special signal-related system
calls (sigsuspend() and friends). All of this is not only architecture-
dependent, it is literally coded in assembly language for the
architectures. See "do_switch_stack()" for alpha.

Anyway, if you wondered why Linux beats every other Unix out there on
system call overhead, now you know. And yes, this is important.

Linus

2002-12-04 23:35:23

by Jim Houston

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds wrote:
>
> On Wed, 4 Dec 2002, george anzinger wrote:
> >
> > As a suggestion for a solution for this, is it true that
> > regs, on a system call, will ALWAYS be at the end of the
> > stack?
>
> No. Some architectures do not save enough state on the stack by default,
> and need to do more to use do_signal(). Look at alpha, for example - the
> default kernel stack doesn't contain all the registers needed, and
> the alpha do_signal() calling convention is different.
>
> If you want to handle do_signal(), then you need to do _all_ of this in
> architecture-specific files. You simply cannot do what you want to do in a
> generic way.
>
> Linus

Hi Linus,

Agreed! In my alternative version of the Posix timers patch, I avoid
calling do_signal() from clock_nanosleep by using a variant of the
existing ERESTARTNOHAND mechanism. The problem I ran into was that I
could not tell on entry to clock_nanosleep if it was a new call or
an old one being restarted. I solved this by adding a new
ERESTARTNANOSLP error code and making a small change in do_signal().
The handling of ERESTARTNANOSLP is the same as ERESTARTNOHAND but also
sets a new flag in the task_struct before restarting the system call.

This is still an architecture-specific change, but at least it is simple.

Jim Houston - Concurrent Computer Corp.

2002-12-05 00:10:55

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Wed, 4 Dec 2002, Jim Houston wrote:
>
> Agreed! In my alternative version of the Posix timers patch, I avoid
> calling do_signal() from clock_nanosleep by using a variant of the
> existing ERESTARTNOHAND mechanism. The problem I ran into was that I
> could not tell on entry to clock_nanosleep if it was a new call or
> an old one being restarted.

Restarting has other problems too, namely how to save off the partial
results.

> I solved this by adding a new
> ERESTARTNANOSLP error code and making a small change in do_signal().
> The handling of ERESTARTNANOSLP is the same as ERESTARTNOHAND but also
> sets a new flag in the task_struct before restarting the system call.

The problem I see with this is that the signal handler can do a
"siglongjmp()" out of the regular path, and the next system call may well
be a _new_ nanosleep() that has nothing to do with the old one. And
realizing that it's _not_ a restarted one is interesting.

A better and more flexible approach would be to not restart the same
system call with the same parameters, but to have some way of telling
do_signal to restart with new parameters and a new system call number.

For example, it shouldn't be impossible to have an interface more akin to

...
thread_info->restart_block.syscall = __NR_nanosleep_restart;
thread_info->restart_block.arg0 = timeout + jiffies; /* absolute time */
return -ERESTARTSYS_RESTARTBLOCK;

where the signal stack stuff re-writes not just eip (like the current
restart logic does), but also rewrites the system call number and the
argument registers.

This way you can get a truly restartable system call, because the
arguments really need to be fundamentally changed (the restarted system
call had better have _absolute_ time, not relative time, since we don't
know how much time passed before it got restarted).

Linus
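
What follows is a minimal user-space sketch of the restart-block idea described above, not kernel code. Every name in it (`model_regs`, `model_restart_block`, `model_nanosleep_interrupted`, `model_do_signal_restart`) and every numeric constant is a hypothetical stand-in chosen for illustration; the only assumption taken from real hardware is that the x86 "int 0x80" instruction is two bytes long. It shows the interrupted call recording a new syscall number and an absolute expiry, and the signal-return path rewriting the syscall number and first argument register in addition to backing up the program counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values, chosen for this model only. */
#define MODEL_ERESTART_RESTARTBLOCK 516
#define MODEL_NR_NANOSLEEP_RESTART  999  /* placeholder syscall number */
#define MODEL_SYSCALL_INSN_LEN      2    /* "int 0x80" is two bytes on x86 */

/* Saved user registers as the signal code would see them (x86-flavoured). */
struct model_regs {
    long eax;            /* syscall return value, or syscall nr on restart */
    long ebx;            /* first syscall argument */
    unsigned long eip;   /* user program counter */
};

/* Filled in by the interrupted syscall before it asks for a restart. */
struct model_restart_block {
    long syscall;        /* syscall number to re-enter with */
    long arg0;           /* replacement first argument (absolute expiry) */
};

/* Interrupted syscall: record how to resume, then request a restart. */
long model_nanosleep_interrupted(struct model_restart_block *rb,
                                 unsigned long now, unsigned long timeout)
{
    assert(rb != NULL);
    rb->syscall = MODEL_NR_NANOSLEEP_RESTART;
    rb->arg0 = (long)(now + timeout);   /* absolute time, not relative */
    return -MODEL_ERESTART_RESTARTBLOCK;
}

/* Signal-return path: rewrite pc, syscall number and argument register.
 * Returns 1 if a restart was arranged, 0 otherwise. */
int model_do_signal_restart(struct model_regs *regs,
                            const struct model_restart_block *rb)
{
    if (regs->eax != -MODEL_ERESTART_RESTARTBLOCK)
        return 0;
    regs->eax = rb->syscall;              /* new system call number */
    regs->ebx = rb->arg0;                 /* new first argument */
    regs->eip -= MODEL_SYSCALL_INSN_LEN;  /* point back at the trap insn */
    return 1;
}
```

In this model the fast path never touches the restart block; it is only written when the call is about to return the restart code.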

2002-12-05 01:56:34

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds wrote:
>
> On Wed, 4 Dec 2002, Jim Houston wrote:
> >
> > Agreed! In my alternative version of the Posix timers patch, I avoid
> > calling do_signal() from clock_nanosleep by using a variant of the
> > existing ERESTARTNOHAND mechanism. The problem I ran into was that I
> > could not tell on entry to clock_nanosleep if it was a new call or
> > an old one being restarted.
>
> Restarting has other problems too, namely how to save off the partial
> results.
>
> > I solved this by adding a new
> > ERESTARTNANOSLP error code and making a small change in do_signal().
> > The handling of ERESTARTNANOSLP is the same as ERESTARTNOHAND but also
> > sets a new flag in the task_struct before restarting the system call.
>
> The problem I see with this is that the signal handler can do a
> "siglongjmp()" out of the regular path, and the next system call may well
> be a _new_ nanosleep() that has nothing to do with the old one. And
> realizing that it's _not_ a restarted one is interesting.
>
> A better and more flexible approach would be to not restart the same
> system call with the same parameters, but to have some way of telling
> do_signal to restart with new parameters and a new system call number.
>
> For example, it shouldn't be impossible to have an interface more akin to
>
> ...
> thread_info->restart_block.syscall = __NR_nanosleep_restart;
> thread_info->restart_block.arg0 = timeout + jiffies; /* absolute time */
> return -ERESTARTSYS_RESTARTBLOCK;
>
> where the signal stack stuff re-writes not just eip (like the current
> restart logic does), but also rewrites the system call number and the
> argument registers.
>
> This way you can get a truly restartable system call, because the
> arguments really need to be fundamentally changed (the restarted system
> call had better have _absolute_ time, not relative time, since we don't
> know how much time passed before it got restarted).

The way the system is now, a system call "appears" to get its
arguments by value, but the parameters are on the stack (in the
regs structure). This is what is restored and passed back
on a system call restart. What I am getting at is that
nano_sleep could scribble anything it wants here and
"notice" it on the recall. A new call would not have these
values. So, all that needs to really happen in the arch
code is to set up the restart address (or, maybe the system
call number). But then, if this is what we do, do we even
need a new call?

But now you are going to tell me that other archs don't pass
parameters to system calls in this way, right?
Changing the call to absolute changes the semantics (in
particular the behavior on clock setting) in a way I don't
think you want to. I.e. you can tell it was done. So you
would have to do this in a way that does not look like the
absolute call in the current POSIX spec.
--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-05 02:20:37

by Jim Houston

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds wrote:
>
> On Wed, 4 Dec 2002, Jim Houston wrote:
> >
> > Agreed! In my alternative version of the Posix timers patch, I avoid
> > calling do_signal() from clock_nanosleep by using a variant of the
> > existing ERESTARTNOHAND mechanism. The problem I ran into was that I
> > could not tell on entry to clock_nanosleep if it was a new call or
> > an old one being restarted.
>
> Restarting has other problems too, namely how to save off the partial
> results.
>
> > I solved this by adding a new
> > ERESTARTNANOSLP error code and making a small change in do_signal().
> > The handling of ERESTARTNANOSLP is the same as ERESTARTNOHAND but also
> > sets a new flag in the task_struct before restarting the system call.
>
> The problem I see with this is that the signal handler can do a
> "siglongjmp()" out of the regular path, and the next system call may well
> be a _new_ nanosleep() that has nothing to do with the old one. And
> realizing that it's _not_ a restarted one is interesting.
>
> A better and more flexible approach would be to not restart the same
> system call with the same parameters, but to have some way of telling
> do_signal to restart with new parameters and a new system call number.
>
> For example, it shouldn't be impossible to have an interface more akin to
>
> ...
> thread_info->restart_block.syscall = __NR_nanosleep_restart;
> thread_info->restart_block.arg0 = timeout + jiffies; /* absolute time */
> return -ERESTARTSYS_RESTARTBLOCK;
>
> where the signal stack stuff re-writes not just eip (like the current
> restart logic does), but also rewrites the system call number and the
> argument registers.
>
> This way you can get a truly restartable system call, because the
> arguments really need to be fundamentally changed (the restarted system
> call had better have _absolute_ time, not relative time, since we don't
> know how much time passed before it got restarted).
>
> Linus

Hi Linus,

The general solution you propose sounds nice but I have a feeling
the implementation would get ugly. It is hard enough to back up the
pc. I hate to think where the arguments are on some machines.

I think that "siglongjmp()" is not a problem. My change to
do_signal() only sets the flag indicating a restart at the same time
it backs up the pc to restart the system call. I don't see a path
where the user code gets control before we're back at clock_nanosleep.

I'm saving the information to restart the nanosleep in the task_struct.
I have a pre-allocated timer which I leave running. When I get
into nanosleep for the restart, I just have to check if the timer has
already expired and, if not, go back to sleep. To calculate the
remaining time I also save an un-rounded copy of the absolute expiry
time (also in the task_struct).

Jim Houston - Concurrent Computer Corp.
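
The scheme described in this message can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the code from the alternative POSIX timers patch: `model_task`, its fields, and `model_sleep_entry` are hypothetical names, and time is represented as a plain tick counter passed in by the caller. The point it illustrates is that a restarted entry reuses the saved absolute expiry instead of recomputing it from the relative request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task state; stands in for fields in task_struct. */
struct model_task {
    int restarting;            /* set by the signal code when it backs up the pc */
    unsigned long abs_expiry;  /* absolute expiry in ticks, saved on first entry */
};

/*
 * Entry point of the modelled sleep call.
 * now: current tick count; rel: relative request in ticks.
 * Returns the number of ticks still to sleep (0 if already expired).
 */
unsigned long model_sleep_entry(struct model_task *t,
                                unsigned long now, unsigned long rel)
{
    long left;

    assert(t != NULL);
    if (t->restarting) {
        /* Restarted call: keep the saved expiry, consume the flag. */
        t->restarting = 0;
    } else {
        /* Fresh call: compute and remember the absolute expiry. */
        t->abs_expiry = now + rel;
    }
    /* Signed difference, so an expiry in the past yields a negative value. */
    left = (long)(t->abs_expiry - now);
    return left > 0 ? (unsigned long)left : 0;
}
```

Whether the flag can ever be observed by an unrelated later call is exactly the point under debate in the thread; the model only captures the bookkeeping.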

2002-12-05 02:42:41

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Wed, 4 Dec 2002, george anzinger wrote:
>
> The way the system is now a system call "appears" to get by
> value calls, but the parameters are on the stack (in the
> regs structure). This is what is restored and passed back
> on a system call restart. What I am getting at is that
> nano_sleep could scribble anything it wants here and
> "notice" it on the recall.

Absolutely. That's what my ERESTARTSYS_RESTARTBLOCK thing is all about: a
"portable" way to let the architecture-specific do_signal() know what to
do about the return stack.

It mustn't be nanosleep()-specific, that just gets too nasty.

> Changing the call to absolute changes the semantics (in
> particular the behavior on clock setting) in a way I don't
> think you want to. I.e. you can tell it was done. So you
> would have to do this in a way that does not look like the
> absolute call in the current POSIX spec.

No, the point is that re-starting the system call is totally invisible to
user space, and user space would never use the "restart" system call
directly.

Let me give a more explicit example on an x86 level:

- This is part of the x86 library function:

movl 4(%esp),%ebx // request
movl 8(%esp),%ecx // remainder
movl $162,%eax // nanosleep syscall #
int 0x80 // system call

- this enters the kernel, which saves stuff off on the stack,
and calls sys_nanosleep by indexing the 162 off the system call
table. Time is now X.

- we're supposed to sleep until "X + request"

...
schedule_timeout()

- we get woken up by a signal thing, which doesn't have a handler, but
does (for example) put us to sleep. Let's say that it's SIGSTOP. To
handle the signal, sys_nanosleep() needs to return -ERESTARTSYS because
it can't do it on its own.

- 2 seconds later, the user sends a SIGCONT, and the process restarts.
Time is now X+2, which may or may not be AFTER the original timeout.

See the problem here? We MUST NOT restart the system call with the
original timeout pointer (the contents of which we must not change). Not
only have we already slept part of the time (that part we know about), but
we may _also_ have been blocked by a signal part of the time (which has
been totally outside the control of sys_nanosleep()).

So my solution implies that our restart logic in do_signal(), which
already knows how to update the user-level EIP register (that's how the
restart is done), can also be told to update the system call and the
argument registers. So what we do is to introduce a _new_ system call
(system call number NNN), which takes a different form of timeout, namely
"absolute value of end time".

And then, when we enter do_signal(), we not only update %eip to point to
the original "int 0x80" instruction, we _also_ update %eax to point to the
new system call NNN, _and_ we update %ebx to contain the new timeout in
absolute jiffies:

current_thread->restart_block.syscall_nr = NNN;
current_thread->restart_block.arg0 = jiffies + timeout;

and then we have a

sys_nanosleep_resume(unsigned long timeout, struct timespec *rem)
{
long jif = timeout - jiffies;

if (jif > 0) {
current->state = TASK_INTERRUPTIBLE;
jif = schedule_timeout(jif);
/* interrupted - we already have the restart block set up */
if (jif) {
if (rem)
jiffies_to_usertimespec(jif, rem);
return -ERESTART_RESTARTBLOCK;
}
}
if (rem) {
put_user(0, &rem->tv_sec);
put_user(0, &rem->tv_nsec);
}
return 0;
}

See? The "nanosleep_resume" system call is never used by a program
directly, it's only virtualized by the signal restart changing the system
call number on restart. (A user program _could_ use it directly, but
there's no point, and the interface to the thing might change at any
time).

Linus
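
The reason the restarted call wants an absolute end time can also be seen from user space. The sketch below is an analogue, not the kernel mechanism discussed in this thread: it uses the POSIX `clock_nanosleep()` call with `TIMER_ABSTIME`, so that retrying after `EINTR` never lengthens the total sleep, however long the process was stopped in between. The helper names `timespec_add_ms`, `sleep_until`, and `sleep_ms` are made up for this example, and `ms` is assumed to be non-negative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Add a non-negative number of milliseconds to ts, normalising tv_nsec. */
struct timespec timespec_add_ms(struct timespec ts, long ms)
{
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/*
 * Sleep until an absolute CLOCK_MONOTONIC deadline.
 * Because the deadline is absolute, simply retrying on EINTR is correct:
 * time spent stopped or in a handler is not added to the sleep.
 * Returns 0 on success or a positive error number.
 */
int sleep_until(const struct timespec *deadline)
{
    int rc;

    assert(deadline != NULL);
    do {
        /* clock_nanosleep() returns the error number directly. */
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
    } while (rc == EINTR);
    return rc;
}

/* Convenience wrapper: convert a relative request to a deadline once. */
int sleep_ms(long ms)
{
    struct timespec now;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return errno;
    struct timespec deadline = timespec_add_ms(now, ms);
    return sleep_until(&deadline);
}
```

The relative-to-absolute conversion happens exactly once, in `sleep_ms()`, which mirrors the role of the original nanosleep entry point in the proposal above.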

2002-12-05 03:03:52

by Andi Kleen

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


Wouldn't it be much easier to just keep the few system calls that need
to handle such magic system call restart in architecture specific
code ? There aren't that many of them. The arch part does not even
need to contain the full body of the syscall, just a wrapper around
a "do_*" function.

After all this 32bit emulation unification should be something to make
further development easier, not a cause in itself.

-Andi

2002-12-05 03:41:04

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds wrote:
>
> On Wed, 4 Dec 2002, george anzinger wrote:
> >
> > The way the system is now a system call "appears" to get by
> > value calls, but the parameters are on the stack (in the
> > regs structure). This is what is restored and passed back
> > on a system call restart. What I am getting at is that
> > nano_sleep could scribble anything it wants here and
> > "notice" it on the recall.
>
> Absolutely. That's what my ERESTARTSYS_RESTARTBLOCK thing is all about: a
> "portable" way to let the architecture-specific do_signal() know what to
> do about the return stack.
>
> It mustn't be nanosleep()-specific, that just gets too nasty.
>
> > Changing the call to absolute changes the semantics (in
> > particular the behavior on clock setting) in a way I don't
> > think you want to. I.e. you can tell it was done. So you
> > would have to do this in a way that does not look like the
> > absolute call in the current POSIX spec.
>
> No, the point is that re-starting the system call is totally invisible to
> user space, and user space would never use the "restart" system call
> directly.
>
> Let me give a more explicit example on an x86 level:
>
> - This is part of the x86 library function:
>
> movl 4(%esp),%ebx // request
> movl 8(%esp),%ecx // remainder
> movl $162,%eax // nanosleep syscall #
> int 0x80 // system call
>
> - this enters the kernel, which saves stuff off on the stack,
> and calls sys_nanosleep by indexing the 162 off the system call
> table. Time is now X.
>
> - we're supposed to sleep until "X + request"
>
> ...
> schedule_timeout()
>
> - we get woken up by a signal thing, which doesn't have a handler, but
> does (for example) put us to sleep. Let's say that it's SIGSTOP. To
> handle the signal, sys_nanosleep() needs to return -ERESTARTSYS because
> it can't do it on its own.
>
> - 2 seconds later, the user sends a SIGCONT, and the process restarts.
> Time is now X+2, which may or may not be AFTER the original timeout.
>
> See the problem here? We MUST NOT restart the system call with the
> original timeout pointer (the contents of which we must not change). Not
> only have we already slept part of the time (that part we know about), but
> we may _also_ have been blocked by a signal part of the time (which has
> been totally outside the control of sys_nanosleep()).
>
> So my solution implies that our restart logic in do_signal(), which
> already knows how to update the user-level EIP register (that's how the
> restart is done), can also be told to update the system call and the
> argument registers.

Once it changes the system call (eax, right), could the new
call code then just get the parms from the restart_block?
That means less code for the signal handler and keeps things
simple. It also means that the call gets the original parms
back so it is very generic, i.e. the signal code does not
need to know which parms to change and which to not.

> So what we do is to introduce a _new_ system call
> (system call number NNN), which takes a different form of timeout, namely
> "absolute value of end time".

I think it would be best to keep this as generic as
possible, i.e. let the new call code fetch its own
parameters from the restart_block.
>
> And then, when we enter do_signal(), we not only update %eip to point to
> the original "int 0x80" instruction, we _also_ update %eax to point to the
> new system call NNN, _and_ we update %ebx to contain the new timeout in
> absolute jiffies:
>
> current_thread->restart_block.syscall_nr = NNN;
> current_thread->restart_block.arg0 = jiffies + timeout;

My question is who sets up these values? I think you are
saying it should be the system call. Is this right?
>
> and then we have a
>
> sys_nanosleep_resume(unsigned long timeout, struct timespec *rem)
> {
> long jif = timeout - jiffies;
>
> if (jif > 0) {
> current->state = TASK_INTERRUPTIBLE;
> jif = schedule_timeout(jif);
> /* interrupted - we already have the restart block set up */
> if (jif) {
> if (rem)
> jiffies_to_usertimespec(jif, rem);
> return -ERESTART_RESTARTBLOCK;
> }
> }
> if (rem) {
> put_user(0, &rem->tv_sec);
> put_user(0, &rem->tv_nsec);
> }
> return 0;
> }
>
> See? The "nanosleep_resume" system call is never used by a program
> directly, it's only virtualized by the signal restart changing the system
> call number on restart. (A user program _could_ use it directly, but
> there's no point, and the interface to the thing might change at any
> time).

I think we could cause it to error out, if, for example, the
restart_block is null.

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-05 04:02:46

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Wed, 4 Dec 2002, george anzinger wrote:
>
> Once it changes the system call (eax, right), could the new
> call code then just get the parms from the restart_block.

Agreed.

> I think it would be best to keep this as generic as
> possible, i.e. let the new call code fetch its own
> parameters from the restart_block.

We could even have one _single_ generic "restart" system call, and have
the function pointer for that be in the restart block.

> My question is who sets up these values? I think you are
> saying it should be the system call. Is this right?

Whatever system call returns -ERESTART_RESTARTBLOCK, yes.

So it would never get set up at all in the fast path. Only in the error
case path of a system call that wants to have restarting capabilities.

Linus
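
The "one single generic restart call" variant can be sketched as a function pointer stored in the restart block. This is a user-space model under stated assumptions, not the eventual kernel interface: `generic_restart_block`, `model_now`, `model_nanosleep_resume`, and `model_restart_syscall` are hypothetical names, the tick counter is a plain global, and returning `-ENOSYS` when nothing is pending is just one possible policy for a direct call from user space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical restart block: a resume handler plus its saved arguments. */
struct generic_restart_block {
    long (*fn)(struct generic_restart_block *rb);  /* NULL when nothing pending */
    unsigned long arg0;                            /* e.g. absolute expiry */
    unsigned long arg1;                            /* spare argument */
};

/* Stand-in for the kernel's tick counter in this model. */
unsigned long model_now;

/* Resume handler for a sleep: returns the ticks left until rb->arg0. */
long model_nanosleep_resume(struct generic_restart_block *rb)
{
    long left = (long)(rb->arg0 - model_now);

    rb->fn = NULL;              /* the restart is one-shot */
    return left > 0 ? left : 0;
}

/* The single generic entry point: dispatch to whatever was recorded. */
long model_restart_syscall(struct generic_restart_block *rb)
{
    assert(rb != NULL);
    if (rb->fn == NULL)
        return -ENOSYS;         /* nothing to restart */
    return rb->fn(rb);
}
```

With this shape the signal code only has to substitute one fixed syscall number; everything specific to the interrupted call lives behind `fn`.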

2002-12-05 07:04:41

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds wrote:
>
> On Wed, 4 Dec 2002, george anzinger wrote:
> >
> > Once it changes the system call (eax, right), could the new
> > call code then just get the parms from the restart_block.
>
> Agreed.
>
> > I think it would be best to keep this as generic as
> > possible, i.e. let the new call code fetch its own
> parameters from the restart_block.
>
> We could even have one _single_ generic "restart" system call, and have
> the function pointer for that be in the restart block.

I think what you mean is that, if there is a
restart_function (i.e. the block is set up) and the return
is -ERESTART_RESTARTBLOCK, then change eax (x86) to call
sys_restart which would in turn call the function in the
restart_block.

One of the problems with this is the way parameters are
passed to system calls. One way to do this would be to have
sys_restart branch to the restart_function (requires asm).
Another way is to just pass a struct pointer (actually the
reg struct) which the restart function could sort out. For
example for nano_sleep:

int sys_restart(struct pt_regs parms)
{
return (current->restart_block.sys_call)(&parms);
}

Then:

struct nano_sleep_call {
struct timespec *tp;
struct timespec *rem;
};
int restart_nano_sleep(struct nano_sleep_call *parm);
>
> > My question is who sets up these values? I think you are
> > saying it should be the system call. Is this right?
>
> Whatever system call returns -ERESTART_RESTARTBLOCK, yes.
>
> So it would never get set up at all in the fast path. Only in the error
> case path of a system call that wants to have restarting capabilities.

And then only when returning -ERESTART_RESTARTBLOCK.
>
> Linus
>

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-05 09:47:48

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- linux-2.5.50-bk4-kb/arch/i386/kernel/entry.S Wed Dec 4 23:28:20 2002
+++ linux/arch/i386/kernel/entry.S Wed Dec 4 23:48:49 2002
@@ -769,6 +769,7 @@
.long sys_epoll_wait
.long sys_remap_file_pages
.long sys_set_tid_address
+ .long sys_restart_syscall


.rept NR_syscalls-(.-sys_call_table)/4
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/arch/i386/kernel/signal.c linux/arch/i386/kernel/signal.c
--- linux-2.5.50-bk4-kb/arch/i386/kernel/signal.c Thu Oct 3 10:41:57 2002
+++ linux/arch/i386/kernel/signal.c Thu Dec 5 00:27:21 2002
@@ -507,6 +507,7 @@
/* If so, check system call restarting.. */
switch (regs->eax) {
case -ERESTARTNOHAND:
+ case -ERESTART_RESTARTBLOCK:
regs->eax = -EINTR;
break;

@@ -589,6 +590,10 @@
regs->eax == -ERESTARTSYS ||
regs->eax == -ERESTARTNOINTR) {
regs->eax = regs->orig_eax;
+ regs->eip -= 2;
+ }
+ if (regs->eax == -ERESTART_RESTARTBLOCK){
+ regs->eax = __NR_restart_syscall;
regs->eip -= 2;
}
}
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/include/asm-i386/thread_info.h linux/include/asm-i386/thread_info.h
--- linux-2.5.50-bk4-kb/include/asm-i386/thread_info.h Mon Sep 9 10:35:03 2002
+++ linux/include/asm-i386/thread_info.h Thu Dec 5 01:07:23 2002
@@ -20,6 +20,12 @@
* - if the contents of this structure are changed, the assembly constants must also be changed
*/
#ifndef __ASSEMBLY__
+struct restart_block {
+ int (*fun)(void *);
+ long arg0;
+ long arg1;
+};
+
struct thread_info {
struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */
@@ -31,6 +37,7 @@
0-0xBFFFFFFF for user-thead
0-0xFFFFFFFF for kernel-thread
*/
+ struct restart_block restart_block;

__u8 supervisor_stack[0];
};
@@ -44,6 +51,7 @@
#define TI_CPU 0x0000000C
#define TI_PRE_COUNT 0x00000010
#define TI_ADDR_LIMIT 0x00000014
+#define TI_RESTART_BLOCK 0x0000018

#endif

@@ -63,6 +71,9 @@
.cpu = 0, \
.preempt_count = 1, \
.addr_limit = KERNEL_DS, \
+ .restart_block = { \
+ .fun = 0, \
+ }, \
}

#define init_thread_info (init_thread_union.thread_info)
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/include/asm-i386/unistd.h linux/include/asm-i386/unistd.h
--- linux-2.5.50-bk4-kb/include/asm-i386/unistd.h Wed Nov 27 15:49:22 2002
+++ linux/include/asm-i386/unistd.h Wed Dec 4 23:48:46 2002
@@ -263,7 +263,7 @@
#define __NR_sys_epoll_wait 256
#define __NR_remap_file_pages 257
#define __NR_set_tid_address 258
-
+#define __NR_restart_syscall 259

/* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */

diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/include/linux/errno.h linux/include/linux/errno.h
--- linux-2.5.50-bk4-kb/include/linux/errno.h Mon Sep 9 10:35:15 2002
+++ linux/include/linux/errno.h Wed Dec 4 23:53:21 2002
@@ -10,6 +10,7 @@
#define ERESTARTNOINTR 513
#define ERESTARTNOHAND 514 /* restart if no handler.. */
#define ENOIOCTLCMD 515 /* No ioctl command */
+#define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */

/* Defined for the NFSv3 protocol */
#define EBADHANDLE 521 /* Illegal NFS file handle */
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/kernel/signal.c linux/kernel/signal.c
--- linux-2.5.50-bk4-kb/kernel/signal.c Wed Dec 4 23:27:02 2002
+++ linux/kernel/signal.c Thu Dec 5 01:18:20 2002
@@ -1351,6 +1351,15 @@
* System call entry points.
*/

+asmlinkage long
+sys_restart_syscall( void *parm)
+{
+ if ( ! current_thread_info()->restart_block.fun){
+ return current_thread_info()->restart_block.fun(&parm);
+ }
+ return -ENOSYS;
+}
+
/*
* We don't need to get the kernel lock - this is all local to this
* particular thread.. (and that's good, because this is _heavily_
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/kernel/timer.c linux/kernel/timer.c
--- linux-2.5.50-bk4-kb/kernel/timer.c Wed Dec 4 23:27:02 2002
+++ linux/kernel/timer.c Thu Dec 5 01:16:03 2002
@@ -1020,19 +1020,39 @@
return current->pid;
}

+struct nano_sleep_call {
+ struct timespec *rqtp;
+ struct timespec *rmtp;
+};
+
+asmlinkage long sys_nanosleep_restart( struct nano_sleep_call * parms);
+
asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
{
struct timespec t;
unsigned long expire;
+ struct restart_block *restart_block;

- if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
- return -EFAULT;
-
- if (t.tv_nsec >= 1000000000L || t.tv_nsec < 0 || t.tv_sec < 0)
- return -EINVAL;
+ if (rqtp) {
+ if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
+ return -EFAULT;

- expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
+ if (t.tv_nsec >= 1000000000L || t.tv_nsec < 0 || t.tv_sec < 0)
+ return -EINVAL;
+ expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);

+ }else{
+ restart_block = &current_thread_info()->restart_block;
+ if( restart_block->fun !=
+ (int (*)(void *))sys_nanosleep_restart ||
+ ! restart_block->arg0){
+ return -EFAULT;
+ }
+ restart_block->fun = NULL;
+ expire = restart_block->arg0 - jiffies;
+ if (expire < 0)
+ return 0;
+ }
current->state = TASK_INTERRUPTIBLE;
expire = schedule_timeout(expire);

@@ -1042,10 +1062,19 @@
if (copy_to_user(rmtp, &t, sizeof(struct timespec)))
return -EFAULT;
}
- return -EINTR;
+ restart_block = &current_thread_info()->restart_block;
+ restart_block->fun = (int (*)(void *))sys_nanosleep_restart;
+ restart_block->arg0 = jiffies + expire;
+ return -ERESTART_RESTARTBLOCK;
}
return 0;
}
+
+asmlinkage long sys_nanosleep_restart( struct nano_sleep_call * parms)
+{
+ return sys_nanosleep(NULL, parms->rmtp);
+}
+

/*
* sys_sysinfo - fill in sysinfo struct


Attachments:
regs-fix-2.5.50-bk4.1.0.patch (5.97 kB)

2002-12-05 15:17:44

by Jim Houston

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Hi Linus, George,

I like the direction that this ERESTART_RESTARTBLOCK patch is
going.

It might be nice to clear the restart_block.fun in handle_signal()
in the ERESTART_RESTARTBLOCK path which returns -EINTR. This eliminates
the chance of a stale restart.

Jim Houston - Concurrent Computer Corp.

2002-12-05 16:31:44

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- linux-2.5.50-bk4-kb/arch/i386/kernel/entry.S Wed Dec 4 23:28:20 2002
+++ linux/arch/i386/kernel/entry.S Wed Dec 4 23:48:49 2002
@@ -769,6 +769,7 @@
.long sys_epoll_wait
.long sys_remap_file_pages
.long sys_set_tid_address
+ .long sys_restart_syscall


.rept NR_syscalls-(.-sys_call_table)/4
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/arch/i386/kernel/signal.c linux/arch/i386/kernel/signal.c
--- linux-2.5.50-bk4-kb/arch/i386/kernel/signal.c Thu Oct 3 10:41:57 2002
+++ linux/arch/i386/kernel/signal.c Thu Dec 5 08:16:30 2002
@@ -506,6 +506,8 @@
if (regs->orig_eax >= 0) {
/* If so, check system call restarting.. */
switch (regs->eax) {
+ case -ERESTART_RESTARTBLOCK:
+ current_thread_info()->restart_block.fun = NULL;
case -ERESTARTNOHAND:
regs->eax = -EINTR;
break;
@@ -589,6 +591,10 @@
regs->eax == -ERESTARTSYS ||
regs->eax == -ERESTARTNOINTR) {
regs->eax = regs->orig_eax;
+ regs->eip -= 2;
+ }
+ if (regs->eax == -ERESTART_RESTARTBLOCK){
+ regs->eax = __NR_restart_syscall;
regs->eip -= 2;
}
}
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/include/asm-i386/thread_info.h linux/include/asm-i386/thread_info.h
--- linux-2.5.50-bk4-kb/include/asm-i386/thread_info.h Mon Sep 9 10:35:03 2002
+++ linux/include/asm-i386/thread_info.h Thu Dec 5 01:07:23 2002
@@ -20,6 +20,12 @@
* - if the contents of this structure are changed, the assembly constants must also be changed
*/
#ifndef __ASSEMBLY__
+struct restart_block {
+ int (*fun)(void *);
+ long arg0;
+ long arg1;
+};
+
struct thread_info {
struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */
@@ -31,6 +37,7 @@
0-0xBFFFFFFF for user-thead
0-0xFFFFFFFF for kernel-thread
*/
+ struct restart_block restart_block;

__u8 supervisor_stack[0];
};
@@ -44,6 +51,7 @@
#define TI_CPU 0x0000000C
#define TI_PRE_COUNT 0x00000010
#define TI_ADDR_LIMIT 0x00000014
+#define TI_RESTART_BLOCK 0x0000018

#endif

@@ -63,6 +71,9 @@
.cpu = 0, \
.preempt_count = 1, \
.addr_limit = KERNEL_DS, \
+ .restart_block = { \
+ .fun = 0, \
+ }, \
}

#define init_thread_info (init_thread_union.thread_info)
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/include/asm-i386/unistd.h linux/include/asm-i386/unistd.h
--- linux-2.5.50-bk4-kb/include/asm-i386/unistd.h Wed Nov 27 15:49:22 2002
+++ linux/include/asm-i386/unistd.h Wed Dec 4 23:48:46 2002
@@ -263,7 +263,7 @@
#define __NR_sys_epoll_wait 256
#define __NR_remap_file_pages 257
#define __NR_set_tid_address 258
-
+#define __NR_restart_syscall 259

/* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */

diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/include/linux/errno.h linux/include/linux/errno.h
--- linux-2.5.50-bk4-kb/include/linux/errno.h Mon Sep 9 10:35:15 2002
+++ linux/include/linux/errno.h Wed Dec 4 23:53:21 2002
@@ -10,6 +10,7 @@
#define ERESTARTNOINTR 513
#define ERESTARTNOHAND 514 /* restart if no handler.. */
#define ENOIOCTLCMD 515 /* No ioctl command */
+#define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */

/* Defined for the NFSv3 protocol */
#define EBADHANDLE 521 /* Illegal NFS file handle */
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/kernel/signal.c linux/kernel/signal.c
--- linux-2.5.50-bk4-kb/kernel/signal.c Wed Dec 4 23:27:02 2002
+++ linux/kernel/signal.c Thu Dec 5 01:18:20 2002
@@ -1351,6 +1351,15 @@
* System call entry points.
*/

+asmlinkage long
+sys_restart_syscall( void *parm)
+{
+ if ( ! current_thread_info()->restart_block.fun){
+ return current_thread_info()->restart_block.fun(&parm);
+ }
+ return -ENOSYS;
+}
+
/*
* We don't need to get the kernel lock - this is all local to this
* particular thread.. (and that's good, because this is _heavily_
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk4-kb/kernel/timer.c linux/kernel/timer.c
--- linux-2.5.50-bk4-kb/kernel/timer.c Wed Dec 4 23:27:02 2002
+++ linux/kernel/timer.c Thu Dec 5 01:16:03 2002
@@ -1020,19 +1020,39 @@
return current->pid;
}

+struct nano_sleep_call {
+ struct timespec *rqtp;
+ struct timespec *rmtp;
+};
+
+asmlinkage long sys_nanosleep_restart( struct nano_sleep_call * parms);
+
asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
{
struct timespec t;
unsigned long expire;
+ struct restart_block *restart_block;

- if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
- return -EFAULT;
-
- if (t.tv_nsec >= 1000000000L || t.tv_nsec < 0 || t.tv_sec < 0)
- return -EINVAL;
+ if (rqtp) {
+ if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
+ return -EFAULT;

- expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
+ if (t.tv_nsec >= 1000000000L || t.tv_nsec < 0 || t.tv_sec < 0)
+ return -EINVAL;
+ expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);

+ }else{
+ restart_block = &current_thread_info()->restart_block;
+ if( restart_block->fun !=
+ (int (*)(void *))sys_nanosleep_restart ||
+ ! restart_block->arg0){
+ return -EFAULT;
+ }
+ restart_block->fun = NULL;
+ expire = restart_block->arg0 - jiffies;
+ if (expire < 0)
+ return 0;
+ }
current->state = TASK_INTERRUPTIBLE;
expire = schedule_timeout(expire);

@@ -1042,10 +1062,19 @@
if (copy_to_user(rmtp, &t, sizeof(struct timespec)))
return -EFAULT;
}
- return -EINTR;
+ restart_block = &current_thread_info()->restart_block;
+ restart_block->fun = (int (*)(void *))sys_nanosleep_restart;
+ restart_block->arg0 = jiffies + expire;
+ return -ERESTART_RESTARTBLOCK;
}
return 0;
}
+
+asmlinkage long sys_nanosleep_restart( struct nano_sleep_call * parms)
+{
+ return sys_nanosleep(NULL, parms->rmtp);
+}
+

/*
* sys_sysinfo - fill in sysinfo struct
Binary files linux-2.5.50-bk4-kb/scripts/lxdialog/lxdialog and linux/scripts/lxdialog/lxdialog differ
Binary files linux-2.5.50-bk4-kb/usr/gen_init_cpio and linux/usr/gen_init_cpio differ
Binary files linux-2.5.50-bk4-kb/usr/initramfs_data.cpio.gz and linux/usr/initramfs_data.cpio.gz differ


Attachments:
regs-fix-2.5.50-bk4.1.1.patch (6.33 kB)

2002-12-05 16:54:49

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)



On Thu, 5 Dec 2002, george anzinger wrote:
>
> I think this covers all the bases. It builds, boots, and
> runs. I haven't tested nano_sleep to see if it does the
> right thing yet...

Well, it definitely doesn't, since at least this test is the wrong way
around (as well as being against the coding style whitespace rules ;-p):

+ if ( ! current_thread_info()->restart_block.fun){
+ return current_thread_info()->restart_block.fun(&parm);

Also, I would suggest against having a NULL pointer, and instead just
initializing it with a function that sets it to an error return (don't use
ENOSYS, since the system call _does_ exist, and ENOSYS is what old kernels
would return if you do it by hand by mistake. I'd suggest -EINTR, since
that will "DoTheRightThing(tm)" if we somehow get confused).

Linus
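
A minimal user-space sketch of the suggestion above, assuming a simplified restart_block with a single function pointer. The names model_thread, MODEL_THREAD_INIT, model_restart_syscall and default_no_restart are hypothetical and exist only for this illustration; they are not the kernel's symbols. The point is that when the pointer always refers to a function returning -EINTR, there is no NULL test left to get backwards.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the per-thread restart state (assumption). */
struct restart_block {
	long (*fn)(struct restart_block *);
	long arg0;
	long arg1;
};

/* Default continuation: nothing to restart, report an interrupted call. */
long default_no_restart(struct restart_block *rb)
{
	(void)rb;
	return -EINTR;
}

/* Hypothetical per-thread state; fn is initialized and never NULL. */
struct model_thread {
	struct restart_block restart_block;
};

#define MODEL_THREAD_INIT { .restart_block = { .fn = default_no_restart } }

/* Model of the restart entry point: it calls fn unconditionally. */
long model_restart_syscall(struct model_thread *t)
{
	assert(t->restart_block.fn != NULL);	/* invariant, not a branch */
	return t->restart_block.fn(&t->restart_block);
}
```

With this shape, a restart issued "by hand" on a thread that has nothing pending simply looks like an interrupted call.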

2002-12-05 23:56:59

by Richard Henderson

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

On Thu, Dec 05, 2002 at 08:35:18AM -0800, george anzinger wrote:
> +asmlinkage long
> +sys_restart_syscall( void *parm)
> +{
> + if ( ! current_thread_info()->restart_block.fun){
> + return current_thread_info()->restart_block.fun(&parm);
> + }

(1) Address of parameter?
(2) Passing in such parameters unchecked is a security hole.
(3) Much easier to just take all information from restart_block
instead of pushing it into the fake syscall.
(4) Should probably clear restart_block.fun immediately.


r~
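
Points (3) and (4) above can be sketched in user space as follows. This is only an illustration under stated assumptions: the continuation receives the restart block itself rather than any user-supplied parameter, and the dispatcher disarms the block before calling it. The names no_restart, sum_restart and model_restart_syscall are hypothetical.

```c
#include <assert.h>
#include <errno.h>

struct restart_block {
	long (*fn)(struct restart_block *);
	long arg0;
	long arg1;
};

/* Safe default: behaves like an interrupted call. */
long no_restart(struct restart_block *rb)
{
	(void)rb;
	return -EINTR;
}

/* Hypothetical continuation: reads only state saved by the kernel side. */
long sum_restart(struct restart_block *rb)
{
	return rb->arg0 + rb->arg1;
}

/* Dispatcher model: nothing from the caller is forwarded to fn. */
long model_restart_syscall(struct restart_block *rb)
{
	long (*fn)(struct restart_block *) = rb->fn;

	/* Disarm first, so a stale continuation cannot run a second time. */
	rb->fn = no_restart;
	return fn(rb);
}
```

A continuation that needs to sleep again would re-arm the block itself before returning the restart code.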

2002-12-06 09:11:23

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk5-kb/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- linux-2.5.50-bk5-kb/arch/i386/kernel/entry.S Thu Dec 5 12:28:15 2002
+++ linux/arch/i386/kernel/entry.S Thu Dec 5 12:28:52 2002
@@ -769,6 +769,7 @@
.long sys_epoll_wait
.long sys_remap_file_pages
.long sys_set_tid_address
+ .long sys_restart_syscall


.rept NR_syscalls-(.-sys_call_table)/4
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk5-kb/arch/i386/kernel/signal.c linux/arch/i386/kernel/signal.c
--- linux-2.5.50-bk5-kb/arch/i386/kernel/signal.c Thu Oct 3 10:41:57 2002
+++ linux/arch/i386/kernel/signal.c Thu Dec 5 12:28:52 2002
@@ -506,6 +506,9 @@
if (regs->orig_eax >= 0) {
/* If so, check system call restarting.. */
switch (regs->eax) {
+ case -ERESTART_RESTARTBLOCK:
+ current_thread_info()->restart_block.fun =
+ do_no_restart_syscall;
case -ERESTARTNOHAND:
regs->eax = -EINTR;
break;
@@ -589,6 +592,10 @@
regs->eax == -ERESTARTSYS ||
regs->eax == -ERESTARTNOINTR) {
regs->eax = regs->orig_eax;
+ regs->eip -= 2;
+ }
+ if (regs->eax == -ERESTART_RESTARTBLOCK){
+ regs->eax = __NR_restart_syscall;
regs->eip -= 2;
}
}
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk5-kb/include/asm-i386/thread_info.h linux/include/asm-i386/thread_info.h
--- linux-2.5.50-bk5-kb/include/asm-i386/thread_info.h Mon Sep 9 10:35:03 2002
+++ linux/include/asm-i386/thread_info.h Thu Dec 5 12:28:53 2002
@@ -11,6 +11,7 @@

#ifndef __ASSEMBLY__
#include <asm/processor.h>
+#include <linux/linkage.h>
#endif

/*
@@ -20,6 +21,12 @@
* - if the contents of this structure are changed, the assembly constants must also be changed
*/
#ifndef __ASSEMBLY__
+struct restart_block {
+ long (*fun)(void *);
+ long arg0;
+ long arg1;
+};
+
struct thread_info {
struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */
@@ -31,6 +38,7 @@
0-0xBFFFFFFF for user-thead
0-0xFFFFFFFF for kernel-thread
*/
+ struct restart_block restart_block;

__u8 supervisor_stack[0];
};
@@ -44,6 +52,7 @@
#define TI_CPU 0x0000000C
#define TI_PRE_COUNT 0x00000010
#define TI_ADDR_LIMIT 0x00000014
+#define TI_RESTART_BLOCK 0x0000018

#endif

@@ -55,6 +64,12 @@
* preempt_count needs to be 1 initially, until the scheduler is functional.
*/
#ifndef __ASSEMBLY__
+/*
+ * We need this to do the initialization, but we don't want to clutter up
+ * things with the signal.h which is where it should be...
+ */
+extern asmlinkage long do_no_restart_syscall( void *parm);
+
#define INIT_THREAD_INFO(tsk) \
{ \
.task = &tsk, \
@@ -63,6 +78,9 @@
.cpu = 0, \
.preempt_count = 1, \
.addr_limit = KERNEL_DS, \
+ .restart_block = { \
+ .fun = do_no_restart_syscall, \
+ }, \
}

#define init_thread_info (init_thread_union.thread_info)
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk5-kb/include/asm-i386/unistd.h linux/include/asm-i386/unistd.h
--- linux-2.5.50-bk5-kb/include/asm-i386/unistd.h Wed Nov 27 15:49:22 2002
+++ linux/include/asm-i386/unistd.h Thu Dec 5 12:28:53 2002
@@ -263,7 +263,7 @@
#define __NR_sys_epoll_wait 256
#define __NR_remap_file_pages 257
#define __NR_set_tid_address 258
-
+#define __NR_restart_syscall 259

/* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */

diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk5-kb/include/linux/errno.h linux/include/linux/errno.h
--- linux-2.5.50-bk5-kb/include/linux/errno.h Mon Sep 9 10:35:15 2002
+++ linux/include/linux/errno.h Thu Dec 5 12:28:54 2002
@@ -10,6 +10,7 @@
#define ERESTARTNOINTR 513
#define ERESTARTNOHAND 514 /* restart if no handler.. */
#define ENOIOCTLCMD 515 /* No ioctl command */
+#define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */

/* Defined for the NFSv3 protocol */
#define EBADHANDLE 521 /* Illegal NFS file handle */
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk5-kb/include/linux/signal.h linux/include/linux/signal.h
--- linux-2.5.50-bk5-kb/include/linux/signal.h Mon Sep 9 10:35:04 2002
+++ linux/include/linux/signal.h Thu Dec 5 12:28:54 2002
@@ -219,6 +219,7 @@
}

extern long do_sigpending(void *, unsigned long);
+extern asmlinkage long do_no_restart_syscall( void *parm);

#ifndef HAVE_ARCH_GET_SIGNAL_TO_DELIVER
struct pt_regs;
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk5-kb/kernel/signal.c linux/kernel/signal.c
--- linux-2.5.50-bk5-kb/kernel/signal.c Thu Dec 5 11:48:53 2002
+++ linux/kernel/signal.c Thu Dec 5 12:28:54 2002
@@ -1351,6 +1351,19 @@
* System call entry points.
*/

+asmlinkage long
+sys_restart_syscall( void *parm)
+{
+ return current_thread_info()->restart_block.fun(&parm);
+}
+
+asmlinkage long
+do_no_restart_syscall( void *parm)
+{
+ return -EINTR;
+}
+
+
/*
* We don't need to get the kernel lock - this is all local to this
* particular thread.. (and that's good, because this is _heavily_
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.50-bk5-kb/kernel/timer.c linux/kernel/timer.c
--- linux-2.5.50-bk5-kb/kernel/timer.c Thu Dec 5 11:48:53 2002
+++ linux/kernel/timer.c Thu Dec 5 12:48:54 2002
@@ -1020,21 +1020,50 @@
return current->pid;
}

+struct nano_sleep_call {
+ struct timespec *rqtp;
+ struct timespec *rmtp;
+};
+
+asmlinkage long sys_nanosleep_restart( struct nano_sleep_call * parms);
+
+
long do_nanosleep(struct timespec *t)
{
unsigned long expire;
+ struct restart_block *restart_block =
+ &current_thread_info()->restart_block;

- if ((t->tv_nsec >= 1000000000L) || (t->tv_nsec < 0) || (t->tv_sec < 0))
- return -EINVAL;
+ if( restart_block->fun == (int (*)(void *))sys_nanosleep_restart){
+ /*
+ * Interrupted by a non-delivered signal, pick up remaining
+ * time and continue.
+ */
+ restart_block->fun = do_no_restart_syscall;
+ if(!restart_block->arg0)
+ return -EINTR;
+
+ expire = restart_block->arg0 - jiffies;
+ if(expire <= 0)
+ return 0;
+ }else{
+ if ((t->tv_nsec >= 1000000000L) ||
+ (t->tv_nsec < 0) ||
+ (t->tv_sec < 0))
+ return -EINVAL;

- expire = timespec_to_jiffies(t) + (t->tv_sec || t->tv_nsec);
+ expire = timespec_to_jiffies(t) + (t->tv_sec || t->tv_nsec);
+ }

current->state = TASK_INTERRUPTIBLE;
expire = schedule_timeout(expire);

if (expire) {
jiffies_to_timespec(expire, t);
- return -EINTR;
+ restart_block = &current_thread_info()->restart_block;
+ restart_block->fun = (int (*)(void *))sys_nanosleep_restart;
+ restart_block->arg0 = jiffies + expire;
+ return -ERESTART_RESTARTBLOCK;
}
return 0;
}
@@ -1048,11 +1077,16 @@
return -EFAULT;

ret = do_nanosleep(&t);
- if (rmtp && (ret == -EINTR)) {
+ if (rmtp && (ret == -ERESTART_RESTARTBLOCK)) {
if (copy_to_user(rmtp, &t, sizeof(t)))
return -EFAULT;
}
return ret;
+}
+
+asmlinkage long sys_nanosleep_restart( struct nano_sleep_call * parms)
+{
+ return sys_nanosleep(parms->rqtp, parms->rmtp);
}

/*


Attachments:
reg-fix-2.5.50-bk5-1.0.patch (7.05 kB)

2002-12-06 17:50:36

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


I just pushed my version of the system call restart code to the BK trees.
It's loosely based on George's code, but subtly different. Also, I didn't
actually update any actual system calls to use it, I just did the
infrastructure.

Non-x86 architectures need to be updated to work with this: they need to
update their thread structures, the additional do_signal() support in
their signal.c, and add the actual system call itself somewhere. For x86,
this was about 15 lines of changes.

The basic premise is very simple: if you want to restart a system call,
you can do

	restart = &current_thread_info()->restart_block;
	restart->fn = my_continuation_function;
	restart->arg0 = my_arg0_for_continuation;
	restart->arg1 = my_arg1_for_continuation;
	..
	return -ERESTART_RESTARTBLOCK;

which will cause the system call either to return -EINTR (if a signal
handler was actually invoked) or, for "benign" signals (SIGSTOP etc.), to
restart at the continuation function (with the "restart" block as the
argument).

We could extend this to allow restarting even over signal handlers, but
that would have some re-entrancy issues (ie what happens if a signal
handler itself wants to use a system call that may want restarting), so at
least for now restarting is only done when no handler is invoked (*).

Linus

(*) The nesting case is by no means impossible to handle gracefully
(adding a "restart even if handler is called" error number and returning
-EINTR if nesting, for example), but I don't know of any system calls that
would really want to try to restart anyway, so..
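
The decision described in this message can be modelled in a few lines of user-space C. This is a sketch, not the code that went into the BK tree: model_signal_exit, struct outcome and enum next_step are hypothetical names, and the constant is prefixed with MODEL_ to make clear it only mirrors the value 516 used in the patches earlier in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_ERESTART_RESTARTBLOCK 516	/* value taken from the patches above */

enum next_step { RETURN_TO_USER, RERUN_VIA_RESTART_SYSCALL };

struct outcome {
	enum next_step step;
	long retval;	/* what user space sees when step == RETURN_TO_USER */
};

/*
 * Model of the choice made on the way out of a system call that returned
 * ret. handler_invoked says whether a user signal handler is about to run.
 */
struct outcome model_signal_exit(long ret, bool handler_invoked)
{
	struct outcome o = { RETURN_TO_USER, ret };

	if (ret != -MODEL_ERESTART_RESTARTBLOCK)
		return o;			/* ordinary return value */
	if (handler_invoked)
		o.retval = -EINTR;		/* handler runs: no restart */
	else
		o.step = RERUN_VIA_RESTART_SYSCALL;	/* benign signal */
	return o;
}
```

The internal restart code never reaches user space in either branch: it is turned into -EINTR or consumed by the restart.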

2002-12-06 19:12:05

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Fri, 6 Dec 2002, Linus Torvalds wrote:
>
> I just pushed my version of the system call restart code to the BK trees.
> It's loosely based on George's code, but subtly different. Also, I didn't
> actually update any actual system calls to use it, I just did the
> infrastructure.

I did the nanosleep() implementation using the new infrastructure now, and
am pushing it out as I write this.

Ironically (considering the origin of the thread), this actually _breaks_
the kernel/compat.c nanosleep handling, since the restarting really needs
to know the type for "struct timespec", and the common "do_nanosleep()"
was just too stupid and limited to be able to do restarting sanely.

Compat people can hopefully fix it up. Either by just copying the
nanosleep function and not even trying to share code, or by making the
restart function be a function pointer argument to a new and improved
common "do_nanosleep()".

It's been tested, and the only problem I found (which is kind of
fundamental) is that if the system call gets interrupted by a signal and
restarted, and then later returns successfully, the partial restart will
have updated the "remaining time" field to whatever was remaining when the
restart was started.

That could be fixed by making the restart block contain not just the
restart pointer, but also a "no restart possible" pointer, which would be
the one called if the signal handler logic ended up returning -EINTR.

It's a trivial extension, and possibly worth it regardless (it might be
useful for other system call cases too that may want to undo some
reservation or whatever), but I would like to hear from the standards
lawyers whether POSIX/SuS actually cares or not. George?

Linus
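
The "no restart possible" extension floated above might look roughly like the following user-space sketch. Everything here is hypothetical: the cancel field, the typed remaining/remaining_out members (used instead of raw arg0/arg1 words to avoid casts) and the function names are assumptions made for illustration. The idea shown is that the remaining time is written out only when the call really ends with -EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct restart_block {
	long (*fn)(struct restart_block *);	/* continue the call */
	long (*cancel)(struct restart_block *);	/* hypothetical: no restart possible */
	long remaining;				/* ticks left when interrupted */
	long *remaining_out;			/* stand-in for the user's rmtp */
};

/* Cancel hook: only now is the remaining time made visible. */
long sleep_cancel(struct restart_block *rb)
{
	if (rb->remaining_out != NULL)
		*rb->remaining_out = rb->remaining;
	return -EINTR;
}

/* Continuation model: the rest of the sleep completes successfully. */
long sleep_continue(struct restart_block *rb)
{
	(void)rb;
	return 0;
}

/* Signal-path model: pick the hook that matches the outcome. */
long model_finish(struct restart_block *rb, bool handler_invoked)
{
	return handler_invoked ? rb->cancel(rb) : rb->fn(rb);
}
```

On the restart path the output location is left untouched, which is the behaviour the message says the plain restart pointer cannot provide.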

2002-12-06 20:13:00

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds wrote:
>
> On Fri, 6 Dec 2002, Linus Torvalds wrote:
> >
> > I just pushed my version of the system call restart code to the BK trees.
> > It's loosely based on George's code, but subtly different. Also, I didn't
> > actually update any actual system calls to use it, I just did the
> > infrastructure.
>
> I did the nanosleep() implementation using the new infrastructure now, and
> am pushing it out as I write this.
>
> Ironically (considering the origin of the thread), this actually _breaks_
> the kernel/compat.c nanosleep handling, since the restarting really needs
> to know the type for "struct timespec", and the common "do_nanosleep()"
> was just too stupid and limited to be able to do restarting sanely.
>
> Compat people can hopefully fix it up. Either by just copying the
> nanosleep function and not even trying to share code, or by making the
> restart function be a function pointer argument to a new and improved
> common "do_nanosleep()".
>
> It's been tested, and the only problem I found (which is kind of
> fundamental) is that if the system call gets interrupted by a signal and
> restarted, and then later returns successfully, the partial restart will
> have updated the "remaining time" field to whatever was remaining when the
> restart was started.
>
> That could be fixed by making the restart block contain not just the
> restart pointer, but also a "no restart possible" pointer, which would be
> the one called if the signal handler logic ended up returning -EINTR.
>
> It's a trivial extension, and possibly worth it regardless (it might be
> useful for other system call cases too that may want to undo some
> reservation or whatever), but I would like to hear from the standards
> lawyers whether POSIX/SuS actually cares or not. George?

My reading of the standard indicates that the return values
have meaning ONLY if EINTR is returned. I changed the POSIX
Clocks & timers patch to do it this way, and, yes it is
observable from user space. My test code tried to pass a
bad return address to flush out an error which failed
because the address was not used so I just changed the
test. My reading of the prior nanosleep seemed to say the
same thing, i.e. the address was not dereferenced on
success.

I have not looked at your code yet, but I am concerned that
the restart may not be able to get to the original
parameters. For nanosleep this is not a problem, but for
clock_nanosleep there are 4 parameters, at least two of
which are needed for restart (the Clock and the return
address). (See the POSIX timers patch for example.)

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-06 21:05:53

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Fri, 6 Dec 2002, george anzinger wrote:
>
> I have not looked at your code yet, but I am concerned that
> the restart may not be able to get to the original
> parameters.

The way the new system call restarting is done, it never looks at the old
parameters. They don't even _exist_ for the restarted call (well, they do,
but the restart function can't actually get at them). So it is up to the
original interrupted call to save off anything it needs saving off (and it
gets the "restart_block" structure to do that saving in. Right now that's
just three words, but we can expand it if necessary).

Anyway, it sounds like the new nanosleep behaviour is acceptable (and
certainly closer to SuS than the old one), and that the "time remaining"
part is simply undefined for a successful sleep. I'd like to clean it up
(either always clearing the time on success, or just never updating it at
all), but at least standards lawyers aren't going to complain.


Linus
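
As an illustration of "saving off anything it needs", a sleep can store its absolute expiry in the restart block and recompute the remaining time when restarted. The sketch below is a user-space model with hypothetical names (save_expiry, remaining_ticks) and assumes a free-running unsigned tick counter that wraps, in the style of jiffies. It takes the difference as a signed value, since an unsigned difference can never compare below zero.

```c
#include <assert.h>
#include <limits.h>

struct restart_block {
	unsigned long arg0;	/* absolute expiry, in ticks */
};

/* Called by the interrupted sleep before it returns the restart code. */
void save_expiry(struct restart_block *rb, unsigned long now,
		 unsigned long remaining)
{
	rb->arg0 = now + remaining;	/* unsigned addition wraps by design */
}

/*
 * Called by the continuation: ticks still to sleep, or 0 if the deadline
 * has passed. The cast to long relies on the usual two's-complement
 * conversion, which is what makes the comparison wraparound-safe.
 */
unsigned long remaining_ticks(const struct restart_block *rb,
			      unsigned long now)
{
	long delta = (long)(rb->arg0 - now);

	return delta > 0 ? (unsigned long)delta : 0;
}
```

Because only the saved expiry is consulted, the continuation does not need the original request argument at all.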

2002-12-06 21:49:10

by Jim Houston

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds wrote:
>
> On Fri, 6 Dec 2002, george anzinger wrote:
> >
> > I have not looked at your code yet, but I am concerned that
> > the restart may not be able to get to the original
> > parameters.
>
> The way the new system call restarting is done, it never looks at the old
> parameters. They don't even _exist_ for the restarted call (well, they do,
> but the restart function can't actually get at them). So it is up to the
> original interrupted call to save off anything it needs saving off (and it
> gets the "restart_block" structure to do that saving in. Right now that's
> just three words, but we can expand it if necessary).
>

Hi Linus,

I know it would be a few extra lines of assembly code but it would be
nice if the restart routine had the original arguments. Would it be too
ugly to do something like:

sys_restart_syscall:
GET_THREAD_INFO(%eax)
jmp TI_RESTART_BLOCK(%eax)

I'm having second thoughts about even sending this. It's just that I hate
casts more than I hate assembly code, and using the restart_block to save
the arguments implies casts.

Jim Houston - Concurrent Computer Corp.

2002-12-06 22:51:26

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Fri, 6 Dec 2002, Jim Houston wrote:
>
> I know it would be a few extra lines of assembly code but it would be
> nice if the restart routine had the original arguments.

It's not even extra code on x86, since we don't stomp on any of the
arguments, and they will all have the same values when returning. So on
x86, we could see the arguments by just adding parameters to the
sys_restart_syscall() function.

However, the same is not necessarily true on other architectures, where
there can be overlap between clobbers and arguments (so that the first
invocation of the system call may have trashed the arguments unless it
explicitly saves it), and that's the reason I don't want to expose the
original ones.

Also, I actually much prefer the arguments to be saved away in the restart
block for another reason too - because that way you can _trust_ them. The
restart function basically knows that the arguments are truly the same
that were saved away - if you allow the register contents from user space
to be re-used, clever 'ptrace()' usage will be able to change the
registers.

In other words, with the current setup, we can actually have hidden kernel
state inside the restart block, and it never gets leaked to/from user
space. That's potentially quite useful in itself (you can cache argument
values).

Linus

2002-12-06 23:03:44

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Jim Houston wrote:
>
> Linus Torvalds wrote:
> >
> > On Fri, 6 Dec 2002, george anzinger wrote:
> > >
> > > I have not looked at your code yet, but I am concerned that
> > > the restart may not be able to get to the original
> > > parameters.
> >
> > The way the new system call restarting is done, it never looks at the old
> > parameters. They don't even _exist_ for the restarted call (well, they do,
> > but the restart function can't actually get at them). So it is up to the
> > original interrupted call to save off anything it needs saving off (and it
> > gets the "restart_block" structure to do that saving in. Right now that's
> > just three words, but we can expand it if necessary).
> >
>
> Hi Linus,
>
> I know it would be a few extra lines of assembly code but it would be
> nice if the restart routine had the original arguments. Would it be too
> ugly to do something like:
>
> sys_restart_syscall:
> GET_THREAD_INFO(%eax)
> jmp TI_RESTART_BLOCK(%eax)
>
> I'm having second thoughts about even sending this. It's just that I hate
> casts more than I hate assembly code, and using the restart_block to save
> the arguments implies casts.
>
I too, think the original parameters are very useful. I
keep wondering if we could simplify all this by just
restarting the SAME system call. Keep the restart block to
save stuff in AND let deliver_signal clear a word in it if
it is not restarting the call. This way the system call
knows it is being restarted, but it has all the parameters
it needs PLUS the restart arguments. The only hole I see
here is making sure the restart block is cleared after use.
This could be done by bumping it each system call so exactly
1 means this is a restart. OR, we could trust the users to
always clear it after using (better I think, no code on the
fast path).

Oh, on thinking on this, we can get this behavior with what
we have if we just change the -ERESTARTNOHAND to clear the
first word of the restart block on the handler call. This
way you can have your cake and eat it too :)
--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-07 00:51:45

by Anton Blanchard

Subject: Re: [PATCH] compatibility syscall layer - PPC64


> This is the PPC64 specific patch. It goes slightly further than necessary
> by defining compat_size_t and compat_ssize_t.

Thanks Stephen, this has been merged and pulled into Linus-BK.

Anton

2002-12-07 02:24:44

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

--- linux-2.5.50-bk4-hrposix/arch/i386/kernel/signal.c Fri Dec 6 18:17:06 2002
+++ linux/arch/i386/kernel/signal.c Fri Dec 6 18:20:05 2002
@@ -507,8 +507,8 @@
/* If so, check system call restarting.. */
switch (regs->eax) {
case -ERESTART_RESTARTBLOCK:
- current_thread_info()->restart_block.fn = do_no_restart_syscall;
case -ERESTARTNOHAND:
+ current_thread_info()->restart_block.fn = do_no_restart_syscall;
regs->eax = -EINTR;
break;



Attachments:
reg_sug.patch (546.00 B)

2002-12-08 20:37:10

by David Miller

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

From: Linus Torvalds <[email protected]>
Date: Fri, 6 Dec 2002 11:20:26 -0800 (PST)

I did the nanosleep() implementation using the new infrastructure now, and
am pushing it out as I write this.
...
Compat people can hopefully fix it up.

I'm fixing this up right now.

2002-12-09 18:39:12

by David Mosberger

Subject: RE: [PATCH] compatibility syscall layer (lets try again)

>>>>> On Mon, 9 Dec 2002 09:35:59 -0800 (PST), Linus Torvalds <[email protected]> said:

Linus> And apparently ia64 is again being a singularly awkward
Linus> architecture.

I don't want to interfere with your ability to take potshots ;-), but
just to avoid confusion: passing syscall arguments in registers is NOT
an architectural requirement, it just makes good sense.

--david

2002-12-09 18:40:22

by Martin Schwidefsky

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


> You're not looking at a recent 2.5.x tree with the nanosleep() restart
> logic.
I had been looking at 2.5.50, we had a different meaning of current.
If you are saying that for any implementation of nanosleep I have to implement
the -ERESTART_RESTARTBLOCK thingy anyway, then I better start with it.

blue skies,
Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [email protected]


2002-12-09 18:43:31

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)



On Mon, 9 Dec 2002, Martin Schwidefsky wrote:
>
> I had been looking at 2.5.50, we had a different meaning of current.
> If you are saying that for any implementation of nanosleep I have to implement
> the -ERESTART_RESTARTBLOCK thingy anyway, then I better start with it.

You don't _have_ to. An architecture for which restarting is just too
painful can just always choose to return -EINTR, that should be ok. That's
how nanosleep() used to work before - it may not be 100% SuS compliant,
but it's not as if anybody really cares, I suspect.
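
If an architecture always returns -EINTR, the work of continuing the sleep falls to user space. A minimal sketch of what a caller would then do, using only the standard `nanosleep()` interface (the helper name `sleep_fully` is made up for this example):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for the whole interval, resuming with the remaining time each
 * time nanosleep() is interrupted by a signal.
 * Returns 0 on success, or -1 with errno set on any other failure. */
int sleep_fully(struct timespec req)
{
    struct timespec rem;

    assert(req.tv_nsec >= 0 && req.tv_nsec < 1000000000L);
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;   /* carry on with what is left */
    }
    return 0;
}
```

Because each retry rounds the remaining time again, a loop like this can drift under a heavy signal load, which is one reason an in-kernel restart with an absolute expiry is attractive.
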

Linus

2002-12-09 17:49:15

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)




On Mon, 9 Dec 2002, Jim Houston wrote:
>
> Either I'm missing something or this is broken if there is ever
> more than one restart function involved. You save the arguments
> to the register state that gdb saves but not the restart function
> address. In the nested case this would call one restart function
> with the arguments of another.

That's true. Scratch this patch - it's too painful on ia64 anyway, and it
doesn't nest correctly in the first place. We'd need to save off the
"restarted system call number" in user space too to get proper nesting,
just saving the arguments isn't enough.

So we're just going to have to live with the fact that restarting doesn't
nest. I think the gdb example by Daniel was interesting, but perhaps not
all that important ;)

Linus

2002-12-09 18:12:02

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)



On Mon, 9 Dec 2002, Martin Schwidefsky wrote:
>
> The current system call restart doesn't change the system call number. We just
> subtract two from the psw address (aka eip) and go back to user space.

You're not looking at a recent 2.5.x tree with the nanosleep() restart
logic.

Linus

2002-12-09 17:52:55

by Martin Schwidefsky

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


> Well, that is tricky independently of the actual argument stuff - even the
> _current_ system call restart ends up being tricky for you in that case, I
> suspect. The current one already basically depends on rewriting the system
> call number, it just leaves the arguments untouched.

The current system call restart doesn't change the system call number. We just
subtract two from the psw address (aka eip) and go back to user space.

> One thing you could do (which is pretty much architecture-independent) is
> to have a "restart" flag in the thread info structure. The system call
> entry code-path already has to check the thread info structure for things
> like the "ptrace" flags, so it shouldn't add much overhead to the system
> call entrypoint.

Hmm, probably the best thing to do. I thought about changing the return address
in the last frame of the stack but this is ugly and confuses the code. Another
flag in thread_info (_TIF_RESTART?) doesn't add any instruction on the hot
path in the system call handler and is simple enough to implement.

blue skies,
Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [email protected]


2002-12-09 17:42:27

by Jim Houston

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds wrote:

> NOTE NOTE NOTE! This patch is totally untested. It may or may not compile
> and work. It _looks_ correct, but that's all I'm going to guarantee about
> it.
>
> if (regs->eax == -ERESTART_RESTARTBLOCK){
> + struct restart_block *restart = &current_thread_info()->restart_block;
> regs->eax = __NR_restart_syscall;
> + regs->ebx = restart->arg0;
> + regs->ecx = restart->arg1;

Hi Linus,

Either I'm missing something or this is broken if there is ever
more than one restart function involved. You save the arguments
to the register state that gdb saves but not the restart function
address. In the nested case this would call one restart function
with the arguments of another.
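
The nesting failure described above can be reproduced in a small user-space model. Everything here is a hypothetical stand-in for illustration: `saved_regs` plays the role of the register state a debugger saves and restores, and the single global `rb` plays the role of the per-thread restart block from the patch.

```c
#include <assert.h>

typedef long (*restart_fn)(unsigned long arg0, unsigned long arg1);

/* One per thread, as in the patch: it holds only the continuation. */
struct restart_block { restart_fn fn; };

/* Stand-in for the user register state that gdb saves and restores. */
struct saved_regs { unsigned long ebx, ecx; };

static struct restart_block rb;

/* Two different continuations; the return value tags which one ran. */
static long restart_hour(unsigned long a0, unsigned long a1)
{
    (void)a1;
    return 1000 + (long)a0;
}

static long restart_minute(unsigned long a0, unsigned long a1)
{
    (void)a1;
    return 2000 + (long)a0;
}

/* An interrupted call puts its arguments in the registers and its
 * continuation in the single per-thread block. */
void model_interrupt(struct saved_regs *regs, restart_fn fn, unsigned long arg0)
{
    regs->ebx = arg0;
    regs->ecx = 0;
    rb.fn = fn;   /* overwrites whatever an outer call stored */
}

/* Restart entry: arguments come from the registers, the function from
 * the per-thread block. */
long model_restart(const struct saved_regs *regs)
{
    return rb.fn(regs->ebx, regs->ecx);
}
```

After an outer and then an inner interruption, restoring the outer registers and calling `model_restart()` runs `restart_minute` with the outer call's arguments, which is exactly the mismatch being pointed out.
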

Jim Houston - Concurrent Computer Corp.

2002-12-09 17:24:59

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


On Mon, 9 Dec 2002, Martin Schwidefsky wrote:
>
> For s390/s390x this is actually quite tricky. The system call number is
> coded in the instruction, e.g. 0x0aa2 is svc 162 or sys_nanosleep. There
> is no register involved that contains the system call number I could
> simply change. I either have to change the instruction (no way) or I
> have to avoid going back to userspace in this case. This would require
> assembler magic in entry.S. Not nice.

Well, that is tricky independently of the actual argument stuff - even the
_current_ system call restart ends up being tricky for you in that case, I
suspect. The current one already basically depends on rewriting the system
call number, it just leaves the arguments untouched.

One thing you could do (which is pretty much architecture-independent) is
to have a "restart" flag in the thread info structure. The system call
entry code-path already has to check the thread info structure for things
like the "ptrace" flags, so it shouldn't add much overhead to the system
call entrypoint.

You would need to add a "do _not_ restart" macro to the system call
restart infrastructure, which would clear the restart bit when a restarted
system call returns (it also would get cleared when ERESTART_RESTARTBLOCK
ends up being changed into an EINTR, of course, but that's already
architecture-dependent code so you can do anything there).
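
A rough user-space model of the flag-based scheme described above, for architectures that cannot rewrite the system call number. The flag name `MODEL_TIF_RESTART`, the helper names and the tiny dispatch table are assumptions made for this sketch; only the shape of `struct restart_block` and `do_no_restart_syscall` is taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TIF_RESTART (1UL << 0)   /* hypothetical thread-info flag */
#define MODEL_EINTR       4

struct restart_block {
    long (*fn)(struct restart_block *);
    unsigned long arg0, arg1, arg2;
};

struct thread_info {
    unsigned long flags;
    struct restart_block restart_block;
};

typedef long (*syscall_fn)(struct thread_info *);

long do_no_restart_syscall(struct restart_block *rb)
{
    (void)rb;
    return -MODEL_EINTR;
}

/* Entry path: the flags word is inspected anyway (tracing and so on), so
 * the restart test rides along. The syscall number is never rewritten. */
long model_syscall_entry(struct thread_info *ti, const syscall_fn *table, size_t nr)
{
    if (ti->flags & MODEL_TIF_RESTART) {
        ti->flags &= ~MODEL_TIF_RESTART;
        return ti->restart_block.fn(&ti->restart_block);
    }
    return table[nr](ti);
}

/* The "do _not_ restart" step: used when the restart code is turned
 * into EINTR because a handler is about to run. */
void model_no_restart(struct thread_info *ti)
{
    ti->flags &= ~MODEL_TIF_RESTART;
    ti->restart_block.fn = do_no_restart_syscall;
}

/* Sample continuation and sample syscall, only for trying the model. */
static long sample_restart(struct restart_block *rb) { return (long)rb->arg0; }
static long sample_syscall(struct thread_info *ti) { (void)ti; return 7; }
```

In this model the same `svc` instruction is simply executed again; the flag, not the instruction, decides whether the original handler or the continuation runs.
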

Linus

2002-12-09 17:27:36

by Linus Torvalds

Subject: RE: [PATCH] compatibility syscall layer (lets try again)



On Mon, 9 Dec 2002, Mikael Starvik wrote:
>
> No problem for CRIS architecture (port will be submitted when 2.5.51
> has been released if that happens before xmas).

Note that I've not committed the patch to my tree at all, and as far as I
am concerned this is in somebody else's court (ie somebody that cares about
restarting). I don't have any strong feelings either way about how
restarting should work - and I'd like to have somebody take it up and
test it as well as having architecture maintainers largely sign off on
this approach.

It's certainly more flexible to save restart info in user space registers,
so in that way it's good. It has some downsides, though - it may be
against the calling convention of the architecture, for example, to
change those registers (some people expect the system call arguments to
not be changed by the system call, so when it returns and the arguments
have been modified to be the "restart arguments", those people would be
unhappy).

And apparently ia64 is again being a singularly awkward architecture.

Linus

2002-12-09 17:19:54

by David Mosberger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

>>>>> On Mon, 9 Dec 2002 08:48:13 -0800 (PST), Linus Torvalds <[email protected]> said:

Linus> Architecture maintainers, can you comment on how easy/hard it
Linus> is to do the same thing on your architectures? I _assume_
Linus> it's trivial (akin to the three-liner register state change
Linus> in i386/kernel/signal.c).

It's not trivial on ia64: we keep the syscall arguments in registers
(the stacked registers, to be precise), so to modify them, we need to
(a) flush the stacked registers to memory and (b) find the frame that
contains the syscall arguments, (c) patch the values in memory, and
(d) reload the stacked registers. It's doable (like you say, ptrace()
does it already), but that's about the best I can say about it...

--david

2002-12-09 16:53:09

by Mikael Starvik

Subject: RE: [PATCH] compatibility syscall layer (lets try again)

No problem for CRIS architecture (port will be submitted when 2.5.51
has been released if that happens before xmas).

/Mikael


2002-12-09 17:13:02

by Martin Schwidefsky

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


Hi Linus,

>Architecture maintainers, can you comment on how easy/hard it is to do the
>same thing on your architectures? I _assume_ it's trivial (akin to the
>three-liner register state change in i386/kernel/signal.c).

For s390/s390x this is actually quite tricky. The system call number is
coded in the instruction, e.g. 0x0aa2 is svc 162 or sys_nanosleep. There
is no register involved that contains the system call number I could
simply change. I either have to change the instruction (no way) or I
have to avoid going back to userspace in this case. This would require
assembler magic in entry.S. Not nice.

blue skies,
Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [email protected]


2002-12-09 16:39:51

by Linus Torvalds

Subject: Re: [PATCH] compatibility syscall layer (lets try again)



On Mon, 9 Dec 2002, Daniel Jacobowitz wrote:
>
> Well, here's something to consider. This isn't entirely hypothetical;
> there are test cases in GDB's regression suite that cover nearly this.
>
> Suppose a process is sleeping for an hour. The user wants to see what
> another thread is doing, so he hits Control-C; the thread which happens
> to be reported as 'current' is the one that was in nanosleep(). It
> used to be that when he said continue, the nanosleep would return; now
> hopefully it'll continue. Great! But this damnable user isn't done
> yet. He wants to look at one of his data structures. He calls a
> debugging print_foo() function from GDB. He realizes he left a
> sleep-for-a-minute nanosleep call in it and C-c's again. Now we have
> two interrupted nanosleep calls and the application will never see a
> signal to interrupt either of them; he says "continue" twice and
> expects to get back to his hour-long sleep.

Ok, this will definitely not work with the current restart mechanism.

We could make it work, but it would be rather involved. To make nesting
work correctly for the above, you could take two approaches:

- tell gdb about the restart state through some ptrace() interface, and
have gdb save and restore it.

- save all the restart state in user space registers/memory (on the
stack, for example), so that it automatically nests correctly.

The second approach has the advantage that not only would it work
with unmodified gdb binaries, it would also allow us to nest correctly
over a signal handler invocation, which is needed if we ever allow
restarting even if a handler is invoked.

It's not really hard to do, but both approaches open up restarting to
potential security issues, ie now you have to make sure that you're not
leaking kernel data or forgetting to check something over a restart. That
doesn't matter for nanosleep() itself (none of the data there is in any
way security-conscious, even if one of the restart arguments has been
modified to the in-kernel "jiffies" representation), but it might make a
difference for other system calls.

If you want to test this out and play with it, the x86 implementation is
pretty easy. I don't know how nasty it can be to re-initialize the system
call argument registers on other architectures, but going through the
do_signal() logic _should_ mean that we have access to all user mode
register state (otherwise ptrace() wouldn't work on such architectures).

NOTE NOTE NOTE! This patch is totally untested. It may or may not compile
and work. It _looks_ correct, but that's all I'm going to guarantee about
it.

Architecture maintainers, can you comment on how easy/hard it is to do the
same thing on your architectures? I _assume_ it's trivial (akin to the
three-liner register state change in i386/kernel/signal.c).

Linus

===== arch/i386/kernel/signal.c 1.22 vs edited =====
--- 1.22/arch/i386/kernel/signal.c Fri Dec 6 09:43:43 2002
+++ edited/arch/i386/kernel/signal.c Mon Dec 9 08:39:57 2002
@@ -594,7 +594,10 @@
regs->eip -= 2;
}
if (regs->eax == -ERESTART_RESTARTBLOCK){
+ struct restart_block *restart = &current_thread_info()->restart_block;
regs->eax = __NR_restart_syscall;
+ regs->ebx = restart->arg0;
+ regs->ecx = restart->arg1;
regs->eip -= 2;
}
}
===== include/linux/thread_info.h 1.4 vs edited =====
--- 1.4/include/linux/thread_info.h Fri Dec 6 09:43:43 2002
+++ edited/include/linux/thread_info.h Mon Dec 9 08:44:56 2002
@@ -11,11 +11,11 @@
* System call restart block.
*/
struct restart_block {
- long (*fn)(struct restart_block *);
- unsigned long arg0, arg1, arg2;
+ long (*fn)(unsigned long arg0, unsigned long arg1);
+ unsigned long arg0, arg1;
};

-extern long do_no_restart_syscall(struct restart_block *parm);
+extern long do_no_restart_syscall(unsigned long arg0, unsigned long arg1);

#include <linux/bitops.h>
#include <asm/thread_info.h>
===== kernel/signal.c 1.55 vs edited =====
--- 1.55/kernel/signal.c Fri Dec 6 11:08:28 2002
+++ edited/kernel/signal.c Mon Dec 9 08:45:19 2002
@@ -1351,13 +1351,13 @@
* System call entry points.
*/

-asmlinkage long sys_restart_syscall(void)
+asmlinkage long sys_restart_syscall(unsigned long arg0, unsigned long arg1)
{
struct restart_block *restart = &current_thread_info()->restart_block;
- return restart->fn(restart);
+ return restart->fn(arg0, arg1);
}

-long do_no_restart_syscall(struct restart_block *param)
+long do_no_restart_syscall(unsigned long arg0, unsigned long arg1)
{
return -EINTR;
}
===== kernel/timer.c 1.37 vs edited =====
--- 1.37/kernel/timer.c Fri Dec 6 11:10:33 2002
+++ edited/kernel/timer.c Mon Dec 9 08:42:36 2002
@@ -1021,10 +1021,10 @@
return current->pid;
}

-static long nanosleep_restart(struct restart_block *restart)
+static long nanosleep_restart(unsigned long arg0, unsigned long arg1)
{
- unsigned long expire = restart->arg0, now = jiffies;
- struct timespec *rmtp = (struct timespec *) restart->arg1;
+ unsigned long expire = arg0, now = jiffies;
+ struct timespec *rmtp = (struct timespec *) arg1;
long ret;

/* Did it expire while we handled signals? */

2002-12-09 15:34:21

by Daniel Jacobowitz

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

On Fri, Dec 06, 2002 at 09:57:08AM -0800, Linus Torvalds wrote:
>
> I just pushed my version of the system call restart code to the BK trees.
> It's loosely based on George's code, but subtly different. Also, I didn't
> actually update any actual system calls to use it, I just did the
> infrastructure.
>
> Non-x86 architectures need to be updated to work with this: they need to
> update their thread structures, the additional do_signal() support in
> their signal.c, and add the actual system call itself somewhere. For x86,
> this was about 15 lines of changes.
>
> The basic premise is very simple: if you want to restart a system call,
> you can do
>
> 	restart = &current_thread_info()->restart_block;
> 	restart->fn = my_continuation_function;
> 	restart->arg0 = my_arg0_for_continuation;
> 	restart->arg1 = my_arg1_for_continuation;
> 	..
> 	return -ERESTART_RESTARTBLOCK;
>
> which will cause the system call to either return -EINTR (if a signal
> handler was actually invoked) or for "benign" signals (SIGSTOP etc) it
> will end up restarting the system call at the continuation function (with
> the "restart" block as the argument).
>
> We could extend this to allow restarting even over signal handlers, but
> that would have some re-entrancy issues (ie what happens if a signal
> handler itself wants to use a system call that may want restarting), so at
> least for now restarting is only done when no handler is invoked (*).
>
> Linus
>
> (*) The nesting case is by no means impossible to handle gracefully
> (adding a "restart even if handler is called" error number and returning
> -EINTR if nesting, for example), but I don't know of any system calls that
> would really want to try to restart anyway, so..
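
The premise quoted above can be modelled in user space as follows. This is a sketch under loud assumptions: the fake clock `model_jiffies`, the `model_*` function names and the numeric constants are placeholders for the example, and the round trip through user space to `sys_restart_syscall` is collapsed into a direct call of the continuation.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EINTR                 4
#define MODEL_ERESTART_RESTARTBLOCK 516   /* placeholder value */

struct restart_block {
    long (*fn)(struct restart_block *);
    unsigned long arg0, arg1;
};

static struct restart_block current_restart;   /* one per thread in the kernel */
static unsigned long model_jiffies;            /* fake clock for the model */

long do_no_restart_syscall(struct restart_block *rb)
{
    (void)rb;
    return -MODEL_EINTR;
}

/* Continuation: arg0 holds the absolute expiry time. */
static long model_nanosleep_restart(struct restart_block *rb)
{
    if (model_jiffies >= rb->arg0)
        return 0;                        /* expired while signals were handled */
    rb->fn = model_nanosleep_restart;    /* interrupted again: re-arm */
    return -MODEL_ERESTART_RESTARTBLOCK;
}

/* Initial call, modelled as always being interrupted before expiry. */
long model_nanosleep(unsigned long ticks)
{
    struct restart_block *restart = &current_restart;

    restart->fn = model_nanosleep_restart;
    restart->arg0 = model_jiffies + ticks;
    restart->arg1 = 0;
    return -MODEL_ERESTART_RESTARTBLOCK;
}

/* Signal-return path: EINTR if a handler ran, else run the continuation. */
long model_do_signal(long ret, bool handler_invoked)
{
    if (ret != -MODEL_ERESTART_RESTARTBLOCK)
        return ret;
    if (handler_invoked) {
        current_restart.fn = do_no_restart_syscall;
        return -MODEL_EINTR;
    }
    return current_restart.fn(&current_restart);
}
```

Storing an absolute expiry in `arg0` is what lets the continuation notice that the sleep already ran out while the signal was being handled.
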

Well, here's something to consider. This isn't entirely hypothetical;
there are test cases in GDB's regression suite that cover nearly this.

Suppose a process is sleeping for an hour. The user wants to see what
another thread is doing, so he hits Control-C; the thread which happens
to be reported as 'current' is the one that was in nanosleep(). It
used to be that when he said continue, the nanosleep would return; now
hopefully it'll continue. Great! But this damnable user isn't done
yet. He wants to look at one of his data structures. He calls a
debugging print_foo() function from GDB. He realizes he left a
sleep-for-a-minute nanosleep call in it and C-c's again. Now we have
two interrupted nanosleep calls and the application will never see a
signal to interrupt either of them; he says "continue" twice and
expects to get back to his hour-long sleep.

Note that I'm not saying we _need_ to support this, mind :) It's a
little pathological.

Another thing that annoys me slightly about this is that we mess with
the value in orig_eax etc. Now a debugger would have to look at the
instruction stream to figure out what the syscall was that we're
stopped in, reliably. Not a big deal.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2002-12-09 06:17:44

by Stephen Rothwell

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

On Sun, 08 Dec 2002 12:41:00 -0800 (PST) "David S. Miller" <[email protected]> wrote:
>
> From: Linus Torvalds <[email protected]>
> Date: Fri, 6 Dec 2002 11:20:26 -0800 (PST)
>
> I did the nanosleep() implementation using the new infrastructure now, and
> am pushing it out as I write this.
> ...
> Compat people can hopefully fix it up.
>
> I'm fixing this up right now.

Thanks for this, Dave.

Isn't it nice that it only needs to be fixed once :-)

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

2002-12-09 20:19:15

by David Miller

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

From: David Mosberger <[email protected]>
Date: Mon, 9 Dec 2002 09:27:06 -0800

>>>>> On Mon, 9 Dec 2002 08:48:13 -0800 (PST), Linus Torvalds <[email protected]> said:

Linus> Architecture maintainers, can you comment on how easy/hard it
Linus> is to do the same thing on your architectures? I _assume_
Linus> it's trivial (akin to the three-liner register state change
Linus> in i386/kernel/signal.c).

It's not trivial on ia64:

It was really easy on Sparc.

2002-12-09 20:14:38

by David Miller

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

From: "Martin Schwidefsky" <[email protected]>
Date: Mon, 9 Dec 2002 18:16:43 +0100

>Architecture maintainers, can you comment on how easy/hard it is to do the
>same thing on your architectures? I _assume_ it's trivial (akin to the
>three-liner register state change in i386/kernel/signal.c).

For s390/s390x this is actually quite tricky. The system call number is
coded in the instruction, e.g. 0x0aa2 is svc 162 or sys_nanosleep. There
is no register involved that contains the system call number I could
simply change. I either have to change the instruction (no way) or I
have to avoid going back to userspace in this case. This would require
assembler magic in entry.S. Not nice.

Put the magic restart_block syscall at some fixed place in every
user process, change the PC to that. Or, alternatively, put the
restart_block syscall insn on the stack and point the PC at that.

This isn't rocket science :-)

2002-12-09 23:22:46

by Paul Mackerras

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds writes:

> Architecture maintainers, can you comment on how easy/hard it is to do the
> same thing on your architectures? I _assume_ it's trivial (akin to the
> three-liner register state change in i386/kernel/signal.c).

It's just as easy on PPC and PPC64 as on x86.

Paul.

2002-12-10 00:05:54

by Paul Mackerras

Subject: RE: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds writes:

> Note that I've not committed the patch to my tree at all, and as far as I
> am concerned this is in somebody else's court (ie somebody that cares about
> restarting). I don't have any strong feelings either way about how
> restarting should work - and I'd like to have somebody take it up and
> test it as well as having architecture maintainers largely sign off on
> this approach.

There is a simpler way to solve the nanosleep problem which doesn't
involve any more restart magic than we have been using for years.
That is to define a new sys_new_nanosleep system call which takes one
argument which is a pointer to the time to sleep. If the sleep gets
interrupted by a pending signal, the kernel sys_new_nanosleep will
write back the remaining time (overwriting the requested time) and
return -ERESTARTNOHAND. The glibc nanosleep() then looks like this:

int nanosleep(const struct timespec *req, struct timespec *rem)
{
	*rem = *req;
	return new_nanosleep(rem);
}

Any reason why this can't work?
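
To see how the write-back scheme behaves across a restart, here is a user-space model. It is a sketch only: `model_new_nanosleep` stands in for the proposed kernel call, the `slept_ns` parameter is an artificial way to say how long the sleep ran before a signal arrived, and the local copy used when `rem` is NULL is an addition of this example, not part of the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MODEL_ERESTARTNOHAND 514   /* placeholder value */
#define NSEC_PER_SEC 1000000000LL

/* Hypothetical kernel side: one in/out argument. If the sleep is cut
 * short after slept_ns, the remaining time overwrites the request. */
long model_new_nanosleep(struct timespec *ts, long long slept_ns)
{
    long long want = (long long)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
    long long left = want - slept_ns;

    if (left <= 0) {
        ts->tv_sec = 0;
        ts->tv_nsec = 0;
        return 0;
    }
    ts->tv_sec = (time_t)(left / NSEC_PER_SEC);
    ts->tv_nsec = (long)(left % NSEC_PER_SEC);
    return -MODEL_ERESTARTNOHAND;
}

/* Library wrapper in the spirit of the proposal; req is never modified. */
long model_nanosleep(const struct timespec *req, struct timespec *rem,
                     long long slept_ns)
{
    struct timespec local;
    struct timespec *io = rem ? rem : &local;   /* tolerate rem == NULL */

    *io = *req;
    return model_new_nanosleep(io, slept_ns);
}
```

A restart then just re-issues the same call with the same pointer: the argument registers are untouched, and the buffer already holds the remaining time.
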

(BTW this is Rusty's idea. :)

Regards,
Paul.

2002-12-10 08:14:26

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

--- /usr/src/linux-2.5.50-bk7-posix/arch/i386/kernel/signal.c~ Sat Dec 7 21:36:11 2002
+++ /usr/src/linux-2.5.50-bk7-posix/arch/i386/kernel/signal.c Tue Dec 10 00:06:10 2002
@@ -505,9 +505,8 @@
/* Are we from a system call? */
if (regs->orig_eax >= 0) {
/* If so, check system call restarting.. */
+ current_thread_info()->restart_block.fl = 0;
switch (regs->eax) {
- case -ERESTART_RESTARTBLOCK:
- current_thread_info()->restart_block.fn = do_no_restart_syscall;
case -ERESTARTNOHAND:
regs->eax = -EINTR;
break;
@@ -591,10 +590,6 @@
regs->eax == -ERESTARTSYS ||
regs->eax == -ERESTARTNOINTR) {
regs->eax = regs->orig_eax;
- regs->eip -= 2;
- }
- if (regs->eax == -ERESTART_RESTARTBLOCK){
- regs->eax = __NR_restart_syscall;
regs->eip -= 2;
}
}
--- /usr/src/linux-2.5.50-bk7-posix/include/linux/thread_info.h~ Sat Dec 7 21:36:43 2002
+++ /usr/src/linux-2.5.50-bk7-posix/include/linux/thread_info.h Tue Dec 10 00:12:31 2002
@@ -11,7 +11,7 @@
* System call restart block.
*/
struct restart_block {
- long (*fn)(struct restart_block *);
+ long fl;
unsigned long arg0, arg1, arg2;
};

--- /usr/src/linux-2.5.50-bk7-posix/include/asm-i386/thread_info.h~ Sat Dec 7 21:36:41 2002
+++ /usr/src/linux-2.5.50-bk7-posix/include/asm-i386/thread_info.h Tue Dec 10 00:09:32 2002
@@ -68,7 +68,7 @@
.preempt_count = 1, \
.addr_limit = KERNEL_DS, \
.restart_block = { \
- .fn = do_no_restart_syscall, \
+ .fl = 0, \
}, \
}



Attachments:
restart-2.5.50-bk7.patch (1.49 kB)

2002-12-10 08:38:28

by Martin Schwidefsky

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


> Put the magic restart_block syscall at some fixed place in every
> user process, change the PC to that. Or, alternatively, put the
> restart_block syscall insn on the stack and point the PC at that.
>
> This isn't rocket science :-)

Something like that was my first thought as well. I would have played
games with return addresses inside the kernel instead of user space.
The idea to have another _TIF_xxx flag seems much cleaner though and
I want the cleanest solution for this. Once this is implemented every
system call can be restarted with a different system call number. Who
knows what other uses this might have?

blue skies,
Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [email protected]


2002-12-10 11:02:57

by Jamie Lokier

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Linus Torvalds wrote:
> So what we do is to introduce a _new_ system call
> (system call number NNN), which takes a different form of timeout, namely
> "absolute value of end time".

An "absolute value of end time" variant of
select/poll/epoll/io_getevents would be good anyway from userspace, to
avoid the gettimeofday+poll race condition.

So, perhaps the solution here is to simply provide absolute time
variants of the system calls which currently take time delays, and
have the relative-time variants rewrite themselves into absolute form?

That's architecture neutral _and_ fixes a long-standing race
condition.
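
As a rough illustration of the rewrite suggested above: a relative
timeout can be converted to an absolute deadline once, and the time
left recomputed from that deadline on every retry. This is only a
user-space sketch; the helper names deadline_from_relative() and
remaining_until() are hypothetical and come from neither the kernel
nor this thread.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: turn a relative delay into an absolute deadline. */
struct timespec deadline_from_relative(struct timespec now, struct timespec rel)
{
    struct timespec d;

    /* Both inputs are assumed to be normalised timespecs. */
    assert(now.tv_nsec >= 0 && now.tv_nsec < NSEC_PER_SEC);
    assert(rel.tv_nsec >= 0 && rel.tv_nsec < NSEC_PER_SEC);

    d.tv_sec = now.tv_sec + rel.tv_sec;
    d.tv_nsec = now.tv_nsec + rel.tv_nsec;
    if (d.tv_nsec >= NSEC_PER_SEC) {    /* carry into seconds */
        d.tv_sec++;
        d.tv_nsec -= NSEC_PER_SEC;
    }
    return d;
}

/* Hypothetical helper: time left until the deadline, zero once it has passed. */
struct timespec remaining_until(struct timespec deadline, struct timespec now)
{
    struct timespec r = { 0, 0 };

    if (now.tv_sec > deadline.tv_sec ||
        (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec))
        return r;                       /* deadline already reached */

    r.tv_sec = deadline.tv_sec - now.tv_sec;
    r.tv_nsec = deadline.tv_nsec - now.tv_nsec;
    if (r.tv_nsec < 0) {                /* borrow from seconds */
        r.tv_sec--;
        r.tv_nsec += NSEC_PER_SEC;
    }
    return r;
}
```

A relative-time call would compute the deadline once on entry and,
after every interruption, wait only for remaining_until(deadline, now),
so repeated restarts cannot stretch the total wait.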

-- Jamie

2002-12-10 14:32:56

by Keith Owens

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

On Mon, 9 Dec 2002 18:56:30 +0100,
"Martin Schwidefsky" <[email protected]> wrote:
>
>> Well, that is tricky independently of the actual argument stuff - even the
>> _current_ system call restart ends up being tricky for you in that case, I
>> suspect. The current one already basically depends on rewriting the system
>> call number, it just leaves the arguments untouched.
>
>The current system call restart doesn't change the system call number. We just
>subtract two from the psw address (aka eip) and go back to user space.

EX R1,syscall - instruction length is 4, not 2.

2002-12-10 17:13:41

by Martin Schwidefsky

Subject: Re: [PATCH] compatibility syscall layer (lets try again)


> EX R1,syscall - instruction length is 4, not 2.
Another good reason not to go back to user space.

blue skies,
Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [email protected]


2002-12-10 23:01:45

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Daniel Jacobowitz wrote:
>
> On Fri, Dec 06, 2002 at 09:57:08AM -0800, Linus Torvalds wrote:
> >
> > I just pushed my version of the system call restart code to the BK trees.
> > It's loosely based on George's code, but subtly different. Also, I didn't
> > actually update any actual system calls to use it, I just did the
> > infrastructure.
> >
> > Non-x86 architectures need to be updated to work with this: they need to
> > update their thread structures, the additional do_signal() support in
> > their signal.c, and add the actual system call itself somewhere. For x86,
> > this was about 15 lines of changes.
> >
> > The basic premise is very simple: if you want to restart a system call,
> > you can do
> >
> > restart = &current_thread()->restart_block;
> > restart->fn = my_continuation_function;
> > restart->arg0 = my_arg0_for_continuation;
> > restart->arg1 = my_arg1_for_continuation;
> > ..
> > return -ERESTARTSYS_RESTARTBLOCK;
> >
> > which will cause the system call to either return -EINTR (if a signal
> > handler was actually invoked) or for "benign" signals (SIGSTOP etc) it
> > will end up restarting the system call at the continuation function (with
> > the "restart" block as the argument).
> >
> > We could extend this to allow restarting even over signal handlers, but
> > that would have some re-entrancy issues (ie what happens if a signal
> > handler itself wants to use a system call that may want restarting), so at
> > least for now restarting is only done when no handler is invoked (*).
> >
> > Linus
> >
> > (*) The nesting case is by no means impossible to handle gracefully
> > (adding a "restart even if handler is called" error number and returning
> > -EINTR if nesting, for example), but I don't know of any system calls that
> > would really want to try to restart anyway, so..
>
> Well, here's something to consider. This isn't entirely hypothetical;
> there are test cases in GDB's regression suite that cover nearly this.
>
> Suppose a process is sleeping for an hour. The user wants to see what
> another thread is doing, so he hits Control-C; the thread which happens
> to be reported as 'current' is the one that was in nanosleep(). It
> used to be that when he said continue, the nanosleep would return; now
> hopefully it'll continue. Great! But this damnable user isn't done
> yet. He wants to look at one of his data structures. He calls a
> debugging print_foo() function from GDB. He realizes he left a
> sleep-for-a-minute nanosleep call in it and C-c's again. Now we have
> two interrupted nanosleep calls and the application will never see a
> signal to interrupt either of them; he says "continue" twice and
> expects to get back to his hour-long sleep.
>
> Note that I'm not saying we _need_ to support this, mind :) It's a
> little pathological.

I seem to recall working on a debugger in the distant past
and putting a lock in it that did not allow it to run a debuggee
function while the debuggee was in a system call. It seems
to me that this is begging for problems and is not that hard
for gdb/etc to prevent.

Daniel, what do you think?

-g
>
> Another thing that annoys me slightly about this is that we mess with
> the value in orig_eax etc. Now a debugger would have to look at the
> instruction stream to figure out what the syscall was that we're
> stopped in, reliably. Not a big deal.
>
> --
> Daniel Jacobowitz
> MontaVista Software Debian GNU/Linux Developer

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-10 23:06:11

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Paul Mackerras wrote:
>
> Linus Torvalds writes:
>
> > Note that I've not committed the patch to my tree at all, and as far as I
> > am concerned this is in somebody elses court (ie somebody that cares about
> > restarting). I don't have any strong feelings either way about how
> > restarting should work - and I'd like to have somebody take it up and
> > testing it as well as having architecture maintainers largely sign off on
> > this approach.
>
> There is a simpler way to solve the nanosleep problem which doesn't
> involve any more restart magic than we have been using for years.
> That is to define a new sys_new_nanosleep system call which takes one
> argument which is a pointer to the time to sleep. If the sleep gets
> interrupted by a pending signal, the kernel sys_new_nanosleep will
> write back the remaining time (overwriting the requested time) and
> return -ERESTARTNOHAND. The glibc nanosleep() then looks like this:
>
> int nanosleep(const struct timespec *req, struct timespec *rem)
> {
> *rem = *req;
> return new_nanosleep(rem);
> }
>
> Any reason why this can't work?
>
> (BTW this is Rusty's idea. :)
>
This all started because the standard says nanosleep should
wake up delta time from when it went to sleep. To do this one
would need to save the absolute time, not the time
remaining. In other words, while ptrace or whatever is
doing its thing, time IS passing.
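
To make the difference concrete, here is a toy model (plain C, no
kernel code) comparing the two strategies. The function names and the
millisecond bookkeeping are assumptions made for illustration only:
slept_ms[i] is how long the task sleeps before interruption i, and
stopped_ms[i] is how long it then sits stopped (for example under
ptrace) before the sleep is restarted.

```c
#include <assert.h>

/*
 * Wake-up time in ms since the call started when each restart re-arms
 * with the *remaining* time written back at interruption. Time spent
 * stopped is never subtracted, so it is added to the total.
 */
long wake_with_remaining(long request_ms, const long *slept_ms,
                         const long *stopped_ms, int n)
{
    long now = 0, remaining = request_ms;

    assert(request_ms >= 0);
    for (int i = 0; i < n && remaining > 0; i++) {
        long s = slept_ms[i] < remaining ? slept_ms[i] : remaining;

        now += s;
        remaining -= s;
        if (remaining > 0)
            now += stopped_ms[i];   /* stopped: remaining is unchanged */
    }
    return now + remaining;         /* final uninterrupted stretch */
}

/*
 * Same scenario, but the *absolute* end time is saved once and every
 * restart sleeps only until that deadline.
 */
long wake_with_deadline(long request_ms, const long *slept_ms,
                        const long *stopped_ms, int n)
{
    long now = 0, deadline = request_ms;

    assert(request_ms >= 0);
    for (int i = 0; i < n && now < deadline; i++) {
        long left = deadline - now;
        long s = slept_ms[i] < left ? slept_ms[i] : left;

        now += s;
        if (now < deadline)
            now += stopped_ms[i];   /* stopped: the deadline does not move */
    }
    return now > deadline ? now : deadline;
}
```

With a 1000 ms request, sleeps of 300 and 200 ms and stops of 400 and
100 ms, the first model wakes at 1500 ms and the second at 1000 ms.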
--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-11 07:03:11

by Daniel Jacobowitz

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

On Tue, Dec 10, 2002 at 03:07:37PM -0800, george anzinger wrote:
> I seem to recall working on a debugger in the distant past
> and putting a lock in it that did not allow it to run a debuggee
> function while the debuggee was in a system call. It seems
> to me that this is begging for problems and is not that hard
> for gdb/etc to prevent.
>
> Daniel, what do you think?

I really don't see the point. I like being able to call functions
while stopped in read() or poll(); you can hit C-c and go off to
examine your application. That's as much "in a syscall" as this is.

Besides, implementing this is more of a pain than it's worth.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2002-12-11 08:06:28

by George Anzinger

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

Daniel Jacobowitz wrote:
>
> I really don't see the point. I like being able to call functions
> while stopped in read() or poll(); you can hit C-c and go off to
> examine your application. That's as much "in a syscall" as this is.

Granted, and restarting them is a bit easier, to be sure.
But if the function does the same call it can hose the
program.
>
> Besides, implementing this is more of a pain than it's worth.

The problem we are left with is how to detect that a
nanosleep call has been interrupted and has set itself up to
continue, and to determine whether it is, in fact, now
handling that continue call. Once we figure that out, we
need to understand what to do if it is not, i.e. what to do
when the call is continued. The "correct" thing to do would
be to stack all the relevant info on the user stack and pop
it off when needed, much as is done for a signal.

As it stands now, however, the detection of the need to do
this comes AFTER ptrace has handed control over to the
debugger and it has modified the user's stack.
>
How about something like this:

1.) We use the same error returns as now and do not use a
new restart system call.
2.) Define a function "push_restart(struct restart_block
*restart)" which is called by the function when it detects
that it _may_ be restarted. This function would save the
restart block in a block of memory allocated from the slab
allocator and link it into the task, either the task_struct
or the thread_info. The function would add the current user
stack address and pc to the block.
3.) do_signal would toss the block and return the memory if
it determines that the call is not to be restarted.
4.) Define an "int pop_restart(struct restart_block
*restart)" function which the system call should call if it
is possible that it is being restarted. This function would
search the list of restart_blocks for the task doing:
a.) freeing any that have stack tags that are below the
current stack and
b.) returning the block with the current stack and pc
address, or null if not found.

This should be a true push down stack so that either a.) or
b.) will always be true unless there is no block for the
current call. I.e. the search can stop once the stack tag
is above the current user stack address.

We could put the first restart block in the task_struct or
thread_info area so we almost never allocate any memory.
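
A user-space sketch of the push/pop scheme outlined above, to show the
list handling only. Everything here is an assumption for illustration:
the struct layouts, the extra task/sp/pc parameters (a kernel version
would presumably take them from the current task and its registers),
malloc() standing in for the slab allocator, and a stack that grows
downwards so that nested calls carry lower stack tags.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel structure (arguments only). */
struct restart_block { unsigned long arg0, arg1, arg2; };

/* One saved restart, tagged with the user sp and pc of the call. */
struct saved_restart {
    struct saved_restart *next;
    uintptr_t user_sp, user_pc;
    struct restart_block block;
};

/* Hypothetical per-task list head; the most recent push comes first. */
struct task { struct saved_restart *restarts; };

/* Save a copy of the block; returns 0 on success, -1 if out of memory. */
int push_restart(struct task *t, const struct restart_block *rb,
                 uintptr_t sp, uintptr_t pc)
{
    struct saved_restart *s;

    assert(t && rb);
    s = malloc(sizeof(*s));
    if (!s)
        return -1;
    s->block = *rb;
    s->user_sp = sp;
    s->user_pc = pc;
    s->next = t->restarts;
    t->restarts = s;
    return 0;
}

/* Returns 1 and fills *out if a block matches sp/pc, else 0. */
int pop_restart(struct task *t, struct restart_block *out,
                uintptr_t sp, uintptr_t pc)
{
    assert(t && out);
    while (t->restarts) {
        struct saved_restart *s = t->restarts;

        if (s->user_sp < sp) {          /* a.) stale: below current stack */
            t->restarts = s->next;
            free(s);
            continue;
        }
        if (s->user_sp == sp && s->user_pc == pc) {   /* b.) our block */
            *out = s->block;
            t->restarts = s->next;
            free(s);
            return 1;
        }
        break;                          /* tag at or above us: stop searching */
    }
    return 0;
}
```

In the nested-nanosleep case, the inner call pushes a block with a lower
stack tag than the outer one. If the inner call is abandoned, the outer
call's pop_restart() frees the stale inner block before finding its own.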

So, is this overkill?

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-12-11 08:17:56

by Daniel Jacobowitz

Subject: Re: [PATCH] compatibility syscall layer (lets try again)

On Wed, Dec 11, 2002 at 12:11:11AM -0800, george anzinger wrote:
> So, is this overkill?

Yes. I'm also 99% certain it won't solve the problem. Really, the
only way to get this right is to make sure all the state is out where
GDB will see it and protect it. But it sounds like doing this on S/390
and ia64 is prohibitively complicated.

I'll think about it. If we can't come up with anything better we may
want to just make this behave correctly on those architectures that can
do it.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer