2001-12-22 11:36:30

by Keith Owens

Subject: [patch] Assigning syscall numbers for testing

Resend, previous mail had the wrong version of the patch.

It is clear (to me at least) that developers have problems assigning
syscall numbers for testing, before their code is accepted into the
kernel. Developers have to worry about collisions with everybody else
who is testing syscalls. More importantly, the user space code has to
"know" what the testing syscall number is this week. As a minor
problem, strace cannot report on testing syscalls, except to print the
number.

The patch below dynamically assigns a syscall number to a name and
exports the number and name via /proc. Dynamic assignment removes the
collision problem. Exporting via /proc allows user space code to
automatically find out what the syscall number is this week. strace
could read the /proc output to print the syscall name, although it
still cannot print the arguments.

This facility must only be used while testing code, until the
developer's code is accepted by Linus when it should be assigned a
permanent syscall number. Anybody caught using this kernel facility
for code that is in Linus's kernel will be hung, drawn, quartered then
forced to write in COBOL.

To dynamically register a syscall number, ignoring error checking :-

#include <linux/dynamic_syscalls.h>

printk(KERN_DEBUG "Assigned syscall number %d for %s\n",
       register_dynamic_syscall("attrctl", DYNAMIC_SYSCALL_FUNC(sys_attrctl)),
       "attrctl");

There is no deregister function. Syscalls are _not_ supported in
modules, there is no architecture independent way of handling the
branch from the syscall handler to a module. I know that people have
done syscalls in modules but unless they have coded some architecture
dependent assembler glue, I guarantee that their code will break on
ia64 and ppc64. In any case, Linus has said that syscalls are not
supported in modules. Ignore the modules howto, it is hopelessly out
of date.

User space code should open /proc/dynamic_syscalls, read the lines
looking for their syscall name, extract the number and call the glibc
syscall() function using that number. Do not use the _syscalln()
functions, they require a constant syscall number at compile time.

If the code cannot open /proc/dynamic_syscalls or cannot find the
desired syscall name, fall back to the assigned syscall number (if any)
or fail if there is no assigned syscall number. By falling back to the
assigned syscall number, new versions of the user space code stay
backwards compatible: on older kernels they use the dynamic syscall
number, on newer kernels they use the assigned number.
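
A minimal user space sketch of the lookup and fallback described above. It assumes the /proc line format that the patch prints ("<number> <name> 0x<address>", after one header line); the function names find_dynamic_syscall() and resolve_syscall() are made up for illustration and are not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Scan lines of the form "<number> <name> 0x<address>" and return the
 * number registered for `name`, or -1 if the name is not listed. */
long find_dynamic_syscall(FILE *fp, const char *name)
{
    char line[256], entry[128];
    long nr;

    while (fgets(line, sizeof(line), fp)) {
        /* The header line "Dynamic syscall numbers:" does not start with
         * a number, so sscanf() returns 0 for it and it is skipped. */
        if (sscanf(line, "%ld %127s", &nr, entry) == 2 &&
            strcmp(entry, name) == 0)
            return nr;
    }
    return -1;
}

/* Prefer the dynamic number from `path`; otherwise fall back to the
 * assigned number. Pass assigned = -1 when no number has been assigned. */
long resolve_syscall(const char *path, const char *name, long assigned)
{
    FILE *fp = fopen(path, "r");
    long nr = -1;

    if (fp) {
        nr = find_dynamic_syscall(fp, name);
        fclose(fp);
    }
    return nr >= 0 ? nr : assigned;
}
```

A caller would pass "/proc/dynamic_syscalls" as the path, give up if the result is negative, and otherwise hand the number to the glibc syscall() function from <unistd.h>.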

The patch has support for i386 and ia64. Each architecture needs
include/asm-$(ARCH)/dynamic_syscalls.h defining the range of dynamic
syscalls and information about entries in sys_call_table. On archs
that use function pointers, DYNAMIC_SYSCALL_FUNCADDR must extract the
function address from the descriptor.

The patch is against 2.4.17 but should fit 2.4.16 and 2.5 as well.
Enjoy.

Index: 17.1/kernel/Makefile
--- 17.1/kernel/Makefile Tue, 18 Sep 2001 13:43:44 +1000 kaos (linux-2.4/k/3_Makefile 1.1.10.2 644)
+++ 17.1(w)/kernel/Makefile Sat, 22 Dec 2001 22:21:06 +1100 kaos (linux-2.4/k/3_Makefile 1.1.10.2 644)
@@ -14,7 +14,7 @@ export-objs = signal.o sys.o kmod.o cont
obj-y = sched.o dma.o fork.o exec_domain.o panic.o printk.o \
module.o exit.o itimer.o info.o time.o softirq.o resource.o \
sysctl.o acct.o capability.o ptrace.o timer.o user.o \
- signal.o sys.o kmod.o context.o
+ signal.o sys.o kmod.o context.o dynamic_syscalls.o

obj-$(CONFIG_UID16) += uid16.o
obj-$(CONFIG_MODULES) += ksyms.o
Index: 17.1/fs/proc/proc_misc.c
--- 17.1/fs/proc/proc_misc.c Thu, 22 Nov 2001 11:15:28 +1100 kaos (linux-2.4/o/b/48_proc_misc. 1.1.1.1.1.1.1.8 644)
+++ 17.1(w)/fs/proc/proc_misc.c Sat, 22 Dec 2001 22:13:37 +1100 kaos (linux-2.4/o/b/48_proc_misc. 1.1.1.1.1.1.1.8 644)
@@ -36,6 +36,7 @@
#include <linux/init.h>
#include <linux/smp_lock.h>
#include <linux/seq_file.h>
+#include <linux/dynamic_syscalls.h>

#include <asm/uaccess.h>
#include <asm/pgtable.h>
@@ -320,6 +321,13 @@ static int devices_read_proc(char *page,
return proc_calc_metrics(page, start, off, count, eof, len);
}

+static int dynamic_syscalls_read_proc(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ int len = get_dynamic_syscalls_list(page);
+ return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
static int partitions_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
@@ -511,6 +519,7 @@ void __init proc_misc_init(void)
#endif
{"stat", kstat_read_proc},
{"devices", devices_read_proc},
+ {"dynamic_syscalls", dynamic_syscalls_read_proc},
{"partitions", partitions_read_proc},
#if !defined(CONFIG_ARCH_S390)
{"interrupts", interrupts_read_proc},
Index: 17.1/kernel/dynamic_syscalls.c
--- 17.1/kernel/dynamic_syscalls.c Sat, 22 Dec 2001 22:21:58 +1100 kaos ()
+++ 17.1(w)/kernel/dynamic_syscalls.c Sat, 22 Dec 2001 22:18:29 +1100 kaos (linux-2.4/O/f/42_dynamic_sy 644)
@@ -0,0 +1,93 @@
+/*
+ * kernel/dynamic_syscalls.c
+ *
+ * (C) 2001 Keith Owens <[email protected]>
+ *
+ * Assign dynamic syscall numbers for testing. Code that has not been assigned
+ * an official syscall number can use these functions to get a dynamic syscall
+ * number during testing. It only works for syscall code that is built into
+ * the kernel, there is no architecture independent way of handling a syscall
+ * when the code is in a module, not to mention that such code would be
+ * horribly racy against module unload. None of the functions should be
+ * exported, to prevent modules calling this code by mistake.
+ *
+ * NOTE: This facility is only to be used during testing. When your code is
+ * ready to be included in the mainstream kernel, you must get official
+ * syscall numbers from whoever controls the syscall numbers.
+ */
+
+#include <asm/dynamic_syscalls.h>
+
+#ifdef DYNAMIC_SYSCALL_FIRST
+
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/sched.h>
+#include <linux/errno.h>
+#include <linux/brlock.h>
+
+static rwlock_t dynamic_syscalls_lock = RW_LOCK_UNLOCKED;
+static const char *dynamic_syscalls_name[DYNAMIC_SYSCALL_LAST-DYNAMIC_SYSCALL_FIRST+1];
+extern DYNAMIC_SYSCALL_T sys_call_table[];
+
+/**
+ * get_dynamic_syscalls_list - print the list of dynamic syscall numbers
+ * @page: output buffer
+ *
+ * Description: Fill up a /proc buffer to print the dynamically assigned syscall
+ * numbers. Assumes that the data will not exceed the size of the /proc page.
+ **/
+
+int get_dynamic_syscalls_list(char *page)
+{
+ int i, len;
+ len = sprintf(page, "Dynamic syscall numbers:\n");
+ read_lock(&dynamic_syscalls_lock);
+ for (i = 0; i < sizeof(dynamic_syscalls_name)/sizeof(dynamic_syscalls_name[0]) ; i++) {
+ if (dynamic_syscalls_name[i]) {
+ len += sprintf(page+len, "%d %s 0x%" DYNAMIC_SYSCALL_FMT "x\n",
+ i+DYNAMIC_SYSCALL_FIRST+DYNAMIC_SYSCALL_OFFSET,
+ dynamic_syscalls_name[i],
+ sys_call_table[i+DYNAMIC_SYSCALL_FIRST]);
+ }
+ }
+ read_unlock(&dynamic_syscalls_lock);
+ return len;
+}
+
+/**
+ * register_dynamic_syscall - assign a dynamic syscall number.
+ * @name: the name of the syscall, used by user space code to find the number.
+ * Use a unique name, if there is any possibility of conflict with
+ * other test syscalls then include your company or initials in the name.
+ * @func: address of function to be invoked by this syscall name. If the
+ * function is in a module then the results are undefined.
+ *
+ * Description: Find the first syscall that has a null name pointer and the
+ * syscall table entry is empty.
+ *
+ * Returns: < 0, an error occurred, the return value is the error number.
+ * > 0, the syscall number that has been assigned.
+ **/
+
+int register_dynamic_syscall(const char *name, void (*func)(void))
+{
+ int i, ret = -EBUSY;
+ write_lock(&dynamic_syscalls_lock);
+ for (i = 0; i < sizeof(dynamic_syscalls_name)/sizeof(dynamic_syscalls_name[0]) ; i++) {
+ if (dynamic_syscalls_name[i] == NULL && sys_call_table[i+DYNAMIC_SYSCALL_FIRST] == DYNAMIC_SYSCALL_EMPTY) {
+ dynamic_syscalls_name[i] = name;
+ sys_call_table[i+DYNAMIC_SYSCALL_FIRST] = DYNAMIC_SYSCALL_FUNCADDR(func);
+ ret = i+DYNAMIC_SYSCALL_FIRST+DYNAMIC_SYSCALL_OFFSET;
+ break;
+ }
+ }
+ write_unlock(&dynamic_syscalls_lock);
+ return ret;
+}
+
+/* No unregister_dynamic_syscall function. Syscalls are not supported in
+ * modules.
+ */
+
+#endif /* DYNAMIC_SYSCALL_FIRST */
Index: 17.1/include/linux/dynamic_syscalls.h
--- 17.1/include/linux/dynamic_syscalls.h Sat, 22 Dec 2001 22:21:58 +1100 kaos ()
+++ 17.1(w)/include/linux/dynamic_syscalls.h Sat, 22 Dec 2001 22:13:37 +1100 kaos (linux-2.4/O/f/43_dynamic_sy 644)
@@ -0,0 +1,9 @@
+#ifndef _LINUX_DYNAMIC_SYSCALLS_H
+#define _LINUX_DYNAMIC_SYSCALLS_H
+
+extern int get_dynamic_syscalls_list(char *page);
+extern int register_dynamic_syscall(const char *name, void (*func)(void));
+
+#define DYNAMIC_SYSCALL_FUNC(f) (void (*)(void))(&f)
+
+#endif /* _LINUX_DYNAMIC_SYSCALLS_H */
Index: 17.1/include/asm-ia64/dynamic_syscalls.h
--- 17.1/include/asm-ia64/dynamic_syscalls.h Sat, 22 Dec 2001 22:21:58 +1100 kaos ()
+++ 17.1(w)/include/asm-ia64/dynamic_syscalls.h Sat, 22 Dec 2001 22:13:37 +1100 kaos (linux-2.4/O/f/44_dynamic_sy 644)
@@ -0,0 +1,28 @@
+#ifndef _ASM_DYNAMIC_SYSCALLS_H
+#define _ASM_DYNAMIC_SYSCALLS_H
+
+#include <linux/sys.h>
+
+#define DYNAMIC_SYSCALL_T long long
+#define DYNAMIC_SYSCALL_FMT "ll"
+
+/* IA64 function parameters do not point to the function itself, they point to a
+ * descriptor containing the 64 bit address of the function and the global
+ * pointer. Dereference the function pointer to get the real function address,
+ * the syscall table requires direct addresses.
+ *
+ * This will almost certainly destroy your kernel if the syscall function is in
+ * a module because it will be entered with the wrong global pointer.
+ * Don't do that.
+ */
+
+#include <linux/types.h>
+#define DYNAMIC_SYSCALL_FUNCADDR(f) ({DYNAMIC_SYSCALL_T *fp = (DYNAMIC_SYSCALL_T *)(f); fp[0];})
+
+#define DYNAMIC_SYSCALL_OFFSET 1024
+#define DYNAMIC_SYSCALL_FIRST 240
+#define DYNAMIC_SYSCALL_LAST NR_syscalls-1
+#define DYNAMIC_SYSCALL_EMPTY DYNAMIC_SYSCALL_FUNCADDR(&ia64_ni_syscall)
+extern long ia64_ni_syscall(void); /* No need to define parameters */
+
+#endif /* _ASM_DYNAMIC_SYSCALLS_H */
Index: 17.1/include/asm-i386/dynamic_syscalls.h
--- 17.1/include/asm-i386/dynamic_syscalls.h Sat, 22 Dec 2001 22:21:58 +1100 kaos ()
+++ 17.1(w)/include/asm-i386/dynamic_syscalls.h Sat, 22 Dec 2001 22:13:37 +1100 kaos (linux-2.4/O/f/45_dynamic_sy 644)
@@ -0,0 +1,21 @@
+#ifndef _ASM_DYNAMIC_SYSCALLS_H
+#define _ASM_DYNAMIC_SYSCALLS_H
+
+#include <linux/sys.h>
+
+#define DYNAMIC_SYSCALL_T long
+#define DYNAMIC_SYSCALL_FMT "l"
+
+/* This is a noop on most systems. It does real work on architectures that use
+ * function descriptors. See asm-ia64/dynamic_syscalls.h for an example.
+ */
+
+#define DYNAMIC_SYSCALL_FUNCADDR(f) (DYNAMIC_SYSCALL_T)(f)
+
+#define DYNAMIC_SYSCALL_OFFSET 0
+#define DYNAMIC_SYSCALL_FIRST 240
+#define DYNAMIC_SYSCALL_LAST NR_syscalls-1
+#define DYNAMIC_SYSCALL_EMPTY DYNAMIC_SYSCALL_FUNCADDR(&sys_ni_syscall)
+extern long sys_ni_syscall(void); /* No need to define parameters */
+
+#endif /* _ASM_DYNAMIC_SYSCALLS_H */
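
The first-fit assignment done by register_dynamic_syscall() can be tried out in user space with a small model. This is only a sketch: the table size of 256, the model_* names and the sys_demo handler are assumptions for illustration, and the locking from the patch is left out.

```c
#include <errno.h>
#include <stddef.h>

#define NR_SLOTS       256   /* hypothetical table size */
#define DYNAMIC_FIRST  240   /* first dynamic slot, as in the i386 header */
#define DYNAMIC_OFFSET 0     /* the ia64 header in the patch uses 1024 */

typedef long (*handler_t)(void);

/* Marker for an empty slot, standing in for sys_ni_syscall. */
long ni_syscall(void) { return -ENOSYS; }

/* Example handler so the model has something to register. */
long sys_demo(void) { return 0; }

static handler_t table[NR_SLOTS];
static const char *names[NR_SLOTS - DYNAMIC_FIRST];

/* Point every slot at the not-implemented handler and clear the names. */
void model_init(void)
{
    for (int i = 0; i < NR_SLOTS; i++)
        table[i] = ni_syscall;
    for (int i = 0; i < NR_SLOTS - DYNAMIC_FIRST; i++)
        names[i] = NULL;
}

/* First fit: claim the first dynamic slot that has no name and still
 * points at ni_syscall. Returns the syscall number, or -EBUSY when the
 * dynamic range is full. */
int model_register(const char *name, handler_t func)
{
    for (int i = 0; i < NR_SLOTS - DYNAMIC_FIRST; i++) {
        if (names[i] == NULL && table[i + DYNAMIC_FIRST] == ni_syscall) {
            names[i] = name;
            table[i + DYNAMIC_FIRST] = func;
            return i + DYNAMIC_FIRST + DYNAMIC_OFFSET;
        }
    }
    return -EBUSY;
}
```

Registering two names in a row hands out 240 and then 241; once all sixteen modelled slots are taken, further calls return -EBUSY.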


2001-12-24 17:06:38

by Doug Ledford

Subject: Re: [patch] Assigning syscall numbers for testing

Alan Cox wrote:

>>Well, I'm not going to mess with code, but here's the example. Say you
>>start at syscall 240 for dynamic registration. Someone then submits a patch
>>
>
> The number you start at depends on the kernel you run.
>
>
>>modify the base of your patch, but if it has been accepted into any real
>>kernels anywhere, then someone could inadvertently end up running a user
>>space app compiled against Linus' new kernel and that uses the newly
>>allocated syscalls 240 and 241. If that's run on an older kernel with your
>>
>
> The code on execution will read the syscall numbers from procfs. It will
> find new numbers and call those. It's a very simple implementation of lazy
> binding. It only breaks if you actually run out of syscalls, and then it
> fails safe.
>
> Alan
>
>

No it doesn't. You are *assuming* that *all* code will check the lazy
syscall bindings. My example was about code using the predefined syscall
number for new functions on an older kernel where those functions don't
exist, but where they overlap with the older dynamic syscall numbers. In
short, the patch is safe for code that uses the lazy binding, but it can
still overlap with future syscall numbers and code that doesn't use the lazy
binding but instead uses predefined numbers.

--

Doug Ledford <[email protected]> http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before
e-mailing me about problems

2001-12-24 18:00:42

by David Lang

Subject: Re: [patch] Assigning syscall numbers for testing

you miss the point: the syscall numbers will not necessarily be consistent
from boot to boot, so if your code does not check for them it's seriously
broken (and remember this is only for stuff in experimental status). The
hope is that most if not all of the real checking can end up being done in
glibc.

David Lang



On Mon, 24 Dec 2001, Doug Ledford wrote:

> Date: Mon, 24 Dec 2001 12:06:19 -0500
> From: Doug Ledford <[email protected]>
> To: Alan Cox <[email protected]>
> Cc: Keith Owens <[email protected]>, Benjamin LaHaise <[email protected]>,
> [email protected]
> Subject: Re: [patch] Assigning syscall numbers for testing
>
> Alan Cox wrote:
>
> >>Well, I'm not going to mess with code, but here's the example. Say you
> >>start at syscall 240 for dynamic registration. Someone then submits a patch
> >>
> >
> > The number you start at depends on the kernel you run.
> >
> >
> >>modify the base of your patch, but if it has been accepted into any real
> >>kernels anywhere, then someone could inadvertently end up running a user
> >>space app compiled against Linus' new kernel and that uses the newly
> >>allocated syscalls 240 and 241. If that's run on an older kernel with your
> >>
> >
> > The code on execution will read the syscall numbers from procfs. It will
> > find new numbers and call those. It's a very simple implementation of lazy
> > binding. It only breaks if you actually run out of syscalls, and then it
> > fails safe.
> >
> > Alan
> >
> >
>
> No it doesn't. You are *assuming* that *all* code will check the lazy
> syscall bindings. My example was about code using the predefined syscall
> number for new functions on an older kernel where those functions don't
> exist, but where they overlap with the older dynamic syscall numbers. In
> short, the patch is safe for code that uses the lazy binding, but it can
> still overlap with future syscall numbers and code that doesn't use the lazy
> binding but instead uses predefined numbers.
>
> --
>
> Doug Ledford <[email protected]> http://people.redhat.com/dledford
> Please check my web site for aic7xxx updates/answers before
> e-mailing me about problems
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2001-12-24 18:14:12

by Alan

Subject: Re: [patch] Assigning syscall numbers for testing

> syscall bindings. My example was about code using the predefined syscall
> number for new functions on an older kernel where those functions don't
> exist, but where they overlap with the older dynamic syscall numbers. In
> short, the patch is safe for code that uses the lazy binding, but it can
> still overlap with future syscall numbers and code that doesn't use the lazy
> binding but instead uses predefined numbers.

Now I follow you. So if Linus takes that patch he needs to allocate a block
of per architecture dynamic syscall number space for it to use. Negative
syscall numbers seem the most promising approach ?

2001-12-24 18:16:52

by Doug Ledford

Subject: Re: [patch] Assigning syscall numbers for testing

Alan Cox wrote:

>>syscall bindings. My example was about code using the predefined syscall
>>number for new functions on an older kernel where those functions don't
>>exist, but where they overlap with the older dynamic syscall numbers. In
>>short, the patch is safe for code that uses the lazy binding, but it can
>>still overlap with future syscall numbers and code that doesn't use the lazy
>>binding but instead uses predefined numbers.
>>
>
> Now I follow you. So if Linus takes that patch he needs to allocate a block
> of per architecture dynamic syscall number space for it to use. Negative
> syscall numbers seem the most promising approach ?
>
>

Something like that. It needs to be a large enough range to reasonably
support the maximum number of expected syscalls that could possibly be in
testing at one time (which is a total guesstimate if you ask me), and it
should hopefully be up high so that we aren't allocating new numbers around
it. However, I think it needs to be allocated *regardless* of whether Linus
takes the patch into his kernel. Even if the patch is simply used outside
Linus's kernel, it still needs the allocation to truly be safe.

--

Doug Ledford <[email protected]> http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before
e-mailing me about problems

2001-12-24 18:13:52

by Doug Ledford

Subject: Re: [patch] Assigning syscall numbers for testing

David Lang wrote:

> you miss the point: the syscall numbers will not necessarily be consistent
> from boot to boot, so if your code does not check for them it's seriously
> broken (and remember this is only for stuff in experimental status). The
> hope is that most if not all of the real checking can end up being done in
> glibc.


No, I'm not missing the point. Try to follow with me here, this isn't
rocket science. *NOT* *ALL* *SOFTWARE* *IS* *OR* *WILL* *BE* *USING*
*DYNAMIC* *SYSCALLS*. Your scenario is fine if you want to convert all
existing software to dynamic syscalls. However, my scenario specifically
dealt with software that *DOES* *NOT* use dynamic syscalls (and which
doesn't need to because the syscalls it *does* use have been allocated).

Since people are having such a hard time with this, let me spell it out in
more detail. Assume the following scenario:

Linux 2.4.17 + dynamic syscall patch. Dynamic syscalls start at 240.

Linux 2.4.18 comes out, and now there are two *new* *official* *statically*
*allocated* syscalls at 240 and 241 (they are SYSGETAMIBLKHEAD and
SYSSETAMIBLKHEAD).

A new piece of software (or an existing one, doesn't matter) is written to
take advantage of the new syscalls. It uses the *predefined* syscall
numbers and is compiled against 2.4.18. It relies upon -ENOSYS (as is
typical for non-dynamic syscalls) to indicate if the kernel doesn't support
the intended syscalls.

Now, someone without realizing the implications of what's going on, runs
this new program on a machine running the 2.4.17 + dynamic syscall patch.

BOOM!

So, to reiterate my points. This *IS* *NOT* *SAFE* unless either A) the
dynamic syscall number range is officially allocated *before* the patch goes
into use to avoid these collisions later or B) you switch *all* software to
using dynamic syscalls (which does have a performance impact on the software
and which would also require lots of work).
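
The collision in this scenario can be reproduced with a toy dispatch table. This is only a model, not kernel code: the table size, the function names and the handler's return value are invented for illustration.

```c
#include <errno.h>

#define NR_SLOTS 256   /* hypothetical table size */

typedef long (*handler_t)(void);

/* What an unused slot returns. */
long ni_syscall(void) { return -ENOSYS; }

/* Stand-in for an experimental syscall that was handed dynamic slot 240. */
long sys_experimental(void) { return 12345; }

static handler_t table[NR_SLOTS];

/* Build the table of a "2.4.17 + patch" kernel: nothing official lives at
 * 240, but an experimental syscall may have been registered there. */
void build_old_kernel(int dynamic_in_use)
{
    for (int i = 0; i < NR_SLOTS; i++)
        table[i] = ni_syscall;
    if (dynamic_in_use)
        table[240] = sys_experimental;
}

/* What a program compiled against the newer kernel does: call the fixed
 * number and treat -ENOSYS as "not supported on this kernel". */
long call_fixed_number(int nr)
{
    return table[nr]();
}
```

With nothing registered, the call on number 240 returns -ENOSYS and the program can cope. With an experimental syscall sitting in slot 240, the same call silently runs the wrong handler, which is the failure being described.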



--

Doug Ledford <[email protected]> http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before
e-mailing me about problems

2001-12-24 18:20:12

by David Lang

Subject: Re: [patch] Assigning syscall numbers for testing

so this just means that an eye needs to be kept on the non-dynamic
syscalls, and the starting point for dynamic syscalls needs to be raised
significantly before we run out of space for the non-dynamic ones.

running software that depends on features in a new kernel on a
significantly older kernel is always questionable; if your software really
needs to do that, you need to watch for a bunch of things.

David Lang


On Mon, 24 Dec 2001, Doug Ledford wrote:

> Date: Mon, 24 Dec 2001 13:13:29 -0500
> From: Doug Ledford <[email protected]>
> To: David Lang <[email protected]>
> Cc: Alan Cox <[email protected]>, Keith Owens <[email protected]>,
> Benjamin LaHaise <[email protected]>, [email protected]
> Subject: Re: [patch] Assigning syscall numbers for testing
>
> David Lang wrote:
>
> > you miss the point: the syscall numbers will not necessarily be consistent
> > from boot to boot, so if your code does not check for them it's seriously
> > broken (and remember this is only for stuff in experimental status). The
> > hope is that most if not all of the real checking can end up being done in
> > glibc.
>
>
> No, I'm not missing the point. Try to follow with me here, this isn't
> rocket science. *NOT* *ALL* *SOFTWARE* *IS* *OR* *WILL* *BE* *USING*
> *DYNAMIC* *SYSCALLS*. Your scenario is fine if you want to convert all
> existing software to dynamic syscalls. However, my scenario specifically
> dealt with software that *DOES* *NOT* use dynamic syscalls (and which
> doesn't need to because the syscalls it *does* use have been allocated).
>
> Since people are having such a hard time with this, let me spell it out in
> more detail. Assume the following scenario:
>
> Linux 2.4.17 + dynamic syscall patch. Dynamic syscalls start at 240.
>
> Linux 2.4.18 comes out, and now there are two *new* *official* *statically*
> *allocated* syscalls at 240 and 241 (they are SYSGETAMIBLKHEAD and
> SYSSETAMIBLKHEAD).
>
> A new piece of software (or an existing one, doesn't matter) is written to
> take advantage of the new syscalls. It uses the *predefined* syscall
> numbers and is compiled against 2.4.18. It relies upon -ENOSYS (as is
> typical for non-dynamic syscalls) to indicate if the kernel doesn't support
> the intended syscalls.
>
> Now, someone without realizing the implications of what's going on, runs
> this new program on a machine running the 2.4.17 + dynamic syscall patch.
>
> BOOM!
>
> So, to reiterate my points. This *IS* *NOT* *SAFE* unless either A) the
> dynamic syscall number range is officially allocated *before* the patch goes
> into use to avoid these collisions later or B) you switch *all* software to
> using dynamic syscalls (which does have a performance impact on the software
> and which would also require lots of work).
>

2001-12-24 18:24:12

by Doug Ledford

Subject: Re: [patch] Assigning syscall numbers for testing

David Lang wrote:

> so this just means that an eye needs to be kept on the non-dynamic
> syscalls, and the starting point for dynamic syscalls needs to be raised
> significantly before we run out of space for the non-dynamic ones.
>
> running software that depends on features in a new kernel on a
> significantly older kernel is always questionable; if your software really
> needs to do that, you need to watch for a bunch of things.


No. This is different. Calling a syscall and expecting to get either A)
the syscall you intended or B) -ENOSYS is an accepted, safe practice under
Unix/Linux. This breaks that practice.





--

Doug Ledford <[email protected]> http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before
e-mailing me about problems

2001-12-24 18:56:15

by Alan

Subject: Re: [patch] Assigning syscall numbers for testing

> it. However, I think it needs to be allocated *regardless* of whether Linus
> takes the patch into his kernel. Even if the patch is simply used outside
> Linus's kernel, it still needs the allocation to truly be safe.

Negative numbers are safe until Linus has 2^31 syscalls, at which point
quite frankly we would have a few other problems including the fact that
the syscall table won't fit in kernel mapped memory.

I'm sure we could get Linus to agree not to use negative numbers out of
spite.

2001-12-24 19:32:00

by Russell King

Subject: Re: [patch] Assigning syscall numbers for testing

On Mon, Dec 24, 2001 at 07:05:31PM +0000, Alan Cox wrote:
> > it. However, I think it needs to be allocated *regardless* of whether Linus
> > takes the patch into his kernel. Even if the patch is simply used outside
> > Linus's kernel, it still needs the allocation to truly be safe.
>
> Negative numbers are safe until Linus has 2^31 syscalls, at which point
> quite frankly we would have a few other problems including the fact that
> the syscall table won't fit in kernel mapped memory.

Please leave the allocation of the exact number space to the port
maintainers' discretion.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-12-24 20:38:06

by Alan

Subject: Re: [patch] Assigning syscall numbers for testing

> > Negative numbers are safe until Linus has 2^31 syscalls, at which point
> > quite frankly we would have a few other problems including the fact that
> > the syscall table won't fit in kernel mapped memory.
>
> Please leave the allocation of the exact number space to the port
> maintainers' discretion.

Sorry.. I'm talking about x86 here. Linus is the x86 port maintainer as
well so we have to plan it out that way. For non x86 sure.

Alan

2001-12-24 23:39:05

by Edgar Toernig

Subject: Re: [patch] Assigning syscall numbers for testing

Russell King wrote:
>
> On Mon, Dec 24, 2001 at 07:05:31PM +0000, Alan Cox wrote:
> > > it. However, I think it needs to be allocated *regardless* of whether Linus
> > > takes the patch into his kernel. Even if the patch is simply used outside
> > > Linus's kernel, it still needs the allocation to truly be safe.
> >
> > Negative numbers are safe until Linus has 2^31 syscalls, at which point
> > quite frankly we would have a few other problems including the fact that
> > the syscall table won't fit in kernel mapped memory.
>
> Please leave the allocation of the exact number space to the port
> maintainers' discretion.

Why not assign 1 syscall that gets the name of an experimental syscall
as its first argument and does the demultiplexing?
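
For concreteness, a rough user space sketch of what such a demultiplexer could look like. Every name here (sys_experimental, the acme_* entries, the single long argument) is invented for illustration; nothing like this is in the posted patch.

```c
#include <errno.h>
#include <string.h>

typedef long (*handler_t)(long arg);

/* Hypothetical experimental handlers, for illustration only. */
static long exp_double(long arg) { return arg * 2; }
static long exp_negate(long arg) { return -arg; }

/* Name-to-handler table that the single entry point searches. */
static const struct {
    const char *name;
    handler_t fn;
} experimental[] = {
    { "acme_double", exp_double },
    { "acme_negate", exp_negate },
};

/* The one multiplexing entry point: look the name up, then dispatch.
 * Unknown names get -ENOSYS, like an unimplemented syscall would. */
long sys_experimental(const char *name, long arg)
{
    size_t n = sizeof(experimental) / sizeof(experimental[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(experimental[i].name, name) == 0)
            return experimental[i].fn(arg);
    return -ENOSYS;
}
```

The string comparison on every call is the obvious cost of this design compared with a numbered slot.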

Ciao, ET.

2001-12-24 23:48:05

by Andreas Steinmetz

Subject: Re: [patch] Assigning syscall numbers for testing


On 24-Dec-2001 Edgar Toernig wrote:
> Russell King wrote:
>>
>> On Mon, Dec 24, 2001 at 07:05:31PM +0000, Alan Cox wrote:
>> > > it. However, I think it needs to be allocated *regardless* of whether
>> > > Linus
>> > > takes the patch into his kernel. Even if the patch is simply used
>> > > outside
>> > > Linus's kernel, it still needs the allocation to truly be safe.
>> >
>> > Negative numbers are safe until Linus has 2^31 syscalls, at which point
>> > quite frankly we would have a few other problems including the fact that
>> > the syscall table won't fit in kernel mapped memory.
>>
>> Please leave the allocation of the exact number space to the port
>> maintainers' discretion.
>
> Why not assign 1 syscall that gets the name of an experimental syscall
> as its first argument and does the demultiplexing?
>

Please, no multiplexing. A well defined range (small as it may be) open to
developers (and thus collision) will do.

> Ciao, ET.

Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH

2001-12-25 02:18:57

by Keith Owens

Subject: Re: [patch] Assigning syscall numbers for testing

On Mon, 24 Dec 2001 13:13:29 -0500,
Doug Ledford <[email protected]> wrote:
>David Lang wrote:
>Since people are having such a hard time with this, let me spell it out in
>more detail. Assume the following scenario:
>
>Linux 2.4.17 + dynamic syscall patch. Dynamic syscalls start at 240.
>
>Linux 2.4.18 comes out, and now there are two *new* *official* *statically*
>*allocated* syscalls at 240 and 241 (they are SYSGETAMIBLKHEAD and
>SYSSETAMIBLKHEAD).
>
>A new piece of software (or an existing one, doesn't matter) is written to
>take advantage of the new syscalls. It uses the *predefined* syscall
>numbers and is compiled against 2.4.18. It relies upon -ENOSYS (as is
>typical for non-dynamic syscalls) to indicate if the kernel doesn't support
>the intended syscalls.
>
>Now, someone without realizing the implications of what's going on, runs
>this new program on a machine running the 2.4.17 + dynamic syscall patch.
>
>BOOM!

i386 dynamic syscall table starts at 240. Last assigned syscall entry
is currently 225, leaving room for 14 new assigned syscalls. 2.4.0
(January 5 2001) had 222 syscalls, so 2.4 added 3 assigned syscalls in
just under a year. At that rate the dynamic syscall range is safe for
4 years. I make the reasonable assumption that new syscalls are
assigned monotonically, that Linus does not arbitrarily assign numbers
with gaps between them.
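
The arithmetic behind that estimate can be checked with a small sketch. The function names are made up for illustration; the figures (225, 240, 3 per year) are the ones quoted above, and a constant growth rate is an assumption.

```c
#include <assert.h>

/* Free static slots strictly between the last assigned number and the
 * first dynamic one. */
int free_static_slots(int last_assigned, int dynamic_start)
{
    assert(dynamic_start > last_assigned);
    return dynamic_start - last_assigned - 1;
}

/* Whole years before static numbers reach the dynamic range, assuming
 * a constant number of new syscalls per year. */
int years_until_collision(int free_slots, int added_per_year)
{
    assert(added_per_year > 0);
    return free_slots / added_per_year;
}
```

With last_assigned = 225 and dynamic_start = 240 this gives 14 free slots, and 14 slots at 3 per year is 4 whole years.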

You can argue about whether the gap will close in 2 or 4 years. I
suspect it will be longer than 4 years because syscall growth has been
dropping off since 2.0.

You will only get a problem when :-

* The assigned numbers reach the dynamic range _and_
* A program with an assigned syscall that overlaps the old dynamic
  range runs on a 4 year old kernel _and_
* The 4 year old kernel has the dynamic syscall patch _and_
* Some code on the 4 year old kernel is using dynamic syscalls.

A problem will only arise if _all_ of those criteria are met. In
particular the old kernel must still be running application code that
uses dynamic syscalls: no dynamic syscalls used == no problem.

My patch is for _testing_ syscalls, not for long term use instead of
getting assigned syscall numbers. Even if you run a brand new
application on a 4 year old kernel, you will not still be running
testing code on the old kernel.

I agree that it would be nice to have a permanently reserved dynamic
range and if my patch goes into the kernel that is exactly what I will
ask Linus for. Even if there is no reserved dynamic range, the risk of
causing problems with new applications on old kernels is minuscule.

ps. Bah, humbug!

2001-12-25 10:01:36

by Russell King

Subject: Re: [patch] Assigning syscall numbers for testing

On Tue, Dec 25, 2001 at 01:18:26PM +1100, Keith Owens wrote:
> i386 dynamic syscall table starts at 240. Last assigned syscall entry
> is currently 225, leaving room for 14 new assigned syscalls. 2.4.0
> (January 5 2001) had 222 syscalls, so 2.4 added 3 assigned syscalls in
> just under a year.

Erm, there's a rather obvious flaw in your argument here - 2.4 is supposed
to be a stable kernel with relatively few features appearing in it. We're
now into 2.5. We've already seen several people trying to get new syscall
numbers between 2.5.0 and 2.5.1, which is also a relatively short
timeframe.

Let's look at a more realistic timeframe. These figures are for i386:

2.2.20 - 190 syscalls, last one is sys_vfork
2.4.17 - 225 syscalls, last one is sys_readahead

So, between these two stable kernel series, _35_ syscalls have been added.
If we assume this trend will continue through 2.5, then we'll be up to
260 syscalls when 2.6 or 3.0 is out.
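
A rough sketch of that projection, using only the figures quoted in this mail (190, 225, and the dynamic start of 240). The function names are illustrative, and "the same number again" is the linear-trend assumption stated above.

```c
#include <assert.h>

/* Linear projection: the next stable series adds as many syscalls as
 * the last one did. */
int project_next_series(int previous, int current)
{
    assert(current >= previous);
    return current + (current - previous);
}

/* Nonzero if the projected count reaches the start of the dynamic range. */
int reaches_dynamic_range(int projected, int dynamic_start)
{
    return projected >= dynamic_start;
}
```

project_next_series(190, 225) yields 260, which is past a dynamic range starting at 240.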

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-12-26 18:11:14

by Riley Williams

Subject: Re: [patch] Assigning syscall numbers for testing

Hi David.

> so this just means that an eye needs to be kept on the non-dynamic
> syscalls and up the starting point for dynamic syscalls
> significantly before we run out of space for the non-dynamic ones.

How far in advance do you call "significantly"? I know of dedicated
systems still running Linux 1.2.12, and I use 2.0 series kernels on
some of my systems, so your solution is a non-starter in my book.

I'd say that there are only three options for safely dealing with
this problem...

1. Allocate a fixed range for dynamic syscalls that can NEVER be
used by static ones. If the static syscalls hit the bottom of
it, they skip it and restart just above its top.

Two variants of this have been proposed so far...

a. Negative syscall numbers are dynamic.

b. Syscall numbers with the MSB set are dynamic.

...and these are essentially variants on the same thing.

2. Each syscall includes a flag stating whether it's a static
or dynamic one, with separate jump tables for each. For
existing static syscalls this flag would always be in the
same state.

One variant of this would be to redefine the syscall number
as being two fields, with the MSB as the static/dynamic flag
and the rest as the syscall number. This would incorporate
option (1b) as a variant of this option.

3. Have separate syscall entry points for static and dynamic
syscalls.

...and anything else is at risk from the scenario referred to.
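
To make options (1b) and (2) concrete, here is a minimal user-space sketch of MSB-flagged numbers with separate jump tables. Everything in it (the flag value, table sizes, handler names, register_dynamic) is hypothetical and assumes 32-bit syscall numbers; it is not how any real architecture's entry code is written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SYSCALL_DYNAMIC_FLAG 0x80000000u /* MSB of a 32-bit number */
#define NR_STATIC  4                     /* illustrative table sizes */
#define NR_DYNAMIC 4

typedef long (*handler_t)(void);

long static_demo(void)  { return 100; }
long dynamic_demo(void) { return 200; }

static handler_t static_table[NR_STATIC] = { static_demo };
static handler_t dynamic_table[NR_DYNAMIC];

/* A negative 32-bit number also has the MSB set, so (1a) and (1b)
 * select the same numbers. */
int is_dynamic(uint32_t nr)
{
    return (nr & SYSCALL_DYNAMIC_FLAG) != 0;
}

/* Claim a dynamic slot; returns the flagged number, or 0 on failure
 * (0 can never be a dynamic number because its MSB is clear). */
uint32_t register_dynamic(uint32_t index, handler_t fn)
{
    assert(fn != NULL);
    if (index >= NR_DYNAMIC || dynamic_table[index] != NULL)
        return 0;
    dynamic_table[index] = fn;
    return index | SYSCALL_DYNAMIC_FLAG;
}

/* Pick the jump table from the flag, then index with the remaining bits. */
long dispatch(uint32_t nr)
{
    uint32_t index = nr & ~SYSCALL_DYNAMIC_FLAG;
    handler_t *table = is_dynamic(nr) ? dynamic_table : static_table;
    uint32_t limit = is_dynamic(nr) ? NR_DYNAMIC : NR_STATIC;

    if (index >= limit || table[index] == NULL)
        return -ENOSYS;
    return table[index]();
}
```

Because the two tables never share an index space, a static number can never land on a dynamic handler, no matter how far the static table grows.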

> running software that depends on features in a new kernel on a
> significantly older kernel is always questionable, if your software
> really needs to do that you need to watch for a bunch of things.

Very true, but not necessarily relevant.

>>> you miss the point, the syscall numbers will not necessarily be
>>> consistent from boot to boot so if your code does not check for
>>> them it's seriously broken (and remember this is only for stuff
>>> in experimental status). The hope is that most if not all of the
>>> real checking can end up being done in glibc

>> No, I'm not missing the point. Try to follow with me here, this
>> isn't rocket science. *NOT* *ALL* *SOFTWARE* *IS* *OR* *WILL* *BE*
>> *USING* *DYNAMIC* *SYSCALLS*. Your scenario is fine if you want to
>> convert all existing software to dynamic syscalls. However, my
>> scenario specifically dealt with software that *DOES* *NOT* use
>> dynamic syscalls (and which doesn't need to because the syscalls it
>> *does* use have been allocated).
>>
>> Since people are having such a hard time with this, let me spell it
>> out in more detail. Assume the following scenario:
>>
>> Linux 2.4.17 + dynamic syscall patch. Dynamic syscalls start at 240.

Assume they finish at 255 for my comments below.

>> Linux 2.4.18 comes out, and now there are two *new* *official*
>> *statically* *allocated* syscalls at 240 and 241 (they are
>> SYSGETAMIBLKHEAD and SYSSETAMIBLKHEAD).

Preventing this would require that solution (1) has been implemented,
in which case the new syscalls would be 256 and 257 as 240 through 255
are reserved for the dynamic syscalls.
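
The skip rule in solution (1) is small enough to sketch. The range 240 through 255 is the one assumed in this mail; next_static_syscall is a made-up helper name, not an existing kernel function.

```c
#include <assert.h>

#define DYN_FIRST 240 /* bottom of the reserved dynamic range */
#define DYN_LAST  255 /* top of the range, as assumed above */

/* Next number to hand to a new static syscall: normally last + 1, but
 * jump over the reserved dynamic range when we hit its bottom. */
int next_static_syscall(int last_assigned)
{
    int next;

    assert(last_assigned >= 0);
    next = last_assigned + 1;
    if (next >= DYN_FIRST && next <= DYN_LAST)
        next = DYN_LAST + 1;
    return next;
}
```

Starting from 239 this hands out 256 and then 257, never a number inside the reserved range.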

>> A new piece of software (or an existing one, doesn't matter) is
>> written to take advantage of the new syscalls. It uses the
>> *predefined* syscall numbers and is compiled against 2.4.18. It
>> relies upon -ENOSYS (as is typical for non-dynamic syscalls) to
>> indicate if the kernel doesn't support the intended syscalls.
>>
>> Now, someone without realizing the implications of what's going
>> on, runs this new program on a machine running the 2.4.17+dynamic
>> syscall patch.
>>
>> BOOM!
>>
>> So, to reiterate my points. This *IS* *NOT* *SAFE* unless either
>>
>> A) the dynamic syscall number range is officially allocated
>> *before* the patch goes into use to avoid these collisions
>> later or
>>
>> B) you switch *all* software to using dynamic syscalls (which
>> does have a performance impact on the software and which
>> would also require lots of work).
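
For anyone who wants to see the BOOM in miniature, the following user-space simulation models the old kernel's table. It is a sketch under stated assumptions: a 256-entry table, dynamic slots handed out from 240 upwards, and hypothetical names (old_kernel_register, do_syscall, experimental_call) that do not come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_SYSCALLS   256
#define DYNAMIC_START 240

typedef long (*handler_t)(long arg);

/* Syscall table of the old, patched kernel; NULL means "not implemented". */
static handler_t table[NR_SYSCALLS];

/* Some experimental call under test on the old kernel. */
long experimental_call(long arg)
{
    return arg * 2;
}

/* Hand out the first free slot in the dynamic range, or -1 if full. */
int old_kernel_register(handler_t fn)
{
    int nr;

    assert(fn != NULL);
    for (nr = DYNAMIC_START; nr < NR_SYSCALLS; nr++) {
        if (table[nr] == NULL) {
            table[nr] = fn;
            return nr;
        }
    }
    return -1;
}

/* What a program sees when it issues syscall number nr. */
long do_syscall(int nr, long arg)
{
    if (nr < 0 || nr >= NR_SYSCALLS || table[nr] == NULL)
        return -ENOSYS;
    return table[nr](arg);
}
```

Before anything is registered, do_syscall(240, ...) returns -ENOSYS, which is what the new program relies on. Once an experimental call has taken slot 240, the same request silently runs the wrong code instead of failing.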

Best wishes from Riley.