2007-01-07 08:15:40

by Amit Choudhary

Subject: [DISCUSS] Making system calls more portable.

Hi,

I wanted to know if there is any inclination towards making system calls more portable. Please let
me know if this discussion has happened before.

Well, system calls today are not portable mainly because they are invoked using a number and it
may happen that a number 'N' may refer to systemcall_1() on one system/kernel and to
systemcall_2() on another system/kernel. This problem may surface if you compile your program
using headers from version_1 of the kernel, and then install another version of the kernel or a
custom kernel that has extended the system call table (on the same system). If we want to improve
the portability then we can avoid this approach or improve this approach. It may or may not be
complex to implement these.

1. Invoke a system call using its name. Pass its name to the kernel as an argument of syscall() or
some other function. This will probably make the invocation of the system call slower. If the name
doesn't match in the kernel then an error can be returned.

2. Create a /proc entry that will return the number of the system call given its name. This number
can then be used to invoke the system call.

These approaches will also remove the dependency from user space header file that contains the
mapping from the system call name to its number. I hope that I made some sense.
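
The two approaches above can be sketched as a toy model. This is only an illustration in Python, not kernel code: `KERNEL_TABLE`, `HANDLERS`, `lookup_syscall_number()` and `invoke_by_name()` are hypothetical names, and the numbers in the table are invented.

```python
import errno

# Hypothetical name -> number table for one kernel build; the numbers are invented.
KERNEL_TABLE = {"read": 0, "write": 1, "getpid": 39}

# Toy handlers indexed by number, standing in for the real kernel entry points.
HANDLERS = {0: lambda: "read ran", 1: lambda: "write ran", 39: lambda: "getpid ran"}


def lookup_syscall_number(name):
    """Approach 2: resolve a name to a number once (the /proc-style query)."""
    if name not in KERNEL_TABLE:
        return -errno.ENOSYS  # unknown name: report an error instead of guessing
    return KERNEL_TABLE[name]


def invoke_by_name(name):
    """Approach 1: pass the name on every call and let the kernel resolve it."""
    number = lookup_syscall_number(name)
    if number < 0:
        return number
    return HANDLERS[number]()
```

In this model approach 1 pays for the name lookup on every call, while approach 2 pays for it once and then uses the returned number.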

Regards,
Amit




2007-01-07 08:26:34

by Rene Herman

Subject: Re: [DISCUSS] Making system calls more portable.

On 01/07/2007 09:15 AM, Amit Choudhary wrote:

> Well, system calls today are not portable mainly because they are
> invoked using a number and it may happen that a number 'N' may refer
> to systemcall_1() on one system/kernel and to systemcall_2() on
> another system/kernel.

If we're limited to Linux kernels, this seems to not be the case. Great
care is taken in keeping this userspace ABI stable -- new system calls
are given new numbers. Old system calls may disappear (after a long
grace period) but even then I don't believe the number is ever recycled.
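
A minimal sketch of that allocation policy, assuming a simple append-only table: new calls always get the next number, and a retired call leaves its slot reserved. The class `SyscallTable` and its methods are hypothetical and only model the behaviour described above.

```python
import errno


class SyscallTable:
    """Toy append-only table: the list index plays the role of the syscall number."""

    def __init__(self):
        self._slots = []

    def add(self, name):
        # A new call always takes the next unused number.
        self._slots.append(name)
        return len(self._slots) - 1

    def retire(self, number):
        # The slot is emptied but stays reserved, so the number is never reused.
        self._slots[number] = None

    def resolve(self, number):
        if 0 <= number < len(self._slots) and self._slots[number] is not None:
            return self._slots[number]
        return -errno.ENOSYS
```

An old binary that still uses a retired number gets an error back in this model, never a different call.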

If your discussion is not limited to Linux kernels, then sure, but being
portable at that (sub-libc) level is asking too much.

> I hope that I made some sense.

Some, but your supposition seems unclear.

Rene

2007-01-07 09:07:43

by Amit Choudhary

Subject: Re: [DISCUSS] Making system calls more portable.


--- Rene Herman wrote:
>
>If we're limited to Linux kernels, this seems to not be the case. Great care is taken in keeping
>this userspace ABI stable -- new system calls are given new numbers. Old system calls may
>disappear (after a long grace period) but even then I don't believe the number is ever recycled.
>
> If your discussion is not limited to Linux kernels, then sure, but being
> portable at that (sub-libc) level is asking too much.
>

I will come to the main issue later but I just wanted to point out that we maintain the same
information in two separate places - the mapping between the name and the number exists in both
user space and kernel space. Shouldn't this duplication be removed?

Now, let's say a vendor has linux_kernel_version_1 that has 300 system calls. The vendor needs to
give some extra functionality to its customers and the way chosen is to implement new system call.
The new system call number is 301. The customer gets this custom kernel and uses number 301.
Next, he downloads another kernel (newer linux kernel version) on his system that has already
implemented the system call numbered 301. The customer now runs his program. Even if he compiles
it again he has the old header files, so that does not make a difference.

Now his program uses number 301, which refers to some other system call, and so we may see a
system crash or some very wrong behaviour. Making system calls more portable will ensure that at
least the program gets an indication that something is wrong (an error returned from the kernel
saying that this system call name is not matched). Or, if the vendor is actually successful in
pushing its system call to the mainline kernel, no one needs to worry about it. Everything will
run happily.
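
The scenario can be modelled in a few lines. This is a toy sketch, not kernel code: the two tables and the call names `vendor_extra_call` and `unrelated_mainline_call` are made up for illustration.

```python
import errno

# Hypothetical tables: the vendor kernel and a later mainline kernel both use 301.
VENDOR_KERNEL = {301: "vendor_extra_call"}
NEWER_MAINLINE = {301: "unrelated_mainline_call"}


def call_by_number(kernel, number):
    """Number-based dispatch: whatever occupies the slot is what runs."""
    return kernel.get(number, -errno.ENOSYS)


def call_by_name(kernel, name):
    """Name-based dispatch: return an error if the kernel has no such name."""
    for entry in kernel.values():
        if entry == name:
            return entry
    return -errno.ENOSYS
```

With numbers, the program built against the vendor kernel silently reaches `unrelated_mainline_call` on the newer kernel; with names, the same request comes back as `-ENOSYS`.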

However, people may say that, implementing custom system calls is not advocated by linux. And I
think it is not advocated precisely because of this reason that they are not portable.

Regards,
Amit




2007-01-07 09:18:06

by Rene Herman

Subject: Re: [DISCUSS] Making system calls more portable.

On 01/07/2007 10:07 AM, Amit Choudhary wrote:

> However, people may say that, implementing custom system calls is not
> advocated by linux. And I think it is not advocated precisely because
> of this reason that they are not portable.

True I guess. But do you want to live in a software environment where as
a matter of course every distribution out there puts its "value-add" in
custom system calls and creating a $DISTRIBUTION-only userspace? After
all, if nothing uses their shiny new custom syscall, they might as well
not add it. This would fragment Linux quite horribly and IMO cases where
this happens should be _dis_couraged, not encouraged by making it less
problematic to survive the resulting mess.

Rene.

2007-01-07 09:33:57

by Vadim Lobanov

Subject: Re: [DISCUSS] Making system calls more portable.

On Sun, 2007-01-07 at 00:15 -0800, Amit Choudhary wrote:
> 1. Invoke a system call using its name. Pass its name to the kernel as an argument of syscall() or
> some other function. Probably may make the invocation of the system call slower. If the name
> doesn't match in the kernel then an error can be returned.
>
> 2. Create a /proc entry that will return the number of the system call given its name. This number
> can then be used to invoke the system call.

Your argument has a built-in assumption that, whereas syscall numbers do
collide, syscall names will not. This assumption is not true; people
will quite often pick the same name for something independent of each
other.

Additionally, the proposed solutions will require a dramatic increase in
memory, to store a static string name for each syscall, and a marked
increase in CPU usage, to do string hashing and matching for each
syscall invocation (and these can occur very often). This overhead is
hard to justify just to support custom vendor kernels, as Rene pointed
out in a separate reply.
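
The name-collision point can be illustrated with a toy model. Everything here is hypothetical: two imaginary vendors both pick the name `fast_copy` but give it different arguments and semantics.

```python
# Two hypothetical vendor kernels that independently chose the same syscall name.
def vendor_a_fast_copy(src, dst):
    return f"copied {src} to {dst}"


def vendor_b_fast_copy(fd, nbytes):
    return f"copied {nbytes} bytes from fd {fd}"


VENDOR_A = {"fast_copy": vendor_a_fast_copy}
VENDOR_B = {"fast_copy": vendor_b_fast_copy}


def invoke_by_name(kernel, name, *args):
    """Name-based dispatch: a matching name is all that gets checked."""
    handler = kernel.get(name)
    if handler is None:
        return "ENOSYS"
    return handler(*args)
```

A program written for vendor A's `fast_copy` still resolves on vendor B's kernel, so it gets no error and the wrong behaviour, which is the same failure the name scheme was meant to prevent.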

> These approaches will also remove the dependency from user space header file that contains the
> mapping from the system call name to its number. I hope that I made some sense.

I thought that this file was "shipped upwards" by the kernel already, as
a sanitized header?

-- Vadim Lobanov


2007-01-07 11:04:49

by Jan Engelhardt

Subject: Re: [DISCUSS] Making system calls more portable.


On Jan 7 2007 01:07, Amit Choudhary wrote:
>
>I will come to the main issue later but I just wanted to point out
>that we maintain information at two separate places - mapping
>between the name and the number in user space and kernel space.
>Shouldn't this duplication be removed.

For example? Do you plan on using "syscall strings" instead of
syscall numbers? I would not go for it. Comparing strings takes much
longer than comparing a register-size integer.

>Now, let's say a vendor has linux_kernel_version_1 that has 300
>system calls. The vendor needs to give some extra functionality to
>its customers and the way chosen is to implement new system call.
>The new system call number is 301. [...]

Umm, like with Internet addresses, you can't just reserve yourself
one you like. Including MACs on the local ethernet segment. Though
the MAC space is large with 2^48 or more, you can ARP spoof and
hinder the net.
In other words, if the vendor, or you, are going to use a
non-standard 301, you are supposed to run into problems, sooner or
later [Murphy's Law or Finagle's Corollary].
What you probably want is a syscall number range marked for private
use, much like there is for majors in /dev or 10.0.0.0/8 on inet.
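
A rough sketch of what such a private-use range could look like, next to the 10.0.0.0/8 analogy. The range `PRIVATE_SYSCALLS` and the helper names are assumptions made up for illustration; no such range is defined anywhere in this thread.

```python
import ipaddress

# Hypothetical private-use range of syscall numbers; the bounds are invented.
PRIVATE_SYSCALLS = range(0x4000, 0x5000)


def is_private_syscall(number):
    """True if the number falls in the range set aside for local use."""
    return number in PRIVATE_SYSCALLS


def next_mainline_number(last_assigned):
    """Hand out the next mainline number, skipping over the private range."""
    candidate = last_assigned + 1
    if candidate in PRIVATE_SYSCALLS:
        candidate = PRIVATE_SYSCALLS.stop
    return candidate


def is_in_ten_slash_eight(address):
    """The networking analogue: membership in the 10.0.0.0/8 private block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network("10.0.0.0/8")
```

As with 10.0.0.0/8, a range like this only keeps private numbers from clashing with mainline ones; two vendors can still collide with each other inside it.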


-`J'
--

2007-01-07 14:03:33

by Theodore Ts'o

Subject: Re: [DISCUSS] Making system calls more portable.

On Sun, Jan 07, 2007 at 01:07:41AM -0800, Amit Choudhary wrote:
> Now, let's say a vendor has linux_kernel_version_1 that has 300
> system calls. The vendor needs to give some extra functionality to
> its customers and the way chosen is to implement new system call.
> The new system call number is 301. The customer gets this custom
> kernel and uses number 301. Next, he downloads another kernel
> (newer linux kernel version) on his system that has already
> implemented the system call numbered 301. The customer now runs his
> program. Even if he compiles it again he has the old header files,
> so that does not make a difference.

So the vendor is doing something bad, and his customers will pay the
price, and they will switch to another vendor who doesn't create
traps for their customers. What's the problem? :-)

Seriously, you got into trouble in your second sentence --- and not
just by the use of the passive voice: "the way chosen is to implement
(a) new system call". Don't do that.

There are plenty of other ways of requesting kernel services; you can
create your own device driver and pass string commands to the device
driver, for example. What? You say string-based parsing is slow?
But you were just advocating doing that for all system calls!
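
As a toy illustration of the string-command style of interface, here is a userspace model in Python. It is not driver code: `ToyDevice`, its `write()` method and the commands `status` and `set-mode` are all hypothetical.

```python
class ToyDevice:
    """Toy stand-in for a driver that accepts one text command per write."""

    def __init__(self):
        self.mode = "idle"
        self._commands = {"status": self._status, "set-mode": self._set_mode}

    def _status(self, *args):
        return f"OK mode={self.mode}"

    def _set_mode(self, *args):
        if len(args) != 1:
            return "ERR usage: set-mode <mode>"
        self.mode = args[0]
        return "OK"

    def write(self, line):
        # Parse the command string and dispatch on its first word.
        parts = line.split()
        if not parts:
            return "ERR empty command"
        handler = self._commands.get(parts[0])
        if handler is None:
            return "ERR unknown command"
        return handler(*parts[1:])
```

The parsing cost is paid only by callers of this one device, not by every system call on the machine.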

Well, then your other choice is to convince the kernel developers that
the interface is stable, and of general interest to the community ---
and if not, then perhaps a more general version of the interface can
be made, with peer review improving the design, and then that can get
implemented.

One example (of both strategies) is shared namespaces. A vendor's
engineers worked with the Linux VFS developers to introduce shared
namespaces, so that in the future an important product of theirs will
be able to use that feature, instead of a custom kernel module, which
was getting too expensive to maintain. Getting shared namespaces in
took well over a year, and it'll probably be another year or two before
all customers' systems will have it and the product can be revved to
use it, but that's an example of the right way to do things; and in
the meantime, customers are using the version that requires a binary
module that does *NOT* use system calls to provide kernel callouts,
but a custom filesystem instead.

- Ted