2008-07-07 05:09:43

by Jinkai Gao

Subject: Suggestion: LKM should be able to add system call for itself

LKMs (loadable kernel modules) were first introduced for drivers. Users
rarely need to talk to modules directly. When they do, several methods
are available, such as /proc files, interrupts, etc. However,
these interfaces are predefined, which makes communication between
user space and kernel space quite restricted. Although we can manage
to pass information or commands between them, the
implementation tends to be ugly.

Of course, for driver modules, these mechanisms are enough. But as
long as it is called a Loadable Kernel Module instead of a Loadable
Kernel Driver, I think it should be able to do more than that. Take,
for example, LSMs (Linux security modules), most of which (SELinux,
AppArmor, etc.) use policy files as their core. Users write policy
files, and the LSM makes access control decisions based on those files.
It seems that users don't need to talk to the LSM directly. But what if
a user wants to temporarily disable a role or capability he is holding?
There is not much he can do, is there? (Nothing is impossible, but
making a new system call makes much more sense.)

The above is meant to demonstrate that an LKM is an extension to the
kernel, and that the set of system calls should be able to grow as the
kernel grows. So an LKM should be able to define its own user interface
by adding new system calls for itself. And actually, having thought it
through, I believe it is not hard to implement such a dynamic system
call table.

There was a time when people could modify sys_call_table[], which has
been forbidden since it was recognized as an extremely dangerous
operation. But the same thing can be implemented in a safe way. The
kernel could provide a registration function like this:

typedef int (*syscall_func_t)(struct pt_regs regs);
int syscall_register(char* name, syscall_func_t sys_call);

That way modules can add their own system calls without affecting the
original sys_call_table[]. And since the system call number will be
unpredictable, either we let users know the number, or we generate a
corresponding library function to make users' lives easier.
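
A minimal user-space sketch of the registration idea above, not kernel
code. The handler type is simplified to a single long argument, and
DYN_SYSCALL_MAX, syscall_unregister(), dyn_syscall() and demo_double()
are hypothetical names introduced only for illustration:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DYN_SYSCALL_MAX 8   /* hypothetical size of the dynamic table */

/* Simplified, hypothetical handler type (the proposal passes pt_regs). */
typedef long (*syscall_func_t)(long arg);

struct dyn_syscall {
    const char *name;       /* NULL means the slot is free */
    syscall_func_t fn;
};

static struct dyn_syscall dyn_table[DYN_SYSCALL_MAX];

/* Register a handler under a name.
 * Returns the assigned number, or a negative errno value on failure. */
int syscall_register(const char *name, syscall_func_t fn)
{
    int free_slot = -1;

    if (name == NULL || fn == NULL)
        return -EINVAL;
    for (int i = 0; i < DYN_SYSCALL_MAX; i++) {
        if (dyn_table[i].name == NULL) {
            if (free_slot < 0)
                free_slot = i;          /* remember the first free slot */
        } else if (strcmp(dyn_table[i].name, name) == 0) {
            return -EEXIST;             /* name is already registered */
        }
    }
    if (free_slot < 0)
        return -ENOSPC;                 /* table is full */
    dyn_table[free_slot].name = name;
    dyn_table[free_slot].fn = fn;
    return free_slot;
}

/* Free a previously assigned number. */
int syscall_unregister(int nr)
{
    if (nr < 0 || nr >= DYN_SYSCALL_MAX || dyn_table[nr].name == NULL)
        return -EINVAL;
    dyn_table[nr].name = NULL;
    dyn_table[nr].fn = NULL;
    return 0;
}

/* Dispatch by number, as a system call entry path would. */
long dyn_syscall(int nr, long arg)
{
    if (nr < 0 || nr >= DYN_SYSCALL_MAX || dyn_table[nr].fn == NULL)
        return -ENOSYS;
    return dyn_table[nr].fn(arg);
}

/* Example handler standing in for a module's system call. */
long demo_double(long arg)
{
    return 2 * arg;
}
```

In this sketch the number handed back depends on which slots happen to
be free at registration time, which is the unpredictability mentioned
above.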

This is my personal opinion; any criticism or correction is welcome.
Thanks

---
Syracuse University
Jinkai Gao


2008-07-07 07:01:38

by Arjan van de Ven

Subject: Re: Suggestion: LKM should be able to add system call for itself

On Mon, 7 Jul 2008 01:09:30 -0400
"Jinkai Gao" <[email protected]> wrote:

> Above is to demonstrate that LKM is extension to kernel, and the
> system calls should be able to extend as long as the kernel is
> extending. So The LKM should be able to define its own user interface
> by adding new system call for itself.

Since we promise a stable ABI to userspace, this is a bit of a problem.

But... look today, we already have various system calls implemented by
modules. (example: sys_nfsservctl)
but to make it fully dynamic? Not a good idea... nobody would be able
to program to it.




--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-07-07 08:40:39

by Bart Van Assche

Subject: Re: Suggestion: LKM should be able to add system call for itself

On Mon, Jul 7, 2008 at 7:09 AM, Jinkai Gao <[email protected]> wrote:
> LKM(loadable kernel module) was first introduced for drivers. Users
> rarely need to talk to the modules directly. If does, several methods
> are available now, such as /proc file, interruption, etc. However,
> these interfaces are predefined, which makes the communication between
> user space and kernel space quite restricted.

Did you already have a look at e.g. http://lwn.net/Kernel/LDD3/ for
suggestions of alternatives for communication between userspace and
kernel modules ? Alternatives to system calls are e.g. ioctl's, memory
mapped I/O and sockets. The last alternative is used by udevd
(PF_NETLINK).

Bart.

2008-07-07 09:35:30

by Jan Engelhardt

Subject: Re: Suggestion: LKM should be able to add system call for itself


On Monday 2008-07-07 07:09, Jinkai Gao wrote:

>LKM(loadable kernel module) was first introduced for drivers. Users
>rarely need to talk to the modules directly. If does, several methods
>are available now, such as /proc file, interruption, etc. However,
>these interfaces are predefined, which makes the communication between
>user space and kernel space quite restricted.

And that is good -- I certainly do not want something to step out of
bounds by accident or intention.

>Of course, for driver modules, these mechanisms are enough. But as
>long as it is called Loadable Kernel Module instead of Loadable Kernel
>Driver, I think it should be able to do more than that. For example,
>LSM(linux security module),most of which(selinux, apparmor, etc.) use
>policy files as their core. Users write policy files, LSM make access
>control decision based on the files. Seems like users don't need to
>talk to LSM directly. But what if user want to temporarily disable a
>role or capability he is holding ? Not much he can do, isn't
>it(although nothing is impossible, making a new system call makes much
>more sense).

I do not see what a syscall will buy over a "switch file" in procfs or
sysfs.

>So The LKM should be able to define its own user interface
>by adding new system call for itself.

And the point is? Why cannot it use, say, a character device?

>And actually, it is not hard to
>implement such kind of dynamic system call table as I thought it
>through.

It is. You do not know what number your syscall will get. And if
you knew, it might just happen that this specific number is taken
in the next iteration in the Linux kernel.

>There was time when people can modify the sys_call_table[],which has
>been forbidden since it was realized as extremely dangerous operation.
>But thing can be implemented in a safe way. Kernel may provide
>registration function like this:
>
>typedef int (*syscall_func_t)(struct pt_regs regs);
>int syscall_register(char* name, syscall_func_t sys_call);
>
>So that modules can add their own system call without affect the
>original sys_call_table[]. And since the system call number will be
>unpredictable, either we let users know the number,

Letting the user know does not help you. Binaries are already compiled
with the syscall numbers in, and recompiling is not feasible even
if you could.

It is pointless.

2008-07-07 12:12:19

by Jinkai Gao

Subject: Re: Suggestion: LKM should be able to add system call for itself

On Mon, Jul 7, 2008 at 3:01 AM, Arjan van de Ven <[email protected]> wrote:
> On Mon, 7 Jul 2008 01:09:30 -0400
> "Jinkai Gao" <[email protected]> wrote:
>
>> Above is to demonstrate that LKM is extension to kernel, and the
>> system calls should be able to extend as long as the kernel is
>> extending. So The LKM should be able to define its own user interface
>> by adding new system call for itself.
>
> Since we promise a stable ABI to userspace, this is a bit of a problem.
>
> But... look today, we already have various system calls implemented by
> modules. (example: sys_nfsservctl)
> but to make it fully dynamic? Not a good idea... nobody would be able
> to program to it.

Why? With the interface we provide to add and delete system calls (a
module can only unregister the system calls it registered itself), all
the existing system calls stay the same. You can simply have more
system calls than you need. That shouldn't be a problem.

2008-07-07 12:37:18

by Jinkai Gao

Subject: Re: Suggestion: LKM should be able to add system call for itself

On Mon, Jul 7, 2008 at 4:40 AM, Bart Van Assche
<[email protected]> wrote:
> On Mon, Jul 7, 2008 at 7:09 AM, Jinkai Gao <[email protected]> wrote:
>> LKM(loadable kernel module) was first introduced for drivers. Users
>> rarely need to talk to the modules directly. If does, several methods
>> are available now, such as /proc file, interruption, etc. However,
>> these interfaces are predefined, which makes the communication between
>> user space and kernel space quite restricted.
>
> Did you already have a look at e.g. http://lwn.net/Kernel/LDD3/ for
> suggestions of alternatives for communication between userspace and
> kernel modules ? Alternatives to system calls are e.g. ioctl's, memory
> mapped I/O and sockets.

Yes, all kinds of alternatives exist. But they are alternatives after
all: tricky ways to do things when you can't find a reasonable way.
Actually, to communicate between userspace and kernel modules, all I
need is an interface with two parameters; all the system calls can be
implemented on top of that. So basically you can write every system
call using something like ioctl. But ioctl is not designed for generic
purposes after all.

Why is the number of system calls growing? Because the kernel is
growing. Why don't we use the alternatives to meet the new need for
system calls? Because it doesn't make any sense. We can't ignore
kernel modules' need for system calls just because they are
loadable.

2008-07-07 14:01:06

by Jinkai Gao

Subject: Re: Suggestion: LKM should be able to add system call for itself

On Mon, Jul 7, 2008 at 5:35 AM, Jan Engelhardt <[email protected]> wrote:
>
> On Monday 2008-07-07 07:09, Jinkai Gao wrote:
>
>>LKM(loadable kernel module) was first introduced for drivers. Users
>>rarely need to talk to the modules directly. If does, several methods
>>are available now, such as /proc file, interruption, etc. However,
>>these interfaces are predefined, which makes the communication between
>>user space and kernel space quite restricted.
>
> And that is good -- I certainly do not want something to step out of
> bounds by accident or intention.
>
>>Of course, for driver modules, these mechanisms are enough. But as
>>long as it is called Loadable Kernel Module instead of Loadable Kernel
>>Driver, I think it should be able to do more than that. For example,
>>LSM(linux security module),most of which(selinux, apparmor, etc.) use
>>policy files as their core. Users write policy files, LSM make access
>>control decision based on the files. Seems like users don't need to
>>talk to LSM directly. But what if user want to temporarily disable a
>>role or capability he is holding ? Not much he can do, isn't
>>it(although nothing is impossible, making a new system call makes much
>>more sense).
>
> I do not see what a syscall will buy over a "switch file" in procfs or
> sysfs.
>
>>So The LKM should be able to define its own user interface
>>by adding new system call for itself.
>
> And the point is? Why cannot it use, say, a character device?

Please refer to my reply to Bart.

>>And actually, it is not hard to
>>implement such kind of dynamic system call table as I thought it
>>through.
>
> It is. You do not know what number your syscall will get. And if
> you knew, it might just happen that this specific number is taken
> in the next iteration in the Linux kernel.

You are right. So we can use an ASCII name instead of a number to
identify the system call. The kernel will match the function with the
name. For backward compatibility, numbers should still be supported.
Yes, it is not as easy as I thought, but as long as it is valuable and
doable, we should give it a try, right?

>>There was time when people can modify the sys_call_table[],which has
>>been forbidden since it was realized as extremely dangerous operation.
>>But thing can be implemented in a safe way. Kernel may provide
>>registration function like this:
>>
>>typedef int (*syscall_func_t)(struct pt_regs regs);
>>int syscall_register(char* name, syscall_func_t sys_call);
>>
>>So that modules can add their own system call without affect the
>>original sys_call_table[]. And since the system call number will be
>>unpredictable, either we let users know the number,
>
> Letting the user know does not help you. Binaries are already compiled
> with the syscall numbers in, and recompiling is not feasible even
> if you could.

I was wrong; the number is useless here.

> It is pointless.
>



--
Syracuse University
Jinkai Gao

2008-07-07 14:16:32

by Bart Van Assche

Subject: Re: Suggestion: LKM should be able to add system call for itself

On Mon, Jul 7, 2008 at 2:36 PM, Jinkai Gao <[email protected]> wrote:
> Why the number of system calls is growing? because the kernel is
> growing. why we don't use the alternatives to implement the new need
> for system calls? Because it doesn't make any sense. We can't ignore
> the kernel modules' need for system calls just because they are
> loadable.

Arjan and Jan have already explained in detail why adding system calls
dynamically is troublesome.

Bart.

2008-07-07 14:17:20

by Josh Boyer

Subject: Re: Suggestion: LKM should be able to add system call for itself

On Mon, 2008-07-07 at 10:00 -0400, Jinkai Gao wrote:
> On Mon, Jul 7, 2008 at 5:35 AM, Jan Engelhardt <[email protected]> wrote:
> >
> > On Monday 2008-07-07 07:09, Jinkai Gao wrote:
> >
> >>LKM(loadable kernel module) was first introduced for drivers. Users
> >>rarely need to talk to the modules directly. If does, several methods
> >>are available now, such as /proc file, interruption, etc. However,
> >>these interfaces are predefined, which makes the communication between
> >>user space and kernel space quite restricted.
> >
> > And that is good -- I certainly do not want something to step out of
> > bounds by accident or intention.
> >
> >>Of course, for driver modules, these mechanisms are enough. But as
> >>long as it is called Loadable Kernel Module instead of Loadable Kernel
> >>Driver, I think it should be able to do more than that. For example,
> >>LSM(linux security module),most of which(selinux, apparmor, etc.) use
> >>policy files as their core. Users write policy files, LSM make access
> >>control decision based on the files. Seems like users don't need to
> >>talk to LSM directly. But what if user want to temporarily disable a
> >>role or capability he is holding ? Not much he can do, isn't
> >>it(although nothing is impossible, making a new system call makes much
> >>more sense).
> >
> > I do not see what a syscall will buy over a "switch file" in procfs or
> > sysfs.
> >
> >>So The LKM should be able to define its own user interface
> >>by adding new system call for itself.
> >
> > And the point is? Why cannot it use, say, a character device?
>
> Please refer to my reply to Bart.
>
> >>And actually, it is not hard to
> >>implement such kind of dynamic system call table as I thought it
> >>through.
> >
> > It is. You do not know what number your syscall will get. And if
> > you knew, it might just happen that this specific number is taken
> > in the next iteration in the Linux kernel.
>
> You are right. So we can use ascii name instead of number to identify
> the system call. Kernel will match the function with the name.To have
> backward compatibility, number should still be supported. Yes, it is
> not as easy as I thought, but as long as it is valuable and doable, we
> should have a try, right?

So you have to search a list of strings using strcmp to determine what
syscall is being called? That would be horrible for performance.

josh

2008-07-07 14:29:52

by Arjan van de Ven

Subject: Re: Suggestion: LKM should be able to add system call for itself

On Mon, 7 Jul 2008 08:12:00 -0400
"Jinkai Gao" <[email protected]> wrote:

> On Mon, Jul 7, 2008 at 3:01 AM, Arjan van de Ven
> <[email protected]> wrote:
> > On Mon, 7 Jul 2008 01:09:30 -0400
> > "Jinkai Gao" <[email protected]> wrote:
> >
> >> Above is to demonstrate that LKM is extension to kernel, and the
> >> system calls should be able to extend as long as the kernel is
> >> extending. So The LKM should be able to define its own user
> >> interface by adding new system call for itself.
> >
> > Since we promise a stable ABI to userspace, this is a bit of a
> > problem.
> >
> > But... look today, we already have various system calls implemented
> > by modules. (example: sys_nfsservctl)
> > but to make it fully dynamic? Not a good idea... nobody would be
> > able to program to it.
>
> Why? Using the interface we provide to add and delete system call (the
> module can only unregister the system calls registered by itself), all
> the existing system calls will be the same. It is just you can have
> more system calls then you need, That shouldn't be a problem.

but when the kernel later adds new ones.. overlap.

Really.. it's not hard to do this. Look at nfs etc. You CAN do this,
just you need to reserve your system call number officially (and create
a manpage for it describing what it does)... and then it doesn't really
matter if it's module or vmlinux who provides it. Again.. nfs has
solved this.
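
A rough user-space sketch of that arrangement, not the actual nfs
code: the entry point sits at a fixed, reserved slot and only forwards
to whatever implementation a module has attached. The names
reserved_syscall(), provider_attach(), provider_detach() and
demo_impl() are hypothetical, and a real kernel version would also
need locking and module reference counting:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified signature of the module-provided body. */
typedef long (*provider_fn_t)(long arg);

/* NULL while no module provides the call. */
static provider_fn_t provider;

/* Entry point at the fixed, officially reserved number.
 * User space can rely on the number whether or not the module is loaded. */
long reserved_syscall(long arg)
{
    if (provider == NULL)
        return -ENOSYS;         /* module not loaded */
    return provider(arg);
}

/* Called at module load time; refuses to replace an existing provider. */
int provider_attach(provider_fn_t fn)
{
    if (fn == NULL)
        return -EINVAL;
    if (provider != NULL)
        return -EBUSY;
    provider = fn;
    return 0;
}

/* Called at module unload time. */
void provider_detach(void)
{
    provider = NULL;
}

/* Example body standing in for the module's implementation. */
long demo_impl(long arg)
{
    return arg + 1;
}
```

The number never moves here; only the implementation behind it comes
and goes with the module.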




2008-07-07 14:42:26

by Jan Engelhardt

Subject: Re: Suggestion: LKM should be able to add system call for itself


On Monday 2008-07-07 14:36, Jinkai Gao wrote:
>
>Yes, all kinds of alternatives exist. But they are alternatives
>anyway, which are tricky ways to do things when you can't find a
>reasonable ways. Actually,to communication between userspace and
>kernel modules, all I need is a interface with two parameters, all the
>system calls can be implemented out of that. So basically you can
>write every system call using something like ioctl. But ioctl is not
>designed for generic purpose after all.

Two parameters? (I take it, syscall number and a pointer to data.)
ioctl can do the same. It takes a number and an unsigned long
(sufficient for a pointer) to data. I could do the same over a cdev, a
32-bit quantity serving as a number and a 64-bit quantity as a
pointer. The same holds for netlink. There are endless ways to pass
bits into the kernel, even an ICMP packet. Once you have the
addresses, you can use copy_from_user(), and be done. That still
does not say why syscalls are better than ioctl or netlink.
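
For a concrete instance of the "number plus pointer" shape, here is a
small sketch using an ioctl that already exists on Linux: FIONREAD on
a pipe, which stores the count of unread bytes through the pointer.
The helper name pipe_pending_bytes() is made up for the example:

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes into a fresh pipe, then ask the kernel via ioctl()
 * how many bytes are waiting on the read end.
 * Returns that count, or -1 on any failure. */
int pipe_pending_bytes(const char *msg, size_t len)
{
    int fds[2];
    int pending = -1;

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], msg, len) == (ssize_t)len) {
        /* Request number FIONREAD, plus a pointer the kernel fills in. */
        if (ioctl(fds[0], FIONREAD, &pending) != 0)
            pending = -1;
    }

    close(fds[0]);
    close(fds[1]);
    return pending;
}
```

Calling pipe_pending_bytes("hello", 5) on Linux returns 5: the request
number selects the operation and the pointer carries the result, with
no new system call involved.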

Well, nl and cdevs become especially handy when passing on more data
than just a pointer... usually you can do away with the pointer
indirection, e.g.

struct timespec n;
syscall(123, &n);
vs ioctl(somefd, 123, &n);
vs write(cdevfd, &n, sizeof(n));
vs nlmsg_write(it's not so easy in NL after all);

hooray. Win = 0.

>Why the number of system calls is growing?

It is not. The reiser4() system call was just as heavily debated
because it just did not seem to fit.

>because the kernel is
>growing. why we don't use the alternatives to implement the new need
>for system calls? Because it doesn't make any sense.

It does not make any sense to discuss here. Each task to achieve has
a specific preferred method (syscalls, cdev, libnl, ioctl) to do it
over.


Maybe syscalls were OK 20+ years ago. Maybe people still don't
know cdevs or netlink because they are submerged in teaching DOS
semantics only.
If syscalls were so übergreat, then /dev would be a lot less populated:
Where's my nvidiactl() syscall?

Subject: Re: Suggestion: LKM should be able to add system call for itself

On Mon, 07 Jul 2008 10:16:51 -0400
Josh Boyer <[email protected]> wrote:

> > You are right. So we can use ascii name instead of number to
> > identify the system call. Kernel will match the function with the
> > name.To have backward compatibility, number should still be
> > supported. Yes, it is not as easy as I thought, but as long as it
> > is valuable and doable, we should have a try, right?
>
> So you have to search a list of strings using strcmp to determine what
> syscall is being called? That would be horrible for performance.
>
> josh
>

Actually it isn't that bad if you do it like dlsym()/dlopen() do it in
userspace. That is, have the system linker fill in dynamic syscalls,
possibly in a separate ELF section. This way you could version syscalls.
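
A toy user-space sketch of that "resolve once, then call by number"
idea, assuming a fixed table; it is not how any real linker or kernel
does it, and dyn_resolve(), dyn_call() and the demo handlers are
hypothetical names:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef long (*handler_t)(long arg);

struct named_call {
    const char *name;
    handler_t fn;
};

/* Two stand-in handlers for dynamically provided calls. */
static long demo_inc(long arg) { return arg + 1; }
static long demo_neg(long arg) { return -arg; }

static const struct named_call call_table[] = {
    { "demo_inc", demo_inc },
    { "demo_neg", demo_neg },
};

#define CALL_TABLE_LEN ((int)(sizeof(call_table) / sizeof(call_table[0])))

/* Slow path, done once (at link or load time): name -> number.
 * This is the only place strcmp() is used. */
int dyn_resolve(const char *name)
{
    for (int i = 0; i < CALL_TABLE_LEN; i++) {
        if (strcmp(call_table[i].name, name) == 0)
            return i;
    }
    return -1;
}

/* Fast path, done on every call: plain index into the table. */
long dyn_call(int nr, long arg)
{
    if (nr < 0 || nr >= CALL_TABLE_LEN)
        return -ENOSYS;
    return call_table[nr].fn(arg);
}
```

A caller would do something like nr = dyn_resolve("demo_neg") once and
then use dyn_call(nr, 7) afterwards, so the string comparison cost is
paid at resolution time rather than on each call.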

Furthermore, it may make sense to implement all syscalls through glibc,
so that the burden of maintaining obsolete/modified syscalls does not
fall onto the kernel. This already happens for most syscalls, but the
rest (mostly those Linux-specific) still rely on syscall numbers
defined as macros.

But that still will _not_ solve the problem, because:
- there are users which will only use older libc versions
- there are statically linked executables
- the modified/new syscall might not provide the same behavior, even
when used through a compatibility (glibc) wrapper

IOW, this problem can be reduced to any other instance where protocols
or APIs get changed. This usually isn't a problem, but the kernel can't
afford bloat to maintain compatibility.

I hope this makes the issue more clear.


Cheers,
Eduard