2007-11-15 16:42:35

by Ulrich Drepper

[permalink] [raw]
Subject: [PATCH 0/4] sys_indirect system call

The following patches provide an alternative implementation of the
sys_indirect system call which has been discussed a few times.
This no system call allows us to extend existing system call
interfaces with adding more system calls.

Davide's previous implementation is IMO far more complex than
warranted. This code here is trivial, as you can see. I've
discussed this approach with Linus last week and for a brief moment
we actually agreed on something.

We pass an additional block of data to the kernel, it is copied into
the task_struct, and then it is up to the function implementing the system
call to interpret the data. No attempt is made to catch invalid calls.
This is really the same issue as with invalid system call parameters.
The proposed code only rejects:

- recursive calls to sys_indirect
- too large additional blocks

It is debatable whether we should check whether the indirectly called
system call actually needs or supports additional parameters. This would
require a new table for each architecture or a switch statement in
sys_indirect. Linus expressed concerns about cache pollution and I don't
see how making the indirect call without a test can cause problems so
I decided against such checks.

The code for x86 and x86-64 gets by without a single line of assembly
code. This is likely to be true for most/all the other archs as well.
There is architecture-dependent code, though. For x86 and x86-64 I've
also fixed up UML (although only x86-64 is tested, that's my setup).

The last patch shows the first application of the functionality. It is by
far not complete, more will follow, but I want to see how these patches are
received before I spend more time on it. This code is enough to test the
implementation with the following test program. Adjust it for architectures
other than x86 and x86-64.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/syscall.h>

typedef uint32_t __u32;
typedef uint64_t __u64;

union indirect_params {
struct {
int flags;
} file_flags;
};

#ifdef __x86_64__
# define __NR_indirect 286
struct indirect_registers {
__u64 rax;
__u64 rdi;
__u64 rsi;
__u64 rdx;
__u64 r10;
__u64 r8;
__u64 r9;
};
#elif defined __i386__
# define __NR_indirect 325
struct indirect_registers {
__u32 eax;
__u32 ebx;
__u32 ecx;
__u32 edx;
__u32 esi;
__u32 edi;
__u32 ebp;
};
#else
# error "need to define __NR_indirect and struct indirect_params"
#endif

#define FILL_IN(var, values...) \
var = (struct indirect_registers) { values }

int
main (void)
{
int fd = socket (AF_INET, SOCK_DGRAM, IPPROTO_IP);
int s1 = fcntl (fd, F_GETFD);
printf ("old: FD_CLOEXEC %s set\n", s1 == 0 ? "not" : "is");
close (fd);

union indirect_params i;
i.file_flags.flags = O_CLOEXEC;

struct indirect_registers r;
FILL_IN (r, __NR_socket, AF_INET, SOCK_DGRAM, IPPROTO_IP);

fd = syscall (__NR_indirect, &r, &i, sizeof (i));
int s2 = fcntl (fd, F_GETFD);
printf ("new: FD_CLOEXEC %s set\n", s2 == 0 ? "not" : "is");
close (fd);

return s1 != 0 || s2 == 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Signed-off-by: Ulrich Drepper <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

arch/um/Makefile | 2 +-
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/kernel/syscall_table_32.S | 1 +
include/asm-um/indirect-i386.h | 6 ++++++
include/asm-um/indirect-x86_64.h | 10 ++++++++++
include/asm-x86/indirect.h | 5 +++++
include/asm-x86/indirect_32.h | 27 +++++++++++++++++++++++++++
include/asm-x86/indirect_64.h | 30 ++++++++++++++++++++++++++++++
include/asm-x86/unistd_32.h | 3 ++-
include/asm-x86/unistd_64.h | 2 ++
include/linux/indirect.h | 13 +++++++++++++
include/linux/sched.h | 4 ++++
include/linux/syscalls.h | 3 +++
kernel/Makefile | 2 +-
kernel/indirect.c | 25 +++++++++++++++++++++++++
net/socket.c | 21 +++++++++++++--------
16 files changed, 144 insertions(+), 11 deletions(-)


2007-11-15 22:20:16

by Zach Brown

[permalink] [raw]
Subject: Re: [PATCH 0/4] sys_indirect system call

Ulrich Drepper wrote:
> The following patches provide an alternative implementation of the
> sys_indirect system call which has been discussed a few times.
> This no system call allows us to extend existing system call
> interfaces with adding more system calls.

I might quarrel with some details, but I really like the general notion.

I think we can use this to pass per-syscall syslet data to the
scheduler. Today the patches have submission syscalls which basically
wrap some data for the scheduler around the syscall ABI. This might
save the syslet patches from needing to know about each arch's ABI.

We'll see. I'll give it a try.

- z

2007-11-15 22:34:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/4] sys_indirect system call



On Thu, 15 Nov 2007, Zach Brown wrote:
>
> I think we can use this to pass per-syscall syslet data to the
> scheduler.

Yes, I mentioned this to Ulrich as one of the things that would make
sense.

Uli doesn't care that much about async syscalls, but I think that from a
kernel standpoint, we'd want to use this same indirect call for async
scheduling, rather than have two separate interfaces (because async
scheduling will want to have all the same flags intefaces for
open/socket/etc *too* and a doubly-indirect setup would be insanity!)

And it definitely fits the bill as a really simple syslet model for the
trivial case of doing just single system calls asynchronously.

Linus

2007-11-15 23:31:33

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH 0/4] sys_indirect system call

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Linus Torvalds wrote:
> Uli doesn't care that much about async syscalls,

At the moment I just care a lot more about getting the API straightened
out. An asynchronous incomplete/unsafe API is still an
incomplete/unsafe API.

BTW, I've botched the x86-on-x86_64 support. I have a patch but need to
patch it before I'll submit v3 of the patch set. If you want to work on
the patch and get syslet support going, let me know, I'll send the
latest version.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFHPNaj2ijCOnn/RHQRAqNCAJ9eO8px9NTgXBuAR1ueVpQaY3gdnwCgke3g
hqtO6djA2fNZTQS2qDq8w2A=
=AInz
-----END PGP SIGNATURE-----

2007-11-15 23:41:18

by Zach Brown

[permalink] [raw]
Subject: Re: [PATCH 0/4] sys_indirect system call


> BTW, I've botched the x86-on-x86_64 support. I have a patch but need to
> patch it before I'll submit v3 of the patch set. If you want to work on
> the patch and get syslet support going, let me know, I'll send the
> latest version.

I probably won't come around to trying sys_indirect with syslets until
next week, so it's probably fine for me to just wait for you to get v3
out. (and I'm more likely to test native i386 or x86_64 anyway, so.. :))

- z

2007-11-17 09:18:01

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH 0/4] sys_indirect system call

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Linus Torvalds wrote:
> Uli doesn't care that much about async syscalls, but I think that from a
> kernel standpoint, we'd want to use this same indirect call for async
> scheduling,

Note that I added a flags parameter to sys_indirect in the v3 patch.
This should allow you to add additional functionality like syslets
later. Currently a zero value is enforced. In future nonzero values
could also imply that the function takes additional parameters.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHPrDk2ijCOnn/RHQRAks1AJ43zF42Vy2ru2D8X3W13YlzYpazUQCfci37
wTKr35RIViiwkQWNMMCeMdk=
=Gmld
-----END PGP SIGNATURE-----