by Rusty Russell

[permalink] [raw]

Subject: Re: [PATCH 5/6] syslets: add generic syslets infrastructure

On Thursday 10 January 2008 09:58:10 Linus Torvalds wrote:
> On Thu, 10 Jan 2008, Rusty Russell wrote:
> > I'd have to read his original statement, but eventfd doesn't build up
> > state, so I think it qualifies.
>
> How about you guys battle it out by giving an example program usign the
> interface?

Nice idea.

> And *simplicity* of the end result really does matter. If it's not simple
> to use, people won't use it.

Completely agreed, but async is always more complex than sync.

eg. your malloc()-and-overwrite trick here assumes it's serial, similarly
stack vars. Even before anything's happened we've massively increased the
number of mallocs :( Maybe someone else can see a neater way?

Below is an async-ready version (I've assumed you still want each dir grouped
together). It's already slower (best 0m3.842s vs best 0m3.659s):

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <dirent.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

static void find(const char *base, int baselen);

struct result
{
struct result *next;
struct stat st;
unsigned int namelen;
char name[];
};

static struct result *new_result(const char *base, int baselen,
const char *sub, int sublen)
{
struct result *r;

r = malloc(sizeof(*r) + baselen + sublen + 2);
memcpy(r->name, base, baselen);
r->name[baselen] = '/';
memcpy(r->name + baselen + 1, sub, sublen+1);
r->namelen = baselen + sublen + 1;

return r;
}

static void process(struct result *r, struct result **dirs)
{
printf("%8lu %s\n", r->st.st_size, r->name);
if (S_ISDIR(r->st.st_mode)) {
r->next = *dirs;
*dirs = r;
} else
free(r);
}

/* -1 = fail, 0 = success, 1 = in progress. */
static int handle_async(struct result *r)
{
/* Ours is sync. */
return lstat(r->name, &r->st);
}

static void process_pending(struct result *pending, struct result **dirs)
{
/* Since we're sync, pending will be NULL. Otherwise call
pending as they complete. */
}

static void find(const char *base, int baselen)
{
DIR *dir;
struct result *r, *pending = NULL, *dirs = NULL;

dir = opendir(base);
if (dir) {
struct dirent *de;
while ((de = readdir(dir)) != NULL) {
const char *p = de->d_name;
int len = strlen(p);
if (p[0] == '.') {
if (len == 1)
continue;
if (len == 2 && p[1] == '.')
continue;
}
r = new_result(base, baselen, p, len);
switch (handle_async(r)) {
case 0:
process(r, &dirs);
break;
case -1:
free(r);
break;
case 1:
r->next = pending;
pending = r;
}
}
closedir(dir);
process_pending(pending, &dirs);
while (dirs) {
find(dirs->name, dirs->namelen);
r = dirs;
dirs = dirs->next;
free(r);
}
}
}

int main(int argc, char **argv)
{
find(".",1);
return 0;
}

2008-01-10 05:41:43

by Jeff Garzik

[permalink] [raw]

Subject: Re: [PATCH 5/6] syslets: add generic syslets infrastructure

So my radical ultra tired rant o the week...

Rather than adding sys_indirect and syslets as is,

* admit that this is beginning to look like a new ABI. explore the
freedoms that that avenue opens...

* (even more radical) I wonder what a tiny, SANE register-based
bytecode interface might look like. Have a single page shared between
kernel and userland, for each thread. Userland fills that page with
bytecode, for a virtual machines with 256 registers -- where
instructions roughly equate to syscalls.

The common case -- a single syscall like open(2) -- would be a single
byte bytecode, plus a couple VM register stores. The result is stored
in another VM register.

But this format enables more complex cases, where userland programs can
pass strings of syscalls into the kernel, and let them execute until
some exceptional condition occurs. Results would be stored in VM
registers (or userland addresses stored in VM registers...).

This sort of interface would be
* fast

* equate to the current syscall regime (easy to get existing APIs
going... hopefully equivalent to glibc switching to a strange new
SYSENTER variant)

* be flexible enough to support a simple implementation today

* be flexible enough to enable experiments into syscall parallelism (aka
VM instruction parallelism <grin>)

* be flexible enough to enable experiments into syscall batching

One would probably want to add some simple logic opcodes in addition to
opcodes for syscalls and such -- but this should not turn into Forth or
Parrot or Java :)

Thus, this new ABI can easily and immediately support all existing
syscalls, while enabling

Now to come up with a good programming API and model(s) to match this
parallel, batched kernel<->userland interface...

Jeff, very tired and delirious, so feel free to laugh at this,
but I've been pondering this for a while