2007-01-22 08:19:25

by Wink Saville

Subject: [RFC] Asynchronous Messaging

I have implemented a technique which allows a kernel-space thread
or ISR to communicate with user-space or kernel-space threads
asynchronously and without having to copy data (zero copy).

I call the solution ACE, Atomic Code Execution. As the name
implies, once code starts executing within the ACE environment,
that code is guaranteed to complete before any other code will run.

This is accomplished by allocating a page (or more) of memory which
is executable and mapped into every thread's address space. Also, all
ISR entry points are modified to detect if the code that was interrupted
was executing within the ACE page. If it was then the ACE code is
allowed to complete before the ISR continues. This then provides
the guarantee of atomic execution.

Another way to look at it is that it gives user space programs the
capability to disable/enable interrupts thus allowing user space code
to execute the equivalent of spin_lock_irqsave() and
spin_unlock_irqrestore().

I then implemented zero-copy asynchronous messaging by placing
linked-list operations within the ACE page, allocating the messages
and auxiliary memory globally using vmalloc, and adding the notion of an
mproc (message processor), which encapsulates a thread
and a queue.

I believe the ACE technique and the mproc idea could be used for several
purposes beyond my desire to write event driven applications. In particular
I could see it as a means of implementing device drivers written in user space
as well as a possible technique for communicating with virtual machines such
as Xen or KVM.

Currently, the proof-of-concept code runs on a Core 2 Duo. For those who
are interested, the code is available as a patch against 2.6.19
at http://www.saville.com/linux/async.

I have been using asynchronous messaging for 4+ years and have found that
it provides very interesting properties, but is hindered because it is not
directly supported by operating systems. I am very interested in getting
feedback on the idea of including asynchronous messaging within the kernel.

Thank you,

Wink Saville


2007-01-22 14:33:27

by Alan

Subject: Re: [RFC] Asynchronous Messaging

> This is accomplished by allocating a page (or more) of memory which
> is executable and mapped into every threads address space. Also, all
> ISR entry points are modified to detect if the code that was interrupted
> was executing within the ACE page. If it was then the ACE code is
> allowed to complete before the ISR continues. This then provides
> the guarantee of atomic execution.

What if you enter the ISR, pass the point of the check and then another
CPU core hits the ACE space?

Also how do you handle the case where the code gets stuck in your atomic
pages?

Alan

2007-01-22 15:57:42

by Wink Saville

Subject: Re: [RFC] Asynchronous Messaging

On 1/22/07, Alan wrote:
> > This is accomplished by allocating a page (or more) of memory which
> > is executable and mapped into every threads address space. Also, all
> > ISR entry points are modified to detect if the code that was interrupted
> > was executing within the ACE page. If it was then the ACE code is
> > allowed to complete before the ISR continues. This then provides
> > the guarantee of atomic execution.
>
> What if you enter the ISR, pass the point of the check and then another
> CPU core hits the ACE space ?

If CPU A has passed the point of the check, then by definition the lock in
the ACE space that it was holding will have been released and be available
to CPU B; thus there will be no contention and CPU B will proceed to
execute the code within the ACE space.

> Also how do you handle the case where the code gets stuck in your atomic
> pages ?

The code in the ACE space must execute quickly and must never get stuck;
the same rules apply as for any code that holds a spin lock. As I envision
it, the ACE space is "micro-code" provided only by the kernel and thus is
bug-free.

Of course, shit happens. For example, I use ACE to manipulate shared linked
lists: what happens if a pointer passed to the ACE code causes a page fault?
This will cause the ISR to be reentered, which is definitely a problem. But
this can be detected and "fixed up", i.e. the spin lock is released and the
faulting thread is marked to be killed and not rescheduled.

My proof-of-concept code does not handle this situation but I believe it
can be handled.

A similar problem might occur if buggy or malicious code were to begin
executing in the "middle" of the ACE space rather than at one of its entry
points. Protection will need to be put in place to handle this as well. For
instance, if N ISRs in a row find that the ACE space code is still
executing, then kill the offending thread. Another idea would be to only
allow "approved" code to use ACE.

Wink