I have implemented SSI (single system image) clustering extensions to
the Linux kernel in the form of a loadable module.
It roughly mimics the OpenMosix deputy/remote split model (migrated
processes leave a stub on the node where they were born and depend on
that "home" node for I/O).
The implementation shares no code with MOSIX/OpenMosix (it was written
from scratch), is much smaller, and is easily portable to multiple
architectures.
We are considering publication of this code and forming an open source
project around it.
I have two questions for the community:
1) Is the community interested in using this code? Do users need an SSI
product in an era when everybody is talking about partitioning machines
rather than clustering them?
2) Are the kernel maintainers interested in clustering extensions to the
Linux kernel? Do they see any value in them? (Our code does not require
kernel changes, but we are willing to submit it for inclusion if there is
interest.)
Please CC me and the list when replying.
--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Qlusters Software Ltd
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6081977
Fax: +972-3-6081841
----------------------------------------
On Mon, 2006-10-16 at 14:49 +0200, Constantine Gavrilov wrote:
> 2) Are the kernel maintainers interested in clustering extensions to the
> Linux kernel? Do they see any value in them? (Our code does not require
> kernel changes, but we are willing to submit it for inclusion if there
> is interest.)
If they are doing SSI well and do not need core kernel changes then yes
they sound very interesting to me. Historically the big concern has
always been that things like this muck up the kernel core which affects
the other 99.99999% of users who don't want SSI clustering.
Alan
On Mon, 16 Oct 2006 14:49:39 +0200,
Constantine Gavrilov <[email protected]> wrote:
> 1) Is the community interested in using this code? Do users need an SSI
> product in an era when everybody is talking about partitioning machines
> rather than clustering them?
Why not? I certainly like the idea of partitioning my machine into
lots of Xen VMs and then joining all their power back together with SSI!
(Couldn't resist it, sorry.)
Alan Cox wrote:
>On Mon, 2006-10-16 at 14:49 +0200, Constantine Gavrilov wrote:
>
>
>>2) Are the kernel maintainers interested in clustering extensions to the
>>Linux kernel? Do they see any value in them? (Our code does not require
>>kernel changes, but we are willing to submit it for inclusion if there
>>is interest.)
>>
>>
>
>If they are doing SSI well and do not need core kernel changes then yes
>they sound very interesting to me. Historically the big concern has
>always been that things like this muck up the kernel core which affects
>the other 99.99999% of users who don't want SSI clustering.
>
>Alan
SSI intrudes into the kernel in two places: a) the I/O system calls, and
b) the page fault code for shared memory pages.
a) I/O system calls are "packed" and forwarded to the "home" node, where
the original syscall code is executed.
b) A hook is inserted into the page fault code that brings shared memory
pages over from other nodes when necessary.
Apart from these two hooks, the SSI code is a "standalone" kernel API
add-on ("add", not "change").
Currently, we can install both "intrusions" from the kernel module. I
assume that if we submit the code, you will require a kernel patch that
explicitly calls our hooks.
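To make the forwarding idea concrete, here is a minimal user-space sketch
of what a "packed" I/O syscall could look like on the wire. The
ssi_syscall_msg layout, the field names, and the ssi_pack_write() helper
are hypothetical illustrations for this mail, not our actual module code:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical wire format for a forwarded I/O syscall.  The remote
 * (migrated) side of the process fills this in and ships it to the
 * "home" node, where the stub replays the call against the home node's
 * own file table and credentials.
 */
struct ssi_syscall_msg {
    uint32_t syscall_nr;     /* syscall number in the home node's ABI  */
    uint32_t home_pid;       /* home-node pid of the migrated process  */
    uint64_t args[6];        /* raw syscall arguments                  */
    uint32_t payload_len;    /* length of inline data that follows     */
    unsigned char payload[]; /* e.g. the write() buffer contents       */
};

/*
 * Pack a write(fd, buf, count) call for forwarding.  Returns a
 * heap-allocated message that the transport layer would send.
 */
static struct ssi_syscall_msg *ssi_pack_write(int fd, const void *buf,
                                              size_t count, uint32_t home_pid)
{
    struct ssi_syscall_msg *msg = malloc(sizeof(*msg) + count);

    if (!msg)
        return NULL;
    memset(msg, 0, sizeof(*msg));
    msg->syscall_nr = 0;             /* placeholder; would be __NR_write */
    msg->home_pid = home_pid;
    msg->args[0] = (uint64_t)fd;     /* fd is only valid on the home node */
    msg->args[2] = (uint64_t)count;
    msg->payload_len = (uint32_t)count;
    memcpy(msg->payload, buf, count);
    return msg;
}

On the home node, the stub would unpack such a message and run the
ordinary sys_write() path on behalf of the migrated process, which is why
no core kernel change is needed for the I/O side.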
Also, continued in-kernel SSI support may require SSI changes in the
following cases: a) new fields in the task struct that reflect process
state (may affect task migration), b) changes in the page fault mechanism
(may affect the SSI shared memory code that brings in and invalidates
pages), and c) addition of new system calls (may require implementation of
SSI support for them).
--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Qlusters Software Ltd
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6081977
Fax: +972-3-6081841
----------------------------------------
On 16/10/06, Constantine Gavrilov <[email protected]> wrote:
> I have implemented SSI (single system image) clustering extensions to
> the Linux kernel in the form of a loadable module.
>
> It roughly mimics the OpenMosix deputy/remote split model (migrated
> processes leave a stub on the node where they were born and depend on
> that "home" node for I/O).
>
> The implementation shares no code with MOSIX/OpenMosix (it was written
> from scratch), is much smaller, and is easily portable to multiple
> architectures.
>
> We are considering publication of this code and forming an open source
> project around it.
>
> I have two questions for the community:
>
> 1) Is the community interested in using this code? Do users need an SSI
> product in an era when everybody is talking about partitioning machines
> rather than clustering them?
Some users require SSI clustering and some just like playing with it.
In any case, more options than are currently available can only be a
good thing :)
> 2) Are the kernel maintainers interested in clustering extensions to the
> Linux kernel? Do they see any value in them? (Our code does not require
> kernel changes, but we are willing to submit it for inclusion if there
> is interest.)
>
I'm sure there's interest in at least seeing it.
You should consider cleaning up your code according to
Documentation/CodingStyle first, though (if it doesn't already follow
it), or your first batch of feedback is probably just going to be a
bunch of style cleanup requests ;)
--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
Constantine Gavrilov wrote:
> I have implemented SSI (single system image) clustering extensions to
> the Linux kernel in the form of a loadable module.
>
> It roughly mimics the OpenMosix deputy/remote split model (migrated
> processes leave a stub on the node where they were born and depend on
> that "home" node for I/O).
>
> The implementation shares no code with MOSIX/OpenMosix (it was written
> from scratch), is much smaller, and is easily portable to multiple
> architectures.
>
> We are considering publication of this code and forming an open source
> project around it.
>
> I have two questions for the community:
>
> 1) Is the community interested in using this code? Do users need an SSI
> product in an era when everybody is talking about partitioning machines
> rather than clustering them?
> 2) Are the kernel maintainers interested in clustering extensions to the
> Linux kernel? Do they see any value in them? (Our code does not require
> kernel changes, but we are willing to submit it for inclusion if there
> is interest.)
>
> Please CC me and the list when replying.
>
I am interested in seeing the changes. I am currently working on getting
parts of the OpenSSI (http://www.openssi.org) changes merged upstream.
Bruce Walker of the OpenSSI project has a design for implementing
cluster-wide process management; the document can be found on the
http://www.openssi.org website. The paper talks about how to implement a
cluster-wide process model without requiring the home/deputy concept. It
does require some core kernel changes, but these should be conditionally
enabled, like SELinux, so the overhead for non-cluster users should be nil.
Regarding my work, you can see the status here:
http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=summary
It only includes the ICS changes. That means it introduces a
transport-independent kernel cluster framework. Right now it supports two
interconnects: IPv4 and InfiniBand verbs.
I am planning on taking the CFS changes next. That should bring in
cluster-wide shared memory too. The way it was done in OpenSSI was to hook
a new nopage() function for CFS, so that when we page fault we bring the
pages over from the other node. So I am not sure whether one needs a VM
hook to get cluster-wide shared memory. But without seeing the code I am
clueless.
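To give a feel for the nopage() approach, here is a toy, self-contained
model of "bring the page over on fault". The two static arrays standing in
for two nodes and every name in it are made up for illustration; this is
not the OpenSSI source:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NR_PAGES  4

/* Toy model: the "remote" node owns the file's pages; a page missing
 * from the local cache is fetched on fault.  The real CFS code works on
 * struct page and a cluster transport, not on flat arrays. */
static char remote_pages[NR_PAGES][PAGE_SIZE]; /* owner node's copy      */
static char local_pages[NR_PAGES][PAGE_SIZE];  /* this node's page cache */
static int  local_present[NR_PAGES];           /* do we have the page?   */

/* nopage()-style handler: called only when the page is not mapped here. */
static char *cfs_nopage(int pgoff)
{
    if (!local_present[pgoff]) {
        /* stand-in for a network fetch from the owning node */
        memcpy(local_pages[pgoff], remote_pages[pgoff], PAGE_SIZE);
        local_present[pgoff] = 1;
    }
    return local_pages[pgoff];
}

int main(void)
{
    strcpy(remote_pages[2], "data written on the remote node");
    printf("fault on page 2 -> \"%s\"\n", cfs_nopage(2));
    return 0;
}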
-aneesh
Please see inline...
Aneesh Kumar K.V wrote:
>
> I am interested in seeing the changes. I am currently working on getting
> parts of the OpenSSI (http://www.openssi.org) changes merged upstream.
> Bruce Walker of the OpenSSI project has a design for implementing
> cluster-wide process management; the document can be found on the
> http://www.openssi.org website. The paper talks about how to implement a
> cluster-wide process model without requiring the home/deputy concept. It
> does require some core kernel changes, but these should be conditionally
> enabled, like SELinux, so the overhead for non-cluster users should be nil.
I am personally not interested in making intrusive kernel changes even
if that yields a true "single system image". I want very small changes
(preferably none).
>
> Regarding my work, you can see the status here:
> http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=summary
>
> It only includes the ICS changes. That means it introduces a
> transport-independent kernel cluster framework. Right now it supports
> two interconnects: IPv4 and InfiniBand verbs.
We also have a transport abstraction layer and transport plugins for
TCP/IP, SDP (InfiniBand and possibly others), and SCI (Dolphin).
> I am planning on taking the CFS changes next. That should bring in
> cluster-wide shared memory too. The way it was done in OpenSSI was to
> hook a new nopage() function for CFS, so that when we page fault we
> bring the pages over from the other node. So I am not sure whether one
> needs a VM hook to get cluster-wide shared memory. But without seeing
> the code I am clueless.
>
nopage() will be called only if there is no PTE. That means that with just
nopage() you cannot implement the RO-to-RW transition. If you use nopage()
only, you cannot have multiple readers, because you cannot invalidate all
the other readers when one reader goes read-write. Thus nopage() allows a
single reader or a single writer, while the page fault hook allows multiple
readers and a single writer.
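A tiny, self-contained model of the distinction (every name and the
per-node bookkeeping below are made up for illustration; our real code
hooks the kernel page fault path rather than simulating it):

#include <stdio.h>

#define NR_NODES 3

enum access { NONE, RO, RW };            /* per-node mapping state       */
static enum access state[NR_NODES];      /* one shared page, toy model   */

/*
 * Page-fault hook for one shared page: read faults grant a read-only
 * mapping (many nodes may hold one); write faults first invalidate every
 * other node's mapping and then grant read-write to the faulting node.
 * A nopage()-only scheme never sees the write fault on a page that is
 * already mapped read-only, which is exactly the RO-to-RW upgrade case.
 */
static void ssi_page_fault(int node, int write)
{
    int i;

    if (!write) {
        if (state[node] == NONE)
            state[node] = RO;            /* fetch page, map read-only    */
        return;
    }
    for (i = 0; i < NR_NODES; i++)       /* invalidate the other readers */
        if (i != node)
            state[i] = NONE;
    state[node] = RW;                    /* single writer remains        */
}

int main(void)
{
    ssi_page_fault(0, 0);                /* node 0 reads                 */
    ssi_page_fault(1, 0);                /* node 1 reads: two readers    */
    ssi_page_fault(1, 1);                /* node 1 writes: node 0 loses  */
    printf("node0=%d node1=%d\n", state[0], state[1]);
    return 0;
}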
--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Qlusters Software Ltd
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6081977
Fax: +972-3-6081841
----------------------------------------
Constantine Gavrilov wrote:
> Please see inline...
>
> Aneesh Kumar K.V wrote:
>
>>
>> I am interested in seeing the changes. I am currently working on getting
>> parts of the OpenSSI (http://www.openssi.org) changes merged upstream.
>> Bruce Walker of the OpenSSI project has a design for implementing
>> cluster-wide process management; the document can be found on the
>> http://www.openssi.org website. The paper talks about how to implement a
>> cluster-wide process model without requiring the home/deputy concept. It
>> does require some core kernel changes, but these should be conditionally
>> enabled, like SELinux, so the overhead for non-cluster users should be nil.
>
> I am personally not interested in making intrusive kernel changes even
> if that yields a true "single system image". I want very small changes
> (preferably none).
>
>>
>> Regarding my work, you can see the status here:
>> http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=summary
>>
>> It only includes the ICS changes. That means it introduces a
>> transport-independent kernel cluster framework. Right now it supports
>> two interconnects: IPv4 and InfiniBand verbs.
>
> We also have a transport abstraction layer and transport plugins for
> TCP/IP, SDP (InfiniBand and possibly others), and SCI (Dolphin).
I would really like to see this code. Is it available on the web somewhere?
>
>> I am planning on taking the CFS changes next. That should bring in
>> cluster-wide shared memory too. The way it was done in OpenSSI was to
>> hook a new nopage() function for CFS, so that when we page fault we
>> bring the pages over from the other node. So I am not sure whether one
>> needs a VM hook to get cluster-wide shared memory. But without seeing
>> the code I am clueless.
>>
> nopage() will be called only if there is no PTE. That means that with
> just nopage() you cannot implement the RO-to-RW transition. If you use
> nopage() only, you cannot have multiple readers, because you cannot
> invalidate all the other readers when one reader goes read-write. Thus
> nopage() allows a single reader or a single writer, while the page fault
> hook allows multiple readers and a single writer.
>
Along with the nopage changes, CFS also has a token-based synchronization
mechanism. The token code currently works at the granularity of a file,
but it could very well be made to work with file data ranges (range
tokens). To explain roughly how it works with CFS: when you want to map a
page, you ask the token server, and the token server grants you a token.
If somebody else then tries to create a write mapping, the previously
mapped page is unmapped and shipped to the other node. So as long as all
nodes are readers, multiple nodes can share the same page. When one of the
nodes is in writer mode, the token server forces the page to be unmapped
from the other nodes whenever there is a page access.
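Roughly, the server side looks like this. This is a self-contained toy at
whole-file granularity; the structures, the revoke() stand-in, and all the
names are illustrative, not the OpenSSI source:

#include <stdio.h>

#define NR_NODES 3

/* Toy token state for a single file (whole-file granularity). */
struct file_token {
    int readers[NR_NODES];  /* nodes currently holding a read token */
    int writer;             /* node holding the write token, or -1  */
};

/* Stand-in for the revoke callback: the real server would tell the node
 * to unmap the file's pages and ship dirty data back. */
static void revoke(int node)
{
    printf("revoke token from node %d (unmap pages)\n", node);
}

/* Grant a read or write token, revoking conflicting holders first. */
static void token_request(struct file_token *t, int node, int want_write)
{
    int i;

    if (want_write) {
        for (i = 0; i < NR_NODES; i++)       /* no other readers allowed */
            if (i != node && t->readers[i]) {
                revoke(i);
                t->readers[i] = 0;
            }
        if (t->writer >= 0 && t->writer != node)
            revoke(t->writer);               /* and no other writer      */
        t->writer = node;
        t->readers[node] = 1;
        return;
    }
    if (t->writer >= 0 && t->writer != node) {
        revoke(t->writer);                   /* downgrade the writer     */
        t->writer = -1;
    }
    t->readers[node] = 1;                    /* readers can share        */
}

int main(void)
{
    struct file_token t = { .writer = -1 };

    token_request(&t, 0, 0);   /* node 0 reads               */
    token_request(&t, 1, 0);   /* node 1 reads: both share   */
    token_request(&t, 2, 1);   /* node 2 writes: revoke both */
    return 0;
}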
All of this code is open-sourced as part of the OpenSSI project. BTW, I am
only saying that the code at http://www.openssi.org may be of interest to
you.
-aneesh
Hi!
> I have implemented SSI (single system image) clustering extensions to
> the Linux kernel in the form of a loadable module.
>
> It roughly mimics the OpenMosix deputy/remote split model (migrated
> processes leave a stub on the node where they were born and depend on
> that "home" node for I/O).
>
> The implementation shares no code with MOSIX/OpenMosix (it was written
> from scratch), is much smaller, and is easily portable to multiple
> architectures.
>
> We are considering publication of this code and forming an open source
> project around it.
>
> I have two questions for the community:
>
> 1) Is the community interested in using this code? Do users need an SSI
> product in an era when everybody is talking about partitioning machines
> rather than clustering them?
Yes... Remember that some people run hypervisors to enable process
migration.
> 2) Are the kernel maintainers interested in clustering extensions to the
> Linux kernel? Do they see any value in them? (Our code does not require
> kernel changes, but we are willing to submit it for inclusion if there
> is interest.)
I'd say so.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html