LinuxLists.cc - Linux, the microkernel (was Re: latest linus-2.5 BK broken)

2002-06-21 18:10:05

Subject: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

Larry McVoy wrote:
> the first place. It's proactive rather than reactive. And the reason
> I harp on this is that I'm positive (and history supports me 100%)
> that the reactive approach doesn't work, you'll be stuck with it,
> there is no way to "fix" it other than starting over with a new kernel.
> Then we get to repeat this whole discussion in 15 years with one of the
> Linux veterans trying to explain to the NewOS guys that multi threading
> really isn't as cool as it sounds and they should try this other approach.

One point that is missed, I think, is that Linux secretly wants to be a
microkernel.

Oh, I don't mean the strict definition of microkernel; we are continuing
to push the dogma of "do it in userspace" or "do it in process context"
(IOW userspace in the kernel).

Look at the kernel now -- the current kernel is not simply an
event-driven, monolithic program [the traditional kernel design]. Linux
also depends on a number of kernel threads to perform various
asynchronous tasks. We have had userspace agents managing bits of
hardware for a while now, and that trend is only going to be reinforced
with Al's initramfs.

IMO, the trend of the kernel is towards a collection of asynchronous
tasks, which lends itself to high parallelism. Hardware itself is
trending towards playing friendly with other hardware in the system
(examples: TCQ-driven bus release and interrupt coalescing), another
element of parallelism.

I don't see the future of Linux as a twisted nightmare of spinlocks.

Jeff

(I wonder if, shades of the old Linus/Tanenbaum flamewar, I will catch
hell from Linus for mentioning the word "microkernel" :))

2002-06-21 18:58:56

by Cort Dougan

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

That's not a microkernel design philosophy, it's a good OS design
philosophy. If it doesn't _have_ to be in the kernel, it generally
shouldn't be.

I agree with you that Linux is already a loosely connected yet highly
inter-dependent set of asynchronous tasks. That makes for a system that is
very difficult to analyze.

I don't see Linux being in serious jeopardy of becoming Solaris in the
short term. It only aims at running on 1-4 processors and does a pretty good
job of that. Most sane people realize, as Larry points out, that the
current design will not scale to 64 processors and beyond. That's obvious,
it's not an alarmist or deep statement. The key is to realize that it's
not _meant_ to scale that high right now.

I've done a little work with Larry's suggestion for scaling Linux and it's
very smart in that it solves the problem in a very simple and elegant way.
DEC did the same thing with Galaxy some time ago but they layered it with
so much of their cluster software and OpenVMS that it lost all the
performance that it had gained by being clever. If you want a simple
description of the idea (the way I am working on it), it's a software
version of NORMA.

Linux's sweet spot is 2-4 processors and probably shouldn't try to change.
It's a very hard problem going higher. Many systems have failed in exactly
the same way trying to do that sort of thing. Just cluster a bunch of
those 2-4 processor Linux's (room full of boxes, large 64-way IBM server or
some hybrid) and you have a clean solution.

2002-06-21 20:26:12

by Daniel Phillips

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

On Friday 21 June 2002 20:46, Cort Dougan wrote:
> I don't see Linux being in serious jeopardy in the short-term of becoming
> solaris. It only aims at running on 1-4 processors and does a pretty good
> job of that. Most sane people realize, as Larry points out, that the
> current design will not scale to 64 processors and beyond. That's obvious,
> it's not an alarmist or deep statement. The key is to realize that it's
> not _meant_ to scale that high right now.

And originally, it was never meant to scale to more than one processor.

--
Daniel

2002-06-22 01:33:29

by Rob Landley

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

On Friday 21 June 2002 02:09 pm, Jeff Garzik wrote:

> One point that is missed, I think, is that Linux secretly wants to be a
> microkernel.
>
> Oh, I don't mean the strict definition of microkernel, we are continuing
> to push the dogma of "do it in userspace" or "do it in process context"
> (IOW userspace in the kernel).
...
>
>
> (I wonder if, shades of the old Linus/Tanenbaum flamewar, I will catch
> hell from Linus for mentioning the word "microkernel" :))

Amateur computer historian piping up...

A microkernel design was actually made to work once, with good performance.
It was about fifteen years ago, in the Amiga. Know how they pulled it off?
Commodore used a mutant ultra-cheap 68030 that had -NO- memory management
unit.

No memory protection meant that message passing devolved to "here's a
pointer, please don't eat my data". And it's message passing that kills
microkernels, all that busy work from copying data (or, worse, playing with
page tables) when doing message passing kills your performance and makes the
sucker undebuggable. You wind up jumping through hoops to get access to the
data you need, and at any given point there are three different copies of it
flying through the memory bus getting out of sync with each other and needing
a forest of locks to even TRY to resolve.

In the Linux kernel, even when we have process context we can "reach out and
touch someone" any time we want to. No message passing nightmares, just keep
track of what you're exporting or Al will flame you. :) Lock, diddle the
original, unlock, move on. No copies, no version skew.
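Rob's contrast between copying data through messages and "lock, diddle the original, unlock" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python threads stand in for kernel contexts, a `queue.Queue` stands in for a message channel, and the names `Record`, `update_in_place`, `server` and `update_by_message` are hypothetical, not taken from any kernel. The shared-memory path touches the one original under a lock; the message-passing path makes one copy for the request and another for the reply.

```python
import copy
import queue
import threading


class Record:
    """Shared state; a hypothetical stand-in for some kernel data structure."""

    def __init__(self, values):
        self.values = list(values)
        self.lock = threading.Lock()


def update_in_place(rec, index, delta):
    # Shared-memory style: lock, diddle the original, unlock. No copies.
    with rec.lock:
        rec.values[index] += delta


def server(requests, replies):
    # Message-passing style: the "server" only ever works on copies.
    while True:
        msg = requests.get()
        if msg is None:  # shutdown request
            break
        data, index, delta = msg
        data[index] += delta              # change the server's private copy
        replies.put(copy.deepcopy(data))  # copy the result back across the boundary


def update_by_message(rec, index, delta, requests, replies):
    # Copy the data out (request), wait for the reply (second copy), then
    # overwrite the original. No lock is held, so a concurrent change to
    # rec.values made while the request is in flight is silently lost.
    requests.put((copy.deepcopy(rec.values), index, delta))
    rec.values = replies.get()


rec = Record([0, 0, 0])
update_in_place(rec, 0, 5)

requests, replies = queue.Queue(), queue.Queue()
worker = threading.Thread(target=server, args=(requests, replies))
worker.start()
update_by_message(rec, 1, 7, requests, replies)
requests.put(None)
worker.join()
print(rec.values)  # [5, 7, 0]
```

Because `update_by_message` holds no lock, anything another context writes to `rec.values` between the request and the reply is overwritten when the reply lands; that is the "version skew" Rob describes, and avoiding it needs extra coordination on top of the messages themselves.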

A microkernel design WITHOUT message passing is really just an extremely
modular monolithic kernel. Modularization, like object-oriented programming,
is cool up until the point you let it turn into a religion. As long as you
don't wind up fighting your own design, unable to get at your own data when
you really need to because it's on the wrong side of a relatively arbitrary
boundary, modularization is a good thing.

Rob

2002-06-22 12:41:16

by Roman Zippel

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

Hi,

On Fri, 21 Jun 2002, Larry McVoy wrote:

> On Fri, Jun 21, 2002 at 09:07:10PM -0400, Horst von Brand wrote:
> > Right. If they had designed it for 4/8 CPUs from the start, they would
> > surely have gotten it dead wrong. Just to find out how wrong around now...
>
> I couldn't disagree more. The reason that all the SMP threaded OS's start
> to suck is that managers say "Yeah, one CPU is good but how about 2?" Then
> a year goes by and then they say "Yeah, 2 CPUs are good but how about 4?".
> Etc. So the system is never designed, it is hacked. It's no wonder they
> suck.

That's the important difference here: we have no managers forcing specific
goals on us. We have the time to develop a good solution; we are not
forced to accept a solution which sucks. We have the freedom to constantly
break the kernel and we don't have to maintain backwards compatibility,
which, especially with regard to locking, would really suck.

bye, Roman

2002-06-22 01:14:38

by Horst von Brand

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

[Cc:s heavily snipped]
Daniel Phillips said:
> On Friday 21 June 2002 20:46, Cort Dougan wrote:
> > I don't see Linux being in serious jeopardy in the short-term of becoming
> > solaris. It only aims at running on 1-4 processors and does a pretty good
> > job of that. Most sane people realize, as Larry points out, that the
> > current design will not scale to 64 processors and beyond. That's obvious,
> > it's not an alarmist or deep statement. The key is to realize that it's
> > not _meant_ to scale that high right now.
>
> And originally, it was never meant to scale to more than one processor.

Right. If they had designed it for 4/8 CPUs from the start, they would
surely have gotten it dead wrong. Just to find out how wrong around now...

If 64-way becomes commodity one day in whatever form the hardware people
dream up, Linux will surely follow.
--
Horst von Brand
Casilla 9G, Viña del Mar, Chile +56 32 672616

2002-06-22 01:23:38

by Larry McVoy

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

On Fri, Jun 21, 2002 at 09:07:10PM -0400, Horst von Brand wrote:
> Right. If they had designed it for 4/8 CPUs from the start, they would
> surely have gotten it dead wrong. Just to find out how wrong around now...

I couldn't disagree more. The reason that all the SMP threaded OS's start
to suck is that managers say "Yeah, one CPU is good but how about 2?" Then
a year goes by and then they say "Yeah, 2 CPUs are good but how about 4?".
Etc. So the system is never designed, it is hacked. It's no wonder they
suck.

My point has always been that if you were told up front that you needed to
hit 2 orders of magnitude more CPUs than you have today, the design you'd
end up with would be very different than the "just hack it some more to get
2x more CPUs".

The interesting thing is to look at the ways you'd deal with 1024 processors
and then work backwards to see how you scale it down to 1. There is NO WAY
to scale a fine-grained threaded system which works on a 1024-CPU system down
to a 1-CPU system; those are profoundly different.

I think you could take the OS cluster idea and scale it up as well as down.
Scaling down is really important, Linux works well in the embedded space,
that is probably the greatest financial success story that Linux has, let's
not screw it up.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2002-06-22 15:10:50

by Alan

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

> A microkernel design was actually made to work once, with good performance.
> It was about fifteen years ago, in the amiga. Know how they pulled it off?
> Commodore used a mutant ultra-cheap 68030 that had -NO- memory management
> unit.

Vanilla 68000 actually. And it never worked well - the UI folks had
to use a library, not threads. The fs performance sucked.

2002-06-22 18:23:16

by Rob Landley

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

On Saturday 22 June 2002 11:31 am, Alan Cox wrote:
> > A microkernel design was actually made to work once, with good
> > performance. It was about fifteen years ago, in the amiga. Know how they
> > pulled it off? Commodore used a mutant ultra-cheap 68030 that had -NO-
> > memory management unit.
>
> Vanilla 68000 actually. And it never worked well - the UI folks had
> to use a library not threads. The fs performance sucked

I dug through my notes a bit, and the interview I was thinking of (with one of
the designers before he died, Jay Miner I think) said that when they did
upgrade to the 68030 (long after the A1000), they specifically commissioned an
MMU-less version (68EC030), and that if they'd had to deal with an MMU in the
first place he doubted they could ever have gotten a microkernel architecture
to work.

Unfortunately, all I have from said interview at the moment are the notes I
took. My first year of computer history research was a learning experience
about how to do research, back before I learned to store the URL the notes
came from with the notes (no, the fact it's in my bookmarks list doesn't mean
I can find it again), and to save pages to my hard drive because the links
have been known to go away over time... :)

On a side note, it's fun to look through the Tanenbaum-Torvalds debate
archive, see all the people holding up the Amiga as an example of a
successful microkernel with decent performance, and note the lack of MMU...

Rob

2002-06-22 19:00:45

by Ruth Ivimey-Cook

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

On Sat, 22 Jun 2002, Rob Landley wrote:

>On Saturday 22 June 2002 11:31 am, Alan Cox wrote:
>> > A microkernel design was actually made to work once, with good
>> > performance. It was about fifteen years ago, in the amiga. Know how they
>> > pulled it off? Commodore used a mutant ultra-cheap 68030 that had -NO-
>> > memory management unit.
>>
>> Vanilla 68000 actually. And it never worked well - the UI folks had
>> to use a library not threads. The fs performance sucked

Threads (in the sense of tasks[1]) in fact worked extremely well and very
efficiently on the Amiga, and "Intuition" was always coded as one thread and
was modified to use them more widely as the programmers had the time and
resources to do so.

>On a side note, it's fun looking through the tanenbaum-torvalds debate
>archive and see all the people holding up the amiga as an example of a
>successful microkernel with decent performance, and note the lack of MMU...

I was very happy indeed with the performance of the computer, given the 0.25
MIPS CPU. The "Exec" scheduler was an extremely good design of its type, as
has been recognised in various places since.

The filesystem of the Amiga was very slow because it was very definitely a
second-best setup; the original Amiga Corp. folks ran out of cash and in the
end the filesystem from another OS, Tripos, was grafted in. Not only was it
not what was originally designed in, but it was written in an
almost-incompatible language (BCPL).

However, I won't argue about MMU vs non-MMU; it was obvious from the start
that any kind of memory protection between tasks would render a great deal of
the system design useless, because the whole system shared memory and
resources. How else did people get away with application footprints 1/5 to
1/10 that of equivalents on Windows?

Regards,

Ruth

[1] Exec only understood "tasks" as the basic scheduling unit; a task could be
extended to become a process if access to the filesystem was required, but
doing so did not change the scheduling cost at all.

--
Ruth Ivimey-Cook
Software engineer and technical writer.

2002-06-22 21:10:00

by jdow

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

From: "Alan Cox"

> > A microkernel design was actually made to work once, with good performance.
> > It was about fifteen years ago, in the amiga. Know how they pulled it off?
> > Commodore used a mutant ultra-cheap 68030 that had -NO- memory management
> > unit.
>
> Vanilla 68000 actually. And it never worked well - the UI folks had
> to use a library not threads. The fs performance sucked

Some things just cannot be passed by..... The Amiga HAS worked well and
DOES work well - - - FINALLY. (It took several years and a VERY serious
debugging effort with Bill Hawes and Bryce Nesbitt finding and quashing
all manner of bad or missing pointer checks and the like. They made the
OS itself a remarkable work of art.)

You are right, Alan, in that it used a vanilla, slow, 68000 in its original
incarnation. A company named Metacomco generated the "DOS" part of the
system. IMAO they should have been sued for malpractice. The only good feature
the file system had was its resilience. Had it been coded correctly, loss of
data would have been hard to achieve short of physical disk problems. Later
incarnations of the file system proved it could be remarkably fast accessing
specific files. Directory listings remain agonizingly slow.

The OS "exec" library is remarkably compact, quick, and resilient. It does
suffer from not using memory protection. However, in a testament to some
Amiga programmers, AmigaDOS in its latest incarnation can survive months of
uptime under typical single-user loads. (My slightly hypertrophied A3000T
sits over there running some applications for me 24x7 quite nicely, thank
you.) This has had me musing about the relative quality of Linux applications
that blithely throw segfaults rather than check for overflows, null pointers,
and the like. I'd NEVER let the typical Linux application touch my Amigas for
two reasons: the crashes are annoying, and they mean there are security holes
waiting for exploitation.

Having "everything" in the system a shared library has some advantages for
updating things on the fly without reboots, as is routinely exploited within
the Linux world. A side effect of the way this was implemented yields a
rather endearing Amiga trait: you cannot exceed array boundaries in most of
the OS and shared libraries. Arrays are eschewed in favor of linked lists.

'Tis a shame the idiots who owned and ran the company sucked it dry and
tossed the remains. The OS could wring remarkable performance out of rather
antiquated hardware, well in excess of what Apple could wring out of the
same hardware.

{^_-} I am rather fond of the tool. And I note it has performed (and in some
instances still performs) admirably in near real-time applications such as
show control (EFX in Las Vegas for one) and telemetry reception and
analysis (at NASA.) To be sure AmigaDOS 1.0 through 1.3 were rather
dreadful. 2.04 was remarkable. No AmigaDOS was EVER even approximately
as bad as an abortion I had to work on called GRiD-OS, however.

2002-06-23 16:11:40

by Sandy Harris

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

Larry McVoy wrote:

> The interesting thing is to look at the ways you'd deal with 1024 processors
> and then work backwards to see how you scale it down to 1. There is NO WAY
> to scale a fine-grained threaded system which works on a 1024-CPU system down
> to a 1-CPU system; those are profoundly different.
>
> I think you could take the OS cluster idea and scale it up as well as down.
> Scaling down is really important, Linux works well in the embedded space,
> that is probably the greatest financial success story that Linux has, let's
> not screw it up.

Assuming we can get 4-way right, methinks Larry's ideas are likely to be a
whole lot easier way to handle a 32 or 64-way box than trying to re-design
the kernel sufficiently to do that well without destroying anything
important in the 1<= nCPU <= 4 case. Especially so because 16 to 64-way
clusters are common as dirt, and we can borrow tested tools. Anything that
works on a 16-box Beowulf ought to adapt nicely to a 64-way box with 16
of Larry's OSlets.

However, it is a lot harder to see that Larry's stuff is the right way
to deal with a 1024-CPU system. At that point, you've got perhaps 256
4-way groups running OSlets. How does communication overhead scale, and
do we have reason to suppose it is tolerable at 1024?

Also, it isn't as clear that clustering experience applies. Are clusters
that size built hierarchically? Is a 1024-CPU Beowulf practical, and if so
do you build it as a Beowulf of 32 32-CPU Beowulfs? Is something analogous
required in the OSlet approach? Would it work?
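One rough way to frame Sandy's scaling question is to count potential point-to-point channels. The thread does not say how OSlets would talk to each other, so this sketch simply assumes the worst case (any OSlet may need to talk to any other, one channel per pair) and 4-way OSlets; `full_mesh_links` and `two_level_links` are hypothetical helpers. A flat mesh grows as n(n-1)/2, while grouping OSlets and connecting only one leader per group (a hierarchy like the "Beowulf of Beowulfs" question above) grows much more slowly.

```python
CPUS_PER_OSLET = 4  # assumption: each OSlet runs on a 4-way group of CPUs


def full_mesh_links(n):
    """Channels needed if each of n OSlets may talk directly to every other."""
    return n * (n - 1) // 2


def two_level_links(groups, per_group):
    """Full mesh inside each group, plus a full mesh between one leader per group."""
    return groups * full_mesh_links(per_group) + full_mesh_links(groups)


for cpus in (64, 1024):
    oslets = cpus // CPUS_PER_OSLET
    print(cpus, oslets, full_mesh_links(oslets))
# 64 16 120
# 1024 256 32640

print(two_level_links(16, 16))  # 256 OSlets as 16 groups of 16 -> 2040
```

Channel count is only a proxy: it says nothing about how much traffic each channel carries, which is what actually decides whether the overhead is tolerable.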

2002-06-23 17:29:06

by Jakob Oestergaard

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

On Sun, Jun 23, 2002 at 11:15:53AM -0400, Sandy Harris wrote:
> Larry McVoy wrote:
>
...
> Also, it isn't as clear that clustering experience applies. Are clusters
> that size built hierarchically? Is a 1024-CPU Beowulf practical, and if so
> do you build it as a Beowulf of 32 32-CPU Beowulfs? Is something analogous
> required in the OSlet approach? would it work?

Well, yes and no. Often the hierarchy is really shallow. A typical
(larger) Beowulf (if such a thing exists) could be ~50 nodes per 100Mbit
switch, heaps of those switches go into (interconnected) gigabit
switches, and that's it. There are *many* 'wulfs out there with just
one or a few switches - but they are not 1024 CPUs either.

Much more specialized interconnects are often used. The SP/2 (IBM) used
something resembling "one big switch", which was in reality a number of
cleverly connected smaller switches (sorry, forgot the topology) - so no
real hierarchy, similar bandwidth and latency between any two nodes in a
several-hundred-node cluster.

The "Earth Simulator" (the #1 system on http://www.top500.org) uses a
single-stage crossbar for its 640 nodes (5,120 processors).

My personal pet theory is, in short, that the hardware stays fairly flat
- not because it is beneficial to (on the contrary!), but because
software assumes that it is flat. The software paradigms in practical
use today have not changed since the early '80s and as long as the
hardware manages to stay "almost flat" that's not going to change.

--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:

2002-06-23 17:57:01

by John Alvord

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

On Sat, 22 Jun 2002 14:09:30 -0700, "jdow" wrote:

>From: "Alan Cox"
>
>> > A microkernel design was actually made to work once, with good performance.
>> > It was about fifteen years ago, in the amiga. Know how they pulled it off?
>> > Commodore used a mutant ultra-cheap 68030 that had -NO- memory management
>> > unit.
>>
>> Vanilla 68000 actually. And it never worked well - the UI folks had
>> to use a library not threads. The fs performance sucked
>
>Some things just cannot be passed by..... The Amiga HAS worked well and
>DOES work well - - - FINALLY. (It took several years and a VERY serious
>debugging effort with Bill Hawes and Bryce Nesbitt finding and quashing
>all manner of bad or missing pointer checks and the like. They made the
>OS itself a remarkable work of art.)

Was that the same Bill Hawes who hung around L-K quashing bugs for a
year or so (maybe 3-4 years ago?)

john alvord

2002-06-23 20:48:20

by jdow

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

From: "John Alvord"

>On Sat, 22 Jun 2002 14:09:30 -0700, "jdow" wrote:

>>From: "Alan Cox"
>>
>>>> A microkernel design was actually made to work once, with good performance.
>>>> It was about fifteen years ago, in the amiga. Know how they pulled it off?
>>>> Commodore used a mutant ultra-cheap 68030 that had -NO- memory management
>>>> unit.
>>>
>>> Vanilla 68000 actually. And it never worked well - the UI folks had
>>> to use a library not threads. The fs performance sucked
>>
>>Some things just cannot be passed by..... The Amiga HAS worked well and
>>DOES work well - - - FINALLY. (It took several years and a VERY serious
>>debugging effort with Bill Hawes and Bryce Nesbitt finding and quashing
>>all manner of bad or missing pointer checks and the like. They made the
>>OS itself a remarkable work of art.)

>Was that the same Bill Hawes who hung around L-K quashing bugs for a
>year or so (maybe 3-4 years ago?)

I believe it was. That is about where I lost track of him. I hope he is
doing well wherever he is. You folks here should have done almost anything
to keep him around.

{^_^}

2002-06-23 21:41:03

by Xavier Bestel

Subject: [OT] Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

On Sat 22/06/2002 at 17:31, Alan Cox wrote:
> > A microkernel design was actually made to work once, with good performance.
> > It was about fifteen years ago, in the amiga. Know how they pulled it off?
> > Commodore used a mutant ultra-cheap 68030 that had -NO- memory management
> > unit.
>
> Vanilla 68000 actually. And it never worked well - the UI folks had
> to use a library not threads. The fs performance sucked

<troll feeding>
IIRC all simple UI things were done in the "input task" context (the
task moving the mouse pointer, to simplify things) and heavier-duty work
had to be offloaded to the right task - using message passing of course.
This was not the intended design, which was to make Intuition a real
device (in the amiga sense, i.e. it could have its own task), but you
know, AmigaOS was a commercial proprietary OS with deadlines and a
complex history. That's why it had a really sucky fs, too (put your
floppy in the drive, type dir, drink a coffee while listening to your
disk being eaten, see the command output one line per second).

Xav

2002-06-24 06:27:06

by Craig I. Hagan

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

> Also, it isn't as clear that clustering experience applies. Are clusters
> that size built hierarchically? Is a 1024-CPU Beowulf practical, and if so
> do you build it as a Beowulf of 32 32-CPU Beowulfs? Is something analogous
> required in the OSlet approach? would it work?

A system of that size has many "practical" applications. It *can* be done
without partitioning it into a tree hierarchy; however, you will need a very
capable interconnect (Quadrics and Myrinet come to mind). Note that you'll still
have a tiered switching hierarchy even if the nodes are presented in a flat layer.

IMHO nearly any level of breakout for grid computing (basically a cluster
hierarchy) starts to become interesting as a function of your app/problem size
and how many simultaneous jobs you are running.

Of course, we can stop and hit reality for a second: not many people can afford
a 1024 cpu cluster, hence the proliferation of smaller ones ;)

-- craig

.- ... . -.-. .-. . - -- . ... ... .- --. .

Craig I. Hagan "It's a small world, but I wouldn't want to back it up"
hagan(at)cih.com "True hackers don't die, their ttl expires"
"It takes a village to raise an idiot, but an idiot can raze a village"

Stop the spread of spam, use a sendmail condom!
http://www.cih.com/~hagan/smtpd-hacks

In Bandwidth we trust

2002-06-24 11:10:04

by Eric W. Biederman

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

Sandy Harris writes:

> Larry McVoy wrote:
>
> > The interesting thing is to look at the ways you'd deal with 1024 processors
> > and then work backwards to see how you scale it down to 1. There is NO WAY
> > to scale a fine-grained threaded system which works on a 1024-CPU system down
> > to a 1-CPU system; those are profoundly different.
> >
> > I think you could take the OS cluster idea and scale it up as well as down.
> > Scaling down is really important, Linux works well in the embedded space,
> > that is probably the greatest financial success story that Linux has, let's
> > not screw it up.
>
> Assuming we can get 4-way right, methinks Larry's ideas are likely to be a
> whole lot easier way to handle a 32 or 64-way box than trying to re-design
> the kernel sufficiently to do that well without destroying anything
> important in the 1<= nCPU <= 4 case. Especially so because 16 to 64-way
> clusters are common as dirt, and we can borrow tested tools. Anything that
> works on a 16-box Beowulf ought to adapt nicely to a 64-way box with 16
> of Larry's OSlets.

I wonder sometimes. With a 16-way cluster practically any tool will
work and not give you problems. I don't think many of the tools have
progressed beyond the "make it work" stage into polish yet.

> However, it is a lot harder to see that Larry's stuff is the right way
> to deal with a 1024-CPU system. At that point, you've got perhaps 256
> 4-way groups running OSlets. How does communication overhead scale, and
> do we have reason to suppose it is tolerable at 1024?

The rule is to communicate as little as possible: even if you have a
very low-latency interconnect with insane amounts of bandwidth, that
bandwidth is needed for your application, not for cluster management
services.

> Also, it isn't as clear that clustering experience applies. Are clusters
> that size built hierarchically? Is a 1024-CPU Beowulf practical, and if so
> do you build it as a Beowulf of 32 32-CPU Beowulfs? Is something analogous
> required in the OSlet approach? would it work?

A cluster with 960 compute nodes (each 2way) is being built for
Lawrence Livermore National Lab. http://www.llnl.gov/linux/mcr/.
The insane part is that the Lustre filesystem is going to be a 32-node
cluster in and of itself.

So there will be experience out there.

Eric

2002-06-24 13:06:22

by J.A. Magallon

Subject: Re: Linux, the microkernel (was Re: latest linus-2.5 BK broken)

On 2002.06.24 Craig I. Hagan wrote:
>> Also, it isn't as clear that clustering experience applies. Are clusters
>> that size built hierarchically? Is a 1024-CPU Beowulf practical, and if so
>> do you build it as a Beowulf of 32 32-CPU Beowulfs? Is something analogous
>> required in the OSlet approach? would it work?
>
>A system of that size has many "practical" applications. It *can* be done
>without partitioning it into a tree hierarchy; however, you will need a very
>capable interconnect (Quadrics and Myrinet come to mind). Note that you'll still
>have a tiered switching hierarchy even if the nodes are presented in a flat layer.
>
>IMHO nearly any level of breakout for grid computing (basically a cluster
>hierarchy) starts to become interesting as a function of your app/problem size
>and how many simultaneous jobs you are running.
>
>Of course, we can stop and hit reality for a second: not many people can afford
>a 1024 cpu cluster, hence the proliferation of smaller ones ;)
>

You do not have to go so far. Take a simple cluster of dual Xeon boxes (i.e.,
4 'CPUs' per box). Current clustering software (MPI, PVM) is not ready to
handle a 2-level hierarchy, with slow communication over TCP between boxes and
a lower level working as a shared-memory, threadable cluster.
It would not be so strange nowadays (nor too expensive) to have 8-16
nodes with 4 CPUs each.
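The two-level structure described above can be modelled by counting the messages a simple sum-reduction needs. This sketch does not use MPI or PVM; it assumes 16 nodes of 4 CPUs, one partial result per CPU, and that in the flat case every message is treated as a network message (as it would be if each CPU were a separate TCP-connected rank). The function names are hypothetical, and the Python thread pool only models the shared-memory level; it gives no real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def flat_reduce(partials):
    """Every CPU is its own rank; each non-root rank sends one message to rank 0."""
    total = partials[0]
    messages = 0
    for value in partials[1:]:
        total += value  # models one message arriving at rank 0
        messages += 1
    return total, messages


def two_level_reduce(partials, cpus_per_node):
    """Combine within each node via shared memory, then one message per remote node."""
    nodes = [partials[i:i + cpus_per_node]
             for i in range(0, len(partials), cpus_per_node)]
    with ThreadPoolExecutor() as pool:
        node_sums = list(pool.map(sum, nodes))  # level 1: shared memory, no messages
    total = node_sums[0]
    messages = 0
    for value in node_sums[1:]:
        total += value  # level 2: one network message per remote node
        messages += 1
    return total, messages


partials = list(range(64))              # 16 nodes x 4 CPUs, one partial result each
print(flat_reduce(partials))            # (2016, 63)
print(two_level_reduce(partials, 4))    # (2016, 15)
```

Both versions produce the same result, but under these assumptions the hierarchical one sends 15 network messages instead of 63, which is the kind of saving a hierarchy-aware clustering layer could exploit.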

--
J.A. Magallon \ Software is like sex: It's better when it's free
mailto:[email protected] \ -- Linus Torvalds, FSF T-shirt
Linux werewolf 2.4.19-pre10-jam3, Mandrake Linux 8.3 (Cooker) for i586
gcc (GCC) 3.1.1 (Mandrake Linux 8.3 3.1.1-0.6mdk)