2002-01-01 05:02:33

by Rob Landley

Subject: New Scheduler and Digital Signal Processors?

I've heard several people (including Alan) talk about banging on the
scheduler to make 8-way systems happy, and the first scheduler surgery patch
has apparently been accepted to 2.5 now. Lots of people are talking about a
scheduler patch with a per-processor task queue, so I'd like to ask a
question.

What would be involved in an asymmetrical multi-processor Linux? Keep the
per-processor task queues, but have them hold different types of processes
using different machine languages?

The reason I ask is TI has a new chip with a DSP built into it, and DSPs are
eventually bound to replace all the dedicated I/O chips we've got today. A
DSP is just a dedicated I/O processor, they can act like modems, sound and
video, 3D accelerators, USB interface, serial/parallel/joystick, ethernet,
802.11b wireless and bluetooth... Just hook it up to the right output
adapter and feed the DSP the right program and boom, it works.

In theory, you could stick a $5 after-market adapter on the output of your
DSP video to convert from a VGA plug to a TV plug, and reprogram the DSP to
produce a different output signal with a driver upgrade. In software. Ooh.

A DSP is just a processor designed to do I/O. It runs a program telling it how
to convert input into output. Ooh. This is half of unix programming, and
why we have the pipe command, only now you can offload each pipe stage onto a
dedicated coprocessor, so you can program your sound output chip to do speech
synthesis or decompress MP3's. And latency becomes way less of a problem if
you can dedicate a processor to it. (Think of a DMA channel that can run a
filter program on the data it's transporting, and boom: you've got a DSP.)
In theory, anything you can write as a filter that sits on a pipe could be
done by a DSP, and in the Unix philosophy that's just about everything.
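
To make the pipe analogy concrete, here's a toy filter in C -- the sort of
thing you'd drop into a shell pipeline today, and exactly the kind of
per-sample loop you could hand to a DSP instead (purely illustrative, it
just halves the volume of 16-bit PCM):

/* Toy pipe-stage filter: halve the volume of 16-bit signed PCM read from
 * stdin and write the result to stdout.  Illustrative only -- this is the
 * kind of per-sample grinding a DSP would do instead of the main CPU.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
        int16_t sample;

        while (fread(&sample, sizeof(sample), 1, stdin) == 1) {
                sample /= 2;    /* the "DSP" work: a trivial gain stage */
                fwrite(&sample, sizeof(sample), 1, stdout);
        }
        return 0;
}

Usage would be something like "mpg123 -s song.mp3 | ./halve > /dev/dsp" --
each pipe stage is a candidate for offloading.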

Now combine every strange niche I/O chip you've got today and make ONE
of the suckers, in massive volume, and think "economies of scale". These
suckers are going to be CHEAP. And low power. The portable market seems to
be drooling over them, and they're already coming embedded into next
generation processors. (A math coprocessor, a built-in DSP... I heard
there's an ARM generation in development that's got 4 DSPs built-in...) A
machine with a lot of DSPs was half of Steve Jobs' "NeXT" box idea...

So, back to the Linux scheduler. Right now our approach to these things is
(if I understand correctly) to feed 'em their program like firmware. Load
the driver, DSP gets its program, and it's dedicated to that task. Okay,
fun. And considering lots of them are hooked up to specific I/O devices at
the other end (an 802.11b antenna, an ethernet jack, etc) that makes sense.
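
(A rough sketch of that "firmware" model as a driver might implement it --
every name below is invented, it's just the shape of the thing: map the
DSP's program memory, copy the image in, hit go:)

/* Hypothetical "load the program, point the DSP at it, go" fragment.
 * The register layout and all names are made up for illustration; no
 * real chip or existing kernel interface is being described.
 */
#include <linux/types.h>
#include <asm/io.h>

#define DSP_REG_ENTRY   0x00    /* made-up: entry point register */
#define DSP_REG_CTRL    0x04    /* made-up: control register     */
#define DSP_CTRL_RUN    0x01    /* made-up: "start running" bit  */

/* prog_ram and regs are ioremap()ed addresses of the imaginary chip */
static void dsp_load_and_run(unsigned long prog_ram, unsigned long regs,
                             const u8 *image, size_t len)
{
        size_t i;

        /* copy the DSP's program image into its program memory */
        for (i = 0; i < len; i++)
                writeb(image[i], prog_ram + i);

        /* point the DSP at the start of the image and set it running */
        writel(0, regs + DSP_REG_ENTRY);
        writel(DSP_CTRL_RUN, regs + DSP_REG_CTRL);
}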

But there's already a company out there (http://www.dsplinux.net, proprietary dudes)
that SEEMS to be treating a DSP like a separate processor, capable of
scheduling tasks to the DSP (think dynamic DMA channel allocation, I'm not
sure how the electronics work out here: it would make sense to be able to
allocate and deallocate them like any other resource, but this is giving
hardware makers far too much credit). Considering the range of applications
you can have for sound cards alone (be a modem, text to speech, midi, mp3
decompression, mp3 compression during recording, ogg vorbis, etc), wouldn't
it be nice to be able to program DSPs a little more dynamically than "device
driver shows it how to be a sound card"?

Right now, the scheduler has sort of been hacked by some people to have the
concept of "realtime tasks" and "not realtime tasks". But think about it: in
five or ten years we may see machines built ENTIRELY out of DSPs (sort of
like RISC, only more so). The hyper-threading whatsis thing they're doing
with the P4 is sort of like this: they have execution cores linked for
performance and now they're de-linking them because the programmer's better
at finding parallelism than the hardware is.
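
(For reference, the "realtime task" concept the scheduler already has is
just the POSIX SCHED_FIFO/SCHED_RR policies; a process opts in with
something like this:)

/* Mark the current process as a realtime (SCHED_FIFO) task.  Needs root;
 * priority 50 is an arbitrary mid-range choice.
 */
#include <sched.h>
#include <stdio.h>

int main(void)
{
        struct sched_param sp = { .sched_priority = 50 };

        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
                perror("sched_setscheduler");
                return 1;
        }
        /* ... latency-sensitive work goes here ... */
        return 0;
}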

Think about the 3D accelerator problem. Break your screen up into 16
sections, one DSP sorts the triangles into each section, 16 other DSPs blast
triangles to the frame buffer, and then one more DSP is constantly doing a DMA
write to the video output to drive your LCD panel at 70Hz. 3D acceleration
becomes a question of having enough DSPs, fast enough, and feeding them the
right software. 80 million triangles per second is the human visual
perception threshold; beyond that nvidia's binary-only drivers can go hang...
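
(The sorting step is just binning: figure out which of the 16 screen tiles
a triangle lands in and queue it for that tile's DSP. A rough sketch with
made-up types, binning by centroid for simplicity -- a real binner would
test the bounding box against every tile it overlaps:)

#define TILES_X  4
#define TILES_Y  4
#define SCREEN_W 1024
#define SCREEN_H 768

struct triangle { float x[3], y[3]; /* plus z, color, etc. */ };

/* which of the 16 tile queues (0..15) this triangle should be fed to */
static int tile_for(const struct triangle *t)
{
        float cx = (t->x[0] + t->x[1] + t->x[2]) / 3.0f;
        float cy = (t->y[0] + t->y[1] + t->y[2]) / 3.0f;
        int tx = (int)(cx * TILES_X / SCREEN_W);
        int ty = (int)(cy * TILES_Y / SCREEN_H);

        /* clamp triangles that sit on the screen edge */
        if (tx < 0) tx = 0;
        if (tx >= TILES_X) tx = TILES_X - 1;
        if (ty < 0) ty = 0;
        if (ty >= TILES_Y) ty = TILES_Y - 1;

        return ty * TILES_X + tx;
}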

Am I totally on the wrong track here? When do we start worrying about this?

Rob

P.S. The appeal of USB largely seems to be "generic DSP spewing data out to
some device with another DSP in it, using a known protocol to communicate and
standard commodity wiring so everything has the same type of plug so you
don't need adapters. And the device on the far end may have a little buffer
if you're lucky". USB is something we queue requests up for right now, but
this strikes me as something the paradigm of scheduling tasks to a DSP might
fit. Maybe not as time slices, but perhaps as something like tasklets?
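
(For the curious: a tasklet is just deferred work the kernel runs in softirq
context after an interrupt. The classic interface looks like this, with the
DSP bits obviously hypothetical:)

/* Classic tasklet usage: the interrupt handler queues deferred work, which
 * here would be "feed the next buffer to the DSP".  feed_the_dsp() and the
 * DSP itself are hypothetical.
 */
#include <linux/interrupt.h>

static void feed_the_dsp(unsigned long data)
{
        /* push the next queued buffer out to the (imaginary) DSP */
}

static DECLARE_TASKLET(dsp_tasklet, feed_the_dsp, 0);

/* called from the device's interrupt handler */
static void dsp_irq_kick(void)
{
        tasklet_schedule(&dsp_tasklet);
}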


2002-01-01 07:25:44

by Timothy Covell

Subject: Re: New Scheduler and Digital Signal Processors?

On Monday 31 December 2001 15:00, Rob Landley wrote:
> I've heard several people (including Alan) talk about banging on the
> scheduler to make 8-way systems happy, and the first scheduler surgery
> patch has apparently been accepted to 2.5 now. Lots of people are talking
> about a scheduler patch with a per-processor task queue, so I'd like to ask
> a question.

[snip]

And could we please have support for one of those FPGA super computers?
They are already in use by NASA and probably the NSA. It ain't fair that
they can crack DES and GOST a million times faster than I can. And I need
to improve my Seti scores. ;-)


This is my way of saying that I'll believe it when I see it in the mass market.

--
[email protected].

2002-01-01 10:15:55

by Alan

Subject: Re: New Scheduler and Digital Signal Processors?

> The reason I ask is TI has a new chip with a DSP built into it, and DSPs are
> eventually bound to replace all the dedicated I/O chips we've got today. A
> DSP is just a dedicated I/O processor, they can act like modems, sound and
> video, 3D accelerators, USB interface, serial/parallel/joystick, ethernet,
> 802.11b wireless and bluetooth... Just hook it up to the right output
> adapter and feed the DSP the right program and boom, it works.

That's the theory. If you believe it, please read up on the history of the
i960-based I2O motherboard PCI controllers, the 56K DSP on some Atari
machines, and so on.

> of the suckers, in massive volume, and think "economies of scale". These
> suckers are going to be CHEAP. And low power. The portable market seems to

And irrelevant:

o An extra chip - expensive to mount and wire
o "But we can do that on the main processor"

etc

From a scheduling point of view I would expect such a DSP to run a separate
OS of its own, perhaps the RTLinux core without Linux.

2002-01-01 19:34:24

by Davide Libenzi

Subject: Re: New Scheduler and Digital Signal Processors?

On Tue, 1 Jan 2002, Alan Cox wrote:

> From a scheduling point of view I would expect such a DSP to run a separate
> OS of its own, perhaps the RTLinux core without Linux.

I agree. I'd rather see these DSPs running their own 'OS' inside their own
'domain', with the main OS talking to them through a higher-level interface
rather than in terms of binaries to run. Moreover, many of these DSPs usually
handle non-shareable devices, so talking about multitasking would make very
little sense.
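
(Something like a mailbox, say -- these structures are entirely invented,
just to show the shape of a 'high level' interface versus handing the DSP
raw binaries:)

/* Hypothetical mailbox-style interface between the main OS and a DSP
 * running its own OS: you send it requests, not machine code.  Every
 * name here is made up for illustration.
 */
enum dsp_opcode {
        DSP_OP_DECODE_MP3,
        DSP_OP_ENCODE_MP3,
        DSP_OP_TEXT_TO_SPEECH,
};

struct dsp_msg {
        enum dsp_opcode op;
        unsigned long   src;    /* bus address of the input buffer  */
        unsigned long   dst;    /* bus address of the output buffer */
        unsigned int    len;
};

/* post a request into a shared-memory ring the DSP's own OS polls */
static void dsp_submit(struct dsp_msg *ring, unsigned int ring_size,
                       unsigned int *head, const struct dsp_msg *msg)
{
        ring[*head % ring_size] = *msg;
        (*head)++;
}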



- Davide


2002-01-02 07:39:15

by john slee

Subject: Re: New Scheduler and Digital Signal Processors?

On Mon, Dec 31, 2001 at 04:00:26PM -0500, Rob Landley wrote:
> A DSP is just a processor designed to do I/O. It runs a program telling it how
> to convert input into output. Ooh. This is half of unix programming, and
> why we have the pipe command, only now you can offload each pipe stage onto a

that's all well and good if whatever is on the output side of the dsp is
the final destination for your data, but surely bouncing it around
between all those dsp chips is going to take time...

> dedicated coprocessor, so you can program your sound output chip to do speech
> synthesis or decompress MP3's. And latency becomes way less of a problem if
> you can dedicate a processor to it. (Think of a DMA channel that can run a
> filter program on the data it's transporting, and boom: you've got a DSP.)
> In theory, anything you can write as a filter that sits on a pipe could be
> done by a DSP, and in the Unix philosophy that's just about everything.

i have a high-end-ish soundcard that has four analog devices SHARC DSPs
on it. using the (unfortunately windows/mac only) software you upload
synth/effects code to it, draw your signal routing on the screen, then
control it through midi ports on the back of the card... (or with cubase
or similar.)

it's a great system, and as you allude to, the latency is fantastically
low. for more info, see http://www.creamware.com/. my card is a bit
old (the "pulsar" model) but all of their cards are built on the same
platform. i believe that with their proprietary bus and more expensive
cards you can use up to 45 SHARCs as one huge synth engine.

these days people are doing much the same thing in software (witness
cubase VST, buzz, fruityloops, rebirth, reason, logic, etc etc) but
while today's athlons et al. are getting fast enough for it, there are a
few nice aspects of the dsp solution that make it a clear winner (in my
eyes at least). such as not hearing clicks/jitter when you move the
mouse too rapidly...

j.

--
R N G G "Well, there it goes again... And we just sit
I G G G here without opposable thumbs." -- gary larson

2002-01-02 16:07:12

by Jesse Pollard

Subject: Re: New Scheduler and Digital Signal Processors?

Rob Landley <[email protected]>:
>
> I've heard several people (including Alan) talk about banging on the
> scheduler to make 8-way systems happy, and the first scheduler surgery patch
> has apparently been accepted to 2.5 now. Lots of people are talking about a
> scheduler patch with a per-processor task queue, so I'd like to ask a
> question.
>
> What would be involved in an asymmetrical multi-processor Linux? Keep the
> per-processor task queues, but have them hold different types of processes
> using different machine languages?

1. booting
2. interrupts
3. context switching
4. memory management
5. I/O

>
> The reason I ask is TI has a new chip with a DSP built into it, and DSPs are
> eventually bound to replace all the dedicated I/O chips we've got today. A
> DSP is just a dedicated I/O processor, they can act like modems, sound and
> video, 3D accelerators, USB interface, serial/parallel/joystick, ethernet,
> 802.11b wireless and bluetooth... Just hook it up to the right output
> adapter and feed the DSP the right program and boom, it works.

Most of these processors have dedicated memory, no sense of context, and no
understanding of clock interrupts other than for their dedicated design
functions.

For such a configuration you would have to have a kernel OS for each processor,
written in its custom instruction set.

Otherwise, you have security problems out the wazoo.

Most of these processors are NOT general purpose. They are dedicated to
solving ONE specific problem at a time.

> In theory, you could stick a $5 after-market adapter on the output of your
> DSP video to convert from a VGA plug to a TV plug, and reprogram the DSP to
> produce a different output signal with a driver upgrade. In software. Ooh.

Not on a millisecond-by-millisecond basis.

> A DSP is just a processor designed to do I/O. It runs a program telling it how
> to convert input into output. Ooh. This is half of unix programming, and
> why we have the pipe command, only now you can offload each pipe stage onto a
> dedicated coprocessor, so you can program your sound output chip to do speech
> synthesis or decompress MP3's. And latency becomes way less of a problem if
> you can dedicate a processor to it. (Think of a DMA channel that can run a
> filter program on the data it's transporting, and boom: you've got a DSP.)
> In theory, anything you can write as a filter that sits on a pipe could be
> done by a DSP, and in the Unix philosophy that's just about everything.

The first two sentences are correct (well... including the "Ooh"). This
processor is running a dedicated program. It doesn't do page faults. It can
take a long time (relatively) to load the program. Unloading a user context
is usually not possible.

You have to think of it more as a dedicated vector processor.
One user, One task. Context switching is not possible.

> Now combine every strange niche I/O chip you've got today and make ONE
> of the suckers, in massive volume, and think "economies of scale". These
> suckers are going to be CHEAP. And low power. The portable market seems to
> be drooling over them, and they're already coming embedded into next
> generation processors. (A math coprocessor, a built-in DSP... I heard
> there's an ARM generation in development that's got 4 DSPs built-in...) A
> machine with a lot of DSPs was half of Steve Jobs' "NeXT" box idea...

Yes, and those DSPs only handled one function each.

> So, back to the Linux scheduler. Right now our approach to these things is
> (if I understand correctly) to feed 'em their program like firmware. Load
> the driver, DSP gets its program, and it's dedicated to that task. Okay,
> fun. And considering lots of them are hooked up to specific I/O devices at
> the other end (an 802.11b antenna, an ethernet jack, etc) that makes sense.
>
> But there's already a company out there (http://www.dsplinux.net, proprietary dudes)
> that SEEMS to be treating a DSP like a separate processor, capable of
> scheduling tasks to the DSP (think dynamic DMA channel allocation, I'm not
> sure how the electronics work out here: it would make sense to be able to
> allocate and deallocate them like any other resource, but this is giving
> hardware makers far too much credit). Considering the range of applications
> you can have for sound cards alone (be a modem, text to speech, midi, mp3
> decompression, mp3 compression during recording, ogg vorbis, etc), wouldn't
> it be nice to be able to program DSPs a little more dynamically than "device
> driver shows it how to be a sound card"?

This just goes back to the old dedicated vector CPUs provided for DOS machines
to make a single application run fast.

> Right now, the scheduler has sort of been hacked by some people to have the
> concept of "realtime tasks" and "not realtime tasks". But think about it: in
> five or ten years we may see machines built ENTIRELY out of DSPs (sort of
> like RISC, only more so). The hyper-threading whatsis thing they're doing
> with the P4 is sort of like this: they have execution cores linked for
> performance and now they're de-linking them because the programmer's better
> at finding parallelism than the hardware is.

The scheduler keeps track of process contexts. If the DSP is being used
for video, then it may be requested to do audio... the process context must
be saved/restored - along with the entire program (swap, not paging). Usually
the DSP in question cannot determine or identify processes. They stream data
very well, but they do not have a general purpose bus. Usually all that is
available is:

PCI ---> program memory
              |
              V
control ----> DSP -----> output (input)
              ^
PCI           |
data ---------+

Output/input is frequently analog. Sometimes instead of PCI data you get
a direct port into memory (example: AGP).

The DSP internal registers are not always available. They are to be initialized
when the program is started.

The usual problem is speed. These chips are fast, but not at context switching
or program load.

The program memory is loaded (slow - could take 0.01 seconds, maybe longer
depending on the size of the program).
The program is started and runs very fast.
Data can only be provided/accepted at PCI speed (relatively slow compared to
the CPU).
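
(So the usage model in code form would be: pay the slow program load once,
then stream buffers at bus speed. dsp_load_program() and
dsp_process_buffer() are hypothetical placeholders, declared here only so
the sketch is complete:)

#include <stddef.h>

struct buffer { void *data; size_t len; };

/* hypothetical driver entry points */
int dsp_load_program(const unsigned char *image, size_t len);
int dsp_process_buffer(struct buffer *buf);

int dsp_run_stream(const unsigned char *program, size_t prog_len,
                   struct buffer *bufs, int nbufs)
{
        int i, err;

        err = dsp_load_program(program, prog_len);      /* slow, done once */
        if (err)
                return err;

        for (i = 0; i < nbufs; i++) {
                err = dsp_process_buffer(&bufs[i]);     /* PCI-speed, repeated */
                if (err)
                        return err;
        }
        return 0;
}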

> Think about the 3D accelerator problem. Break your screen up into 16
> sections, one DSP sorts the triangles into each section, 16 other DSPs blast
> triangles to the frame buffer, and then one more DSP is constantly doing a DMA
> write to the video output to drive your LCD panel at 70Hz. 3D acceleration
> becomes a question of having enough DSPs, fast enough, and feeding them the
> right software. 80 million triangles per second is the human visual
> perception threshold; beyond that nvidia's binary-only drivers can go hang...

Absolutely - a dedicated single use problem. Solved by SGI. Very expensive.

> Am I totally on the wrong track here? When do we start worrying about this?

Shouldn't have to worry about anything. This type of hardware has been around
for at least 30 years.

I'd rather see someone figure out how to do NUMA via multiple AGP ports
connecting to multiple motherboards....

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.