Hello World!
I went to see Andrew Morton speak at Xerox PARC and he indicated that
some of the RT patch was a little crazy. Specifically, interrupts in
threads (correct me if I'm wrong, Andrew). It seems a lot of the
maintainers haven't really warmed up to it.
I don't know to what extent Ingo has lobbied to try to get acceptance
into an unstable or stable kernel. However, since I know Andrew is cold
to accepting it, I thought I would ask what would need to be done to
the RT patch so that it could be accepted?
I think the fact that some distributions are including RT-patched
kernels is a sign that this technology is maturing. Not to mention
the fact that it's a 600k+ patch and getting bigger every day.
I'm sure there are some people fiercely opposed to it, some of whom I've
already run into. What is it about RT that makes people's skin crawl?
It is a configure option, after all.
Daniel
Daniel Walker wrote:
> I went to see Andrew Morton speak at Xerox PARC and he indicated that
> some of the RT patch was a little crazy. Specifically, interrupts in
> threads (correct me if I'm wrong, Andrew). It seems a lot of the
> maintainers haven't really warmed up to it.
Understandably, at first encounter it may seem rather
unconventional. However, scheduled interrupt execution
has existed in Solaris for years.
What are the objections?
-john
--
[email protected]
On Mon, May 23, 2005 at 04:14:26PM -0700, Daniel Walker wrote:
> Hello World!
>
> I went to see Andrew Morton speak at Xerox PARC and he indicated that
> some of the RT patch was a little crazy. Specifically, interrupts in
> threads (correct me if I'm wrong, Andrew). It seems a lot of the
> maintainers haven't really warmed up to it.
>
> I don't know to what extent Ingo has lobbied to try to get acceptance
> into an unstable or stable kernel. However, since I know Andrew is cold
> to accepting it, I thought I would ask what would need to be done to
> the RT patch so that it could be accepted?
>
> I think the fact that some distributions are including RT-patched
> kernels is a sign that this technology is maturing. Not to mention
> the fact that it's a 600k+ patch and getting bigger every day.
>
> I'm sure there are some people fiercely opposed to it, some of whom I've
> already run into. What is it about RT that makes people's skin crawl?
> It is a configure option, after all.
Personally I think interrupt threads, spinlocks as sleeping mutexes and PI
are things we should keep out of the kernel tree. If you want such
advanced RT features, use a special microkernel and run Linux as a user
process, using RTAI or maybe soon some of the more sophisticated
virtualization technologies.
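For concreteness, here is a minimal sketch of what "spinlocks as sleeping
mutexes" means in practice; the identifiers below are illustrative, not the
-rt patch's literal code:

#ifdef CONFIG_PREEMPT_RT
/* Sketch: under PREEMPT_RT most spinlock_t locks become sleeping,
 * priority-inheriting locks instead of busy-wait loops. */
typedef struct {
        struct rt_mutex lock;
} spinlock_t;

static inline void spin_lock(spinlock_t *l)
{
        /* May sleep; boosts the current owner's priority when a
         * higher-priority task is made to wait (PI). */
        rt_mutex_lock(&l->lock);
}

static inline void spin_unlock(spinlock_t *l)
{
        rt_mutex_unlock(&l->lock);      /* drops any priority boost */
}
#endif

/* Code that truly cannot sleep (the scheduler core, low-level IRQ
 * entry) keeps a real busy-wait lock - a "raw" spinlock in -rt terms. */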
* Christoph Hellwig <[email protected]> wrote:
> Personally I think interrupt threads, spinlocks as sleeping mutexes
> and PI are things we should keep out of the kernel tree. [...]
It's not really a problem - they integrate nicely. They have also found
dozens of hard-to-catch bugs already, so if you don't care about embedded
systems at all then worst-case you can consider it a spinlock debugging
mechanism, with the difference that DEBUG_SPINLOCK is far uglier ;)
Anyway, this discussion is premature, as I'm not submitting all these
patches yet.
Ingo
Ingo Molnar wrote:
> * Christoph Hellwig <[email protected]> wrote:
>
>
>>Personally I think interrupt threads, spinlocks as sleeping mutexes
>>and PI are things we should keep out of the kernel tree. [...]
>
>
> It's not really a problem - they integrate nicely. They have also found
> dozens of hard-to-catch bugs already, so if you don't care about embedded
> systems at all then worst-case you can consider it a spinlock debugging
> mechanism, with the difference that DEBUG_SPINLOCK is far uglier ;)
> Anyway, this discussion is premature, as I'm not submitting all these
> patches yet.
>
Probably the concern is the multiplicative increase in complexity of
configurations, and I'm sure the code itself is more complex too.
Of course this is weighed against the improvements added to the
kernel. I'm personally not too clear on what those improvements are;
a bit better soft-realtime response? (I don't know) What kind of
userbase increase would that allow? 0.01%, 1.0%...? Is that large
enough to warrant being included in the kernel? Does it even make
technical sense to do this in the general-purpose kernel rather than
in a specialised solution?
Those are the kinds of questions that will have to be debated (I
guess this mail is directed more towards Daniel than you, Ingo :)).
Nick
* Nick Piggin <[email protected]> wrote:
> Of course this is weighed against the improvements added to the
> kernel. I'm personally not too clear on what those improvements are; a
> bit better soft-realtime response? (I don't know) [...]
What the -RT kernel (PREEMPT_RT) offers are guaranteed hard-realtime
responses: ~15 usecs worst-case latency on a 2GHz Athlon64, on arbitrary
(SCHED_OTHER) workloads. (I.e. I've measured such worst-case latencies
when running 1000 hackbench tasks, when swapping the box to death, or
when running 40 parallel copies of the LTP testsuite.)
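For reference, worst-case numbers of this kind are typically gathered with
a cyclictest-style loop: a high-priority task asks to be woken at a fixed
interval and records how late each wakeup really is. A minimal,
self-contained sketch; the priority, interval and iteration count are
arbitrary choices:

/* cyclictest-style probe: measure wakeup lateness of a SCHED_FIFO task */
#include <stdio.h>
#include <time.h>
#include <sched.h>
#include <sys/mman.h>

#define NSEC_PER_SEC 1000000000L

int main(void)
{
        struct sched_param sp = { .sched_priority = 80 };
        struct timespec next, now;
        long max_ns = 0;

        sched_setscheduler(0, SCHED_FIFO, &sp);   /* needs root */
        mlockall(MCL_CURRENT | MCL_FUTURE);       /* avoid page faults */

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (int i = 0; i < 100000; i++) {
                next.tv_nsec += 100000;           /* wake again in 100 usecs */
                while (next.tv_nsec >= NSEC_PER_SEC) {
                        next.tv_nsec -= NSEC_PER_SEC;
                        next.tv_sec++;
                }
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
                clock_gettime(CLOCK_MONOTONIC, &now);
                long late = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                          + (now.tv_nsec - next.tv_nsec);
                if (late > max_ns)
                        max_ns = late;
        }
        printf("worst-case wakeup latency: %ld ns\n", max_ns);
        return 0;
}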
So it's well worth the effort, but there's no hurry and all the changes
are incremental anyway. I can understand Daniel's desire for more action
(he's got a product to worry about), but upstream isn't ready for this
yet.
Ingo
Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
>
>>Of course this is weighed against the improvements added to the
>>kernel. I'm personally not too clear on what those improvements are; a
>>bit better soft-realtime response? (I don't know) [...]
>
>
> What the -RT kernel (PREEMPT_RT) offers are guaranteed hard-realtime
> responses: ~15 usecs worst-case latency on a 2GHz Athlon64, on arbitrary
> (SCHED_OTHER) workloads. (I.e. I've measured such worst-case latencies
> when running 1000 hackbench tasks, when swapping the box to death, or
> when running 40 parallel copies of the LTP testsuite.)
>
Oh OK, I didn't realise it is aiming for hard RT. Cool! But
that wasn't so much the main point I was trying to make...
> So it's well worth the effort, but there's no hurry and all the changes
> are incremental anyway. I can understand Daniel's desire for more action
> (he's got a product to worry about), but upstream isn't ready for this
> yet.
>
Basically the same questions I think will still be up for debate.
Not that I want to start now, nor do I really have any feelings
on the matter yet (other than I'm glad you're not in a hurry :)).
For example, it may not be clear to everyone that it is
automatically well worth the effort ;) And others may really
want the functionality but prefer it to be done in specialised
software, as Christoph said.
Nick
* Nick Piggin <[email protected]> wrote:
> Oh OK, I didn't realise it is aiming for hard RT. Cool! But
> that wasn't so much the main point I was trying to make...
>
> >So it's well worth the effort, but there's no hurry and all the changes
> >are incremental anyway. I can understand Daniel's desire for more action
> >(he's got a product to worry about), but upstream isn't ready for this
> >yet.
> >
>
> Basically the same questions I think will still be up for debate. Not
> that I want to start now, nor do I really have any feelings on the
> matter yet (other than I'm glad you're not in a hurry :)).
I expect it to be pretty much like voluntary-preempt: there was much
flaming 9 months ago, and by today 99% of the voluntary-preempt patches
are already in the upstream kernel, and the remaining 1% (which just adds
the config option and touches one include file) I haven't submitted yet.
So I don't think there's much need to worry or even to decide anything
upfront: the merge is already happening. The two biggest preconditions
of PREEMPT_RT, the irq subsystem rewrite and the spinlock-init API
cleanups, are already upstream. The rest is just details or out-of-line
code. The discussions need to happen in small isolated steps, as the
component technologies are merged and discussed. The components are all
useful even without the final PREEMPT_RT step (which further proves the
usefulness of PREEMPT_RT - but you don't have to agree with that global
assertion).
So I'm afraid nothing radical will happen anywhere. Maybe we can have
one final flamewar-party in the end when the .config options are about
to be added, just for nostalgia, ok? =B-)
Ingo
On Tue, 24 May 2005, Christoph Hellwig wrote:
> On Mon, May 23, 2005 at 04:14:26PM -0700, Daniel Walker wrote:
>
> Personally I think interrupt threads, spinlocks as sleeping mutexes and PI
> are things we should keep out of the kernel tree.
A general threaded interrupt is not a good thing. Ingo made this to see
how far he can press it. But having serial drivers running in interrupt
threads is way overkill. Even network drivers can (provided they use DMA)
run in interrupt context without hurting the overall latencies. It all
depends on the driver and how it interfaces with the rest of the kernel,
especially what locks are shared and how long the locks are held. If they
are small enough, interrupt context and thus raw spinlocks are good enough.
In general, I think each driver ought to be configurable: Either it runs
in interrupt context or it runs in a thread. The locks have to be changed
accordingly from raw spinlocks to mutexes.
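Mainline did eventually grow an interface with exactly this per-driver
shape, request_threaded_irq(). A sketch of how a driver splits its handler;
the mydev_* helpers are hypothetical:

#include <linux/interrupt.h>

static irqreturn_t mydev_hardirq(int irq, void *dev)
{
        /* hard interrupt context: just quiesce the hardware */
        if (!mydev_irq_pending(dev))    /* hypothetical helper */
                return IRQ_NONE;
        mydev_mask_irq(dev);            /* hypothetical helper */
        return IRQ_WAKE_THREAD;         /* defer the real work */
}

static irqreturn_t mydev_thread(int irq, void *dev)
{
        /* schedulable thread context: may take mutexes and sleep */
        mydev_process_events(dev);      /* hypothetical helper */
        mydev_unmask_irq(dev);
        return IRQ_HANDLED;
}

static int mydev_setup_irq(int irq, void *dev)
{
        return request_threaded_irq(irq, mydev_hardirq, mydev_thread,
                                    IRQF_ONESHOT, "mydev", dev);
}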
As for PI: well, I don't think it will affect the overall stability to
have it as something you can switch on/off at compile time. Will it even
hurt anyone except for a tiny overhead of checking whether there are RT
waiters or not?
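The on/off nature of PI is visible from user-space too, where POSIX exposes
it as a per-mutex protocol attribute; a minimal sketch (glibc support for
PTHREAD_PRIO_INHERIT only arrived after this thread was written):

#include <pthread.h>

pthread_mutex_t lock;

int init_pi_lock(void)
{
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        /* Boost a low-priority holder to the priority of the
         * highest-priority waiter for as long as it holds the lock;
         * mutexes created without this attribute pay no PI cost. */
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        return pthread_mutex_init(&lock, &attr);
}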
I think the configuration space ought to be something like:
1) Server: No interrupt threads, raw spinlocks and no preemption.
2) RT: Preemption, mutexes with PI in almost all places,
interrupts are threaded per configuration per device.
Desktops ought to run as RT!! Most of all to force people to test the RT
setup, but also to make sure people can run an audio device etc.
> If you want such
> advanced RT features, use a special microkernel and run Linux as a user
> process, using RTAI or maybe soon some of the more sophisticated virtualization
> technologies.
I find that a bad approach:
1) You don't have RT in userspace.
2) You can't use Linux drivers for standard hardware when you want it to
be part of your deterministic RT application.
Esben
* Nick Piggin <[email protected]> wrote:
> Oh? I thought the idea of the voluntary-preempt thing was to stick
> cond_rescheds into might_sleep. At least that was the part I think I
> objected to... but I don't think I was one of the participants in that
> flamewar :)
The VP patchset consisted of dozens of latency-breakers, of the
->break_lock mechanism, of the might_sleep()s (which were placed based
on latency tracing tools) and of the cond_resched()s too (and other
stuff I forget). Most of this is upstream now. To put a cond_resched()
into might_sleep() is now a 5-liner :-)
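Roughly the shape that 5-liner eventually took in mainline; illustrative,
the exact code differs slightly:

#ifdef CONFIG_PREEMPT_VOLUNTARY
# define might_resched() cond_resched()
#else
# define might_resched() do { } while (0)
#endif

/* Every might_sleep() debugging annotation now doubles as a voluntary
 * rescheduling point when CONFIG_PREEMPT_VOLUNTARY is enabled. */
#define might_sleep() \
        do { __might_sleep(__FILE__, __LINE__); might_resched(); } while (0)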
Ingo
Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
>
>>Oh OK, I didn't realise it is aiming for hard RT. Cool! But
>>that wasn't so much the main point I was trying to make...
>>
>>
>>>So it's well worth the effort, but there's no hurry and all the changes
>>>are incremental anyway. I can understand Daniel's desire for more action
>>>(he's got a product to worry about), but upstream isn't ready for this
>>>yet.
>>>
>>
>>Basically the same questions I think will still be up for debate. Not
>>that I want to start now, nor do I really have any feelings on the
>>matter yet (other than I'm glad you're not in a hurry :)).
>
>
> I expect it to be pretty much like voluntary-preempt: there was much
> flaming 9 months ago, and by today 99% of the voluntary-preempt patches
> are already in the upstream kernel, and the remaining 1% (which just adds
> the config option and touches one include file) I haven't submitted yet.
>
Oh? I thought the idea of the voluntary-preempt thing was to stick
cond_rescheds into might_sleep. At least that was the part I think
I objected to... but I don't think I was one of the participants in
that flamewar :)
> So I don't think there's much need to worry or even to decide anything
> upfront: the merge is already happening. The two biggest preconditions
> of PREEMPT_RT, the irq subsystem rewrite and the spinlock-init API
> cleanups, are already upstream. The rest is just details or out-of-line
> code. The discussions need to happen in small isolated steps, as the
> component technologies are merged and discussed. The components are all
> useful even without the final PREEMPT_RT step (which further proves the
> usefulness of PREEMPT_RT - but you don't have to agree with that global
> assertion).
>
No, definitely - if things can get merged bit by bit in small, agreeable
chunks then that is the best way, of course.
> So I'm afraid nothing radical will happen anywhere. Maybe we can have
> one final flamewar-party in the end when the .config options are about
> to be added, just for nostalgia, ok? =B-)
Well, from Daniel's message it seemed like things were not quite so far
along as you say.
Flamewar party? I'm afraid I don't have a thing to bring (... yet!)
I'm sure someone will invite themselves, for old time's sake :)
Nick Piggin wrote:
> Ingo Molnar wrote:
>
>> * Nick Piggin <[email protected]> wrote:
>>
>>
>>> Of course this is weighed against the improvements added to the
>>> kernel. I'm personally not too clear on what those improvements are; a
>>> bit better soft-realtime response? (I don't know) [...]
>>
>>
>>
>> What the -RT kernel (PREEMPT_RT) offers are guaranteed hard-realtime
>> responses: ~15 usecs worst-case latency on a 2GHz Athlon64, on
>> arbitrary (SCHED_OTHER) workloads. (I.e. I've measured such worst-case
>> latencies when running 1000 hackbench tasks, when swapping the box
>> to death, or when running 40 parallel copies of the LTP testsuite.)
>>
>
> Oh OK, I didn't realise it is aiming for hard RT. Cool! But
> that wasn't so much the main point I was trying to make...
>
>> So it's well worth the effort, but there's no hurry and all the
>> changes are incremental anyway. I can understand Daniel's desire for
>> more action (he's got a product to worry about), but upstream isn't
>> ready for this yet.
>>
>
> Basically the same questions I think will still be up for debate.
> Not that I want to start now, nor do I really have any feelings
> on the matter yet (other than I'm glad you're not in a hurry :)).
>
> For example, it may not be clear to everyone that it is
> automatically well worth the effort ;) And others may really
> want the functionality but prefer it to be done in specialised
> software, as Christoph said.
>
> Nick
>
There are definitely those who would prefer to have the functionality,
at least as an option, in the mainline kernel. The group that I contract
for gets heartburn about having to patch every kernel running on every
development workstation and every production system. We need hard RT,
but currently when we have to have hard RT we go with a different
product. Another thing that some of us want/need is a hard real-time
Linux that doesn't create the segregation that most of these specialized
products create. Currently there are damn few choices for real POSIX
application development with hard RT requirements running in a Unix
environment.
--
kr
Esben Nielsen wrote:
> I find that a bad approach:
> 1) You don't have RT in userspace.
> 2) You can't use Linux drivers for standard hardware when you want it to
> be part of your deterministic RT application.
Please have a look at RTAI/fusion. For the record, RTAI has been providing
hard-rt in standard Linux user-space for over 5 years now. With RTAI/Fusion
this gets even better as there isn't even a special API ...
Here are a few links if you're interested:
http://www.rtai.org/modules.php?name=Content&pa=showpage&pid=1
http://marc.theaimsgroup.com/?l=linux-kernel&m=111634653913840&w=2
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Tue, 24 May 2005, Karim Yaghmour wrote:
>
> Esben Nielsen wrote:
> > I find that a bad approach:
> > 1) You don't have RT in userspace.
> > 2) You can't use Linux drivers for standard hardware when you want it to
> > be part of your deterministic RT application.
>
> Please have a look at RTAI/fusion. For the record, RTAI has been providing
> hard-rt in standard Linux user-space for over 5 years now. With RTAI/Fusion
> this gets even better as there isn't even a special API ...
>
The tests I have read (I can't remember the links, but it was on lwn.net)
state that the worst-case latency is even worse than for a standard 2.6
kernel!
If you are going to make useful deterministic real-time in userspace you
have to change stuff in kernel space and implement things like priority
inheritance, priority ceiling or similar. It can only turn out to be an
ugly hack which will end up being as intrusive in the kernel as Ingo's
approach. If you don't do anything like that you cannot use _any_ Linux
kernel resources from your RT processes, even though you have reimplemented
the pthread library to know about the "super RT" priorities.
But I grant you: you will gain better interrupt latencies because
interrupts are executed below Linux proper. I.e. when the Linux
kernel runs with interrupts disabled, they are really enabled in the RTAI
subsystem.
My estimate is that RTAI is good when you have a very small subsystem you
need to run RT with very low latencies. For instance, controlling a fast
device with limited hardware resources for buffering events.
For large control systems I don't think it is the proper way to do it.
There it is much better to run the control tasks as normal Linux
user-space processes with RT-priority. I can see Ingo's kernel doing that;
I can't see RTAI doing it except for very special situations where you
don't make _any_ Linux system calls at all! You can't even use a
normal Linux network device or character device from your RT application!
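What "a normal Linux user-space process with RT-priority" amounts to, as a
minimal sketch; the priority value is an arbitrary choice:

#include <sched.h>
#include <sys/mman.h>

int become_rt_task(void)
{
        struct sched_param sp = { .sched_priority = 50 };

        /* Page faults ruin determinism: lock everything into RAM. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
                return -1;
        /* SCHED_FIFO: runs until it blocks, yields, or a higher-priority
         * task preempts it - no timeslicing against normal tasks. */
        return sched_setscheduler(0, SCHED_FIFO, &sp);
}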
> Here are a few links if you're interested:
> http://www.rtai.org/modules.php?name=Content&pa=showpage&pid=1
> http://marc.theaimsgroup.com/?l=linux-kernel&m=111634653913840&w=2
>
> Karim
Esben
I have no intent of advocating either method. I was simply pointing out
a fact. FWIW, both approaches can be used together; they are not (as I may
have mistakenly stated in previous debates) in contradiction.
Here are some random comments to be taken very lightly:
Esben Nielsen wrote:
> The tests I have read (I can't remember the links, but it was on lwn.net)
> state that the worst-case latency is even worse than for a standard 2.6
> kernel!
Sorry, I'm an avid LWN reader and haven't come across something like this.
There are A LOT of benchmarks thrown left and right, mostly by embedded
distros wishing to push their own agenda. If what you assert is to hold,
then please at least provide us with a URL.
> If you are going to make useful deterministic real-time in userspace you
> have to change stuff in kernel space and implement things like priority
> inheritance, priority ceiling or similar. It can only turn out to be an
> ugly hack which will end up being as intrusive in the kernel as Ingo's
> approach. If you don't do anything like that you cannot use _any_ Linux
> kernel resources from your RT processes, even though you have reimplemented
> the pthread library to know about the "super RT" priorities.
I've visited these issues before. It all boils down to a simple question:
is it worth making the kernel so much more complicated for such a minority
when 90% of the problems encountered in the field revolve around the
necessity of responding to an interrupt in a deterministic fashion?
And for those 90% of cases, a simple hypervisor/nanokernel layer is
good enough. For the remaining 10% of cases, that's where something like
rt-preempt or RTAI becomes necessary.
> But I grant you: you will gain better interrupt latencies because
> interrupts are executed below Linux proper. I.e. when the Linux
> kernel runs with interrupts disabled, they are really enabled in the RTAI
> subsystem.
For most cases, as I said, there's no need for either rtai or rt-preempt;
all you need is direct access to the interrupt source, something a
hypervisor/nanokernel can easily provide you with. At its simplest, you
need two things:
- turning the core of the interrupt disabling defines into function pointers
- turning the core IRQ handler (do_IRQ) into a function pointer
Using this, a driver needing hard-rt can just tap into the interrupt flow
and get deterministic behavior WITHOUT either rtai, rt-preempt, or even
adeos. Of course, you can look for a clean implementation of this scheme,
and adeos does this quite well. Philippe can correct me if I'm wrong,
but with just the above hooks, much of adeos can be made into a loadable
module.
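A sketch of the two hooks being described; every name here is hypothetical:

/* Route the kernel's irq-disable primitive and its top-level IRQ
 * dispatcher through function pointers that an rt layer can intercept.
 * All identifiers are hypothetical. */
struct irq_hooks {
        void (*irq_disable)(void);      /* core of local_irq_disable() */
        void (*irq_enable)(void);       /* core of local_irq_enable() */
        void (*dispatch)(int irq, struct pt_regs *regs); /* core of do_IRQ() */
};

/* Defaults point at the stock implementations... */
extern struct irq_hooks irq_hooks;

/* ...and a hard-rt module retargets them at load time, so it sees every
 * interrupt first and can virtualize "interrupts off" for Linux while
 * keeping interrupts physically enabled. */
static void rt_install_hooks(const struct irq_hooks *rt_ops)
{
        irq_hooks = *rt_ops;
}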
Sure, tapping into the interrupt flow isn't as featured as having true
hard-rt in user-space (either with rt-preempt or rtai), but it's got
a very nice cost/benefit ratio.
> My estimate is that RTAI is good when you have a very small subsystem you
> need to run RT with very low latencies. For instance, controlling a fast
> device with limited hardware resources for buffering events.
> For large control systems I don't think it is the proper way to do it.
> There it is much better to run the control tasks as normal Linux
> user-space processes with RT-priority. I can see Ingo's kernel doing that;
> I can't see RTAI doing it except for very special situations where you
> don't make _any_ Linux system calls at all! You can't even use a
> normal Linux network device or character device from your RT application!
You are certainly entitled to your preferences, but if you're interested
in hearing about large-scale/industrial deployments of RTAI (and there
are plenty I assure you), I would suggest a visit to the rtai-users mailing
list.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Tuesday 24 May 2005 09:58, Esben Nielsen wrote:
As a user working on making a small mill into a CNC machine, using
emc, I'll comment as needed here.
>On Tue, 24 May 2005, Karim Yaghmour wrote:
>> Esben Nielsen wrote:
>> > I find that a bad approach:
>> > 1) You don't have RT in userspace.
>> > 2) You can't use Linux drivers for standard hardware when you
>> > want it to be part of your deterministic RT application.
>>
>> Please have a look at RTAI/fusion. For the record, RTAI has been
>> providing hard-rt in standard Linux user-space for over 5 years
>> now. With RTAI/Fusion this gets even better as there isn't even a
>> special API ...
>
>The tests I have read (I can't remember the links, but it was on
> lwn.net) state that the worst-case latency is even worse than for
> a standard 2.6 kernel!
It is worse, by a quite noticeable amount. Keyboard events often take
noticeable fractions of a second to show, or to take effect if it's
the app's own interface that you are entering control data into.
>If you are going to make useful deterministic real-time in userspace
> you have to change stuff in kernel space and implement things like
> priority inheritance, priority ceiling or similar. It can only
> turn out to be an ugly hack which will end up being as intrusive
> in the kernel as Ingo's approach. If you don't do anything like
> that you cannot use _any_ Linux kernel resources from your RT
> processes, even though you have reimplemented the pthread library to
> know about the "super RT" priorities.
>
>But I grant you: you will gain better interrupt latencies because
>interrupts are executed below Linux proper. I.e. when the Linux
>kernel runs with interrupts disabled, they are really enabled in the
> RTAI subsystem.
>
>My estimate is that RTAI is good when you have a very small
> subsystem you need to run RT with very low latencies. For instance,
> controlling a fast device with limited hardware resources for
> buffering events.
This is true, and in order to be able to run emc with anything like a
decent motor speed at the motors, I had to buy a new board and video
card to replace the one I was going to use, which was a 266MHz P2.
The P2 could do it, but there was no time left to run Linux, so
controlling the application wasn't possible. Without changing the
RTAI cycle time, a 1400MHz Athlon runs Linux at fairly normal speed
while emc is running.
>For large control systems I don't think it is the proper way to do
> it. There it is much better to run the control tasks as normal
> Linux user-space processes with RT-priority. I can see Ingo's
> kernel doing that; I can't see RTAI doing it except for very
> special situations where you don't make _any_ Linux system calls at
> all! You can't even use a normal Linux network device or character
> device from your RT application!
I agree. Use RTAI if you are building a specialized box that will
never be asked to do anything else, mostly because that's all it will
be capable of doing unless it has horsepower to burn, lots of it.
Ingo's RT patches allow me to do some playing around and driver development
on this box for that application with a reasonable expectation that
it will work on that box when I haul the code down there and
recompile it there.
I've also noted that JACK users need this, as it's not very usable
without it, or wasn't half a year ago. I'm not a die-hard JACK user,
but I'd really like to import my movie camera without any dropped
frames, so I expect what fixes one will fix the other.
If, at the same time, it will give me back a keyboard when SA is
filtering the incoming mail on this machine, that's a huge plus. I'm
about to find that out as I just built the -07 version.
>> Here are a few links if you're interested:
>> http://www.rtai.org/modules.php?name=Content&pa=showpage&pid=1
>> http://marc.theaimsgroup.com/?l=linux-kernel&m=111634653913840&w=2
>>
>> Karim
>
>Esben
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
* Karim Yaghmour <[email protected]> wrote:
> I've visited these issues before. It all boils down to a simple
> question: is it worth making the kernel so much more complicated for
> such a minority when 90% of the problems encountered in the field
> revolve around the necessity of responding to an interrupt in a
> deterministic fashion?
>
> And for those 90% of cases, a simple hypervisor/nanokernel layer is
> good enough. For the remaining 10% of cases, that's where something
> like rt-preempt or RTAI becomes necessary. [...]
Just to make sure, by "much more complicated" are you referring to the
PREEMPT_RT feature? Right now PREEMPT_RT consists of 8000 new lines of
code (of which 2000 are debugging code) and 2000 lines of modified kernel
code. One of the primary goals I had was to keep it simple and robust.
That's more than 3 times smaller than UML, and it's almost an order of
magnitude smaller than a nanokernel codebase I just checked, and it
boots/works on just about everything where the stock Linux kernel boots.
I challenge you to write a nanokernel/hypervisor with a comparable
feature-set in that many lines of code.
Anyway, as always, the devil is in the details. I certainly don't suggest
that nanokernels/hypervisors are not nice (to the contrary!), or that
all component technologies of the -RT patchset will be accepted into
Linux. PREEMPT_RT started out as an experiment to reduce scheduling
latencies within the constraints of the Linux kernel. It is not an
all-or-nothing feature, it's more of a collection of incremental
patches. It is one, nonexclusive way of doing things.
Ingo
K.R. Foley wrote:
>
> There are definitely those who would prefer to have the functionality,
> at least as an option, in the mainline kernel. The group that I contract
> for gets heartburn about having to patch every kernel running on every
> development workstation and every production system. We need hard RT,
> but currently when we have to have hard RT we go with a different
> product.
Well, yes. There are lots of things Linux isn't suited for.
There are likewise a lot of patches that SGI would love to
get into the kernel so it runs better on their 500+ CPU
systems. My point was just that a new functionality/feature
doesn't by itself justify being included in the kernel.org
kernel.
> Another thing that some of us want/need is a hard real-time
> Linux that doesn't create the segregation that most of these specialized
> products create. Currently there are damn few choices for real POSIX
> application development with hard RT requirements running in a Unix
> environment.
>
Maybe there are damn few because it is hard to get right within
the framework of a general POSIX environment. Or maybe it's
because it has a comparatively small userbase (compared to, say,
mid/small servers and desktops). Neither of which is a completely
invalid reason against its inclusion in Linux.
But I want to be clear that I haven't read or thought about the
code in question too much, and I don't have any opinions on it
yet. So please nobody involve me in a flamewar about it :)
Nick
Nick Piggin wrote:
> K.R. Foley wrote:
>
>>
>> There are definitely those who would prefer to have the functionality,
>> at least as an option, in the mainline kernel. The group that I contract
>> for gets heartburn about having to patch every kernel running on every
>> development workstation and every production system. We need hard RT,
>> but currently when we have to have hard RT we go with a different
>> product.
>
>
> Well, yes. There are lots of things Linux isn't suited for.
> There are likewise a lot of patches that SGI would love to
> get into the kernel so it runs better on their 500+ CPU
> systems. My point was just that a new functionality/feature
> doesn't by itself justify being included in the kernel.org
> kernel.
Agreed. Maybe the Linux kernel can't be all things to all of us, even as
configuration options. I am certainly not the one who is going to make
that decision either. I just wanted to voice my opinion from a
user/developer perspective.
>
>> Another thing that some of us want/need is a hard real-time
>> Linux that doesn't create the segregation that most of these specialized
>> products create. Currently there are damn few choices for real POSIX
>> application development with hard RT requirements running in a Unix
>> environment.
>>
>
> Maybe there are damn few because it is hard to get right within
> the framework of a general POSIX environment. Or maybe it's
> because it has a comparatively small userbase (compared to, say,
> mid/small servers and desktops). Neither of which is a completely
> invalid reason against its inclusion in Linux.
>
> But I want to be clear that I haven't read or thought about the
> code in question too much, and I don't have any opinions on it
> yet. So please nobody involve me in a flamewar about it :)
>
> Nick
>
>
And please don't misunderstand my statements as trying start a flamewar
either. :-)
--
kr
* Daniel Walker <[email protected]> wrote:
> Ouch... Let me disclaim my email: I'm writing for me and no one else.
> I'm just a sponsored kernel hacker... Are you worried about Red Hat
> products?
Nope, what I was referring to was the CGL distro mvista.com recently
announced (released?). My bad if it's not your worry ... :)
Ingo
On Tue, 2005-05-24 at 10:15 +0200, Ingo Molnar wrote:
> So it's well worth the effort, but there's no hurry and all the changes
> are incremental anyway. I can understand Daniel's desire for more action
> (he's got a product to worry about), but upstream isn't ready for this
> yet.
Ouch... Let me disclaim my email: I'm writing for me and no one else.
I'm just a sponsored kernel hacker... Are you worried about Red Hat
products?
The main reason for my email is that I know Andrew and Linus don't want
interrupts in threads. Without that there is no PREEMPT_RT. If you want
to narrow the discussion to just interrupts in threads, that's fine with
me, because that's what I'm concerned about.
There has been some version of interrupts in threads running around for
almost a year now. To me that's mature enough to be "unstable".
Daniel
* Nick Piggin <[email protected]> wrote:
> Well, yes. There are lots of things Linux isn't suited for. There are
> likewise a lot of patches that SGI would love to get into the kernel
> so it runs better on their 500+ CPU systems. [...]
This reminds me: PREEMPT_RT found a handful of SMP races that not even
100+ CPU systems triggered in any deterministic way.
(I have mentioned this before but it seems worth repeating: the
preemption model of PREEMPT_RT is similar to an SMP Linux kernel running
on a system that has an 'infinite' number of CPUs. Each task can be
thought of as having its own separate CPU - and SMP-like instruction
overlap can happen at any instruction boundary.)
So the very small meets (and helps) the very large in interesting ways.
PREEMPT_RT very much depends on a good SMP implementation and on a good
CONFIG_PREEMPT implementation. The synergies are much wider than just
enabling deterministic behavior in embedded systems.
Ingo
Ingo Molnar wrote:
> Just to make sure, by "much more complicated" are you referring to the
> PREEMPT_RT feature? Right now PREEMPT_RT consists of 8000 new lines of
> code (of which 2000 are debugging code) and 2000 lines of modified kernel
> code. One of the primary goals I had was to keep it simple and robust.
I'm referring to the complexity of the behavior. Turning interrupts into
threads and spinlocks into mutexes makes vanilla Linux's behavior much
more complicated than it already is. But before a bunch of mouth-foaming
rugby players tackle me to the ground, please keep in mind that this is
my assessment of things. Others have claimed that they are perfectly
fine with this ...
> That's more than 3 times smaller than UML, and it's almost an order of
> magnitude smaller than a nanokernel codebase I just checked, and it
> boots/works on just about everything where the stock Linux kernel boots.
> I challenge you to write a nanokernel/hypervisor with a comparable
> feature-set in that many lines of code.
No challenge needed; I'm not referring to codebase size. Not to mention that
I'm not even going to get near claiming to know the kernel's guts anywhere
near as well as you do :)
At the risk of comparing apples to oranges:
$ ll adeos-linux-2.6.11-i386-r10c3.patch realtime-preempt-2.6.12-rc4-V0.7.47-07
-rw-rw-r-- 1 karim karim 195105 May 24 10:19 adeos-linux-2.6.11-i386-r10c3.patch
-rw-rw-r-- 1 karim karim 610509 May 24 00:14 realtime-preempt-2.6.12-rc4-V0.7.47-07
> Anyway, as always, the devil is in the details. I certainly don't suggest
> that nanokernels/hypervisors are not nice (to the contrary!), or that
> all component technologies of the -RT patchset will be accepted into
> Linux. PREEMPT_RT started out as an experiment to reduce scheduling
> latencies within the constraints of the Linux kernel. It is not an
> all-or-nothing feature, it's more of a collection of incremental
> patches. It is one, nonexclusive way of doing things.
Here's a quote from a previous posting back in October:
> development pace. Let's face it, no self-respecting effort that has
> ever labeled itself as wanting to provide "hard real-time Linux"
> has been active on the LKML on the same level as Ingo (though many
> have concentrated a lot of effort and talent on other lists.)
Clearly I recognize the work you have accomplished, and you are correct
in stating that the approach is nonexclusive. If the patch does indeed
make it into the kernel, then so be it. It's worth considering though
that there are other methods which can provide hard-rt without increasing
the kernel's complexity even when enabled; the most basic of which would
be turning the interrupt-handling/disable to function pointers. At the
next level, you could have something like the interrupt pipeline from
adeos on top (possibly as a loadable module), and at a third level you
could have something like RTAI/fusion (as additional loadable modules) ...
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
Daniel Walker wrote:
> The main reason for my email is that I know Andrew and Linus don't want
> interrupts in threads. Without that there is no PREEMPT_RT. If you want
> to narrow the discussion to just interrupts in threads, that's fine with
> me, because that's what I'm concerned about.
Again, what are the technical arguments against scheduled
interrupts that are causing these folks concern? All in all it is
just a different locking paradigm where interrupt-initiated
scheduling is moved closer to the metal.
-john
--
[email protected]
Esben Nielsen wrote:
> On Tue, 24 May 2005, Karim Yaghmour wrote:
>
>
>>Esben Nielsen wrote:
>>
>>>I find that a bad approach:
>>>1) You don't have RT in userspace.
>>>2) You can't use Linux drivers for standard hardware when you want it to
>>>be part of your deterministic RT application.
>>
>>Please have a look at RTAI/fusion. For the record, RTAI has been providing
>>hard-rt in standard Linux user-space for over 5 years now. With RTAI/Fusion
>>this gets even better as there isn't even a special API ...
>>
>
> The tests I have read (I can't remember the links, but it was on lwn.net)
> state that the worst-case latency is even worse than for a standard 2.6
> kernel!
You are likely talking about a benchmark conducted by Peter Laurich and
published by Linuxdevices, I guess. In fact, the benchmark code was
wrong, and has been amended by the author himself in a follow-up to the
initial article:
http://www.linuxdevices.com/articles/AT3479098230.html
The figures obtained with RTAI's LXRT extension are now correct, and not
surprisingly, LXRT is the best performer among the solutions aimed at
providing RT support in user-space in this study, with 48 us worst-case
scheduling latency under high load on an SBC based on a Celeron running
at 650 MHz, and populated with 512 MB SDRAM. PREEMPT_RT was not tested
by the benchmark, though.
>
> If you are going to make useful deterministic real-time in userspace you
> have to change stuff in kernel space and implement things like priority
> inheritance, priority ceiling or similar. It can only turn out to be an
> ugly hack which will end up being as intrusive in the kernel as Ingo's
> approach. If you don't do anything like that you cannot use _any_ Linux
> kernel resources from your RT processes, even though you have reimplemented
> the pthread library to know about the "super RT" priorities.
>
Indeed, this is why the RTAI project has an experimental branch called
"fusion", distinct from the classic LXRT one, which aims at a better
integration of the real-time services it provides into the common kernel
framework. The idea is to allow RTAI applications to be alternatively
controlled by the Linux kernel and the real-time nucleus, while keeping
the RT priority scheme intact across seamless and automatic transitions
between both.
If you think that PREEMPT_RT will be able to give you real-time
guarantees that are as stringent as those already exhibited by
additional nucleus/co-schedulers on any kind of hardware including the
embedded ones, then you will likely conclude that we are currently
chasing wild geese (and that we owe you one of them for Xmas). If not,
well, maybe considering a symbiotic approach between a fully preemptable
Linux kernel providing regular services and a specialized co-scheduler
providing extreme predictability should make some sense, given that both
co-operate to control a single set of real-time tasks in user-space.
> But I grant you: you will gain better interrupt latencies because
> interrupts are executed below Linux proper. I.e. when the Linux
> kernel runs with interrupts disabled, they are really enabled in the RTAI
> subsystem.
>
IIRC, interrupt latency has never been the toughest problem with the
vanilla Linux kernel with respect to predictability, but the scheduling
latency still is. Hence Ingo's work, not to speak of previous efforts
regarding preemptability, I guess.
> My estimate is that RTAI is good when you have a very small subsystem you
> need to run RT with very low latencies. For instance, controlling a fast
> device with limited hardware resources for buffering events.
> For large control systems I don't think it is the proper way to do it.
> There it is much better to run the control tasks as normal Linux
> user-space processes with RT-priority. I can see Ingo's kernel doing that;
> I can't see RTAI doing it except for very special situations where you
> don't make _any_ Linux system calls at all! You can't even use a
> normal Linux network device or character device from your RT application!
>
>
It happens that many if not most complex real-time
applications have varying requirements with respect to determinism,
depending on the task or execution context you consider. This
is why the split application model, involving some kernel modules which
implement the demanding RT stuff and some user-space programs connected
to them, has been used for years.
What the RTAI project is trying to do now with its fusion branch is to
accommodate a larger spectrum of RT requirements without killing the
design of your RT application as above. E.g. some tasks need to run at a
10 kHz period, whilst others can deal with ~100 us jitter under high load,
just for the purpose of being able to call regular Linux services; but
you want all of them running embodied in a single regular user-space
process, and being able to use GDB for chasing the gremlins in there.
For us, this implies making the most preemptable Linux kernel we can
find and the fusion nucleus share the same semantics and co-operate,
instead of blindly running side-by-side, so that RTAI eventually appears
as a native support provided by the Linux kernel to the real-time
application designers. Sometimes, this requires RTAI to impersonate some
vanilla system calls such as nanosleep(), so that you really have
micro-second level wakeups, with a little help of RTAI's integrated
oneshot timer. This also requires a bunch of headaches, coffee, and
machines going south, but that's nothing worth documenting in a README,
I guess.
--
Philippe.
On Tue, 2005-05-24 at 10:41 -0500, K.R. Foley wrote:
> Nick Piggin wrote:
> > K.R. Foley wrote:
> >
> >>
> >> There are definitely those who would prefer to have the functionality,
> >> at least as an option, in the mainline kernel. The group that I contract
> >> for gets heartburn about having to patch every kernel running on every
> >> development workstation and every production system. We need hard RT,
> >> but currently when we have to have hard RT we go with a different
> >> product.
> >
> >
> > Well, yes. There are lots of things Linux isn't suited for.
> > There are likewise a lot of patches that SGI would love to
> > get into the kernel so it runs better on their 500+ CPU
> > systems. My point was just that a new functionality/feature
> > doesn't by itself justify being included in the kernel.org
> > kernel.
>
> Agreed. Maybe the Linux kernel can't be all things to all of us, even as
> configuration options. I am certainly not the one who is going to make
> that decision either. I just wanted to voice my opinion from a
> user/developer perspective.
I disagree... The perspective I got from Andrew Morton was that if
enough people want a feature it will/should go in. I agree with that. If
a new feature is added, it just makes a larger download (as long as
it's a configure option). I don't see a downside.
Daniel
On Tue, 2005-05-24 at 10:15 +0200, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
> > Of course this is weighed against the improvements added to the
> > kernel. I'm personally not too clear on what those improvements are; a
> > bit better soft-realtime response? (I don't know) [...]
>
> What the -RT kernel (PREEMPT_RT) offers are guaranteed hard-realtime
> responses: ~15 usecs worst-case latency on a 2GHz Athlon64, on arbitrary
> (SCHED_OTHER) workloads. (I.e. I've measured such worst-case latencies
> when running 1000 hackbench tasks, when swapping the box to death, or
> when running 40 parallel copies of the LTP testsuite.)
I wouldn't start making guarantees yet... For instance, printk can hold
off interrupts for unknown periods (unknown to me anyway) depending on
the size of the strings that it prints.
Daniel
On Tue, 2005-05-24 at 19:14 +1000, Nick Piggin wrote:
>
> Well, from Daniel's message it seemed like things were not quite so far
> along as you say.
I think Ingo is just confident that in time things will get merged. I
know that there are some people who don't want/like the RT changes. I'm
interested to know what people's objections are. So far in this thread
the only objection that I feel is valid is the added complexity the
RT patch would add to Linux.
Daniel
Karim wrote:
>
> Ingo Molnar wrote:
> > Just to make sure, by "much more complicated" are you referring to the
> > PREEMPT_RT feature? Right now PREEMPT_RT consists of 8000 new lines of
> > code (of which 2000 are debugging code) and 2000 lines of modified
> > kernel code. One of the primary goals I had was to keep it simple and
> > robust.
>
> I'm referring to the complexity of the behavior. Turning
> interrupts into threads and spinlocks into mutexes makes vanilla
> Linux's behavior much more complicated than it already is.
>
Linux has been distributing and decoupling locking and
data structures since the first multi-CPU kernel was booted.
All data integrity is just as consistent in RT, so HOW does
the behavior change?
SMP is mainstream now (Pentium IV, to start).
The kernel development is just taking the logical next step.
RT is even eco-friendly, if you can bear it, preventing
high-powered CPUs from burning megawatts spinning on
a bit in memory. Watch the temperature spikes when that
happens.
Basically, the reality is that software load can always
exceed the given hardware, for any system. If you want
some things to always work smoothly, you need to have some
way to bound response time and prioritize deterministically.
The more the computer becomes an entertainment device in the
mainstream (ahem, iPod), the more this will be an opportunity
for Linux. People are of course running Linux on their iPods
already. But - can it play the music without skipping?
With RT it CAN.
Also keep in mind the time-critical response requirements of
multimedia systems. It's not just Linux in embedded devices.
It's going to be Linux in your TV some day soon (or already).
Take a look at all the big Sony TVs. All MontaVista Linux.
But Linux is somewhat behind in a lot of this technology,
as pointed out by others. IRQ threads are not radical, untested
new technology. Nor is a mutex, priority inheritance or not.
Linux is consistent with the Unix legacy - resource sharing,
fairness, progress. All good things, endemic to the evolution
of Linux. But the other Unixes have moved past that - to keep up.
Most of the Unixes are clustering back-room systems now, but some
are still foraging alongside the north-western American
Tyrannosaurus. They are evolving, and trying not to get chomped.
The pressure is going to increase; the question is, do
we lead, or do we follow and pay, wherever they want to
take us today?
Basically this technology could go into the kernel, to quote Ingo,
as "no-drag". You turn it off and it goes away, no overhead.
No pain, no worries, no stress, no flaming.
And Linux leads the way, and the multimedia / audio folks are happy,
able to push open source further, opening the door for more folks to
contribute, best they know how.
There is absolutely nothing, btw., in any of the
sub-kernels, patented or not, that can't be done in Linux.
Sven
Sven Dietrich wrote:
> Linux has been distributing and decoupling locking and
> data structures since the first multi-CPU kernel was booted.
>
> All data integrity is just as consistent in RT, so HOW does
> the behavior change?
Here's quoting Arjan from another thread just today:
> PREEMPT was (and is?) a stability risk and so you'll see RHEL4 not
> having it enabled.
... and that's for simple preemption ...
Beyond that, I must admit that I'm probably missing the point of
your question: fact is that running interrupt handlers as threads
!= dealing with interrupts in a linear fashion (as is done now). That's
a behavior change right there, to mention just that.
> The more the computer becomes an entertainment device in the
> mainstream (ahem, iPod), the more this will be an opportunity
> for Linux. People are of course running Linux on their iPods
> already. But - can it play the music without skipping?
Like I said, there are many paths that lead to the same result.
> With RT it CAN.
Sure. There's a nice thread about just that topic (using rt,
as in Adeos, to get skipless audio) following the release of
Adeos back in june 2002.
> The pressure is going to increase; the question is, do
> we lead, or do we follow and pay, wherever they want to
> take us today?
To the best of my understanding, support for a given feature in
other unices has not necessarily resulted in it being included
in Linux. sys_clone() is a good example. Frankly, though, I
would rather not get into such a philosophical debate. My point
is that I personally (and it seems others too) feel that the
cost/benefit ratio plays favorably for a lightweight rt solution.
> Basically this technology could go into the kernel, to quote Ingo,
> as "no-drag". You turn it off and it goes away, no overhead.
> No pain, no worries, no stress, no flaming.
Sure, but the same could be said for a lot of different things.
But like I said in my mail to Ingo, the approaches discussed
here are not mutually exclusive.
> There is absolutely nothing, btw., in any of the
> sub-kernels, patented or not, that can't be done in Linux.
If you look in the archives, you'll actually find me making an
almost identical statement. And like I said back then, the
question isn't whether Linux can become QNX, but whether this
is a desirable goal. And I'll stop here, simply because that's
not up to me to decide. I've taken enough bandwidth as it is
on this thread, and I frankly don't think that any of what I
said above has added any more information for those who've
read my previous postings. I only got into this thread to point
out that some info about RTAI was wrong. So like I told Ingo,
if rt-preempt gets in, then so be it.
<repeating-myself>
From my POV, it just seems that it's worth asking a basic
question: what is the least intrusive modification to the Linux
kernel that will allow obtaining hard-rt and what mechanisms
can we or can we not build on that modification? Again, my
answer to this question doesn't matter, it's the development
crowd's collective answer that matters. And in championing
the hypervisor/nanokernel path, I could turn out to be horribly
wrong. At this stage, though, I'm yet unconvinced of the
necessity of anything but the most basic kernel changes (as
in using function pointers for the interrupt control path,
which could be a CONFIG_ also).
Much as you see the deployment of Linux in various products
by virtue of working for a distro vendor, so do I encounter
quite a few embedded developers through various venues like
hands-on classes I teach, and in almost every case of hard-
rt problem with Linux I've had submitted to me, there is a
way of getting what is needed with what something like a
nanokernel. Sure, there are cases where it isn't enough and
where something like rt-preempt or RTAI or whatever would
be best, but from my experience this is not the norm.
Again, this is all a question of perspective. I'm basing my
analysis on my personal experience. Yours may dictate an
entirely different path, and again, I could be completely
wrong. We may also end up somewhere in between:
There is, in fact, nothing precluding rt-preemption from
co-existing with a nanokernel.
</repeating-myself>
Cheers,
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Tue, May 24, 2005 at 11:05:53AM -0700, Daniel Walker wrote:
> I think Ingo is just confident that in time things will get merged. I
> know that there are some people who don't want/like the RT changes. I'm
> interested to know what people's objections are. So far in this thread
> the only objection that I feel is valid is the added complexity the
> RT patch would add to Linux.
There's a process to this that has to happen before inclusion. Ingo
outlined that earlier. This patch isn't terribly well known and really
needs to be hammered much harder by a larger community to trigger
breakage.
I think there's a lot of general ignorance regarding this patch and
its usefulness, and this thread is partially addressing that.
bill
Karim wrote:
> Sven Dietrich wrote:
> > Linux has been distributing and decoupling locking and
> > data structures since the first multi-CPU kernel was booted.
> >
> > All data integrity is just as consistent in RT, so HOW does
> > the behavior change?
>
> Here's quoting Arjan from another thread just today:
> > PREEMPT was (and is?) a stability risk and so you'll see RHEL4 not
> > having it enabled.
>
> ... and that's for simple preemption ...
>
Any stability risk in PREEMPT is a stability risk in SMP.
I've been looking at some form of this RT code for over a year now,
and 99% of failures are buggy / dirty code that could fail under SMP
as well, if not worse. We've pushed all that back to Ingo and elsewhere.
And the bugs keep on coming. This is the fast-track to exposing
concurrency problems all over the place, which makes better code
in Linux.
The RT stuff has the potential to enhance SMP scalability.
> Beyond that, I must admit that I'm probably missing the point
> of your question: fact is that running interrupt handlers as
> threads != dealing with interrupts in a linear fashion (as is
> done now). That's a behavior change right there, to mention just that.
>
Linear is maybe not a good term to use. Do you mean the order in which
IRQs execute, or just that you execute IRQ code in process context?
A good generic IRQ handler that can run on multiple architectures
doesn't care what the CPU flags are.
If it does have to be so down and dirty, it probably doesn't run
as a thread, and it's likely SA, as well.
If it's about the order in which pending IRQs are processed, then let's take
a leap and ask why that matters. The IRQs couldn't care less what
order they are processed in, if they are truly asynchronous / sporadic.
After all, how would the system know which IRQ arrived first to begin with,
when you happen to enable IRQs somewhere, and there happen to be 3 IRQs
pending?
>
> There is, in fact, nothing precluding rt-preemption from
> co-existing with a nanokernel. </repeating-myself>
>
Except complexity, as the performance differential between the
Linux kernel and the nanokernel vanishes.
On Wed, May 25, 2005 at 01:21:24AM +1000, Nick Piggin wrote:
> Maybe there are damn few because it is hard to get right within
> the framework of a general POSIX environment. Or maybe it's
> because it has a comparatively small userbase (compared to, say,
> mid/small servers and desktops). Neither of which is a completely
> invalid reason against its inclusion in Linux.
Rest assured RTOS companies are going to flip when this gets
to the mainstream. You might not be aware of this, but this will
probably end up being the single most important patch in the
software industry when it hits the mainstream. There's been an
over-emphasis on desktop/server and a really persistent lack
of understanding of the embedded market and the growth within it.
One of the main reasons why Linux isn't used for more embedded
systems is the lack of hard RT abilities. With this patch, it
has a chance to hit a large section of future consumer devices,
audio/video, with the performance of frame-accurate SGI IRIX
boxes. SGI XFS supports some of these features but is
underutilized at this moment, partially as a side effect of poor
Linux latency performance. The more app folks start to use this,
the more various subcomponents will get fixed.
There's a lot of bad X11 and other app code out there in
userspace.
There's active work on getting the userspace mutex stuff
done - robust mutexes - so that the RT app folks can have a much
more solid development environment for traditional RT applications.
However, traditional applications aren't where the excitement
is.
The non-traditional stuff, SGI and friends, is really where I
have my interests, because I'm sick of jumpy/droppy media
applications. Current schedulers, particularly the priority-based
ones, aren't suitable for temporally partitioning runaway
applications from other apps. Introducing the RT patch, so that
the scheduler has high-resolution material to control, along with
an appropriate frame-driven scheduler, can fulfill this need.
It is in a lot of ways like introducing multitasking OSes for
the first time to people who are used to MS-DOS. Multimedia apps
tended to collide with each other in that regard.
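To see why a plain priority scheduler can't temporally partition a
runaway app, consider this intentionally pathological userspace sketch;
on a uniprocessor box the busy loop starves every lower-priority task
until it's killed (the priority value 50 is arbitrary):

    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
            struct sched_param p = { .sched_priority = 50 };

            /* needs root (or RT privileges) to succeed */
            if (sched_setscheduler(0, SCHED_FIFO, &p) == -1) {
                    perror("sched_setscheduler");
                    return 1;
            }
            /* runaway: SCHED_FIFO never yields to lower priorities,
               so nothing below us runs until we block or are killed */
            for (;;)
                    ;
            return 0;   /* never reached */
    }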
With this, Linux has a strong possibility of becoming a top gaming/multimedia
platform almost overnight, with retuned apps and such. Microsoft
can't do anything about it when this happens, and we're so close
right now to pulling it off. Having a fully preemptive core is
the precondition for all of this work.
> But I want to be clear that I haven't read or thought about the
> code in question too much, and I don't have any opinions on it
> yet. So please nobody involve me in a flamewar about it :)
The code is pretty mild for the most part and fixes a lot of
preexisting hacks (spin-waiting in drivers) in the kernel. If
anything it's triggered some preexisting bugs and stressed the
system in ways that would still be difficult to trigger in SMP
scenarios.
It's not terribly intrusive, nor does it alter the basic
concurrency structures of the kernel. If you'd like me to, I can
explain this patch to you in private. :) I was working on a
parallel project to Ingo's patches and got pretty far, but have
abandoned it for Ingo's stuff since he knows the kernel much
better than I do, fixes things faster, etc...
bill
On Tue, 2005-05-24 at 18:31 -0400, Karim Yaghmour wrote:
> I've taken enough bandwidth as it is
> on this thread, and I frankly don't think that any of what I
> said above has added any more information for those who've
> read my previous postings.
On Tue, 2005-05-24 at 15:41 -0700, Bill Huey wrote:
> On Tue, May 24, 2005 at 11:05:53AM -0700, Daniel Walker wrote:
> > I think Ingo is just confident that in time things will get merged. I
> > know that there are some people who don't want/like the RT changes. I'm
> > interested to know what people's objections are. So far in this thread
> > the only objection that I feel is valid is the added complexity the
> > RT patch would add to Linux.
>
> There's a process to this that has to happen before inclusion. Ingo
> outlined that earlier. This patch isn't terribly well known and really
> needs to be hammered much harder by a larger community to trigger
> breakage.
That's a good reason why it should be included. The maintainers know
that as developers there is no way for us to flush out all the bugs in
our code by ourselves. If the RT patch was added to -mm it would get
greatly increased coverage, which, as you noted, is needed. Drivers
will break like mad, but no one but the community has all the hardware
for the drivers.
> I think there's a lot of general ignorance regarding this patch, the
> usefulness of it and this thread is partially addressing them.
True ..
Daniel
On Tue, May 24, 2005 at 06:31:41PM -0400, Karim Yaghmour wrote:
> <repeating-myself>
> From my POV, it just seems that it's worth asking a basic
> question: what is the least intrusive modification to the Linux
> kernel that will allow obtaining hard-rt and what mechanisms
> can we or can we not build on that modification? Again, my
> answer to this question doesn't matter, it's the development
> crowd's collective answer that matters. And in championing
> the hypervisor/nanokernel path, I could turn out to be horribly
> wrong. At this stage, though, I'm yet unconvinced of the
> necessity of anything but the most basic kernel changes (as
> in using function pointers for the interrupt control path,
> which could be a CONFIG_ also).
I know what you're saying, and it's kind of unaddressed by various
folks in this discussion.
When I think of the advantages of a single- over a dual-image kernel
system, I think of it in terms of how I'm going to implement QoS.
If I need to get access to a special TCP/IP socket in real time
with strong determinacy, you run into the problem of crossing two
kernel concurrency domains, one preemptible and one not, with a dual
kernel system, and have to use queues or other things to
communicate with it. Even with lockless structures, you're still
subject to latency in the Linux kernel personality if you have
some kind of preexisting app that's already running in an atomic
critical section holding non-preemptive spinlocks.
However, this is not RTAI as I understand it, since RTAI can run N
images, one for each RT task (correct?)
Having multiple images helps out, but fails in scenarios where
you have to have tight data coupling. I have to think about things
like dcache_lock, route tables, access to various IO systems like
SCSI and TCP/IP, etc...
A single system image makes access to this direct, unlike a dual kernel
system where you need some kind of communication coupling. Resource
access is direct. Modifying large-grained subsystems in the kernel
is also direct. As preexisting multimedia apps use more RT facilities,
apps are going to need something more like a general purpose OS to make
development easier. These aren't traditional RT apps at all, but
still require hard RT response times. Keep in mind media apps use
the screen, X11, audio device(s), IDE/SCSI for streaming, networking,
etc... It's a comprehensive use of many of the facilities of the kernel,
unlike traditional RT apps.
Now, this doesn't necessarily replace RTAI, but a preemptive Linux
kernel can live as a first-class citizen to RTAI. I've been thinking
about merging some of the RTAI scheduler stuff into the RT patch.
Uber-preemption Linux doesn't have a sophisticated userspace yet,
and here RTAI clearly wins: no decent RT signal handling, etc...
There are other problems with it and the current implementation.
This is going to take time to sort out, so RTAI still wins at this
point.
I hope I addressed this properly, but that's the point of view
I'm coming from.
bill
On Tue, 2005-05-24 at 18:31 -0400, Karim Yaghmour wrote:
> I've taken enough bandwidth as it is
> on this thread, and I frankly don't think that any of what I
> said above has added any more information for those who've
> read my previous postings. I only got into this thread to point
> out that some info about RTAI was wrong. So like I told Ingo,
> if rt-preempt gets in, then so be it.
Here's my favorite excerpt:
On Sat, 2004-10-09 at 16:11, Karim Yaghmour wrote:
> And this has been demonstrated mathematically/algorithmically to be
> true 100% of the time, regardless of the load and the driver set? IOW,
> if I was building an automated industrial saw (based on a VP+IRQ-thread
> kernel or a combination of the above-mentioned aggregate) with a
> safety mechanism that depended on the kernel's responsiveness to
> outside events to avoid bodily harm, would you be willing to put your
> hand beneath it?
Maybe -RT should be merged when Ingo puts his hand under the saw.
Lee
On Tue, May 24, 2005 at 04:44:04PM -0700, Daniel Walker wrote:
> That's a good reason why it should be included. The maintainers know
> that as developers there is no way for us to flush out all the bugs in
> our code by ourselves. If the RT patch was added to -mm it would get
> greatly increased coverage, which, as you noted, is needed. Drivers
> will break like mad, but no one but the community has all the hardware
> for the drivers.
It's too premature at this time. There was a lot of work that went
into the RT patch that I would have liked for folks to have thought
through more carefully, like RCU, the RT mutex itself, etc...
All of it is very raw and most likely still subject to rapid
change.
It conflicts with the sched domain and RCU changes at this time,
so integration with -mm is highly problematic. -mm is too massive
as is for anything like the RT patch to go in. I've already tried
merging these trees using Monotone as my backing SCM and came
to this conclusion.
I consider the RT patch to be for front-line folks only at this
time. Give it another 6 months or so, since people are having enough
problems with 2.6.11.x.
bill
On Tue, 2005-05-24 at 19:22 +0200, Philippe Gerum wrote:
> In fact, the benchmark code was
> wrong, and has been amended by the author himself in a follow-up to
> the
> initial article
>
> http://www.linuxdevices.com/articles/AT3479098230.html
printf() from an RT thread is a serious beginner mistake, and the author
does not seem to understand why. He blames the problem on a change in
RTAI's behavior.
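The right pattern is to have the RT thread write into a preallocated
ring and let a non-RT thread do the actual printf(). A minimal sketch
(my own illustration, not code from RTAI or the article; the
volatile-only indices are fine for a uniprocessor sketch, but a real
SMP version needs memory barriers):

    #include <stdio.h>
    #include <string.h>

    #define LOG_SLOTS 256

    static char log_ring[LOG_SLOTS][64];
    static volatile unsigned log_head, log_tail;

    /* called from the RT thread: never blocks, drops on overflow */
    static void rt_log(const char *msg)
    {
            unsigned h = log_head;

            if (h - log_tail < LOG_SLOTS) {
                    strncpy(log_ring[h % LOG_SLOTS], msg, 63);
                    log_ring[h % LOG_SLOTS][63] = '\0';
                    log_head = h + 1;
            }
    }

    /* called from an ordinary (non-RT) thread: free to block in stdio */
    static void drain_log(void)
    {
            while (log_tail != log_head) {
                    printf("%s\n", log_ring[log_tail % LOG_SLOTS]);
                    log_tail++;
            }
    }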
Lee
Bill Huey (hui) wrote:
> On Tue, May 24, 2005 at 04:44:04PM -0700, Daniel Walker wrote:
>
>>That's a good reason why it should be included. The maintainers know
>>that as developers there is no way for us to flush out all the bugs in
>>our code by ourselves. If the RT patch was added to -mm it would get
>>greatly increased coverage, which, as you noted, is needed. Drivers
>>will break like mad, but no one but the community has all the hardware
>>for the drivers.
>
>
> It's too premature at this time. There was a lot of work that went
> into the RT patch that I would have liked for folks to have thought
> through more carefully, like RCU, the RT mutex itself, etc...
> All of it is very raw and most likely still subject to rapid
> change.
>
> It conflicts with the sched domain and RCU changes at this time,
> so integration with -mm is highly problematic. -mm is too massive
> as is for anything like the RT patch to go in. I've already tried
> merging these trees using Monotone as my backing SCM and came
> to this conclusion.
>
> I consider the RT patch to be for front-line folks only at this
> time. Give it another 6 months or so, since people are having enough
> problems with 2.6.11.x.
>
> bill
The only question I would ask of you is this: what will be different in
6 months? The patch may be a bit different, it may be a lot different.
However, it probably won't be that much more wrung out than it is today
until more people start beating on it. This probably won't happen until
it is merged. :-)
--
kr
On Tue, 2005-05-24 at 19:32 -0500, K.R. Foley wrote:
> The only question I would ask of you is this: what will be different in
> 6 months? The patch may be a bit different, it may be a lot different.
> However, it probably won't be that much more wrung out than it is today
> until more people start beating on it. This probably won't happen until
> it is merged. :-)
>
All of the Linux audio oriented distributions are already shipping -RT
kernels, and most of the serious Linux audio users who use general
purpose distros are running it. That's a few thousand people running it
24/7 for months, and it's been at least a month since any of these users
found a real bug in -RT.
Lee
On Tue, 2005-05-24 at 17:10 -0700, Bill Huey wrote:
> On Tue, May 24, 2005 at 04:44:04PM -0700, Daniel Walker wrote:
> > That's a good reason why it should be included. The maintainers know
> > that as developers there is no way for us to flush out all the bugs in
> > our code by ourselves. If the RT patch was added to -mm it would get
> > greatly increased coverage, which, as you noted, is needed. Drivers
> > will break like mad, but no one but the community has all the hardware
> > for the drivers.
>
> It's too premature at this time. There was a lot of work that went
> into the RT patch that I would have liked for folks to have thought
> through more carefully, like RCU, the RT mutex itself, etc...
> All of it is very raw and most likely still subject to rapid
> change.
>
I think some of it is volatile still, but there are plenty of pieces
that could go in now. Threaded interrupts are up for discussion; this is
the reason why I started the thread. People appear to have specific
objections to that feature, which are still not clear.
Whole patch, no; small chunks, yes.
Daniel
Lee Revell wrote:
> Apologies to anyone who got blank/bizarre messages from me, I just found
> out they are due to this bug:
>
> https://bugzilla.ubuntu.com/show_bug.cgi?id=10942
For a moment there I thought you were trying to make a point and
decided to put your own hand under the saw ... phew ;)
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Tue, May 24, 2005 at 05:45:13PM -0700, Daniel Walker wrote:
> I think some of it is volatile still, but there are plenty of pieces
> that could go in now. Threaded interrupts are up for discussion; this is
> the reason why I started the thread. People appear to have specific
> objections to that feature, which are still not clear.
>
> Whole patch, no; small chunks, yes.
You should have CCed Andrew originally.
All objections I've seen so far have been vague, from folks that
don't (or refuse to) understand the fundamentals of how this patch
works and haven't tracked its development carefully. Until
specific questions and objections are articulated, nothing can be
addressed at this time. I'm biased toward ignoring all talk until
then.
bill
On Tue, May 24, 2005 at 05:59:42PM -0700, Bill Huey wrote:
> You should have CCed Andrew originally.
Sorry, you did. :)
I thought I saw something else instead. My bad.
bill
On Tue, 2005-05-24 at 17:59 -0700, Bill Huey wrote:
> On Tue, May 24, 2005 at 05:45:13PM -0700, Daniel Walker wrote:
> > I think some of it is volatile still, but there are plenty of pieces
> > that could go in now. Threaded interrupts are up for discussion; this is
> > the reason why I started the thread. People appear to have specific
> > objections to that feature, which are still not clear.
> >
> > Whole patch, no; small chunks, yes.
>
> You should have CCed Andrew originally.
He was CC'd on the very first email. It could be that he isn't going to
get involved because Ingo already said he's not submitting anything ..
> All objections I've seen so far have been vague, from folks that
> don't (or refuse to) understand the fundamentals of how this patch
> works and haven't tracked its development carefully. Until
> specific questions and objections are articulated, nothing can be
> addressed at this time. I'm biased toward ignoring all talk until
> then.
I'm not going to ignore any of the discussion, but it would be nice to
hear Andrew's, or Linus's specific objections..
Daniel
Lee Revell wrote:
> On Tue, 2005-05-24 at 19:32 -0500, K.R. Foley wrote:
>
>>The only question I would ask of you is this: What will be different in
>>6 months? The patch may be a bit different, it may be a lot different.
>>However, it probably won't be that much more wrung out than it is today
>>until more people start beating on it. This probably won't happen until
>>it is merged. :-)
>>
>
>
> All of the Linux audio oriented distributions are already shipping -RT
> kernels, and most of the serious Linux audio users who use general
> purpose distros are running it. That's a few thousand people running it
> 24/7 for months, and it's been at least a month since any of these users
> found a real bug in -RT.
>
> Lee
This is good news. I thought I was the only one running this. :-D I am
really glad to hear that though. This still isn't the same kind of
numbers that we might get with a merge, but it is a good start.
--
kr
Daniel Walker <[email protected]> wrote:
>
> I'm not going to ignore any of the discussion, but it would be nice to
> hear Andrew's, or Linus's specific objections..
I have no specific objections - this all started out from my general
observation that things like process-context IRQ handlers and
priority-inheriting mutexes have had a tough reception in the past, and are
likely to do so in the future as well.
This thing will be discussed on a patch-by-patch basis. Contra this email
thread, we won't consider it from an all-or-nothing perspective.
(That being said, it's already a mighty task to decrypt your way through
the maze-like implementation of spin_lock(), lock_kernel(),
smp_processor_id() etc, etc. I really do wish there was some way we could
clean up/simplify that stuff before getting in and adding more source-level
complexity).
Andrew Morton wrote:
>Daniel Walker <[email protected]> wrote:
>
>
>>I'm not going to ignore any of the discussion, but it would be nice to
>> hear Andrew's, or Linus's specific objections..
>>
>>
>This thing will be discussed on a patch-by-patch basis. Contra this email
>thread, we won't consider it from an all-or-nothing perspective.
>
>(That being said, it's already a mighty task to decrypt your way through
>the maze-like implementation of spin_lock(), lock_kernel(),
>smp_processor_id() etc, etc. I really do wish there was some way we could
>clean up/simplify that stuff before getting in and adding more source-level
>complexity).
>
>
>
The IRQ threads are actually a separate implementation.
IRQ threads do not depend on mutexes, nor do they depend
on any of the more opaque general spinlock changes, so this
stuff SHOULD be separated out, to eliminate the confusion.
There was an original IRQ threads submission by
John Cooper/ TimeSys, about a year ago, which Ingo
subsequently rewrote.
The original MV RT-kernel contribution provided separate patches
for IRQ threads, based on Ingo's VP work.
Even Ingo's current IRQ thread implementation,
which provides a /proc interface to pop IRQs in and out of threads,
does not depend on any of the more complex RT-mutex related stuff.
And Ingo's IRQ threads implementation hasn't substantially
changed in close to a year now.
In that sense, Daniel's original query focuses on IRQ threads.
It's up to Ingo if he wants to break that out as a separate patch, again.
I think people would find their system responsiveness / tunability
goes up tremendously, if you drop just a few unimportant IRQs into
threads.
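For those who haven't looked at the patch, the shape of "an IRQ in a
thread" is roughly the following. This is a hand-written sketch with
made-up names (my_hard_handler, my_irq_task, ...), not code from the
patch; the hard handler only quiesces the device and defers the payload
to a schedulable kernel thread:

    #include <linux/interrupt.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>

    /* created at init time with kthread_run(my_irq_thread, ...) */
    static struct task_struct *my_irq_task;

    static irqreturn_t my_hard_handler(int irq, void *dev, struct pt_regs *regs)
    {
            /* mask/ack the device so the line stops asserting, then
               hand the real work to the thread */
            wake_up_process(my_irq_task);
            return IRQ_HANDLED;
    }

    static int my_irq_thread(void *data)
    {
            while (!kthread_should_stop()) {
                    set_current_state(TASK_INTERRUPTIBLE);
                    schedule();     /* a real version must close the
                                       wakeup-vs-sleep race, e.g. with a
                                       pending flag -- omitted here */
                    /* payload: may take sleeping locks, allocate with
                       GFP_KERNEL, etc. -- things a hard handler cannot do */
            }
            return 0;
    }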
As a logical prerequisite to the mutex stuff, the IRQ threads, if
broken out, could allow folks to test the water in the shallow end
of the pool. Give this technology some run-time, get everyone happy
with it, reduce the patch size for RT, and get the ARM folks, among
others, on generic IRQs; then let's deal with the other pieces of RT,
in a nicely overseeable fashion.
Sven
Esben Nielsen wrote:
> On Tue, 24 May 2005, Christoph Hellwig wrote:
>
> > On Mon, May 23, 2005 at 04:14:26PM -0700, Daniel Walker wrote:
> >
> > Personally I think interrupt threads, spinlocks as sleeping mutexes
> > and PI is something we should keep out of the kernel tree.
>
> A general threaded interrupt is not a good thing. Ingo made
> this to see how far he can press it. But having serial
> drivers running in interrupt is way overkill. Even network
> drivers can (provided they use DMA) run in interrupt without
> hurting the overall latencies. It all depends on the driver
> and how it interfaces with the rest of the kernel, especially
> what locks are shared and how long the lock are taken. If
> they are small enough, interrupt context and thus raw
> spinlocks are good enough. In general, I think each driver
> ought to be configurable: Either it runs in interrupt context
> or it runs in a thread. The locks have to be changed
> accordingly from raw spinlocks to mutexes.
>
You can run interrupts in threads without any mutex.
There is a /proc interface to switch an IRQ between threaded and
direct execution.
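Trying it is as simple as poking the per-IRQ /proc file, something
along these lines; note I'm writing the file name from memory and it
has changed between versions of the patch, so treat the path as an
assumption:

    #include <stdio.h>

    /* flip IRQ 'irq' into (1) or out of (0) a thread */
    static int set_irq_threading(int irq, int threaded)
    {
            char path[64];
            FILE *f;

            /* exact file name is an assumption -- check your tree */
            snprintf(path, sizeof(path), "/proc/irq/%d/threaded", irq);
            f = fopen(path, "w");
            if (!f)
                    return -1;
            fprintf(f, "%d\n", threaded);
            fclose(f);
            return 0;
    }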
Sven Dietrich <[email protected]> wrote:
>
> I think people would find their system responsiveness / tunability
> goes up tremendously, if you drop just a few unimportant IRQs into
> threads.
People cannot detect the difference between 1000usec and 50usec latencies,
so they aren't going to notice any changes in responsiveness at all.
On Tue, 2005-05-24 at 19:20 -0700, Andrew Morton wrote:
> Sven Dietrich <[email protected]> wrote:
> >
> > I think people would find their system responsiveness / tunability
> > goes up tremendously, if you drop just a few unimportant IRQs into
> > threads.
>
> People cannot detect the difference between 1000usec and 50usec latencies,
> so they aren't going to notice any changes in responsiveness at all.
The IDE IRQ handler can in fact run for several ms, which people sure
can detect.
Lee
Bill Huey (hui) wrote:
> I think there's a lot of general ignorance regarding this patch, the
> usefulness of it and this thread is partially addressing them.
Forgive the dumb question:
Why isn't anyone doing a presentation about Ingo's patch at the OLS
this year?
If you want to get this thing in front of people's eyes, this would
probably be the best venue. It would certainly be a good place to
get people talking about it. Explaining what's in the patch, how
it came to be, what are the interdependencies, modifications to
existing code, added core files, pros/cons, performance, actual
demo, etc.
Currently, looking at the listed presentations, apart from finding
myself thinking "hm..., I swear that guy did the same presentation
last year ... and maybe the year before", I can't see any entry
alluding to rt-preempt ... maybe I missed it?
Karim
>
> Bill Huey (hui) wrote:
> > I think there's a lot of general ignorance regarding this
> patch, the
> > usefulness of it and this thread is partially addressing them.
>
> Forgive the dumb question:
> Why isn't anyone doing a presentation about Ingo's patch at
> the OLS this year?
>
> If you want to get this thing in front of peoples' eyes, this
> would probably be the best venue. It would certainly be a
> good place to get people talking about it. Explaining what's
> in the patch, how it came to be, what are the
> interdependencies, modifications to existing code, added core
> files, pros/cons, performance, actual demo, etc.
>
> Currently, looking at the listed presentations, apart from
> finding myself thinking "hm..., I swear that guy did the same
> presentation last year ... and maybe the year before", I
> can't see any entry alluding to rt-preempt ... maybe I missed it?
>
I think it's too late to add a presentation there now,
but if folks are interested, I would be willing to talk about
it all day long.
Sven
On Tue, 2005-05-24 at 22:37 -0400, Karim Yaghmour wrote:
> Bill Huey (hui) wrote:
> > I think there's a lot of general ignorance regarding this patch, the
> > usefulness of it and this thread is partially addressing them.
>
> Forgive the dumb question:
> Why isn't anyone doing a presentation about Ingo's patch at the OLS
> this year?
Ssh! We're trying to sneak up on Microsoft...
Lee
Sven Dietrich wrote:
> I think its too late to add a presentation there now,
> but if folks are interested, I would be willing to talk about
> it all day long.
I can't speak for anyone, but given how important a change this
is, I would be surprised if the organizing committee wouldn't
try to make an effort to accommodate this. After all, it isn't
as if this was organized by outsiders; these folks are part of
the community, surely they are sensitive to the community's needs.
Again, I can't speak for anyone, but it would be worth querying
the appropriate authorities ... of course the sooner the better.
I don't know the deadlines, but surely between yourself, Bill,
Lee, Daniel and Ingo you would even be able to put a paper together
rather fast.
... anyway ... it's someone else's baby to carry not mine ... I
just thought I'd mention this idea I got ...
Karim
Lee Revell wrote:
>>Forgive the dumb question:
>>Why isn't anyone doing a presentation about Ingo's patch at the OLS
>>this year?
>
> Ssh! We're trying to sneak up on Microsoft...
:)
Seriously though, does this actually mean that there is a presentation
at OLS?
Karim
I just submitted a proposal.
If Ingo was planning on giving one, I'd defer.
Sven Dietrich wrote:
>>Forgive the dumb question:
>>Why isn't anyone doing a presentation about Ingo's patch at
>>the OLS this year?
> I think it's too late to add a presentation there now,
> but if folks are interested, I would be willing to talk about
> it all day long.
At the very minimum it should be possible to add a BOF session...
Chris
Lee Revell wrote:
>On Tue, 2005-05-24 at 19:20 -0700, Andrew Morton wrote:
>
>>Sven Dietrich <[email protected]> wrote:
>>
>>>I think people would find their system responsiveness / tunability
>>> goes up tremendously, if you drop just a few unimportant IRQs into
>>> threads.
>>>
>>People cannot detect the difference between 1000usec and 50usec latencies,
>>so they aren't going to notice any changes in responsiveness at all.
>>
>
>The IDE IRQ handler can in fact run for several ms, which people sure
>can detect.
>
>
Are you serious? Even at 10ms, the monitor refresh rate would have to be
over 100Hz for anyone to "notice" anything, right?... What sort of numbers
are you talking about when you say several?
On Tuesday 24 May 2005 22:20, Andrew Morton wrote:
>Sven Dietrich <[email protected]> wrote:
>> I think people would find their system responsiveness / tunability
>> goes up tremendously, if you drop just a few unimportant IRQs
>> into threads.
>
>People cannot detect the difference between 1000usec and 50usec
> latencies, so they aren't going to notice any changes in
> responsiveness at all.
Excuse me? 1 second (1000 usecs, 200 times your 50 usec example) is
VERY noticeable when you are listening to music, or worse yet, trying
to edit it. For much of that, submillisecond accuracy makes or
breaks the application.
Let's get out of the server-only camp here, folks; Linux is used for a
hell of a lot more than a home for Apache.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
> Lee Revell wrote:
>
> >On Tue, 2005-05-24 at 19:20 -0700, Andrew Morton wrote:
> >
> >>Sven Dietrich <[email protected]> wrote:
> >>
> >>>I think people would find their system responsiveness /
> tunability
> >>>goes up tremendously, if you drop just a few unimportant
> IRQs into
> >>>threads.
> >>>
> >>People cannot detect the difference between 1000usec and 50usec
> >>latencies, so they aren't going to notice any changes in
> >>responsiveness at all.
> >>
> >
> >The IDE IRQ handler can in fact run for several ms, which
> people sure
> >can detect.
> >
> >
>
> Are you serious? Even at 10ms, the monitor refresh rate would
> have to be over 100Hz for anyone to "notice" anything,
> right?... What sort of numbers are you talking when you say several?
>
Even without numbers, the IDE IRQ, when run in a thread,
competes with tasks at process level, so that other
tasks can make some progress. Especially if those tasks are
high priority.
With multiple disks on a chain, you can see transients that
lock up the CPU in IRQ mode for human-perceptible time,
especially on slower CPUs...
This is part of the reason why ksoftirqd exists: to act as
a governor for bottom halves that run over and over again.
ksoftirqd handles those bursty bottom halves in task space.
So with that, you already have bottom halves in threads.
Then we are just talking about the concept of running the
top half in a thread as well.
Maybe Lee will have some numbers handy...
On Tuesday 24 May 2005 19:46, Lee Revell wrote:
>On Tue, 2005-05-24 at 18:31 -0400, Karim Yaghmour wrote:
>> I've taken enough bandwidth as it is
>> on this thread, and I frankly don't think that any of what I
>> said above has added any more information for those who've
>> read my previous postings. I only got into this thread to point
>> out that some info about RTAI was wrong. So like I told Ingo,
>> if rt-preempt gets in, then so be it.
>
>Here's my favorite excerpt:
>
>On Sat, 2004-10-09 at 16:11, Karim Yaghmour wrote:
>> And this has been demonstrated mathematically/algorithmically to
>> be true 100% of the time, regardless of the load and the driver
>> set? IOW, if I was building an automated industrial saw (based on
>> a VP+IRQ-thread kernel or a combination of the above-mentioned
>> aggregate) with a safety mechanism that depended on the kernel's
>> responsiveness to outside events to avoid bodily harm, would you be
>> willing to put your hand beneath it?
>
>Maybe -RT should be merged when Ingo puts his hand under the saw.
>
>Lee
Off topic, sorry, can't resist.
Maybe so, Lee, but first we'd better check with the USPTO, as one of
the major table saw makers is actually selling a saw that you can
stick a wiener into while it's running, and a common bandaid can cover
the damage. The blade is stopped before the next tooth after the one
that initially contacts the wiener can come around and do more than
scratch it. The stop is a bit noisy, I assume, considering you
are stopping a blade turning 3k to 6k rpm in 1/4" of linear motion
at the rim of the blade, and rather expensive; ISTR a $400 option in
their top of the line 10 inch table saws. Because of the larger
components, the 14" saw carries over a $1k premium for that option.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
Sven Dietrich wrote:
> Andrew Morton wrote:
>
>> Daniel Walker <[email protected]> wrote:
>>
>>
>>> I'm not going to ignore any of the discussion, but it would be nice to
>>> hear Andrew's, or Linus's specific objections..
>>>
>>
>> This thing will be discussed on a patch-by-patch basis. Contra this
>> email
>> thread, we won't consider it from an all-or-nothing perspective.
>>
>> (That being said, it's already a mighty task to decrypt your way through
>> the maze-like implementation of spin_lock(), lock_kernel(),
>> smp_processor_id() etc, etc. I really do wish there was some way we
>> could
>> clean up/simplify that stuff before getting in and adding more
>> source-level
>> complexity).
>>
>>
>>
> The IRQ threads are actually a separate implementation.
>
> IRQ threads do not depend on mutexes, nor do they depend
> on any of the more opaque general spinlock changes, so this
> stuff SHOULD be separated out, to eliminate the confusion..
There is the assumption that a spinlock-mutex can provide
synchronization with interrupt processing. The means
to effect this is by the interrupt payload running in
task context where it plays by the rules of a spinlock-mutex
(ie: block upon contention).
If the interrupt payload runs in exception context a
blocking spinlock-mutex by definition cannot be used for
synchronization. Rather we are back to the raw_spinlock
which must also disable exception interrupts in tandem
with acquiring the raw_spinlock -- an attempt to acquire
the raw_spinlock in exception interrupt context must be
guaranteed to always succeed.
So there is a mutual design dependence between IRQ threads
and spinlock-mutexes in order to allow interrupt payload
processing to be pushed into kernel scheduleable task context.
This gives the benefit of minimizing the amount of time a
CPU spends in exception interrupt context and eliminates
the need for the spinlock-mutex to resort to disabling
interrupts in order to synchronize with payload execution.
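Roughly, in code (a sketch only; the declarations follow the -RT
convention where the lock's declared type selects spinning vs. sleeping
behavior, and the exact initializer/API names vary between versions of
the patch):

    #include <linux/spinlock.h>

    /* assume proper initializers from your tree */
    static raw_spinlock_t hw_lock;  /* always a true spinning lock */
    static spinlock_t     sw_lock;  /* a sleeping mutex under PREEMPT_RT */

    static void example(void)
    {
            unsigned long flags;

            /* 1) payload in exception (hard-IRQ) context: must spin with
               interrupts disabled so the handler can always acquire it */
            spin_lock_irqsave(&hw_lock, flags);
            /* ... state shared with the hard-IRQ handler ... */
            spin_unlock_irqrestore(&hw_lock, flags);

            /* 2) payload in a kernel thread: both sides may block, so the
               lock can simply sleep on contention and IRQs stay enabled */
            spin_lock(&sw_lock);
            /* ... same shared state, preemptible throughout ... */
            spin_unlock(&sw_lock);
    }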
> There was an original IRQ threads submission by
> John Cooper/ TimeSys, about a year ago, which Ingo
> subsequently rewrote.
I wasn't involved in that work. The credit there goes to
Scott Wood.
> As a logical prerequisite to the Mutex stuff, the IRQ threads, if broken
> out,
> could allow folks to test the water in the shallow end of the pool.
Dropping IRQ threads will require either a reversion to
all raw_spinlock usage or creation of a spinlock-mutex
version which disables interrupts for cases where code
must synchronize with exception interrupts. Neither of
these sounds particularly attractive compared to the
IRQ thread mechanism.
I'd like to hear some technical arguments of why IRQ threads
are held with such suspicion. Also, it isn't the case that prior
mechanisms are being obsoleted. Exception context interrupt
processing and raw_spinlocks to synchronize with them are
still available and will be for those edge cases which
are not addressable via spinlock-mutexes.
-john
--
[email protected]
Sven Dietrich wrote:
>>Are you serious? Even at 10ms, the monitor refresh rate would
>>have to be over 100Hz for anyone to "notice" anything,
>>right?... What sort of numbers are you talking when you say several?
>>
>>
>
>Even without numbers, the IDE IRQ, when run in a thread,
>competes with tasks at process level, so that other
>tasks can make some progress. Especially if those tasks are
>high priority.
>
>With multiple disks on a chain, you can see transients that
>lock up the CPU in IRQ mode for human-perceptible time,
>especially on slower CPUs...
>
>
I'm fairly sure this isn't going to happen. Unless you're telling me
that irqs run for 50ms plus.
What is more likely is that you are seeing some starvation from simply
too much CPU usage by interrupts (note this has nothing to do with latency).
*This* is the reason ksoftirqd exists.
ie. you are seeing a throughput issue rather than a latency issue.
>This is part of the reason why ksoftirqd exists: to act as
>a governor for bottom halves that run over and over again.
>ksoftirqd handles those bursty bottom halves in task space.
>
>
ksoftirq doesn't alleviate any kind of latencies anywhere AFAIKS.
>So with that, you already have bottom halves in threads.
>
>
softirqs won't normally run in another thread though, right?
>Then we are just talking about the concept of running the
>top half in a thread as well.
>
>
Yeah I don't think it is anything close to the same concept of
softirqs though. But yeah, "just" running top half in threads
sounds like one of the issues that will come up for discussion ;)
Gene Heskett wrote:
>On Tuesday 24 May 2005 22:20, Andrew Morton wrote:
>
>>Sven Dietrich <[email protected]> wrote:
>>
>>>I think people would find their system responsiveness / tunability
>>> goes up tremendously, if you drop just a few unimportant IRQs
>>>into threads.
>>>
>>People cannot detect the difference between 1000usec and 50usec
>>latencies, so they aren't going to notice any changes in
>>responsiveness at all.
>>
>
>Excuse me?
>
You are excused ;)
> 1 second (1000 usecs, 200 times your 50 usec example) is
>
1000usecs is 1msec.
>VERY noticeable when you are listening to music, or worse yet, trying
>to edit it. For much of that, submillisecond accuracy makes or
>breaks the application.
>
>
For listening to music, 1msec is absolutely no problem. For editing,
perhaps it is getting problematic. But Andrew (and parent) were
not talking about realtime applications, but *interactivity*.
>Lets get out of the server only camp here folks, linux is used for a
>hell of a lot more than a home for apache.
>
Let's all try to keep calm and think carefully about what someone has said
and in what context before responding.
This is a topic that for some reason will tend to degenerate into a random
shouting match where nobody actually says anything or listens to anything,
and nothing gets done. Not to say you are trying to start a flamewar, Gene,
but everyone just needs to tread a bit carefully :)
On Wed, May 25, 2005 at 03:05:58PM +1000, Nick Piggin wrote:
> What is more likely is that you are seeing some starvation from simply
> too much CPU usage by interrupts (note this has nothing to do with latency).
> *This* is the reason ksoftirqd exists.
>
> ie. you are seeing a throughput issue rather than a latency issue.
That's presumptuous. It's more likely that it's flaky hardware
or a missed interrupt that would cause something like that. I've
seen that happen.
> ksoftirq doesn't alleviate any kind of latencies anywhere AFAIKS.
ksoftirqd is used to support the concurrency model in the RT patch,
so that irq-threads and {spin,read,write}_irq_{un,}lock can be
correct yet preemptible.
> softirqs won't normally run in another thread though, right?
> Yeah I don't think it is anything close to the same concept of
> softirqs though. But yeah, "just" running top half in threads
> sounds like one of the issues that will come up for discussion ;)
This is not an issue for the regular kernel, since they can run
alongside the IRQ at interrupt time. It's only relevant to RT
at compile time.
And please don't take a chunk of the patch out of context and FUD it.
There's enough uncertainty regarding this track as is, without additional
misunderstanding or misrepresentation.
The stuff that is directly relevant to you and your work is the
scheduler changes needed to support RT and the possibility that
it would conflict with the sched domain stuff for NUMA boxes.
The needs are sufficiently different that in the long run something
like "sched-plugin" might be needed to simplify kernel development
and permit branched development.
bill
* Nick Piggin <[email protected]> wrote:
> Lee Revell wrote:
>
> > The IDE IRQ handler can in fact run for several ms, which people
> > sure can detect.
>
> Are you serious? Even at 10ms, the monitor refresh rate would have to
> be over 100Hz for anyone to "notice" anything, right?... [...]
you are assuming direct observation. Sure, a human (normally) doesnt
notice smaller than say 10-20 msec of lag. But, a human very much
notices indirect effects of latencies, such as the nasty 'click' a
soundcard produces if it overruns.
> What sort of numbers are you talking when you say several?
a couple of msecs easily even on fast boxes. Well over 10 msecs on
slower boxes.
Ingo
* Andrew Morton <[email protected]> wrote:
> (That being said, it's already a mighty task to decrypt your way
> through the maze-like implementation of spin_lock(), lock_kernel(),
> smp_processor_id() etc, etc. I really do wish there was some way we
> could clean up/simplify that stuff before getting in and adding more
> source-level complexity).
yes, that's next on my list, and it's completely independent of
PREEMPT_RT, as 'the maze of spinlock APIs' already exists in the current
kernel. (PREEMPT_RT only makes the problem worse) But dont expect big
wonders.
Ingo
On Tue, May 24, 2005 at 10:37:33PM -0400, Karim Yaghmour wrote:
> Bill Huey (hui) wrote:
> > I think there's a lot of general ignorance regarding this patch, the
> > usefulness of it and this thread is partially addressing them.
>
> Forgive the dumb question:
> Why isn't anyone doing a presentation about Ingo's patch at the OLS
> this year?
>
> If you want to get this thing in front of peoples' eyes, this would
> probably be the best venue. It would certainly be a good place to
> get people talking about it. Explaining what's in the patch, how
> it came to be, what are the interdependencies, modifications to
> existing code, added core files, pros/cons, performance, actual
> demo, etc.
I haven't even asked my employer if I should go or not. Should I?
Seriously, I was going to stay out here and work on more RT related
stuff that I've been working on for a number of months. Who should go
to OLS?
bill
* Andrew Morton <[email protected]> wrote:
> Sven Dietrich <[email protected]> wrote:
> >
> > I think people would find their system responsiveness / tunability
> > goes up tremendously, if you drop just a few unimportant IRQs into
> > threads.
>
> People cannot detect the difference between 1000usec and 50usec
> latencies, so they aren't going to notice any changes in
> responsiveness at all.
i agree in theory, but interestingly, people who use the -RT branch do
report a smoother desktop experience. While it might also be a
psychological effect, under -RT an interactive X process has the same
kind of latency properties as if all of the mouse pointer input and
rendering was done in the kernel (like some other desktop OSs do).
so in terms of mouse pointer 'smoothness', it might very well be
possible for humans to detect a couple of msec delays visually - even
though they are unable to notice those delays directly. (Isnt there some
existing research on this?)
but this is getting offtrack. -RT does have direct benefits for pro
audio (and of course embedded systems) users, maybe also interactivity
benefits for slower/older systems, but i'm not trying to argue that it's
necessary for the generic desktop. (especially considering the kernel
overhead)
but there exist other indirect benefits: what is a scheduling latency
critical path on CONFIG_PREEMPT, is still a (secondary) critical path on
PREEMPT_RT too, which embedded people will try to improve. The same is
true for voluntary-preempt: if you break a latency path on
CONFIG_PREEMPT, you implicitly improve PREEMPT_VOLUNTARY too. So there
are fundamental cross-effects between the preemption models, and by
cowardly luring those embedded developers into using the stock Linux
kernel instead of hacking on their own isolated patches/trees (or OSs)
we indirectly improve latencies of the desktop preemption model too.
Please dont underestimate the amount of development that goes on in the
embedded world, the more of them use Linux, the better for all Linux
users.
it's also a perception thing: if Linux _can_ offer sub-100 usec
latencies, embedded developers are more likely to pick it for their
project - even if the hardware does not need so good latencies. Embedded
developers (and OS vendors) will be more likely to standardize on Linux
exclusively, if they know that whatever future customer comes around,
Linux will be able to perform.
it's pretty much the same story as with scalability: only a few people
need Linux to scale to 500 CPUs (in fact only a small percentage needs
anything above 4 CPUs), but the perception advantage gives 2-CPU people
the warm fuzzy feeling that if Linux works fine on 500 CPUs then it must
be more than adequate on 2 CPUs. Is anyone going to argue that Linux
does not need to scale above 4 CPUs just because the number of users in
that space is less than 1%?
[ of course this is all just talk, but people seem to have a desire to
talk about it :-) ]
Ingo
> Lee Revell wrote:
> >On Tue, 2005-05-24 at 19:20 -0700, Andrew Morton wrote:
> >
> >>Sven Dietrich <[email protected]> wrote:
> >>
> >>>I think people would find their system responsiveness / tunability
> >>> goes up tremendously, if you drop just a few unimportant IRQs into
> >>> threads.
> >>>
> >>People cannot detect the difference between 1000usec and
> 50usec latencies,
> >>so they aren't going to notice any changes in responsiveness at all.
> >
> >The IDE IRQ handler can in fact run for several ms, which people sure
> >can detect.
>
> Are you serious? Even at 10ms, the monitor refresh rate would
> have to be
> over 100Hz for anyone to "notice" anything, right?... What
> sort of numbers
> are you talking when you say several?
I measured IDE delays just a few weeks ago.
We are talking about up to 100 ms.
Absolutely unacceptable for realtime systems.
*Very* noticeable even for interactive systems:
Keyboard and mouse lags, lost timer ticks, ...
Why that long?
* The system I tested uses a CF card connected to the standard IDE
controller as its primary disk.
* The CF card runs in PIO mode. Hence, all data transfer is done
by the CPU itself, in the interrupt handler, blocking the CPU.
* CF cards are slow, the worst I've seen does about 1.5 MB/s.
* On the other hand, CF cards deliver data continuously:
As soon as one sector has been read, the interrupt for the
next sector arrives. No hole in between to do other things.
* Now, calculate the time for the standard sequential readahead,
which is 128 KB. You end up with something close to 100 ms.
During this time, the CPU is completely occupied by IDE,
not reacting to anything else in the standard kernel.
With the RT kernel, at least everything above the IDE interrupt
priority level is able to continue.
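To spell out the arithmetic behind that readahead step:

    128 KB / 1.5 MB/s  =  131072 B / 1500000 B/s  ~=  87 ms

plus per-sector interrupt overhead, which lands right around the
100 ms I measured.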
--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com
Bill Huey (hui) wrote:
> On Wed, May 25, 2005 at 03:05:58PM +1000, Nick Piggin wrote:
>
>>What is more likely is that you are seeing some starvation from simply
>>too much CPU usage by interrupts (note this has nothing to do with latency).
>>*This* is the reason ksoftirqd exists.
>>
>>ie. you are seeing a throughput issue rather than a latency issue.
>
>
> That's presumptuous. It's more likely that it's flakey hardware
> or a missed interrupt that would cause something like that. I've
> seen that happen.
>
Err, no. Please read what was written.
"With multiple disks on a chain, you can see transients that
lock up the CPU in IRQ mode for human-perceptible time,
especially on slower CPUs... "
I was pointing out that this will be a throughput rather than
latency issue. Unless you're saying that an interrupt handler
will run for 30ms or more?
>
>>ksoftirq doesn't alleviate any kind of latencies anywhere AFAIKS.
>
>
> ksoftirqd is used to support the concurrency model in the RT patch,
> so that irq-threads and {spin,read,write}_irq_{un,}lock can be
> correct yet preemptible.
>
That has nothing to do with what I said. I said ksoftirq doesn't
alleviate latencies.
>
>>softirqs won't normally run in another thread though, right?
>
>
>
>>Yeah I don't think it is anything close to the same concept of
>>softirqs though. But yeah, "just" running top half in threads
>>sounds like one of the issues that will come up for discussion ;)
>
>
> This is not an issue for the regular kernel since they can run
> along side with the IRQ at interrupt time. It's only compile time
> relevant to the RT.
>
It is relevant because code complexity is relevant. Have you
been reading what has been said? Don't take my word for it, read
what Andrew is saying.
> And please don't take a chunk of the patch out of context and FUD it.
> There's enough uncertainty regarding this track as is without additional
> misunderstanding or misrepresentations.
>
I haven't looked at *any* part of *any* patch, nor commented on
any patch. I described the type of discussion and acceptance
that needs to happen before a large patch (like this) gets merged.
I also backed up Andrew's assertion that better interrupt latencies
wouldn't really help interactivity (the scheduler is a *far* bigger
factor here)
> The stuff that is directly relevant to you and your work is the
> scheduler changes needed to support RT and the possibility that
> it would conflict with the sched domain stuff for NUMA boxes.
> The needs are sufficiently different that in the long run something
> like "sched-plugin" might be needed to simplify kernel development
> and permit branched development.
>
I don't know what you're talking about, sorry.
Why are people so touchy about this subject? I didn't even criticize
anyone's patches or any approach or idea anywhere!! :\
The best way to get anything to happen is to get a common
understanding going through constructive discussion. Please stick to
that. Thanks.
kus Kusche Klaus wrote:
>> What sort of numbers are you talking when you say several?
>
>
> I measured IDE delays just a few weeks ago.
>
> We are talking about up to 100 ms.
OK, thanks for the data point.
That kind of interrupts off time is not really acceptable to
anyone, be it real time or a server.
I guess nothing has been done about the problem because it is
a relatively rare setup (or not enough people complaining).
Ingo Molnar wrote:
> * Andrew Morton <[email protected]> wrote:
>
>
>>Sven Dietrich <[email protected]> wrote:
>>
>>>I think people would find their system responsiveness / tunability
>>> goes up tremendously, if you drop just a few unimportant IRQs into
>>> threads.
>>
>>People cannot detect the difference between 1000usec and 50usec
>>latencies, so they aren't going to notice any changes in
>>responsiveness at all.
>
>
> i agree in theory, but interestingly, people who use the -RT branch do
> report a smoother desktop experience. While it might also be a
> psychological effect, under -RT an interactive X process has the same
> kind of latency properties as if all of the mouse pointer input and
> rendering was done in the kernel (like some other desktop OSs do).
>
> so in terms of mouse pointer 'smoothness', it might very well be
> possible for humans to detect a couple of msec delays visually - even
> though they are unable to notice those delays directly. (Isnt there some
> existing research on this?)
I'm guessing not, just because the monitor probably hasn't even
refreshed at that point ;) But...
[...]
>
> [ of course this is all just talk, but people seem to have a desire to
> talk about it :-) ]
>
You make good points. What's more, I don't think anyone needs to
advocate the RT work on the basis that it improves interactiveness.
That path is just going to lead to unwinnable arguments and will
distract from the real measurable improvements that it does bring.
I think anyone who doesn't like that won't be convinced because
someone is telling them it improves interactiveness ;)
Now lest I create a negative image of myself, I'd like to say that
without looking at the code, it sounds quite nice and if it can be
nicely encapsulated and CONFIGurable then I don't see why it
can't eventually be included.
* Nick Piggin <[email protected]> wrote:
> >i agree in theory, but interestingly, people who use the -RT branch do
> >report a smoother desktop experience. While it might also be a
> >psychological effect, under -RT an interactive X process has the same
> >kind of latency properties as if all of the mouse pointer input and
> >rendering was done in the kernel (like some other desktop OSs do).
> >
> >so in terms of mouse pointer 'smoothness', it might very well be
> >possible for humans to detect a couple of msec delays visually - even
> >though they are unable to notice those delays directly. (Isnt there some
> >existing research on this?)
>
> I'm guessing not, just because the monitor probably hasn't even
> refreshed at that point ;) But...
this reminds me, people very much notice the difference between an LCD
that has 20 msec refresh rates vs. ones that have 10 msec refresh rates.
i'd say the direct perception limit should be somewhere around 10 msec,
but there can be indirect effects that add up. (e.g. while we might not
be able to detect so small delays directly, the human eye can see
_distance_ anomalies that are caused by small delays. E.g. the feeling
of how 'smoothly' the mouse moves might be more accurate than direct
delay perception. But i'm really out on a limb here as this is so hard
to measure directly.)
> [...]
> >
> >[ of course this is all just talk, but people seem to have a desire to
> > talk about it :-) ]
> >
>
> You make good points. What's more, I don't think anyone needs to
> advocate the RT work on the basis that it improves interactiveness.
>
> That path is just going to lead to unwinnable arguments and will
> distract from the real measurable improvements that it does bring.
>
> I think anyone who doesn't like that won't be convinced because
> someone is telling them it improves interactiveness ;)
a good number of testers use it because it improves interactiveness - so
you'll see these arguments come up.
One indirect latency effect is that during heavier VM load, e.g. kswapd
(or the swapout code) is preemptable by X.
Another indirect effect is that in the stock kernel, interrupt work is
not preemptable. So a short succession of heavier interrupts, followed
by softirq processing, can very much cause more than 10 msec delays.
Under PREEMPT_RT (or just PREEMPT + softirq and hardirq threading) these
workloads are much more controlled. (at the price of significant IRQ
processing overhead, which should not be ignored either.)
but ... i agree that this argument in isolation cannot "win". It's the
sum of arguments that matters.
Ingo
> K.R. Foley wrote:
>
> > There are definitely those who would prefer to have the
> functionality,
> > at least as an option, in the mainline kernel. The group
> that I contract
> > for get heartburn about having to patch every kernel
> running on every
> > development workstation and every production system. We
> need hard RT,
> > but currently when we have to have hard RT we go with a different
> > product.
>
> Well, yes. There are lots of things Linux isn't suited for.
> There are likewise a lot of patches that SGI would love to
> get into the kernel so it runs better on their 500+ CPU
> systems. My point was just that a new functionality/feature
> doesn't by itself justify being included in the kernel.org
> kernel.
I would like to throw in my (and my employer's) point of view,
which is the point of view of a potential user of RT linux,
not the view of a kernel developer.
We are currently evaluating the suitability of Linux for
industrial control systems.
We strongly opt for having RT in the standard kernel,
not as a separate patch.
It will surely make a big difference for our final decision.
From the engineer's point of view:
* Adding one patch to the kernel is usually quite trivial.
However, adding several big patches to the kernel is a
major PITA: They are usually based on different versions
of the base kernel, they collide and usually need some manual
merging, they have not been tested together, they cause the
number of updates to explode exponentially (a critical fix of
any of the patches involved forces a rebuild of the whole thing),
and so on.
Currently, we have to integrate the RT patch, some debugging
patches (like kgdb), additional device drivers and protocols
(e.g. CANbus), and probably more in future.
Hence, having as many features as possible in the standard kernel
instead of in separate patches would be a major relief, and would
simplify using Linux a lot.
From the management's point of view:
* Mgmt has to pay for the staff doing this patching and integration,
which causes additional costs (and delays the product).
Currently, our products are based on a commercial realtime OS,
which works out of the box - Linux has to compete with that.
* Something being "a patch" makes a big difference in attitude:
Mgmt strongly resists to base the success of our company on patches.
"Patch" sounds hacky, quick & dirty, temporary, ...
"Standard" sounds a lot more reliable, solid, proven, well-done.
For its decision, the mgmt wants a clear indication that RT linux
is something which is expected to exist and to be maintained
for years, not just some quick hack to demonstrate the principle.
Being in the standard kernel would clearly indicate a commitment
that RT linux is there to stay.
Greetings
--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com
* Karim Yaghmour <[email protected]> wrote:
> Bill Huey (hui) wrote:
> > I think there's a lot of general ignorance regarding this patch, the
> > usefulness of it and this thread is partially addressing them.
>
> Forgive the dumb question:
> Why isn't anyone doing a presentation about Ingo's patch at the OLS
> this year?
(i guess mostly because i'm pretty presentation-shy. It's probably too
late for OLS, but if someone else feels a desire to do more in this
area, i certainly wont complain.)
> Currently, looking at the listed presentations, apart from finding
> myself thinking "hm..., I swear that guy did the same presentation
> last year ... and maybe the year before", I can't see any entry
> alluding to rt-preempt ... maybe I missed it?
you could not have seen it a year ago because it simply didn't exist back
then :) I started implementing the PREEMPT_RT model roughly half a year
ago.
Ingo
On Wed, 25 May 2005 08:33:06 +0200, Ingo Molnar wrote:
> i agree in theory, but interestingly, people who use the -RT branch do
> report a smoother desktop experience. While it might also be a
> psychological effect, under -RT an interactive X process has the same
> kind of latency properties as if all of the mouse pointer input and
> rendering was done in the kernel (like some other desktop OSs do).
The only way to actually know if it really makes a difference or not
would be to run a double-blind test, with people not knowing if
they're running an RT kernel or not, and then reporting their
experience re. desktop smoothness. But I doubt such a test could
actually be taken into consideration, unless distributions started
shipping different kernels without the user knowing, and then asked
about how it feels ...
This all being said, esp. concerning the next point you raise:
> so in terms of mouse pointer 'smoothness', it might very well be
> possible for humans to detect a couple of msec delays visually - even
> though they are unable to notice those delays directly. (Isnt there some
> existing research on this?)
IIRC, there have been (I'm not sure if there still are) some issues
with IRQs being lost on the input devices - missing keys, missing
events, or misbehaving mice and similar ... would these problems
(and the underlying issues in the codepaths) be easier or harder to
trigger, and to trace, if they happened?
--
Giuseppe "Oblomov" Bilotta
"They that can give up essential liberty to obtain
a little temporary safety deserve neither liberty
nor safety." Benjamin Franklin
On Wed, May 25, 2005 at 05:00:29PM +1000, Nick Piggin wrote:
> Err, no. Please read what was written.
>
> "With multiple disks on a chain, you can see transients that
> lock up the CPU in IRQ mode for human-perceptible time,
> especially on slower CPUs... "
ok, yeah, I thought you were talking about something else. Email
is a tricky medium at times.
> I was pointing out that this will be a throughput rather than
> latency issue. Unless you're saying that an interrupt handler
> will run for 30ms or more?
Don't know. For non-DMA IDE access, data copies are CPU driven,
which can create tons of latency problems in that case. irq-threads
run as SCHED_FIFO and can freeze the system under heavy IO load
in that situation.
The way I originally interpreted the comment was that irq-threads suck,
which is why these latencies happen. This is not what you meant. The
fear here is that there would be a push in the discussion to revert
some of this preemption work away from full preemption to more
conservative techniques like preemption points, which would drive
me absolutely bonkers right now.
> It is relevant because code complexity is relevant. Have you
> been reading what has been said? Don't take my word for it, read
> what Andrew is saying.
I reread those comments.
I suggest that you read the patch for the answer to softirq
complexity. You'll find the implementation sane. It's a simple add-on
to how softirqs are handled currently, but with everything pushed into
ksoftirqd. What was non-preemptible execution at the end of an IRQ
handler in the normal kernel is now pushed entirely to that thread.
There's no mystery magic here.
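To make the shape of this concrete, here is a rough sketch - invented
names, not the patch's actual code - of pending softirq work being handed
to a ksoftirqd-style kernel thread instead of running at the tail of the
hard interrupt:

    #include <linux/kthread.h>
    #include <linux/sched.h>
    #include <linux/wait.h>
    #include <linux/bitops.h>

    static DECLARE_WAIT_QUEUE_HEAD(softirq_wq);
    static unsigned long pending_mask;  /* one bit per softirq type */

    /* called where the stock kernel would run the handler directly */
    static void rt_raise_softirq(int nr)
    {
            set_bit(nr, &pending_mask);
            wake_up(&softirq_wq);
    }

    static int rt_softirq_thread(void *unused)
    {
            while (!kthread_should_stop()) {
                    wait_event_interruptible(softirq_wq, pending_mask);
                    while (pending_mask) {
                            int nr = __ffs(pending_mask);

                            clear_bit(nr, &pending_mask);
                            /* run the usual softirq action for 'nr' here,
                             * now in fully preemptible thread context */
                    }
            }
            return 0;
    }

The thread is an ordinary task: it can run SCHED_FIFO at a chosen
priority, and anything more important simply preempts it.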
> I haven't looked at *any* part of *any* patch, nor commented on
> any patch. I described the type of discussion and acceptance
> that needs to happen before a large patch (like this) gets merged.
I interpreted the ksoftirqd comment/worry you raised as just that:
inserting a concern about something you perceived as a possible
problem, and therefore doubt.
> I also backed up Andrew's assertion that better interrupt latencies
> wouldn't really help interactivity (the scheduler is a *far* bigger
> factor here)
This is a complicated problem and I'm not sure what folks are getting
at with the irq latency track, other than that it exists and affects
overall latency in the system. Running a task above the priority of the
irq-thread handlers in a fully preemptible kernel should permit that
task to run and be serviced immediately. Having the
ability to do that *could* help with overall system interactivity,
but it would require proper userspace apps to exploit it.
> I don't know what you're talking about, sorry.
I thought you were doing some kind of scheduler CPU NUMA migration
stuff ?
> Why are people so touchy about this subject? I didn't even anywhere
> criticize anyone's patches or any approach or idea!! :\
Because it's a radical conceptual change to the kernel and this
development group has been classically resistant to these kinds of
changes from what I've seen. There's also an insane amount of pressure
from commercial circles to blow this patch out of the water before
inclusion. Just because you haven't seen it yet doesn't mean it's not
going to happen. You can bet that RTOS folks are going to be all over
it in a negative way and they aren't as friendly as you.
Just about everybody in the RTOS world knows this is going to be an
atomic bomb to the industry, for better or worse. This is excluding
the freakout nature of the SGI-ish stuff you'll be able to do with
this patch. Just wait until somebody gets a frame-locked Doom3 out. :)
> The best way to get anything to happen is to get a common
> understanding going through constructive discussion. Please stick to
> that. Thanks.
Sorry, yeah, I'm a bit jumpy from dealing with chronic irrationality
from the FreeBSD group, which has created low expectations from various
open source groups at times. Interaction with other jumpy kernel
conservatives in this community doesn't help the matter.
Basically, the more you read http://linuxdevices.com the more you'll
understand why folks are edgy about this. :)
bill
On Wed, 25 May 2005 17:46, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > >i agree in theory, but interestingly, people who use the -RT branch do
> > >report a smoother desktop experience. While it might also be a
> > >psychological effect, under -RT an interactive X process has the same
> > >kind of latency properties as if all of the mouse pointer input and
> > >rendering was done in the kernel (like some other desktop OSs do).
> > >
> > >so in terms of mouse pointer 'smoothness', it might very well be
> > >possible for humans to detect a couple of msec delays visually - even
> > >though they are unable to notice those delays directly. (Isnt there some
> > >existing research on this?)
> >
> > I'm guessing not, just because the monitor probably hasn't even
> > refreshed at that point ;) But...
>
> this reminds me, people very much notice the difference between an LCD
> that has 20 msec refresh rates vs. ones that have 10 msec refresh rates.
>
> i'd say the direct perception limit should be somewhere around 10 msec,
> but there can be indirect effects that add up. (e.g. while we might not
> be able to detect so small delays directly, the human eye can see
> _distance_ anomalies that are caused by small delays. E.g. the feeling
> of how 'smoothly' the mouse moves might be more accurate than direct
> delay perception. But i'm really out on a limb here as this is so hard
> to measure directly.)
Quite a lot has been done outside the computing world on human perception,
and the limit of perception for what would be scheduling jitter is
approximately 7 ms, if I recall correctly.
Cheers,
Con
On Tue, 24 May 2005, john cooper wrote:
> [...]
> I'd like to hear some technical arguments of why IRQ threads
> are held with such suspicion. Also it isn't the case prior
> mechanisms are being obsoleted. Exception context interrupt
> processing and raw_spinlocks to synchronize with them are
> still available and will be for those edge cases which
> are not addressable via spinlock-mutexes.
>
Performance! Even on RT systems you do NOT make all interrupts run in
threads. Simple devices like UARTS run everything in interrupt context.
Introducing a context switch for every character received on such a
channel can be _very_ expensive.
I think it would be safe to convert almost every driver back to run in
exception context and use raw spinlocks for locking accordingly. Very few
drivers actually do a lot of work at interrupt level. Only devices with
high bandwidth and no DMA are a problem (old IDE and ethernet
devices spring to mind).
Therefore a framework where it can be configured per device would be the
ideal solution.
I do not know the structure of the code very well and I do not have any
time to look into it now. But I could imagine kbuild being set up to
switch the relevant locks between being a mutex and a raw spinlock,
depending on whether the code runs in exception context or in a thread.
> -john
>
>
Esben
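For what it's worth, the per-device framework Esben asks for is
essentially what mainline grew years later as request_threaded_irq()
plus the IRQF_NO_THREAD flag; here is a sketch of how a driver would
express the choice, shown only to illustrate the shape of the idea:

    #include <linux/interrupt.h>

    /* UART-style device: trivial payload, keep it in hard irq context */
    static irqreturn_t uart_isr(int irq, void *dev)
    {
            /* drain the FIFO - a few cycles, no context switch */
            return IRQ_HANDLED;
    }

    /* heavy device: ack in hard irq context, do the real work in a thread */
    static irqreturn_t heavy_quickcheck(int irq, void *dev)
    {
            /* quiesce the device, then punt to the thread */
            return IRQ_WAKE_THREAD;
    }

    static irqreturn_t heavy_thread_fn(int irq, void *dev)
    {
            /* long-running payload: preemptible, may sleep, may take mutexes */
            return IRQ_HANDLED;
    }

    static int wire_up(unsigned int uart_irq, unsigned int heavy_irq, void *dev)
    {
            int err = request_irq(uart_irq, uart_isr, IRQF_NO_THREAD,
                                  "uart", dev);
            if (err)
                    return err;
            return request_threaded_irq(heavy_irq, heavy_quickcheck,
                                        heavy_thread_fn, 0, "heavy", dev);
    }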
Bill Huey (hui) wrote:
> I haven't even asked my employer if I should go or not. Should I?
Given your involvement with Linux, probably yes, but it's really up to
you. You should be aware, though, that just yesterday I got an automated
e-mail saying that they were 95% booked (remember that attendance is
capped at 500). So if you should decide to go, now is the time ...
> Seriously, I was going to stay out here and work on more RT related
> stuff that I've been working on for a number of months. Who should go
> to OLS ?
Gee ... I can't really say I have a definitive answer to that. From
personal experience, I can tell you that there are people from all
levels of involvement who go there, from core hackers to just
simple users. The real key of OLS is the networking you get to do
with other people (i.e. meeting people in the hallways and
discussing issues which are hard to resolve on a mailing list).
Not to mention the beer drinking ... ;)
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
Esben Nielsen wrote:
> On Tue, 24 May 2005, john cooper wrote:
>>I'd like to hear some technical arguments of why IRQ threads
>>are held with such suspicion...
>
> Performance! Even on RT systems you do NOT make all interrupts run in
> threads. Simple devices like UARTS run everything in interrupt context.
> Introducing a context switch for every character received on such a
> channel can be _very_ expensive.
The IRQ thread mechanism introduces a facility which offers
a benefit at an associated cost. For cases where the interrupt
payload processing is small in comparison to the associated
context switch, overhead may be reduced
by running the payload processing in exception context.
But "performance" here is a vague term. It may in some cases
be preferable to incur an increased overhead of interrupt payload
processing in task context to improve overall CPU availability
or reduce interrupt lockout in code associated with the
interrupt. It is a system-wide issue depending on the system
goals.
I agree that for simple devices which generate high-frequency interrupts
and have trivial interrupt payload processing, deferring
the latter to task context may be unneeded overhead.
But even here it is a system-wide design issue and I don't see
a simple, universal right-way/wrong-way. In any case the choice
of either mechanism is available.
As a data point, commercial OSes exist which strive to optimize
for non-RT throughput and which by default defer all interrupt payload
processing into task context. Not that this is necessarily
conclusive here, but it should offer reassurance that this isn't as
radical a concept as it may seem.
-john
--
[email protected]
Ingo Molnar wrote:
> (i guess mostly because i'm pretty presentation-shy. It's probably too
> late for OLS, but if someone else feels a desire to do more in this
> area, i certainly wont complain.)
Like I told Sven, if this is important enough (as it seems it is), I don't
see why the people in charge wouldn't at least try to find a spot.
> you could not have seen it a year ago because it simply didn't exist back
> then :) I started implementing the PREEMPT_RT model roughly half a year
> ago.
I know. It wasn't a comment about PREEMPT_RT, but rather about the OLS
in general ...
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Wednesday 25 May 2005 01:17, Nick Piggin wrote:
>Gene Heskett wrote:
>>On Tuesday 24 May 2005 22:20, Andrew Morton wrote:
>>>Sven Dietrich <[email protected]> wrote:
>>>>I think people would find their system responsiveness /
>>>> tunability goes up tremendously, if you drop just a few
>>>> unimportant IRQs into threads.
>>>
>>>People cannot detect the difference between 1000usec and 50usec
>>>latencies, so they aren't going to notice any changes in
>>>responsiveness at all.
>>
>>Excuse me?
>
>You are excused ;)
>
>> 1 second (1000 usecs, 200 times your 50 usec example) is
>
>1000usecs is 1msec.
Duh, my mistake. And it probably wouldn't do to plead Alzheimer's
either, darn.
[...]
>This is a topic that for some reason will tend to degenerate into a
> random shouting match where nobody actually says anything or
> listens to anything, and nothing gets done. Not to say you are
> trying to start a flamewar, Gene, but everyone just needs to tread
> a bit carefully :)
Duly chastised.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
On Wednesday 25 May 2005 02:33, Ingo Molnar wrote:
>[ of course this is all just talk, but people seem to have a desire
> to talk about it :-) ]
>
> Ingo
Did you get my (private) message re the 47-07 patch and einstein@home?
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
Bill Huey (hui) wrote:
> On Tue, May 24, 2005 at 06:31:41PM -0400, Karim Yaghmour wrote:
>
>><repeating-myself>
>>From my POV, it just seems that it's worth asking a basic
>>question: what is the least intrusive modification to the Linux
>>kernel that will allow obtaining hard-rt and what mechanisms
>>can we or can we not build on that modification? Again, my
>>answer to this question doesn't matter, it's the development
>>crowd's collective answer that matters. And in championing
>>the hypervisor/nanokernel path, I could turn out to be horribly
>>wrong. At this stage, though, I'm yet unconvinced of the
>>necessity of anything but the most basic kernel changes (as
>>in using function pointers for the interrupt control path,
>>which could be a CONFIG_ also).
>
>
> I know what you're saying and it's kind of unaddressed by various people
> in this discussion.
>
> When I think of the advantages of a single over dual image kernel
> system I think of it in terms of how I'm going to implement QoS.
> If I need to get access to a special TCP/IP socket in real time
> with strong determinancy you run into the problem of crossing to
> kernel concurrency domains, one preemptible one not, with a dual
> kernel system and have to use queues or other things to
> communicate with it. Even with lockless structures, you're still
> expressing latency in the Linux kernel personality if you have
> some kind of preexisting app that's already running in an atomic
> critical section holding non-preemptive spinlocks.
>
> However this is not RTAI as I understand it, since it can run N
> images, one for each RT task (correct?)
>
Basically, yes. The RTAI/fusion machinery makes sure that either Linux
or the RTAI co-scheduler is alternately in control of the RT tasks they
share, depending on the code they happen to tread on, and both use the
same priority scheme so that you don't end up losing your effective RTAI
priority just because you happen to issue a regular system call that
migrates you under the control of the Linux kernel to process it.
It turns out that your worst-case sched latency when using Linux
services in RT context is mainly defined by the granularity of the Linux
kernel, since the co-scheduler has very simple synchronization
constraints, and can be activated at any time by interrupts regardless
of the current masking state set by the Linux kernel. For the same
reason, if the task keeps using only RTAI-specific services, then your
worst-case is always close to the hardware limit.
> Having multiple images helps out, but fails in scenarios where
> you have to have tight data coupling. I have to think about things
> like dcache_lock, route tables, access to various IO system like
> SCSI and TCP/IP, etc...
>
> A single system image makes access to this direct unlike dual kernel
> system where you need some kind of communication coupling. Resource
> access is direct. Modifying large grained subsystems in the kernel
> is also direct. As preexisting multimedia apps use more RT facilities,
> apps are going to need something more of a general purpose OS to make
> development easier. These aren't traditional RT apps at all, but
> still require hard RT response times. Keep in mind media apps use
> the screen, X11, audio device(s), IDE/SCSI for streaming, networking,
> etc... It's a comprehensive use of many of the facilities of kernel
> unlike traditional RT apps.
Agreed, not all apps requiring bounded latencies are sampling i/o ports
on the bare metal at 20 kHz, just because there are different levels of
RT requirements. For this reason, RTAI's fusion track has always been
meant to leverage and complement the undergoing efforts to improve the
vanilla kernel wrt overall latency and proper priority enforcement, in
the current case PREEMPT_RT and its PI implementation. RTAI (all tracks
included) addresses a small niche of applications which really can't
take any chance with unexpected latencies when time constraints are
extreme, underlying hardware has limited capacities, and/or detection of
3rd party code randomly inducing jitter in a large kernel codebase is
out of reach. A niche which is being shared with other RTOSes and, as you
already pointed out, may even get smaller once Linux latencies
eventually become predictable and bounded within a reasonably low
microsecond range, as some people who don't actually need extremely low
worst-case latencies close to the hardware limits eventually figure out
that vanilla Linux on full preemption steroids is up to the job.
As said earlier, one of the main goals of the fusion track within the
RTAI project is to provide a convenient way for the remaining niche
users to access both worlds seamlessly, so that the covered spectrum of
applications with varying RT requirements could be broader. In that
sense, you can bet that we are among the supporters of the PREEMPT_RT
effort, because it magically solves half of the long-term issues
involved with having a practical and sound integration between fusion
and Linux.
>
> Now, this doesn't necessarily replace RTAI, but a preemptive Linux
> kernel can live as a first-class citizen to RTAI. I've been thinking
> about merging some of the RTAI scheduler stuff into the RT patch.
I did it recently, crafting a combo patch between Adeos (needed by
fusion) and PREEMPT_RT (0.7.44-03). The results running fusion over this
combo are encouraging, even if things remain to be ironed out.
> uber-preemption Linux doesn't have a sophisticated userspace yet
> and here RTAI clearly wins, no decent RT signal handling, etc...
> There are other problems with it and the current implementation.
> This is going to take time to sort out so RTAI still wins at this
> point.
>
IMHO, RTAI will eventually achieve one of its major goals when it
succeeds in smartly and transparently integrating with Linux, while still
keeping the standard semantics for the RT tasks it manages. At this
point, using RTAI or not will not be a matter of religion between mere
Linux or co-kernel zealots, but a decision based on the required level
of predictability that one may obtain in a particular hw/sw context.
Hopefully.
> I hope I addressed this properly, but that's the point of view
> I'm coming from.
>
> bill
>
--
Philippe.
Bill Huey (hui) wrote:
[snip helpful explanations]
^
thanks for that.
> Sorry, yeah, I'm a bit jumpy from dealing with chronic irrationality
> from the FreeBSD group, which has created low expectations from various
> open source groups at times. Interaction with other jumpy kernel
> conservatives in this community doesn't help the matter.
>
Well no, that's OK, and no hard feelings. I think there was a bit
of misunderstanding on my part as well :)
And I perhaps didn't make it so clear that I was taking a neutral
stance, and not actually commenting on the patch specifically.
> Basically, the more you read http://linuxdevices.com the more you'll
> understand why folks are edgy about this. :)
>
Well I think obviously any improvement in Linux's capability is a
good thing. And at the end of the day it sounds like most or maybe
all of this stuff should be able to get included. But it is always
going to be a slow process, and you'll probably have to put up with
some flames along the way :P
Well I'll be quiet now, unfortunately I didn't add much to the
discussion myself!
I haven't had time to look at these patches, so could someone
who has answer the following questions:
- what is the increase in kernel overhead with the full
patch enabled
- can the patch be configured IN/OUT and if so BUILD/RUN time
- I saw the mention of BUG catching, can someone elaborate
TIA
--
mit freundlichen Grüßen, Brian.
Darn, I just got a chance to check my email, and I'm coming in very late
to this thread. Well, I've tried to include everyone that is in on it
so that I can get direct emails too. I'm very interested in this
thread.
On Wed, 2005-05-25 at 17:12 +0200, Brian O'Mahoney wrote:
> I havnt had time to look at thes patches so could someone
> who has answer the following questions
>
> - what is the increase in kernel overhead with the full
> patch enabled
There are a few, and I'm sure that others will elaborate further.
wrt. IRQ threads: Usually when an interrupt goes off, whatever is
running (presumably with interrupts enabled) gets interrupted and the
top-level code is executed. With the RT patch, each top-level
function is instead implemented by a separate thread. So instead of just
executing the code at the time of the interrupt, you wake up the
corresponding thread instead. Now you have the overhead of a context
switch (two actually, one to get to the thread and another to get back
to what was interrupted). With lots of interrupts going off, you have
lots of context switches. Not to mention the slight overhead of the
threads themselves.
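A stripped-down sketch of the flow Steve describes, with invented names
(the real patch handles far more, e.g. shared interrupts and affinity):

    #include <linux/interrupt.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>

    struct irq_thread {
            struct task_struct *task;
            int irq;
            int pending;
            irqreturn_t (*handler)(int, void *);
            void *dev_id;
    };

    /* hard irq context: just mask the line and wake the handler thread */
    static void hardirq_stub(struct irq_thread *it)
    {
            disable_irq_nosync(it->irq);
            it->pending = 1;
            wake_up_process(it->task);      /* context switch #1 follows */
    }

    static int irq_thread_fn(void *data)
    {
            struct irq_thread *it = data;

            while (!kthread_should_stop()) {
                    set_current_state(TASK_INTERRUPTIBLE);
                    if (!it->pending) {
                            schedule();     /* switch #2: back to the interrupted task */
                            continue;
                    }
                    __set_current_state(TASK_RUNNING);
                    it->pending = 0;
                    it->handler(it->irq, it->dev_id);  /* the old "top half" */
                    enable_irq(it->irq);
            }
            return 0;
    }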
With priority inheritance, you have the overhead of the spin locks
(which are now mutexes) having to do more work to check if they are locked.
And if so, the waiters get added to a priority list (if real-time).
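And the priority-inheritance step reduced to its core idea (again with
invented names; the real rt-mutex code also handles nesting, requeueing,
and restoring the priority on unlock):

    #include <linux/sched.h>

    struct pi_mutex {
            struct task_struct *owner;
            /* plus a priority-sorted list of waiters */
    };

    /* called when 'waiter' blocks on a lock that 'owner' already holds */
    static void pi_boost(struct pi_mutex *m, struct task_struct *waiter)
    {
            struct task_struct *owner = m->owner;

            if (waiter->prio < owner->prio) {   /* lower value = higher prio */
                    owner->prio = waiter->prio; /* owner inherits */
                    /* the owner must also be requeued on its runqueue at
                     * the new priority; the original priority comes back
                     * when it releases the lock */
            }
    }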
>
> - can the patch be configured IN/OUT and if so BUILD/RUN time
>
The patch is turned on with CONFIG options. There's also a /proc
interface to change the behavior of the kernel at run time, if the CONFIGs
were turned on. You can change the way interrupts are preempted, and so
on.
> - I saw the mention of BUG catching, can someone elaborate
>
I believe that you are talking about catching problems that are hard
to find in the normal system. The RT patch allows for much more
preemption, and things that are not truly re-entrant but are expected to
be on an SMP system can be found much more easily, since you have more
context switches happening at points that are usually protected by a spin
lock. The RT kernel makes spinlocked areas protected only by
those that hold the lock, as opposed to also disabling preemption.
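An illustrative example (invented, not from any real driver) of the class
of bug this flushes out: under the stock kernel, spin_lock() implicitly
disables preemption, so the two smp_processor_id() calls below always
match. Once the lock is a sleeping mutex, the task can be preempted and
migrated between them:

    #include <linux/spinlock.h>
    #include <linux/smp.h>
    #include <linux/threads.h>

    static DEFINE_SPINLOCK(scratch_lock);
    static int scratch[NR_CPUS];    /* invented per-CPU scratch slots */

    static void implicit_dependency(void)
    {
            spin_lock(&scratch_lock);       /* a mutex under PREEMPT_RT */
            scratch[smp_processor_id()]++;
            /* preemption + migration can happen here under the RT patch,
             * so the line below may touch another CPU's slot - a real
             * SMP bug that the implicit preempt_disable() was hiding */
            scratch[smp_processor_id()]--;
            spin_unlock(&scratch_lock);
    }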
Hope this helps,
Gruß,
-- Steve
Karim Yaghmour <[email protected]> wrote:
> Why isn't anyone doing a presentation about Ingo's patch at the OLS
> this year?
It's not as good as a whole talk, but I'll certainly touch on it during
my increasingly traditional "state of kernel development" talk. There
should be a good discussion on RT stuff at the kernel summit just before
OLS, so there might just be some interesting new things to say...
jon
Jonathan Corbet
Executive editor, LWN.net
[email protected]
Ingo Molnar <[email protected]> writes:
> * Andrew Morton <[email protected]> wrote:
>
>> Sven Dietrich <[email protected]> wrote:
>> >
>> > I think people would find their system responsiveness / tunability
>> > goes up tremendously, if you drop just a few unimportant IRQs into
>> > threads.
>>
>> People cannot detect the difference between 1000usec and 50usec
>> latencies, so they aren't going to notice any changes in
>> responsiveness at all.
>
> i agree in theory, but interestingly, people who use the -RT branch do
> report a smoother desktop experience. While it might also be a
I bet if you did a double-blind test (users not knowing if they
run with the RT patch or not, or thinking they are running with the patch
when they are not) they would report the same.
Basically, when people go through all that effort of applying
a patch, they really want to see an improvement - whether it is there
or not.
You surely have seen that with other patches when users
suddenly reported something worked better/smoother with a new
release etc and there was absolutely no explanation for it in the changed
code.
I have no reason to believe this is any different with all
this RT testing.
-Andi (who also would prefer to not have interrupt threads, locks like
a maze and related horribilities in the mainline kernel)
On Wed, 2005-05-25 at 09:27 -0400, Karim Yaghmour wrote:
> Ingo Molnar wrote:
> > (i guess mostly because i'm pretty presentation-shy. It's probably too
> > late for OLS, but if someone else feels a desire to do more in this
> > area, i certainly wont complain.)
>
> Like I told Sven, if this important enough (as it seems it is), I don't
> see why the people in charge wouldn't at least try to find a spot.
>
I have submitted a proposal, but it's obviously very, very late.
I suggest that folks should contact any powers that be at OLS,
and express their interest in an RT presentation.
Then I'll get up there and you all can throw rotten fruit and veggies ;)
Sven
On Wed, 2005-05-25 at 19:17 +0200, Andi Kleen wrote:
> I bet if you did a double blind test (users not knowing if they
> run with RT patch or not or think they are running with patch when they
> are not) they would report the same.
>
The double blind is a bad idea. It plants perceptions in people's heads,
so they may say they see no improvement when there is one.
A better test would be to have two identical computers running side by
side (say computers A and B). Let a number of people play around with
both for a while, with web browsing, movie viewing, mp3 listening, etc.
And then have a survey of which machine performed better and general
comments. Don't let them even know about the RT patch, although one
would have it and the other would not. That would probably be the best
indication of whether or not the patch is noticeable. If everyone
(>85%) says that computer A is much smoother in running video or sound,
and computer A had the RT patch, then the answer would be yes. But if A
didn't, or there was no noticeable difference (~50% say A and ~50% say
B), then you can say there is no noticeable difference with the patch.
> Basically when people go through all that effort of applying
> a patch then they really want to see an improvement. If it is there
> or not.
So make it mainline, and then they won't need to go through any effort
in applying the patch :-)
>
> You surely have seen that with other patches when users
> suddenly reported something worked better/smoother with a new
> release etc and there was absolutely no explanation for it in the changed
> code.
Yes, but these changes are part of the code.
>
> I have no reason to believe this is any different with all
> this RT testing.
Maybe not for the average user noticing the differences, but Lee's
latency tests seem to show something.
>
> -Andi (who also would prefer to not have interrupt threads, locks like
> a maze and related horribilities in the mainline kernel)
Why is it so horrible to have interrupts as threads? It's just a config
option and it really doesn't complicate the kernel that much. As for
the maze in the locks, the spin_locks are already pretty confusing
without the changes, and the confusion with them is just to keep the
interface the same. I actually like the way Ingo did the locks. I used
to work for TimeSys, and if you wanted to use a raw_spinlock in their
kernel you needed to always explicitly call raw_spin_lock and
raw_spin_unlock. The macros in Ingo's patch make it easy to switch between
the raw and mutex spinlocks.
-- Steve
On Wed, 2005-05-25 at 19:17 +0200, Andi Kleen wrote:
> Ingo Molnar <[email protected]> writes:
>
> > * Andrew Morton <[email protected]> wrote:
> >
> >> Sven Dietrich <[email protected]> wrote:
> >> >
> >> > I think people would find their system responsiveness / tunability
> >> > goes up tremendously, if you drop just a few unimportant IRQs into
> >> > threads.
> >>
> >> People cannot detect the difference between 1000usec and 50usec
> >> latencies, so they aren't going to notice any changes in
> >> responsiveness at all.
> >
> > i agree in theory, but interestingly, people who use the -RT branch do
> > report a smoother desktop experience. While it might also be a
>
> I bet if you did a double blind test (users not knowing if they
> run with RT patch or not or think they are running with patch when they
> are not) they would report the same.
>
I would take that bet double or nothing.
> Basically when people go through all that effort of applying
> a patch
You mean typing "patch -p1 < ..."
> then they really want to see an improvement. If it is there
> or not.
>
Hopefully they will also set the config options correctly :)
> You surely have seen that with other patches when users
> suddenly reported something worked better/smoother with a new
> release etc and there was absolutely no explanation for it in the changed
> code.
>
I suppose the audio guys have something on that.
Even if you don't have an ear for music, you can hear a
skip on a CD, a scratch on a record, or a glitch on
a digital audio file from preemption latency.
These are all events in the same time frame, and
that is in the milliseconds....
> I have no reason to believe this is any different with all
> this RT testing.
>
And that's why we have been testing and benchmarking, to
produce number sets that supersede faith, belief, and
conjecture. But ultimately, you can trust your senses,
and I think the audio / video test would allow your eyes
to see, and your ears to hear the difference.
> -Andi (who also would prefer to not have interrupt threads, locks like
> a maze and related horribilities in the mainline kernel)
I am definitely for breaking out an IRQ threads patch,
separate from the RT-mutex patches, even if just to
allow examination of that code without the clutter.
Andi,
2.6.12-rc5 is broken on the nvidia CK804 Opteron MB.
The core id seems to be correct now.
Core 0 of node 1 cannot be started and hangs there.
YH
CPU 0(2) -> Node 0 -> Core 0
enabled ExtINT on CPU#0
ENABLING IO-APIC IRQs
Using IO-APIC 4
...changing IO-APIC physical APIC ID to 4 ... ok.
Using IO-APIC 5
...changing IO-APIC physical APIC ID to 5 ... ok.
Using IO-APIC 6
...changing IO-APIC physical APIC ID to 6 ... ok.
Using IO-APIC 7
...changing IO-APIC physical APIC ID to 7 ... ok.
Synchronizing Arb IDs.
testing the IO APIC.......................
.................................... done.
Using local APIC timer interrupts.
Detected 12.564 MHz APIC timer.
Booting processor 1/1 rip 6000 rsp ffff81007ff07f58
Initializing CPU#1
masked ExtINT on CPU#1
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(2) -> Node 0 -> Core 1
stepping 00
CPU 1: Syncing TSC to CPU 0.
Booting processor 2/2 rip 6000 rsp ffff81013ff11f58
Initializing CPU#2
masked ExtINT on CPU#2
On Wed, 2005-05-25 at 08:05 +0200, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
> > Lee Revell wrote:
> >
> > > The IDE IRQ handler can in fact run for several ms, which people
> > > sure can detect.
> >
> > Are you serious? Even at 10ms, the monitor refresh rate would have to
> > be over 100Hz for anyone to "notice" anything, right?... [...]
>
> you are assuming direct observation. Sure, a human (normally) doesnt
> notice smaller than say 10-20 msec of lag. But, a human very much
> notices indirect effects of latencies, such as the nasty 'click' a
> soundcard produces if it overruns.
>
> > What sort of numbers are you talking when you say several?
>
> a couple of msecs easily even on fast boxes. Well over 10 msecs on
> slower boxes.
>
Right, normal desktop use on a fast machine probably won't notice. But
if you're trying to play a softsynth with a MIDI keyboard, 10ms is about
the threshold of perceptible lag. I think it's reasonable to expect
this to work without having to customize your kernel for low latency.
If you're trying to plug your guitar into the line in, and put some
LADSPA effects on it, then the threshold is really 3-5ms, because
keyboard players are used to more latency (think about the mechanics of
striking a piano key vs. plucking a string with a pick).
I don't think sub-millisecond latencies are needed with the default
config. But, both of the above should work OOTB like on Windows and
OSX.
Lee
Sven-Thorsten Dietrich wrote:
> I have submitted a proposal, but its obviously very very late.
>
> I suggest that folks should contact any powers that be at OLS,
> and express their interest in an RT presentation.
There will be a board for informal BOF sessions. That would be one
possibility if the formal presentation is declined.
I suspect there will be a decent number of interested people, just
because it's kind of a neat topic.
Chris
On Wed, May 25, 2005 at 11:10:55AM -0700, YhLu wrote:
> Andi,
>
> the 2.6.12-rc5 is broken in nvidia Ck804 Opteron MB.
>
> the Core id seems to be right now.
>
> the core 0 of node 1 can not be started and hang there.
Hmm, I tested it on a simulator only. It worked there.
Will try to double check on a real DC machine, although it is
difficult for some other reasons.
-Andi
>
> YH
>
> CPU 0(2) -> Node 0 -> Core 0
> enabled ExtINT on CPU#0
> ENABLING IO-APIC IRQs
> Using IO-APIC 4
> ...changing IO-APIC physical APIC ID to 4 ... ok.
> Using IO-APIC 5
> ...changing IO-APIC physical APIC ID to 5 ... ok.
> Using IO-APIC 6
> ...changing IO-APIC physical APIC ID to 6 ... ok.
> Using IO-APIC 7
> ...changing IO-APIC physical APIC ID to 7 ... ok.
> Synchronizing Arb IDs.
> testing the IO APIC.......................
>
>
>
>
> .................................... done.
> Using local APIC timer interrupts.
> Detected 12.564 MHz APIC timer.
> Booting processor 1/1 rip 6000 rsp ffff81007ff07f58
> Initializing CPU#1
> masked ExtINT on CPU#1
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 1024K (64 bytes/line)
> CPU 1(2) -> Node 0 -> Core 1
> stepping 00
> CPU 1: Syncing TSC to CPU 0.
> Booting processor 2/2 rip 6000 rsp ffff81013ff11f58
> Initializing CPU#2
> masked ExtINT on CPU#2
Sven Dietrich wrote:
>>Bill Huey (hui) wrote:
>>
>>>I think there's a lot of general ignorance regarding this patch, the
>>>usefulness of it and this thread is partially addressing them.
>>
>>Forgive the dumb question:
>>Why isn't anyone doing a presentation about Ingo's patch at
>>the OLS this year?
>>
>>If you want to get this thing in front of peoples' eyes, this
>>would probably be the best venue. It would certainly be a
>>good place to get people talking about it. Explaining what's
>>in the patch, how it came to be, what are the
>>interdependencies, modifications to existing code, added core
>>files, pros/cons, performance, actual demo, etc.
>>
>>Currently, looking at the listed presentations, apart from
>>finding myself thinking "hm..., I swear that guy did the same
>>presentation last year ... and maybe the year before", I
>>can't see any entry alluding to rt-preempt ... maybe I missed it?
>>
>
>
> I think it's too late to add a presentation there now,
> but if folks are interested, I would be willing to talk about
> it all day long.
I'm intensely interested in this. Tell me where you'll be
talking (even if it's on the sidewalk outside) and I'll be
there!
=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================
If irqs are run in threads, which are scheduled, how are they scheduled?
fifo? What's the point then; simply to let the top half run to completion
before another top half starts? If it's about setting scheduling priorities
for irq threads, so one top half can preempt another, why not just use irq
levels, like bsd (using PICs is slower than using threads?)?
--
Tom Vier <[email protected]>
DSA Key ID 0x15741ECE
On Wed, 25 May 2005, Tom Vier wrote:
> If irqs are run in threads, which are scheduled, how are they scheduled?
> fifo? What's the point then; simply to let the top half run to completion
> before another top half starts? If it's about setting scheduling priorities
> for irq threads, some one top half can prempt another, why not just use irq
> levels, like bsd (using pic's is slower than using threads?)?
>
Long interrupt handlers can be interrupted by _tasks_, not only other
interrupts! An audio application running in userspace can be scheduled
over an ethernet interrupt handler copying data from the
controller into RAM (without DMA).
Esben
Is the RT patch for x86 only, or is it arch-independent?
I'd like to do some work with it on our embedded boards if I don't get
restricted to Pentiums.
thx,
NZG.
On Wednesday 25 May 2005 16:05, Esben Nielsen wrote:
> On Wed, 25 May 2005, Tom Vier wrote:
> > If irqs are run in threads, which are scheduled, how are they scheduled?
> > fifo? What's the point then; simply to let the top half run to completion
> > before another top half starts? If it's about setting scheduling
> > priorities for irq threads, so one top half can preempt another, why not
> > just use irq levels, like bsd (using PICs is slower than using
> > threads?)?
>
> Long interrupt handlers can be interrupted by _tasks_, not only other
> interrupts! An audio application running in userspace can be scheduled
> over an ethernet interrupt handler copying data from the
> controller into RAM (without DMA).
>
> Esben
>
On Wed, May 25, 2005 at 11:05:05PM +0200, Esben Nielsen wrote:
> Long interrupt handlers can be interrupted by _tasks_, not only other
> interrupts! An audio application running in userspace can be scheduled
> over an ethernet interrupt handler copying data from the
> controller into RAM (without DMA).
Doesn't that greatly increase the risk of the hardware overrunning its
buffer?
--
Tom Vier <[email protected]>
DSA Key ID 0x15741ECE
On Wed, May 25, 2005 at 04:58:41PM -0400, Tom Vier wrote:
> If irqs are run in threads, which are scheduled, how are they scheduled?
> fifo? What's the point then; simply to let the top half run to completion
> before another top half starts? If it's about setting scheduling priorities
> for irq threads, some one top half can prempt another, why not just use irq
> levels, like bsd (using pic's is slower than using threads?)?
The point is to have explicit scheduler control over this kind of thing,
in relation to the RT app in question, and not bring back retro VAX 11/780
device driver semantics in the year 2005. Even FreeBSD/DragonFlyBSD has this
stuff removed.
bill
On Wed, May 25, 2005 at 05:25:38PM -0400, Tom Vier wrote:
> On Wed, May 25, 2005 at 11:05:05PM +0200, Esben Nielsen wrote:
> > Long interrupt handlers can be interrupted by _tasks_, not only other
> > interrupts! An audio application running in userspace can be scheduled
> > over an ethernet interrupt handler copying data from the
> > controller into RAM (without DMA).
>
> Doesn't that greatly increase the risk of the hardware overrunning its
> buffer?
If you have a broken device and associated driver, yes. But it's not like
irq-threads are going to change that either way.
bill
On Wed, 25 May 2005, Tom Vier wrote:
> On Wed, May 25, 2005 at 11:05:05PM +0200, Esben Nielsen wrote:
> > Long interrupt handlers can be interrupted by _tasks_, not only other
> > interrupts! An audio application running in userspace can be scheduled
> > over an ethernet interrupt handler copying data from the
> > controller into RAM (without DMA).
>
> Doesn't that greatly increase the risk of the hardware overrunning its
> buffer?
Hopefully you do not have much hardware on your PC that you have to service
within very short timeframes without getting into serious trouble - if so,
you need an RTOS :-)
By not servicing your ethernet device you might lose packets - but the IP
protocols are supposed to handle that in the first place, so there is no
real problem there.
The whole point of putting it into threads is that you can decide which is
the most important: your audio application or your slow ethernet
device with no DMA. If the driver for the netcard is fast and small enough,
run it with higher priority than your RT application; otherwise give it a
lower priority. Then if your RT application takes too much CPU you will
lose packets. You can't get both (without adding more CPUs).
Without threading the ethernet device and giving it a sufficiently low
priority, somebody can DOS-attack your RT application by spamming the
local network!
Esben
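In practice the choice Esben describes is just scheduler configuration.
A minimal userspace sketch (the priority numbers are made up) that puts
an audio process above an IRQ thread running at, say, priority 50:

    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            struct sched_param sp;

            memset(&sp, 0, sizeof(sp));
            sp.sched_priority = 70;  /* higher than the NIC's irq-thread */

            if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
                    perror("sched_setscheduler");
                    return 1;
            }
            /* ... run the latency-sensitive audio loop here ... */
            return 0;
    }

Swapping the two priorities expresses the opposite policy: the network
wins, and the audio application gets whatever CPU is left.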
On Wed, May 25, 2005 at 11:00:19AM -0700, Sven-Thorsten Dietrich wrote:
>
> > I have no reason to believe this is any different with all
> > this RT testing.
> >
>
> And that's why we have been testing and benchmarking, to
> produce number sets that supersede faith, belief, and
> conjecture. But ultimately, you can trust your senses,
> and I think the audio / video test would allow your eyes
> to see, and your ears to hear the difference.
I understand that you have some real improvements that are measurable.
What I objected to was the claim that it actually made any difference
to interactive users.
>
> > -Andi (who also would prefer to not have interrupt threads, locks like
> > a maze and related horribilities in the mainline kernel)
>
> I am definitely for breaking out an IRQ threads patch,
> separate from the RT-mutex patches, even if just to
> allow examination of that code without the clutter.
What I dislike with RT mutexes is that they convert all locks.
It doesnt make much sense to me to have a complex lock that
only protects a few lines of code (and a lot of the spinlock
code is like this). That is just a waste of cycles.
But I always thought we should have a new lock type that is between
spinlocks and semaphores and is less heavyweight than a semaphore
(which tends to be quite slow due to its many context switches). Something
like a spinaphore, although it probably doesn't need full semaphore
semantics (rarely any code in the kernel uses those anyway). It could
spin for a short time and then sleep. Then convert some selected
locks over, e.g. the mm_sem and the i_sem would be primary users of this.
And maybe some of the heavier spinlocks.
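Sketching that idea with invented names and arbitrary numbers - spin a
bounded number of times on the assumption that hold times are short, then
fall back to sleeping:

    #include <linux/atomic.h>
    #include <linux/sched.h>
    #include <linux/wait.h>

    struct spinaphore {
            atomic_t owned;                 /* 0 = free, 1 = held */
            wait_queue_head_t wq;
    };

    #define SPIN_TRIES 100                  /* roughly a context switch's worth */

    static void spinaphore_lock(struct spinaphore *s)
    {
            int i;

            for (i = 0; i < SPIN_TRIES; i++) {
                    if (atomic_cmpxchg(&s->owned, 0, 1) == 0)
                            return;         /* got it cheaply, no sleep */
                    cpu_relax();
            }
            /* the owner is apparently holding it for a while: sleep instead */
            wait_event(s->wq, atomic_cmpxchg(&s->owned, 0, 1) == 0);
    }

    static void spinaphore_unlock(struct spinaphore *s)
    {
            atomic_set(&s->owned, 0);
            wake_up(&s->wq);
    }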
If you drop irq threads then you cannot convert all locks
anymore or have to add ugly in_interrupt() checks. So any conversion like
that requires converting locks.
-Andi
On Thu, 2005-05-26 at 21:32 +0200, Andi Kleen wrote:
> >
> > > I have no reason to believe this is any different with all
> > > this RT testing.
> > >
> >
> > And that's why we have been testing and benchmarking, to
> > produce number sets that supersede faith, belief, and
> > conjecture. But ultimately, you can trust your senses,
> > and I think the audio / video test would allow your eyes
> > to see, and your ears to hear the difference.
>
> I understand that you have some real improvements that are measurable.
> What I objected to was the claim that it actually made any difference
> to interactive users.
>
Yes, the observation is subjective.
I have been experiencing it since we got our first prototype up and
running, back in July or August last year.
In addition, I have received unsolicited comments in this regard, from
just about everyone who has run the RT kernel on a desktop. And I have
heard others echo similar observations, on this list and elsewhere.
But yes, it is subjective without a placebo control group...
The fact about the RT implementation is that it eliminates ALL high
latencies, so you will never have the aggregate transient bursts that
make your music skip, or make your mouse freeze on the screen, even for
just an instant.
These transients will occur on a desktop kernel, and no matter how
infrequent, once is too many for some class of applications.
Since these transient events do occur, and folks may have learned to
ignore them, the *complete absence* of these transients IS absolutely
noticed by anyone who has worked with the non-preemptable and the
current preemptable kernels.
And I think that is what we are talking about: the feeling that the
desktop experience is smoother, because the responsiveness of the
system interleaves mouse IRQs with the consequent task-based updates on
the screen. What is not noticeable is the additional latency of
processing the mouse interrupt in a thread.
I think the music example is most relevant. Think of a guy on stage
blaming his computer for a malfunctioning, noisy guitar effect.
Or a frame glitch in an animation, or video game.
> >
> > > -Andi (who also would prefer to not have interrupt threads, locks like
> > > a maze and related horribilities in the mainline kernel)
> >
> > I am definitely for breaking out an IRQ threads patch,
> > separate from the RT-mutex patches, even if just to
> > allow examination of that code without the clutter.
>
Here, I am talking about separating out the patch, and applying it
first, not dropping it from the RT implementation.
> What I dislike with RT mutexes is that they convert all locks.
> It doesnt make much sense to me to have a complex lock that
> only protects a few lines of code (and a lot of the spinlock
> code is like this). That is just a waste of cycles.
>
It is NOT just a few lines of code. Millisecond latencies on high-
powered CPU systems mean more code than is probably required to send a
rocket 'round the moon and back.
In addition, there are lock-ordering and lock-nesting issues (not to be
confused with the Scottish sea creature :) that make this approach
anything but trivial.
> But I always thought we should have a new lock type that is between
> spinlocks and semaphores and is less heavyweight than a semaphore
> (which tends to be quite slow due to its many context switches). Something
> like a spinaphore, although it probably doesnt need full semaphore
> semantics (rarely any code in the kernel uses that anyways). It could
> spin for a short time and then sleep. Then convert some selected
> locks over. e.g. the mm_sem and the i_sem would be primary users of this.
> And maybe some of the heavier spinlocks.
This is a bottom-up approach that simply doesn't work. I spent months
considering this same scenario, and so did a lot of other folks. This type
of hybrid solution would blow the complexity and patch size through the
roof, and render it unmaintainable. It is precisely why we introduced
the concept to LKML in the first place. Review the archives for a week
or two after my RT post on October 9.
>
> If you drop irq threads then you cannot convert all locks
> anymore or have to add ugly in_interrupt()checks. So any conversion like
> that requires converting locks.
>
You will find a very good explanation of the dependencies in my original
post on October 9. Also, please see my comment above, under "allow
examination of that code without the clutter."
Sven
> Here, I am talking about separating out the patch, and applying it
> first, not dropping it from the RT implementation.
I really dislike the idea of interrupt threads. It seems totally
wrong to me to make such a fundamental operation as an interrupt
much slower. If any interrupts really take too long they should
move to workqueues instead and be preempted there. But keep
the basic fundamental operations fast, please (at least that used to be one
of the Linux mottos, and it served Linux very well for many years, although
more and more people seem to forget it now)
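The deferral Andi describes looks roughly like this (modern workqueue
API spelling; the interface of that era took an extra data argument):

    #include <linux/interrupt.h>
    #include <linux/workqueue.h>

    static void heavy_work_fn(struct work_struct *work)
    {
            /* the slow part: runs in a worker thread, preemptible, may sleep */
    }
    static DECLARE_WORK(heavy_work, heavy_work_fn);

    static irqreturn_t fast_isr(int irq, void *dev)
    {
            /* ack the hardware - keep this path to a handful of instructions */
            schedule_work(&heavy_work);
            return IRQ_HANDLED;
    }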
> > What I dislike with RT mutexes is that they convert all locks.
> > It doesnt make much sense to me to have a complex lock that
> > only protects a few lines of code (and a lot of the spinlock
> > code is like this). That is just a waste of cycles.
> >
>
> It is NOT just a few lines of code. Millisecond latencies on high-
> powered CPU systems mean more code than is probably required to send a
> rocket 'round the moon and back.
Most spinlocks only protect small parts of the code. Those that protect
larger code paths could optionally use some different lock.
But don't attack it with "one size fits all" locking, please.
> In addition, there are lock-ordering and lock-nesting issues (not to be
> confused with the Scottish sea creature :) that make this approach
> anything but trivial.
Hmm? Sorry, that didn't make any sense. If the code was correct
before, changing to a different spinlock type should not
make any difference.
The only problem you have is interrupt code, which cannot sleep,
but I don't think you will get around fixing that
properly eventually (= checking whether the code is slow, and if yes
moving it over, and if not leaving it alone)
> > spin for a short time and then sleep. Then convert some selected
> > locks over. e.g. the mm_sem and the i_sem would be primary users of this.
> > And maybe some of the heavier spinlocks.
>
> This is a bottom up approach, that simply doesn't work. I spent months
> considering this same scenario, so did a lot of other folks. This type
> of hybrid solution would blow the complexity and patch size through the
> roof, and render it unmaintainable. It is precisely why we introduced
Of course you would not do it as a big patch. Instead you do it one
by one: every time you identify a problem you submit a patch, it gets
accepted, etc.
This way you never have a big pile of patches, just a small patchset in a
current queue.
Of course big patches don't work, but there is no reason you have to keep
big patches for this.
That is how most other Linux maintainers work too. Why should this code
be any different?
It sounds to me like you did not understand how Linux kernel code
submission works.
> You will find a very good explanation of the dependencies in my original
> post on October 9. Also, please see my comment above, under "allow
> examination of that code without the clutter."
If you tried it with a big patchkit I am not surprised that the approach
didnt work. But did you ever consider the problem might be with the
way you submitted patches, not with the basic code?
-Andi
Andi Kleen wrote:
> What I dislike with RT mutexes is that they convert all locks.
> It doesnt make much sense to me to have a complex lock that
> only protects a few lines of code (and a lot of the spinlock
> code is like this). That is just a waste of cycles.
I had brought this up in the dim past, in the context
of adaptive mutexes which could, via heuristics, decide
whether to spin or sleep.
> But I always thought we should have a new lock type that is between
> spinlocks and semaphores and is less heavyweight than a semaphore
> (which tends to be quite slow due to its many context switches). Something
> like a spinaphore, although it probably doesnt need full semaphore
> semantics (rarely any code in the kernel uses that anyways). It could
> spin for a short time and then sleep.
Spin if the lock is contended and the owner is active
on a CPU, under the assumption that the lock owner's average
hold time is less than that of a context switch. There
are restrictions: once a path holds an adaptive
mutex as a spin lock, it cannot acquire another adaptive
mutex as a blocking lock.
> If you drop irq threads then you cannot convert all locks
> anymore or have to add ugly in_interrupt() checks. So any conversion like
> that requires converting locks.
Yes, I was trying to make that point in an earlier thread.
-john
--
[email protected]
On Thu, 2005-05-26 at 22:27 +0200, Andi Kleen wrote:
> > Here, I am talking about separating out the patch, and applying it
> > first, not dropping it from the RT implementation.
>
> I really dislike the idea of interrupt threads. It seems totally
> wrong to me to make such a fundamental operation as an interrupt
> much slower. If really any interrupts take too long they should
> move to workqueues instead and be preempted there. But keep
> the basic fundamental operations fast please (at least that used to be one
> of the Linux mottos that served it very well for many years, although more
> and more people seem to forget it now)
>
IRQ threads are configurable. If you don't want them, you CAN turn them
off (if you have already turned them on).
You don't HAVE to turn them on.
The IRQ thread option adds "responsiveness" to the not forgotten, but
configurable, "progress, throughput, fairness, resource sharing"
principles of Unix found in the Linux kernel.
A lot of the IRQ stuff is already in bottom halves, where it runs in a
thread sometimes. But when it doesn't, it's not preemptable, sometimes
for a long time. Add other non-preemptable regions, and you get big
aggregates.
The IRQ thread option always runs softirqd in a thread, if you want to
configure it that way, and eliminates the IRQ latency burstiness.
As for the hard IRQs, you can leave them off too, and maybe turn one on
for a while, until you forget that you did that.
Then try another.
Your other questions are probably covered by Ingo's current remarks.
Cheers,
Sven
On Thu, May 26, 2005 at 09:32:30PM +0200, Andi Kleen wrote:
> I understand that you have some real improvements that are measurable.
> What I objected to was the claim that it actually made any difference
> to interactive users.
At first, no. It's a complicated problem and kernel preemptibility is
only a part of it. It's a critical part, since everything eventually has
to feed into a scheduler of some sort. Feeding large temporal
chunks - high latency - to a scheduler defeats how priority in the system
is expressed in relation to other threads in the system.
> What I dislike with RT mutexes is that they convert all locks.
> It doesnt make much sense to me to have a complex lock that
> only protects a few lines of code (and a lot of the spinlock
> code is like this). That is just a waste of cycles.
Yeah, but really this can only be taken seriously if you have numbers
showing that there's more contention on these paths. Until that happens,
the actual scenario is unknown. But I strongly suspect that it doesn't
really make a difference, mainly because of all of the SMP work that's
been done on Linux over the years. It's fundamentally a contention
problem.
> But I always thought we should have a new lock type that is between
> spinlocks and semaphores and is less heavyweight than a semaphore
> (which tends to be quite slow due to its many context switches). Something
> like a spinaphore, although it probably doesnt need full semaphore
> semantics (rarely any code in the kernel uses that anyways). It could
> spin for a short time and then sleep. Then convert some selected
> locks over. e.g. the mm_sem and the i_sem would be primary users of this.
> And maybe some of the heavier spinlocks.
Adaptive spinning is a difficult thing to do since you have to snoop
for the active "current" on other processors to determine whether you
have to sleep or not. FreeBSD 5.x uses this stuff and the locking code
is very complicated. In the future, it may be desirable to incorporate
parts of this functionality into another RT mutex implementation. The
current one is overloaded enough with functionality as is.
> If you drop irq threads then you cannot convert all locks
> anymore or have to add ugly in_interrupt() checks. So any conversion like
> that requires converting locks.
That's reversed. Interrupt threads are an isolated change in themselves
and can be submitted upstream, if so desired, with no associated lock
changes. But the paragraph above is rather vague, so I can only guess at
what you're talking about. There are ways of doing context stealing with
irq-threads to minimize overhead, and the FreeBSD folks have partially
implemented this, as I recall.
bill
On Thu, 2005-05-26 at 16:38 -0400, john cooper wrote:
> Andi Kleen wrote:
> > What I dislike with RT mutexes is that they convert all locks.
> > It doesnt make much sense to me to have a complex lock that
> > only protects a few lines of code (and a lot of the spinlock
> > code is like this). That is just a waste of cycles.
>
> I had brought this up in the dim past in the context
> of adaptive mutexes which could via heuristics decide
> whether to spin/sleep.
> > But I always thought we should have a new lock type that is between
> > spinlocks and semaphores and is less heavyweight than a semaphore
> > (which tends to be quite slow due to its many context switches). Something
> > like a spinaphore, although it probably doesnt need full semaphore
> > semantics (rarely any code in the kernel uses that anyways). It could
> > spin for a short time and then sleep.
>
> Spin if the lock is contended and the owner is active
> on a cpu under the assumption the lock owner's average
> hold time is less than that of a context switch. There
> are restrictions as once a path holds an adaptive
> mutex as a spin lock it cannot acquire another adaptive
> mutex as a blocking lock.
>
It might be simpler to get things working with a basic implementation
first, (status quo), and then look into adding something like this.
I don't see how this approach decreases the complexity of the task at
hand, especially not in regard to concurrency.
> > If you drop irq threads then you cannot convert all locks
> > anymore or have to add ugly in_interrupt() checks. So any conversion like
> > that requires converting locks.
>
> Yes, I was trying to make that point in an earlier thread.
>
My original comment was:
> The IRQ threads are actually a separate implementation.
>
> IRQ threads do not depend on mutexes, nor do they depend
> on any of the more opaque general spinlock changes, so this
> stuff SHOULD be separated out, to eliminate the confusion..
...
> As a logical prerequisite to the Mutex stuff, the IRQ threads,
> if broken out, could allow folks to test the water in the shallow end
> of the pool.
The dependency was STATED: "as a logical prerequisite...".
The context was: "breaking the IRQ threads into a separate patch"
You misread it, and then commented on that.
Sven
On Thu, 2005-05-26 at 13:52 -0700, Bill Huey wrote:
> > But I always thought we should have a new lock type that is between
> > spinlocks and semaphores and is less heavyweight than a semaphore
> > (which tends to be quite slow due to its many context switches). Something
> > like a spinaphore, although it probably doesnt need full semaphore
> > semantics (rarely any code in the kernel uses that anyways). It could
> > spin for a short time and then sleep. Then convert some selected
> > locks over. e.g. the mm_sem and the i_sem would be primary users of this.
> > And maybe some of the heavier spinlocks.
>
> Adaptive spinning is a difficult thing to do since you have to snoop
> for the active "current" on other processors to determine whether you
> have to sleep or not. FreeBSD 5.x uses this stuff and the locking code
> is very complicated. In the future, it may be desirable to incorporate
> parts of this functionality into another RT mutex implementation. The
> current one is overloaded enough with functionality as is.
>
> > If you drop irq threads then you cannot convert all locks
> > anymore or have to add ugly in_interrupt() checks. So any conversion like
> > that requires converting locks.
>
> That's reversed. Interrupt threads are an isolated change in themselves
> and can be submitted upstream, if so desired, with no associated lock
> changes. But the paragraph above is rather vague, so I can only guess at
> what you're talking about. There are ways of doing context stealing with
> irq-threads to minimize overhead, and the FreeBSD folks have partially
> implemented this, as I recall.
>
> bill
>
On Thu, 2005-05-26 at 13:52 -0700, Bill Huey wrote:
Sorry for the empty reply.
I was putting a hotdog under the saw to see if it would stop.
> > If you drop irq threads then you cannot convert all locks
> > anymore or have to add ugly in_interrupt() checks. So any conversion like
> > that requires converting locks.
>
> That's reversed. Interrupt threads are an isolated change in themselves
> and can be submitted upstream, if so desired, with no associated lock changes.
Precisely what was stated in the first place.
> bill
>
On Thu, May 26, 2005 at 10:27:47PM +0200, Andi Kleen wrote:
> I really dislike the idea of interrupt threads. It seems totally
> wrong to me to make such a fundamental operation as an interrupt
> much slower. If really any interrupts take too long they should
> move to workqueues instead and be preempted there. But keep
> the basic fundamental operations fast please (at least that used to be one
> of the Linux mottos that served it very well for many years, although more
> and more people seem to forget it now)
Again, it's about a bottom-up versus top-down approach to maintaining
this logic. To reiterate what both Ingo and Sven previously stated,
it isn't possible to do all of this per path and per lock in any sane manner.
This extends to all of the drivers in Linux as well.
> Most spinlocks only protect small code parts. Those that protect
> larger codes can probably use optionally some different lock.
The problem with that is that it violates the "sleeping in atomic"
deadlock constraint, in that the lock graph enforces a hierarchical
ordering downward, where an atomic lock forces all locks below it to
be non-preemptible as well. There are at least 3+ critical sections
that need to be made preemptible before the rest of the graph can
be correctly preemptible.
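Concretely (a contrived sketch - the lock names are made up): if the
outer lock stays a raw spinning lock, nothing acquired underneath it in
the graph may be converted to a sleeping lock:

	spin_lock(&outer_lock);  /* atomic: preemption off, sleeping forbidden */
	down(&inner_sem);        /* may sleep -> "sleeping in atomic" bug */
	...
	up(&inner_sem);
	spin_unlock(&outer_lock);

So the conversion has to proceed top-down through the graph, not lock
by lock from below.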
I found this out the week before Ingo did his stuff, since I was one
of three projects during that time doing this mass retrofit. My project
is seldom mentioned any more since I don't post publicly very much, have
moved to Ingo's tree and have been working on things internally here.
But I had my stuff running in May of last year against 2.6.0, so I was
one of the first folks to see the value of this path and the push into
the problem space. Much of my reasoning is derived from FreeBSD 5.x.
> But dont attack it with "one size fits all" locking please.
> > In addition, there are lock-ordering and lock-nesting issues (not to be
> > confused with the Scottish sea creature :) that make this approach non-
> > trivial whatsoever.
>
> Hmm? Sorry that didnt make any sense. If the code was correct
> before, changing to a different spin-like type should not
> make any difference.
>
> The only problem you have is interrupt code, which cannot sleep,
> but I dont think you will eventually get around of fixing these
> properly (= checking the code if it is slow and yes move it
> over and if not leave it alone)
See Above.
> Of course you would not do it as a big patch. Instead you do it one
> by one, every time you identify a problem you submit a patch, it gets
> accepted etc.
Again, (Sven and Ingo) it's impossible to do since the kernel is so
complicated, and the interdependencies of locks in the lock graph make
this approach effectively impossible when using your manner of attack.
> This way you never have a big pile of patches, just a small patchset in a current
> queue.
>
> Of course big patches dont work, but there is no reason you have to keep
> big patches for this.
>
> That is how most other Linux maintainers work too. Why should that code
> be any different?
>
> It sounds to me like you did not understand how Linux kernel code
> submission is
...
> If you tried it with a big patchkit I am not surprised that the approach
> didnt work. But did you ever consider the problem might be with the
> way you submitted patches, not with the basic code?
The problem here is not the patch, but instead random hysteria from folks
who don't know the patch and haven't tracked its development, yelping
about fringe problems without understanding the approach and the things
it addresses.
The concepts and suggestions here are not radical. There's plenty of
precedent from traditional RTOSes and other general purpose systems
(FreeBSD 5.x/Solaris/SGI IRIX) that use these techniques. But the
hysteria around this stuff has really created impressions of what "not to
do in the kernel" without any real or substantial facts otherwise.
The entire debate from both sides is full of misunderstandings and
misconceptions of each approach. Most of it is completely irrational
paranoia from folks who are used to one way of doing things, versus
actually looking at the approach from this group's point of view.
Until this is more widely incorporated or used, it's still going to have
what I believe to be a false stigma against it.
bill
Sven-Thorsten Dietrich wrote:
> On Thu, 2005-05-26 at 16:38 -0400, john cooper wrote:
>>Spin if the lock is contended and the owner is active
>>on a cpu under the assumption the lock owner's average
>>hold time is less than that of a context switch. There
>>are restrictions as once a path holds an adaptive
>>mutex as a spin lock it cannot acquire another adaptive
>>mutex as a blocking lock.
>
> It might be simpler to get things working with a basic implementation
> first, (status quo), and then look into adding something like this.
I wasn't suggesting this is the time to consider doing so,
but rather pointing it out as an available optimization.
> I don't see how this approach decreases the complexity of the task at
> hand, especially not in regard to concurrency.
It increases the efficiency of the mutex as we don't incur
context switches (in general) unless necessary. Concurrency
isn't fundamentally affected.
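In rough pseudo-C the acquire path is something like the below (a
sketch only - try_acquire(), owner_oncpu() and block_on() are
placeholder names, not the Solaris code):

	void adaptive_lock(adaptive_mutex_t *m)
	{
		for (;;) {
			if (try_acquire(m))
				return;
			if (owner_oncpu(m))
				/* owner is running: its remaining hold
				   time is likely shorter than a context
				   switch, so just spin */
				cpu_relax();
			else
				/* owner is off-cpu: blocking is cheaper
				   than spinning for its whole sleep */
				block_on(m);
		}
	}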
-john
--
[email protected]
Andi Kleen wrote:
> I really dislike the idea of interrupt threads. It seems totally
> wrong to me to make such a fundamental operation as an interrupt
> much slower. If really any interrupts take too long they should
> move to workqueues instead and be preempted there.
That is the basic idea, but here it is being applied by default.
There is nothing more happening here than pushing the limits
of the design. Certainly there will be some inefficiencies
introduced in the process where some interrupt payload
would have been more efficiently executed in exception
context. If it makes sense on a system-wide basis to
do so, it can be reverted on a case-by-case basis. This is
after all experimental code.
Yet there are other factors which influence the choice of a running
context for the interrupt payload: how other code in the system best
synchronizes with the payload, minimizing interrupt-disable time in
either the payload code and/or the coordinating code, etc.
> Most spinlocks only protect small code parts. Those that protect
> larger codes can probably use optionally some different lock.
>
> But dont attack it with "one size fits all" locking please.
This is another case of a sweeping change to the default
behavior. Again, an experiment to push the limits of the
given design. Clearly we aren't buying anything by trading off
a spinlock protecting the update of a single pointer for a
blocking lock and its associated context switching. But it does
demonstrate that code previously relying on synchronization via a
polled lock fared remarkably well with preemptive blocking
mutexes.
-john
--
[email protected]
Sven-Thorsten Dietrich wrote:
> On Thu, 2005-05-26 at 22:27 +0200, Andi Kleen wrote:
>
>>>Here, I am talking about separating out the patch, and applying it
>>>first, not dropping it from the RT implementation.
>>
>>I really dislike the idea of interrupt threads. It seems totally
>>wrong to me to make such a fundamental operation as an interrupt
>>much slower. If really any interrupts take too long they should
>>move to workqueues instead and be preempted there. But keep
>>the basic fundamental operations fast please (at least that used to be one
>>of the Linux mottos that served it very well for many years, although more
>>and more people seem to forget it now)
>
> IRQ threads are configurable. If you don't want them, you CAN turn them
> off (if you have already turned them on).
>
> You don't HAVE to turn them on.
Unless you have configured PREEMPT_RT, which requires
PREEMPT_SOFTIRQS and PREEMPT_HARDIRQS so that
spinlock-mutexes are able to synchronize interrupt
processing. In the other PREEMPT_* configuration modes,
inclusion of IRQ threads is optional.
I think this may have been the source of confusion in
prior discussions.
-john
--
[email protected]
On Wed, 2005-05-25 at 10:32 -0700, Sven-Thorsten Dietrich wrote:
> On Wed, 2005-05-25 at 09:27 -0400, Karim Yaghmour wrote:
> > Ingo Molnar wrote:
> > > (i guess mostly because i'm pretty presentation-shy. It's probably too
> > > late for OLS, but if someone else feels a desire to do more in this
> > > area, i certainly wont complain.)
> >
> > Like I told Sven, if this important enough (as it seems it is), I don't
> > see why the people in charge wouldn't at least try to find a spot.
> >
>
> I have submitted a proposal, but its obviously very very late.
>
> I suggest that folks should contact any powers that be at OLS,
> and express their interest in an RT presentation.
>
> Then I'll get up there and you all can throw rotten fruit and veggies ;)
Sven,
Any news of your speech? I only live 4 and a half hours drive from
Ottawa, and I just got the OK to go (from the wife :-). I guess the OLS
can get another $500 from me by just letting you speak, even if it's
just a BOFS. I got a few extra tomatoes rotting in the fridge from last
year that I can bring ;-)
-- Steve
Nick Piggin wrote:
> I have a question about what sort of RT guarantees people might
> want. Forget specific patches or implementations for a minute.
> I'm genuinely curious, as an uneducated bystander - I want to get
> a bit more background about this.
>
Sorry this question is not directed to you, Andi.
But anyone with some info feel free to chime in ;)
Thanks.
Andi Kleen wrote:
>>>What I dislike with RT mutexes is that they convert all locks.
>>>It doesnt make much sense to me to have a complex lock that
>>>only protects a few lines of code (and a lot of the spinlock
>>>code is like this). That is just a waste of cycles.
>>>
>>
>>It is NOT just a few lines of code. Millisecond latencies on high-
>>powered CPU systems mean more code than is probably required to send a
>>rocket 'round the moon and back.
>
>
> Most spinlocks only protect small code parts. Those that protect
> larger codes can probably use optionally some different lock.
>
> But dont attack it with "one size fits all" locking please.
>
I have a question about what sort of RT guarantees people might
want. Forget specific patches or implementations for a minute.
I'm genuinely curious, as an uneducated bystander - I want to get
a bit more background about this.
Presumably your RT tasks are going to want to do some kind of
*real* work somewhere along the line - so how is that work provided
guarantees?
For example, suppose you have preemptible everything, and priority
inheritance and that's all nice. But the actual time in which
some thread holds a lock is time that no other thread can take
that lock either, regardless of its priority.
So in that sense, if you do hard RT in the Linux kernel, it surely
is always going to be some subset of operations, dependent on
exact locking implementation, other tasks running and resource usage,
right?
Tasklist lock might be a good example off the top of my head - so
you may be able to send a signal to another process with deterministic
latency, however that latency might look something like: x + nrproc*y
It appears to me (uneducated bystander, remember) that a nanokernel
running a small hard-rt kernel and Linux together might be "better"
for people that want real realtime.
Just from the point of view of making the RT kernel as small and easy
to verify as possible, and not having to provide for general purpose
non-RT tasks. Then you also get the benefit of not having to make the
general purpose Linux support hard real time.
For example, if your RT kernel had something like a tasklist lock, it
may have an upper limit on the number of processes, or put in restart
points where lower priority processes drop the lock and restart what
they were doing if a high prio process comes along - obviously
neither solution would fly for the Linux tasklist lock.
Or have I missed something completely? You RT guys have thought about
it - so what are some pros of the Linux-RT patch and/or cons of the
nanokernel approach, please?
[ And again, please don't say why Ingo's RT patch should go in, I'm
not talking about any patch, any merging of patches or even that
some hypothetical patch *shouldn't* go in - even if it does have
above problem ;) ]
Thanks very much,
Nick
* Nick Piggin <[email protected]> wrote:
> Presumably your RT tasks are going to want to do some kind of *real*
> work somewhere along the line - so how is that work provided
> guarantees?
there are several layers to this. The primary guarantee we can offer is
to execute userspace code within N usecs. Most code that needs hard
guarantees is quite simple and uses orthogonal mechanisms.
The secondary guarantee we will eventually offer is that as long as a
process uses orthogonal resources, other tasks will not delay it. This
is not quite true right now but it's possible to achieve it
realistically. If an RT and a non-RT task share e.g. the same file
descriptor - or the RT task does not use mlockall() to exclude VM - then
we cannot guarantee latencies. There might be more subtle 'sharing', but
as long as the primary APIs are fundamentally O(1) [and they are], the
worst-case overhead will be deterministic. What controls latencies is the
worst-case latency of the kernel facility an RT task makes use of. Under
normal PREEMPT what controls latencies is the _system-wide_ worst-case
latency - which is a very different thing.
but it's not like hard-RT tasks live in a vacuum: they already have to
be aware of the latencies caused by themselves, and they have to be
consciously aware of what kernel facilities they use. If you do hard-RT
you have to be very aware of every line of code your task may execute.
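to make the 'orthogonal resources' point concrete, a minimal hard-RT
userspace loop would be something like this (just a sketch - priority
and period picked arbitrarily, error checking omitted):

  #include <sched.h>
  #include <sys/mman.h>
  #include <time.h>

  int main(void)
  {
          struct sched_param sp = { .sched_priority = 80 };
          struct timespec next;

          /* RT priority, and lock current+future pages to exclude the VM */
          sched_setscheduler(0, SCHED_FIFO, &sp);
          mlockall(MCL_CURRENT | MCL_FUTURE);

          clock_gettime(CLOCK_MONOTONIC, &next);
          for (;;) {
                  /* ... RT work here, touching only O(1) facilities ... */
                  next.tv_nsec += 1000000;        /* 1 msec period */
                  if (next.tv_nsec >= 1000000000) {
                          next.tv_nsec -= 1000000000;
                          next.tv_sec++;
                  }
                  clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                                  &next, NULL);
          }
  }

as long as the loop body only touches resources nobody else shares, the
wakeup latency is the only thing the kernel has to guarantee.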
> So in that sense, if you do hard RT in the Linux kernel, it surely is
> always going to be some subset of operations, dependent on exact
> locking implementation, other tasks running and resource usage, right?
yes. The goal is that latencies will fundamentally depend on what
facilities (and sharing) the RT task makes use of - instead of depending
on what _other_ tasks do in the system.
> Tasklist lock might be a good example off the top of my head - so you
> may be able to send a signal to another process with deterministic
> latency, however that latency might look something like: x + nrproc*y
yes, signals are not O(1).
Fundamentally, the Linux kernel constantly moves towards separation of
unrelated functionality, for scalability reasons. So the moment there's
some unexpected sharing, we try to get rid of it, not primarily due to
latencies, but due to performance. (and vice versa - one reason why it's
not hard to get latency patches into the kernel) E.g. the tasklist lock
might be converted to RCU one day. The idea is that a 'perfectly
scalable' Linux kernel also has perfect latencies - the two goals meet.
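(as a sketch of what such a conversion buys - assuming a hypothetical
RCU-protected tasklist, readers would traverse it without taking any
global lock:

	struct task_struct *p;
	int count = 0;

	rcu_read_lock();
	for_each_process(p)	/* no tasklist_lock, writers don't block us */
		count++;
	rcu_read_unlock();

so a reader no longer contributes to - nor suffers from - tasklist_lock
hold times.)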
> It appears to me (uneducated bystander, remember) that a nanokernel
> running a small hard-rt kernel and Linux together might be "better"
> for people that want real realtime.
If your application model can tolerate a total separation of OSs then
that's sure a viable way. If you want to do everything under one
instance of Linux, and want to separate out some well-controlled RT
functionality, then PREEMPT_RT is good for you.
Note that if you can tolerate separation of OSs (i.e. no sharing or
well-controlled sharing) then you can do that under PREEMPT_RT too, here
and today: e.g. run all the non-RT tasks in an UML or QEMU instance.
(separation of UML needs more work but it's fundamentally ok.) Or you
can use PREEMPT_RT as the nanokernel [although this sure is overkill]
and put all the RT functionality into a virtual machine. So instead of a
hard choice forced upon you, virtualization becomes an option. Soft-RT
applications can morph towards hard-RT conditions and vice versa.
So whether it's good enough will have to be seen - maybe nanokernels
will win in the end. As long as PREEMPT_RT does not impose any undue
design burden on the stock kernel (and i believe it does not) it's a
win-win situation: latency improvements will drive scalability,
scalability improvements will drive latencies, and the code can be
easily removed if it becomes unused.
Ingo
On Fri, 2005-05-27 at 15:19 +1000, Nick Piggin wrote:
> Or have I missed something completely? You RT guys have thought about
> it - so what are some pros of the Linux-RT patch and/or cons of the
> nanokernel approach, please?
>
Sorry Nick,
The discussion about sub-kernels does not need to happen again in this
forum.
We have made more than enough noise about RT already, and taken
bandwidth that people are using to do real work.
There are different application domains, and nano-kernels have theirs.
If you are truly interested, there are a lot of papers about RT. There
are nanokernel implementations and patents you can review, and there is
a lot of controversy.
I would refer you to the LKML archives, going back a few years, where
you can find answers to all your questions.
The opinion you form in the end may depend largely on what you read,
whom you talk to, your understanding of how everything interacts, and
your intuition.
If you have any remaining specific questions, I'll be glad to point you
to more information (OFF LIST)
(I hope that puts it back in the box)
Best Regards,
Sven
Ingo Molnar wrote:
Thanks Ingo,
> * Nick Piggin <[email protected]> wrote:
>
>
>>Presumably your RT tasks are going to want to do some kind of *real*
>>work somewhere along the line - so how is that work provided
>>guarantees?
>
>
> there are several layers to this. The primary guarantee we can offer is
> to execute userspace code within N usecs. Most code that needs hard
> guarantees is quite simple and uses orthogonal mechanisms.
>
Well yes, but *somewhere* along the line they'll need to interact
with something else in a timely (in the RT sense) manner?
[...]
>
>>So in that sense, if you do hard RT in the Linux kernel, it surely is
>>always going to be some subset of operations, dependent on exact
>>locking implementation, other tasks running and resource usage, right?
>
>
> yes. The goal is that latencies will fundamentally depend on what
> facilities (and sharing) the RT task makes use of - instead of depending
> on what _other_ tasks do in the system.
>
OK.
>
>>Tasklist lock might be a good example off the top of my head - so you
>>may be able to send a signal to another process with deterministic
>>latency, however that latency might look something like: x + nrproc*y
>
>
> yes, signals are not O(1).
>
> Fundamentally, the Linux kernel constantly moves towards separation of
> unrelated functionality, for scalability reasons. So the moment there's
> some unexpected sharing, we try to get rid of it, not primarily due to
> latencies, but due to performance. (and vice versa - one reason why it's
> not hard to get latency patches into the kernel) E.g. the tasklist lock
> might be converted to RCU one day. The idea is that a 'perfectly
> scalable' Linux kernel also has perfect latencies - the two goals meet.
>
I'd have to think about that one ;)
But yeah I agree they seem to broadly move in the same direction,
but let's not split hairs.
>
>>It appears to me (uneducated bystander, remember) that a nanokernel
>>running a small hard-rt kernel and Linux together might be "better"
>>for people that want real realtime.
>
>
> If your application model can tolerate a total separation of OSs then
> that's sure a viable way. If you want to do everything under one
> instance of Linux, and want to separate out some well-controlled RT
> functionality, then PREEMPT_RT is good for you.
>
> Note that if you can tolerate separation of OSs (i.e. no sharing or
> well-controlled sharing) then you can do that under PREEMPT_RT too, here
> and today: e.g. run all the non-RT tasks in an UML or QEMU instance.
> (separation of UML needs more work but it's fundamentally ok.) Or you
> can use PREEMPT_RT as the nanokernel [although this sure is overkill]
> and put all the RT functionality into a virtual machine. So instead of a
> hard choice forced upon you, virtualization becomes an option. Soft-RT
> applications can morph towards hard-RT conditions and vice versa.
>
OK. So what sort of applications can't tolerate the nanokernel type of
separation? I guess the hosts would be separated by some network-like
device, shared memory, etc. - devices that use functionality provided
by the nanokernel?
> So whether it's good enough will have to be seen - maybe nanokernels
> will win in the end. As long as PREEMPT_RT does not impose any undue
> design burden on the stock kernel (and i believe it does not) it's a
> win-win situation: latency improvements will drive scalability,
> scalability improvements will drive latencies, and the code can be
> easily removed if it becomes unused.
Well yeah, from what I gather, the PREEMPT_RT work needn't be excluded
on the basis that it can't provide hard-RT - for a real world example
all the sound guys seem to love it ;) so it obviously is worth something.
And if the complexity can be nicely hidden away and configured out,
then I personally don't have any problem with it whatsoever :) But
I don't like to comment further on code until I see the actual
proposed patch, when you're happy with it.
Nick
Sven-Thorsten Dietrich wrote:
> On Fri, 2005-05-27 at 15:19 +1000, Nick Piggin wrote:
>
>>Or have I missed something completely? You RT guys have thought about
>>it - so what are some pros of the Linux-RT patch and/or cons of the
>>nanokernel approach, please?
>>
>
>
> Sorry Nick,
>
Hi Sven,
> The discussion about sub-kernels does not need to happen again in this
> forum.
>
I never saw it happen in this forum. I believe you if you say it
has, but I suspect a lot has changed since then.
> We have made more than enough noise about RT already, and taken
> bandwidth that people are using to do real work.
>
These days you have to say something pretty stupid to worsen
the noise ratio on lkml ;) People manage to get real work done, so
don't worry about that.
What's more, this is actually a discussion that I hope *will* be
productive.
> There are different application domains, and nano-kernels have theirs.
>
> If you are truly interested, there are a lot of papers about RT. There
> are nanokernel implementations and patents you can review, and there is
> a lot of controversy.
>
What do you mean "truly interested"? Of course, that is why I asked.
I am not so much interested from the "I want to build an RT control
system" point of view as from "I want to get some background info on
changes that might soon be proposed to our kernel".
And that is why it is relevant on this forum. In that context (ie.
having a patch included) it is not up to us to go wading through years
of old debate, research and discussion. But whoever will propose the
RT patch to be included simply needs to come up with the rationale
and basically address people's concerns.
So unless the conclusion of your previous discussion was Linus and
Andrew saying "yes, we'll go with solution 'blah'", then it absolutely
is necessary to get the issues out in the open.
However I don't think it would be too much to ask if you want
to be removed from the CC list for the rest of the thread.
> I would refer you to LKML the archives, going back a few years, where
> you can find answers to all your questions.
>
A few *years* ago? No thanks.
> The opinion you form in the end may depend largely on what you read,
> whom you talk to, your understanding of how everything interacts, and
> your intuition.
>
> If you have any remaining specific questions, I'll be glad to point you
> to more information (OFF LIST)
lkml is a great forum for almost any kernel development discussion,
especially something like this.
I had some specific questions in my previous email. The main thing
I guess, is "why not a nanokernel?". I am also very interested in the
applications that will require PREEMPT_RT too.
Don't take my questions personally or as an attack against the patch.
This kind of discussion always happens, and always surrounds moderately
large changes proposed to the kernel. (I'm not saying you won't get
flames, but this really isn't one).
>
> (I hope that puts it back in the box)
>
Actually it doesn't at all, but thanks anyway ;)
Thanks,
Nick
On Thu, 2005-05-26 at 22:27 +0200, Andi Kleen wrote:
> I really dislike the idea of interrupt threads. It seems totally
> wrong to me to make such a fundamental operation as an interrupt
> much slower. If really any interrupts take too long they should
> move to workqueues instead and be preempted there.
I don't see a really good argument why an interrupt is such a fundamental
operation that it has to be treated separately. It is a computation type
with a set of constraints. It depends on your system requirements which
importance and execution mechanism you select for the computation in
order to fulfil those constraints.
If you want deterministic control in an OS you have to control _all_
computation types through the scheduler. IRQ-to-thread conversion is one
of many mechanisms to gain control over the execution behaviour of
interrupt-type computations. It has a nice advantage over other
mechanisms in that it is simple to understand; people are used to
dealing with threads and priorities.
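Once the handler is a thread, you tune it with the normal tools. E.g.
with Ingo's patch elsewhere in this thread, which names the threads
"IRQ <n>", something like (pid picked for illustration):

	ps -eo pid,rtprio,comm | grep IRQ	# list the irq threads
	chrt -f -p 70 1234			# 1234 = pid of "IRQ 9"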
> But keep
> the basic fundamental operations fast please (at least that used to be one
> of the Linux mottos that served it very well for many years, although more
> and more people seem to forget it now)
"It has been that way since ages" arguments are not really productive in
a discussion. If you look at the history of Linux over the years, many
things which seemed to be untouchable have been changed in order to make
it usable for specific application types.
Linux deals quite well with a broad range of application scenarios, and
there is quite a lot of interest from industrial users in having RT
features included. Sure, you may argue that add-on solutions and
nanokernel approaches are available for that, but industrial users are
looking for a scalable all-in-one solution, where they can turn on the
features they need for the specific application.
tglx
On Fri, 2005-05-27 at 11:14 +0200, Ingo Molnar wrote:
> * Thomas Gleixner <[email protected]> wrote:
>
> > > But keep
> > > the basic fundamental operations fast please (at least that used to be one
> > > of the Linux mottos that served it very well for many years, although more
> > > and more people seem to forget it now)
> >
> > "It has been that way since ages" arguments are not really productive in
> > a discussion. [...]
>
> to make sure the wide context has not been lost: no way is IRQ threading
> ever going to be the main or even the preferred mode of operation.
Sure, I did not mean to imply making it the standard behaviour. It's a
feature alongside other more or less useful ones.
tglx
> NOT-Signed-off-by: Ingo Molnar <[email protected]>
:)
* Thomas Gleixner <[email protected]> wrote:
> > But keep
> > the basic fundamental operations fast please (at least that used to be one
> > of the Linux mottos that served it very well for many years, although more
> > and more people seem to forget it now)
>
> "It has been that way since ages" arguments are not really productive in
> a discussion. [...]
to make sure the wide context has not been lost: no way is IRQ threading
ever going to be the main or even the preferred mode of operation.
secondly, there's no performance impact on stock kernels, nor any design
drag. I have done a very quick & dirty separation of hardirq
threading out of the -RT patchset, see the patch below. It's pretty small:
8 files changed, 375 insertions(+), 53 deletions(-)
no arch level change is needed - if an arch uses GENERIC_HARDIRQS then
it will be automatically capable of running hardirq threads.
Ingo
NOT-Signed-off-by: Ingo Molnar <[email protected]>
--- linux/kernel/irq/proc.c.orig
+++ linux/kernel/irq/proc.c
@@ -7,9 +7,12 @@
*/
#include <linux/irq.h>
+#include <asm/uaccess.h>
#include <linux/proc_fs.h>
#include <linux/interrupt.h>
+#include "internals.h"
+
static struct proc_dir_entry *root_irq_dir, *irq_dir[NR_IRQS];
#ifdef CONFIG_SMP
@@ -67,37 +70,6 @@ static int irq_affinity_write_proc(struc
#endif
-#define MAX_NAMELEN 128
-
-static int name_unique(unsigned int irq, struct irqaction *new_action)
-{
- struct irq_desc *desc = irq_desc + irq;
- struct irqaction *action;
-
- for (action = desc->action ; action; action = action->next)
- if ((action != new_action) && action->name &&
- !strcmp(new_action->name, action->name))
- return 0;
- return 1;
-}
-
-void register_handler_proc(unsigned int irq, struct irqaction *action)
-{
- char name [MAX_NAMELEN];
-
- if (!irq_dir[irq] || action->dir || !action->name ||
- !name_unique(irq, action))
- return;
-
- memset(name, 0, MAX_NAMELEN);
- snprintf(name, MAX_NAMELEN, "%s", action->name);
-
- /* create /proc/irq/1234/handler/ */
- action->dir = proc_mkdir(name, irq_dir[irq]);
-}
-
-#undef MAX_NAMELEN
-
#define MAX_NAMELEN 10
void register_irq_proc(unsigned int irq)
@@ -137,10 +109,96 @@ void register_irq_proc(unsigned int irq)
void unregister_handler_proc(unsigned int irq, struct irqaction *action)
{
+ if (action->threaded)
+ remove_proc_entry(action->threaded->name, action->dir);
if (action->dir)
remove_proc_entry(action->dir->name, irq_dir[irq]);
}
+#ifndef CONFIG_PREEMPT_RT
+
+static int threaded_read_proc(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ return sprintf(page, "%c\n",
+ ((struct irqaction *)data)->flags & SA_NODELAY ? '0' : '1');
+}
+
+static int threaded_write_proc(struct file *file, const char __user *buffer,
+ unsigned long count, void *data)
+{
+ int c;
+ struct irqaction *action = data;
+ irq_desc_t *desc = irq_desc + action->irq;
+
+ if (get_user(c, buffer))
+ return -EFAULT;
+ if (c != '0' && c != '1')
+ return -EINVAL;
+
+ spin_lock_irq(&desc->lock);
+
+ if (c == '0')
+ action->flags |= SA_NODELAY;
+ if (c == '1')
+ action->flags &= ~SA_NODELAY;
+ recalculate_desc_flags(desc);
+
+ spin_unlock_irq(&desc->lock);
+
+ return 1;
+}
+
+#endif
+
+#define MAX_NAMELEN 128
+
+static int name_unique(unsigned int irq, struct irqaction *new_action)
+{
+ struct irq_desc *desc = irq_desc + irq;
+ struct irqaction *action;
+
+ for (action = desc->action ; action; action = action->next)
+ if ((action != new_action) && action->name &&
+ !strcmp(new_action->name, action->name))
+ return 0;
+ return 1;
+}
+
+void register_handler_proc(unsigned int irq, struct irqaction *action)
+{
+ char name [MAX_NAMELEN];
+
+ if (!irq_dir[irq] || action->dir || !action->name ||
+ !name_unique(irq, action))
+ return;
+
+ memset(name, 0, MAX_NAMELEN);
+ snprintf(name, MAX_NAMELEN, "%s", action->name);
+
+ /* create /proc/irq/1234/handler/ */
+ action->dir = proc_mkdir(name, irq_dir[irq]);
+ if (!action->dir)
+ return;
+#ifndef CONFIG_PREEMPT_RT
+ {
+ struct proc_dir_entry *entry;
+ /* create /proc/irq/1234/handler/threaded */
+ entry = create_proc_entry("threaded", 0600, action->dir);
+ if (!entry)
+ return;
+ entry->nlink = 1;
+ entry->data = (void *)action;
+ entry->read_proc = threaded_read_proc;
+ entry->write_proc = threaded_write_proc;
+ action->threaded = entry;
+ }
+#endif
+}
+
+#undef MAX_NAMELEN
+
+
void init_irq_proc(void)
{
int i;
@@ -150,6 +208,9 @@ void init_irq_proc(void)
if (!root_irq_dir)
return;
+ /* create /proc/irq/prof_cpu_mask */
+ create_prof_cpu_mask(root_irq_dir);
+
/*
* Create entries for all existing IRQs.
*/
--- linux/kernel/irq/manage.c.orig
+++ linux/kernel/irq/manage.c
@@ -7,8 +7,10 @@
*/
#include <linux/irq.h>
-#include <linux/module.h>
#include <linux/random.h>
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/syscalls.h>
#include <linux/interrupt.h>
#include "internals.h"
@@ -30,8 +32,12 @@ void synchronize_irq(unsigned int irq)
{
struct irq_desc *desc = irq_desc + irq;
- while (desc->status & IRQ_INPROGRESS)
- cpu_relax();
+ if (hardirq_preemption && !(desc->status & IRQ_NODELAY))
+ wait_event(desc->wait_for_handler,
+ !(desc->status & IRQ_INPROGRESS));
+ else
+ while (desc->status & IRQ_INPROGRESS)
+ cpu_relax();
}
EXPORT_SYMBOL(synchronize_irq);
@@ -127,6 +133,21 @@ void enable_irq(unsigned int irq)
EXPORT_SYMBOL(enable_irq);
/*
+ * If any action has SA_NODELAY then turn IRQ_NODELAY on:
+ */
+void recalculate_desc_flags(struct irq_desc *desc)
+{
+ struct irqaction *action;
+
+ desc->status &= ~IRQ_NODELAY;
+ for (action = desc->action ; action; action = action->next)
+ if (action->flags & SA_NODELAY)
+ desc->status |= IRQ_NODELAY;
+}
+
+static int start_irq_thread(int irq, struct irq_desc *desc);
+
+/*
* Internal function that tells the architecture code whether a
* particular irq has been exclusively allocated or is available
* for driver use.
@@ -176,6 +197,9 @@ int setup_irq(unsigned int irq, struct i
rand_initialize_irq(irq);
}
+ if (!(new->flags & SA_NODELAY))
+ if (start_irq_thread(irq, desc))
+ return -ENOMEM;
/*
* The following block of code has to be executed atomically
*/
@@ -198,6 +222,11 @@ int setup_irq(unsigned int irq, struct i
*p = new;
+ /*
+ * Propagate any possible SA_NODELAY flag into IRQ_NODELAY:
+ */
+ recalculate_desc_flags(desc);
+
if (!shared) {
desc->depth = 0;
desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT |
@@ -211,7 +240,7 @@ int setup_irq(unsigned int irq, struct i
new->irq = irq;
register_irq_proc(irq);
- new->dir = NULL;
+ new->dir = new->threaded = NULL;
register_handler_proc(irq, new);
return 0;
@@ -262,6 +291,7 @@ void free_irq(unsigned int irq, void *de
else
desc->handler->disable(irq);
}
+ recalculate_desc_flags(desc);
spin_unlock_irqrestore(&desc->lock,flags);
unregister_handler_proc(irq, action);
@@ -347,3 +377,171 @@ int request_irq(unsigned int irq,
EXPORT_SYMBOL(request_irq);
+#ifdef CONFIG_PREEMPT_HARDIRQS
+
+int hardirq_preemption = 1;
+
+EXPORT_SYMBOL(hardirq_preemption);
+
+/*
+ * Real-Time Preemption depends on hardirq threading:
+ */
+#ifndef CONFIG_PREEMPT_RT
+
+static int __init hardirq_preempt_setup (char *str)
+{
+ if (!strncmp(str, "off", 3))
+ hardirq_preemption = 0;
+ else
+ get_option(&str, &hardirq_preemption);
+ if (!hardirq_preemption)
+ printk("turning off hardirq preemption!\n");
+
+ return 1;
+}
+
+__setup("hardirq-preempt=", hardirq_preempt_setup);
+
+#endif
+
+static void do_hardirq(struct irq_desc *desc)
+{
+ struct irqaction * action;
+ unsigned int irq = desc - irq_desc;
+
+ local_irq_disable();
+
+ if (desc->status & IRQ_INPROGRESS) {
+ action = desc->action;
+ spin_lock(&desc->lock);
+ for (;;) {
+ irqreturn_t action_ret = 0;
+
+ if (action) {
+ spin_unlock(&desc->lock);
+ action_ret = handle_IRQ_event(irq, NULL,action);
+ local_irq_enable();
+ cond_resched_all();
+ spin_lock_irq(&desc->lock);
+ }
+ if (!noirqdebug)
+ note_interrupt(irq, desc, action_ret);
+ if (likely(!(desc->status & IRQ_PENDING)))
+ break;
+ desc->status &= ~IRQ_PENDING;
+ }
+ desc->status &= ~IRQ_INPROGRESS;
+ /*
+ * The ->end() handler has to deal with interrupts which got
+ * disabled while the handler was running.
+ */
+ desc->handler->end(irq);
+ spin_unlock(&desc->lock);
+ }
+ local_irq_enable();
+ if (waitqueue_active(&desc->wait_for_handler))
+ wake_up(&desc->wait_for_handler);
+}
+
+extern asmlinkage void __do_softirq(void);
+
+static int curr_irq_prio = 49;
+
+static int do_irqd(void * __desc)
+{
+ struct sched_param param = { 0, };
+ struct irq_desc *desc = __desc;
+#ifdef CONFIG_SMP
+ int irq = desc - irq_desc;
+ cpumask_t mask;
+
+ mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
+ set_cpus_allowed(current, mask);
+#endif
+ current->flags |= PF_NOFREEZE | PF_HARDIRQ;
+
+ /*
+ * Scale irq thread priorities from prio 50 to prio 25
+ */
+ param.sched_priority = curr_irq_prio;
+ if (param.sched_priority > 25)
+ curr_irq_prio = param.sched_priority - 1;
+
+ sys_sched_setscheduler(current->pid, SCHED_FIFO, &param);
+
+ while (!kthread_should_stop()) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ do_hardirq(desc);
+ cond_resched_all();
+ __do_softirq();
+ local_irq_enable();
+#ifdef CONFIG_SMP
+ /*
+ * Did IRQ affinities change?
+ */
+ if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
+ mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
+ set_cpus_allowed(current, mask);
+ }
+#endif
+ schedule();
+ }
+ __set_current_state(TASK_RUNNING);
+ return 0;
+}
+
+static int ok_to_create_irq_threads;
+
+static int start_irq_thread(int irq, struct irq_desc *desc)
+{
+ if (desc->thread || !ok_to_create_irq_threads)
+ return 0;
+
+ desc->thread = kthread_create(do_irqd, desc, "IRQ %d", irq);
+ if (!desc->thread) {
+ printk(KERN_ERR "irqd: could not create IRQ thread %d!\n", irq);
+ return -ENOMEM;
+ }
+
+ /*
+ * An interrupt may have come in before the thread pointer was
+ * stored in desc->thread; make sure the thread gets woken up in
+ * such a case:
+ */
+ smp_mb();
+ wake_up_process(desc->thread);
+
+ return 0;
+}
+
+void __init init_hardirqs(void)
+{
+ int i;
+ ok_to_create_irq_threads = 1;
+
+ for (i = 0; i < NR_IRQS; i++) {
+ irq_desc_t *desc = irq_desc + i;
+
+ if (desc->action && !(desc->status & IRQ_NODELAY))
+ start_irq_thread(i, desc);
+ }
+}
+
+#else
+
+static int start_irq_thread(int irq, struct irq_desc *desc)
+{
+ return 0;
+}
+
+#endif
+
+void __init early_init_hardirqs(void)
+{
+ int i;
+
+ for (i = 0; i < NR_IRQS; i++)
+ init_waitqueue_head(&irq_desc[i].wait_for_handler);
+}
+
+
--- linux/kernel/irq/handle.c.orig
+++ linux/kernel/irq/handle.c
@@ -9,6 +9,7 @@
#include <linux/irq.h>
#include <linux/module.h>
#include <linux/random.h>
+#include <linux/kallsyms.h>
#include <linux/interrupt.h>
#include <linux/kernel_stat.h>
@@ -32,7 +33,7 @@ irq_desc_t irq_desc[NR_IRQS] __cacheline
[0 ... NR_IRQS-1] = {
.status = IRQ_DISABLED,
.handler = &no_irq_type,
- .lock = SPIN_LOCK_UNLOCKED
+ .lock = RAW_SPIN_LOCK_UNLOCKED
}
};
@@ -74,6 +75,32 @@ irqreturn_t no_action(int cpl, void *dev
}
/*
+ * Hack - used for development only.
+ */
+int debug_direct_keyboard = 0;
+
+int redirect_hardirq(struct irq_desc *desc)
+{
+ /*
+ * Direct execution:
+ */
+ if (!hardirq_preemption || (desc->status & IRQ_NODELAY) ||
+ !desc->thread)
+ return 0;
+
+#ifdef __i386__
+ if (debug_direct_keyboard && (desc - irq_desc == 1))
+ return 0;
+#endif
+
+ BUG_ON(!irqs_disabled());
+ if (desc->thread && desc->thread->state != TASK_RUNNING)
+ wake_up_process(desc->thread);
+
+ return 1;
+}
+
+/*
* Have got an event to handle:
*/
fastcall int handle_IRQ_event(unsigned int irq, struct pt_regs *regs,
@@ -81,30 +108,50 @@ fastcall int handle_IRQ_event(unsigned i
{
int ret, retval = 0, status = 0;
- if (!(action->flags & SA_INTERRUPT))
+ /*
+ * Unconditionally enable interrupts for threaded
+ * IRQ handlers:
+ */
+ if (!hardirq_count() || !(action->flags & SA_INTERRUPT))
local_irq_enable();
do {
+ unsigned int preempt_count = preempt_count();
+
ret = action->handler(irq, action->dev_id, regs);
+ if (preempt_count() != preempt_count) {
+ stop_trace();
+ print_symbol("BUG: unbalanced irq-handler preempt count in %s!\n", (unsigned long) action->handler);
+ printk("entered with %08x, exited with %08x.\n", preempt_count, preempt_count());
+ dump_stack();
+ preempt_count() = preempt_count;
+ }
if (ret == IRQ_HANDLED)
status |= action->flags;
retval |= ret;
action = action->next;
} while (action);
- if (status & SA_SAMPLE_RANDOM)
+ if (status & SA_SAMPLE_RANDOM) {
+ local_irq_enable();
add_interrupt_randomness(irq);
+ }
local_irq_disable();
return retval;
}
+cycles_t irq_timestamp(unsigned int irq)
+{
+ return irq_desc[irq].timestamp;
+}
+
/*
* do_IRQ handles all normal device IRQ's (the special
* SMP cross-CPU interrupts have their own specific
* handlers).
*/
-fastcall unsigned int __do_IRQ(unsigned int irq, struct pt_regs *regs)
+fastcall notrace unsigned int __do_IRQ(unsigned int irq, struct pt_regs *regs)
{
irq_desc_t *desc = irq_desc + irq;
struct irqaction * action;
@@ -124,6 +171,7 @@ fastcall unsigned int __do_IRQ(unsigned
desc->handler->end(irq);
return 1;
}
+ desc->timestamp = get_cycles();
spin_lock(&desc->lock);
desc->handler->ack(irq);
@@ -156,6 +204,12 @@ fastcall unsigned int __do_IRQ(unsigned
goto out;
/*
+ * hardirq redirection to the irqd process context:
+ */
+ if (redirect_hardirq(desc))
+ goto out_no_end;
+
+ /*
* Edge triggered interrupts need to remember
* pending events.
* This applies to any hw interrupts that allow a second
@@ -180,13 +234,13 @@ fastcall unsigned int __do_IRQ(unsigned
desc->status &= ~IRQ_PENDING;
}
desc->status &= ~IRQ_INPROGRESS;
-
out:
/*
* The ->end() handler has to deal with interrupts which got
* disabled while the handler was running.
*/
desc->handler->end(irq);
+out_no_end:
spin_unlock(&desc->lock);
return 1;
--- linux/kernel/irq/autoprobe.c.orig
+++ linux/kernel/irq/autoprobe.c
@@ -7,6 +7,7 @@
*/
#include <linux/irq.h>
+#include <linux/delay.h>
#include <linux/module.h>
#include <linux/interrupt.h>
@@ -26,7 +27,7 @@ static DECLARE_MUTEX(probe_sem);
*/
unsigned long probe_irq_on(void)
{
- unsigned long val, delay;
+ unsigned long val;
irq_desc_t *desc;
unsigned int i;
@@ -44,9 +45,10 @@ unsigned long probe_irq_on(void)
spin_unlock_irq(&desc->lock);
}
- /* Wait for longstanding interrupts to trigger. */
- for (delay = jiffies + HZ/50; time_after(delay, jiffies); )
- /* about 20ms delay */ barrier();
+ /*
+ * Wait for longstanding interrupts to trigger, 20 msec delay:
+ */
+ msleep(HZ/50);
/*
* enable any unassigned irqs
@@ -66,10 +68,9 @@ unsigned long probe_irq_on(void)
}
/*
- * Wait for spurious interrupts to trigger
+ * Wait for spurious interrupts to trigger, 100 msec delay:
*/
- for (delay = jiffies + HZ/10; time_after(delay, jiffies); )
- /* about 100ms delay */ barrier();
+ msleep(HZ/10);
/*
* Now filter out any obviously spurious interrupts
--- linux/kernel/irq/internals.h.orig
+++ linux/kernel/irq/internals.h
@@ -4,6 +4,8 @@
extern int noirqdebug;
+void recalculate_desc_flags(struct irq_desc *desc);
+
#ifdef CONFIG_PROC_FS
extern void register_irq_proc(unsigned int irq);
extern void register_handler_proc(unsigned int irq, struct irqaction *action);
--- linux/include/linux/interrupt.h.orig
+++ linux/include/linux/interrupt.h
@@ -41,7 +41,7 @@ struct irqaction {
void *dev_id;
struct irqaction *next;
int irq;
- struct proc_dir_entry *dir;
+ struct proc_dir_entry *dir, *threaded;
};
extern irqreturn_t no_action(int cpl, void *dev_id, struct pt_regs *regs);
@@ -126,6 +131,7 @@ extern void softirq_init(void);
#define __raise_softirq_irqoff(nr) do { local_softirq_pending() |= 1UL << (nr); } while (0)
extern void FASTCALL(raise_softirq_irqoff(unsigned int nr));
extern void FASTCALL(raise_softirq(unsigned int nr));
+extern void wakeup_irqd(void);
/* Tasklets --- multithreaded analogue of BHs.
--- linux/include/linux/hardirq.h.orig
+++ linux/include/linux/hardirq.h
@@ -58,11 +58,13 @@
* Are we doing bottom half or hardware interrupt processing?
* Are we in a softirq context? Interrupt context?
*/
-#define in_irq() (hardirq_count())
-#define in_softirq() (softirq_count())
-#define in_interrupt() (irq_count())
-
-#if defined(CONFIG_PREEMPT) && !defined(CONFIG_PREEMPT_BKL)
+#define in_irq() (hardirq_count() || (current->flags & PF_HARDIRQ))
+#define in_softirq() (softirq_count() || (current->flags & PF_SOFTIRQ))
+#define in_interrupt() (irq_count())
+
+#if defined(CONFIG_PREEMPT) && \
+ !defined(CONFIG_PREEMPT_BKL) && \
+ !defined(CONFIG_PREEMPT_RT)
# define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != kernel_locked())
#else
# define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != 0)
--- linux/include/linux/sched.h.orig
+++ linux/include/linux/sched.h
@@ -791,6 +942,9 @@ do { if (atomic_dec_and_test(&(tsk)->usa
#define PF_SYNCWRITE 0x00200000 /* I am doing a sync write */
#define PF_BORROWED_MM 0x00400000 /* I am a kthread doing use_mm */
#define PF_RANDOMIZE 0x00800000 /* randomize virtual address space */
+#define PF_SOFTIRQ 0x01000000 /* softirq context */
+#define PF_HARDIRQ 0x02000000 /* hardirq context */
+
/*
* Only the _current_ task can read/write to tsk->flags, but other
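( for reference, with the above applied, threading is controllable at
runtime through the new proc entry, or globally at boot - e.g. assuming
a handler registered under the name "ide0" on IRQ 14:

	echo 1 > /proc/irq/14/ide0/threaded	# run this handler in a thread
	echo 0 > /proc/irq/14/ide0/threaded	# direct execution (SA_NODELAY)

and "hardirq-preempt=off" on the command line turns hardirq preemption
off globally. )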
On Fri, 2005-05-27 at 18:17 +1000, Nick Piggin wrote:
> >>Or have I missed something completely? You RT guys have thought about
> >>it - so what are some pros of the Linux-RT patch and/or cons of the
> >>nanokernel approach, please?
>
> I never saw it happen in this forum. I believe you if you say it
> has, but I suspect a lot has changed since then.
It happened and mostly ended with a flame feast.
I'll try to give a very short and incomplete answer to a complex question.
Having RT features integrated in the kernel itself makes it simple to do
smooth transitions of applications from the soft-RT to the hard-RT world
without changing code or recompiling. You have one set of libraries
instead of two, and perfect collocation of non-RT and RT threads. Users
have only to deal with one API instead of two.
Nanokernels give you slightly better latencies and make a clear
separation between the RT and the non-RT world. This separation is easier
to review and gives you a chance to do static code path analysis in
order to do theoretical worst-case estimation, which is a prerequisite
for approvals in certain application fields.
There's a lot more - factual and "religious" - but it takes more than a
few lines and a few minutes :)
I think that in the near future there will be more application areas
than the unpopular industrial/embedded stuff which would benefit from
integrated RT enhancements.
tglx
* Bill Huey <[email protected]> wrote:
> There's really no good reason why this kernel can't get the same
> latency as a nanokernel. The scheduler paths are riddled with SMP
> rebalancing stuff and the like which contributes to overall system
> latency. Remove those things and replace them with things like direct
> CPU pinning and you'll start seeing those numbers collapse. [...]
could you be a bit more specific? None of that stuff should show up on
UP kernels. Even on SMP, rebalancing is either asynchronous, or O(1).
Ingo
On Fri, May 27, 2005 at 03:19:37PM +1000, Nick Piggin wrote:
> For example, suppose you have preemptible everything, and priority
> inheritance and that's all nice. But the actual time in which
> some thread holds a lock is time that no other thread can take
> that lock either, regardless of its priority.
Ingo just answered it, and I touch on this subject off and on,
including in this thread recently.
> So in that sense, if you do hard RT in the Linux kernel, it surely
> is always going to be some subset of operations, dependent on
> exact locking implementation, other tasks running and resource usage,
> right?
...
> It appears to me (uneducated bystander, remember) that a nanokernel
> running a small hard-rt kernel and Linux together might be "better"
> for people that want real realtime.
I answered this already in this thread with the digression about RTAI.
I believe that it is a clear explanation but it's a bit out of the box
so to speak since it's looking towards more sophisticated uses of this
patch from an overall software design point of view.
> Just from the point of view of making the RT kernel as small and easy
> to verify as possible, and not having to provide for general purpose
> non-RT tasks. Then you also get the benefit of not having to make the
> general purpose Linux support hard real time.
There's really no good reason why this kernel can't get the same latency
as a nanokernel. The scheduler paths are riddled with SMP rebalancing
stuff and the like which contributes to overall system latency. Remove
those things and replace them with things like direct CPU pinning and you'll
start seeing those numbers collapse. There are also TLB issues, but there
are many ways of reducing and stripping down this kernel to reach so-called
nanokernel times. Nanokernel times are overidealized IMO. It's not
necessarily because of design, but because of implementation issues that
add more latency to the deterministic latency time constant.
Just a thread wake operation under a 2x SMP box flattens the latency
histogram from an 8 usec spike to a 10-22 usec spread (800MHz P3, 2x)
roughly. There are many more spots that contribute to latency that can
be made static or precomputed in some way.
RT priority thread rebalancing and the IPI send-off add to this rescheduling
latency as well.
> For example, if your RT kernel had something like a tasklist lock, it
> may have an upper limit on the number of processes, or put in restart
> points where lower priority processes drop the lock and restart what
> they were doing if a high prio process comes along - obviously
> neither solution would fly for the Linux tasklist lock.
>
> Or have I missed something completely? You RT guys have thought about
> it - so what are some pros of the Linux-RT patch and/or cons of the
> nanokernel approach, please?
Again, I think I answered this in the RTAI discussion in this thread.
If you can grok my lingo then it should lay out a kind of design track
where this kind of kernel design can be easier and more flexible to
work with in regard to future kernel subsystem alterations.
> [ And again, please don't say why Ingo's RT patch should go in, I'm
> not talking about any patch, any merging of patches or even that
> some hypothetical patch *shouldn't* go in - even if it does have
> above problem ;) ]
Shut up :)
> Thanks very much,
bill
Bill Huey (hui) wrote:
> On Fri, May 27, 2005 at 03:19:37PM +1000, Nick Piggin wrote:
>>It appears to me (uneducated bystander, remember) that a nanokernel
>>running a small hard-rt kernel and Linux together might be "better"
>>for people that want real realtime.
>
>
> I answered this already in this thread with the digression about RTAI.
> I believe that it is a clear explanation but it's a bit out of the box
> so to speak since it's looking towards more sophisticated uses of this
> patch from an overall software design point of view.
>
Yes I did see that post of yours, but I didn't really understand the
explanation. (I hope I don't quote you out of context; please correct
me if so.)
I don't see why you would have problems crossing kernel "concurrency
domains" with the nanokernel approach. Presumably your hard-RT guest
kernel or its tasks aren't going to go to the Linux image in order
to satisfy a hard RT request.
Also, you said "I have to think about things like dcache_lock, route
tables, access to various IO system like SCSI and TCP/IP, etc...",
but at first glance, those locks and structures are exactly why you
wouldn't want to do hard-rt work along side general purpose work in
the Linux kernel.
And quite how they would interfere with the hard-rt guest, you didn't
make clear.
"A single system image makes access to this direct unlike dual kernel
system where you need some kind of communication coupling. Resource
access is direct."
... but you still need the locks, right?
>
>>Just from the point of view of making the RT kernel as small and easy
>>to verify as possible, and not having to provide for general purpose
>>non-RT tasks. Then you also get the benefit of not having to make the
>>general purpose Linux support hard real time.
>
>
> There's really no good reason why this kernel can't get the same latency
> as a nanokernel. The scheduler paths are riddled with SMP rebalancing
> stuff and the like which contributes to overall system latency. Remove
> those things and replace them with things like direct CPU pinning and you'll
> start seeing those numbers collapse. There are also TLB issues, but there
> are many ways of reducing and stripping down this kernel to reach so-called
> nanokernel times. Nanokernel times are over-idealized IMO. It's not
> necessarily because of design, but because of implementation issues that
> add more latency to the deterministic latency constant.
>
Is this one reason why a nanokernel is better, then? So you wouldn't
have to worry about the SMP rebalancing, and TLB issues, and everything
else in your Linux kernel?
> Just a thread wake operation on a 2x SMP box flattens the latency
> histogram from an 8 usec spike to a 10-22 usec spread (800 MHz P3, 2x)
> roughly. There are many more spots that contribute to latency that can
> be made static or precomputed in some way.
>
> RT priority thread rebalancing and IPI send-off add to this rescheduling
> latency as well.
>
>
>>For example, if your RT kernel had something like a tasklist lock, it
>>may have an upper limit on the number of processes, or put in restart
>>points where lower priority processes drop the lock and restart what
>>they were doing if a high prio process comes along - obviously
>>neither solution would fly for the Linux tasklist lock.
>>
>>Or have I missed something completely? You RT guys have thought about
>>it - so what are some pros of the Linux-RT patch and/or cons of the
>>nanokernel approach, please?
>
>
> Again, I think I answered this in the RTAI discussion in this thread.
> If you can grok my lingo then it should lay down a kind of design track
> where this kind of kernel design can be easier and more flexible to
> work with in regard to future kernel subsystem alterations.
>
I'm not sure if you exactly answered my concerns in that thread
(or I didn't understand). It would be great if you could help me
out a bit here, because I feel I must be missing something here.
>
>>[ And again, please don't say why Ingo's RT patch should go in, I'm
>> not talking about any patch, any merging of patches or even that
>> some hypothetical patch *shouldn't* go in - even if it does have
>> above problem ;) ]
>
>
> Shut up :)
>
Well, so long as everyone's on the same page, I'll stop with my
silly disclaimers ;)
Nick
* Andi Kleen <[email protected]> wrote:
> [...] Even normal kernels must have reasonably good latency, as long
> as it doesn't cost unnecessary performance.
they do get reasonably good latency (within the hard constraints of the
possibilities of a given preemption model), due to the cross-effects
between the various preemption models, that i explained in detail in
earlier mails. Something that directly improves latencies on
CONFIG_PREEMPT improves the 'subsystem-use latencies' on PREEMPT_RT.
Also there's the positive interaction between scalability and latencies
as well.
but it's certainly not for free. Just like there's no zero-cost
virtualization, or there's no zero-cost nanokernel approach either,
there's no zero-cost single-kernel-image deterministic system either.
and the argument about binary kernels - that's a choice up to vendors
and users. Right now PREEMPT_NONE is dominant, so do you argue that
CONFIG_PREEMPT should be removed? It's certainly not zero-cost even on
the source code, witness all the preempt_disable()/preempt_enable() or
get_cpu()/put_cpu() uses.
Ingo
On Fri, May 27, 2005 at 02:48:37PM +0200, Ingo Molnar wrote:
> * Andi Kleen <[email protected]> wrote:
>
> > [...] Even normal kernels must have reasonably good latency, as long
> > as it doesn't cost unnecessary performance.
>
> they do get reasonably good latency (within the hard constraints of the
> possibilities of a given preemption model), due to the cross-effects
> between the various preemption models, that i explained in detail in
> earlier mails. Something that directly improves latencies on
> CONFIG_PREEMPT improves the 'subsystem-use latencies' on PREEMPT_RT.
I was more thinking of improvements for !PREEMPT.
> Also there's the positive interaction between scalability and latencies
> as well.
That sounds more like bugs that should just be fixed in the main
kernel by more scheduling. Can you give details and examples?
>
> but it's certainly not for free. Just like there's no zero-cost
> virtualization, or there's no zero-cost nanokernel approach either,
> there's no zero-cost single-kernel-image deterministic system either.
>
> and the argument about binary kernels - that's a choice up to vendors
It is not only binary distribution kernels. I always use my own self compiled
kernels, but I certainly would not want a special kernel just to do something
normal that requires good latency (like sound use).
But that said most Linux users these days use distribution binary kernels,
so we definitely need to take care of them too.
> and users. Right now PREEMPT_NONE is dominant, so do you argue that
> CONFIG_PREEMPT should be removed? It's certainly not zero-cost even on
> the source code, witness all the preempt_disable()/preempt_enable() or
> get_cpu()/put_cpu() uses.
Actually yes I would. AFAIK CONFIG_PREEMPT never improved latency
considerably (from the numbers I've seen), but it had an extreme
cost in the code base (like all this get/put cpu mess; the
impact on RCU was also not pretty).
So it never seemed very useful to me. Maybe it would have been
better if it had been made UP-only; that would at least have
avoided a lot of issues.
But at least CONFIG_PREEMPT is still reasonably cheap, so it is
not as intrusive as some of the stuff proposed.
-Andi
* Andi Kleen <[email protected]> wrote:
> On Fri, May 27, 2005 at 02:48:37PM +0200, Ingo Molnar wrote:
> > * Andi Kleen <[email protected]> wrote:
> >
> > > [...] Even normal kernels must have reasonably good latency, as long
> > > as it doesn't cost unnecessary performance.
> >
> > they do get reasonably good latency (within the hard constraints of the
> > possibilities of a given preemption model), due to the cross-effects
> > between the various preemption models, that i explained in detail in
> > earlier mails. Something that directly improves latencies on
> > CONFIG_PREEMPT improves the 'subsystem-use latencies' on PREEMPT_RT.
>
> I was more thinking of improvements for !PREEMPT.
how would you do that, if even a first step (PREEMPT_VOLUNTARY) was
opposed by some as possibly hurting throughput? I'm really curious, what
would you do to improve PREEMPT_NONE's latencies?
> > Also there's the positive interaction between scalability and latencies
> > as well.
>
> That sounds more like bugs that should just be fixed in the main kernel
> by more scheduling. Can you give details and examples?
what i meant is a pretty common-sense thing: the more independent the
locks are and the more short-lived the locking is, the lower the latencies.
The reverse is true too: most of the latency-breakers move code out from
under locks - which obviously improves scalability too. So if you are
working on scalability you'll indirectly improve latencies - and if you
are working on reducing latencies, you often improve scalability.
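To make that concrete, here is a minimal userspace sketch of the
"move code out from under the lock" pattern (illustrative only, not
code from the thread or the patch; all names are made up):

#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node { struct node *next; } *list_head;

/* Before: free() runs inside the critical section, so anyone who
 * needs list_lock - including a higher-priority task - must wait. */
void drain_slow(void)
{
    pthread_mutex_lock(&list_lock);
    while (list_head) {
        struct node *n = list_head;
        list_head = n->next;
        free(n);                /* long work with the lock held */
    }
    pthread_mutex_unlock(&list_lock);
}

/* After: detach the list in O(1) under the lock, then do the long
 * work outside it. The shorter critical section improves both the
 * worst-case latency and the scalability, which is exactly the
 * interaction described above. */
void drain_fast(void)
{
    struct node *n;

    pthread_mutex_lock(&list_lock);
    n = list_head;
    list_head = NULL;
    pthread_mutex_unlock(&list_lock);

    while (n) {
        struct node *next = n->next;
        free(n);                /* lock already dropped */
        n = next;
    }
}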
> > but it's certainly not for free. Just like there's no zero-cost
> > virtualization, or there's no zero-cost nanokernel approach either,
> > there's no zero-cost single-kernel-image deterministic system either.
> >
> > and the argument about binary kernels - that's a choice up to vendors
>
> It is not only binary distribution kernels. I always use my own self
> compiled kernels, but I certainly would not want a special kernel just
> to do something normal that requires good latency (like sound use).
for good sound you'll at least need PREEMPT_VOLUNTARY. You'll need
CONFIG_PREEMPT for certain workloads or pro-audio use.
> > and users. Right now PREEMPT_NONE is dominant, so do you argue that
> > CONFIG_PREEMPT should be removed? It's certainly not zero-cost even on
> > the source code, witness all the preempt_disable()/preempt_enable() or
> > get_cpu()/put_cpu() uses.
>
> Actually yes I would. AFAIK CONFIG_PREEMPT never improved latency
> considerably (from the numbers I've seen), but it had an extreme
> cost in the code base (like all this get/put cpu mess; the
> impact on RCU was also not pretty).
>
> So it never seemed very useful to me. Maybe it would have been better
> if it had been made UP-only; that would at least have avoided a lot of
> issues.
>
> But at least CONFIG_PREEMPT is still reasonably cheap, so it is not as
> intrusive as some of the stuff proposed.
the impact of PREEMPT on the codebase has a positive effect as well: it
forces us to document SMP data structure dependencies better. Under
PREEMPT_NONE it would have been way too easy to get into the kind of
undocumented interdependent data structure business that we know so well
from the big kernel lock days. get_cpu()/put_cpu() precisely marks the
critical section where we use a given per-CPU data structure.
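A kernel-style sketch of that marking (illustrative only; the per-CPU
counter is made up and this is not code from any patch):

#include <linux/percpu.h>
#include <linux/smp.h>

static DEFINE_PER_CPU(unsigned long, my_event_count);

void count_event(void)
{
    int cpu = get_cpu();    /* disables preemption, returns this CPU's id */

    /* Between get_cpu() and put_cpu() the task cannot migrate, so
     * touching this CPU's instance of the data is safe - and the pair
     * documents exactly where the dependency begins and ends. */
    per_cpu(my_event_count, cpu)++;

    put_cpu();              /* re-enables preemption */
}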
Ingo
On Fri, May 27, 2005 at 03:13:17PM +0200, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > On Fri, May 27, 2005 at 02:48:37PM +0200, Ingo Molnar wrote:
> > > * Andi Kleen <[email protected]> wrote:
> > >
> > > > [...] Even normal kernels must have reasonably good latency, as long
> > > > as it doesn't cost unnecessary performance.
> > >
> > > they do get reasonably good latency (within the hard constraints of the
> > > possibilities of a given preemption model), due to the cross-effects
> > > between the various preemption models, that i explained in detail in
> > > earlier mails. Something that directly improves latencies on
> > > CONFIG_PREEMPT improves the 'subsystem-use latencies' on PREEMPT_RT.
> >
> > I was more thinking of improvements for !PREEMPT.
>
> how would you do that, if even a first step (PREEMPT_VOLUNTARY) was
> opposed by some as possibly hurting throughput? I'm really curious, what
> would you do to improve PREEMPT_NONE's latencies?
Mostly in the classical way. Add cond_resched where needed. Break
up a few locks. Perhaps convert a few spinlocks that block preemption
too long and are not taken that often into a new kind of
sleep/spinlock (TBD). Then add more reschedule points again.
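In sketch form, the classical fix looks roughly like this (the item
type and per-item handler are hypothetical, not from any real patch):

#include <linux/sched.h>

struct item;                            /* hypothetical */
void handle_one_item(struct item *it);  /* hypothetical */

void process_many_items(struct item *items, int count)
{
    int i;

    for (i = 0; i < count; i++) {
        handle_one_item(&items[i]);

        /* No locks are held here, so this is a safe point to yield
         * the CPU if a higher-priority task has become runnable. */
        cond_resched();
    }
}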
>
> > > Also there's the positive interaction between scalability and latencies
> > > as well.
> >
> > That sounds more like bugs that should just be fixed in the main kernel
> > by more scheduling. Can you give details and examples?
>
> what i meant is a pretty common-sense thing: the more independent the
> locks are and the more short-lived the locking is, the lower the latencies.
At least on SMP the most fine-grained locking is not always the best;
you can end up with bouncing cache lines all the time, with two CPUs
synchronizing to each other all the time, which is just slow.
It is sometimes better to batch things with fewer locks.
And every lock has a cost even when not taken, and they add up pretty quickly.
> The reverse is true too: most of the latency-breakers move code out from
> under locks - which obviously improves scalability too. So if you are
> working on scalability you'll indirectly improve latencies - and if you
> are working on reducing latencies, you often improve scalability.
But I agree that often less latency is good even for scalability.
> > > but it's certainly not for free. Just like there's no zero-cost
> > > virtualization, or there's no zero-cost nanokernel approach either,
> > > there's no zero-cost single-kernel-image deterministic system either.
> > >
> > > and the argument about binary kernels - that's a choice up to vendors
> >
> > It is not only binary distribution kernels. I always use my own self
> > compiled kernels, but I certainly would not want a special kernel just
> > to do something normal that requires good latency (like sound use).
>
> for good sound you'll at least need PREEMPT_VOLUNTARY. You'll need
> CONFIG_PREEMPT for certain workloads or pro-audio use.
AFAIK the kernel has regressed quite a bit recently, but that was not true
(for reasonable sound) at least for some earlier 2.6 kernels and
some of the low latency patchkit 2.4 kernels.
So it is certainly possible to do it without preemption.
> the impact of PREEMPT on the codebase has a positive effect as well: it
> forces us to document SMP data structure dependencies better. Under
> PREEMPT_NONE it would have been way too easy to get into the kind of
> undocumented interdependent data structure business that we know so well
> from the big kernel lock days. get_cpu()/put_cpu() precisely marks the
> critical section where we use a given per-CPU data structure.
Nah, there is still quite some code left that is unmarked but ignores
this case for various reasons (e.g. in low level exception handling,
which is preempt-off anyway). However you are right, it might have helped
a bit for generic code. But it is still quite ugly...
-Andi
* Andi Kleen <[email protected]> wrote:
> But at least CONFIG_PREEMPT is still reasonably cheap, so it is not as
> intrusive as some of the stuff proposed.
actually, PREEMPT_RT is not nearly as intrusive (on the source level) as
PREEMPT. It's not even in the same ballpark. (because it mostly rides on
the top of the intrusion caused by SMP and PREEMPT support.)
it's intrusive in terms of performance impact.
Ingo
* Andi Kleen <[email protected]> wrote:
> > how would you do that, if even a first step (PREEMPT_VOLUNTARY) was
> > opposed by some as possibly hurting throughput? I'm really curious, what
> > would you do to improve PREEMPT_NONE's latencies?
>
> Mostly in the classical way. Add cond_resched where needed. Break up a
> few locks. Perhaps convert a few spinlocks that block preemption too
> long and are not taken that often into a new kind of
> sleep/spinlock (TBD). Then add more reschedule points again.
been there, done that. A couple of years ago i started out with a
somewhat similar opinion to yours, which could be summed up as: "this
cannot be that hard, just break up the code, damnit". Wrote tools to see
where the latencies come from, and started sticking cond_resched()s in.
A few years down the road and after multiple restarts (lowlatency patch,
the first preempt prototype patch, -VP patchset, etc.) i ended up with
the -RT patch and with two new preemption models (PREEMPT_VOLUNTARY and
PREEMPT_RT) in addition to PREEMPT_NONE and PREEMPT. (With the extra
twist that when i started then the kernel was only 2 million lines big,
now it's 6+ million lines of code.)
Ingo
* Andi Kleen <[email protected]> wrote:
> > what i meant is a pretty common-sense thing: the more independent the
> > locks are and the more short-lived the locking is, the lower the latencies.
>
> At least on SMP the most fine-grained locking is not always the best;
> you can end up with bouncing cache lines all the time, with two CPUs
> synchronizing to each other all the time, which is just slow.
yeah, and i wasn't arguing for the most fine-grained locking: cacheline
bouncing hurts worst-case latencies just as much as it hurts scalability
(in fact more, being a worst-case).
> It is sometimes better to batch things with fewer locks. And every lock
> has a cost even when not taken, and they add up pretty quickly.
(the best is obviously to have no locking at all, unless there's true
resource sharing.)
> > The reverse is true too: most of the latency-breakers move code out from
> > under locks - which obviously improves scalability too. So if you are
> > working on scalability you'll indirectly improve latencies - and if you
> > are working on reducing latencies, you often improve scalability.
>
> But I agree that often less latency is good even for scalability.
>
>
> > > > but it's certainly not for free. Just like there's no zero-cost
> > > > virtualization, or there's no zero-cost nanokernel approach either,
> > > > there's no zero-cost single-kernel-image deterministic system either.
> > > >
> > > > and the argument about binary kernels - that's a choice up to vendors
> > >
> > > It is not only binary distribution kernels. I always use my own self
> > > compiled kernels, but I certainly would not want a special kernel just
> > > to do something normal that requires good latency (like sound use).
> >
> > for good sound you'll at least need PREEMPT_VOLUNTARY. You'll need
> > CONFIG_PREEMPT for certain workloads or pro-audio use.
>
> AFAIK the kernel has regressed quite a bit recently, but that was not true
> (for reasonable sound) at least for some earlier 2.6 kernels and some
> of the low latency patchkit 2.4 kernels.
>
> So it is certainly possible to do it without preemption.
PREEMPT_VOLUNTARY does it without preemption. PREEMPT_VOLUNTARY is quite
similar to most of the lowlatency patchkits, just simpler.
> > the impact of PREEMPT on the codebase has a positive effect as well: it
> > forces us to document SMP data structure dependencies better. Under
> > PREEMPT_NONE it would have been way too easy to get into the kind of
> > undocumented interdependent data structure business that we know so well
> > from the big kernel lock days. get_cpu()/put_cpu() precisely marks the
> > critical section where we use a given per-CPU data structure.
>
> Nah, there is still quite some code left that is unmarked but ignores
> this case for various reasons (e.g. in low level exception handling,
> which is preempt-off anyway). However you are right, it might have
> helped a bit for generic code. But it is still quite ugly...
there's a slow trend related to RCU: rcu_read_lock() is silent about
what kind of implicit lock dependencies there are. So when we convert a
spinlock-using piece of code to RCU we lose that information, making it
harder to convert it to another type of locking later on. (But this is
not a complaint against RCU, just a demonstration that we do lose
information.)
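A small sketch of that information loss (illustrative; the structures
are made up). The spinlock version names the lock that protects the
list; the RCU version is silent about it:

#include <linux/spinlock.h>
#include <linux/rcupdate.h>
#include <linux/list.h>

struct entry { int key; struct list_head link; };
static LIST_HEAD(entries);
static DEFINE_SPINLOCK(entries_lock);   /* documents *what* it protects */

int lookup_locked(int key)
{
    struct entry *e;
    int found = 0;

    spin_lock(&entries_lock);           /* the dependency is explicit */
    list_for_each_entry(e, &entries, link)
        if (e->key == key) { found = 1; break; }
    spin_unlock(&entries_lock);
    return found;
}

int lookup_rcu(int key)
{
    struct entry *e;
    int found = 0;

    rcu_read_lock();                    /* says nothing about what it covers */
    list_for_each_entry_rcu(e, &entries, link)
        if (e->key == key) { found = 1; break; }
    rcu_read_unlock();
    return found;
}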
Ingo
* Andi Kleen <[email protected]> wrote:
> AFAIK the kernel has regressed quite a bit recently, but that was not true
> (for reasonable sound) at least for some earlier 2.6 kernels and some
> of the low latency patchkit 2.4 kernels.
(putting my scheduler maintainer hat on) was this under a stock !PREEMPT
kernel? If you can reproduce it personally, could you try the
PREEMPT_VOLUNTARY option available in 2.6.12-rc5-mm1? [Despite its name
it only adds cond_resched()s, not any heavier preempt mechanism.]
Ingo
At 27 May 2005 15:31:22 +0200,
Andi Kleen wrote:
>
> On Fri, May 27, 2005 at 03:13:17PM +0200, Ingo Molnar wrote:
> >
> > > > but it's certainly not for free. Just like there's no zero-cost
> > > > virtualization, or there's no zero-cost nanokernel approach either,
> > > > there's no zero-cost single-kernel-image deterministic system either.
> > > >
> > > > and the argument about binary kernels - that's a choice up to vendors
> > >
> > > It is not only binary distribution kernels. I always use my own self
> > > compiled kernels, but I certainly would not want a special kernel just
> > > to do something normal that requires good latency (like sound use).
> >
> > for good sound you'll at least need PREEMPT_VOLUNTARY. You'll need
> > CONFIG_PREEMPT for certain workloads or pro-audio use.
>
> AFAIK the kernel has regressed quite a bit recently, but that was not true
> (for reasonable sound) at least for some earlier 2.6 kernels and
> some of the low latency patchkit 2.4 kernels.
>
> So it is certainly possible to do it without preemption.
Yes, as Ingo stated many times, adding cond_resched() to
might_sleep() does achieve the "usable" latencies -- and obviously
that's hacky.
So, the only question is whether inserting cond_resched()
at all points would be acceptable even if it results in a large number
of changes...
Takashi
> Yes, as Ingo stated many times, adding cond_resched() to
> might_sleep() does achieve the "usable" latencies -- and obviously
> that's hacky.
>
> So, the only question is whether inserting cond_resched()
> at all points would be acceptable even if it results in a large number
> of changes...
Or change (almost) all calls to might_sleep() into calls to
cond_resched(), and put a might_sleep() inside cond_resched().
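In sketch form (simplified stand-ins, not the real kernel macros),
the suggestion amounts to:

int need_resched(void);     /* stand-in for the real test */
void schedule(void);
void might_sleep(void);     /* debug check: complain if called atomically */

void cond_resched(void)
{
    might_sleep();          /* every rescheduling point is also a check */
    if (need_resched())
        schedule();
}

/* Callers then switch from the pure annotation to the real thing: */
void some_long_operation(void)
{
    /* was: might_sleep();  documents "we may block" but does nothing else */
    cond_resched();         /* the same check, plus an actual latency break */
    /* ... the operation itself ... */
}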
Ciao,
D.
Nick Piggin wrote:
> Ingo Molnar wrote:
>
> Thanks Ingo,
>
>> * Nick Piggin <[email protected]> wrote:
>>
>>
>>> Presumably your RT tasks are going to want to do some kind of *real*
>>> work somewhere along the line - so how is that work provided guarantees?
>>
>>
>>
>> there are several layers to this. The primary guarantee we can offer is
>> to execute userspace code within N usecs. Most code that needs hard
>> guarantees is quite simple and uses orthogonal mechanisms.
>>
>
> Well yes, but *somewhere* along the line they'll need to interact
> with something else in a timely (in the RT sense) manner?
>
> [...]
>
>>
>>> So in that sense, if you do hard RT in the Linux kernel, it surely is
>>> always going to be some subset of operations, dependant on exact
>>> locking implementation, other tasks running and resource usage, right?
>>
>>
>>
>> yes. The goal is that latencies will fundamentally depend on what
>> facilities (and sharing) the RT task makes use of - instead of
>> depending on what _other_ tasks do in the system.
>>
>
> OK.
>
>>
>>> Tasklist lock might be a good example off the top of my head - so you
>>> may be able to send a signal to another process with deterministic
>>> latency, however that latency might look something like: x + nrproc*y
>>
>>
>>
>> yes, signals are not O(1).
>>
>> Fundamentally, the Linux kernel constantly moves towards separation of
>> unrelated functionality, for scalability reasons. So the moment
>> there's some unexpected sharing, we try to get rid of it not
>> primarily due to latencies, but due to performance. (and vice versa -
>> one reason why it's not hard to get latency patches into the kernel)
>> E.g. the tasklist lock might be converted to RCU one day. The idea is
>> that a 'perfectly scalable' Linux kernel also has perfect latencies -
>> the two goals meet.
>>
>
> I'd have to think about that one ;)
> But yeah I agree they seem to broadly move in the same direction,
> but let's not split hairs.
I think this is an excellent point and one that is often missed by the
opponents of lower latencies in favor of throughput. At the risk of
over-simplification: If your task takes 20ms vs. 10ms eventually those
extra 10ms, every time your task runs, add up to real, noticeable wall
clock time. Unless you live in a world where you have unlimited CPU
resources, lower latencies (or maybe more correctly determinism) have to
be beneficial to throughput also. In a case where it costs throughput
because of the context switches created by higher-priority tasks
preempting lower-priority tasks, change the priorities so that the tasks
don't get preempted. I believe that in a case where all codepaths have
been made to be as efficient and deterministic as possible, you remove
most of the unknowns that the OS can throw at you. Doesn't this leave it
to the application designer/developer to make his app perform at the
best possible level?
>
>>
>>> It appears to me (uneducated bystander, remember) that a nanokernel
>>> running a small hard-rt kernel and Linux together might be "better"
>>> for people that want real realtime.
>>
>>
>>
>> If your application model can tolerate a total separation of OSs then
>> that's sure a viable way. If you want to do everything under one
>> instance of Linux, and want to separate out some well-controlled RT
>> functionality, then PREEMPT_RT is good for you.
>>
>> Note that if you can tolerate separation of OSs (i.e. no sharing or
>> well-controlled sharing) then you can do that under PREEMPT_RT too,
>> here and today: e.g. run all the non-RT tasks in an UML or QEMU instance.
>> (separation of UML needs more work but it's fundamentally ok.) Or you
>> can use PREEMPT_RT as the nanokernel [although this sure is overkill]
>> and put all the RT functionality into a virtual machine. So instead of a
>> hard choice forced upon you, virtualization becomes an option. Soft-RT
>> applications can morph towards hard-RT conditions and vice versa.
>>
>
> OK. So what sort of applications can't tolerate the nanokernel-type
> separation? I guess the hosts would be separated by some network-like
> device, shared memory, etc. - devices that use functionality provided
> by the nanokernel?
I am not saying this is the case with all of these environments, so
please no one start throwing things at me. Imagine if you will a world
where there is no shared memory between RT tasks and non RT tasks.
Imagine a world where such tasks must share data via a pipe. Imagine if
you will a world where you don't just have to make your application as
fast and efficient as possible but you also have to build your own
facilities (such as IPC) that you take for granted in normal Linux
environment. To my knowledge there are very few RT environments today
where you don't have to live with these types of constraints. The ones
that I know about where you don't are not nanokernel type environments.
>
>> So whether it's good enough will have to be seen - maybe nanokernels
>> will win in the end. As long as PREEMPT_RT does not impose any undue
>> design burden on the stock kernel (and i believe it does not) it's a
>> win-win situation: latency improvements will drive scalability,
>> scalability improvements will drive latencies, and the code can be
>> easily removed if it becomes unused.
>
>
> Well yeah, from what I gather, the PREEMPT_RT work needn't be excluded
> on the basis that it can't provide hard-RT - for a real world example
> all the sound guys seem to love it ;) so it obviously is worth something.
>
> And if the complexity can be nicely hidden away and configured out,
> then I personally don't have any problem with it whatsoever :) But
> I don't like to comment further on actual code until I see the actual
> proposed patch when you're happy with it.
>
> Nick
--
kr
On Fri, May 27, 2005 at 04:27:24PM +0200, Duncan Sands wrote:
> Or change (almost) all calls to might_sleep() into calls to
> cond_resched(), and put a might_sleep() inside cond_resched().
indeed
http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/1422.html
Who on earth would ever compile a kernel with PREEMPT_VOLUNTARY=n?
That's just a marketing word and a useless config option as far as I can
tell. Anyway it's just source code overhead, at runtime the code is the
same, so I don't care after all.
[side-note]
Sven-Thorsten Dietrich wrote:
> If you are truly interested, there are a lot of papers about RT. There
> are nanokernel implementations and patents you can review, and there is
> a lot of controversy.
Please drop the patent topic, it hasn't been relevant for years. If
you search the LKML archives for the initial release of Adeos, you
will see a thread where that specific topic is cleared up. If still
in doubt, do read the actual relevant patent application(s?), you
will see that no nanokernel/hypervisor out there fits the described
method. Not to mention that hypervisors/nanokernels have been around
for decades ... The specific patent that covers dual-kernels does
not even attempt to claim it covers the broad world of nanokernels/
hypervisors.
Hope this clears this bit, and please don't drag this further. That
particular topic has been debated more than enough, and I've said
what I had to say about it many times already. From that point of
view, I fully agree with you that there's no need to waste people's
time further.
With that said, let's go back to talking about the actual technical
arguments :D
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Fri, May 27, 2005 at 02:10:56PM +0200, Ingo Molnar wrote:
> * Bill Huey <[email protected]> wrote:
>
> > There's really no good reason why this kernel can't get the same
> > latency as a nanokernel. The scheduler paths are riddled with SMP
> > rebalancing stuff and the like which contributes to overall system
> > latency. Remove those things and replace them with things like direct
> > CPU pinning and you'll start seeing those numbers collapse. [...]
>
> could you be a bit more specific? None of that stuff should show up on
> UP kernels. Even on SMP, rebalancing is either asynchronous, or O(1).
I found a couple of problems with IRQ rebalancing in that the
latency spread was affected by a ping-ponging of the actual interrupt
itself. I reported this to you in November and I fixed the problem
by gluing the interrupt to the same CPU as the irq-thread.
Not sure if it was the rebalancing or the cache issues, but they seem
related.
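The userspace half of such a pinning can be done through the standard
/proc interface; a hedged sketch (the IRQ number and CPU mask are just
for illustration):

#include <stdio.h>

/* Pin an interrupt to CPU 0 by writing an affinity bitmask to
 * /proc/irq/<n>/smp_affinity. The matching irq-thread would be pinned
 * to the same CPU with sched_setaffinity() so the two stop
 * ping-ponging between processors. */
int pin_irq_to_cpu0(int irq)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "1\n");      /* bitmask: CPU 0 only */
    fclose(f);
    return 0;
}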
bill
On Fri, May 27, 2005 at 10:43:10PM +1000, Nick Piggin wrote:
> Yes I did see that post of yours, but I didn't really understand the
> explanation. (I hope I don't quote you out of context; please correct
> me if so).
>
> I don't see why you would have problems crossing kernel "concurrency
> domains" with the nanokernel approach. Presumably your hard-RT guest
> kernel or its tasks aren't going to go to the Linux image in order
> to satisfy a hard RT request.
The typical work being done in dual kernel scenarios involves moving
"all things needing RT" into the host RT kernel's domain. You use the
host kernel's threading APIs, etc... Graphics drivers and drivers of
all types need to be retargeted to the host kernel. You do this for
all drivers where you have an RT interest.
This kind of work removes it from running within the Linux guest image.
So now you have a situation where you have to use the host kernel's
APIs for application development or RT guarantees can't be met.
Think about X11, all of it. Is X11 something you want to run in the
host kernel's domain? What is that going to entail? What do you do
when you have multiple graphics output devices and need to respond
to vertical retrace interrupts? Move all of the kernel support drivers
into the host domain?
That's a bit rough. You're stuck with programming in one domain and
there's no possibility of directly "getting at" lower level drivers,
special RT sockets, etc... because all of these subsystems have to
be retargeted towards the RT host domain. There's a very big API
programming barrier to how you might want to write a common application
to take advantage of hard RT constraints.
It's tricky. You have to cross into the Linux image using some kind
of message queue system and effectively marshal requests back and forth.
Getting at known syscalls is probably ok in that you can use a library
to link and some loader trickery to offload the development costs.
Now think about this. You're a single kernel engineer. You don't have
the resources to make every kernel subsystem hard RT capable. You
have this idea where you'd like to get at SGI XFS's homogenous object
storage to stream video data with guaranteed IO rates. This needs to
be running in an RT domain so that guarantees can be tightly controlled,
since you're running an app that is doing multiple file streaming of
those objects. What kernel subsystems does this include?
It includes the VFS system, parts of the VM, all of the IO subsystems
including SCSI/IDE and IO schedulers, etc..., the softirq subsystem
supporting the SCSI layers and IO schedulers, all the parts of XFS
itself. The list goes on.
Think about making that entire chain of subsystems available under RT
control in a dual kernel system where you have a thick boundary
marshalling this access. You'd have to port much of the kernel into
that host RT domain to even consider getting any kind of control over
XFS. That's massive.
A single image kernel isn't going to solve all of the contention
issues regarding locking, but it's obviously much easier to work with,
and there's a much higher probability of making that entire kernel path
work with respect to thread priority. This is because it's possible
to reengineer those paths to be lock-free if you so choose, etc... so
that the request is processed and submitted to an IO request queue
directly.
The system can be broken down into finer parts and access to all parts
of it in that chain is direct, linear even, without having to worry
about a decoupling layer in a dual kernel system. Dual kernels might be
lock-free, but the submission of messages is still a synchronization
point. It's not a mutex, but it's still a concurrent structure that
protects the thread from the system it's calling at the moment. The
queuing - the system-to-system partitioning itself - doesn't fix the long
execution paths of the Linux kernel image or contention within that
guest kernel.
Think of this in terms of how a wider scoped project regarding
concurrency would be overly complicated by a dual kernel system like
that. If you think about it, then you'll realize that a single kernel
image is much better if you're going down a more sophisticated road
like that.
This is just the kernel. What if you wanted to, say, export a real time
TCP/IP socket to a userspace RT app? What's the subsystem call chain
there? Say you want to do this within an X11 application talking to
ALSA devices? Obviously the dual kernel model is going to break down
very shortly after the set of requirements is known and submitted.
Single image systems are clearly superior in that regard, even with
the existing lock structure adding indeterminacy via priority
inheritance.
It's not just latency alone that's the issue. It's the application
programming domain that's really the problem, and how the needs of
that app project themselves across the entire kernel and all supporting
subsystems. It's a large scale software design argument that drives
this track for me, and that's how I think it should be viewed.
Super hard RT apps are obviously not going to call into the
kernel for non-deterministic operations. These are more typical of
traditional RT applications. If they are properly written, then
they should run similarly to hard RT systems if you scope out a set
of priorities for them to run in, above the interactive priorities and
the overall system.
> Also, you said "I have to think about things like dcache_lock, route
> tables, access to various IO system like SCSI and TCP/IP, etc...",
> but at first glance, those locks and structures are exactly why you
> wouldn't want to do hard-rt work alongside general purpose work in
> the Linux kernel.
>
> And quite how they would interfere with the hard-rt guest, you didn't
> make clear.
>
> "A single system image makes access to this direct unlike dual kernel
> system where you need some kind of communication coupling. Resource
> access is direct."
>
> ... but you still need the locks, right?
...
> >as a nanokernel. The scheduler paths are riddled with SMP rebalancing
> >stuff and the like which contributes to overall system latency. Remove
> >those things and replace them with things like direct CPU pinning and you'll
> >start seeing those numbers collapse. There are also TLB issues, but there
> >are many ways of reducing and stripping down this kernel to reach so-called
> >nanokernel times. Nanokernel times are over-idealized IMO. It's not
> >necessarily because of design, but because of implementation issues that
> >add more latency to the deterministic latency constant.
> >
>
> Is this one reason why a nanokernel is better, then? So you wouldn't
> have to worry about the SMP rebalancing, and TLB issues, and everything
> else in your Linux kernel?
...
From what I've seen, the Linux interrupt paths are about as optimized
as it gets. It seems that the SMP support and other things that make up
a general purpose system are what slow latency down, but it can be replaced
with other things that are less dependent on dynamic computations if
there's a need for it. Ingo has the last word on this track.
For most folks, anything below 20us has been referred to as "bragging
rights" by a coworker of mine here. The vast majority of apps don't really
need anything tighter. This isn't the case for all RT apps, but I still
think this is largely true.
Keep in mind this is not a complete system by far, so you have to keep
the current practical aspects out of what will be the finished product
in the future. There's a lot more to be done here.
> I'm not sure if you exactly answered my concerns in that thread
> (or I didn't understand). It would be great if you could help me
> out a bit here, because I feel I must be missing something here.
Was this better? :) I've blown a lot of development time writing up all
of these emails this week.
bill
Bill Huey (hui) wrote:
I'm doing a bit of snipping here, coz this is getting too big.
> The typical work being done in dual kernel scenarios involves moving
> "all things needing RT" into the host RT kernel's domain. You use the
I would have thought you'd have a Linux guest, and a hard-RT guest
both running on a nanokernel. But for the purposes of this discussion
it probably isn't a huge difference.
> Think about X11, all of it. Is X11 something you want to run in the
> host kernel's domain? What is that going to entail? What do you do
You could probably have some sort of RT graphics drawing facility,
but I think X11 wouldn't be it :P
> That's a bit rough. You're stuck with programming in one domain and
> there's no possibility of directly "getting at" lower level drivers,
> special RT sockets, etc... because all of these subsystems have to
> be retargeted towards the RT host domain. There's a very big API
> programming barrier to how you might want to write a common application
> to take advantage of hard RT constraints.
>
Run RT programs in your RT kernel, and GP programs in your Linux
kernel. The only time one will have to cross into the other domain
is when they want to communicate with one another.
> the resources to make every kernel subsystem hard RT capable. You
> have this idea where you'd like to get at SGI XFS's homogenous object
> storage to stream video data with guaranteed IO rates. This needs to
> be running in an RT domain so that guarantees can be tightly controlled
That may be a complex problem, but it really doesn't get any simpler
when doing it with a single kernel: all those subsystems still have
to be contended with.
But it's getting a little hand-wavy, I think someone would have to
really be at death's door before trusting Linux (even with PREEMPT_RT)
and XFS to give hard RT IO guarantees any time in the next 5 or 10
years.
What I would do, I would write a block driver in the nanokernel, and
write host drivers for both the Linux and the RT kernel. The nanokernel
would give priority to RT kernel requests. Now its up to the RT kernel
to provide guarantees. Job done. (Well OK, that's very handwavy too,
but I think it is a solution that might actually be attainable, unlike
Linux XFS :)).
> Think about making that entire chain of subsystems available under RT
> control in a dual kernel system where you have a thick boundary
> marshalling this access. You'd have to port much of the kernel into
> that host RT domain to even consider getting any kind of control over
> XFS. That's massive.
>
Well yeah, the RT kernel is going to have to implement all features
that it needs to provide. I just happen to be of the (naive) opinion
that adding functionality to a hard-RT kernel would be far easier
than adding hard-RT to the Linux kernel.
And not just from the technical "can it be done" sense, but you'll
probably end up fighting the non-RT kernel devs every step of the
way. So even if you had the perfect patchset there, it would probably
take years to merge it all, if ever.
[snip, making the Linux kernel hard-rt]
Yeah it is probably possible given enough time and effort, I grant
you that.
>
>
> This is just the kernel. What if you wanted to, say, export a real time
> TCP/IP socket to a userspace RT app? What's the subsystem call chain
> there?
If your RT kernel has a TCP/IP stack, then I guess the call chain is
socket(2) ;)
> Say you want to do this within an X11 application talking to
> ALSA devices? Obviously the dual kernel model is going to break down
> very shortly after the set of requirements is known and submitted.
Well, you would do the RT work in the RT kernel, then communicate
the results to the Linux kernel.
> Single image systems are clearly superior in that regard, even with
> the existing lock structure adding indeterminacy via priority
> inheritance.
>
It isn't clear to me yet. I'm sure you can make your interrupt
latencies look good, as with your scheduling latencies. But when
you talk about doing _real_ work, that will require an order of
magnitude more changes than the PREEMPT_RT patch to make Linux
hard-RT. And everyone will need to know about it, from device
driver writers and CPU arch code up.
>>Super hard RT apps are obviously not going to call into the
>>kernel for non-deterministic operations. These are more typical of
But just what is a non-deterministic operation in Linux? It is
hard to know.
Suppose the PREEMPT_RT patch gets merged tomorrow. OK, now what
if *you* needed a realtime TCP/IP socket. Where will you begin?
[...]
>>I'm not sure if you exactly answered my concerns in that thread
>>(or I didn't understand). It would be great if you could help me
>>out a bit here, because I feel I must be missing something here.
>
>
> Was this better? :) I've blown a lot of development time writing up all
> of these emails this week.
>
Sorry, not much better... But don't waste too much time on me, and
thanks, I appreciate the time you've given me so far.
I wouldn't consider a non-response (or a late response) to mean that
a point has been conceded, or that I've won any kind of argument :-)
Best,
Nick
On Fri, 2005-05-27 at 09:18 +0200, Ingo Molnar wrote:
> but it's not like hard-RT tasks live in a vacuum: they already have to
> be aware of the latencies caused by themselves, and they have to be
> consciously aware of what kernel facilities they use. If you do hard-RT
> you have to be very aware of every line of code your task may execute.
>
> > So in that sense, if you do hard RT in the Linux kernel, it surely is
> > always going to be some subset of operations, dependant on exact
> > locking implementation, other tasks running and resource usage, right?
>
> yes. The goal is that latencies will fundamentally depend on what
> facilities (and sharing) the RT task makes use of - instead of depending
> on what _other_ tasks do in the system.
Real world example: JACK clients form an ordered graph and each must
finish processing a chunk of audio before signaling the next client to
start. The audio lives in shared memory and FIFOs are used for the IPC
between RT threads; when each client finishes it writes a single byte to
a fifo which wakes the next client, etc.
(we have to use FIFOs because signals are too slow and futexes are not
available on 2.4)
Of course write() will not normally be RT safe, as it can call down into
the journaling code or whatever, but on a tmpfs/shmfs writing a byte to
a FIFO takes constant time, so as long as the user makes sure the FIFOs
are set up correctly it's all 100% RT safe.
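A minimal sketch of that handshake (the FIFO paths and the processing
function are hypothetical; this is not actual JACK code):

#include <fcntl.h>
#include <unistd.h>

void process_audio_chunk(void);  /* hypothetical: works on shared memory */

void run_client(const char *my_fifo, const char *next_fifo)
{
    int in = open(my_fifo, O_RDONLY);
    int out = open(next_fifo, O_WRONLY);
    char token;

    for (;;) {
        read(in, &token, 1);    /* block until the previous client is done */
        process_audio_chunk();
        write(out, &token, 1);  /* constant time on a FIFO: wakes the next */
    }
}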
Lee
On Sat, 2005-05-28 at 13:53 +1000, Nick Piggin wrote:
> Run RT programs in your RT kernel, and GP programs in your Linux
> kernel. The only time one will have to cross into the other domain
> is when they want to communicate with one another.
>
And what about a multithreaded program with RT and non-RT threads?
>
> > the resources to make every kernel subsystems hard RT capable. You
> > have this idea where you'd like get at SGI's XFS's homogenous object
> > storage to stream video data with guaranteed IO rates. This needs to
> > be running in an RT domain so that guarantees can be tightly controlled
>
> That may be a complex problem, but it really doesn't get any simpler
> when doing it with a single kernel: all those subsystems still have
> to be contended with.
>
> But it's getting a little hand-wavy, I think someone would have to
> really be at death's door before trusting Linux (even with PREEMPT_RT)
> and XFS to give hard RT IO guarantees any time in the next 5 or 10
> years.
>
No one ever said anything about hard RT IO, or making any syscalls hard
RT. AFAIK this has never even come up during the PREEMPT_RT
development. We just want our userspace code to be scheduled as soon as
it's runnable. For the purposes of this entire thread, it's safe to
assume that if the RT thread does make any syscalls, it knows exactly
what it is doing, as in the JACK write() example.
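For reference, the userspace side of "scheduled as soon as it's
runnable" is just an RT scheduling class; a sketch (the priority value
is arbitrary, and this requires root privileges):

#include <sched.h>
#include <stdio.h>

int become_rt(void)
{
    struct sched_param sp = { .sched_priority = 70 };

    /* SCHED_FIFO: run until we block or a higher-priority RT task
     * preempts us; the kernel's job is then to make us runnable with
     * bounded latency. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}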
Lee
Lee Revell wrote:
> On Sat, 2005-05-28 at 13:53 +1000, Nick Piggin wrote:
>
>>Run RT programs in your RT kernel, and GP programs in your Linux
>>kernel. The only time one will have to cross into the other domain
>>is when they want to communicate with one another.
>>
>
>
> And what about a multithreaded program with RT and non-RT threads?
>
Have to rewrite them.... Well, there are obviously none written for
Linux today, because it doesn't support hard-RT ;)
If you are talking about soft-RT, or things running on Linux today,
then sure they'll keep working. Even some or all of PREEMPT_RT may
be merged into the Linux guest too, for better soft-RT. I haven't
been arguing against that.
>
> No one ever said anything about hard RT IO, or making any syscalls hard
> RT.
Actually this is exactly what was being talked about. But note
that we're not specifically talking about PREEMPT_RT here, just
2 possible architectures that might achieve that.
> AFAIK this has never even come up during the PREEMPT_RT
> development. We just want our userspace code to be scheduled as soon as
> it's runnable. For the purposes of this entire thread, it's safe to
> assume that if the RT thread does make any syscalls, it knows exactly
> what it is doing, as in the JACK write() example.
>
I agree we'll never have a fully functional hard-RT Linux kernel.
But this thread hasn't been about whether or not the RT task knows
what it is doing (we assume it does), but the possibility of making
more parts of the kernel able to provide some RT guarantee (ie. so
said RT task *can* use more functionality).
Nick
On Sat, May 28, 2005 at 01:53:59PM +1000, Nick Piggin wrote:
> Bill Huey (hui) wrote:
>
> Run RT programs in your RT kernel, and GP programs in your Linux
> kernel. The only time one will have to cross into the other domain
> is when they want to communicate with one another.
OpenGL must be RT aware for the off-screen buffer to be flipped. This
model isn't practical. With locking changes in X using something like
xcb in xlib, you might be able to achieve these goals. SGI IRIX is
able to do things like this.
Please try to understand the app issues here, because you seem to have
a naive understanding of this. [evil jab :)]
> That may be a complex problem, but it really doesn't get any simpler
> when doing it with a single kernel: all those subsystems still have
> to be contended with.
>
> But it's getting a little hand-wavy, I think someone would have to
> really be at death's door before trusting Linux (even with PREEMPT_RT)
> and XFS to give hard RT IO guarantees any time in the next 5 or 10
> years.
True, but XFS was designed to deal with this in the first place. It's
not that remote a thing, and if you have a nice SMP-friendly system
it's possible to restore that IRIX functionality in Linux.
> What I would do, I would write a block driver in the nanokernel, and
> write host drivers for both the Linux and the RT kernel. The nanokernel
> would give priority to RT kernel requests. Now its up to the RT kernel
> to provide guarantees. Job done. (Well OK, that's very handwavy too,
> but I think it is a solution that might actually be attainable, unlike
> Linux XFS :)).
There's a lot of unknowns here, but XFS is underutilized in Linux.
I can't really imagine how an RT host kernel could really respect
something as complicated as XFS, with all of its tree balancing stuff
and low level IO submissions with concurrent reads/writes. The nanokernel
adaptation doesn't fly once you think about how complex that chain is.
The RT patch is priming that path to happen already and I would like to
see this used more.
> Well yeah, the RT kernel is going to have to implement all features
> that it needs to provide. I just happen to be of the (naive) opinion
> that adding functionality to a hard-RT kernel would be far easier
> than adding hard-RT to the Linux kernel.
The problem with that assertion is that it's pretty close to being
hard RT as is. It's not that "mysterious" and the results are very
solid. Try not to think about this in a piecewise manner, but how
an overall picture of things get used and what needs to happen to
get there as well as all of the work done so far.
> And not just from the technical "can it be done" sense, but you'll
> probably end up fighting the non-RT kernel devs every step of the
> way. So even if you had the perfect patchset there, it would probably
> take years to merge it all, if ever.
They don't understand the patch or the problem space, so I ignore
them, since they'll never push any edge that's interesting. And Ingo's
comment about the RT patch riding on SMP locking as-is should not
be something that's forgotten.
> [snip, making the Linux kernel hard-rt]
>
> Yeah it is probably possible given enough time and effort, I grant
> you that.
It's pretty close, dude. You have to be in some kind of denial or
something like that, because multiple folks have stated this already.
What's left are problems with FS code, ext3, and the like that still
have remaining atomic critical sections.
> If your RT kernel has a TCP/IP stack, then I guess the call chain is
> socket(2) ;)
>
> >Say you want to do this within an X11 application talking to
> >ALSA devices? Obviously the dual kernel model is going to break down
> >very shortly after the set of requirements is known and submitted.
>
> Well, you would do the RT work in the RT kernel, then communicate
> the results to the Linux kernel.
Write a mini-app and see how this methodology is going to work in
this system. Both Ingo and I have already pointed out that folks
already doing general purpose apps need a simple model to work with,
since they need to cross many kernel systems as well as app layers.
Stop thinking in terms of a kernel programmer stuck in 1995, but
something a bit more "large picture" in nature.
> It isn't clear to me yet. I'm sure you can make your interrupt
> latencies look good, as with your scheduling latencies. But when
My project was getting a solid spike at 4 usec for irq-thread
startups and Ingo's stuff is better. It's already there.
> you talk about doing _real_ work, that will require an order of
> magnitude more changes than the PREEMPT_RT patch to make Linux
> hard-RT. And everyone will need to know about it, from device
> driver writers and CPU arch code up.
Uh, not really. Have you looked at the patch, or are you inserting
hysteria into the discussion again? :) Sounds like hysteria.
Ingo is probably going to follow up on this so I'll let him deal
with you. I suggest you read about the latency traces being worked
on a few months ago.
> >Super hard RT apps are obviously not going to call into the
> >kernel for non-deterministic operations. These are more typical of
>
> But just what is a non-deterministic operation in Linux? It is
> hard to know.
Pretty much any call other than things related to futex handling. That
doesn't invalidate my point, since I wasn't making a broad claim in
the first place.
> Suppose the PREEMPT_RT patch gets merged tomorrow. OK, now what
> if *you* needed a realtime TCP/IP socket. Where will you begin?
Start with the DragonFly BSD sources and talk to Jeffery Hsu about
his alt-q implementation. Their stack was parallelized recently and
can express this kind of stuff, with possibly a special scheduler in
their preexisting token locking scheme. I'm not talking hard RT
here for RT enabled IO. Obviously this is going to be problematic
to a certain degree in a kernel and will have to move more into
the realm of soft RT with high performance.
> Sorry, not much better... But don't waste too much time on me, and
> thanks, I appreciate the time you've given me so far.
Read the patch and follow the development. That's all I can say.
> I wouldn't consider a non-response (or a late response) to mean that
> a point has been conceded, or that I've won any kind of argument :-)
Well, you're wrong. :)
Well, uh, ummm, start writing RT media apps and you will know what
I'm talking about. Dual kernel stuff isn't going to fly with those
folks, especially with an RT patch as good as this already in the
general kernel. More experience with this kind of programming makes
it clear where the failures are with a dual kernel approach.
bill
On Sat, May 28, 2005 at 02:43:30PM +1000, Nick Piggin wrote:
> If you are talking about soft-RT, or things running on Linux today,
> then sure they'll keep working. Even some or all of PREEMPT_RT may
> be merged into the Linux guest too, for better soft-RT. I haven't
> been arguing against that.
Hard RT guarantees are very possible, not in years but in months (possibly
already), under the constraints previously outlined.
[dual complexity issues snipped]
> I agree we'll never have a fully functional hard-RT Linux kernel.
> But this thread hasn't been about whether or not the RT task knows
> what it is doing (we assume it does), but the possibility of making
> more parts of the kernel able to provide some RT guarantee (ie. so
> said RT task *can* use more functionality).
No sane RT app person is going to call into the kernel and expect
guarantees in a general purpose system. Folks doing this kind of
RT work will have at least a path that they can follow to *possibly*
make this happen. It's all conjecture at the moment and it won't be
known until somebody takes a shot at it and all associated kernel
issues. It's most definitely a worthy project.
If this happens Linux would be an ideal kernel for digital video
recorders and such.
bill
Bill Huey (hui) wrote:
> On Sat, May 28, 2005 at 01:53:59PM +1000, Nick Piggin wrote:
>
>
> OpenGL must be RT aware for the off-screen buffer to be flipped. This
> model isn't practical. With locking changes in X using something like
> xcb in xlib, you might be able to achieve these goals. SGI IRIX is
> able to do things like this.
>
OpenGL seems to work just fine here, and it can flip off-screen buffers.
> Please try to understand the app issues here, because you seem to have
> a naive understanding of this. [evil jab :)]
>
It's not an evil jab, because I do have a naive understanding of this.
But nobody has been able to say why a single kernel is better than a
nanokernel.
> True, but XFS was designed to deal with this in the first place. It's
> not that remote a thing, and if you have a nice SMP-friendly system
> it's possible to restore that IRIX functionality in Linux.
>
Then it is also possible to have that functionality in a hard-RT
guest kernel too.
>
> There's a lot of unknowns here, but XFS is underutilized in Linux.
> I can't really imagine how an RT host kernel could really respect
> something as complicated as XFS, with all of its tree balancing stuff
> and low level IO submissions with concurrent reads/writes. The nanokernel
> adaptation doesn't fly once you think about how complex that chain is.
Err, that wouldn't go in the nanokernel. Do you understand what I'm
talking about? The nanokernel supervises a Linux guest and a hard-RT
guest.
> The RT patch is priming that path to happen already and I would like to
> see this used more.
>
Sorry, you aren't going to make XFS in Linux generally realtime capable
any time soon, so there is no point saying how hard it is going to be
with a nanokernel.
Oh hang on, wait a second here. *I* am not talking about removing
atomic critical sections or interrupts-off periods from the kernel
so that your unrelated high priority userspace code or interrupt
handler can run. I understand PREEMPT_RT has basically solved that.
What I am talking about is an RT app calling into the kernel, and
being granted some resource or service within a deterministic time.
If you RT guys don't need such a thing, then let's clear that up
now so we can all go home to our families ;)
>
> The problem with that assertion is that it's pretty close to being
> hard RT as is. It's not that "mysterious" and the results are very
> solid. Try not to think about this in a piecewise manner, but about
> the overall picture of how things get used and what needs to happen to
> get there, as well as all of the work done so far.
>
For interrupts that do nothing, and userspace code, I'm sure it
is pretty close to being hard-RT. What I am talking about (what
my original question asked), is what kind of useful RT work will
people want to be using the kernel for, and why isn't a microkernel
a better approach.
Seems like a pretty simple question if (as everyone seems to be
saying) the single kernel scheme is so obviously superior. No need
for any handwaving about XFS, or X11, etc.
>
> They don't understand the patch or the problem space, so I ignore
> them since they'll never push any edge that's interesting. And Ingo's
> comment about the RT patch riding on SMP locking as-is should not
> be something that's forgotten.
>
Well it seems like maybe you don't have a good understanding of their
problem spaces either. And if you ignore them, then that's fine but
you won't get anything merged. (Ingo might, however ;) )
>>Well, you would do the RT work in the RT kernel, then communicate
>>the results to the Linux kernel.
>
>
> Write a mini-app and see how this methodology is going to work in
> this system. Both Ingo and I have already pointed out that folks
> already doing general purpose apps need a simple model to work with
> since they need to cross many kernel systems as well as app layers.
>
Yeah, Linux "does" general purpose apps fine today.
> Stop thinking in terms of a kernel programmer stuck in 1995, but
> something a bit more "large picture" in nature.
>
I would love to. I'm waiting for somebody to paint me a large picture.
>
>>you talk about doing _real_ work, that will require an order of
>>magnitude more changes than the PREEMPT_RT patch to make Linux
>>hard-RT. And everyone will need to know about it, from device
>>driver writers and CPU arch code up.
>
>
> Uh, not really. Have you looked at the patch or are you inserting
> hysteria in the discussion again ? :) Sounds like hysteria.
>
OK, I'll start small. What have you done with the tasklist lock?
How did you make signal delivery time deterministic?
How about fork/clone? Or don't those need to be realtime? What
exactly _do_ you need to be realtime? I'm not asking rhetorical
questions here.
>
> Pretty much any call other than things related to futex handling. That
> doesn't invalidate my point since I wasn't making a broad claim in the
> first place.
>
No, but my broad question was basically - how far will people want
to go with this? And how is one method better than another?
I understand there are some operations where PREEMPT_RT probably is
very close to hard-RT. I have understood that from the start.
>
>>Suppose the PREEMPT_RT patch gets merged tomorrow. OK, now what
>>if *you* needed a realtime TCP/IP socket. Where will you begin?
>
>
> Start with the DragonFly BSD sources and talk to Jeffery Hsu about
> his alt-q implementation. Their stack was parallelized recently and
> can express this kind of stuff, possibly with a special scheduler in
> their preexisting token locking scheme. I'm not talking hard RT
> here for RT-enabled IO. Obviously this is going to be problematic
> to a certain degree in a kernel and will have to move more into
> the realm of soft RT with high performance.
>
So why did you bring it up as a problem for the nanokernel approach
if you can't handle it with the single kernel approach?
My question is very simple. Just a simple "people need to do X, a
nanokernel can't do X because ... a single kernel can do X" will be
fine.
And you needn't use vague examples with X11 or OpenGL. A concrete
example, say a sequence of system calls, would be fine.
I really won't take much convincing, I just want some basic
background.
>
>>Sorry, not much better... But don't waste too much time on me, and
>>thanks, I appreciate the time you've given me so far.
>
>
> Read the patch and follow the development. That's all I can say.
>
When you're ready to submit something to be included in the Linux
kernel, then I'm sure you will have had time to write up a clear
rationale and be able to address my questions on the linux kernel
mailing list. I look forward to it ;)
>
>>I wouldn't consider a non-response (or a late response) to mean that
>>a point has been conceded, or that I've won any kind of argument :-)
>
>
> Well, you're wrong. :)
>
Wrong about what? While no doubt I've made one or two, I have tried
to steer clear of making assertions.
On Fri, May 27, 2005 at 04:36:45PM -0700, Bill Huey wrote:
> Now think about this. You're a single kernel engineer. You don't have
> the resources to make every kernel subsystem hard RT capable. You
> have this idea where you'd like to get at SGI's XFS's homogeneous object
> storage to stream video data with guaranteed IO rates. This needs to
> be running in an RT domain so that guarantees can be tightly controlled,
> since you're running an app that is doing multiple file streaming of
> those objects. What kernel subsystems does this include ?
>
> It includes the VFS system, parts of the VM, all of the IO subsystems
> including SCSI/IDE and IO schedulers, etc..., the softirq subsystem
> supporting the SCSI layers and IO schedulers, all the parts of XFS
> itself. The list goes on.
You're on crack as usual, but today you go much too far. XFS doesn't
have anything to do with your Hard RT pipedreams. The so-called
'RT' subvolume only provides a more deterministic block allocator.
GRIO doesn't require any RT guarantees, it's entirely about I/O scheduling
and has been ported to various operating systems with sane locking semantics.
On Sat, May 28, 2005 at 07:55:00AM +0100, Christoph Hellwig wrote:
> You're on crack as usual, but today you go much too far. XFS doesn't
> have anything to do with your Hard RT pipedreams. The so-called
> 'RT' subvolume only provides a more deterministic block allocator.
> GRIO doesn't require any RT guarantees, it's entirely about I/O scheduling
> and has been ported to various operating systems with sane locking semantics.
Actually, when I talked to the SGI folks about 5 years ago at Usenix,
I got a different story: they really were thinking about hacking up
a tasklet to handle some of this IO stuff. So I'm going to bet
that you're wrong about this, based on that conversation.
The combination of this and the RT apps that use it requires some kind of
RT guarantee. I've had a number of conversations with SGI folks who
have stated this.
And note that your jumpy comments don't dilute any of the things I've
pointed out, whether you understand them or not.
bill
On Sat, May 28, 2005 at 03:22:59AM -0700, Bill Huey wrote:
> On Sat, May 28, 2005 at 07:55:00AM +0100, Christoph Hellwig wrote:
> > You're on crack as usual, but today you go much too far. XFS doesn't
> > have anything to do with your Hard RT pipedreams. The so-called
> > 'RT' subvolume only provides a more deterministic block allocator.
> > GRIO doesn't require any RT guarantees, it's entirely about I/O scheduling
> > and has been ported to various operating systems with sane locking semantics.
>
> Actually, when I talked to the SGI folks about 5 years ago at Usenix,
> I got a different story: they really were thinking about hacking up
> a tasklet to handle some of this IO stuff. So I'm going to bet
> that you're wrong about this, based on that conversation.
I'd like to add that 16-way SGI boxes can play and record something like
300+ individual streams that are frame accurate. An SGI buddy of mine
mentioned that CNN actually uses such a box to handle all of their video
data in real time.
Can Linux do this ? No
bill
On Sat, May 28, 2005 at 11:24:14AM +0100, Christoph Hellwig wrote:
> *plonk*
I love how you've demonstrated your open-mindedness and reaction-free
demeanor in this case and others. I'm sure that's going to sit well
with the rest of the folks who actually understand the importance
of this work.
bill
On Sat, May 28, 2005 at 03:34:17AM -0700, Bill Huey wrote:
> On Sat, May 28, 2005 at 03:22:59AM -0700, Bill Huey wrote:
> > On Sat, May 28, 2005 at 07:55:00AM +0100, Christoph Hellwig wrote:
> > > You're on crack as usual, but today you go much too far. XFS doesn't
> > > have anything to do with your Hard RT pipedreams. The so-called
> > > 'RT' subvolume only provides a more deterministic block allocator.
> > > GRIO doesn't require any RT guarantees, it's entirely about I/O scheduling
> > > and has been ported to various operating systems with sane locking semantics.
> >
> > Actually, when I talked to the SGI folks about 5 years ago at Usenix,
> > I got a different story: they really were thinking about hacking up
> > a tasklet to handle some of this IO stuff. So I'm going to bet
> > that you're wrong about this, based on that conversation.
>
> I'd like to add that 16-way SGI boxes can play and record something like
> 300+ individual streams that are frame accurate. An SGI buddy of mine
> mentioned that CNN actually uses such a box to handle all of their video
> data in real time.
Also, to continue this open-minded discussion and this reply of yours: how do
you think IO is submitted to a system like that so that those guarantees
are met ? Obviously some kind of deterministic mechanism is pushing those
requests to the wire.
bill
On Sat, May 28, 2005 at 03:50:03AM -0700, Bill Huey wrote:
> Also, to continue this open-minded discussion and this reply of yours: how do
> you think IO is submitted to a system like that so that those guarantees
> are met ? Obviously some kind of deterministic mechanism is pushing those
> requests to the wire.
Unfortunately my employment contract doesn't allow me to tell you the
details of GRIO.
On Sat, May 28, 2005 at 11:48:18AM +0100, Christoph Hellwig wrote:
> Unfortunately my employment contract doesn't allow me to tell you the
> details of GRIO.
SGI also released the XFS code to the public. I'm sure you can
intelligently comment on that, right, and stop being a general ass ?
bill
On Sat, May 28, 2005 at 04:01:19AM -0700, Bill Huey wrote:
> On Sat, May 28, 2005 at 11:48:18AM +0100, Christoph Hellwig wrote:
> > Unfortunately my employment contract doesn't allow me to tell you the
> > details of GRIO.
>
> SGI also released the XFS code to the public. I'm sure you can
> intelligently comment on that, right, and stop being a general ass ?
Obviously there's a cycle going on here:
1) submission of the IO request so that it arrives in a timely manner.
2) receiving and waking a thread to handle that data.
3) RT decoding of the data so that it's frame locked.
4) repeat the cycle again.
If that loop is delivering drop-free frames, then it's got to be
at least deterministic from the app decoding layers down. If it's meeting
that, then it's also got to deliver that IO within that window, or
at a rate greater than what it can buffer.
There are two kinds of determinism going on here: one CPU-bound, the
other IO-bound. A kernel with a 300 millisecond latency spike is obviously
going to violate that constraint on both fronts and make the
application glitch.
bill
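(As a concrete sketch of the loop being described, purely for illustration: the POSIX calls below are real, but submit_next_io(), decode_frame(), the priority, and the frame rate are hypothetical placeholders, not anything from the thread.)

    #include <sched.h>
    #include <time.h>

    #define NSEC_PER_SEC   1000000000L
    #define NSEC_PER_FRAME (NSEC_PER_SEC / 30)    /* 30 fps, illustrative */

    extern void submit_next_io(void);   /* hypothetical: queue the next read */
    extern void decode_frame(void);     /* hypothetical: CPU-bound decode */

    void frame_loop(void)
    {
        struct sched_param sp = { .sched_priority = 80 };
        struct timespec next;

        sched_setscheduler(0, SCHED_FIFO, &sp);   /* CPU-bound determinism */
        clock_gettime(CLOCK_MONOTONIC, &next);

        for (;;) {
            submit_next_io();    /* IO-bound determinism: keep a request in flight */
            decode_frame();      /* must complete within the frame period */

            /* Advance the absolute deadline and sleep until it. */
            next.tv_nsec += NSEC_PER_FRAME;
            if (next.tv_nsec >= NSEC_PER_SEC) {
                next.tv_nsec -= NSEC_PER_SEC;
                next.tv_sec++;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
    }

A kernel latency spike longer than one frame period shows up directly as a
missed clock_nanosleep() deadline here, on either the IO or the CPU leg.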
On Fri, May 27, 2005 at 03:53:10PM +0200, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > AFAIK the kernel has quite regressed recently, but that was not true
> > (for reasonable sound) at least for some earlier 2.6 kernels and some
> > of the low latency patchkit 2.4 kernels.
>
> (putting my scheduler maintainer hat on) was this under a stock !PREEMPT
> kernel?
Yes. I did not run the numbers personally, but I was told 2.6.11+
was already considerably worse for latency tests with jack than 2.6.8+
(this was with vendor kernels in SUSE releases); and apparently
2.6.8 was already worse than the earlier 2.6.4/5 kernels or the later
and better 2.4s. CONFIG_PREEMPT in all cases did not change the
picture much. Sorry for being light on details, as I did
not run the tests personally.
BTW, another reason I am pretty suspicious of the old-style preempt
stuff, and of intrusive latency work in general, is that it was broken forever
on x86-64 - I only fixed it after 2.6.11, as you may have noticed. Before
that it would only preempt when interrupts were off, not
on (a pretty embarrassing bug). And nobody complained; the problem was only found
during code review for a completely different project (thanks JanB!).
And x86-64 is quite widely used these days.
So in practice all these latencies cannot be that big a problem.
-Andi
On Sat, 2005-05-28 at 21:55 +0200, Andi Kleen wrote:
> On Fri, May 27, 2005 at 03:53:10PM +0200, Ingo Molnar wrote:
> >
> > * Andi Kleen <[email protected]> wrote:
> >
> > > AFAIK the kernel has quite regressed recently, but that was not true
> > > (for reasonable sound) at least for some earlier 2.6 kernels and some
> > > of the low latency patchkit 2.4 kernels.
> >
> > (putting my scheduler maintainer hat on) was this under a stock !PREEMPT
> > kernel?
>
> Yes. I did not run the numbers personally, but I was told 2.6.11+
> was already considerably worse for latency tests with jack than 2.6.8+
> (this was with vendor kernels in SUSE releases); and apparently
> 2.6.8 was already worse than the earlier 2.6.4/5 kernels or the later
> and better 2.4s. CONFIG_PREEMPT in all cases did not change the
> picture much. Sorry for being light on details, as I did
> not run the tests personally.
Um, that sounds 100% backwards. Starting around 2.6.8 the latency (as
measured by the smallest usable jack buffer size) improved drastically
with each release. Check the linux-audio-user or linux-audio-dev
archives, many JACK users report that 2.6.10 or 2.6.11 is the first
mainline kernel that gives acceptable performance at all. In fact,
starting around 2.6.11 some pro audio users have been switching back
from PREEMPT_RT to mainline as a result.
Lee
On Fri, 27 May 2005, Bill Huey wrote:
> > It isn't clear to me yet. I'm sure you can make your interrupt
> > latencies look good, as with your scheduling latencies. But when
>
> My project was getting a solid spike at 4 usec for irq-thread
> startups and Ingo's stuff is better. It's already there.
Is that worst case?
> > I wouldn't consider a non-response (or a late response) to mean that
> > a point has been conceded, or that I've won any kind of argument :-)
>
> Well, you're wrong. :)
>
> Well, uh, ummm, start writing RT media apps and you will know what
> I'm talking about. Dual kernel stuff isn't going to fly with those
> folks especially with an RT patch as good as this already in the
> general kernel. More experience with this kind of programming makes
> it clear where the failures are with a dual kernel approach.
Media apps are actually not that commonplace as far as hard realtime
applications are concerned.
Zwane
On Sat, 2005-05-28 at 19:55 -0600, Zwane Mwaikambo wrote:
> Media apps are actually not that commonplace as far as hard realtime
> applications are concerned.
Audio capture and playback always have a hard realtime constraint. That
is, unless you don't mind your VoIP call sounding as crappy as a cell
phone...
Lee
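(For scale, with illustrative numbers: at a 48 kHz sample rate and a 64-frame
period, the application must produce each buffer within 64/48000 s, roughly
1.3 ms; a single missed deadline is an audible click or pop.)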
On Sat, 28 May 2005, Lee Revell wrote:
> On Sat, 2005-05-28 at 19:55 -0600, Zwane Mwaikambo wrote:
> > Media apps are actually not that commonplace as far as hard realtime
> > applications are concerned.
>
> Audio capture and playback always have a hard realtime constraint. That
> is, unless you don't mind your VoIP call sounding as crappy as a cell
> phone...
It still doesn't mean that media apps are commonplace, and who says cell
phones don't use RTOSes for their lower-level software stacks?
On Sat, 28 May 2005 20:58:23 MDT, Zwane Mwaikambo said:
> On Sat, 28 May 2005, Lee Revell wrote:
>
> > On Sat, 2005-05-28 at 19:55 -0600, Zwane Mwaikambo wrote:
> > > Media apps are actually not that commonplace as far as hard realtime
> > > applications are concerned.
> >
> > Audio capture and playback always have a hard realtime constraint. That
> > is, unless you don't mind your VoIP call sounding as crappy as a cell
> > phone...
>
> It still doesn't mean that media apps are commonplace, and who says cell
> phones don't use RTOSes for their lower-level software stacks?
I'd be wildly surprised if media apps *were* commonplace on an operating
system that didn't supply the needed scheduling infrastructure.
That's as straw-man as commenting that applications that used more than 16
processors weren't commonplace on Linux before the scalability work that made
it feasible to build systems with more than 2 CPUs....
Nick Piggin wrote:
> But nobody has been able to say why a single kernel is better than a
> nanokernel.
I think it's a bit more like you haven't realized the answer when people
gave it, so let me try to be more clear. It's purely a matter of effort
- in general it's far easier to write one process than two communicating
processes. As far as APIs, with a single-kernel approach, an RT
programmer just has to restrict the program to calling APIs known to be
RT-safe (compare with MT-safe programming). In a split-kernel approach,
the programmer has to write RT-kernel support for the APIs he wants to
use (or beg for them to be written). Most programmers would much rather
limit API usage than implement new kernel support themselves.
A very common RT app pattern is to do a bunch of non-RT stuff, then
enter an RT loop. For an example from my work, a robot control program
starts by reading a bunch of configuration files before it starts doing
anything requiring deadlines, then enters the RT control loop. Having
to read all the configuration in a separate program and then marshal
the data over to an RT-only process via file descriptors is quite a bit
more effort. I guess some free RT-nanokernels might/could support
non-RT to RT process migration, or better messaging, but there's
additional programming effort (and overhead) that wasn't there before.
In general an app may enter and exit RT sections several times, which
really makes a split-kernel approach less than ideal.
An easy way to visualize the difference in programming effort for the
two approaches is to take your favorite threaded program and turn it
into one with separate processes that only communicate via pipes. You
can *always* do this, it's just very much more painful to develop and
maintain. Your stance of "nobody can prove why a split-kernel won't
work" is equivalent to saying "we don't ever really need threads, since
processes suffice". That's true, but only in the same way that I don't
need a compiler or a pre-existing operating system to write an application.
- Jim Bruce
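(For what it's worth, a minimal sketch of the pattern Jim describes, as it
looks in a single-kernel world; the POSIX calls are real, while read_config()
and control_step() are hypothetical placeholders.)

    #include <sched.h>
    #include <sys/mman.h>

    extern void read_config(const char *path);  /* hypothetical non-RT setup */
    extern int  control_step(void);             /* hypothetical RT loop body */

    int main(void)
    {
        struct sched_param sp = { .sched_priority = 50 };

        /* Non-RT phase: ordinary syscalls, page faults, no deadlines. */
        read_config("/etc/robot.conf");

        /* The same process enters the RT phase: pin memory, raise priority. */
        mlockall(MCL_CURRENT | MCL_FUTURE);
        sched_setscheduler(0, SCHED_FIFO, &sp);

        /* Deadline-sensitive loop, restricted to RT-safe calls. */
        while (control_step())
            ;
        return 0;
    }

The split-kernel equivalent would put the two phases in different execution
domains and marshal the configuration across the boundary.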
Also, the u-kernel approach has been tried (Mach & Chorus, and even the real
NT core, as opposed to the Win32 API skin) and has not worked well
yet.
James Bruce wrote:
> Nick Piggin wrote:
>
>> But nobody has been able to say why a single kernel is better than a
>> nanokernel.
>
>
> I think it's a bit more like you haven't realized the answer when people
> gave it, so let me try to be more clear. It's purely a matter of effort
> - in general it's far easier to write one process than two communicating
> processes. As far as APIs, with a single-kernel approach, an RT
> programmer just has to restrict the program to calling APIs known to be
> RT-safe (compare with MT-safe programming). In a split-kernel approach,
> the programmer has to write RT-kernel support for the APIs he wants to
> use (or beg for them to be written). Most programmers would much rather
> limit API usage than implement new kernel support themselves.
>
> A very common RT app pattern is to do a bunch of non-RT stuff, then
> enter an RT loop. For an example from my work, a robot control program
> starts by reading a bunch of configuration files before it starts doing
> anything requiring deadlines, then enters the RT control loop. Having
> to read all the configuration in a separate program and then marshal
> the data over to an RT-only process via file descriptors is quite a bit
> more effort. I guess some free RT-nanokernels might/could support
> non-RT to RT process migration, or better messaging, but there's
> additional programming effort (and overhead) that wasn't there before.
> In general an app may enter and exit RT sections several times, which
> really makes a split-kernel approach less than ideal.
>
> An easy way to visualize the difference in programming effort for the
> two approaches is to take your favorite threaded program and turn it
> into one with separate processes that only communicate via pipes. You
> can *always* do this, it's just very much more painful to develop and
> maintain. Your stance of "nobody can prove why a split-kernel won't
> work" is equivalent to saying "we don't ever really need threads, since
> processes suffice". That's true, but only in the same way that I don't
> need a compiler or a pre-existing operating system to write an
> application.
>
> - Jim Bruce
--
With kind regards, Brian.
Dr. Brian O'Mahoney
Mobile +41 (0)79 334 8035 Email: [email protected]
Bleicherstrasse 25, CH-8953 Dietikon, Switzerland
PGP Key fingerprint = 33 41 A2 DE 35 7C CE 5D F5 14 39 C9 6D 38 56 D5
On Sun, 29 May 2005 [email protected] wrote:
> On Sat, 28 May 2005 20:58:23 MDT, Zwane Mwaikambo said:
> > On Sat, 28 May 2005, Lee Revell wrote:
> >
> > > On Sat, 2005-05-28 at 19:55 -0600, Zwane Mwaikambo wrote:
> > > > Media apps are actually not that commonplace as far as hard realtime
> > > > applications are concerned.
> > >
> > > Audio capture and playback always have a hard realtime constraint. That
> > > is, unless you don't mind your VoIP call sounding as crappy as a cell
> > > phone...
> >
> > It still doesn't mean that media apps are commonplace, and who says cell
> > phones don't use RTOSes for their lower-level software stacks?
>
> I'd be wildly surprised if media apps *were* commonplace on an operating
> system that didn't supply the needed scheduling infrastructure.
>
> That's as straw-man as commenting that applications that used more than 16
> processors weren't commonplace on Linux before the scalability work that made
> it feasible to build systems with more than 2 CPUs....
I'm not talking about Linux (which should be obvious as Linux isn't an
RTOS), so it has nothing to do with Linux capabilities. I'm referring to
general hard realtime applications and their use of realtime operating
systems.
On Sun, 29 May 2005 09:00:49 MDT, Zwane Mwaikambo said:
> On Sun, 29 May 2005 [email protected] wrote:
> > I'd be wildly surprised if media apps *were* commonplace on an operating
> > system that didn't supply the needed scheduling infrastructure.
> >
> > That's as straw-man as commenting that applications that used more than 16
> > processors weren't commonplace on Linux before the scalability work that made
> > it feasible to build systems with more than 2 CPUs....
>
> I'm not talking about Linux (which should be obvious as Linux isn't an
> RTOS), so it has nothing to do with Linux capabilities. I'm referring to
> general hard realtime applications and their use of realtime operating
> systems.
As amply shown by the Ardour and linux-audio crowds, the *MAJOR* thing keeping
realtime apps from spreading further is the lack of usable RT support in COTS
operating systems. Yes, you *can* do realtime audio - if you're willing to not
use a common operating system and run some specialized RTOS instead. This is
frequently a show-stopper for small-time use - if there's an additional $5K
cost to getting and installing an RTOS (quite likely you need a new computer,
or redo the one you have to dual-boot), it may not be a problem for a large
recording studio - but it *is* a problem for a small studio or a home user.
So you end up with "The 150 places that buy 48-channel mixers are using RT,
but the 40,000 people who buy 4/8 channel mixers aren't" - by your standards,
nobody's interested in 48-channel mixing boards either.
So tell me - who was using SMP with large numbers of processors *before* the
Linux kernel? Hmm.. You could buy an SGI Onyx. A Sun E10K. The IBM gear.
And for some odd reason, there weren't many sites doing
SMP - it wasn't that long ago that a 48-CPU Sun was enough to get you on the
Top500 supercomputer list. Now everybody and their pet llama has a 128-node
system, it seems....
Large-scale SMP, realtime, whatever. It doesn't matter - you're pointing at it and
saying "But nobody *uses* it" when nobody can afford the technology, when there's
plenty of people lining up and saying "We *would* be using it if it were accessible".
Nobody drives around in Rolls Royces and Bentleys either - and 20 years ago, you
could have used that to say "But nobody drives luxury cars". That's changed
considerably - you get a company that decides to make a Lexus, with 95% of the
quality at 10% of the price, and you can see a *lot* of them on the road.....
Hello Valdis,
On Sun, 29 May 2005 [email protected] wrote:
> As amply shown by the Ardour and linux-audio crowds, the *MAJOR* thing keeping
> realtime apps from spreading further is the lack of usable RT support in CTOS
> operating systems. Yes, you *can* do realtime audio - if you're willing to not
> use a common operating system and run some specialized RTOS instead. This is
> frequently a show-stopper for small-time use - if there's an additional $5K
> cost to getting and installing an RTOS (quite likely you need a new computer,
> or redo the one you have to dual-boot), it may not be a problem for a large
> recording studio - but it *is* a problem for a small studio or a home user.
> So you end up with "The 150 places that buy 48-channel mixers are using RT,
> but the 40,000 people who buy 4/8 channel mixers aren't" - by your standards,
> nobody's interested in 48-channel mixing boards either.
I seem to have gotten you rather excited; you've actually gone as far as
creating a strawman argument for my allegedly "strawman" statement. What
I originally stated was that media applications are not commonplace as far
as _hard_ realtime systems are concerned; this was in reply to Bill's
emphasis on media applications. Now I'm not trying to undermine the
audiophiles' goals or aspirations, and I do indeed see the benefits for
them, but in the event of Linux becoming an RTOS, the main fields
of interest wouldn't come from media application providers (even if there
certainly will be an increase in their interest).
> So tell me - who was using SMP with large numbers of processors *before* the
> Linux kernel? Hmm.. You could buy an SGI Onyx. A Sun E10K. The IBM gear.
> And for some odd reason, there wasn't many sites that just weren't doing
> SMP - it wasn't that long ago that a 48-CPU Sun was enough to get you on the
> Top500 supercomputer list. Now everybody and their pet llama has a 128-node
> system, it seems....
Terribly sorry old bean, but Linux isn't the center of the universe. I'm
afraid Linux wasn't the push factor which led to the proliferation of
multiprocessor systems.
> Large-scale SMP, realtime, whatever. It doesn't matter - you're pointing at it and
> saying "But nobody *uses* it" when nobody can afford the technology, when there's
> plenty of people lining up and saying "We *would* be using it if it were accessible".
No, I never said that; please look at the original statement and the
context on which it was based.
Cheers,
Zwane
On Sun, 29 May 2005 13:52:45 MDT, Zwane Mwaikambo said:
> I originally stated was that media applications are not commonplace as far
> as _hard_ realtime systems are concerned; this was in reply to Bill's
> emphasis on media applications.
Only because the average factory can afford the current "hard RT" gear, and
the average musician can't.
So the end result is that the factory doesn't have to pay for another part
ruined because a hole is drilled in the wrong place when the "hard RT" misses,
while the musician just has to resign himself to "OK, let's try *another* take
and hope there's no POPs in it this time.." - even though a "hard RT" miss only
ruins a $5 part, of which you're making thousands a day, while the musicians'
next take may cost a lot more than $5, and you don't get thousands of takes a
day.
At that point, the musician is cursing that he doesn't have "hard RT"....
(Of course, the musician doesn't *really* need a *totally* "hard RT" guarantee -
it would probably be quite sufficient if he lost only one or two takes a month.
This is the sort of place where "98% for 10% of the cost" can win big...)
Yes, there's probably lots of *other* applications that would be written if
hard RT was available cheaply - but audio/video are a *known* area already.
> Terribly sorry old bean, but Linux isn't the center of the universe. I'm
> afraid Linux wasn't the push factor which led to the proliferation of
> multiprocessor systems.
Linux was *one* factor - the *point* was that we're seeing lots of things that
use clusters and massive parallelism that we *didn't* see when clusters weren't
financially feasible for many. So looking around the SMP landscape 7-10 years
ago, you'd have found only a few large sites doing it, and you would have said
"But people are doing A, B, and C on clusters, and almost nobody's doing X, Y
and Z on clusters" (pick any 3 X Y Z that have gotten big growth since).
> Nick Piggin wrote:
> > But nobody has been able to say why a single kernel is better than a
> > nanokernel.
>
> I think it's a bit more like you haven't realized the answer when people
> gave it, so let me try to be more clear. It's purely a matter of effort
> - in general it's far easier to write one process than two communicating
> processes. As far as APIs, with a single-kernel approach, an RT
> programmer just has to restrict the program to calling APIs known to be
> RT-safe (compare with MT-safe programming). In a split-kernel approach,
> the programmer has to write RT-kernel support for the APIs he wants to
> use (or beg for them to be written). Most programmers would much rather
> limit API usage than implement new kernel support themselves.
I strongly support this. It makes a big difference, not only from
the technical point of view (as described above: developers have
to master "two worlds"), but more importantly from the way
management sees things: As soon as some nanokernel or RT plus non-RT
OS approach is mentioned, mgmt fears that there are two different
sources of support (with the usual finger-pointing problems: "Not
our problem, report it to ..."), twice the patches and version
hassles, additional legal issues and runtime license costs/troubles,
two different development environments which must be supported
by the central IT department, ...
When I was told to analyze whether linux is suitable for our
needs, any nanokernel or two-OS approaches were excluded from the
beginning: Mgmt thought that due to their nature and complexity,
such approaches are not able to offer any improvements w.r.t. what
we have now. Clearly, "one system and one source" is wanted!
(we currently use a monolithic, "one-world" OS, but a commercial one)
--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com
James Bruce wrote:
> Nick Piggin wrote:
>
>> But nobody has been able to say why a single kernel is better than a
>> nanokernel.
>
>
> I think it's a bit more like you haven't realized the answer when people
> gave it, so let me try to be more clear. It's purely a matter of effort
Sorry, no, nobody answered me. What I did realize was that there
was a lot of noise and nothing really got resolved.
> - in general it's far easier to write one process than two communicating
> processes.
I reject the vague "complexity" argument. If your application
is not fairly clear on what operations need to happen in a
deterministic time and what aren't, or if you aren't easily able
to get 2 communicating processes working, then I contend that you
shouldn't be writing a realtime application.
What's more, you don't even need to have 2 communicating processes;
you could quite possibly do everything in the realtime kernel if
you are talking about some simple control system and driver.
Note that I specifically reject the *vague* complexity argument,
because if you have a *real* one (ie. we need functionality
equivalent to this sequence of system calls executed in a
deterministic time - the nanokernel approach sucks because [...])
then I'm quite willing to accept it.
The fact is, nobody seems to know quite what kind of deterministic
functionality they want (and please, let's not continue the jokes
about X11 and XFS, etc.). Which really surprises me.
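(For concreteness, the kind of bounded sequence being asked about might look
like the fragment below; the descriptors, the sample and command types, and
compute_command() are purely illustrative, not taken from anyone's application.)

    #include <poll.h>
    #include <unistd.h>

    struct sample  { int raw; };   /* illustrative types */
    struct command { int out; };

    static void compute_command(const struct sample *s, struct command *c)
    {
        c->out = -s->raw;          /* stand-in for the real control law */
    }

    /* One cycle whose end-to-end time would need a deterministic bound. */
    void one_cycle(int sensor_fd, int actuator_fd)
    {
        struct pollfd pfd = { .fd = sensor_fd, .events = POLLIN };
        struct sample s;
        struct command c;

        poll(&pfd, 1, -1);                    /* wake on sensor data */
        read(sensor_fd, &s, sizeof(s));       /* bounded kernel read? */
        compute_command(&s, &c);              /* userspace: the easy part */
        write(actuator_fd, &c, sizeof(c));    /* bounded kernel write? */
    }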
I will give *you* a complexity argument. Making the Linux kernel
hard realtime is slightly more complex than writing an app that
consists of 2 communicating processes.
> As far as APIs, with a single-kernel approach, an RT
> programmer just has to restrict the program to calling APIs known to be
> RT-safe (compare with MT-safe programming). In a split-kernel approach,
> the programmer has to write RT-kernel support for the APIs he wants to
> use (or beg for them to be written).
Yeah great. Can we stop with these misleading implications now?
*A* programmer will have to write RT support in *either* scheme.
*THE* programmer (you imply, a consumer of the userspace API)
does not.
There is absolutely no difference for the userspace programmer
in terms of realtime services.
> Most programmers would much rather
> limit API usage than implement new kernel support themselves.
>
...
> A very common RT app pattern is to do a bunch of non-RT stuff, then
> enter an RT loop. For an example from my work, a robot control program
> starts by reading a bunch of configuration files before it starts doing
> anything requiring deadlines, then enters the RT control loop. Having
> to read all the configuration in a separate program and then marshal
> the data over to an RT-only process via file descriptors is quite a bit
> more effort. I guess some free RT-nanokernels might/could support
> non-RT to RT process migration, or better messaging, but there's
You're controlling a robot, and you consider passing configuration
data over a file descriptor to be overly complex? I guess the robot
doesn't do much, then?
> additional programming effort (and overhead) that wasn't there before.
> In general an app may enter and exit RT sections several times, which
> really makes a split-kernel approach less than ideal.
>
You know that if your app executes some code that doesn't require
deterministic completion then it doesn't have to exit from the
RT kernel, right?
Nor does the RT kernel have to provide *only* deterministic services.
Hey, it could implement a block device backed filesystem - that solves
your robot's problem.
> An easy way to visualize the difference in programming effort for the
> two approaches is to take your favorite threaded program and turn it
> into one with separate processes that only communicate via pipes. You
Yeah, or turn it into separate processes that communicate via
shared memory. Oh, wait...
> can *always* do this, it's just very much more painful to develop and
> maintain. Your stance of "nobody can prove why a split-kernel won't
> work" is equivalent to saying "we don't ever really need threads, since
> processes suffice". That's true, but only in the same way that I don't
> need a compiler or a pre-existing operating system to write an
> application.
>
No it is not equivalent at all, and even if it were, that is not
what my stance is. Let's dispense with the metaphors and abstract
comparisons, and cut to what my stance actually is:
"Nobody has even yet *suggested* any *slightly* credible reasons
why a single kernel might be better than a split-kernel for
hard-RT."
Of all the "reasons" I have been given, most are ones that either I (as
a naive idiot, if you will) have been able to shoot holes in, or that
others have simply said are wrong.
I hate to say it, but I find this almost dishonest, considering that
assertions like "obviously superior" are being thrown around,
along with such fine explanations as "start writing realtime apps
and you'll find out".
kus Kusche Klaus wrote:
> When I was told to analyze whether linux is suitable for our
> needs, any nanokernel or two-OS approaches were excluded from the
> beginning: Mgmt thought that due to their nature and complexity,
> such approaches are not able to offer any improvements w.r.t. what
> we have now. Clearly, "one system and one source" is wanted!
>
You don't explain how making the Linux kernel hard-RT
will be so much simpler and more supportable!
On Fri, May 27, 2005 at 03:56:44PM +0200, Takashi Iwai wrote:
> At 27 May 2005 15:31:22 +0200,
> Andi Kleen wrote:
> >
> > On Fri, May 27, 2005 at 03:13:17PM +0200, Ingo Molnar wrote:
> > >
> > > > > but it's certainly not for free. Just like there's no zero-cost
> > > > > virtualization, or there's no zero-cost nanokernel approach either,
> > > > > there's no zero-cost single-kernel-image deterministic system either.
> > > > >
> > > > > and the argument about binary kernels - that's a choice up to vendors
> > > >
> > > > It is not only binary distribution kernels. I always use my own self
> > > > compiled kernels, but I certainly would not want a special kernel just
> > > > to do something normal that requires good latency (like sound use).
> > >
> > > for good sound you'll at least need PREEMPT_VOLUNTARY. You'll need
> > > CONFIG_PREEMPT for certain workloads or pro-audio use.
> >
> > AFAIK the kernel has quite regressed recently, but that was not true
> > (for reasonable sound) at least for some earlier 2.6 kernels and
> > some of the low latency patchkit 2.4 kernels.
> >
> > So it is certainly possible to do it without preemption.
>
> Yes, as Ingo stated many times, adding cond_resched() to
> might_sleep() does achieve the "usable" latencies -- and obviously
> that's hacky.
But it's the only way to get practical(1) low latency benefits to everybody -
not just a select few who know how to set the right
kernel options or run other incantations and willfully give up performance
and stability.
It is basically similar to why we often avoid kernel tunables - the
kernel must work well out of the box.
(1) = not necessarily provable, but good enough at least for jack et al.
>
> So, the only question is whether changing (inserting) cond_resched()
> to all points would be acceptable even if it results in a big amount
> of changes...
We've been doing that for years, haven't we? The main criterion
should not be avoiding code changes, but avoiding a considerable performance hit.
-Andi
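(For readers following along: the mechanism under discussion is just an
explicit rescheduling check dropped into long-running kernel loops. A
schematic kernel-style fragment, not a quote from any actual patch;
process_page() is a placeholder.)

    /* Schematic voluntary preemption point in a long kernel loop. */
    static void process_all_pages(struct page **pages, int nr)
    {
        int i;

        for (i = 0; i < nr; i++) {
            process_page(pages[i]);  /* placeholder for real per-page work */
            cond_resched();          /* yield here if a higher-priority
                                        task is waiting to run */
        }
    }

The dispute is over whether sprinkling enough such points, versus making the
kernel broadly preemptible, is the maintainable way to bound latency.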
> kus Kusche Klaus wrote:
> > When I was told to analyze whether linux is suitable for our
> > needs, any nanokernel or two-OS approaches were excluded from the
> > beginning: Mgmt thought that due to their nature and complexity,
> > such approaches are not able to offer any improvements w.r.t. what
> > we have now. Clearly, "one system and one source" is wanted!
>
> You don't explain how making the Linux kernel hard-RT
> will be so much simpler and more supportable!
I didn't state that a hard-RT linux is simpler, technically
(however, personally, I believe that once RT linux is there, *our*
job of writing RT applications, device drivers, ... will be simpler
compared to a nanokernel approach).
I just stated that for the management, with its limited interest and
understanding of deep technical details (and, in our case, with bad
experiences with RT plus non-RT OS solutions), a one-system solution
*sounds* much simpler, easier to understand, and easier to manage.
Decisions in companies aren't based on purely technical facts,
sometimes not even on rational arguments...
And concerning support:
* If we go the "pure linux" way, we may (or may not) get help from
the community for our problems (it did work quite well up to now),
or we could buy commercial linux support.
* If we go the "nanokernel plus guest linux" way, we will not get
support from the nanokernel company for general linux kernel issues.
Community help will also be close to zero, because we no
longer have a pure linux system, and the community is not able to
reproduce and analyze our problems any longer (in the same way lkml
is rather unable to help with vendor linux kernels or tainted
kernels); the same holds for most companies offering commercial
linux support.
Hence, w.r.t. support, the nanokernel approach looks much worse.
--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com
* Andi Kleen <[email protected]> wrote:
> > Yes, as Ingo stated many times, adding cond_resched() to
> > might_sleep() does achieve the "usable" latencies -- and obviously
> > that's hacky.
>
> But it's the only way to get practical(1) low latency benefits to
> everybody - not just a select few who know how to set the right
> kernel options or run other incantations and willfully give up
> performance and stability.
>
> It is basically similar to why we often avoid kernel tunables - the
> kernel must work well out of the box.
>
> (1) = not necessarily provable, but good enough at least for jack
> et al.
FYI, to get good latencies for jack you currently need the -RT tree and
CONFIG_PREEMPT. (see Lee Revell's and Rui Nuno Capela's extensive tests)
In other words, cond_resched() in might_sleep() (PREEMPT_VOLUNTARY,
which i announced Jul 9 last year) is _not_ good enough for
advanced-audio (jack) users. PREEMPT_VOLUNTARY is mostly good enough for
simple audio playback / gaming.
Ingo
kus Kusche Klaus wrote:
>>You don't explain how making the Linux kernel hard-RT
>>will be so much simpler and more supportable!
>
>
> I didn't state that a hard-RT linux is simpler, technically
> (however, personally, I believe that once RT linux is there, *our*
> job of writing RT applications, device drivers, ... will be simpler
> compared to a nanokernel approach).
>
Perhaps very slightly simpler. Let's keep in mind that we're
not talking about "hello, world" apps here though, so I don't
think such a general statement is of any use.
> I just stated that for the management, with its limited interest and
> understanding of deep technical details (and, in our case, with bad
> experiences with RT plus non-RT OS solutions), a one-system solution
> *sounds* much simpler, easier to understand, and easier to manage.
>
So does Windows NT to some. But let's stick to technical details.
> Decisions in companies aren't based on purely technical facts,
> sometimes not even on rational arguments...
>
> And concerning support:
[...]
> Hence, w.r.t. support, the nanokernel approach looks much worse.
>
Gee, I think you're treading on very thin ground there. Basing your
argument on what possible companies or communities might possibly
support one of two unimplemented solutions.
What's more, there is no reason why a hard-RT guest kernel, or the
host nanokernel would be closed source.
> FYI, to get good latencies for jack you currently need the -RT tree and
> CONFIG_PREEMPT. (see Lee Revell's and Rui Nuno Capela's extensive tests)
Yeah, but you did a lot of (often unrelated to rt preempt) latency fixes in RT
that are not yet merged into mainline. When they are all merged
things might be very different. And then there can probably be
more fixes.
No matter what you do with RT, this is needed anyway, because
the standard non-preempt kernel needs to have reasonable latencies too.
-Andi
On Mon, 2005-05-30 at 12:33 +0200, Ingo Molnar wrote:
> FYI, to get good latencies for jack you currently need the -RT tree and
> CONFIG_PREEMPT. (see Lee Revell's and Rui Nuno Capela's extensive tests)
>
> In other words, cond_resched() in might_sleep() (PREEMPT_VOLUNTARY,
> which i announced Jul 9 last year) is _not_ good enough for
> advanced-audio (jack) users. PREEMPT_VOLUNTARY is mostly good enough for
> simple audio playback / gaming.
It's not good enough for a lot of embedded applications either, but _RT is.
tglx
Nick Piggin wrote:
> Bill Huey (hui) wrote:
>>
>> Uh, not really. Have you looked at the patch or are you inserting
>> hysteria in the discussion again ? :) Sounds like hysteria.
>>
>
> OK, I'll start small. What have you done with the tasklist lock?
> How did you make signal delivery time deterministic?
>
> How about fork/clone? Or don't those need to be realtime? What
> exactly _do_ you need to be realtime? I'm not asking rhetorical
> questions here.
>
Let me ask another question while you're thinking about that.
Note, this is a *specific* question that can easily be answered
without waffling about XFS or telling me to start writing RT
media apps, or accusing me of spreading hysteria...
OK:
I think it has been conceded that a realtime Linux kernel
cannot be enabled by default because of prohibitive overhead,
right? I think this is even the case for PREEMPT_RT, which is
not hard-RT. (Correct me if I'm wrong).
Suppose you had a system where you need some RT operations,
but cannot tolerate such overhead for general purpose
performance processing.
So by definition you have excluded a single kernel approach.
A nanokernel is not clearly excluded. In fact, maybe it is
possible to run the Linux image with little overhead? Maybe
almost none with the right CPU hardware? (correct me...)
If you get to here without correcting me, my question is:
does such an application exist? A silly example is a cell
phone + JVM, but something really interrupt-heavy (and
maybe SMP as well) might be better at crippling PREEMPT_RT.
Thanks. I can think of some other specific questions too,
when you've addressed these.
Send instant messages to your online friends http://au.messenger.yahoo.com
On Mon, 30 May 2005, Nick Piggin wrote:
> kus Kusche Klaus wrote:
>
> >>You don't explain how making the Linux kernel hard-RT
> >>will be so much simpler and more supportable!
> >
> >
> > I didn't state that a hard-RT linux is simpler, technically
> > (however, personally, I believe that once RT linux is there, *our*
> > job of writing RT applications, device drivers, ... will be simpler
> > compared to a nanokernel approach).
> >
>
> Perhaps very slightly simpler. Let's keep in mind that we're
> not talking about "hello, world" apps here though, so I don't
> think such a general statement is of any use.
>
One important aspect: time to market. Linux does have good hardware
support compared to commercial RTOSes and guest kernels! I.e., if you can
use the native Linux driver, you very soon have your board up and running.
On the other hand, if you first have to write and debug (or buy) drivers
for your RTOS or guest kernel, you are already delayed for months.
I do like the idea of guest kernels - especially the ability to enforce a
strict separation of RT and non-RT. But you can't use _any_ part of the
Linux kernel in your RT application - not even drivers. I know a lot of
stuff in Linux won't ever be usable as it is highly non-deterministic (the
file system for instance); but some of it might turn out to become
deterministic (enough :-) once people start to work on it with that
in mind - the network stack would be a good place to start....
Esben
* Andi Kleen <[email protected]> wrote:
> > > > Yes, as Ingo stated many times, adding cond_resched() to
> > > > might_sleep() does achieve the "usable" latencies -- and
> > > > obviously that's hacky.
> > >
> > > But it's the only way to get practical(1) low latency benefits to
> > > everybody [...]
> > > (1) = not necessarily provable, but good enough at least for jack
> > > et al.
> >
> > FYI, to get good latencies for jack you currently need the -RT tree and
> > CONFIG_PREEMPT. (see Lee Revell's and Rui Nuno Capela's extensive tests)
>
> Yeah, but you did a lot of (often unrelated to rt preempt) latency
> fixes in RT that are not yet merged into mainline. When they are all
> merged things might be very different. And then there can probably be
> more fixes.
your argument above == cond_resched() in might_sleep() [ == VP ] is the
only way to get practical (e.g. jack) latencies.
my argument == i do agree that -VP is a step forward from PREEMPT_NONE
(i'd not have written and released it otherwise), but is
by no means enough for jack. You need at least the -RT
tree + CONFIG_PREEMPT to achieve good jack latencies.
in that sense your further argument that the -RT tree has more latency
related fixes has no relevance to this point: VP by itself is _not
enough_, having more latency fixes in the -RT tree (which mostly improve
CONFIG_PREEMPT and PREEMPT_RT) does not make VP any better of a solution
for jack's purposes.
[ and yes, those other latency fixes are not necessarily directly
related to the PREEMPT_RT feature, because the -RT tree is a
collection of latency related fixes and features, of which PREEMPT_RT
is the biggest, but not the only one. ]
so my main point still remains: it's wishful thinking to expect the
'standard' ( < CONFIG_PREEMPT) Linux kernel's latencies to improve well
enough for jack.
perhaps there's some misunderstanding wrt. what the -RT tree is. The -RT
tree is a collection of latency related patches and features: it
introduces the VP and PREEMPT_RT features, and it also improves all
preemption models (including CONFIG_PREEMPT). Furthermore, it includes
(in-kernel) features to measure and debug latencies. It's called -RT
because PREEMPT_RT is undoubtedly the 'crown jewel' feature, but that
does not mean it's the only goal of the patchset.
Ingo
On Mon, May 30, 2005 at 02:10:31PM +0200, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > > > > Yes, as Ingo stated many times, adding cond_resched() to
> > > > > might_sleep() does achieve the "usable" latencies -- and
> > > > > obviously that's hacky.
> > > >
> > > > But it's the only way to get practical(1) low latency benefits to
> > > > everybody [...]
> > > > (1) = not necessarily provable, but good enough at least for jack
> > > > et al.
> > >
> > > FYI, to get good latencies for jack you currently need the -RT tree and
> > > CONFIG_PREEMPT. (see Lee Revell's and Rui Nuno Capela's extensive tests)
> >
> > Yeah, but you did a lot of (often unrelated to rt preempt) latency
> > fixes in RT that are not yet merged into mainline. When they are all
> > merged things might be very different. And then there can be probably
> > more fixes.
>
> your argument above == cond_resched() in might_sleep() [ == VP ] is the
> only way to get practical (e.g. jack) latencies.
My argument was basically that we have no other choice than
to fix it anyway, since the standard kernel has to be usable
in this regard.
(It is similar to how we don't do separate "server VM" and "desktop VM"s,
although it would sometimes be tempting; after all, one wants a kernel
that works well on a variety of workloads and doesn't need
extensive hand tuning.)
>
> my argument == i do agree that -VP is a step forward from PREEMPT_NONE
> (i'd not have written and released it otherwise), but is
> by no means enough for jack. You need at least the -RT
> tree + CONFIG_PREEMPT to achieve good jack latencies.
OK, where are the big issues left?
Stuff where old-style preempt helps (i.e. long stretches of code that don't
schedule and don't hold a single big lock) can usually be fixed without too
much effort with cond_resched()s. Don't you agree with that?
Your argument that fixing latencies is more ongoing work
is a good one, but again I see no alternative to it, since the
standard well-performing kernel cannot be "abandoned" in this regard.
> perhaps there's some misunderstanding wrt. what the -RT tree is. The -RT
> tree is a collection of latency related patches and features: it
> introduces the VP and PREEMPT_RT features, and it also improves all
> preemption models (including CONFIG_PREEMPT). Furthermore, it includes
> (in-kernel) features to measure and debug latencies. It's called -RT
> because PREEMPT_RT is undoubtedly the 'crown jewel' feature, but that
> does not mean it's the only goal of the patchset.
Yes, I understand that. But because of that it is not really
fair to compare the standard kernel to the RT tree with all bells and whistles
enabled. I think it would be much better if RT were considered
as individual pieces, not all or nothing.
-Andi
On Sun, May 29, 2005 at 01:52:45PM -0600, Zwane Mwaikambo wrote:
> On Sun, 29 May 2005 [email protected] wrote:
> > but the 40,000 people who buy 4/8 channel mixers aren't" - by your standards,
> > nobody's interested in 48-channel mixing boards either.
>
> I seem to have gotten you rather excited; you've actually gone as far as
> creating a strawman argument for my allegedly "strawman" statement. What
> I originally stated was that media applications are not commonplace as far
> as _hard_ realtime systems are concerned; this was in reply to Bill's
> emphasis on media applications. Now I'm not trying to undermine the
Zwane,
They are common to folks wanting to play back any kind of video image
with reasonable quality. What's happened is that sloppy apps are using
sloppy OSes to create one big glitchfest that consumers are used to.
Coming from an old SGI background, I know how idiotic this is. Yes,
RTOSes aren't used for media applications, not because they're exotic, but
because most folks that are Microsoft-influenced, and that includes Linux, can't
write decent media apps even if IRIX and the sources for the apps are
handed to them.
> audiophiles' goals or aspirations, and I do indeed see the benefits for
> them, but in the event of Linux becoming an RTOS, the main fields
> of interest wouldn't come from media application providers (even if there
> certainly will be an increase in their interest).
This would change it. RTOS companies are typically driven by
defense contracts and the like, which is hardly a culture that encourages
consumer technologies like this to be developed. If you give apps folks
the necessary tools, and combine that with the knowledge, then this would
be much more prevalent as a base level of performance for these apps.
If anything, media apps have been sucky for the very reason Valdis
previously described.
bill
On Mon, May 30, 2005 at 02:40:38PM +0200, Andi Kleen wrote:
...
> Ok where are the big issues left?
Pretty much the entire kernel, and anything that has a loop in it.
That's why the use of preemption points can't work: they can't
be spread throughout the kernel in the way you've mentioned.
> Yes, I understand that. But because of that it is not really
> fair to compare the standard kernel to the RT tree with all bells and whistles
> enabled. I think it would be much better if RT were considered
> as individual pieces, not all or nothing.
The lock work is an all-or-nothing chunk. It's the main portion
of this patch that gives the major performance boost. All other
work is marginal at best, supporting latency or adding instrumentation
to back it. No insult, but Ingo has said this multiple times.
bill
Esben Nielsen wrote:
> I do like the idea of guest kernels - especially the ability to enforce a
> strict seperation of RT and non-RT. But you can't use _any_ part of the
> Linux kernel in your RT application - not even drivers. I know a lot of
If you can't use the drivers, then presumably they're no good
to be used as they are for realtime in Linux either, though :(
In which case, you may still be better off porting the driver
over and rewriting it to be hard-realtime than rewriting Linux
itself ;)
But I could be wrong. I don't pretend to have the answers (just
questions!).
Thanks,
Nick
Nick Piggin wrote:
> Sorry no, nobody answered me. What I did realize was that there
> was a lot of noise nothing really got resolved.
I believe you mean you don't believe any answer given so far. You could
easily be correct, and us wrong, but it's not that nobody answered you.
Why not start by saying you disagree with us rather than pretending we
never said anything?
>> - in general it's far easier to write one process than two
>> communicating processes.
>
> I reject the vague "complexity" argument. If your application
> is not fairly clear on what operations need to happen in a
> deterministic time and what aren't, or if you aren't easily able
> to get 2 communicating processes working, then I contend that you
> shouldn't be writing a realtime application.
There is nothing vague about this. I have written distributed and
non-distributed control algorithms for quite a while now. I know how to
judge the complexity of a design for that type of system. You can claim
that I cannot judge such complexity a priori, but then I could equally
claim that you cannot judge the complexity of a kernel modification. In
that case we get nowhere. Instead let's respect each others' area of
expertise, shall we?
> What's more, you don't even need to have 2 communicating processes,
> you could quite possibly do everything in the realtime kernel if
> you are talking some simple control system and driver.
In the real world things get a bit more complicated than a simple PID
controller. There is a whole progression between soft-realtime programs
that need lots of kernel services, and hard realtime apps that might
need only a single device. It's hard to service a middle ground with a
completely different approach to the two ends of the problem space.
> Note that I specifically reject the *vague* complexity argument,
> because if you have a *real* one (ie. we need functionality
> equivalent to this sequence of system calls executed in a
> deterministic time - the nanokernel approach sucks because [...])
> then I'm quite willing to accept it.
Adding support to the nanokernel for all the APIs a user may need
strikes me as more code than the RT patch in question, which outside of
all the locks it changes is pretty compact. Especially important is the
fact that it mainly relies on SMP-safeness to achieve RT performance.
> The fact is, nobody seems to know quite what kind of deterministic
> functionality they want (and please, let's not continue the jokes
> about X11 and XFS, etc.). Which really surprises me.
It's more like the amount of functionality that can be provided with RT
defines what applications are possible and what they can do. If Ingo
asks for a usage case or similar information, I'll gladly provide it.
Since the patch in question doesn't actually need that information to
work, he hasn't asked. Your responses throughout this thread have led
me to believe any detailed information I take the time to collect will
be summarily dismissed in a few seconds, so I won't bother.
> I will give *you* a complexity argument. Making the Linux kernel
> hard realtime is slightly more complex than writing an app that
> consists of 2 communicating processes.
Nobody asked for guaranteed hard realtime yet. Right now we're
discussing a particular patch that achieves measurably good (but not
guaranteed) RT performance. It's a question of supporting that patch,
or not, and hard realtime has nothing to do with it right now. In fact,
an attempt was made to just focus in on IRQ threading to avoid a
flame-fest. So your complexity argument about a mythical future
modification of this patch not even under discussion is vague... more
vague in fact, than my example above that you rejected for being "too
vague". If you dislike the approach in the RT patch, say "the RT
patch", and don't invoke assumed future difficulties from "hard
realtime" which is not currently under discussion.
Let's entertain the complexity argument anyway though. Writing a good
shared library is more complicated than adding the support you need to a
particular application. Why do we have shared libraries then? That's
because there's more than *one* application. So it's better to ask if
supporting realtime in the kernel is easier than hundreds of developers
writing split-kernel applications for the next several years. For the
RT patch it's not a question of implementation, since there already seem
to be people such as Ingo willing to do the work. It's just a question
of supporting such a beast once integrated. I think this patch's
approach shines in the fact that it depends mostly on existing SMP-safe
coding practices, and an intrusive but arguably bearable annotation of
spinlocks as to whether they need to be raw or not. The locks could be
better named, but a patch to rename them all would be far too large to
ever get accepted.
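To make the annotation concrete, here is a minimal sketch of the split
(a sketch only: the declaration macros follow the RT patch's convention
as I understand it, and the lock names are made up):

    #include <linux/spinlock.h>

    /* IRQ path, must never sleep: annotated raw, so it stays a true
     * spinning lock even under the RT patch. */
    static DEFINE_RAW_SPINLOCK(hw_lock);

    /* Ordinary lock: may become a sleeping, PI-aware mutex under
     * CONFIG_PREEMPT_RT, with no change to the code that takes it. */
    static DEFINE_SPINLOCK(stats_lock);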
IMHO Linux has always taken the path of best design to achieve a goal,
which is distinct from "simplest design, period". It's easier to write
a non-SMP kernel, yet Linux has one. It's easier to write a kernel that
works on one architecture than one that works on 23, yet we have that
too. Is this patch the most elegant approach? Probably not, and you
can be sure it'll change a lot before it's included (if ever), simply
based on the history of such things. However we're not even getting
that far, because you are attacking everyone that says they like
programming for a single-kernel model, arguing from a viewpoint of not
having read the patch or following its development up to this point.
> Yeah great. Can we stop with these misleading implications now?
> *A* programmer will have to write RT support in *either* scheme.
> *THE* programmer (you imply, a consumer of the userspace API)
> does not.
Right, a programmer (a maintainer, really) will have to keep the
nanokernel's API up to date with additions to Linux's API that RT people
want to use. The single kernel approach simply means new APIs have to
be inspected for RT performance, but many that are written to be SMP
scalable will "just work" for most RT people. I know for me at least
it's a hell of a lot easier to read kernel code and see what latency it
might entail than to produce a steady stream of updates to keep an API
up to date. The split kernel approach sounds a lot like UML in terms of
effort, which makes sense to avoid if you can.
> You're controlling a robot, and you consider passing configuration
> data over a file descriptor to be overly complex?
I said MORE complex, not OVERLY complex. Really the SOLE PURPOSE of an
OS is to make writing applications easier. That's because anything
*could* be written for bare hardware in assembler from scratch; it's
simply a question of time and effort spent. Infrastructure such as an
OS or compiler make sense only because they amortize their very
difficult design over the thousands or millions of things people end up
doing with them. Enough people want at least soft RT that it *will*
happen, so let's work together to find the most elegant implementation.
> I guess the robot doesn't do much, then?
...or maybe it's a system with several hundred configuration parameters,
calculated and learned tables, and other associated stuff I'd rather not
marshal over to a control process, since it's completely unnecessary to do
so with a better designed RT subsystem? If you know better about all
these apps, feel free to come to the next IROS or ICRA and tell us all
we've been wasting our lives on robotics research. I'm glad Linux has
become the preeminent OS for robotics in spite of attitudes such as this.
> You know that if your app executes some code that doesn't require
> deterministic completion then it doesn't have to exit from the
> RT kernel, right?
Sure, it can always upcall to the normal kernel, but the piping for this
has to be written (and maintained), along with possible library updates
to support the user-visible API. In any of the designs I've seen so far
this doesn't come for free. I haven't studied RTAI/Fusion though, so
maybe things have changed.
> Nor does the RT kernel have to provide *only* deterministic services.
> Hey, it could implement a block device backed filesystem - that solves
> your robot's problem.
And who's going to write it? I think you overestimate the burden of
supporting the proposed approach in the RT patch in comparison to
maintaining a nanokernel with a useful set of APIs. Writing and
supporting a nanokernel connected with Linux is by no means trivial or
free, and the effort for implementation and maintenance goes up for
every added API or upcall to the non-RT system.
> "Nobody has even yet *suggested* any *slightly* credible reasons
> of why a single kernel might be better than a split-kernel for
> hard-RT"
>
> Of all the "reasons" I have been given, most either I (as a naive
> idiot, if you will) have been able to shoot holes in, or others
> have simply said they're wrong.
Yes, you "shoot holes" by bringing up examples such as fork/exec and
other things RT apps would almost never do while expecting to meet
deadlines. Then at the same time, when someone describes what an RT
application typically does do, you claim how simple and trivial it all
is, and without knowing any of the details tell them that it'd be easy
to split it into separate processes. Please explain how a split-kernel
method supports a continuous progression from soft-realtime to
hard-realtime, where each set of API calls has associated latency
effects that may or may not be tolerable for a given application.
That's the problem space, and I can guarantee applications exist all
along that progression, and many don't fall cleanly into one side or the
other.
> I hate to say but I find this almost dishonest considering
> assertions like "obviously superior" are being thrown around,
> along with such fine explanations as "start writing realtime apps
> and you'll find out".
I said neither; why don't you take it up with the authors of those
comments. Btw, Mach was extended to do RT in a project called RT-Mach.
Since you like that approach so much, maybe you should ask yourself
why it failed. You could also think about why the Jack people aren't
using something like RTAI with its nanokernel approach. It's certainly
not because the people working on those systems are ignorant.
- Jim Bruce
James Bruce wrote:
> Nick Piggin wrote:
>
>> Sorry no, nobody answered me. What I did realize was that there
>> was a lot of noise nothing really got resolved.
>
>
[snip lots of stuff]
Sorry James, we were talking about hard realtime. Read the thread.
What's more, I don't think you understand how a nanokernel solution
would work, nor have much idea about the complexity of implementing
it in Linux (although that could have been a result of your thinking
that we weren't talking about hard-rt).
And my questions for which I got no answer were things like
"why is a single kernel superior to a nanokernel for hard-RT?",
"what deterministic services would a hard-RT Linux need to provide?"
So most of what you said is irrelevant, but I'll pick out a few bits.
[snip]
> Yes, you "shoot holes" by bringing up examples such as fork/exec and
> other things RT apps would almost never do while expecting to meet
No, that wasn't part of any of my hole shooting. I asked what operations
need to be realtime and have not had an answer. fork/exec was "prompting".
> deadlines. Then at the same time, when someone describes what an RT
> application typically does do, you claim how simple and trivial it all
> is, and without knowing any of the details tell them that it'd be easy
> to split it into separate processes.
Err, your example was "reading a configuration file". Not exactly
rocket science my good man.
> Please explain how a split-kernel
> method supports a continuous progression from soft-realtime to
> hard-realtime, where each set of API calls has associated latency
> effects that may or may not be tolerable for a given application. That's
> the problem space, and I can guarantee applications exist all along that
> progression, and many don't fall cleanly into one side or the other.
>
You say this like you have a confabulous solution ready to plonk
into the Linux kernel.
But it is not up to me to point out why one way is better than the
other because I am not asking to have anything merged (not saying
*you* are either; I joined this thread by asking an open-ended
question).
>> I hate to say but I find this almost dishonest considering
>> assertions like "obviously superior" are being thrown around,
>> along with such fine explanations as "start writing realtime apps
>> and you'll find out".
>
>
> I said neither, why don't you take it up with the authors of those
> comments. Btw, Mach was extended to do RT in a project called RT-Mach.
> Since you like that approach so much, maybe you should ask yourself why
> it failed. You could also think about why the Jack people aren't using
> something like RTAI with its nanokernel approach. It's certainly not
> because the people working on those systems are ignorant.
>
I have a better idea. I won't read up on any of that, and I will go
and do my own thing and stop wasting my time on this thread. Then
whoever wants to start putting hard realtime functionality into Linux
can *tell* me why nanokernels failed, OK? Let's end the discussion
until then. It is going nowhere.
On Mon, 30 May 2005, kus Kusche Klaus wrote:
> I didn't state that a hard-RT linux is simpler, technically
> (however, personally, I believe that once RT linux is there, *our*
> job of writing RT applications, device drivers, ... will be simpler
> compared to a nanokernel approach).
I can't quite see how; in my experience they involve the same
effort, but I guess that's personal opinion.
> I just stated that for the management, with its limited interest and
> understanding of deep technical details (and, in our case, with bad
> experiences with RT plus non-RT OS solutions), a one-system solution
> *sounds* much simpler, easier to understand, and easier to manage.
>
> Decisions in companies aren't based on purely technical facts,
> sometimes not even on rational arguments...
But decisions for the Linux kernel must always be rational and technical.
Regarding ease of maintenance, debugging/maintaining an application on a
nanokernel (i.e. isolated) is a lot easier than on something as large and
complex as the Linux kernel. This also applies to QA and general
verification.
> And concerning support:
>
> * If we go the "pure linux" way, we may (or may not) get help from
> the community for our problems (it did work quite well up to now),
> or we could buy commercial linux support.
Considering how controlling your management is, I'm surprised you'd stake
your business on something as non-deterministic as the Linux kernel
mailing list.
> * If we go the "nanokernel plus guest linux" way, we will not get
> support from the nanokernel company for general linux kernel issues,
I find that hard to believe. Literally any company which sells you
operating system software will be more than willing to provide support
for the supplied components, obviously at a price, but they are after all
in the business of making money.
> the community help will also be close to zero, because we no
> longer have a pure linux system, and the community is not able to
> reproduce and analyze our problems any longer (in the same way lkml
> is rather unable to help on vendor linux kernels or on tainted
> kernels), and the same holds for most companies offering commercial
> linux support.
A volunteer-supported public forum as a means of handling technical issues
for a company doesn't sound like a good idea.
> Hence, w.r.t. support, the nanokernel approach looks much worse.
I can't quite see how you drew that conclusion. The fact is, pay someone
and they'll resolve your problems.
Regards,
Zwane
[ From my point of view, it is clear that this part of the thread is
non-technical. IOW, we could go on back-and-forth indefinitely. In
the following, I'm putting my nanokernel-promoter hat back on to point
out a few things ... Previous disclaimers still apply :) ]
James Bruce wrote:
> I think it's a bit more like you haven't realized the answer when people
> gave it, so let me try to be more clear. It's purely a matter of effort
> - in general it's far easier to write one process than two communicating
> processes. As far as APIs, with a single-kernel approach, an RT
> programmer just has to restrict the program to calling APIs known to be
> RT-safe (compare with MT-safe programming). In a split-kernel approach,
> the programmer has to write RT-kernel support for the APIs he wants to
> use (or beg for them to be written). Most programmers would much rather
> limit API usage than implement new kernel support themselves.
Actually, I would suggest that anybody who's for PREEMPT_RT drop
this argument. Fact is, requiring more work on the part of those wanting
to accomplish very specialized tasks (such as RT) can very much be
seen as the Linux way.
So yes, it sucks having to write two apps, and it sucks having to port
drivers, but let's face it, 95% of Linux applications and 95% of drivers
-- statistics accurate 19 times out of 20 with a margin of error of +/-
3% :D -- will never ever be used in a hard-rt environment.
Based on that, it is likely (and indeed from reading responses, it seems
this is what is happening) that most kernel subsystem maintainers may
find the added cost of maintainership too high for the perceived benefits.
> In general an app may enter and exit RT sections several times, which
> really makes a split-kernel approach less than ideal.
I'd hate to disappoint you, but RTAI has been providing the following
calls from standard Linux apps (e.g. type "$ ./my_app" ENTER) for _five_
years:
rt_make_hard_real_time()
rt_make_soft_real_time()
Switching back-and-forth to/from hard-rt mode has been possible and
done many times.
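For the curious, here is a minimal sketch of that usage pattern, from
memory of the RTAI/LXRT examples; the header name and the rt_task_init()
arguments vary across RTAI versions, so treat those specifics as
assumptions rather than gospel:

    #include <sys/mman.h>
    #include <rtai_lxrt.h>

    int main(void)
    {
            RT_TASK *task;

            mlockall(MCL_CURRENT | MCL_FUTURE);  /* no paging in hard-rt */
            task = rt_task_init(nam2num("MYTSK"), 0, 0, 0);

            rt_make_hard_real_time();  /* migrate into the hard-rt domain */
            /* ... deterministic work only: no Linux syscalls here ... */
            rt_make_soft_real_time();  /* back to normal Linux scheduling */

            rt_task_delete(task);
            return 0;
    }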
<alternate proposal>
Much as there is nothing precluding PREEMPT_RT from co-existing with
the nanokernel approach (on which RTAI is based), one could suggest
adding a linux/hard-rt directory containing the (re?)implementation
of services/abstractions required for hard-rt applications. You still
get a single tree, but there's then a clear separation at many levels,
including maintainership. As such, much of what RTAI-fusion is currently
doing could find itself in linux/hard-rt. For example, RTAI-fusion
transparently provides a hard-rt deterministic nanosleep(). This and
other such replacements for kernel/*.c would live in hard-rt/ with
no disturbance to the rest of the tree. In the same way, include/linux
could be a symbolic link to either include/linux-hrt or include/linux-srt,
with headers in include/linux-hrt referring back to include/linux-srt
where appropriate. Again, zero cost for mainstream maintainers. If the
hard-rt stuff breaks, only the rt folks get the pain. Note that I'm not
suggesting creating duplicates like this for all directories. In fact,
most of what's in arch/* and drivers/* would remain unchanged, and
where appropriate, hard-rt/* and include/linux-hrt should reuse as much
of what already exists as possible.
Sure, the hard-rt part wouldn't have all the bells and whistles of the
mainstream part, but that's what we're going to have anyway if
PREEMPT_RT is included (as is clearly acknowledged elsewhere in this
thread by those backing it), unless there's a general consensus amongst
all subsystem maintainers that Linux should become QNX-like ... which,
to the best of my reading of this thread, most are not interested in.
The above suggestion doesn't solve the two-app vs. one-app dilemma, but
it takes away the "oh, horror, we need to maintain two separate kernel
trees for our application development" from those against the nanokernel
approach _without_ imposing additional burden on mainstream maintainers.
</alternate proposal>
... so here goes, it's just an idea I'm throwing in the lion pit ...
it clearly would require much more work and input ... so devour, tear,
and crush at will ...
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Mon, 2005-05-30 at 21:58 +1000, Nick Piggin wrote:
> Esben Nielsen wrote:
>
> > I do like the idea of guest kernels - especially the ability to enforce a
> > strict separation of RT and non-RT. But you can't use _any_ part of the
> > Linux kernel in your RT application - not even drivers. I know a lot of
>
> If you can't use the drivers, then presumably they're no good
> to be used as they are for realtime in Linux either, though :(
>
> In which case, you may still be better off porting the driver
> over and rewriting it to be hard-realtime than rewriting Linux
> itself ;)
>
> But I could be wrong. I don't pretend to have the answers (just
> questions!).
Sorry Nick, I know you want out of this thread, but I figured I'd just
comment on this note alone. :-)
If a driver is well written in Linux then you don't need to rewrite it
for -RT. The thing that Ingo's locks give us is that they make the
kernel able to preempt in more locations. When a normal driver calls
spin_lock_irqsave in the -RT kernel, it doesn't turn off interrupts or
preemption. A low priority RT task can be using it when a higher
priority RT task wakes up, and this won't slow down the waking of the
higher prio task. Thus, a Linux driver can be used in the -RT kernel for
RT tasks. As long as you know the effects of using it.
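To make that concrete, here's a sketch of an ordinary driver critical
section (lock and function names are made up); the point is that the
same source builds for both kernels:

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(card_lock);

    static void card_kick_tx(void)
    {
            unsigned long flags;

            /* Stock kernel: IRQs off, non-preemptible section.
             * -RT kernel: a sleeping lock with priority inheritance;
             * IRQs and preemption stay enabled, so a waking high-prio
             * task isn't held off by the time spent in here. */
            spin_lock_irqsave(&card_lock, flags);
            /* ... poke the hardware ... */
            spin_unlock_irqrestore(&card_lock, flags);
    }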
I only focus on the single kernel RT approach so I'm not going to give
any comments on the single versus nano- since I don't know enough about
the nano. But I figured I'd just comment on some of the issues of a
driver in today's Linux compared to the -RT Linux, since that part I do
know. Actually, including -RT in mainline would probably get
more developers helping out the maintainers of the devices. If someone has
an issue with a driver in regards to RT, then an RT programmer would
have to be the one to work on it, and the end result should be
something that works better for the non-RT kernel.
OK, I'll let you get back to whatever you were doing now. ;-)
-- Steve
On Mon, 30 May 2005, Nick Piggin wrote:
> Esben Nielsen wrote:
>
> > I do like the idea of guest kernels - especially the ability to enforce a
> > strict separation of RT and non-RT. But you can't use _any_ part of the
> > Linux kernel in your RT application - not even drivers. I know a lot of
>
> If you can't use the drivers, then presumably they're no good
> to be used as they are for realtime in Linux either, though :(
>
The driver is probably good enough, but you have to call into the
Linux kernel to use it, and with a guest kernel setup you can then forget
about realtime. With PREEMPT_RT you get hard realtime behaviour out of the
box.
Of course, there are a lot of buts to that. You have to check that the
driver doesn't take a call path which is non-deterministic in special cases,
and that the path between your application and the driver is deterministic.
A static code checker would be nice...
Esben
Esben Nielsen wrote:
> Of course, there are a lot of buts to that. You have to check that the
> driver doesn't take a call path which is non-deterministic in special cases,
> and that the path between your application and the driver is deterministic.
> A static code checker would be nice...
Which gets us back to where we began: drivers that are non-deterministic
will continue being non-deterministic regardless of what solution is
adopted, if any, and will be in need of a rewrite/major modification, which
itself will have little or no added value for non-rters ...
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Mon, 30 May 2005, Karim Yaghmour wrote:
>
> [ From my point of view, it is clear that this part of the thread is
> non-technical. IOW, we could go on back-and-forth indefinitely. In
> the following, I'm putting my nanokernel-promoter hat back on to point
> out a few things ... Previous disclaimers still apply :) ]
>
> James Bruce wrote:
> > I think it's a bit more like you haven't realized the answer when people
> > gave it, so let me try to be more clear. It's purely a matter of effort
> > - in general it's far easier to write one process than two communicating
> > processes. As far as APIs, with a single-kernel approach, an RT
> > programmer just has to restrict the program to calling APIs known to be
> > RT-safe (compare with MT-safe programming). In a split-kernel approach,
> > the programmer has to write RT-kernel support for the APIs he wants to
> > use (or beg for them to be written). Most programmers would much rather
> > limit API usage than implement new kernel support themselves.
>
> Actually, I would suggest that anybody who's for PREEMPT_RT drop
> this argument. Fact is, requiring more work on the part of those wanting
> to accomplish very specialized tasks (such as RT) can very much be
> seen as the Linux way.
>
> So yes, it sucks having to write two apps, and it sucks having to port
> drivers, but let's face it, 95% of Linux applications and 95% of drivers
> -- statistics accurate 19 times out of 20 with a margin of error of +/-
> 3% :D -- will never ever be used in a hard-rt environment.
>
You know what? Most of the commercial RTOSes I happen to use at work can't
be used for hard RT either. That includes stuff like the IP stack and
filesystem. But the small part which is (the scheduler + synchronization
mechanisms + simple drivers like UARTs, etc.) is _very_ useful for RT. Some
of the drivers are not good enough for RT - but most don't exist at all!
Same for Linux with PREEMPT_RT: the base system is hard RT (with an even
better priority inheritance mechanism, though not as low latencies). The
basis for making an RT system is there. No, you can't use the IP stack and
you can't use the filesystem from RT threads. But all the 95% which isn't
RT works much better than in the commercial RTOS. And there is a chance
that someone might take on the burden of lifting some of it to become RT
to various degrees.
With a nanokernel the chance of somebody lifting a subsystem into the
nanokernel space and integrating it with the existing Linux API is very,
very close to nil. (Please, prove me wrong if you have an RT IP-stack
and maybe an RT USB stack for RTAI.)
In my view there is no really big difference between Linux and the RTOS I
use at work, except that Linux works much better for all the non-RT
stuff, has better driver support, etc. PREEMPT_RT shows that low priority,
non-RT stuff can be made to stop interfering with high priority RT stuff -
with exactly the same mechanisms as in the traditional RTOS, opening the
same kind of possibilities. Unless people start to throw around
raw_spin_lock's or preempt_disable() in subsystem code, I can't see why
you shouldn't rely on it to stay that way.
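For reference, the priority inheritance mentioned above is what an
application would ask for through the standard POSIX mutex protocol.
A minimal sketch, assuming a libc and kernel that implement
PTHREAD_PRIO_INHERIT:

    #include <pthread.h>

    static pthread_mutex_t shared_lock;

    static int init_pi_lock(void)
    {
            pthread_mutexattr_t attr;

            pthread_mutexattr_init(&attr);
            /* The holder gets boosted to the highest waiter's priority,
             * bounding the inversion a low-priority task can cause. */
            pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
            return pthread_mutex_init(&shared_lock, &attr);
    }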
A subkernel is a _hack_. You must admit that. It was only done in the
first place because Linux was too big a mouthful to rewrite. Now Ingo
has more or less done it. Why then should we continue to use a
hard-to-maintain hack, when we can get the real thing?
Ingo's patch has one big advantage: a good chance of going mainstream.
People might not run with CONFIG_PREEMPT_RT, just as most people aren't
running with CONFIG_PREEMPT now, but the code will be there and there will
be a large group ready to maintain it once it goes mainstream.
Esben
On Mon, 30 May 2005, Karim Yaghmour wrote:
>
> Esben Nielsen wrote:
> > Of course, there are a lot of buts to that. You have to check that the
> > driver doesn't take a call path which is non-deterministic in special cases,
> > and that the path between your application and the driver is deterministic.
> > A static code checker would be nice...
>
> Which gets us back to where we began: drivers that are non-deterministic
> will continue being non-deterministic regardless of what solution is
> adopted, if any, and will be in need of a rewrite/major modification, which
> itself will have little or no added value for non-rters ...
But if you do have to maintain your own driver, it is a lot easier to start
from an existing one and fix it than it is to start all over. I bet the
modifications aren't too big for many drivers anyway. When I get more time
I'll try to look into some drivers. Many of them are probably just about
removing printk's and the like.
Esben
>
> Karim
Esben Nielsen wrote:
> very close to nil. (Please, prove me wrong if you have an RT IP-stack
> and maybe an RT USB stack for RTAI.)
Do take me seriously when I say that RTAI is seriously overlooked:
RT-Net (real-time UDP over Ethernet):
http://www.rts.uni-hannover.de/rtnet/
RT-USB (real-time USB):
https://mail.rtai.org/pipermail/rtai/2005-April/011192.html
http://developer.berlios.de/projects/rtusb
Both of these use RTAI on top of Adeos :P
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
Esben Nielsen wrote:
> But if you do have to maintain your own driver, it is a lot easier to start
> from an existing one and fix it than it is to start all over. I bet the
> modifications aren't too big for many drivers anyway. When I get more time
> I'll try to look into some drivers. Many of them are probably just about
> removing printk's and the like.
Right, and that's exactly what you've got with RT-net (at least as of the
last time I used it, 4 years ago). You took the standard Ethernet driver from
Linux and modified a few calls, and bingo, you had an rt-net driver based
on the standard Linux driver ... all of which in RTAI ...
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
Here, for the fun of history, is a diff between the 8390.c in 2.2.16 and
the one in rt-net 0.9.0:
0a1,12
> /*
> rtnet/module/driver/8390.c
> driver for 8390-based network interfaces
>
> rtnet - real-time networking subsystem
> Copyright (C) 1999,2000 Zentropic Computing, LLC
>
> This file is a modified version of a source file located in the
> standard Linux source tree. Information about how to find the
> original version of this file is located in rtnet/original_files.
> */
>
49a62,63
> #define EXPORT_SYMTAB
>
71a86,137
> #if defined(CONFIG_RTNET) || defined(CONFIG_RTNET_MODULE)
> #include <rtnet/rtnet.h>
> #define RT_DRIVER
> static int ei_rt_start_xmit(struct sk_buff *skb, struct rt_device *rtdev);
> #define RT_dev_alloc_skb(a) ((!ei_local->rt)?dev_alloc_skb(a):dev_alloc_rtskb(a))
> #define RT_mark_bh(a) do{if(!ei_local->rt)mark_bh(a);}while(0)
> #define RT_dev_kfree_skb(a) do{if(!ei_local->rt){dev_kfree_skb(a);}else{dev_kfree_rtskb(a);}}while(0)
> #define RT_netif_rx(a) ((!ei_local->rt)?netif_rx(a):rtnetif_rx(a))
> #define RT_printk(format,args...) rt_printk(format,##args)
> #define RT_enable_irq(a) do{if(!ei_local->rt)enable_irq(a);else rt_enable_irq(a);}while(0)
> #define RT_disable_irq_nosync(a) do{if(!ei_local->rt)disable_irq_nosync(a);else rt_disable_irq(a);}while(0)
> #define RT_spin_lock(a) \
> do{if(!ei_local->rt){ \
> spin_lock(a); \
> }else{ \
> rt_spin_lock(a); \
> }}while(0)
> #define RT_spin_unlock(a) \
> do{if(!ei_local->rt){ \
> spin_unlock(a); \
> }else{ \
> rt_spin_unlock(a); \
> }}while(0)
> #define RT_spin_lock_irqsave(a,b) \
> do{if(!ei_local->rt){ \
> spin_lock_irqsave(a,b); \
> }else{ \
> (b)=rt_spin_lock_irqsave(a); \
> }}while(0)
> #define RT_spin_unlock_irqrestore(a,b) \
> do{if(!ei_local->rt){ \
> spin_unlock_irqrestore(a,b); \
> }else{ \
> rt_spin_unlock_irqrestore(b,a); \
> }}while(0)
> #define RT_spin_lock_init(a) spin_lock_init(a)
> #else
> #define DIFE(a,b) (a)
> #define RT_dev_alloc_skb dev_alloc_skb
> #define RT_mark_bh(a) mark_bh(a)
> #define RT_dev_kfree_skb(a) dev_kfree_skb(a)
> #define RT_netif_rx(a) netif_rx(a)
> #define RT_printk printk
> #define RT_enable_irq(a) enable_irq(a)
> #define RT_disable_irq_nosync(a) disable_irq_nosync(a)
> #define RT_spin_lock(a) spin_lock(a)
> #define RT_spin_unlock(a) spin_unlock(a)
> #define RT_spin_lock_irqsave(a,b) spin_lock_irqsave(a,b)
> #define RT_spin_unlock_irqrestore(a,b) spin_unlock_irqrestore(a,b)
> #define RT_spin_lock_init(a) spin_lock_init(a)
> #endif
>
157c223
< printk(KERN_EMERG "%s: ei_open passed a non-existent device!\n", dev->name);
---
> RT_printk(KERN_EMERG "%s: ei_open passed a non-existent device!\n", dev->name);
166c232
< spin_lock_irqsave(&ei_local->page_lock, flags);
---
> RT_spin_lock_irqsave(&ei_local->page_lock, flags);
171c237
< spin_unlock_irqrestore(&ei_local->page_lock, flags);
---
> RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
186c252
< spin_lock_irqsave(&ei_local->page_lock, flags);
---
> RT_spin_lock_irqsave(&ei_local->page_lock, flags);
188c254
< spin_unlock_irqrestore(&ei_local->page_lock, flags);
---
> RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
192a259,267
> #ifdef RT_DRIVER
> static int ei_start_xmit(struct sk_buff *skb, struct device *dev);
>
> static int ei_rt_start_xmit(struct sk_buff *skb, struct rt_device *rtdev)
> {
> return ei_start_xmit(skb,rtdev->dev);
> }
> #endif
>
218c293
< spin_lock_irqsave(&ei_local->page_lock, flags);
---
> RT_spin_lock_irqsave(&ei_local->page_lock, flags);
222c297
< spin_unlock_irqrestore(&ei_local->page_lock, flags);
---
> RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
230,231c305,306
< spin_unlock_irqrestore(&ei_local->page_lock, flags);
< printk(KERN_WARNING "%s: xmit on stopped card\n", dev->name);
---
> RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
> RT_printk(KERN_WARNING "%s: xmit on stopped card\n", dev->name);
241c316
< printk(KERN_DEBUG "%s: Tx timed out, %s TSR=%#2x, ISR=%#2x, t=%d.\n",
---
> RT_printk(KERN_DEBUG "%s: Tx timed out, %s TSR=%#2x, ISR=%#2x, t=%d.\n",
256c331
< spin_unlock_irqrestore(&ei_local->page_lock, flags);
---
> RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
260,261c335,336
< disable_irq_nosync(dev->irq);
< spin_lock(&ei_local->page_lock);
---
> RT_disable_irq_nosync(dev->irq);
> RT_spin_lock(&ei_local->page_lock);
263a339
> /* XXX not realtime! */
267,268c343,344
< spin_unlock(&ei_local->page_lock);
< enable_irq(dev->irq);
---
> RT_spin_unlock(&ei_local->page_lock);
> RT_enable_irq(dev->irq);
279c355
< spin_lock_irqsave(&ei_local->page_lock, flags);
---
> RT_spin_lock_irqsave(&ei_local->page_lock, flags);
281c357
< spin_unlock_irqrestore(&ei_local->page_lock, flags);
---
> RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
288c364
< disable_irq_nosync(dev->irq);
---
> RT_disable_irq_nosync(dev->irq);
290c366
< spin_lock(&ei_local->page_lock);
---
> RT_spin_lock(&ei_local->page_lock);
294c370
< printk(KERN_WARNING "%s: Tx request while isr active.\n",dev->name);
---
> RT_printk(KERN_WARNING "%s: Tx request while isr active.\n",dev->name);
296,297c372,373
< spin_unlock(&ei_local->page_lock);
< enable_irq(dev->irq);
---
> RT_spin_unlock(&ei_local->page_lock);
> RT_enable_irq(dev->irq);
299c375
< dev_kfree_skb(skb);
---
> RT_dev_kfree_skb(skb);
321c397
< printk(KERN_DEBUG "%s: idle transmitter tx2=%d, lasttx=%d, txing=%d.\n",
---
> RT_printk(KERN_DEBUG "%s: idle transmitter tx2=%d, lasttx=%d, txing=%d.\n",
329c405
< printk(KERN_DEBUG "%s: idle transmitter, tx1=%d, lasttx=%d, txing=%d.\n",
---
> RT_printk(KERN_DEBUG "%s: idle transmitter, tx1=%d, lasttx=%d, txing=%d.\n",
335c411
< printk(KERN_DEBUG "%s: No Tx buffers free! irq=%ld tx1=%d tx2=%d last=%d\n",
---
> RT_printk(KERN_DEBUG "%s: No Tx buffers free! irq=%ld tx1=%d tx2=%d last=%d\n",
340,341c416,417
< spin_unlock(&ei_local->page_lock);
< enable_irq(dev->irq);
---
> RT_spin_unlock(&ei_local->page_lock);
> RT_enable_irq(dev->irq);
393,394c469,470
< spin_unlock(&ei_local->page_lock);
< enable_irq(dev->irq);
---
> RT_spin_unlock(&ei_local->page_lock);
> RT_enable_irq(dev->irq);
396c472
< dev_kfree_skb (skb);
---
> RT_dev_kfree_skb (skb);
414c490
< printk ("net_interrupt(): irq %d for unknown device.\n", irq);
---
> RT_printk ("net_interrupt(): irq %d for unknown device.\n", irq);
425c501
< spin_lock(&ei_local->page_lock);
---
> RT_spin_lock(&ei_local->page_lock);
431c507
< printk(ei_local->irqlock
---
> RT_printk(ei_local->irqlock
437c513
< spin_unlock(&ei_local->page_lock);
---
> RT_spin_unlock(&ei_local->page_lock);
447c523
< printk(KERN_DEBUG "%s: interrupt(isr=%#2.2x).\n", dev->name,
---
> RT_printk(KERN_DEBUG "%s: interrupt(isr=%#2.2x).\n", dev->name,
456c532
< printk(KERN_WARNING "%s: interrupt from stopped card\n", dev->name);
---
> RT_printk(KERN_WARNING "%s: interrupt from stopped card\n", dev->name);
495c571
< printk(KERN_WARNING "%s: Too much work at interrupt, status %#2.2x\n",
---
> RT_printk(KERN_WARNING "%s: Too much work at interrupt, status %#2.2x\n",
499c575
< printk(KERN_WARNING "%s: unknown interrupt %#2x\n", dev->name, interrupts);
---
> RT_printk(KERN_WARNING "%s: unknown interrupt %#2x\n", dev->name, interrupts);
504c580
< spin_unlock(&ei_local->page_lock);
---
> RT_spin_unlock(&ei_local->page_lock);
527c603
< printk(KERN_DEBUG "%s: transmitter error (%#2x): ", dev->name, txsr);
---
> RT_printk(KERN_DEBUG "%s: transmitter error (%#2x): ", dev->name, txsr);
529c605
< printk("excess-collisions ");
---
> RT_printk("excess-collisions ");
531c607
< printk("non-deferral ");
---
> RT_printk("non-deferral ");
533c609
< printk("lost-carrier ");
---
> RT_printk("lost-carrier ");
535c611
< printk("FIFO-underrun ");
---
> RT_printk("FIFO-underrun ");
537,538c613,614
< printk("lost-heartbeat ");
< printk("\n");
---
> RT_printk("lost-heartbeat ");
> RT_printk("\n");
576c652
< printk(KERN_ERR "%s: bogus last_tx_buffer %d, tx1=%d.\n",
---
> RT_printk(KERN_ERR "%s: bogus last_tx_buffer %d, tx1=%d.\n",
593c669
< printk("%s: bogus last_tx_buffer %d, tx2=%d.\n",
---
> RT_printk("%s: bogus last_tx_buffer %d, tx2=%d.\n",
608c684
< else printk(KERN_WARNING "%s: unexpected TX-done interrupt, lasttx=%d.\n",
---
> else RT_printk(KERN_WARNING "%s: unexpected TX-done interrupt, lasttx=%d.\n",
641c717
< mark_bh (NET_BH);
---
> RT_mark_bh (NET_BH);
674c750
< printk(KERN_ERR "%s: mismatched read page pointers %2x vs %2x.\n",
---
> RT_printk(KERN_ERR "%s: mismatched read page pointers %2x vs %2x.\n",
704c780
< printk(KERN_DEBUG "%s: bogus packet size: %d, status=%#2x nxpg=%#2x.\n",
---
> RT_printk(KERN_DEBUG "%s: bogus packet size: %d, status=%#2x nxpg=%#2x.\n",
714c790
< skb = dev_alloc_skb(pkt_len+2);
---
> skb = RT_dev_alloc_skb(pkt_len+2);
718c794
< printk(KERN_DEBUG "%s: Couldn't allocate a sk_buff of size %d.\n",
---
> RT_printk(KERN_DEBUG "%s: Couldn't allocate a sk_buff of size %d.\n",
730c806
< netif_rx(skb);
---
> RT_netif_rx(skb);
740c816
< printk(KERN_DEBUG "%s: bogus packet: status=%#2x nxpg=%#2x size=%d\n",
---
> RT_printk(KERN_DEBUG "%s: bogus packet: status=%#2x nxpg=%#2x size=%d\n",
752c828
< printk("%s: next frame inconsistency, %#2x\n", dev->name,
---
> RT_printk("%s: next frame inconsistency, %#2x\n", dev->name,
790c866
< printk(KERN_DEBUG "%s: Receiver overrun.\n", dev->name);
---
> RT_printk(KERN_DEBUG "%s: Receiver overrun.\n", dev->name);
855c931
< spin_lock_irqsave(&ei_local->page_lock,flags);
---
> RT_spin_lock_irqsave(&ei_local->page_lock,flags);
860c936
< spin_unlock_irqrestore(&ei_local->page_lock, flags);
---
> RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
901c977
< printk(KERN_INFO "%s: invalid multicast address length given.\n", dev->name);
---
> RT_printk(KERN_INFO "%s: invalid multicast address length given.\n", dev->name);
980c1056
< spin_lock_irqsave(&ei_local->page_lock, flags);
---
> RT_spin_lock_irqsave(&ei_local->page_lock, flags);
982c1058
< spin_unlock_irqrestore(&ei_local->page_lock, flags);
---
> RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
1004c1080
< spin_lock_init(&ei_local->page_lock);
---
> RT_spin_lock_init(&ei_local->page_lock);
1010a1087,1097
> #ifdef RT_DRIVER
> {
> struct ei_device *ei_priv = (struct ei_device *)dev->priv;
>
> if(ei_priv->rtdev == NULL)
> ei_priv->rtdev = rt_dev_alloc(dev);
>
> ei_priv->rtdev->xmit = ei_rt_start_xmit;
> }
> #endif
>
1061c1148
< printk(KERN_ERR "Hw. address read/write mismap %d\n",i);
---
> RT_printk(KERN_ERR "Hw. address read/write mismap %d\n",i);
1098c1185
< printk(KERN_WARNING "%s: trigger_send() called with the transmitter busy.\n",
---
> RT_printk(KERN_WARNING "%s: trigger_send() called with the transmitter busy.\n",
Of course this is ancient, but I just thought I'd illustrate my point.
Again, I suggest you drop the single vs. double application/driver
argument; you get the same results and limitations regardless of the RT
method you adopt.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Mon, May 30, 2005 at 07:37:33PM +1000, Nick Piggin wrote:
> I reject the vague "complexity" argument. If your application
> is not fairly clear on what operations need to happen in a
> deterministic time and what aren't, or if you aren't easily able
> to get 2 communicating processes working, then I contend that you
> shouldn't be writing a realtime application.
yeah, but you're also saying a lot of stuff that indicates you've
never written an RT app before.
> The fact is, nobody seems to know quite what kind of deterministic
> functionality they want (and please, let's not continue the jokes
> about X11 and XFS, etc.). Which really surprises me.
Christopher is an overconfident, narrow-minded jackass. You should be
beyond that behavior and chronic lack of vision.
For this to happen in Linux, ksoftirqd needs to run regularly to service
the IO scheduler and the SCSI layer so that GRIO (guaranteed-rate IO) can
happen. You still need that thread to be immune to large kernel latencies,
especially under high IO load, and it also needs a high amount of regular
CPU time. Guarantees can't be met without that, which means it's most
definitely a kind of realtime task/thread. It's not just the lower-level
FS layers in question that provide this. You should have intuited that
from my examples.
The details are more complicated than this once folks push into
that domain. I'm not an FS expert, but I do know that RT is necessary
for any kind of QoS functionality like this. It's a fundamental that
must be in place.
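As a sketch of the arrangement I mean (plain POSIX; pinning ksoftirqd
itself would be done from the outside, e.g. "chrt -f -p 50 <pid>" where
chrt is available):

    #include <sched.h>

    static int make_me_rt(int prio)
    {
            struct sched_param sp = { .sched_priority = prio };

            /* SCHED_FIFO: runs until it blocks or a higher-priority
             * task preempts it, immune to nice-level timesharing. */
            return sched_setscheduler(0, SCHED_FIFO, &sp);
    }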
I already told you the needs of X11/OpenGL. Buffer flipping during
vertical retrace. Additionally, it would be good to be able to determine
if a thread can get enough of a time slice to be able to render a
scene (quad/triangular mesh computation) for adaptive tessellation
or have a signal terminate that computation and flip it to the display.
Think about generalizing that for all OpenGL library implementations
and all drivers. This is not trivial for dual-kernel setups. Think
about how large X11 is, and about running a task like that in a nanokernel
that can't properly sleep a task for swapping, versus a single kernel
image that can already do it.
What was a simple read() wake-up is now messaging over FIFO queues in
a nanokernel setup. You'd have to retarget the apps and the drivers
to use that API instead of using the (admittedly limited) Linux kernel
facilities via syscalls for some of this support.
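As a sketch of that retargeting (the rtf_* names are RTAI's FIFO API;
the exact signature here is from memory and may differ by version):

    #include <rtai_fifos.h>

    #define FIFO_MINOR 0

    /* Called from the hard-rt side; the Linux app picks the data up
     * by read()ing /dev/rtf0 instead of the device it used before. */
    static int push_samples(void *buf, int len)
    {
            return rtf_put(FIFO_MINOR, buf, len);
    }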
> Yeah great. Can we stop with these misleading implications now?
> *A* programmer will have to write RT support in *either* scheme.
> *THE* programmer (you imply, a consumer of the userspace API)
> does not.
They'll have to clean up the driver and all upper layers. That's
easier than retargeting your app and layers to a nanokernel.
> There is absolutely no difference for the userspace programmer
> in terms of realtime services.
Except for the brickwall they run into when they need something
a bit more than an interrupt being serviced in a timely manner.
> "Nobody has even yet *suggested* any *slightly* credible reasons
> of why a single kernel might be better than a split-kernel for
> hard-RT"
Bullshit, multiple folks have. It's like you have this particular
view and can't or won't see it from another perspective. It's clearly
willful.
> I hate to say but I find this almost dishonest considering
> assertions like "obviously superior" are being thrown around,
> along with such fine explanations as "start writing realtime apps
> and you'll find out".
Because it's true. Write a couple of these things and you'll see
what we mean by this. Consistently in this discussion, folks have
explained it to you but you can't take the ball and run with it in
a way that demonstrates that you really understand the media app
issues. It's like you're so locked into a neo-conservative way of
looking at these things that you don't know that this track can't
scale for our needs properly.
Really, most of us here have tried really really hard to get you
to understand it and the explanations are quite clear. There
isn't much we can do to change your mind since it wasn't really
there for change anyways.
bill
Karim Yaghmour <[email protected]> writes:
[snip]
> RT-USB (real-time USB):
> https://mail.rtai.org/pipermail/rtai/2005-April/011192.html
> http://developer.berlios.de/projects/rtusb
>
> Both of these use RTAI on top of Adeos :P
Hey, thanks for pointing that out! Removes my practical obstacle to
using USB. (Now I've just got to get past the aesthetic obstacle...)
cheers, Rich.
--
rich walker | Shadow Robot Company | [email protected]
technical director 251 Liverpool Road |
need a Hand? London N1 1LX | +UK 20 7700 2487
http://www.shadow.org.uk/products/newhand.shtml
On Tue, May 31, 2005 at 12:21:20AM +1000, Nick Piggin wrote:
> James Bruce wrote:
> [snip lots of stuff]
>
> Sorry James, we were talking about hard realtime. Read the thread.
> What's more, I don't think you understand how a nanokernel solution
> would work, nor have much idea about the complexity of implementing
> it in Linux (although that could have been a result of your thinking
> that we weren't talking about hard-rt).
He was talking about it clearly as an experienced RT app developer.
His points are clear and clearly show that he's written medium to large
RT apps and run into a lot of these same issues.
> And my questions for which I got no answer were things like
> "why is a single kernel superior to a nanokernel for hard-RT?",
> "what deterministic services would a hard-RT Linux need to provide?"
That's an RT beginner's question. You have to at least be up to speed
on that one to have the conversation at hand, and folks have discussed
this repeatedly. It's not our end that's failing, and your not
understanding this only reinforces the point.
> So most of what you said is irrelevant, but I'll pick out a few bits.
Oh god.
> No, that wasn't part of any of my hole shooting. I asked what operations
> need to be realtime and have not had an answer. fork/exec was "prompting".
I've been on vacation like most of us here and I'd like to avoid this
discussion over the weekend. But experienced RT app folks know this answer
already, and know that it's *not* intended to put guarantees on this and
"anything crossing the kernel boundary via a syscall" at this point. I've
also said this in previous emails, but it went over your head or you didn't
care to spend the time to understand the mailings in the first place.
> >deadlines. Then at the same time, when someone describes what an RT
> >application typically does do, you claim how simple and trivial it all
> >is, and without knowing any of the details tell them that it'd be easy
> >to split it into separate processes.
>
> Err, your example was "reading a configuration file". Not exactly
> rocket science my good man.
Again, you didn't understand the variety of services being discussed
here.
Think about what you need to do for an app that does sound (hard RT),
3D drawing (mostly soft RT for this example), and reading disk IO that's
buffered.
By the time you get the sound playback and IO buffering going, you're
going to have a pretty complicated communication layer already going
from those points. Now think, what if you intend to do an FFT over that
data and display it?
It's starting to get unmanageably complicated at that point.
> I have a better idea. I won't read up on any of that, and I will go
> and do my own thing and stop wasting my time on this thread. Then
> whoever wants to start putting hard realtime functionality into Linux
> can *tell* me why nanokernels failed, OK? Let's end the discussion
> until then. It is going nowhere.
bill
On Mon, May 30, 2005 at 02:56:55PM -0400, Karim Yaghmour wrote:
> Which gets us back to where we began: drivers that are non-deterministic
> will continue being non-deterministic regardless of what solution is
> adopted, if any, and will be in need of a rewrite/major modification, which
> itself will have little or no added value for non-rters ...
From my memory, DRM drivers have a direct path to the vertical retrace
through the current ioctl() interface. It's not an issue for that driver
and probably many others that use simple syscalls like that.
The RT patch isn't hard to maintain, and only one jerk-off objected to
it without providing any useful information on why the single kernel
approach is faulty, other than that it jars his easily offended
sensibilities.
bill
On Mon, May 30, 2005 at 03:44:20PM -0400, Karim Yaghmour wrote:
> Esben Nielsen wrote:
> > very close to nil. (Please, prove me wrong if you have an RT IP-stack
> > and maybe an RT USB stack for RTAI.)
>
> Do take me seriously when I say that RTAI is seriously overlooked:
>
> RT-Net (real-time UDP over Ethernet):
> http://www.rts.uni-hannover.de/rtnet/
>
> RT-USB (real-time USB):
> https://mail.rtai.org/pipermail/rtai/2005-April/011192.html
> http://developer.berlios.de/projects/rtusb
>
> Both of these use RTAI on top of Adeos :P
I've always liked your project and the track that it has taken with the
above along with the scheduler work. I am surprised that more folks don't
use it, but I think that has to do with the sucky web site and the
inability of me and others to navigate it for proper information.
bill
Bill Huey (hui) wrote:
> From my memory, DRM drivers have a direct path to the vertical retrace
> through the current ioctl() interface. It's not an issue for that driver
> and probably many others that use simple syscalls like that.
This is rather short. Can you elaborate a little on what you're trying
to say here? Thanks.
> The RT patch isn't hard to maintain and only one jerk-off objected to
> it without providing any useful information why the single kernel
> approach is faulty other than it jars his easily offended sensibilities
I didn't say the RT patch was hard to maintain. I said that it increased
the cost of maintenance for the rest of the kernel (which is the feeling
that seems to be echoed by other people's answers in this thread).
BTW, please take a breath here. I'm not interested in taking part in a
flame-fest.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
Bill Huey (hui) wrote:
> Think about what you need to do for an app that does sound (hard RT),
> 3D drawing (mostly soft RT for this example), and reading disk IO that's
> buffered.
>
> By the time you get the sound playback and IO buffering going, you're
> going to have a pretty complicated communication layer already going
> from those points. Now think, what if you intend to do an FFT over that
> data and display it?
>
> It's starting to get unmanageably complicated at that point.
But that's a general argument for having hard-rt in the standard
kernel. Which one of these steps cannot, from your point of view,
be implemented in a nanokernel architecture? ... keeping in mind
that, as Andi mentioned, the need for increased responsiveness for
the mainstream kernel is relevant with or without PREEMPT_RT and
that increasing responsiveness is a never-ending work-in-progress.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Mon, May 30, 2005 at 06:49:36PM -0400, Karim Yaghmour wrote:
> Bill Huey (hui) wrote:
> > From my memory, DRM drivers have a direct path to the vertical retrace
> > through the current ioctl() interface. It's not an issue for that driver
> > and probably many others that use simple syscalls like that.
>
> This is rather short. Can you elaborate a little on what you're trying
> to say here? Thanks.
Paths entering back into userspace are simple, like the use of read() to
respond to events.
> I didn't say the RT patch was hard to maintain. I said that it increased
> the cost of maintenance for the rest of the kernel (which is the feeling
> that seems to be echoed by other people's answers in this thread).
Sorry, the RT patch really doesn't affect general kernel development
dramatically. It's just exploiting SMP work already in place to get data
safety and the like. It does, however, kill all the bogus points in the
kernel that spin-wait for something to happen, which is a positive thing
for the kernel in general since they indicate sloppy code. If anything it
makes the kernel code cleaner.
This is the last day of my vacation, but it doesn't feel like it,
unfortunately :}
bill
Bill Huey (hui) wrote:
> I've always like your project and the track that it has taken with the
> above along with the scheduler work. I am surprised that more folks don't
> use it, but I think that has to do with the sucky web site and inability
> for me and others to navigate through it for proper information.
<sarcasm-not-worth-responding-to>
Sucky web site without proper info ... hmm ... any chance you can point
me to the website for PREEMPT_RT, surely the professional design of it
and the included documentation will make me want to adopt it right away
... what's that you say, there's no website ...
</sarcasm-not-worth-responding-to>
:) seriously, though, I can't believe we've discouraged you because
we're very poor at website design. Surely after all that's been said
about the nanokernel approach you'd want to at least dedicate some
short amount of time for downloading the code and at least running
a diffstat on it or something ... or even better, giving it a test
ride. Philippe has even gone as far as providing patches providing
both PREEMPT_RT and Adeos under the same roof ... it doesn't get
much better than that ...
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Mon, May 30, 2005 at 06:54:44PM -0400, Karim Yaghmour wrote:
> Bill Huey (hui) wrote:
> > Think about what you need to do for an app that does sound (hard RT),
> > 3D drawing (mostly soft RT for this example), and reading disk IO that's
> > buffered.
> >
> > By the time you get the sound playback and IO buffering going, you're
> > going to have a pretty complicated communication layer already going
> > from those points. Now think, what if you intend to do an FFT over that
> > data and display it?
> >
> > It's starting to get unmanageably complicated at that point.
>
> But that's a general argument for having hard-rt in the standard
> kernel. Which one of these steps cannot, from your point of view,
> be implemented in a nanokernel architecture? ... keeping in mind
No, I'm not saying that it's impossible. It's just that it's going
to be hell to write and maintain, since you have to deal with jitter across
various domains that influence each other. It's not unlike the "avoid
priority inversion by never letting threads of different priority lock
against each other" argument. It needs to be separated. But this is an
issue for a single image system as well.
When I think about it in terms of dual kernel primitives, I really have
difficulty thinking about how to use the message queue stuff to integrate
all of the systems involved, in particular with shared buffers. Proper
locking in those cases is scary to me for both methods, but at least
the single kernel image stuff uses familiar chunks of memory that I can
manipulate. I'm open to being proven wrong on this point if you have
good example sources to show me. I really am.
> that, as Andi mentioned, the need for increased responsiveness for
> the mainstream kernel is relevant with or without PREEMPT_RT and
> that increasing responsiveness is a never-ending work-in-progress.
bill
Bill Huey (hui) wrote:
> Paths entering back into userspace are simple, like the use of read() to
> respond to events.
Sure, but like Andi said, general increased responsiveness is not exclusive
to PREEMPT_RT, and any effort to reduce latency is welcome.
> Sorry, the RT patch really doesn't affect general kernel development
> dramatically. It's just exploiting SMP work already in place to get data
> safety and the like. It does, however, kill all the bogus points in the
> kernel that spin-wait for something to happen, which is a positive thing
> for the kernel in general since they indicate sloppy code. If anything it
> makes the kernel code cleaner.
But wasn't the same said about the existing preemption code? Yet, most
distros ship with it disabled and some developers still feel that there
are no added benefits. What's the use if everyone is shipping kernels
with the feature disabled? From a practical point of view, isn't it then
obvious that such features cater to a minority? Wouldn't it therefore
make sense to isolate such changes from the rest of the kernel as much
as possible? From what I read in responses elsewhere, it does indeed
seem that there are many who feel that the changes being suggested are
far too intrusive without any benefit for most Linux users. But again,
I'm just another noise-maker on this list. Reading the words of those
who actually maintain this stuff is the best indication for me as to
what the real-time-linux community can and cannot expect to get into
the kernel.
> This is the last day of vacation, but it doesn't feel like it unfortunately :}
I'm sorry you feel this way ... you do have the choice of not responding.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Mon, May 30, 2005 at 07:03:55PM -0400, Karim Yaghmour wrote:
> :) seriously, though, I can't believe we've discouraged you because
> we're very poor at website design. Surely after all that's been said
Yes you have.
> about the nanokernel approach you'd want to at least dedicate some
> short amount of time for downloading the code and at least running
> a diffstat on it or something ... or even better, giving it a test
> ride. Philippe has even gone as far as providing patches providing
> both PREEMPT_RT and Adeos under the same roof ... it doesn't get
> much better than that ...
When your random 10-second email post gives a more direct
pointer to the information I need or find interesting, then you know
your web site has serious problems. None of those papers are on a
top-level link that can be easily accessed. The papers you folks do
are the best documentation outlining your work, yet they are the most
difficult to find. That's a serious mistake.
Your sporadic postings on lkml are more informative than the web site.
bill
Bill Huey (hui) wrote:
> When I think about it in terms of dual kernel primitives, I really have
> difficulty thinking about how to use the message queue stuff to integrate
> all of the systems involved in particular with shared buffers. Proper
> locking in those cases is scary to me for both methods, but at least
> the single kernel image stuff uses familiar chunks of memory that I can
> manipulate. I'm open to being proven wrong on this point if you have
> good example sources to show me. I really am.
Having shared buffers between Adeos and Linux and/or RTAI and Linux is
common practice. They're all living in the same address space anyway.
So from that point of view, just lock those pages in memory.
The issue then becomes: how do these domains all talk to each other?
At the lowest of levels, Adeos provides a fairly simple inter-domain
communication mechanism: virtual interrupts. If you have a driver that
must absolutely get hard-rt responsiveness, you load it as a priority
Adeos domain and have its hard-rt handler shoot a virtual interrupt to
its non-rt Linux upper half, which can then do the rest of the work
that would be done by an interrupt handler.
The reverse is also possible: use a normal Linux driver to feed
virtual interrupts to a higher-priority Adeos domain.
These are basic primitives, and it isn't difficult to see how fancier
services can be built on top of them, as RTAI is, for example.
Again, none of this precludes working to improve Linux's responsiveness,
but it may just save the need for modifying the locking mechanisms
or threading the interrupt handlers.
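In skeletal form, the split-driver pattern described above looks
something like this. Note that every identifier below is invented
purely for illustration; the real Adeos calls are named differently,
so treat this as pseudo-code rather than working code:

/* Skeleton of the split-driver pattern; all names are invented
 * for illustration, consult the Adeos headers for the real API. */
static int virq;        /* virtual interrupt linking the two domains */

/* hard-rt half: runs in the high-priority Adeos domain */
static void rt_handler(int irq)
{
        drain_device_fifo();            /* hypothetical: grab the data */
        domain_trigger_virq(virq);      /* hypothetical: kick Linux half */
}

/* non-rt upper half: runs as an ordinary Linux handler */
static void linux_upper_half(int virq_nr)
{
        hand_data_to_upper_layers();    /* hypothetical: slow-path work */
}

static int split_driver_init(void)
{
        virq = domain_alloc_virq();                     /* hypothetical */
        domain_attach_virq(virq, linux_upper_half);     /* hypothetical */
        domain_attach_hw_irq(DEVICE_IRQ, rt_handler);   /* hypothetical */
        return 0;
}

The point is simply that the hard-rt half stays minimal and
deterministic, while everything else rides on the normal Linux
machinery.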
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
Bill Huey (hui) wrote:
> Yes you have.
Then in this regard we have failed miserably. Any suggestion you may
have to make things better will be listened to.
> When your random 10 second email post recently gives a more direct
> pointer to the information I need or find interesting, then you know
> your web site has serious problems. None of those papers are on a
> top level link that can be easily accessed. The papers you folks do
> are the best documentation outlining your work yet they are the most
> difficult to find. That's a serious mistake.
Point taken.
Hopefully the information I'm providing in these postings will motivate
people to take a second look and, possibly, help us make things more
straightforward for others to explore.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
Nick Piggin wrote:
> Sorry James, we were talking about hard realtime. Read the thread.
hard realtime = mathematically provable maximum latency
Yes, you'll want a nanokernel for that, you're right. That's because
one has to analyze every line of code, and protect against introduced
regressions, which is almost impossible given the pace that Linux-proper
is developed. Then there's the other 95% of applications, for which a
"statistical RT" approach such as used in the RT patch suffice. So
arguing for a nanokernel for (provable) hard realtime is orthogonal to
the discussion of this patch, and we apparently don't actually disagree.
If you look at your first two messages in this thread however, you seem
to be offering a nanokernel approach (in particular RTAI, as suggested by
Christoph) as an alternative to the RT patch. This is sort of confused
by the fact that Ingo called it "hard realtime" because he measured a
maximum latency during a stress test. Unfortunately that's not really
hard realtime if you are just measuring it; rather it's "really damn good
soft realtime". An analysis of code paths could be done to determine if
something really does satisfy hard-RT constraints, but to my knowledge
that's not on the table at this point. So you're discussing soft
realtime if you're discussing the RT patch.
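(For the curious: the kind of measurement in question is typically a
loop like the one below, which records how late a SCHED_FIFO thread
wakes up from a periodic absolute timer. This is a minimal sketch of
the technique, not the actual harness Ingo used:)

#include <stdio.h>
#include <time.h>
#include <sched.h>

#define NSEC_PER_SEC 1000000000L

int main(void)
{
        struct sched_param sp = { .sched_priority = 80 };
        struct timespec next, now;
        long lat, max_lat = 0;
        int i;

        sched_setscheduler(0, SCHED_FIFO, &sp);  /* run as an RT task */
        clock_gettime(CLOCK_MONOTONIC, &next);

        for (i = 0; i < 100000; i++) {
                /* sleep until the next absolute 1ms deadline */
                next.tv_nsec += 1000000;
                if (next.tv_nsec >= NSEC_PER_SEC) {
                        next.tv_nsec -= NSEC_PER_SEC;
                        next.tv_sec++;
                }
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
                clock_gettime(CLOCK_MONOTONIC, &now);
                lat = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                    + (now.tv_nsec - next.tv_nsec);
                if (lat > max_lat)
                        max_lat = lat;  /* an observed max, not a proven bound */
        }
        printf("max observed wakeup latency: %ld ns\n", max_lat);
        return 0;
}

No matter how long such a loop runs, its output remains a sample, which
is exactly why measuring can never substitute for analysis.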
So it's really just a misunderstanding; nanokernels certainly still have
a place for some applications even with the RT patches applied (Ingo has
said as much). However expecting audio applications such as Jack to
have to use RTAI is kind of silly, and would end up annoying the authors
of both (I'm sure the RTAI people have better things to do than support
ALSA drivers in RT mode).
> What's more, I don't think you understand how a nanokernel solution
> would work, nor have much idea about the complexity of implementing
> it in Linux (although that could have been a result of your thinking
> that we weren't talking about hard-rt).
Nanokernels for RT aren't that difficult when compared with the RT
patch, I agree with you on that. An RT scheduler is also pretty damn
easy to write (certainly easier than a general-purpose one that can't
arbitrarily starve low-prio tasks). The complexity comes in when you
have to fork drivers to make them RT-compatible, or upcall into existing
ones, in which case you're making the same modifications to code as in
the RT patch. Nanokernels work great for simpler hard realtime apps,
but poorly for complex softer-realtime apps. The RT patch addresses the
latter quite well.
> And my questions for which I got no answer were things like
> "why is a single kernel superior to a nanokernel for hard-RT?",
It's not better; the two methods best serve different types of applications.
> "what deterministic services would a hard-RT Linux need to provide?"
To start out with, nothing; it's better to let such applications develop
iteratively. In developing things such as Jack or my robot code, we
find out what things we can call without screwing up latency, and if we
think something could be fixed, we might ask about it on LKML to see if
someone will fix it. This model works pretty well in open source. You
can see my question about the Linux serial driver a few years ago, or
the many threads about Jack on this list.
I realize you don't like this approach, but that's pretty much how
things have been working for a while. The Jack people are using the RT
patch now, and will come back when they find something that doesn't work
as well as it seems it should. They did the same with preempt and the
lowlatency patches before it. A fixed set of requirements would be
nice, but these applications are evolving just as the kernel does.
> Err, your example was "reading a configuration file". Not exactly
> rocket science my good man.
For the third time: one model is easier to program for than the other;
neither makes anything impossible. Writing applications in assembler
isn't rocket science either, but even for "hello world" I'd rather use a
compiled language.
>> Please explain how a split-kernel method supports a continuous
>> progression from soft-realtime to hard-realtime, where each set of API
>> calls has associated latency effects that may or may not be tolerable
>> for a given application. That's the problem space, and I can guarantee
>> applications exist all along that progression, and many don't fall
>> cleanly into one side or the other.
>
> You say this like you have a confabulous solution ready to plonk
> into the Linux kernel.
I certainly don't, but I think someone else is on to a solution that can
achieve this eventually. When someone questioned "who really
wants/needs this stuff", then I piped up, along with a few others.
Many of us "RT-people" would love to to see the ordinary kernel get as
far as it can without a radical change in programming model. That means
we could write one Posix app that is realtime on Linux, and working but
possibly not realtime on older Linux versions and other operating
systems. We could tell users "use Linux 2.6.14 if you don't want the
system to hiccup". That is preferable to writing a special version of
the software for Linux just to get soft RT. That said, there will
always be a place for other approaches such as nanokernels for someone
controlling the proverbial industrial saw. For those applications you
want proof of hard realtime performance, but at the same time they don't
require streaming data off a disk or onto the network, nor using audio
hardware or serial radios for output.
[snip part about bothering to understand RT approaches]
I really hope we understand each other now, but if not I guess it wasn't
to be. Hopefully someone got something out of reading this discussion,
but I won't be posting on this branch of the thread anymore either.
- Jim Bruce
Bill Huey (hui) wrote:
> Your sporadic postings on lkml are more informative than the web site.
Here's one link I thought I'd mention:
The RTAI Testsuite LiveCD: http://issaris.org/rtai/
For those who want to give RTAI a try without having to go through
the hassle of hunting down patches and applying them.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Mon, 2005-05-30 at 19:15 -0400, Karim Yaghmour wrote:
> Sure, but like Andi said, general increased responsiveness is not exclusive
> to PREEMPT_RT, and any effort to reduce latency is welcome.
Nobody denies that, but that's no argument against an RT extension.
> But wasn't the same said about the existing preemption code? Yet, most
> distros ship with it disabled and some developers still feel that there
> are no added benefits.
One of the most disgusting arguments in this thread is "distros do XYZ".
There is a lot of Linux beyond distros.
tglx
Bill Huey (hui) wrote:
> On Tue, May 31, 2005 at 12:21:20AM +1000, Nick Piggin wrote:
>
>>And my questions for which I got no answer were things like
>>"why is a single kernel superior to a nanokernel for hard-RT?",
>>"what deterministic services would a hard-RT Linux need to provide?"
>
>
> That's an RT beginner's question. You have to at least be up to speed
> on that one to have the conversation at hand, and folks have discussed
> this repeatedly. It's not our end that's failing, and clearly your not
> understanding this only reinforces the point.
>
Bill, you can belittle me to your heart's content. It really
doesn't bother me in the slightest.
Whenever you or anyone else try to complicate the Linux kernel
with hard-RT stuff, I'm going to ask exactly the same questions
because I don't think you know how a nanokernel solution would
work, or even what kind of services a hard-RT Linux would need
to provide.
James Bruce wrote:
> Nick Piggin wrote:
>
>> Sorry James, we were talking about hard realtime. Read the thread.
>
>
> hard realtime = mathematically provable maximum latency
>
> Yes, you'll want a nanokernel for that, you're right. That's because
> one has to analyze every line of code, and protect against introduced
> regressions, which is almost impossible given the pace that Linux-proper
Thank you, James. Now please tell that to Bill. It would seem
that I haven't written enough "RT media apps" for him to take
me seriously ;)
>
> If you look at your first two messages in this thread however, you seem
> to be offering a nanokernel approach (in particular RTAI, as suggested by
> Christoph) as an alternative to the RT patch. This is sort of confused
> by the fact that Ingo called it "hard realtime" because he measured a
> maximum latency during a stress test. Unfortunately that's not really
> hard realtime if you are just measuring it; rather it's "really damn good
> soft realtime". An analysis of code paths could be done to determine if
> something really does satisfy hard-RT constraints, but to my knowledge
> that's not on the table at this point. So you're discussing soft
> realtime if you're discussing the RT patch.
>
No, I clarified the point that the direction the RT people want
to go in is hard-realtime in the Linux kernel.
I'm very well aware of what the actual current PREEMPT_RT patch is,
and I was never talking about that particular patch.
> So it's really just a misunderstanding; nanokernels certainly still have
> a place for some applications even with the RT patches applied (Ingo has
> said as much). However expecting audio applications such as Jack to
> have to use RTAI is kind of silly, and would end up annoying the authors
> of both (I'm sure the RTAI people have better things to do than support
> ALSA drivers in RT mode).
>
Yes, Jack is more of a soft realtime application, and in that case
Linux supports it already today (although perhaps not very well -
something the RT patch aims to improve).
[snip rest]
>
> I really hope we understand each other now, but if not I guess it wasn't
> to be. Hopefully someone got something out of reading this discussion,
> but I won't be posting on this branch of the thread anymore either.
>
It seems that you do understand my position now, yes.
I'll try to refrain from posting further, too.
Nick
On Tue, May 31, 2005 at 11:21:49AM +1000, Nick Piggin wrote:
> Bill Huey (hui) wrote:
> >That's an RT beginner's question. You have to at least be up to speed
> >on that one to have the conversation at hand, and folks have discussed
> >this repeatedly. It's not our end that's failing, and clearly your not
> >understanding this only reinforces the point.
...
> Bill, you can belittle me to your heart's content. It really
> doesn't bother me in the slightest.
It's not belittling, but venting my frustration at you not
understanding what I'm saying with an escalating ramp of force, jerk. :)
> Whenever you or anyone else try to complicate the Linux kernel
> with hard-RT stuff, I'm going to ask exactly the same questions
> because I don't think you know how a nanokernel solution would
> work, or even what kind of services a hard-RT Linux would need
> to provide.
Well, it depends on the scope of the thing we're talking about. If
you mean pervasively throughout the kernel for every system, then
no, not at first, if ever. If you mean for what the nanokernels are
commonly used for, then yes. We're already quite close to hard real
time if you just trust eyeballing the core kernel code. The remaining
problems have pretty much been partitioned off to fringe file system
logic, some networking code and things outside of the core kernel
(kernel/ generally). They'll have to be surveyed and hammered on
manually. All other paths, if you trust eyeballing, should run within
an interrupt plus a thread wakeup, assuming you're not running within
an interrupt/preempt-off section.
Theorem-proven kernels are another matter altogether, but in all
practicality we're very close to hard real time. Calling it soft
real time isn't exactly accurate either, but the thrust to get
theorem-proven RT kernels recently has made the definitions more
rigid in this discussion, probably overly so. Linux will probably
never be submitted to any prover to attain that. Very few
(only one product of ours that I know of, LynxOS-178) have taken
on that provability track. This is a highly competitive field.
There's many things being discussed. The original examples I've
given probably clouded things for you, when I meant them to make the
problem clear by setting out the most extreme examples. Really, the
problems are more complicated than that and really are quite varied.
But the first step is to get at CPU resources in a deterministic
manner; the rest, in whatever form, comes later.
Food time :)
bill
On Mon, 2005-05-30 at 19:32 -0400, James Bruce wrote:
> This is sort of confused
> by the fact that Ingo called it "hard realtime" because he measured a
> maximum latency during a stress test. Unfortunately that's not really
> hard realtime if you are just measuring it; rather it's "really damn good
> soft realtime". An analysis of code paths could be done to determine if
> something really does satisfy hard-RT constraints, but to my knowledge
> that's not on the table at this point. So you're discussing soft
> realtime if you're discussing the RT patch.
>
> So it's really just a misunderstanding
No, *you're* the one misunderstanding.
Since *everything* is preemptible except a few known code paths whose
execution times determine the maximum possible latency from interrupt to
running the highest priority user process.
That's the determinism, no more, no less. But some people inexplicably
think this thread is about providing deterministic hard RT performance
for some subset of system calls, or disk IO or something, none of which
have anything to do with PREEMPT_RT.
Lee
On Mon, 2005-05-30 at 22:06 -0400, Lee Revell wrote:
> No, *you're* the one misunderstanding.
>
> Since *everything* is preemptible except a few known code paths whose
> execution times determine the maximum possible latency from interrupt to
> running the highest priority user process.
Sorry. Should read:
*Everything* is preemptible except a few known code paths, whose
execution times determine the maximum possible latency from interrupt to
running the highest priority user process.
But, you'd know that, if you'd followed the development in the slightest
bit.
And, I think Ingo knows what "determinism" and "hard realtime" mean. I
suggest you reread his posts.
Lee
Karim Yaghmour wrote:
> But wasn't the same said about the existing preemption code? Yet,
> most distros ship with it disabled and some developers still feel
> that there are no added benefits. What's the use if everyone is
> shipping kernels with the feature disabled? From a practical point of
> view, isn't it then obvious that such features cater to a minority?
That's a misrepresentation. It is well-known that Linux is used in
a wide range of embedded devices. The embedded space is very fragmented,
with lots of home-grown Linux platforms. And, I would speculate that
many of them (as well as commercial distros catering to the embedded
market) often enable preemption (including using non-mainlined kernel
preemption patches for 2.4 kernels).
Regards,
Manas
Lee Revell wrote:
> Since *everything* is preemptible except a few known code paths whose
> execution times determine the maximum possible latency from interrupt
> to running the highest priority user process.
Have all the code paths been audited? If there's a reference to an
analysis that's been done, please pass it on as I'd like to read it.
Remember that it must take into account completely cold L1 and L2 caches
for almost all of the computation, or it's not truly a worst-case
analysis. If this has been done, I stand corrected. If not, then
there's no proven maximum latency, just statistical arguments that it
works well. Keep in mind that such an argument can be good enough for
most of the RT stuff people are doing, but I'm not putting my hand under
the saw just yet :)
> That's the determinism, no more, no less. But some people
> inexplicably think this thread is about providing deterministic hard
> RT performance for some subset of system calls, or disk IO or
> something, none of which have anything to do with PREEMPT_RT.
Well, that's the direction people want to take it in, since an RT thread
unable to receive any type of input or produce some type of output isn't
particularly useful for anything. First steps first, of course.
I really think the RT patches are great in what they achieve, but true
hard realtime does require proof, and I'm not aware of that having been
done (yet). However that's not a prerequisite for usefulness; a
measurement of 5 or 7 nines of reliability getting sub-100us latency
will certainly make most application writers happy.
- Jim Bruce
On Mon, 30 May 2005, Karim Yaghmour wrote:
>
> Here's for the fun of history, a diff between the 8390.c in 2.2.16 and the
> one in rt-net 0.9.0:
Hats off!
But it goes two ways: if the driver is running RT under RTAI, the same
driver is running RT under PREEMPT_RT - with no modifications :-)
Esben
>
> 0a1,12
> > /*
> > rtnet/module/driver/8390.c
> > driver for 8390-based network interfaces
> >
> > rtnet - real-time networking subsystem
> > Copyright (C) 1999,2000 Zentropic Computing, LLC
> >
> > This file is a modified version of a source file located in the
> > standard Linux source tree. Information about how to find the
> > original version of this file is located in rtnet/original_files.
> > */
> >
> 49a62,63
> > #define EXPORT_SYMTAB
> >
> 71a86,137
> > #if defined(CONFIG_RTNET) || defined(CONFIG_RTNET_MODULE)
> > #include <rtnet/rtnet.h>
> > #define RT_DRIVER
> > static int ei_rt_start_xmit(struct sk_buff *skb, struct rt_device *rtdev);
> > #define RT_dev_alloc_skb(a) ((!ei_local->rt)?dev_alloc_skb(a):dev_alloc_rtskb(a))
> > #define RT_mark_bh(a) do{if(!ei_local->rt)mark_bh(a);}while(0)
> > #define RT_dev_kfree_skb(a) do{if(!ei_local->rt){dev_kfree_skb(a);}else{dev_kfree_rtskb(a);}}while(0)
> > #define RT_netif_rx(a) ((!ei_local->rt)?netif_rx(a):rtnetif_rx(a))
> > #define RT_printk(format,args...) rt_printk(format,##args)
> > #define RT_enable_irq(a) do{if(!ei_local->rt)enable_irq(a);else rt_enable_irq(a);}while(0)
> > #define RT_disable_irq_nosync(a) do{if(!ei_local->rt)disable_irq_nosync(a);else rt_disable_irq(a);}while(0)
> > #define RT_spin_lock(a) \
> > do{if(!ei_local->rt){ \
> > spin_lock(a); \
> > }else{ \
> > rt_spin_lock(a); \
> > }}while(0)
> > #define RT_spin_unlock(a) \
> > do{if(!ei_local->rt){ \
> > spin_unlock(a); \
> > }else{ \
> > rt_spin_unlock(a); \
> > }}while(0)
> > #define RT_spin_lock_irqsave(a,b) \
> > do{if(!ei_local->rt){ \
> > spin_lock_irqsave(a,b); \
> > }else{ \
> > (b)=rt_spin_lock_irqsave(a); \
> > }}while(0)
> > #define RT_spin_unlock_irqrestore(a,b) \
> > do{if(!ei_local->rt){ \
> > spin_unlock_irqrestore(a,b); \
> > }else{ \
> > rt_spin_unlock_irqrestore(b,a); \
> > }}while(0)
> > #define RT_spin_lock_init(a) spin_lock_init(a)
> > #else
> > #define DIFE(a,b) (a)
> > #define RT_dev_alloc_skb dev_alloc_skb
> > #define RT_mark_bh(a) mark_bh(a)
> > #define RT_dev_kfree_skb(a) dev_kfree_skb(a)
> > #define RT_netif_rx(a) netif_rx(a)
> > #define RT_printk printk
> > #define RT_enable_irq(a) enable_irq(a)
> > #define RT_disable_irq_nosync(a) disable_irq_nosync(a)
> > #define RT_spin_lock(a) spin_lock(a)
> > #define RT_spin_unlock(a) spin_unlock(a)
> > #define RT_spin_lock_irqsave(a,b) spin_lock_irqsave(a,b)
> > #define RT_spin_unlock_irqrestore(a,b) spin_unlock_irqrestore(a,b)
> > #define RT_spin_lock_init(a) spin_lock_init(a)
> > #endif
> >
> 157c223
> < printk(KERN_EMERG "%s: ei_open passed a non-existent device!\n", dev->name);
> ---
> > RT_printk(KERN_EMERG "%s: ei_open passed a non-existent device!\n", dev->name);
> 166c232
> < spin_lock_irqsave(&ei_local->page_lock, flags);
> ---
> > RT_spin_lock_irqsave(&ei_local->page_lock, flags);
> 171c237
> < spin_unlock_irqrestore(&ei_local->page_lock, flags);
> ---
> > RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
> 186c252
> < spin_lock_irqsave(&ei_local->page_lock, flags);
> ---
> > RT_spin_lock_irqsave(&ei_local->page_lock, flags);
> 188c254
> < spin_unlock_irqrestore(&ei_local->page_lock, flags);
> ---
> > RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
> 192a259,267
> > #ifdef RT_DRIVER
> > static int ei_start_xmit(struct sk_buff *skb, struct device *dev);
> >
> > static int ei_rt_start_xmit(struct sk_buff *skb, struct rt_device *rtdev)
> > {
> > return ei_start_xmit(skb,rtdev->dev);
> > }
> > #endif
> >
> 218c293
> < spin_lock_irqsave(&ei_local->page_lock, flags);
> ---
> > RT_spin_lock_irqsave(&ei_local->page_lock, flags);
> 222c297
> < spin_unlock_irqrestore(&ei_local->page_lock, flags);
> ---
> > RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
> 230,231c305,306
> < spin_unlock_irqrestore(&ei_local->page_lock, flags);
> < printk(KERN_WARNING "%s: xmit on stopped card\n", dev->name);
> ---
> > RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
> > RT_printk(KERN_WARNING "%s: xmit on stopped card\n", dev->name);
> 241c316
> < printk(KERN_DEBUG "%s: Tx timed out, %s TSR=%#2x, ISR=%#2x, t=%d.\n",
> ---
> > RT_printk(KERN_DEBUG "%s: Tx timed out, %s TSR=%#2x, ISR=%#2x, t=%d.\n",
> 256c331
> < spin_unlock_irqrestore(&ei_local->page_lock, flags);
> ---
> > RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
> 260,261c335,336
> < disable_irq_nosync(dev->irq);
> < spin_lock(&ei_local->page_lock);
> ---
> > RT_disable_irq_nosync(dev->irq);
> > RT_spin_lock(&ei_local->page_lock);
> 263a339
> > /* XXX not realtime! */
> 267,268c343,344
> < spin_unlock(&ei_local->page_lock);
> < enable_irq(dev->irq);
> ---
> > RT_spin_unlock(&ei_local->page_lock);
> > RT_enable_irq(dev->irq);
> 279c355
> < spin_lock_irqsave(&ei_local->page_lock, flags);
> ---
> > RT_spin_lock_irqsave(&ei_local->page_lock, flags);
> 281c357
> < spin_unlock_irqrestore(&ei_local->page_lock, flags);
> ---
> > RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
> 288c364
> < disable_irq_nosync(dev->irq);
> ---
> > RT_disable_irq_nosync(dev->irq);
> 290c366
> < spin_lock(&ei_local->page_lock);
> ---
> > RT_spin_lock(&ei_local->page_lock);
> 294c370
> < printk(KERN_WARNING "%s: Tx request while isr active.\n",dev->name);
> ---
> > RT_printk(KERN_WARNING "%s: Tx request while isr active.\n",dev->name);
> 296,297c372,373
> < spin_unlock(&ei_local->page_lock);
> < enable_irq(dev->irq);
> ---
> > RT_spin_unlock(&ei_local->page_lock);
> > RT_enable_irq(dev->irq);
> 299c375
> < dev_kfree_skb(skb);
> ---
> > RT_dev_kfree_skb(skb);
> 321c397
> < printk(KERN_DEBUG "%s: idle transmitter tx2=%d, lasttx=%d, txing=%d.\n",
> ---
> > RT_printk(KERN_DEBUG "%s: idle transmitter tx2=%d, lasttx=%d, txing=%d.\n",
> 329c405
> < printk(KERN_DEBUG "%s: idle transmitter, tx1=%d, lasttx=%d, txing=%d.\n",
> ---
> > RT_printk(KERN_DEBUG "%s: idle transmitter, tx1=%d, lasttx=%d, txing=%d.\n",
> 335c411
> < printk(KERN_DEBUG "%s: No Tx buffers free! irq=%ld tx1=%d tx2=%d last=%d\n",
> ---
> > RT_printk(KERN_DEBUG "%s: No Tx buffers free! irq=%ld tx1=%d tx2=%d last=%d\n",
> 340,341c416,417
> < spin_unlock(&ei_local->page_lock);
> < enable_irq(dev->irq);
> ---
> > RT_spin_unlock(&ei_local->page_lock);
> > RT_enable_irq(dev->irq);
> 393,394c469,470
> < spin_unlock(&ei_local->page_lock);
> < enable_irq(dev->irq);
> ---
> > RT_spin_unlock(&ei_local->page_lock);
> > RT_enable_irq(dev->irq);
> 396c472
> < dev_kfree_skb (skb);
> ---
> > RT_dev_kfree_skb (skb);
> 414c490
> < printk ("net_interrupt(): irq %d for unknown device.\n", irq);
> ---
> > RT_printk ("net_interrupt(): irq %d for unknown device.\n", irq);
> 425c501
> < spin_lock(&ei_local->page_lock);
> ---
> > RT_spin_lock(&ei_local->page_lock);
> 431c507
> < printk(ei_local->irqlock
> ---
> > RT_printk(ei_local->irqlock
> 437c513
> < spin_unlock(&ei_local->page_lock);
> ---
> > RT_spin_unlock(&ei_local->page_lock);
> 447c523
> < printk(KERN_DEBUG "%s: interrupt(isr=%#2.2x).\n", dev->name,
> ---
> > RT_printk(KERN_DEBUG "%s: interrupt(isr=%#2.2x).\n", dev->name,
> 456c532
> < printk(KERN_WARNING "%s: interrupt from stopped card\n", dev->name);
> ---
> > RT_printk(KERN_WARNING "%s: interrupt from stopped card\n", dev->name);
> 495c571
> < printk(KERN_WARNING "%s: Too much work at interrupt, status %#2.2x\n",
> ---
> > RT_printk(KERN_WARNING "%s: Too much work at interrupt, status %#2.2x\n",
> 499c575
> < printk(KERN_WARNING "%s: unknown interrupt %#2x\n", dev->name, interrupts);
> ---
> > RT_printk(KERN_WARNING "%s: unknown interrupt %#2x\n", dev->name, interrupts);
> 504c580
> < spin_unlock(&ei_local->page_lock);
> ---
> > RT_spin_unlock(&ei_local->page_lock);
> 527c603
> < printk(KERN_DEBUG "%s: transmitter error (%#2x): ", dev->name, txsr);
> ---
> > RT_printk(KERN_DEBUG "%s: transmitter error (%#2x): ", dev->name, txsr);
> 529c605
> < printk("excess-collisions ");
> ---
> > RT_printk("excess-collisions ");
> 531c607
> < printk("non-deferral ");
> ---
> > RT_printk("non-deferral ");
> 533c609
> < printk("lost-carrier ");
> ---
> > RT_printk("lost-carrier ");
> 535c611
> < printk("FIFO-underrun ");
> ---
> > RT_printk("FIFO-underrun ");
> 537,538c613,614
> < printk("lost-heartbeat ");
> < printk("\n");
> ---
> > RT_printk("lost-heartbeat ");
> > RT_printk("\n");
> 576c652
> < printk(KERN_ERR "%s: bogus last_tx_buffer %d, tx1=%d.\n",
> ---
> > RT_printk(KERN_ERR "%s: bogus last_tx_buffer %d, tx1=%d.\n",
> 593c669
> < printk("%s: bogus last_tx_buffer %d, tx2=%d.\n",
> ---
> > RT_printk("%s: bogus last_tx_buffer %d, tx2=%d.\n",
> 608c684
> < else printk(KERN_WARNING "%s: unexpected TX-done interrupt, lasttx=%d.\n",
> ---
> > else RT_printk(KERN_WARNING "%s: unexpected TX-done interrupt, lasttx=%d.\n",
> 641c717
> < mark_bh (NET_BH);
> ---
> > RT_mark_bh (NET_BH);
> 674c750
> < printk(KERN_ERR "%s: mismatched read page pointers %2x vs %2x.\n",
> ---
> > RT_printk(KERN_ERR "%s: mismatched read page pointers %2x vs %2x.\n",
> 704c780
> < printk(KERN_DEBUG "%s: bogus packet size: %d, status=%#2x nxpg=%#2x.\n",
> ---
> > RT_printk(KERN_DEBUG "%s: bogus packet size: %d, status=%#2x nxpg=%#2x.\n",
> 714c790
> < skb = dev_alloc_skb(pkt_len+2);
> ---
> > skb = RT_dev_alloc_skb(pkt_len+2);
> 718c794
> < printk(KERN_DEBUG "%s: Couldn't allocate a sk_buff of size %d.\n",
> ---
> > RT_printk(KERN_DEBUG "%s: Couldn't allocate a sk_buff of size %d.\n",
> 730c806
> < netif_rx(skb);
> ---
> > RT_netif_rx(skb);
> 740c816
> < printk(KERN_DEBUG "%s: bogus packet: status=%#2x nxpg=%#2x size=%d\n",
> ---
> > RT_printk(KERN_DEBUG "%s: bogus packet: status=%#2x nxpg=%#2x size=%d\n",
> 752c828
> < printk("%s: next frame inconsistency, %#2x\n", dev->name,
> ---
> > RT_printk("%s: next frame inconsistency, %#2x\n", dev->name,
> 790c866
> < printk(KERN_DEBUG "%s: Receiver overrun.\n", dev->name);
> ---
> > RT_printk(KERN_DEBUG "%s: Receiver overrun.\n", dev->name);
> 855c931
> < spin_lock_irqsave(&ei_local->page_lock,flags);
> ---
> > RT_spin_lock_irqsave(&ei_local->page_lock,flags);
> 860c936
> < spin_unlock_irqrestore(&ei_local->page_lock, flags);
> ---
> > RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
> 901c977
> < printk(KERN_INFO "%s: invalid multicast address length given.\n", dev->name);
> ---
> > RT_printk(KERN_INFO "%s: invalid multicast address length given.\n", dev->name);
> 980c1056
> < spin_lock_irqsave(&ei_local->page_lock, flags);
> ---
> > RT_spin_lock_irqsave(&ei_local->page_lock, flags);
> 982c1058
> < spin_unlock_irqrestore(&ei_local->page_lock, flags);
> ---
> > RT_spin_unlock_irqrestore(&ei_local->page_lock, flags);
> 1004c1080
> < spin_lock_init(&ei_local->page_lock);
> ---
> > RT_spin_lock_init(&ei_local->page_lock);
> 1010a1087,1097
> > #ifdef RT_DRIVER
> > {
> > struct ei_device *ei_priv = (struct ei_device *)dev->priv;
> >
> > if(ei_priv->rtdev == NULL)
> > ei_priv->rtdev = rt_dev_alloc(dev);
> >
> > ei_priv->rtdev->xmit = ei_rt_start_xmit;
> > }
> > #endif
> >
> 1061c1148
> < printk(KERN_ERR "Hw. address read/write mismap %d\n",i);
> ---
> > RT_printk(KERN_ERR "Hw. address read/write mismap %d\n",i);
> 1098c1185
> < printk(KERN_WARNING "%s: trigger_send() called with the transmitter busy.\n",
> ---
> > RT_printk(KERN_WARNING "%s: trigger_send() called with the transmitter busy.\n",
>
> Of course this is ancient, but I just thought I'd illustrate my point.
>
> Again, I suggest you drop the single- vs. dual-kernel application/driver
> distinction; you get the same results and limitations regardless of the
> RT method you adopt.
>
> Karim
> --
> Author, Speaker, Developer, Consultant
> Pushing Embedded and Real-Time Linux Systems Beyond the Limits
> http://www.opersys.com || [email protected] || 1-866-677-4546
Bill Huey (hui) wrote:
> Theorem-proven kernels are another matter altogether, but in all
> practicality we're very close to hard real time. Calling it soft
> real time isn't exactly accurate either, but the thrust to get
> theorem-proven RT kernels recently has made the definitions more
> rigid in this discussion, probably overly so. Linux will probably
> never be submitted to any prover to attain that. Very few
> (only one product of ours that I know of, LynxOS-178) have taken
> on that provability track. This is a highly competitive field.
Perhaps we should call it soft-boiled realtime? I've always hated the
strict hard/soft distinction too, since it's something that inherently has
two dimensions: (1) how fast does your code need to be serviced, and (2)
how often is it acceptable for it to fail. Even in a factory setting, a
machine whose control system fails once every 10 years is acceptable if
you can get better performance out of it. Also, good soft realtime for
audio can be quite a bit more difficult to implement than hard realtime
for controlling an oil tanker.
That said, it's important not to claim something about a patch which
doesn't match the common definitions. Ingo has been very careful in the
claims he's made, but I think a lot of people have read his posts too
quickly and misinterpreted what he's claiming for the current patch.
This includes people on both sides of the fence. He's also been silent
for much of this discussion as it's gotten out of hand, showing he's
clearly wiser than all of us.
- Jim Bruce
On Tue, 31 May 2005, Nick Piggin wrote:
> [...]
> Whenever you or anyone else try to complicate the Linux kernel
> with hard-RT stuff, [...]
The more I look at Ingo's RT patch the more I see a cleanup. It is the
old maybe-preemptible way which is a mess. There is so much the kernel
developer has to think of wrt. locking: too many kinds of contexts,
per-cpu variables, myriads of locking types. When I started to look at it
I thought: what a mess.
PREEMPT_RT basically boils it down to: everything is a thread, and the
only way to protect data is to use a mutex or RCU. In short: Linux with
PREEMPT_RT is much easier to understand and develop for than with
!PREEMPT_RT.
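To illustrate: a driver critical section written the ordinary way
needs no source changes at all; under PREEMPT_RT the very same lines
simply compile into a sleeping, priority-inheriting lock. A minimal
sketch (not code from the patch itself):

/* With !PREEMPT_RT this spins; with PREEMPT_RT the same source
 * becomes a priority-inheriting mutex. The driver doesn't change. */
static spinlock_t my_lock = SPIN_LOCK_UNLOCKED;
static int shared_count;

static void bump_count(void)
{
        spin_lock(&my_lock);
        shared_count++;         /* data is protected either way */
        spin_unlock(&my_lock);
}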
Esben
James Bruce wrote:
> That said, it's important not to claim something about a patch which
> doesn't match the common definitions. Ingo has been very careful in the
> claims he's made, but I think a lot of people have read his posts too
> quickly and misinterpreted what he's claiming for the current patch.
> This includes people on both sides of the fence. He's also been silent
> for much of this discussion as it's gotten out of hand, showing he's
> clearly wiser than all of us.
>
I have never been in any doubt as to the specific claims I have
made. I continually have been talking about hard realtime from
start to finish, and it appears that everyone now agrees with me
that for hard-RT, a nanokernel solution is better or at least
not obviously worse at this stage.
Ingo actually of course has been completely rational and honest
the whole time - he actually emailed me to basically say "there
will be pros and cons of both, and until things develop further
I'm not completely sure".
Which I was pretty satisfied with. Then along came the lynch mob.
Send instant messages to your online friends http://au.messenger.yahoo.com
On Tue, May 31, 2005 at 07:33:38PM +1000, Nick Piggin wrote:
> James Bruce wrote:
> >claims he's made, but I think a lot of people have read his posts too
> >quickly and misinterpreted what he's claiming for the current patch.
> >This includes people on both sides of the fence. He's also been silent
> >for much of this discussion as its gotten out of hand, showing he's
> >clearly wiser than all of us.
>
> I have never been in any doubt as to the specific claims I have
> made. I continually have been talking about hard realtime from
> start to finish, and it appears that everyone now agrees with me
> that for hard-RT, a nanokernel solution is better or at least
> not obviously worse at this stage.
No, not true. That's largely a myth created by the dual-kernel folks. The
scheduling and interrupt paths are highly optimized in Linux; it's
unlikely that any other OS can really do significantly better
in this area, since the paths are branch-hinted and cache-sensitive.
This core logic is pretty much similar across most RTOSes.
> Ingo actually of course has been completely rational and honest
> the whole time - he actually emailed me to basically say "there
> will be pros and cons of both, and until things develop further
> I'm not completely sure".
There will be pros and cons of both, but in the end the single-kernel
approach will win because of app programmability issues. The dual-kernel
boundary only exists because nobody had taken on the task of full kernel
preemptibility, given the broad amount of knowledge needed to
get the lock ordering correct as well as the other concurrency conversions.
It's done now, and dual kernel will have less of a stranglehold on
RT development in Linux. This is inevitable as the technology
propagates.
> Which I was pretty satisfied with. Then along came the lynch mob.
The lynch mob is right. They have first-hand experience with this
kind of work and understand the problems associated with this kind
of software development. This isn't some piecewise kernel hack that's
an easy tack-on to the kernel, but a fundamentally different way
of looking at things. Understanding the concepts is mandatory here.
That's something that you're still not willing to learn, which makes
discussions with you on the subject useless and pisses off the rest
of us.
bill
Nick Piggin wrote:
> I have never been in any doubt as to the specific claims I have
> made. I continually have been talking about hard realtime from
> start to finish, and it appears that everyone now agrees with me
> that for hard-RT, a nanokernel solution is better or at least
> not obviously worse at this stage.
It is only better in that if you need provable hard-RT *right now*, then
you have to use a nanokernel. The RT patch doesn't provide guaranteed
hard-RT yet[1], but it may in the future. Any RT application programmer
would rather write for a single image system than a split kernel. So if
it does eventually provide hard-RT, just about every new RT application
will target it (due to it being easier to program for). In addition it
radically improves soft-RT performance *now*, which a nanokernel doesn't
help with at all. "Best" would be getting preempt-RT to become
guaranteed hard-RT, or if that proves impossible, to have a nanokernel
in addition to preempt-RT's good statistical soft-RT guarantees.
I think where we violently disagree is that in your earlier posts you
seemed to imply that a nanokernel hard-RT solution obviates the need for
something like preempt-RT. That is not the case at all, and at the
moment they are quite orthogonal. In the future they may not be
orthogonal, because *if* preempt-RT patch becomes guaranteed hard-RT, it
would pretty much relegate nanokernels to only those applications
requiring formal verification.
- Jim Bruce
P.S. Preempt-RT is a sight to behold while updatedb is running. The
difference between it and ordinary preempt is quite impressive. Nothing
currently running has so much as a hiccup, even though / is using the
non-latency-friendly ReiserFS. The only way I even notice updatedb is
running at all is through my CPU monitor and the fact that disk IO is
slower.
[1] By this I mean on a system loaded with low priority tasks doing the
relatively arbitrary things one might do on a live system.
James Bruce wrote:
> Nick Piggin wrote:
>
>> I have never been in any doubt as to the specific claims I have
>> made. I continually have been talking about hard realtime from
>> start to finish, and it appears that everyone now agrees with me
>> that for hard-RT, a nanokernel solution is better or at least
>> not obviously worse at this stage.
>
>
> It is only better in that if you need provable hard-RT *right now*, then
> you have to use a nanokernel. The RT patch doesn't provide guaranteed
> hard-RT yet[1], but it may in the future. Any RT application programmer
This was my main line of questioning - what future direction do
the RT guys want from the PREEMPT_RT work. I was concerned that
hard realtime does not sound feasible for Linux.
[snip]
>
> I think where we violently disagree is that in your earlier posts you
> seemed to imply that a nanokernel hard-RT solution obviates the need for
> something like preempt-RT. That is not the case at all, and at the
Actually I think that is also where we violently agree ;)
If you look at some of my earlier posts, you'll see I had to
add 'disclaimers' until I was blue in the face. But I don't
blame you for not wanting to crawl through all that / or not
seeing it.
Basically: I know they are orthogonal, and I don't disagree
that generally better scheduling and interrupt latency would be
nice for Linux to have.
Now I'll really stop posting. Sorry everyone.
On Tue, May 31, 2005 at 06:48:50AM -0400, James Bruce wrote:
> P.S. Preempt-RT is a sight to behold while updatedb is running. The
> difference between it and ordinary preempt is quite impressive. Nothing
> currently running has so much as a hiccup, even though / is using the
> non-latency-friendly ReiserFS. The only way I even notice updatedb is
> running at all is through my CPU monitor and the fact that disk IO is
> slower.
Are you sure it is not only disk IO? In theory updatedb shouldn't
need much CPU, but it eats a lot of memory and causes stalls
in the disk (or at least that was my interpretation of the stalls I saw).
If there is really a scheduling latency problem with updatedb
then that definitely needs to be fixed in the stock kernel.
-Andi
On 31 May 2005 13:14:45 +0200, Andi Kleen <[email protected]> wrote:
>
> Are you sure it is not only disk IO? In theory updatedb shouldn't
> need much CPU, but it eats a lot of memory and causes stalls
> in the disk (or at least that was my interpration on the stalls I saw)
> If there is really a scheduling latency problem with updatedb
> then that definitely needs to be fixed in the stock kernel.
Yeah true...I have actually never observed updatedb taking much of my
CPU cycles. It just eats up a lot of memory. When the load average on
the system is high, sometimes updatedb even results in a system
freeze.
--
Hari
On Tue, 31 May 2005, James Bruce wrote:
> Nick Piggin wrote:
> > I have never been in any doubt as to the specific claims I have
> > made. I continually have been talking about hard realtime from
> > start to finish, and it appears that everyone now agrees with me
> > that for hard-RT, a nanokernel solution is better or at least
> > not obviously worse at this stage.
>
> It is only better in that if you need provable hard-RT *right now*, then
> you have to use a nanokernel.
What do you mean by "provable"? Security critical? Forget about
nanokernels then. The WHOLE system has to be validated. If you want to a
system good enough to put (a lot of) money on it: Test, test, test.
I can't see it would be easier prove that a nano-kernel with various
needed mutex and queuing mechanism works correct than it is to prove that
the Linux scheduler with mutex and queueing mechanisms works correctly.
Both systems does the same thing and is most likely based on the same
principles!
If a module in Linux disables interrupts for a non-deterministic amount
of time, it destroys the RT in both scenarios. With the nanokernel,
the Linux kernel is patched not to disable interrupts, but if someone
didn't use the official local_irq_disable() macro the patch didn't work
anyway...
The only way you can be absolutely sure Linux doesn't hurt RT is to run
it in a full-blown virtual machine where it doesn't have access to disable
interrupts and otherwise interfere with the nano-kernel.
> The RT patch doesn't provide guaranteed
> hard-RT yet[1], but it may in the future. Any RT application programmer
> would rather write for a single image system than a split kernel. So if
> it does eventually provide hard-RT, just about every new RT application
> will target it (due to it being easier to program for). In addition it
> radically improves soft-RT performance *now*, which a nanokernel doesn't
> help with at all. "Best" would be getting preempt-RT to become
> guaranteed hard-RT, or if that proves impossible, to have a nanokernel
> in addition to preempt-RT's good statistical soft-RT guarantees.
I think it is nearly there. A few things need to be revisited, and a lot
of code paths you would have liked to use aren't RT (ioctl for instance
still hits the BKL :-( ).
>
> I think where we violently disagree is that in your earlier posts you
> seemed to imply that a nanokernel hard-RT solution obviates the need for
> something like preempt-RT. That is not the case at all, and at the
> moment they are quite orthogonal. In the future they may not be
> orthogonal, because *if* preempt-RT patch becomes guaranteed hard-RT, it
> would pretty much relegate nanokernels to only those applications
> requiring formal verification.
And I state that for those applications the nanokernel isn't good enough,
either. Otherwise I completely agree with you.
The nanokernel does have one thing PREEMPT_RT doesn't: very short
latencies. To cap the interrupt latencies in Linux even further, some
tricks will have to be played - and these would hurt performance. These
tricks are more or less what a nanokernel does: running the whole of
Linux, including the scheduler and all regions protected with raw
spinlocks, with interrupts fully enabled.
Esben
>
> - Jim Bruce
>
On Sat, 28 May 2005, Zwane Mwaikambo wrote:
> On Fri, 27 May 2005, Bill Huey wrote:
>
> > > It isn't clear to me yet. I'm sure you can make your interrupt
> > > latencies look good, as with your scheduling latencies. But when
> >
> > My project was getting a solid spike at 4 usec for irq-thread
> > startups and Ingo's stuff is better. It's already there.
>
> Is that worst case?
So is that some sort of observable worst case value with a suitable
stress test load? You didn't answer this in your reply. I'll be setting up
my own test system soon to have a better look.
Thanks,
Zwane
On Tue, May 31, 2005 at 06:48:50AM -0400, James Bruce wrote:
> orthogonal, because *if* preempt-RT patch becomes guaranteed hard-RT, it
I don't see how preempt-RT can ever become hard-RT when a simple lock
hangs it. As soon as you call kernel code, you'll eventually hang:
kmalloc will have to allocate memory and page out other stuff no matter
what.
I really hope embedded developers know better and they don't get the
idea of using preempt-RT where hard-RT is required.
On Tue, 31 May 2005, Andrea Arcangeli wrote:
> On Tue, May 31, 2005 at 06:48:50AM -0400, James Bruce wrote:
> > orthogonal, because *if* preempt-RT patch becomes guaranteed hard-RT, it
>
> I don't see how preempt-RT can ever become hard-RT when a simple lock
> hangs it.
There is no "simple lock" as spinlock (or very very few). All locks are
mutexes - with priority inheritance! Ofcourse, hitting a lock which can be
held for a non-deterministic amount of time destroyes your RT - but so it
does in any RTOS.
The whole point of PREEMPT_RT is that what _other_, lower priority threads
are doing isn't going to affect you. They are _not_ disabling preemption
or locking you away. Ofcourse, as soon as you start to share resources
with other threads you have to be carefull. But priority inheritance
even makes that deterministic - provided that all code used under the lock
is deterministic. Same as for any RTOS.
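The semantics I mean are the ones POSIX already specifies for mutexes.
A minimal sketch, using the standard protocol attribute (whether a given
libc/kernel combination actually implements PTHREAD_PRIO_INHERIT is of
course a separate question):

#include <pthread.h>

pthread_mutex_t lock;

int init_pi_lock(void)
{
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        /* a low-priority holder gets boosted to the waiter's priority */
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        return pthread_mutex_init(&lock, &attr);
}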
> As soon as you call kernel code, you'll eventually hang,
> kmalloc will have to allocate memory and pageout other stuff no matter
> what.
Please tell me why you think mlockall() doesn't protect my RT thread
against that problem. In the test code I have made and run I have no
problems in practice, but I have not verified it by going through all the
mm-code. You know that code a whole lot better than I.
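For reference, the preamble in such test code is just the standard RT
recipe, roughly the following sketch (nothing more exotic than mlockall
plus an RT scheduling class):

#include <sched.h>
#include <sys/mman.h>

int become_rt(void)
{
        struct sched_param sp = { .sched_priority = 50 };

        /* pin all current and future pages: no page faults later */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
                return -1;
        /* run ahead of every non-RT task in the system */
        return sched_setscheduler(0, SCHED_FIFO, &sp);
}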
>
> I really hope embedded developers knows better and they don't get the
> idea of using preempt-RT where hard-RT is required.
I hope people will stop making such broad statements and realize that
Linux can become a hard-RT OS (if not by "proof", at least by
measurement). There is no conflict between a timesharing system scaling
to a lot of CPUs and a hard-RT system just because they are categorized as
different in the textbooks.
Esben
On Tue, May 31, 2005 at 05:07:45PM +0200, Esben Nielsen wrote:
> There is no "simple lock" as spinlock (or very very few). All locks are
> mutexes - with priority inheritance! Ofcourse, hitting a lock which can be
You mean all locks "will be" mutexes? I'd rather prefer a mechanism to
handle priority inheritance in the spinlocks that disables preemption
temporarily during the deterministic (see below) critical section. Going
to sleep every time there's contention can cause overscheduling and is
expensive if the critical section is small (especially when it's
deterministic).
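The idea is to spin briefly on the assumption that the owner leaves the
critical section quickly, and fall back to sleeping only when it
doesn't. A userspace sketch of the pattern (illustration only, not a
proposed kernel implementation):

#include <pthread.h>

#define SPIN_TRIES 1000

void adaptive_lock(pthread_mutex_t *m)
{
        int i;

        /* short critical sections: owner likely exits soon, so spin */
        for (i = 0; i < SPIN_TRIES; i++)
                if (pthread_mutex_trylock(m) == 0)
                        return;

        /* long wait: stop burning the CPU and block instead */
        pthread_mutex_lock(m);
}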
> The whole point of PREEMPT_RT is that what _other_, lower priority threads
> are doing isn't going to affect you. They are _not_ disabling preemption
> or locking you away. Ofcourse, as soon as you start to share resources
> with other threads you have to be carefull. But priority inheritance
> even makes that deterministic - provided that all code used under the lock
> is deterministic. Same as for any RTOS.
"all code used under the lock is deterministic.", that means all linux
source in all critical sections has to be deterministic to provided
hard-RT. Of course if all critical sections were deterministic, even
spinlocks that disable preemption and disable local irq, would be
acceptable.
> Please tell me why you think mlockall() doesn't protect my RT thread
> against that problem. In the test code I have made and run I have no
mlockall only works with userspace memory; it doesn't affect kernel
memory allocated with kmalloc. kmalloc is called during select and
most other syscalls. mlockall won't prevent a silly parallel task from
doing mmap(MAP_SHARED) and forcing kmalloc to page out stuff in order to
allocate memory.
One could change that to provide a real-time kmalloc/slab allocator,
but it's quite enormous changes we're talking about here, not just a
locking change (assuming you solve the above part of the deterministic
critical sections, which is actually harder to provide than the VM side).
At best I think you could hope to execute a subset of syscalls with a
hard-RT behaviour with a subset of drivers and architectures, but whole
OS hard-RT sounds not very realistic to me with all sorts of drivers
involved. Anybody with less than a 10-year release cycle probably
shouldn't depend on hard-RT provided by preempt-RT with all possible
syscalls and drivers involved.
> I hope people will stop making such broad statements and reallize that
> Linux can become a hard-RT OS (if not by "proof", at least by
> meassurement). There is no conflict between a timesharing system scaling
> to a lot of CPUs and a hard-RT system just because they are catogarized as
> different in the text-books.
In theory I agree; in practice I think you're overestimating what it
means to make all critical sections deterministic (making the VM system
real time might be easier by using some huge reservation of RAM, i.e.
absolutely non-generic kernel behaviour, and closer to a hard-RT OS than
a timesharing system, but doable).
For the determinism, you could do what Ingo did so far, that is, to
"measure", but there's no way a "measurement" can provide a hard-RT
guarantee. The "measure" way is great for the lowlatency patches, and to
try to eliminate the bad-latency paths, but it _can't_ guarantee a
"worst-case latency".
If you're developing a medical system or an airplane, you can't risk
killing people just because your measurement didn't account for a
certain workload.
Providing a math proof of "determinism" of the critical sections of a
system as large as Linux is not feasible IMHO. If anything, you'd have
to create a software system that will provide the math proof.
I wouldn't trust humans for such math work anyway, even if you could
afford to hire enough people. An automated system would be more
trustworthy, and that way you could hope to verify different Linux kernel
versions in a reasonable amount of time, instead of just one.
So for hard-RT IMHO the only way is to never invoke syscalls and to
always run in irq context without sharing anything, with irqs going at
max prio, using a nanokernel or the patented way of redefining cli that
people were using for years before the patent was filed. It's harder to
code that way, but that's the price you pay to be guaranteed that
you won't block for an unknown amount of time, and I don't see another
way around it.
It scares me if people will use preempt-RT for hard-RT requirements. OK,
if a cellphone crashes it's no big deal, but for really critical stuff
you can't play the measurement-russian-roulette.
On Tue, 2005-05-31 at 14:09 +0200, Esben Nielsen wrote:
> On Tue, 31 May 2005, James Bruce wrote:
> > It is only better in that if you need provable hard-RT *right now*, then
> > you have to use a nanokernel.
>
> What do you mean by "provable"? Security critical? Forget about
> nanokernels then. The WHOLE system has to be validated. If you want a
> system good enough to put (a lot of) money on: test, test, test.
Interesting. I used to work for Martin Marietta in the early 90s testing
modules for aircraft engine controls. The code was about ten years old,
and that is because it had gone through ten years of testing. The
operating system was custom-made since at the time there was no
commercially available RTOS (that I knew of) that could stand up to the
scrutiny of the Military Specs.
Later, while working at Lockheed, we had WindRiver over and they would
only give us a small, stripped-down (basically all features removed) OS
that Lockheed would be responsible for testing.
When someone mentions Hard-RT, this is what I think about. These are
the RTOS that control the airplanes that people fly in. If something
were to go wrong, people will die.
I no longer deal with that type of RT; now I work with
applications that run on aircraft, but that would not have the plane
crash if something were to go wrong. The system still has to be
softer-RT to give the required response, usually navigational. This is
someplace that Linux with -RT or a nanokernel can go. The -RT patch may
be nicer, since some applications are first written generically and then
later need to become -RT for some reason or another. With the nano
approach this may take more effort. (At the moment I'm working to
get -RT with some extra features for other things, but this is what I've
heard from others.)
> I can't see that it would be easier to prove that a nano-kernel with the
> various needed mutex and queuing mechanisms works correctly than to prove
> that the Linux scheduler with mutex and queueing mechanisms works
> correctly. Both systems do the same thing and are most likely based on
> the same principles!
Since the nano-kernel would be much smaller than the kernel, you don't
need to worry as much about a bad design causing a problem. I
don't know how easy it would be to separate all the paths that an RT
task uses, and make sure that there's not a lock an RT task takes
that is also taken someplace else where a non-RT task can hold it for a
long time (even with PI). It's just that the kernel is so big that it's
hard to find everything. This isn't impossible, but very difficult to
check out.
-- Steve
Bill, I think you're chasing everyone off this thread ;-)
On Mon, 2005-05-30 at 19:15 -0400, Karim Yaghmour wrote:
> Bill Huey (hui) wrote:
> > Sorry, the RT patch really doesn't affect general kernel development
> > dramatically. It's just exploiting SMP work already in place to get data
> > safety and the like. It does, however, kill all bogus points in the kernel
> > that spin-wait for something to happen, which is a positive thing for the
> > kernel in general since it indicates sloppy code. If anything it makes the
> > kernel code cleaner.
>
> But wasn't the same said about the existing preemption code? Yet, most
> distros ship with it disabled and some developers still feel that there
> are no added benefits. What's the use if everyone is shipping kernels
> with the feature disabled? From a practical point of view, isn't it then
> obvious that such features cater to a minority? Wouldn't it therefore
> make sense to isolate such changes from the rest of the kernel as
> much as possible? From what I read in responses elsewhere, it does indeed
> seem that there are many who feel that the changes being suggested are
> far too intrusive without any benefit for most Linux users. But again,
> I'm just another noise-maker on this list. Reading the words of those
> who actually maintain this stuff is the best indication for me as to
> what the real-time-linux community can and cannot expect to get into
> the kernel.
Karim,
I would assume that the distros would ship without PREEMPT enabled
because it was (and probably still is) considered unstable. The distros
would prefer to have less responsive machines (not saying PREEMPT helps
the normal desktop user) than to risk a machine crash. That would be
much more noticeable to the user!
PREEMPT is already there, and if you were to add PREEMPT_RT then I
don't think many of the developers would notice. Now if someone found a
problem somewhere with PREEMPT_RT and complained to the maintainer, the
maintainer should (rightfully) tell them to complain to the RT
maintainers (Ingo and others; I'll help when I can). But the way Ingo's
patch works now is to not change the way the kernel looks to the
devices. The device driver is still written the same way, and when a
problem occurs where something doesn't work right with -RT, one of us
fixes it and then submits it to the maintainer of the code. So the only
extra work a maintainer would have is dealing with those maintaining RT.
-- Steve
On Mon, 2005-05-30 at 14:40 +0200, Andi Kleen wrote:
> On Mon, May 30, 2005 at 02:10:31PM +0200, Ingo Molnar wrote:
> >
> > * Andi Kleen <[email protected]> wrote:
> >
> > >
> > > Yeah, but you did a lot of (often unrelated to rt preempt) latency
> > > fixes in RT that are not yet merged into mainline. When they are all
> > > merged things might be very different. And then there can be probably
> > > more fixes.
> >
> > your argument above == cond_resched() in might_sleep() [ == VP ] is the
> > only way to get practical (e.g. jack) latencies.
>
> My argument was basically that we have no other choice than
> to fix it anyways, since the standard kernel has to be usable
> in this regard.
>
> (it is similar to how we e.g. don't do separate "server VM" and "desktop VM"s
> although it would sometimes be tempting. after all one wants a kernel
> that works well on a variety of workloads and doesn't need extensive
> hand tuning)
The cond_resched approach degenerates to basically "polling" to see
whether an RT task is ready to run.
This resembles the earliest RT systems, known as cyclic executives.
Folks moved away from that in the 1970s, because it was difficult
to maintain: each time you add a big new feature, you have
to re-tune the system to make sure you are polling often enough.
In the long term, who is going to go through 10,000 non-preemptible
sections and put the cond_resched()'s in exactly the right places?
Who is going to educate the driver folks that they need to do
cond_resched() every so often to meet specs?
Who is going to enforce that in 6 million lines of code?
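To see what that means in a driver, here is a made-up loop
(cond_resched() is the real kernel API; the device and the work are
hypothetical). Every such loop in every driver has to remember its
polling point by hand:
#include <linux/sched.h>        /* cond_resched() */
struct my_dev {                 /* hypothetical device */
        int nr_buffers;
};
static void handle_one_buffer(struct my_dev *dev, int i)
{
        /* stand-in for possibly long per-buffer work */
}
static void process_buffers(struct my_dev *dev)
{
        int i;
        for (i = 0; i < dev->nr_buffers; i++) {
                handle_one_buffer(dev, i);
                cond_resched(); /* the "polling point": forget it on
                                 * any long path and the RT latency
                                 * is unbounded again */
        }
}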
The Linux kernel is enjoying a very broad base of application
coverage, and the big server distros may not (yet) see the
need for preemption.
The big distros and every Linux server system will be the minority
of Linux deployment when Linux takes a solid foothold in mobile
applications. (cell phones, pda's, music players, etc.)
All these gadgets are battery powered.
They have to balance weight(battery), size(device), power(CPU).
This combination of design constraints AUTOMATICALLY imposes RT
constraints on the software, since the CPU parts must be chosen
for minimal power consumption, so that you can minimize everything
else.
That means high CPU loads, which implies priorities and RT constraints,
especially in the presence of external connectivity. No one is going
to use a device for very long which drops its connections because
of transient overloads.
The RT patch provides tunability, which allows you to CHOOSE what
level of preemption and locking you need, all the way to hard RT.
The folks who need hard RT would RATHER use Linux, but it doesn't
offer the performance at this time.
The RT work will allow that choice, without inconveniencing
the big distros, who can continue to run non-preemptable,
without impact.
When they do see the need for better preemption performance -
they would have stable technology to help them along the way.
Sven
On Tue, 31 May 2005, Steven Rostedt wrote:
> On Tue, 2005-05-31 at 14:09 +0200, Esben Nielsen wrote:
> > On Tue, 31 May 2005, James Bruce wrote:
>
> > > It is only better in that if you need provable hard-RT *right now*, then
> > > you have to use a nanokernel.
> >
> > What do you mean by "provable"? Security critical? Forget about
> > nanokernels then. The WHOLE system has to be validated. If you want a
> > system good enough to put (a lot of) money on it: Test, test, test.
>
> Interesting. I used to work for Martin Marietta in the early 90s testing
> modules for aircraft engine controls. The code was about ten years old,
> and that is because it had gone through ten years of testing. The
> operating system was custom made since at the time there were no
> commercially available (that I knew of) RTOSes that could stand up to the
> scrutiny of the Military Specs.
>
> Later, while working at Lockheed, we had WindRiver over and they would
> only give a small stripped-down (basically all features removed) OS that
> Lockheed would be responsible for testing.
Exactly the point: Once you use some of their add-ons it is not
certified. This corresponds to using Linux in _any_ way - nano or
preempt-RT.
>
> When someone mentions Hard-RT, this is what I think about. These are
> the RTOSes that control the airplanes that people fly in. If something
> were to go wrong, people would die.
Well, I consider "hard RT" to be something where deadlines are mission
critical. It doesn't need to involve human lives. How secure you need to be
depends on the consequences.
Where I work it boils down to the system behaving the same in the real
world as in test. It is not that we can prove theoretically that we can
schedule, but that we are confident that some new load we see when we go
out into the real world doesn't dramatically alter the timing of our
critical tasks. But even with an RTOS that has indeed happened for us,
because people forgot to use priority inheritance or
because a system call was blocking under special circumstances.
For security critical stuff I am under the impression that the most
important thing is to produce a lot of paperwork such that if something goes
wrong nobody can blame you....
>
> I no longer deal with that type of RT. I still work with
> applications that run on aircraft, but the plane would not crash if
> something were to go wrong. The system still has to be softer-RT to
> give the required response, usually navigational. This is someplace
> that a Linux with -RT or a nano-kernel can go. The -RT patch may be
> nicer since some applications are first written generically, and then
> later need to become -RT for some reason or another. With the nano
> approach this may take more effort. At the moment I'm working to
> get -RT in with some extra features for other things, but this is what I've
> heard from others.
>
> > I can't see that it would be easier to prove that a nano-kernel with the
> > various needed mutex and queuing mechanisms works correctly than to prove
> > that the Linux scheduler with mutex and queueing mechanisms works
> > correctly. Both systems do the same thing and are most likely based on
> > the same principles!
>
> Since the nano-kernel would be much smaller than the kernel, you don't
> need to worry as much about a bad design causing a problem. I
> don't know how easy it would be to separate all the paths that an RT
> task uses, and make sure that there's not a lock an RT task takes
> that is also taken someplace else where a non-RT task can hold it for a
> long time (even with PI). It's just that the kernel is so big that it's
> hard to find everything. This isn't impossible, but very difficult to
> check out.
>
My point was that you have to consider the whole thing, nano + Linux
kernels. Anyplace where an irq_disable()/preempt_disable() is used you are
in trouble wrt. having RT at all. I can't see nano dealing with it much
better than PREEMPT_RT does: both replace spinlocks with something which
doesn't use those primitives.
As for local RT behaviour, i.e. making sure your tasks don't go into
non-RT locks, I agree that a nano-kernel is easier as it is smaller. But as
soon as you start to make large applications embedding different
timescales, importing stacks from the outside, you start to get
into similar troubles.
For me the solution is relatively clear in both cases: a static code
checker marking calls safe/non-safe combined with expert knowledge about
the system.
> -- Steve
>
Esben
Andi Kleen wrote:
> Are you sure it is not only disk IO? In theory updatedb shouldn't
> need much CPU, but it eats a lot of memory and causes stalls
> in the disk (or at least that was my interpretation of the stalls I saw).
> If there is really a scheduling latency problem with updatedb
> then that definitely needs to be fixed in the stock kernel.
I don't know, Debian's updatedb always seemed to suck up most of the CPU
for me. I am using ReiserFS with tail-packing on, which certainly
balances on the side of more CPU vs IO. Also I wouldn't be surprised if
other distros had some better approach than Debian's, which appears to
be a series of "find | sort" commands. As one would expect, find causes
most of the system load and sort causes user load spikes.
That said, preempt-RT is certainly not free right now. Sending network
messages at 60Hz appears to load this 2GHz system by about 8%, while
that workload barely shows up in stock. I figure there's still some
optimization work to be done, but obviously it's unlikely to ever be as
efficient as non-preempt-RT. The more interesting question is whether
it's any slower with the RT patch applied, but preemption turned off.
Given the implementation approach, I don't think it will show any
difference from stock, but it's certainly something we've got to test a
fair amount to be sure.
- Jim Bruce
On Tue, May 31, 2005 at 12:18:03PM -0400, Steven Rostedt wrote:
> Later, while working at Lockheed, we had WindRiver over and they would
> only give a small stripped-down (basically all features removed) OS that
> Lockheed would be responsible for testing.
>
> When someone mentions Hard-RT, this is what I think about. These are
I think testing is the wrong word. The code should be demonstrated to be
correct, and to do so it must be stripped down and as simple as
possible. Then the more testing the better to verify it's all right, but
people shouldn't depend _only_ on huge amounts of testing. Probably linux is
too big anyway for those usages, but certainly one needs a guarantee of
hard-RT for those usages that preempt-RT sure can't provide (while a
nanokernel/RTAI could at least in theory provide it, assuming the rest of
linux itself has no bugs and no memory corruption/deadlocks leading to a
full system crash).
On Tue, 2005-05-31 at 19:11 +0200, Andrea Arcangeli wrote:
> On Tue, May 31, 2005 at 12:18:03PM -0400, Steven Rostedt wrote:
> > Later, while working at Lockheed, we had WindRiver over and they would
> > only give a small stripped-down (basically all features removed) OS that
> > Lockheed would be responsible for testing.
> >
> > When someone mentions Hard-RT, this is what I think about. These are
>
> I think testing is the wrong word. The code should be demonstrated to be
> correct, and to do so it must be stripped down and as simple as
> possible. Then the more testing the better to verify it's all right, but
> people shouldn't depend _only_ on huge amounts of testing. Probably linux is
> too big anyway for those usages, but certainly one needs a guarantee of
> hard-RT for those usages that preempt-RT sure can't provide (while a
> nanokernel/RTAI could at least in theory provide it, assuming the rest of
> linux itself has no bugs and no memory corruption/deadlocks leading to a
> full system crash).
How does one demonstrate that something works without a test? You may
call it a "demo", but in reality it is just another test. It's been
quite some time since I used to work on that, and I never read the
MilSpec myself, I was just told what to do by those that did read it.
But I would still call it testing. Every requirement must have a way to
prove that it was fulfilled, whether by "demo", inspection, or
measurement; I would call all of those tests.
One of the tests that was done was to inspect every module (or function)
for every code path it took. This grows exponentially with every branch.
Programs were written for each of these modules testing all paths by
sending in the input and seeing if the expected output was returned.
Binary branches had to be tested for all enumerations. "Greater Than",
"Less Than", "Equals" (and variants like ">=") were tested with one unit
less than, equal to, and one unit greater than the boundary. This was only
done this extensively at the module level, then there were other tedious
tests at the integration level, and system level. Could you imagine what it
would take to do this with Linux! Linux is much bigger than that code
that ran the engine of an aircraft, and that testing took ten years!
Not to mention that Linux is a moving target, and the engine control
code was designed for a single purpose and a single type of hardware.
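The boundary discipline is easy to picture with a toy example (mine,
obviously not the engine-control code): for a branch on x >= LIMIT, the
test vectors must include one unit below, at, and above the boundary.
#include <assert.h>
#define LIMIT 100
/* Unit under test: one binary branch. */
static int over_limit(int x)
{
        return x >= LIMIT;
}
int main(void)
{
        assert(over_limit(LIMIT - 1) == 0);     /* one unit below */
        assert(over_limit(LIMIT) == 1);         /* at the boundary */
        assert(over_limit(LIMIT + 1) == 1);     /* one unit above */
        return 0;
}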
Before I put my hand under that saw, I would want to test it several
times with a hotdog first!
-- Steve
On Tue, May 31, 2005 at 01:42:59PM -0400, Steven Rostedt wrote:
> How does one demonstrate that something works without a test? You may
> call it a "demo", but in reality it is just another test. It's been
> quite some time since I used to work on that, and I never read the
> MilSpec myself, I was just told what to do by those that did read it.
> But I would still call it testing. Every requirement must have a way to
> prove that it was fulfilled, whether by "demo", inspection, or
> measurement; I would call all of those tests.
With testing I meant running the OS on the bare hardware in the
final configuration and verifying that it works (possibly by measuring
the worst case latencies you get during the testing, like what Ingo does
to claim a worst case latency for preempt-RT).
> One of the tests that was done was to inspect every module (or function)
> for every code path it took. This grows exponentially with every branch.
Yes, that's what I meant.
> the integration level, and system level. Could you imagine what it
> would take to do this with Linux! Linux is much bigger than that code
> that ran the engine of an aircraft, and that testing took ten years!
Indeed, that's why I believe hard-RT with preempt-RT is just a joke.
> Not to mention that Linux is a moving target, and the engine control
> code was designed for a single purpose and a single type of hardware.
Exactly.
> Before I put my hand under that saw, I would want to test it several
> times with a hotdog first!
;)
On Tue, 2005-05-31 at 19:51 +0200, Andrea Arcangeli wrote:
> On Tue, May 31, 2005 at 01:42:59PM -0400, Steven Rostedt wrote:
> > the integration level, and system level. Could you imagine what it
> > would take to do this with Linux! Linux is much bigger than that code
> > that ran the engine of an aircraft, and that testing took ten years!
>
> Indeed, that's why I believe hard-RT with preempt-RT is just a joke.
I think the main problem with this thread is the definition of what
people call hard-RT. I came from the defense industry, and my version of
hard-RT is what I believe you think hard-RT is. But now I'm
starting to work with more commercial industries, and I'm finding their
terminology for hard-RT is different. This really boils down to the
terminology of hard and soft. Because what I think of as soft-RT is not
as good as what the preempt-RT patch does. You need more than that.
Probably, what I was talking about is diamond hard, and Ingo's RT patch
is metal hard. PREEMPT is just wood hard and !PREEMPT is plastic hard*.
Leaving MS Windows as feather hard ;-)
The levels of RT really come down to what can be guaranteed and what can
be proved (or clearly demonstrated). What controls an aircraft is obviously
going to have much more scrutiny than what is controlling your cell phone.
I believe that what the -RT patch is giving us is something that extends
what the Linux kernel can guarantee, but not everything. Which I
think is a good thing (and keeps me employed :-)
I don't think that hard-RT in Linux would ever be used for life or death
critical devices, like CAT-scan machines or aircraft. But I do see it
more for telecommunications and, as others said, music. Before I left
Lockheed, they were looking into using a version of an RT Linux
for applications running on the plane (not controlling it). The
requirements called for a soft-RT+ OS, but those requirements were much
more stringent than what some so-called hard-RTOSes could produce.
-- Steve
* OK, maybe still not as hard as what is mentioned, but I couldn't think
of better terminology. I do stand by what I called diamond and what I
called feather. ;-)
+ I know I contradicted myself by saying soft-RT is very weak and then
that the requirements for soft-RT were very hard. But I never agreed with
Lockheed's use of the term soft-RT. But I guess it was stressed that
the OS didn't need to be tested the same way, and as mentioned, the lack of
terminology for this is the source of most problems, as is demonstrated
by this thread!
On Tue, May 31, 2005 at 06:11:57PM +0200, Andrea Arcangeli wrote:
Quite the party going on with this thread!!!
> At best I think you could hope to execute a subset of syscalls with a
> hard-RT behaviour with a subset of drivers and architectures, but whole
> OS hard-RT sounds not very realistic to me with all sort of drivers
> involved. Anybody with less than a 10 year release cycle probably
> shouldn't depend on a hard-RT provided by preempt-RT with all possible
> syscalls and drivers involved.
This is key. Although there may well be some realtime application
that requires -all- of Linux's syscalls and drivers, it seems that a
large number of them are happy with a small subset. Some are even OK
with user-mode execution as the only realtime service. One example
of this would be an application that maps the device registers into
user space and that pins the memory that it needs. In these cases,
the "rest of the kernel" need not provide services deterministically.
Instead, it need only be able to give up the CPU deterministically.
As I understand it, this last is the point of CONFIG_PREEMPT_RT.
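Such an application can be surprisingly small. A rough sketch (the
device node and register layout are invented; mlockall(),
sched_setscheduler() and mmap() are the real interfaces):
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(void)
{
        struct sched_param sp = { .sched_priority = 80 };
        volatile unsigned int *regs;
        int fd;
        /* Pin all current and future pages: no page faults later. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0)
                exit(1);
        /* Become a fixed-priority realtime task. */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
                exit(1);
        /* Map the device registers; "/dev/mydev" and the 4 KB size
         * stand in for whatever the real driver exports. */
        fd = open("/dev/mydev", O_RDWR);
        regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED)
                exit(1);
        /* Hot loop: no syscalls; the kernel only has to give up
         * the CPU deterministically. */
        for (;;)
                regs[0] = regs[1];      /* placeholder device poke */
}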
I agree that making each and every component of Linux provide realtime
services (as opposed to just staying out of the way of realtime tasks)
would take quite a bit of time and effort. For example, keeping (say)
TCP/IP from interfering with realtime user-mode execution is not all
that difficult, but getting realtime response from a TCP/IP connection
across the Internet is another matter.
> > I hope people will stop making such broad statements and realize that
> > Linux can become a hard-RT OS (if not by "proof", at least by
> > measurement). There is no conflict between a timesharing system scaling
> > to a lot of CPUs and a hard-RT system just because they are categorized as
> > different in the text-books.
>
> In theory I agree, in practice I think you're overestimating what it
> means to make all critical sections deterministic (making the VM system
> real time might be easier by using some huge reservation of ram, i.e.
> absolutely non-generic kernel behaviour, and closer to a hard-RT OS than
> a timesharing system, but doable).
>
> For the determinism, you could do what Ingo did so far, that is to
> "measure", but there's no way a "measurement" can provide a hard-RT
> guarantee. The "measure" way is great for the lowlatency patches, and to
> try to eliminate the bad-latency paths, but it _can't_ guarantee a
> "worst-case-latency".
There are (at least!) two competing definitions for "hard real time":
1. Absolute guarantee of meeting the deadlines.
2. Any failure to meet the deadline results in failure of the
application.
Definition #1 is attractive, especially for applications where human life
is at stake. However, for less critical applications, the problem with
#1 is that there are other sources of failure. For example, if all of
the CPUs die, you are not going to meet your deadline no matter how the
software is coded. Not even hard-coded assembly language on bare metal
can save you in this situation.
Definition #2 is in some sense more practical, but one must also supply
a required probability of success. Otherwise, one can make -any- app
be "hard realtime" as follows:
my_deadline = time(0) + TIME_ALLOWED_FOR_SOMETHING;
/* do something that has a deadline */
if (missed_deadline(my_deadline)) {
        abort();
}
I suspect that Linux can meet definition #1 for only a very restricted
set of services (e.g., user-mode execution for an app that has pinned
memory). I expect that Linux would be able to meet definition #2 for
a larger set of services. But this can be done incrementally, adding
deterministic implementations of services as needed.
> If you're developing a medical system or an airplane, you can't risk
> killing people just because your measurement didn't account for a
> certain workload.
>
> Providing a math proof of "determinism" of the critical sections of a
> system as large as linux is not feasible IMHO. If anything, you'd have
> to create a software system that will provide the math proof.
> I wouldn't trust humans with such math work anyway, even if you could
> afford to hire enough people. An automated system would be more
> trustworthy, and that way you could hope to verify different linux kernel
> versions in a reasonable amount of time, instead of just one.
Agreed, I certainly would not trust a hand-made proof of determinism!
Even an automated proof has the possibility of bugs in the proof software.
But it would be necessary to specify which parts of the kernel needed
to meet realtime scheduling guarantees. If the application does not
use a VGA driver, then it is only necessary to show that the VGA driver
does not interfere with realtime processes -- one does not have to
make the VGA driver itself deterministic. Instead, one only has to
make sure that the VGA driver lets go of the CPU deterministically.
> So for hard-RT IMHO the only way is to never invoke syscalls and to
> always run in irq context without sharing anything, with irqs going at max
> prio using a nanokernel or the patented way of redefining cli, which
> people were doing for years before the patent was filed. It's harder to
> code that way, but that's the price you pay to be guaranteed that
> you won't block for an unknown amount of time, and I don't see any other
> way around it.
>
> It scares me if people will use preempt-RT for hard-RT requirements. OK,
> if a cellphone crashes it's no big deal, but for really critical stuff
> you can't play the measurement-russian-roulette.
For the really critical stuff, some projects have assigned three different
teams to implement in three different languages and runtime environments,
and then coupled the resulting systems into a triple-module-redundancy
configuration. Horribly expensive, but worth it in some cases.
Thanx, Paul
On Mon, May 30, 2005 at 07:32:10PM -0400, James Bruce wrote:
> Nick Piggin wrote:
> >Sorry James, we were talking about hard realtime. Read the thread.
>
> hard realtime = mathematically provable maximum latency
>
> Yes, you'll want a nanokernel for that, you're right. That's because
> one has to analyze every line of code, and protect against introduced
> regressions, which is almost impossible given the pace at which Linux-proper
> is developed. Then there's the other 95% of applications, for which a
> "statistical RT" approach such as that used in the RT patch suffices. So
> arguing for a nanokernel for (provable) hard realtime is orthogonal to
> the discussion of this patch, and we apparently don't actually disagree.
In the real world, this isn't really possible. Ideally, you'd like to be able
to offer some proof of correctness for the software, but this isn't actually
going to get you provable maximum latency, because you can't prove the
hardware.
Even with perfect software, the hardware is subject to cosmic rays, bad design,
etc. Even if you strongly control the hardware for latency, eg. turn off
cache and try to make sure everything is measurable, in the end the real proof
that your device does what it says it does is measurement. If the RTOS
guarantees aren't violated during testing, or at best, in a time period
comparable to the failure rate of the hardware, that's "good enough."
Given that hardware is always subject to failure or flakiness, the more
practical distinction between "hard" and "soft" realtime is whether the failure
rate is measurable or is lost in the noise of other failure modes such as
hardware. "Soft" RT typically means that the failure rate is measurable but
may be sufficient for particular tasks, and in comparison "hard" means the
software is thought to be correct within your ability to measure.
Certainly there's a lot of value for some applications in trying to control the
software well enough that all the latencies can be understood and characterized
by inspection, but on any sort of consumer commodity hardware system this is
really not going to buy you much. There are so many potential latencies just
due to wacky hardware that even a "perfect" RTOS is going to be subject to all
sorts of weird latencies and bizarre issues eg. with interrupt routing and CPU
thermal control and the like.
Showing that the application works as intended is really just going to be a
matter of showing that on a particular system, the latency requirements are met
under load. Which is exactly a sort of statistical approach. For almost all
"PC" applications that need realtime, this is exactly what's desired.
And clearly, the ultimate test of any RT system is exactly a "statistical" test
- can it be measured to fail, and if so, why and how often?
For limited embedded applications, a "hard" nanokernel approach can certainly
lead to higher confidence that the device works as intended, but for anything
outside of embedded products it's really not very practical. Nobody's going to
run their desktop OS under a nanokernel just to make their DVD software work
right.
-J
On Tue, 2005-05-31 at 12:29 -0400, Steven Rostedt wrote:
> Bill, I think you're chasing everyone off this thread ;-)
>
That's fine with me; this thread is asinine and a complete WOB.
Absolutely nothing new has been said (well, it's apparently new to people
who didn't pay any attention to PREEMPT_RT development so far) and we're
just going to repeat the whole stupid process when Ingo *actually
submits -RT for acceptance*.
Lee
On Tue, 2005-05-31 at 12:29 -0400, Steven Rostedt wrote:
> I would assume that the distros would ship without PREEMPT enabled
> because it was (and probably still is) considered unstable.
I suspect this is no longer the case, as the -RT development process has
fixed many, many of these bugs.
What would the point of shipping with PREEMPT enabled have been anyway,
when you could still get 20-30ms bumps? You'd still need huge buffers
for audio to work at all. Now that PREEMPT in mainline actually works
reasonably (1-2ms by some accounts, also due to side effects of
PREEMPT_RT development) there might be a reason to enable it.
Lee
On Tue, May 31, 2005 at 12:29:35PM -0400, Steven Rostedt wrote:
> I would assume that the distros would ship without PREEMPT enabled
> because it was (and probably still is) considered unstable.
In addition to that it is slow too due to much increased locking
overhead.
-Andi
On Tue, 2005-05-31 at 22:01 +0200, Andi Kleen wrote:
> On Tue, May 31, 2005 at 12:29:35PM -0400, Steven Rostedt wrote:
> > I would assume that the distros would ship without PREEMPT enabled
> > because it was (and probably still is) considered unstable.
>
> In addition to that it is slow too due to much increased locking
> overhead.
Doesn't this imply that distros will need to ship different kernels for
their desktop and server oriented products anyway?
Lee
how much of a slowdown is it?
distros already throw >20% performance improvements on the floor to
simplify their lives by reducing the number of different binary kernels
they have to support.
somehow I don't think a 5% or so hit (which is the locking overhead of
running an SMP kernel on UP last I heard) would be the end of the world
for them, especially if it made multimedia eye candy work smoother.
David Lang
On Tue, 31 May 2005, Andi Kleen wrote:
> On Tue, May 31, 2005 at 12:29:35PM -0400, Steven Rostedt wrote:
>> I would assume that the distros would ship without PREEMPT enabled
>> because it was (and probably still is) considered unstable.
>
> In addition to that it is slow too due to much increased locking
> overhead.
>
> -Andi
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
On Tue, 2005-05-31 at 15:52 -0400, Lee Revell wrote:
> On Tue, 2005-05-31 at 12:29 -0400, Steven Rostedt wrote:
> > Bill, I think you're chasing everyone off this thread ;-)
> >
>
> That's fine with me; this thread is asinine and a complete WOB.
> Absolutely nothing new has been said (well, it's apparently new to people
> who didn't pay any attention to PREEMPT_RT development so far) and we're
> just going to repeat the whole stupid process when Ingo *actually
> submits -RT for acceptance*.
Actually I believe this thread was worth something, if only to get a
feel for what is going to happen when Ingo and the rest of us try to get
it accepted. It at least opened my eyes to what people think of the
patch, and what some people's fears are. I'm archiving the whole damn
thread so that I can reference it when the time comes to merge the RT
patch in. (I may be able to contradict someone with their own words ;-}
With that... I think this thread is officially dead! (I hope)
-- Steve
On Tue, May 31, 2005 at 11:36:27AM -0700, Paul E. McKenney wrote:
> This is key. Although there may well be some realtime application
> that requires -all- of Linux's syscalls and drivers, it seems that a
> large number of them are happy with a small subset. Some are even OK
> with user-mode execution as the only realtime service. One example
> of this would be an application that maps the device registers into
> user space and that pins the memory that it needs. In these cases,
Yes, this is exactly what RTAI does AFAIK: it runs userland in real
time from irqs, and in turn it can't provide syscalls, to avoid risking
blocking anywhere.
If you don't need to run syscalls, you don't need CONFIG_PREEMPT_RT AFAIK,
but only a nanokernel or the cli trick, and then you're much safer and
things are much simpler.
Audio is an entirely different beast; audio doesn't need hard-RT (if they
miss a deadline they'll have to record again, no big deal) and they just
need a better lowlatency patch to hide the badness in the usb irq and
ide irq (instead of fixing the source of those long irqs by simply
offloading them to softirqs and by adding a sysctl or some other tweak to
always run softirqs within the softirqd context instead of running
softirqs from irq context like today [to improve performance]).
> the "rest of the kernel" need not provide services deterministically.
> Instead, it need only be able to give up the CPU deterministically.
> As I understand it, this last is the point of CONFIG_PREEMPT_RT.
For preempt-RT to be equivalent to RTAI, the "rest of the kernel" will
require all places that disable preemption temporarily to be demonstrated
to be deterministic. It's not like RTAI, which depends on the _hardware_
to raise a high prio irq.
RTAI == hardware guarantee (kernel code need not be deterministic at
all, the embedded developer takes almost no risks)
preempt-RT == software guarantee (kernel code under preempt_disable
definitely has to be deterministic, the embedded developer must be able to
evaluate all kernel code paths in drivers that disable preemption and irqs)
Plus with RTAI we don't depend on the scheduler to do the right thing etc.;
that stuff can break when somebody tweaks the scheduler for some SMP
scalability bit or something like that (just watch Linus and
Ingo currently going after a scheduler bug that hangs the system; that would
crash a system with preempt-RT, but RTAI would keep going without noticing
since it gets irqs even when irqs are locally disabled), while it sounds
harder to break the nanokernel thing that depends on hardware features
and unmaskable irqs.
The point where preempt-RT enters the hard-RT equation is only if you need
syscall execution in realtime (like audio, but audio doesn't need
hard-RT, so preempt-RT can only do good from an audio standpoint; it
makes perfect sense that jack is used as an argument for preempt-RT). If
you need syscalls with hard-RT, the whole thing gets an order of
magnitude more complicated and software becomes involved anyway, so
then one can just hope that preempt-RT will get everything right and
that somebody will demonstrate it.
> I agree that making each and every component of Linux provide realtime
> services (as opposed to just staying out of the way of realtime tasks)
> would take quite a bit of time and effort. For example, keeping (say)
> TCP/IP from interfering with realtime user-mode execution is not all
> that difficult, but getting realtime response from a TCP/IP connection
> across Internet is another matter.
Definitely agreed ;)
> There are (at least!) two competing definitions for "hard real time":
>
> 1. Absolute guarantee of meeting the deadlines.
That's the one I meant with hard-RT.
> 2. Any failure to meet the deadline results in failure of the
> application.
Well, this is doable with any linux kernel out there by just having the
application check gettimeofday before the deadline should have expired, or
by measuring the cycles in the scheduler. #2 doesn't require any RT
feature at all from the kernel. Most apps in class #2 will work just
fine if their deadline is on the order of a hundred msec.
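In code, that self-policing looks something like this sketch (the work
and the budget are invented for illustration):
#include <stdlib.h>
#include <sys/time.h>
#define DEADLINE_US 100000L     /* 100 ms budget, for illustration */
static void do_work(void)
{
        /* stand-in for the work that has the deadline */
}
int main(void)
{
        struct timeval start, now;
        long elapsed_us;
        gettimeofday(&start, NULL);
        do_work();
        gettimeofday(&now, NULL);
        elapsed_us = (now.tv_sec - start.tv_sec) * 1000000L +
                     (now.tv_usec - start.tv_usec);
        if (elapsed_us > DEADLINE_US)
                abort();        /* a missed deadline is app failure */
        return 0;
}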
> #1 is that there are other sources of failure. For example, if all of
> the CPUs die, you are not going to meet your deadline no matter how the
> software is coded. Not even hard-coded assembly language on bare metal
> can save you in this situation.
Yes, but that's a fault tolerance problem; it's possible to verify the
results of the cpu executions in software. I plan to add that in
future versions of CPUShare too (not on a cycle basis, but over time, by
comparing the address space using the dirty bit of the pte and by
comparing the output generated through the network, though the dirty bit
tracking will require more kernel changes), with cpus spread across the
globe in different countries, since the seccomp environment is fully
deterministic.
As long as the software or hardware hits no bugs, class #1 is guaranteed
to keep going without missing the deadline. Hardware failure can always
happen anyway.
> Definition #2 is in some sense more practical, but one must also supply
> a required probability of success. Otherwise, one can make -any- app
> be "hard realtime" as follows:
>
> my_deadline = time(0) + TIME_ALLOWED_FOR_SOMETHING;
> /* do something that has a deadline */
> if (missed_deadline(my_deadline)) {
>         abort();
> }
>
> I suspect that Linux can meet definition #1 for only a very restricted
> set of services (e.g., user-mode execution for an app that has pinned
> memory). I expect that Linux would be able to meet definition #2 for
> a larger set of services. But this can be done incrementally, adding
> deterministic implementations of services as needed.
Class #2 can be implemented in userland too; it doesn't need to be the
scheduler that invokes exit, the app can do it too IMHO.
> Agreed, I certainly would not trust a hand-made proof of determinism!
eheh ;)
> Even an automated proof has the possibility of bugs in the proof software.
> But it would be necessary to specify which parts of the kernel needed
> to meet realtime scheduling guarantees. If the application does not
> use a VGA driver, then it is only necessary to show that the VGA driver
> does not interfere with realtime processes -- one does not have to
> make the VGA driver itself deterministic. Instead, one only has to
> make sure that the VGA driver lets go of the CPU deterministically.
Agreed.
What scares me a bit is that the RT developers will have to do the
auditing themselves for every driver they use in their embedded apps that
hasn't been verified yet. It's quite different from a hardware
guarantee that won't change depending on the underlying kernel code.
The nice thing about using linux with a class #1 app (one that requires
hard-RT to never miss a deadline but that won't cause a disaster if it
fails, and in turn doesn't require a demonstration that all of the linux
kernel is perfect) is that all of the cpu can be used for rendering and
for the GUI (with a full blown QT configuration on an LCD display and a
full network stack with firewall too) without risking the hard-RT part
skipping a beat. All sorts of robots fit this area, for example: a robot
could crash and damage itself or damage other goods if it misses a
deadline, but you want to configure and monitor it through the network
and you want a firewall and a nice GUI on it etc..
Even better, even when the kernel crashes, often ping keeps working; the
scheduler is completely dead, but the hard-RT part would still run
without skipping a beat.
> For the really critical stuff, some projects have assigned three different
> teams to implement in three different languages and runtime environments,
> and then coupled the resulting systems into a triple-module-redundancy
> configuration. Horribly expensive, but worth it in some cases.
I agree it's worth it, and at least reinventing the wheel is sometimes
useful ;)
On Tue, May 31, 2005 at 02:29:52PM -0400, Steven Rostedt wrote:
> Probably, what I was talking about is diamond hard, and Ingo's RT patch
> is metal hard. PREEMPT is just wood hard and !PREEMPT is plastic hard*.
> Leaving MS Windows as feather hard ;-)
Yes, this is a nice way to expose it ;)
> believe that what the -RT patch is giving us, is something that can give
> the Linux kernel more that it can guarantee, but not everything. Which I
> think is a good thing (and keeps me employed :-)
;)
One thing we should be careful about: if syscalls aren't needed in the app
and all the MMIO space can be mmapped by the device driver and the app can
run fully in userland and be invoked from irqs, then going with the
"diamond hard" solution is no more complicated than going with the weaker
solutions. The "diamond hard" solution will work in userland too, and it
won't be substantially different from a soft-RT "metal hard" approach like
preempt-RT. So it'd be very bad if people chose preempt-RT when
they could equally easily go with RTAI or other "diamond hard" solutions
that are an order of magnitude simpler and safer.
> more for telecommunications and, as others said, music. Before I left
Those are certainly areas where the linux kernel has to be involved, and in
turn the "diamond hard" approach isn't feasible anyway without huge efforts.
So for them I definitely agree preempt-RT scheduling irqs in userland is
ok.
> * OK, maybe still not as hard as what is mentioned, but I couldn't think
> of better terminology. I do stand by what I called diamond and what I
> called feather. ;-)
;)
> the OS didn't need to be tested the same way, and as mentioned, the lack of
> terminology for this is the source of most problems, as is demonstrated
> by this thread!
As you said, what I've always meant by hard-RT is the "diamond hard"
thing. After all, in linux everything that was called hard-RT (RTAI, RTLinux,
nanokernel) was "diamond hard" so far; preempt-RT is the first time in
linux where I see the term "hard-RT" combined with something not
"diamond hard".
thanks.
On Tue, 2005-05-31 at 22:54 +0200, Andrea Arcangeli wrote:
> One thing we should be careful about: if syscalls aren't needed in the app
> and all the MMIO space can be mmapped by the device driver and the app can
> run fully in userland and be invoked from irqs, then going with the
> "diamond hard" solution is no more complicated than going with the weaker
> solutions. The "diamond hard" solution will work in userland too, and it
> won't be substantially different from a soft-RT "metal hard" approach like
> preempt-RT. So it'd be very bad if people chose preempt-RT when
> they could equally easily go with RTAI or other "diamond hard" solutions
> that are an order of magnitude simpler and safer.
>
The question is, is it really simpler? Programming for the -RT patch
works with just Linux as well. You just miss your deadlines, but
the application will still run, or at least it is easy to test. I haven't
used the RTAI approach so I'm not familiar with the difficulties of
using it. But one would still need to make the effort of incorporating
it. If the -RT patch is merged, then all that would be needed is a
CONFIG option set.
> As you said, what I've always meant by hard-RT is the "diamond hard"
> thing. After all, in linux everything that was called hard-RT (RTAI, RTLinux,
> nanokernel) was "diamond hard" so far; preempt-RT is the first time in
> linux where I see the term "hard-RT" combined with something not
> "diamond hard".
I wouldn't call RTAI, RTLinux or a nano-kernel (embedded with Linux)
"Diamond" hard. Maybe "Ruby" hard, but not diamond. Remember, I used to
test code that was running airplane engines, and none of those mentioned
would qualify to run that. I wouldn't want to be in an airplane that
had one of those as the main OS unless someone really stripped them down
or did the real work to verify them.
How much guarantee can the RTAI projects give on latencies? And how
well does an application running on Linux (non-RT) communicate with an
application running as RT? You don't need to answer; I guess I could
read up on it when I get the time.
So, time may tell. Ingo's patch may one day get to Ruby level, but right
now I believe 90% of all RT applications are satisfied with the "Metal"
level.
-- Steve
On Tue, May 31, 2005 at 10:54:24PM +0200, Andrea Arcangeli wrote:
> On Tue, May 31, 2005 at 02:29:52PM -0400, Steven Rostedt wrote:
> > Probably, what I was talking about is diamond hard, and Ingo's RT patch
> > is metal hard. PREEMPT is just wood hard and !PREEMPT is plastic hard*.
> > Leaving MS Windows as feather hard ;-)
>
> Yes, this is a nice way to expose it ;)
Notating it in terms of Tofu firmness would have been more comforting. :)
bill
On Tue, May 31, 2005 at 03:52:20PM -0400, Lee Revell wrote:
> On Tue, 2005-05-31 at 12:29 -0400, Steven Rostedt wrote:
> > Bill, I think you're chasing everyone off this thread ;-)
Yeah, everybody picks on the Chinese kid that's into computers. :}
> That's fine with me; this thread is asinine and a complete WOB.
> Absolutely nothing new has been said (well, it's apparently new to people
> who didn't pay any attention to PREEMPT_RT development so far) and we're
> just going to repeat the whole stupid process when Ingo *actually
> submits -RT for acceptance*.
No joke.
A paper needs to be written outlining all of these issues. As folks
get into this domain from lkml, it's clear that they need to have a primer
to prevent stupid questions from being asked, which kill
communication and productivity in these discussions.
bill
On Tue, 2005-05-31 at 17:22 -0400, Steven Rostedt wrote:
> I wouldn't call RTAI, RTLinux or a nano-kernel (embedded with Linux)
> "Diamond" hard. Maybe "Ruby" hard, but not diamond. Remember, I used to
> test code that was running airplane engines, and none of those mentioned
> would qualify to run that.
I think trying to make these types of distinctions is a waste of time.
What matters is the MTBF of the software relative to the hardware on a
given system. It would be stupid to use a commercial RTOS for a cell
phone because they fall apart in a year anyway and users don't seem to
care. Ditto anything running on PC hardware. For an airplane the MTBF
obviously must be more in line with that hardware which hopefully is way
more reliable.
Only the engineer who designs the system knows for sure, so if the RT
app people say PREEMPT_RT is good enough for a *very* large set of the
applications that they currently need a commercial RTOS for, they should
be given the benefit of the doubt. To say otherwise is to assert that
you know their hardware (be it desktop PC, digital audio workstation, or
airplane) better than they do.
Lee
On Mon, May 30, 2005 at 04:00:50PM -0400, Karim Yaghmour wrote:
> Here's for the fun of history, a diff between the 8390.c in 2.2.16 and the
> one in rt-net 0.9.0:
This is really interesting code. It's really not unlike what preempt RT
is already doing with the atomic locking (replacement). From the looks
of it conversion of an ethernet driver to be RT capable is shockingly
trivial.
bill
On Tue, May 31, 2005 at 05:22:31PM -0400, Steven Rostedt wrote:
> it. If the -RT patch is merged, then all that would be needed is a
> CONFIG option set.
If RT is merged and RTAI isn't, that might be simpler to install, but I
wouldn't judge what's simpler to use based on mainline inclusion or
not. I don't work in the embedded-RT space, but if I had to build a
hard-RT embedded app for myself and I didn't need to run syscalls in
real time (i.e. no audio ioctls), I'd sure start with RTAI.
> I wouldn't call RTAI, RTLinux or a nano-kernel (embedded with Linux)
> "Diamond" hard. Maybe "Ruby" hard, but not diamond. Remember, I use to
> test code that was running airplane engines, and none of those mentioned
> would qualify to run that. I wouldn't want to be in an airplane that
> had one of those as the main OS unless someone really stripped them down
> or did the real work to verify them.
Sure, agreed. But the main reliability problem that makes it only "ruby"
isn't the nanokernel itself, but the hardware being too complex and linux
itself being way too complex and not provable, since it could hang and lock
the bus with a wrong dma operation on a graphics card or whatever else.
Perhaps those apps should run on an OS stripped down w/o MMU and w/o irqs
and on much slower and more reliable cpus and ram. I'm not really an
expert in this area.
From a linux point of view, currently you can't get a harder stone than
RTAI/RTLinux/nanokernel (that's probably why I was biased and I called
it "diamond" hard, even if it was only "ruby" in absolute terms ;).
> How much guarantee can the RTAI projects give on latencies? And how
That depends on the hardware I guess.
> So, time may tell. Ingo's patch may one day get to Ruby level, but right
> now I believe 90% of all RT applications are satisfied with the "Metal"
> level.
Possible. My only worry is that embedded people will go to the metal level
thinking it's better than the ruby level, when they would be better off and
simpler at the ruby level.
On Tue, May 31, 2005 at 05:47:47PM -0400, Lee Revell wrote:
> given system. It would be stupid to use a commercial RTOS for a cell
> phone because they fall apart in a year anyway and users don't seem to
One year is too short; a 3-year lifetime is more reasonable. Personally
the only cellphone I've broken so far was 3 years old. Now I've got a cool
cellphone with linux 2.4.20 on it; I'm curious to see how long it will
last ;).
On Tuesday 31 May 2005 17:15, Andrea Arcangeli wrote:
> If RT is merged and RTAI isn't, that might be simpler to install, but I
> wouldn't judge what's simpler to use based on mainline inclusion or
> not. I don't work in the embedded-RT space, but if I had to build a
> hard-RT embedded app for myself and I didn't need to run syscalls in
> real time (i.e. no audio ioctls), I'd sure start with RTAI.
If I can throw my two cents in here: I'm an embedded RT developer, and I agree
with Andrea.
RTAI is a very mature, completely open source real time system these days.
Regardless of the skill and manpower being leveraged on the RT patch, I gotta
say it looks like you're re-inventing the wheel by not using the work that's
already been done in RTAI.
Seems like there is a lot of RTAI speculation going on here.
Maybe their list should be CC'd on this thread?
NZG.
On Tue, May 31, 2005 at 05:33:02PM -0500, NZG wrote:
> RTAI is a very mature, completely open source real time system these days.
> Regardless of the skill and manpower being leveraged on the RT patch, I gotta
> say it looks like your re-inventing the wheel by not using the work that's
> already been done in RTAI.
Wrong, read the beginning of the thread downward.
> Seems like there is a lot of RTAI speculation going on here.
> Maybe their list should be CC'd on this thread?
You came onto this thread very late and Karim of RTAI has been involved
in this thread from the very beginning.
bill
On Tue, 2005-05-31 at 17:47 -0400, Lee Revell wrote:
> On Tue, 2005-05-31 at 17:22 -0400, Steven Rostedt wrote:
> > I wouldn't call RTAI, RTLinux or a nano-kernel (embedded with Linux)
> > "Diamond" hard. Maybe "Ruby" hard, but not diamond. Remember, I used to
> > test code that was running airplane engines, and none of those mentioned
> > would qualify to run that.
>
> I think trying to make these types of distinctions is a waste of time.
> What matters is the MTBF of the software relative to the hardware on a
> given system. It would be stupid to use a commercial RTOS for a cell
> phone because they fall apart in a year anyway and users don't seem to
> care. Ditto anything running on PC hardware. For an airplane the MTBF
> obviously must be more in line with that hardware which hopefully is way
> more reliable.
Agreed. I only brought up the stupid names to show that there's
no black-and-white aspect to what RT is. It's mainly a black art,
since there's no way to know how many bugs a program has, and how do you
truly calculate the MTBF other than by running it on the hardware itself?
> Only the engineer who designs the system knows for sure, so if the RT
> app people say PREEMPT_RT is good enough for a *very* large set of the
> applications that they currently need a commercial RTOS for, they should
> be given the benefit of the doubt. To say otherwise is to assert that
> you know their hardware (be it desktop PC, digital audio workstation, or
> airplane) better than they do.
True, but do they really know how good PREEMPT_RT is, compared to those
that develop it and the kernel? But I'm fighting to get PREEMPT_RT into
the kernel, since I really think it would be used by quite a lot of the
industry. Just not the normal desktop user.
-- Steve
On Tue, 2005-05-31 at 14:33 -0700, Bill Huey wrote:
> On Tue, May 31, 2005 at 10:54:24PM +0200, Andrea Arcangeli wrote:
> > On Tue, May 31, 2005 at 02:29:52PM -0400, Steven Rostedt wrote:
> > > Probably, what I was talking about is diamond hard, and Ingo's RT patch
> > > is metal hard. PREEMPT is just wood hard and !PREEMPT is plastic hard*.
> > > Leaving MS Windows as feather hard ;-)
> >
> > Yes, this is a nice way to expose it ;)
>
> Notating it in terms of Tofu firmness would have been more comforting. :)
Actually, since my wife is Italian, I should have used the hardness of
spaghetti as it cooks. That way I could call MS Windows an overcooked
noodle! ;-)
Jeeze, this thread is stuck in the "D" state. It just won't die!
-- Steve
>>>>> "Esben" == Esben Nielsen <[email protected]> writes:
Esben> On Tue, 31 May 2005, James Bruce wrote:
>>
>> It is only better in that if you need provable hard-RT *right now*,
>> then you have to use a nanokernel.
Esben> What do you mean by "provable"? Security critical? Forget about
Esben> nanokernels then. The WHOLE system has to be validated.
The whole point of a nanokernel is it's *small*. The whole thing can
be formally verified. And its semantics will provide isolation
between the real-time processes and anything else that's running.
We're currently working on a system called Iguana, which will have
provable WCET for real-time scheduled tasks, and a Linux environment
(called `wombat') that provides compatibility for
Esben> I can't see that it would be easier to prove that a nano-kernel
Esben> with the various needed mutex and queuing mechanisms works correctly
Esben> than it is to prove that the Linux scheduler with mutex and queueing
Esben> mechanisms works correctly.
Except that the nano-kernel is less than one percent of the size.
Esben> Both systems do the same thing
Esben> and are most likely based on the same principles! If a module
Esben> in Linux disables interrupts for a non-deterministic amount of
Esben> time, it destroys the RT in both scenarios. With the
Esben> nanokernel, the Linux kernel is patched not to disable
Esben> interrupts, but if someone didn't use the official
Esben> local_irq_disable() macro the patch didn't work anyway... The
Esben> only way you can be absolutely sure Linux doesn't hurt RT is to
Esben> run it in a full blown virtual machine where it doesn't have
Esben> access to disable interrupts and otherwise interfere with the
Esben> nano-kernel.
This is precisely the approach we (and others) are taking. A
virtualised Linux to provide interactive and soft realtime (think java
games on your mobile phone), and a nanokernel for your hard realtime
tasks (think the radio controller on your mobile phone).
See http://www.disy.cse.unsw.edu.au/Software/Iguana/ for our work.
In addition to our work, there's the Adeos system
(http://www.hyades-itea.org/), which uses a very similar approach.
--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
The technical we do immediately, the political takes *forever*
Bill Huey (hui) wrote:
> This is really interesting code. It's really not unlike what preempt RT
> is already doing with the atomic locking (replacement). From the looks
> of it conversion of an ethernet driver to be RT capable is shockingly
> trivial.
It is. In some cases you need to provide alternate functions, but in
most cases you don't ... However, note that this is for UDP. There is
no such thing as deterministic TCP.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Tue, 2005-05-31 at 19:41 -0400, Steven Rostedt wrote:
[snip]
> True, but do they really know how good PREEMPT_RT is, compared to those
> who develop it and the kernel? Still, I'm fighting to get PREEMPT_RT into
> the kernel, since I really think it would be used by quite a lot of people
> in the industry. Just not the normal desktop user.
I believe that if solid RT were available to the desktop user, it would
be used in many desktop applications. It would definitely need to be
easy to enable for non-root applications: capabilities, SCHED_ISO_*,
rlimits...
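Of those, rlimits look the simplest from the application side. A minimal
sketch, assuming the RLIMIT_RTPRIO limit that (if I'm reading the patches
right) is being merged around 2.6.12: the admin raises the limit for a
user, and the app can then request RT priority without being root.

#include <sys/resource.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
        struct rlimit rl;
        struct sched_param sp = { .sched_priority = 1 };

        if (getrlimit(RLIMIT_RTPRIO, &rl) == 0)
                printf("may request RT priority up to %lu\n",
                       (unsigned long)rl.rlim_cur);

        /* as non-root this succeeds only if rlim_cur >= 1 */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
                perror("sched_setscheduler");

        return 0;
}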
Every media playback and recording application would like it. Games
would love it, especially multi-threaded games.
[Stop reading here if you don't like games...]
Right now, a game is one big solid event loop, and it prioritizes by
doing the important things first in the loop, then maybe checking the
time and dropping something less important if it is taking too long,
like skipping a render frame in order to keep the sound and physics
running at a smooth rate.
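Something like this, to make the shape of the loop concrete (every
helper function here is made up):

#include <time.h>

#define FRAME_NS (1000000000L / 60)     /* 60Hz frame budget */

extern void read_input(void);
extern void run_physics(void);
extern void mix_audio(void);
extern void render_frame(void);

static long ns_since(const struct timespec *t0)
{
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - t0->tv_sec) * 1000000000L
                + (now.tv_nsec - t0->tv_nsec);
}

void game_loop(void)
{
        struct timespec t0;

        for (;;) {
                clock_gettime(CLOCK_MONOTONIC, &t0);
                read_input();           /* important things first */
                run_physics();
                mix_audio();
                if (ns_since(&t0) < FRAME_NS)
                        render_frame(); /* skipped when over budget */
        }
}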
But with multi-threaded games, the programmer loses a lot of scheduling
control. He might like to run a 60Hz render thread, AI thread(s), sound
thread and physics thread, but cannot, because he can't guarantee they
will stay in sync. Everything stops moving if the physics thread gets
interrupted by a low-prio Folding@Home process for 200ms. Good RT would
fix that, and multi-threaded games would be great for multi-CPU,
multi-core, hyper-threaded systems.
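With solid RT scheduling, that control comes back for a few lines of
code per thread; a sketch (the thread body is hypothetical, and today
this still needs root or CAP_SYS_NICE):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

extern void *physics_thread(void *arg);   /* hypothetical 60Hz loop */

int start_physics_rt(pthread_t *tid)
{
        struct sched_param sp = { .sched_priority = 50 };
        int err;

        err = pthread_create(tid, NULL, physics_thread, NULL);
        if (err)
                return err;

        /* put the physics thread in a real-time class so a low-prio
         * batch job can no longer preempt it */
        err = pthread_setschedparam(*tid, SCHED_FIFO, &sp);
        if (err)
                fprintf(stderr, "pthread_setschedparam: %s\n",
                        strerror(err));
        return err;
}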
We should not have to shut down all our background applications to run a
video game (or other high-interactive CPU hogs). It should run smooth
as glass even with distributed computation or Gentoo system compiles
running in the background.
If you don't believe <10 milliseconds matter to a gamer, you haven't
hung around with obsessive Quake3 players who tweak their mouse rates
over 200Hz, disable frame sync and turn off texture maps and lighting,
all to get that microscopic edge over the opponent. Tell them about
being able to tweak the mouse and network interrupt priorities and set X
and Quake3 to RT priority. They'll be all over it.
--
Zan Lynx <[email protected]>
My understanding is that the RT patch is (a) optional, and (b)
significantly reduces the latency to get to user-mode RT applications,
to the order of a few tens of microseconds, _almost_ guaranteed
- this solves lots of problems, either for free or for very little
(commercial) overhead cost; I will come to other costs below.
Thus, unless anyone argues the code is wrong, damaging to stability
(after it is debugged in the normal way) or increases scheduling
overhead even when configured out, it is a Good Thing (TM).
Arguments about u-kernels, mono-kernels, APIs ... notwithstanding;
near HARD REAL TIME in a commodity OS is VERY VALUABLE.
Now let's turn to REALLY HARD REAL TIME and LIFE- and MISSION-critical
systems. Depending on your critical time, i.e. the maximum period from
input to reaction:
- it may not be possible today, e.g. at 1 femtosecond; you must do
a system re-design, or maybe you can't build the system at all
- you may be forced to compute the response in dedicated hardware,
measuring the gate-delay-sum to response
- you may be able to do it with one or more dedicated COTS CPUs, RAM ...
by assembly coding the critical paths and computing worst-case
timings
- or you may be able to use an RT commodity OS, e.g. Linux, and process
more than one thread on each processor
Each of these alternatives is progressively cheaper, in terms of
equipment and dedicated development time, and enables less skilled
developers to contribute to both initial development and on-going
maintenance.
Other considerations, e.g. no single point of failure, may well come
into play, e.g. in aircraft flight controls, particularly if the
airframe is inherently unstable.
So to summarize:
(a) Critical RT may need to be specially hardware engineered, always
(b) Good OS RT latency is a good thing and will significantly promote
Linux
(c) I haven't heard any argument that the RT patch causes intolerable
scheduler overhead, but neither has the increase been quantified.
I think it is very useful.
--
with kind regards, Brian.
[ removed a lot of interesting stuff ... ]
Andrea Arcangeli wrote:
> The point where preempt-RT enters the hard-RT equation, is only if you need
> syscall execution in realtime (like audio, but audio doesn't need
> hard-RT, so preempt-RT can only do good from an audio standpoint, it
> makes perfect sense that jack is used as argument for preempt-RT). If
> you need syscalls with hard-RT, the whole thing gets an order of
> magnitude more complicated and software becomes involved anyways, so
> then one can just hope that preempt-RT will get everything right and
> that somebody will demonstrate it.
Please have a look at RTAI-fusion. It provides deterministic
replacements for rt-able syscalls _transparently_ to STANDARD
Linux applications. For example, an unmodified Linux application
can get a deterministic nanosleep() via RTAI-fusion. The way
this works is that rtai-fusion catches the syscalls before
they reach Linux. So even the syscall thing isn't really a
limitation for RTAI anymore.
Philippe would be in a better position to elaborate, but that's
the essentials of it.
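To show what "unmodified" means, here is the sort of code in question;
plain POSIX against plain glibc, a sketch rather than an actual RTAI
sample. With RTAI-fusion underneath, the nanosleep() below is what gets
caught and serviced deterministically:

#include <time.h>
#include <stdio.h>

int main(void)
{
        struct timespec period = { 0, 1000000 };   /* 1ms */
        int i;

        for (i = 0; i < 1000; i++) {
                /* periodic control work would go here */
                if (nanosleep(&period, NULL) != 0)
                        perror("nanosleep");
        }
        return 0;
}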
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Tue, 2005-05-31 at 21:14 -0400, Karim Yaghmour wrote:
> Please have a look at RTAI-fusion. It provides deterministic
> replacements for rt-able syscalls _transparently_ to STANDARD
> Linux applications. For example, an unmodified Linux application
> can get a deterministic nanosleep() via RTAI-fusion. The way
> this works is that rtai-fusion catches the syscalls before
> they reach Linux. So even the syscall thing isn't really a
> limitation for RTAI anymore.
This looks very interesting. I need to read up more on RTAI and friends
when I get a chance. I just received the latest Linux Journal, which has
an article about the use of RTLinux to control magnetic bearings. I've
just started reading it, so I don't know all the details yet, but it
still looks very promising.
I don't think adding the preempt-RT patch to the kernel will hurt
anything. In fact, I think it may even help out RTAI and friends.
Anyway, as has been stated, this discussion has started too early.
(Thanks Daniel ;-)
-- Steve
This is the 279th message in this thread (not counting Lee's recent
"human timing" offshoot).
Steven Rostedt wrote:
> This looks very interesting. I need to read up more on RTAI and friends
> when I get a chance. I just received the latest Linux Journal, which has
> an article about the use of RTLinux to control magnetic bearings. I've
> just started reading it, so I don't know all the details yet, but it
> still looks very promising.
I'll abstain from reviving any of the RTLinux vs. RTAI zombies; none of
those who attended are in any way eager to take those out of the closet,
but I would suggest some archive browsing. It is, nevertheless, safe to
say that there are a lot of differences between these projects, and that
both sides have a very different idea of what these differences are and
what they mean. Great care should be taken not to disturb the dead :)
> I don't think adding the preempt-RT patch to the kernel will hurt
> anything. In fact, I think it may even help out RTAI and friends.
> Anyway, as has been stated, this discussion has started too early.
> (Thanks Daniel ;-)
I've removed the disclaimer in my latest replies, but I stated very
early in this thread that the approaches are not mutually exclusive.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
On Tue, May 31, 2005 at 01:14:40PM -0700, David Lang wrote:
> distros already throw >20% performance improvements on the floor to
hmm, where does that >20% number come from?
On Wed, 1 Jun 2005, Andrea Arcangeli wrote:
> On Tue, May 31, 2005 at 01:14:40PM -0700, David Lang wrote:
>> distros already throw >20% performance improvements on the floor to
>
> hmm, where does that >20% number come from?
>
Missed options for optimizing for the specific CPU you have. It's not as
bad as it used to be when everything was compiled for a 386, but it's
still a significant benefit to recompile with no changes in options
except for the CPU type.
I haven't measured it recently, but in the 2.4.17/2.6.0 timeframe I was
seeing >30% for optimized kernels vs stock ones, so I don't think the 20%
figure is unreasonable.
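For reference, the option I mean is just the "Processor family" choice
in the kernel config; illustrative only, the exact symbols vary from
tree to tree:

# stock distro kernel, lowest-common-denominator CPU:
CONFIG_M586=y

# rebuilt for the actual machine, say an Athlon:
# CONFIG_M586 is not set
CONFIG_MK7=y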
David Lang
--
There are two ways of constructing a software design. One way is to make
it so simple that there are obviously no deficiencies. And the other way
is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
On Tue, 31 May 2005, Steven Rostedt wrote:
> On Tue, 2005-05-31 at 17:47 -0400, Lee Revell wrote:
> > On Tue, 2005-05-31 at 17:22 -0400, Steven Rostedt wrote:
> > > I wouldn't call RTAI, RTLinux or a nano-kernel (embedded with Linux)
> > > "Diamond" hard. Maybe "Ruby" hard, but not diamond. Remember, I use to
> > > test code that was running airplane engines, and none of those mentioned
> > > would qualify to run that.
> >
> > I think trying to make these types of distinctions is a waste of time.
> > What matters is the MTBF of the software relative to the hardware on a
> > given system. It would be stupid to use a commercial RTOS for a cell
> > phone because they fall apart in a year anyway and users don't seem to
> > care. Ditto anything running on PC hardware. For an airplane the MTBF
> > obviously must be more in line with that hardware which hopefully is way
> > more reliable.
>
> Agreed. I only brought up the stupid names to show that there's not a
> black-and-white aspect to what RT is. It's mainly a black art, since
> there's no way to know how many bugs a program has, and how do you
> truly calculate the MTBF other than by running it on the hardware
> itself?
This discussion has digressed even further beyond hard/soft realtime
to reliability and fault tolerance (airplane engine), which is not
the same thing.
On Tue, May 31, 2005 at 09:14:59PM -0400, Karim Yaghmour wrote:
>
> [ removed a lot of interesting stuff ... ]
>
> Andrea Arcangeli wrote:
> > The point where preempt-RT enters the hard-RT equation, is only if you need
> > syscall execution in realtime (like audio, but audio doesn't need
> > hard-RT, so preempt-RT can only do good from an audio standpoint, it
> > makes perfect sense that jack is used as argument for preempt-RT). If
> > you need syscalls with hard-RT, the whole thing gets an order of
> > magnitude more complicated and software becomes involved anyways, so
> > then one can just hope that preempt-RT will get everything right and
> > that somebody will demonstrate it.
>
> Please have a look at RTAI-fusion. It provides deterministic
> replacements for rt-able syscalls _transparently_ to STANDARD
> Linux applications. For example, an unmodified Linux application
> can get a deterministic nanosleep() via RTAI-fusion. The way
> this works is that rtai-fusion catches the syscalls before
> they reach Linux. So even the syscall thing isn't really a
> limitation for RTAI anymore.
I -completely- misinterpreted
http://www.fdn.fr/~brouchou/rtai/rtai-doc-prj/rtai-fusion.html
on first reading some months ago. It looks much more interesting
on second reading. It does not have the degree of isolation that
the pure double-kernel approaches do, since, as the paper states,
Linux can "hide" tasks that are waking up from I/O events.
However, it does appear to provide a unified user-level environment.
I will add it to my list of approaches to realtime in Linux!
Thanx, Paul
> Philippe would be in a better position to elaborate, but that's
> the essentials of it.
>
> Karim
> --
> Author, Speaker, Developer, Consultant
> Pushing Embedded and Real-Time Linux Systems Beyond the Limits
> http://www.opersys.com || [email protected] || 1-866-677-4546
>