2004-10-23 19:52:47

by Paul E. McKenney

Subject: [RFC][PATCH] Restricted hard realtime

Attaining hard-realtime response in an existing general-purpose operating
system has been seen as a "big-bang" conversion. The problem is that
the entire OS kernel must be modified to ensure that all code paths
are deterministic. It would be much better if there were an evolutionary
path to hard realtime.

Since multicore dies and multithreaded CPUs are quite common and
becoming quite inexpensive, why not simply designate a particular
CPU as the hard realtime CPU? Interrupts can be directed to the
other CPUs, and cpus_allowed can be used to force processes requiring
hard-realtime support to run on the designated CPU. Various other
tricks can be used to lock the user process's pages into memory.

This much has been done many times in many operating systems, and,
if the hard-realtime processes run strictly in user mode, this is
(almost) all that is required.

But even if the hard-realtime processes refrain from invoking system
calls, there are any number of traps and exceptions that can occur.
Besides, it is often convenient to use system calls; for example,
a batch industrial process may require realtime response while the
batch is processing, but not between batches. It is convenient to
allow the realtime processes to do things like log statistics to
mass storage or the network between batches, but without interfering
with any other realtime processes that might (for example) need
realtime response while preparing for the next batch.

One way to handle this is to offload the system calls and exceptions
from the designated realtime CPU to the other CPUs. The attached
(raw, untested, probably does not compile) patch shows one way of
doing this. Then any overly long kernel code paths execute on
the non-realtime CPUs, where they do not interfere with realtime
latency on the designated realtime CPU.

This patch does have a number of shortcomings (surprise, surprise!):

o It is untested, probably doesn't even compile.

o It has not been merged with Ingo's real-time preemption
patch. Perhaps Ingo sees this as a good thing. ;-)

o The designated realtime CPU is hardcoded, as is the CPU
to which the system calls are offloaded. This happens
to line up with the purported PPC ability to direct all
interrupts to CPU 0. This would obviously need to be
fixed in a production version. Preferably in a way that
matches where the PAGG/cpusets/CKRM guys end up, but they
seem to still be hashing things out.

o There is no API to set the new TIF_SYSCALL_RTOFFLOAD
bit to designate the task as a hard-realtime task. There
are any number of ways this might be done, including
a /proc interface, a new syscall, an ioctl, or who knows
what else. Suggestions welcome, especially those accompanied
by a corresponding patch.

o It currently handles only syscalls, not traps or exceptions.

o It does not yet allow for real-time-safe system calls.
This capability is quite important, as it would allow Linux
to evolve towards hard realtime to the extent desired, one
system call at a time, for example, as Ingo's real-time
preemption made a particular system call deterministic.

o Going further out over the edge, one could have system calls
that are deterministic in some cases (e.g., writes to ramfs
vs. writes to disk) and could offload themselves only as required.

o There are some portions of the scheduler that acquire other
CPUs' runqueue locks, which is not something that the designated
realtime CPU ought to be doing. Similarly, the other CPUs
should not be acquiring the designated realtime CPU's runqueue
lock. For example, sched_migrate_task()'s job becomes more
interesting. There are no doubt other similar issues.

o Real-time application developers would no doubt want some sort
of per-task flag that prohibited offloading, so that executing
a (non-deterministic) system call would result in an error
rather than in offloading.

So, the idea is to provide an evolutionary path towards hard realtime
in Linux. Capabilities within Linux can be given hard-realtime
response separately and as needed. And there are likely a number of
capabilities that will never require hard realtime response, for example,
given current technological trends, a 1MB synchronous write to disk is
going to take some time, and will be subject to the usual retry and
error conditions. This approach allows such operations to keep their
simpler non-realtime code.

[Stepping aside, and donning the asbestos suit with tungsten pinstripes...]

Thoughts?

Thanx, Paul

PS. This can also be applied to single-CPU systems, though it is not
clear to me that it is worthwhile. Think in terms of a realtime
version of Xen that simulates two CPUs while running on a single
CPU. But enough heresy for one email...

diff -urpN -X ../dontdiff linux-2.5-2004.09.27/arch/ppc64/Kconfig linux-2.5-2004.09.27-rt/arch/ppc64/Kconfig
--- linux-2.5-2004.09.27/arch/ppc64/Kconfig Mon Sep 27 12:11:43 2004
+++ linux-2.5-2004.09.27-rt/arch/ppc64/Kconfig Sat Oct 2 14:05:08 2004
@@ -170,6 +170,18 @@ config IRQ_ALL_CPUS
multiple CPUs. Saying N here will route all IRQs to the first
CPU.

+config HARD_REALTIME
+ bool "Reserve a CPU for hard realtime processes"
+ depends on SMP && PPC_MULTIPLATFORM
+ default n
+ help
+ This option allows a CPU to be reserved for hard-realtime
+ processes. Any process running on the hard-realtime CPU
+ that executes a system call will be migrated away to a
+ non-realtime CPU for the duration of the system call.
+ For best results, interrupts should also be directed
+ away from the hard-realtime CPU.
+
config NR_CPUS
int "Maximum number of CPUs (2-128)"
range 2 128
diff -urpN -X ../dontdiff linux-2.5-2004.09.27/arch/ppc64/kernel/ptrace.c linux-2.5-2004.09.27-rt/arch/ppc64/kernel/ptrace.c
--- linux-2.5-2004.09.27/arch/ppc64/kernel/ptrace.c Tue Sep 21 16:07:37 2004
+++ linux-2.5-2004.09.27-rt/arch/ppc64/kernel/ptrace.c Fri Oct 15 08:10:47 2004
@@ -303,6 +303,12 @@ static void do_syscall_trace(void)

void do_syscall_trace_enter(struct pt_regs *regs)
{
+
+ /* The offload check must precede any non-realtime-safe code. */
+
+ if (test_thread_flag(TIF_SYSCALL_RTOFFLOAD))
+ do_syscall_rtoffload();
+
if (unlikely(current->audit_context))
audit_syscall_entry(current, regs->gpr[0],
regs->gpr[3], regs->gpr[4],
@@ -321,4 +327,9 @@ void do_syscall_trace_leave(void)
if (test_thread_flag(TIF_SYSCALL_TRACE)
&& (current->ptrace & PT_PTRACED))
do_syscall_trace();
+
+ /* The offload check must follow any non-realtime-safe code. */
+
+ if (test_thread_flag(TIF_SYSCALL_RTOFFLOAD))
+ do_syscall_rtoffload_return();
}
diff -urpN -X ../dontdiff linux-2.5-2004.09.27/include/asm-ppc64/thread_info.h linux-2.5-2004.09.27-rt/include/asm-ppc64/thread_info.h
--- linux-2.5-2004.09.27/include/asm-ppc64/thread_info.h Tue Sep 21 16:09:46 2004
+++ linux-2.5-2004.09.27-rt/include/asm-ppc64/thread_info.h Thu Oct 14 15:44:31 2004
@@ -97,6 +97,9 @@ static inline struct thread_info *curren
#define TIF_RUN_LIGHT 6 /* iSeries run light */
#define TIF_ABI_PENDING 7 /* 32/64 bit switch needed */
#define TIF_SYSCALL_AUDIT 8 /* syscall auditing active */
+#define TIF_SYSCALL_RTOFFLOAD 9 /* hard-realtime task for which
+ system calls must be offloaded
+ to other CPUs */

/* as above, but as bit values */
#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
@@ -108,7 +111,9 @@ static inline struct thread_info *curren
#define _TIF_RUN_LIGHT (1<<TIF_RUN_LIGHT)
#define _TIF_ABI_PENDING (1<<TIF_ABI_PENDING)
#define _TIF_SYSCALL_AUDIT (1<<TIF_SYSCALL_AUDIT)
-#define _TIF_SYSCALL_T_OR_A (_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT)
+#define _TIF_SYSCALL_RTOFFLOAD (1<<TIF_SYSCALL_RTOFFLOAD)
+#define _TIF_SYSCALL_T_OR_A (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
+ _TIF_SYSCALL_RTOFFLOAD)

#define _TIF_USER_WORK_MASK (_TIF_NOTIFY_RESUME | _TIF_SIGPENDING | \
_TIF_NEED_RESCHED)
diff -urpN -X ../dontdiff linux-2.5-2004.09.27/include/linux/smp.h linux-2.5-2004.09.27-rt/include/linux/smp.h
--- linux-2.5-2004.09.27/include/linux/smp.h Tue Sep 21 16:10:00 2004
+++ linux-2.5-2004.09.27-rt/include/linux/smp.h Tue Oct 19 07:24:58 2004
@@ -69,6 +69,43 @@ static inline int on_each_cpu(void (*fun
return ret;
}

+#ifdef CONFIG_HARD_REALTIME
+
+extern int realtime_cpu;
+extern int realtime_offload_cpu;
+
+/*
+ * Initialize the hard-realtime scheduling info. Initial version
+ * very crude, just a single realtime CPU (CPU 1) and the rest being
+ * non-realtime CPUs.
+ */
+void hard_realtime_init(void);
+
+/*
+ * Move to a non-realtime CPU to do non-deterministic work, such as
+ * some system calls. This is currently a crude hack, need to move
+ * to a deterministic migration procedure.
+ */
+static inline void do_syscall_rtoffload(void)
+{
+ sched_migrate_task(current, realtime_offload_cpu);
+}
+
+/*
+ * Move back to a realtime CPU to continue deterministic work, for example,
+ * after completing some system calls. This, again, is a crude hack,
+ * need to move to a deterministic migration procedure.
+ */
+static inline void do_syscall_rtoffload_return(void)
+{
+ sched_migrate_task(current, realtime_cpu);
+}
+#else /* #ifdef CONFIG_HARD_REALTIME */
+#define hard_realtime_init() do { } while (0)
+#define do_syscall_rtoffload() do { } while (0)
+#define do_syscall_rtoffload_return() do { } while (0)
+#endif /* #else #ifdef CONFIG_HARD_REALTIME */
+
/*
* True once the per process idle is forked
*/
diff -urpN -X ../dontdiff linux-2.5-2004.09.27/init/main.c linux-2.5-2004.09.27-rt/init/main.c
--- linux-2.5-2004.09.27/init/main.c Tue Sep 21 16:10:08 2004
+++ linux-2.5-2004.09.27-rt/init/main.c Tue Oct 19 07:04:23 2004
@@ -500,6 +500,7 @@ asmlinkage void __init start_kernel(void
* time - but meanwhile we still have a functioning scheduler.
*/
sched_init();
+ hard_realtime_init();
build_all_zonelists();
page_alloc_init();
printk("Kernel command line: %s\n", saved_command_line);
diff -urpN -X ../dontdiff linux-2.5-2004.09.27/kernel/sched.c linux-2.5-2004.09.27-rt/kernel/sched.c
--- linux-2.5-2004.09.27/kernel/sched.c Mon Sep 27 12:11:45 2004
+++ linux-2.5-2004.09.27-rt/kernel/sched.c Tue Oct 19 07:25:12 2004
@@ -4766,3 +4766,18 @@ void __might_sleep(char *file, int line)
}
EXPORT_SYMBOL(__might_sleep);
#endif
+
+#ifdef CONFIG_HARD_REALTIME
+
+int realtime_cpu;
+int realtime_offload_cpu;
+
+/*
+ * Initialize the hard-realtime offload. Currently very crude.
+ */
+void hard_realtime_init(void)
+{
+ realtime_cpu = 1;
+ realtime_offload_cpu = 0;
+}
+#endif /* #ifdef CONFIG_HARD_REALTIME */


2004-10-23 20:41:45

by Paul E. McKenney

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Sat, Oct 23, 2004 at 10:17:24PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <[email protected]> wrote:
>
> > + bool "Reserve a CPU for hard realtime processes"
>
> this has been implemented in a clean way already: check out the
> "isolcpus=" boot option & scheduler feature (implemented by Dimitri
> Sivanich) which isolates a set of CPUs via sched-domains for precisely
> such purposes. The way to enter such a domain is via the affinity
> syscall - and balancing will leave such domains isolated.

Thank you for the tip -- I will look this over!

Thanx, Paul

2004-10-23 20:17:11

by Ingo Molnar

Subject: Re: [RFC][PATCH] Restricted hard realtime


* Paul E. McKenney <[email protected]> wrote:

> + bool "Reserve a CPU for hard realtime processes"

this has been implemented in a clean way already: check out the
"isolcpus=" boot option & scheduler feature (implemented by Dimitri
Sivanich) which isolates a set of CPUs via sched-domains for precisely
such purposes. The way to enter such a domain is via the affinity
syscall - and balancing will leave such domains isolated.

ingo

2004-10-23 22:06:25

by Jon Masters

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Sat, 23 Oct 2004 14:24:21 -0700, Paul E. McKenney <[email protected]> wrote:

> On Sat, Oct 23, 2004 at 10:22:01PM +0200, Thomas Gleixner wrote:

> > On Sat, 2004-10-23 at 12:47 -0700, Paul E. McKenney wrote:

> > I haven't seen an embedded SMP system yet. Focussing this on SMP systems
> > is ignoring the majority of possible applications.

> Seeing SMP support for ARM led me to believe that this was not too far
> over the edge.

They have an SMP reference implementation; however, many folks don't
actually want to go the dual-core route right now for embedded
designs (apparently the increased design complexity isn't worth it).
I've had protracted discussions about this very issue quite recently.
Others will disagree - I'm only basing my statement upon
conversations with various engineers - but I think your idea eventually
becomes interesting; now just isn't the right moment to be pushing it.
People still don't want this.

Talk to smartphone manufacturers who currently have dual ARM core
designs, one running Linux and the other running an RTOS for the GSM
and phone stuff, and they'll say they actually want to reduce the
design complexity down to a single core. Talking to people suggests
that multicore designs are good in certain situations (such as in the
case above), but in general people aren't yet going to respond to your
way of doing realtime :-) Yes, you do have only one OS in there, and maybe
that would change opinions, but we're not quite at the point where
everything is multicore, so you're not going to convince the masses.

Having said all that, for a different perspective, I hack on ppc
(Xilinx Virtex II Pro) kernel and userspace stuff for some folks that
make high resolution imaging equipment, involving extremely precise
control over a pulsed signal and data acquisition (we're talking
nanosecond/microsecond precision). Since Linux obviously isn't capable
of this level of deterministic response right now we end up farming
out work to a separate core - it's unlikely your approach would
convince the hardware folks, but I guess it might be tempting at some
point in the future. Who knows.

Jon.

2004-10-23 21:29:47

by Paul E. McKenney

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Sat, Oct 23, 2004 at 10:22:01PM +0200, Thomas Gleixner wrote:
> On Sat, 2004-10-23 at 12:47 -0700, Paul E. McKenney wrote:
> > Attaining hard-realtime response in an existing general-purpose operating
> > system has been seen as a "big-bang" conversion. The problem is that
> > the entire OS kernel must be modified to ensure that all code paths
> > are deterministic. It would be much better if there were an evolutionary
> > path to hard realtime.
> >
> > Since multicore dies and multithreaded CPUs are quite common and
> > becoming quite inexpensive, why not simply designate a particular
> > CPU as the hard realtime CPU? Interrupts can be directed to the
> > other CPUs, and cpus_allowed can be used to force processes requiring
> > hard-realtime support to run on the designated CPU. Various other
> > tricks can be used to lock the user process's pages into memory.
> >
> > This much has been done many times in many operating systems, and,
> > if the hard-realtime processes run strictly in user mode, this is
> > (almost) all that is required.
> >
> > But even if the hard-realtime processes refrain from invoking system
> > calls, there are any number of traps and exceptions that can occur.
> > Besides, it is often convenient to use system calls; for example,
> > a batch industrial process may require realtime response while the
> > batch is processing, but not between batches. It is convenient to
> > allow the realtime processes to do things like log statistics to
> > mass storage or the network between batches, but without interfering
> > with any other realtime processes that might (for example) need
> > realtime response while preparing for the next batch.
> >
> > One way to handle this is to offload the system calls and exceptions
> > from the designated realtime CPU to the other CPUs. The attached
> > (raw, untested, probably does not compile) patch shows one way of
> > doing this. Then any overly long kernel code paths execute on
> > the non-realtime CPUs, where they do not interfere with realtime
> > latency on the designated realtime CPU.
> >
> > This patch does have a number of shortcomings (surprise, surprise!):
> >
> > o It is untested, probably doesn't even compile.
> >
> > o It has not been merged with Ingo's real-time preemption
> > patch. Perhaps Ingo sees this as a good thing. ;-)
> >
> > o The designated realtime CPU is hardcoded, as is the CPU
> > to which the system calls are offloaded. This happens
> > to line up with the purported PPC ability to direct all
> > interrupts to CPU 0. This would obviously need to be
> > fixed in a production version. Preferably in a way that
> > matches where the PAGG/cpusets/CKRM guys end up, but they
> > seem to still be hashing things out.
> >
> > o There is no API to set the new TIF_SYSCALL_RTOFFLOAD
> > bit to designate the task as a hard-realtime task. There
> > are any number of ways this might be done, including
> > a /proc interface, a new syscall, an ioctl, or who knows
> > what else. Suggestions welcome, especially those accompanied
> > by a corresponding patch.
> >
> > o It currently handles only syscalls, not traps or exceptions.
> >
> > o It does not yet allow for real-time-safe system calls.
> > This capability is quite important, as it would allow Linux
> > to evolve towards hard realtime to the extent desired, one
> > system call at a time, for example, as Ingo's real-time
> > preemption made a particular system call deterministic.
> >
> > o Going further out over the edge, one could have system calls
> > that are deterministic in some cases (e.g., writes to ramfs
> > vs. writes to disk) and could offload themselves only as required.
> >
> > o There are some portions of the scheduler that acquire other
> > CPUs' runqueue locks, which is not something that the designated
> > realtime CPU ought to be doing. Similarly, the other CPUs
> > should not be acquiring the designated realtime CPU's runqueue
> > lock. For example, sched_migrate_task()'s job becomes more
> > interesting. There are no doubt other similar issues.
> >
> > o Real-time application developers would no doubt want some sort
> > of per-task flag that prohibited offloading, so that executing
> > a (non-deterministic) system call would result in an error
> > rather than in offloading.
> >
> > So, the idea is to provide an evolutionary path towards hard realtime
> > in Linux. Capabilities within Linux can be given hard-realtime
> > response separately and as needed. And there are likely a number of
> > capabilities that will never require hard realtime response, for example,
> > given current technological trends, a 1MB synchronous write to disk is
> > going to take some time, and will be subject to the usual retry and
> > error conditions. This approach allows such operations to keep their
> > simpler non-realtime code.
> >
> > [Stepping aside, and donning the asbestos suit with tungsten pinstripes...]
> >
> > Thoughts?
>
> I haven't seen an embedded SMP system yet. Focussing this on SMP systems
> is ignoring the majority of possible applications.

Seeing SMP support for ARM led me to believe that this was not too far
over the edge.

> There are solutions around to make this work on UP by converting the cpu
> irq enable/disable flag to software.
>
> - the dual kernel approach of RTLINUX

which requires that the hard realtime stuff use a non-Linux RTOS.

> - the domain approach of RTAI/Adeos

which, as I understand it, essentially forwards interrupts to
different kernels multiplexed on the same image.

> - the in kernel approach of KURT/Libertos

which requires that the realtime code be in the kernel, which
is sometimes OK, but often not.

A fourth approach is to run something like the Xen VMM, and have it provide a
single OS with the illusion that there are two CPUs. As you say, the
OS cannot be allowed to really disable interrupts; instead, the underlying
VMM must track whether the OS thinks it has interrupts disabled on a
given "CPU", and refrain from delivering interrupts until the OS
is ready.

Of course, on a multithreaded CPU or SMP system, the VMM is not required.

> All have one thing in common. The restriction of using existing
> syscalls, concurrency controls and other facilities provided by the
> kernel. So you end up implementing a set of parallel rt-aware
> functionality or you must modify the in kernel facilities to make this
> work.
>
> Hard realtime is not only a question of irq response and a privileged
> userspace process. Sure there are applications which can be implemented
> self-contained, but then I really do not need an SMP system.

Agreed, one would really want all services to be hard realtime. I am
saying that it would be good to get there step by step rather than
split-brain or big-bang.

Thanx, Paul

2004-10-23 20:31:35

by Thomas Gleixner

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Sat, 2004-10-23 at 12:47 -0700, Paul E. McKenney wrote:
> Attaining hard-realtime response in an existing general-purpose operating
> system has been seen as a "big-bang" conversion. The problem is that
> the entire OS kernel must be modified to ensure that all code paths
> are deterministic. It would be much better if there were an evolutionary
> path to hard realtime.
>
> Since multicore dies and multithreaded CPUs are quite common and
> becoming quite inexpensive, why not simply designate a particular
> CPU as the hard realtime CPU? Interrupts can be directed to the
> other CPUs, and cpus_allowed can be used to force processes requiring
> hard-realtime support to run on the designated CPU. Various other
> tricks can be used to lock the user process's pages into memory.
>
> This much has been done many times in many operating systems, and,
> if the hard-realtime processes run strictly in user mode, this is
> (almost) all that is required.
>
> But even if the hard-realtime processes refrain from invoking system
> calls, there are any number of traps and exceptions that can occur.
> Besides, it is often convenient to use system calls; for example,
> a batch industrial process may require realtime response while the
> batch is processing, but not between batches. It is convenient to
> allow the realtime processes to do things like log statistics to
> mass storage or the network between batches, but without interfering
> with any other realtime processes that might (for example) need
> realtime response while preparing for the next batch.
>
> One way to handle this is to offload the system calls and exceptions
> from the designated realtime CPU to the other CPUs. The attached
> (raw, untested, probably does not compile) patch shows one way of
> doing this. Then any overly long kernel code paths execute on
> the non-realtime CPUs, where they do not interfere with realtime
> latency on the designated realtime CPU.
>
> This patch does have a number of shortcomings (surprise, surprise!):
>
> o It is untested, probably doesn't even compile.
>
> o It has not been merged with Ingo's real-time preemption
> patch. Perhaps Ingo sees this as a good thing. ;-)
>
> o The designated realtime CPU is hardcoded, as is the CPU
> to which the system calls are offloaded. This happens
> to line up with the purported PPC ability to direct all
> interrupts to CPU 0. This would obviously need to be
> fixed in a production version. Preferably in a way that
> matches where the PAGG/cpusets/CKRM guys end up, but they
> seem to still be hashing things out.
>
> o There is no API to set the new TIF_SYSCALL_RTOFFLOAD
> bit to designate the task as a hard-realtime task. There
> are any number of ways this might be done, including
> a /proc interface, a new syscall, an ioctl, or who knows
> what else. Suggestions welcome, especially those accompanied
> by a corresponding patch.
>
> o It currently handles only syscalls, not traps or exceptions.
>
> o It does not yet allow for real-time-safe system calls.
> This capability is quite important, as it would allow Linux
> to evolve towards hard realtime to the extent desired, one
> system call at a time, for example, as Ingo's real-time
> preemption made a particular system call deterministic.
>
> o Going further out over the edge, one could have system calls
> that are deterministic in some cases (e.g., writes to ramfs
> vs. writes to disk) and could offload themselves only as required.
>
> o There are some portions of the scheduler that acquire other
> CPUs' runqueue locks, which is not something that the designated
> realtime CPU ought to be doing. Similarly, the other CPUs
> should not be acquiring the designated realtime CPU's runqueue
> lock. For example, sched_migrate_task()'s job becomes more
> interesting. There are no doubt other similar issues.
>
> o Real-time application developers would no doubt want some sort
> of per-task flag that prohibited offloading, so that executing
> a (non-deterministic) system call would result in an error
> rather than in offloading.
>
> So, the idea is to provide an evolutionary path towards hard realtime
> in Linux. Capabilities within Linux can be given hard-realtime
> response separately and as needed. And there are likely a number of
> capabilities that will never require hard realtime response, for example,
> given current technological trends, a 1MB synchronous write to disk is
> going to take some time, and will be subject to the usual retry and
> error conditions. This approach allows such operations to keep their
> simpler non-realtime code.
>
> [Stepping aside, and donning the asbestos suit with tungsten pinstripes...]
>
> Thoughts?

I haven't seen an embedded SMP system yet. Focussing this on SMP systems
is ignoring the majority of possible applications.

There are solutions around to make this work on UP by converting the cpu
irq enable/disable flag to software.

- the dual kernel approach of RTLINUX
- the domain approach of RTAI/Adeos
- the in kernel approach of KURT/Libertos

All have one thing in common: restrictions on using the existing
syscalls, concurrency controls, and other facilities provided by the
kernel. So you either end up implementing a parallel set of rt-aware
facilities, or you must modify the in-kernel facilities to make this
work.

Hard realtime is not only a question of irq response and a privileged
userspace process. Sure there are applications which can be implemented
self-contained, but then I really do not need an SMP system.

tglx


2004-10-24 15:37:51

by Paul E. McKenney

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Sat, Oct 23, 2004 at 11:06:03PM +0100, Jon Masters wrote:
> On Sat, 23 Oct 2004 14:24:21 -0700, Paul E. McKenney <[email protected]> wrote:
>
> > On Sat, Oct 23, 2004 at 10:22:01PM +0200, Thomas Gleixner wrote:
>
> > > On Sat, 2004-10-23 at 12:47 -0700, Paul E. McKenney wrote:
>
> > > I haven't seen an embedded SMP system yet. Focussing this on SMP systems
> > > is ignoring the majority of possible applications.
>
> > Seeing SMP support for ARM led me to believe that this was not too far
> > over the edge.
>
> They have an SMP reference implementation, however many folks don't
> actually want to go the dual core approach right now for embedded
> designs (apparently the increased design complexity isn't worth it).
> I've had protracted discussions about this very issue quite recently
> indeed. Others will disagree, I'm only basing my statement upon
> conversations with various engineers - I think your idea eventually
> becomes interesting, but now is not the right moment to be pushing it
> yet. People still don't want this now.

Thank you for the background! It has been quite some time since I
did significant embedded work. Let's just say that I am glad that
"embedded CPU" no longer means "8-bit CPU"! ;-)

> Talk to smartphone manufacturers who currently have dual ARM core
> designs, one running Linux and the other running an RTOS for the GSM
> and phone stuff, and they'll say they actually want to reduce the
> design complexity down to a single core. Talking to people suggests
> that multicore designs are good in certain situations (such as in the
> case above), but in general people aren't yet going to respond to your
> way of doing realtime :-) Yes you do have only one OS in there, maybe
> that would change opinion, but we're not quite at the point where
> everything is multicore so you're not going to convince the masses.

Good points. Suppose there was a way to get the hard realtime benefits
using a slight elaboration of this approach that worked on single-core,
single-threaded CPUs? Would that be of interest?

> Having said all that, for a different perspective, I hack on ppc
> (Xilinx Virtex II Pro) kernel and userspace stuff for some folks that
> make high resolution imaging equipment, involving extremely precise
> control over a pulsed signal and data acquisition (we're talking
> nanosecond/microsecond precision). Since Linux obviously isn't capable
> of this level of deterministic response right now we end up farming
> out work to a separate core - it's unlikely your approach would
> convince the hardware folks, but I guess it might be tempting at some
> point in the future. Who knows.

Agreed, if you are going for the ultimate in response time, you have
no choice but to run hand-coded assembly language on bare metal (though
optimizing compilers are improving, so maybe it will soon be hand-coded
C on bare metal). If you are using your computer to digitally modulate
and synthesize a USA FM radio signal (around 100MHz carrier frequency),
you certainly are not going to have anything resembling an OS involved.
You will have nothing but a tight loop running flat out, assuming that
even today's general-purpose CPUs are fast enough to accomplish this.

So, I guess the question is whether 100-microsecond restricted hard
realtime support in Linux is worth the effort. From what you are saying,
it sounds like the answer is currently "no" if it requires multithreaded
CPUs or multicore dies.

So, again, if there was a way to make this approach work on
single-threaded single core CPUs, would that be of interest?

Thanx, Paul

2004-10-24 18:24:02

by Paul E. McKenney

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Sat, Oct 23, 2004 at 10:17:24PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <[email protected]> wrote:
>
> > + bool "Reserve a CPU for hard realtime processes"
>
> this has been implemented in a clean way already: check out the
> "isolcpus=" boot option & scheduler feature (implemented by Dimitri
> Sivanich) which isolates a set of CPUs via sched-domains for precisely
> such purposes. The way to enter such a domain is via the affinity
> syscall - and balancing will leave such domains isolated.

Thanks again for the pointer; I am slowly getting my head around this.
I haven't proven to myself that the isolcpus code gets rid of all of the
cross-runqueue lock acquisitions, but it certainly gets rid of a large
number of them. It doesn't seem to do system-call or exception-handler
offload, but it does help me see how to do this sort of thing cleanly.

Dimitri, one nit so far... Why is sched_domain_dummy under two
layers of #ifdef CONFIG_SMP? Any reason why the attached patch
would not be in order?

Thanx, Paul

diff -urpN -X ../dontdiff linux-2.5-2004.10.23/kernel/sched.c linux-2.5-2004.10.23-LBinf/kernel/sched.c
--- linux-2.5-2004.10.23/kernel/sched.c Sat Oct 23 13:23:31 2004
+++ linux-2.5-2004.10.23-LBinf/kernel/sched.c Sun Oct 24 10:50:12 2004
@@ -4437,14 +4437,12 @@ static void sched_domain_debug(void)
#define sched_domain_debug() {}
#endif

-#ifdef CONFIG_SMP
/*
* Initial dummy domain for early boot and for hotplug cpu. Being static,
* it is initialized to zero, so all balancing flags are cleared which is
* what we want.
*/
static struct sched_domain sched_domain_dummy;
-#endif

#ifdef CONFIG_HOTPLUG_CPU
/*

2004-10-24 21:09:26

by Jon Masters

Subject: Re: [RFC][PATCH] Restricted hard realtime


Paul E. McKenney wrote:

| Thank you for the background! It has been quite some time since I
| did significant embedded work. Let's just say that I am glad that
| "embedded CPU" no longer means "8-bit CPU"! ;-)

Oh it still does, but those cores tend to get farmed out for PSU
control or some specific subsystem unless you're "deeply embedded".

jcm>> Talk to smartphone manufacturers who currently have dual ARM
jcm>> core designs, one running Linux and the other running an RTOS
jcm>> for the GSM and phone stuff, and they'll say they actually
jcm>> want to reduce the design complexity down to a single core.
jcm>> Talking to people suggests that multicore designs are good
jcm>> in certain situations (such as in the case above), but in
jcm>> general people aren't yet going to respond to your way of
jcm>> doing realtime :-) Yes you do have only one OS in there,
jcm>> maybe that would change opinion, but we're not quite at the
jcm>> point where everything is multicore so you're not going to
jcm>> convince the masses.

| Good points. Suppose there was a way to get the hard realtime benefits
| using a slight elaboration of this approach that worked on single-core,
| single-threaded CPUs? Would that be of interest?

I dunno. It might. I think the trouble is that you'll need to convince
people who think they want single-core designs that one isn't a must.
Within a little time it'll increasingly be the case that one has
additional cores whether one wants them or not, but we're not quite
there yet. Someone will doubtless disagree.

| if you are going for the ultimate in response time, you have
| no choice but to run hand-coded assembly language on bare metal (though
| optimizing compilers are improving, so maybe it will soon be hand-coded
| C on bare metal).

Nah. The guys I work with have enough trouble making things time nicely
with precisely coded sequences accounting for every available T state.

| If you are using your computer to digitally modulate
| and synthesize a USA FM radio signal

In this case it's large magnets and probes, but yes. Same concept.

| So, I guess the question is whether 100-microsecond restricted hard
| realtime support in Linux is worth the effort.

Some of the folks I work with think not. I'm not sure yet - many people
seem to be of the opinion that it's not worth it, but we're told that
the Smartphone guys and others without throwaway cores (or to simplify a
design somewhat) really want this. Incidentally, it looks like there'll
be about a bazillion cores in this particular FPGA range before long, so
it'll become increasingly fun finding things for them to do.

| So, again, if there was a way to make this approach work on
| single-threaded single core CPUs, would that be of interest?

I guess it would. But then we've just had a slew of RT implementations
crawl out of the woodwork and wave at us over the past few weeks and
there are three other major RT implementations which combine Linux with
a Microkernel or other external support (RTLinux, RTAI, KURT, etc.).
Perhaps it's worth working on one of the Linux patch projects
(Monta/Ingo/etc.) rather than going all out to implement it all again.

Jon.

2004-10-24 21:22:52

by Ingo Molnar

Subject: Re: [RFC][PATCH] Restricted hard realtime


* Jon Masters <[email protected]> wrote:

> I guess it would. But then we've just had a slew of RT implementations
> crawl out of the woodwork and wave at us over the past few weeks and
> there are three other major RT implementations which combine Linux
> with a Microkernel or other external support (RTLinux, RTAI, KURT,
> etc.). Perhaps it's worth working on one of the Linux patch projects
> (Monta/Ingo/etc.) rather than going all out to implement it all again.

also note that (as i mentioned it in an earlier reply to Paul) the
'CPU[s] isolated for hard-RT use' scheduler feature has already been
implemented by Dimitri Sivanich and was accepted and integrated into the
2.6.9 kernel a couple of weeks ago.

Isolated CPUs can be set up via the "isolcpus=" boot parameter, and can
be entered via the affinity syscall. The feature came with related fixes
to the scheduler and other kernel code to eliminate cross-effects
between domains. (such as the scheduler balancing code, or the swap
tick)

So this all is banging on open doors, this particular mode of hard-RT
scheduling is there and available in vanilla Linux. If anyone wants to
try it, just download 2.6.9 and use it.

Ingo

2004-10-24 21:55:57

by Jon Masters

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Sun, 24 Oct 2004 23:23:59 +0200, Ingo Molnar <[email protected]> wrote:

> also note that (as i mentioned it in an earlier reply to Paul) the
> 'CPU[s] isolated for hard-RT use' scheduler feature has already been
> implemented by Dimitri Sivanich and was accepted and integrated into the
> 2.6.9 kernel a couple of weeks ago.

I saw the posts. I should go check that out myself for interest's sake
- thanks for the info. Scheduling domains are something I haven't
looked into in much detail yet, as they're not something which usually
concerns me greatly.

Jon.

2004-10-24 21:57:41

by Paul E. McKenney

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Sun, Oct 24, 2004 at 10:08:37PM +0100, Jon Masters wrote:
> | So, again, if there was a way to make this approach work on
> | single-threaded single core CPUs, would that be of interest?
>
> I guess it would. But then we've just had a slew of RT implementations
> crawl out of the woodwork and wave at us over the past few weeks and
> there are three other major RT implementations which combine Linux with
> a Microkernel or other external support (RTLinux, RTAI, KURT, etc.).
> Perhaps it's worth working on one of the Linux patch projects
> (Monta/Ingo/etc.) rather than going all out to implement it all again.

Fair enough!

Thanx, Paul

2004-10-25 12:26:49

by Dimitri Sivanich

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Sun, Oct 24, 2004 at 11:18:42AM -0700, Paul E. McKenney wrote:
> Dimitri, one nit so far... Why is sched_domain_dummy under two
> layers of #ifdef CONFIG_SMP? Any reason why the attached patch
> would not be in order?
>
Paul, this specific code wasn't part of my original patch, but after
looking at it briefly, I believe this patch should make sense.

Dimitri

2004-10-27 03:27:30

by Karim Yaghmour

Subject: Re: [RFC][PATCH] Restricted hard realtime


Paul E. McKenney wrote:
> Attaining hard-realtime response in an existing general-purpose operating
> system has been seen as a "big-bang" conversion. The problem is that
> the entire OS kernel must be modified to ensure that all code paths
> are deterministic. It would be much better if there was an evolutionary
> path to hard realtime.

I've been trying not to get too involved in this, though I've been
personally very interested in the topic of obtaining deterministic
response times from Linux for quite some time. Ingo's work is
certainly gathering a lot of interest, and he's certainly got the
brains and the can-do mindset that warrant a wait-and-see attitude.

I must admit though that I'm somewhat skeptical/worried. The issue
for me isn't whether Linux can actually become deterministic. This
kernel has reached heights which many of its detractors never
believed it could, it has come a long way. So whether it _could_
better/surpass existing RT-Unixes (such as LynxOS or QNX for example)
in terms of real-time performance is for me in the realm of the
possible.

That the Linux development community has to answer the question of
"how do we provide deterministic behavior for users who need it?"
was, as for the kernel developers of most popular Unixes, just a
matter of time. And in this regard, this is a piece of history that
is yet to be written: What is the _correct_ way to provide
deterministic response times in a Unix environment?

Like in most other circumstances, the Linux development community's
approach to this has been: show me the code! In that regard (and
this is in no way criticism of anyone's work), Ingo's work has
gathered a lot of interest not because it is breaking new ground
in terms of the concepts, but largely because of its very rapid
development pace. Let's face it, no self-respecting effort that has
ever labeled itself as wanting to provide "hard real-time Linux"
has been active on the LKML on the same level as Ingo (though many
have concentrated a lot of effort and talent on other lists.)

Yet, I believe that this is a case where the concepts do actually
matter a lot, and that no amount of published code will erase the
fundamental question: What is the _correct_ way to provide
deterministic response times in a Unix environment? I keep
highlighting the word "correct" because it's around this word's
definition that the answer probably lies.

Here are a number of solutions that some have found to be "correct"
for their needs over time, in chronological order of appearance:
a- Master/slave kernel (ex.: RTLinux)
b- Dual-CPU (there are actually many examples of this, some that
date back quite a few years)
c- Interrupt levels (ex.: D.Schleef, B.Kuhn, etc.)
d- Nanokernel/Hypervisor (ex.: Adeos)
e- Preemption
f- Uber-preemption and IRQ threading (a.k.a. preemption on acid)
(ex.: Ingo, TimeSys, MontaVista, Bill)

My take on this has been that the "correct" way to provide
deterministic response times in a Unix environment should minimize
in as much as possible:
a) the modifications to the targeted kernel's existing API,
behavior, source code, and functionality in general;
b) the burden for future mainstream kernel development efforts;
c) the potential for accidental/casual use of the hard-rt
capabilities, as this would in itself result in loss of
deterministic behavior;

Also, it should be:
a) architected in a way that enables straightforward
extension of the real-time capabilities/services without requiring
further modifications to the targeted kernel's existing API,
behavior, sources, and functionality in general;
b) truly deterministic, not simply time-bound by some upper
limit found during a sample test run;
c) _very_ simple to use without, as I said above, having the
potential of being accidentally or casually used (such a solution
should strive, in as much as possible, to provide the same API as
the targeted Unix kernel);
d) easily portable to new architectures, while remaining
consistent, both in terms of API and in terms of behavior, from
one architecture to the next;

From all the solutions that have been put forth over the years, I
have found that the nanokernel/hypervisor solution fits this
description of correctness best. The Adeos/RT-nucleus/RTAI-fusion
stack is one implementation I have been promoting, as it has
already reached important milestones. All that is needed for it
to work is the necessary hooks for Adeos to hook itself into
Linux by way of an interrupt pipeline; the latter being very simple,
portable and non-intrusive, yet could not accidentally/casually
be used without breaking. This interrupt pipeline is all that
is required for the rest of the stack to provide the services I
have alluded to in other postings by means of loadable modules,
including the ability to transparently service existing Linux
system calls via RTAI-fusion for providing applications with hard-
rt deterministic behavior.

One argument that has been leveled against this approach by those
who champion the vanilla-Linux-should-become-hard-rt cause (many
of whom are now in the uber-preemption camp) is that it requires
writing separate real-time drivers. Yet, this argument contains
a fatal flaw: drivers do not become deterministic by virtue of
running on an RTOS. IOW, even if Linux were to be made a Unix RTOS,
every single driver in the Linux sources would still have to be
rewritten with determinism in mind in order to be used in a
system that requires hard-rt. This is therefore a non-issue.

Which brings me back to what you said above: "The problem is that
the entire OS kernel must be modified to ensure that all code paths
are deterministic." There are two possible paths here.

Either:
a) Most current kernel developers intend to eventually convert the
entire existing code-base into one that contains deterministic
code paths only, and therefore impose such constraints on all future
contributors, in which case the path to follow is the one set by
the uber-preemption folks;

or:
b) Most current kernel developers intend to keep Linux a general
purpose Unix OS which mainly serves a user-base that does not need
deterministic hard-rt behavior from Linux, and therefore changes
for providing deterministic hard-rt behavior are acceptable only
if they are demonstrably minimal, non-intrusive, yet flexible
enough for those that demand hard-rt, in which case the path to
follow is the one set by the nanokernel/hypervisor folks;

So which is it?

[ You'll have to excuse the pace of my participation in this thread;
I'm giving 9-to-5 training all week. I'll respond as time permits. ]

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546

2004-10-27 04:32:12

by Andrew Morton

Subject: Re: [RFC][PATCH] Restricted hard realtime

Karim Yaghmour <[email protected]> wrote:
>
> Here are a number of solutions that some have found to be "correct"
> for their needs over time, in chronological order of appearance:
> a- Master/slave kernel (ex.: RTLinux)
> b- Dual-CPU (there are actually many examples of this, some that
> date back quite a few years)
> c- Interrupt levels (ex.: D.Schleef, B.Kuhn, etc.)
> d- Nanokernel/Hypervisor (ex.: Adeos)
> e- Preemption
> f- Uber-preemption and IRQ threading (a.k.a. preemption on acid)
> (ex.: Ingo, TimeSys, MontaVista, Bill)

uber-preemption is the chosen way for the mainline kernel mainly because
its mechanisms can be largely hidden inside (increasingly ghastly) header
files and most developers just don't have to worry about it.

I have a sneaking suspicion that the day will come when we get nice
sub-femtosecond latencies in all the trivial benchmarks but it turns out
that the realtime processes won't be able to *do* anything useful because
whenever they perform syscalls, those syscalls end up taking long-held
locks.

Which does lead me to suggest that we need to identify the target
application areas for Ingo's current work and confirm that those
applications are seeing the results which they require. Empirical results
from the field do seem to indicate success, but I doubt if they're
sufficiently comprehensive.

2004-10-27 08:10:50

by Ingo Molnar

Subject: Re: [RFC][PATCH] Restricted hard realtime


* Andrew Morton <[email protected]> wrote:

> I have a sneaking suspicion that the day will come when we get nice
> sub-femtosecond latencies in all the trivial benchmarks [...]

with -RT-V0.3 i get lower than 20 usec _maximum_ latencies during
'./hackbench 20'. (the average latency is 1 usec) So while i'm not yet
in the sub-femtosecond category, things are looking pretty good in
PREEMPT_REALTIME land :)

> [...] but it turns out that the realtime processes won't be able to
> *do* anything useful because whenever they perform syscalls, those
> syscalls end up taking long-held locks.

this is not really a problem for the following reason: the spinlock
lock-breaks i am doing for !PREEMPT_REALTIME are exactly the kind of
latencies that could interact with a hard-RT task. What becomes a
latency reducer for the normal SMP or desktop kernel is _the_ latency
guarantee for using arbitrary kernel services in the PREEMPT_REALTIME
model.

another step is to make kernel subsystems have constant overhead (O(1)),
i'd say that's mostly true already today, and it's increasingly becoming
true in the future.

also, RT applications rarely hit the really complex kernel subsystems
because 'complex' often means 'has the potential for IO' - on any
hard-RT OS. So what is important are two factors:

1) preempt to the RT app immediately

2) do not create locking interactions between tasks that use
'isolated resources'

#1 is quite good in PREEMPT_REALTIME, and even #2 is largely true for
it. But RT apps will try to stay isolated anyway, they do an mlockall()
and use simple syscalls.

note that #2 means: 'no big locks', which is a natural scalability
desire anyway.

> Which does lead me to suggest that we need to identify the target
> application areas for Ingo's current work and confirm that those
> applications are seeing the results which they require. Empirical
> results from the field do seem to indicate success, but I doubt if
> they're sufficiently comprehensive.

with the -V series of PREEMPT_REALTIME the 'preemption latency' target
is easy to meet, because in essence everything except the core scheduler
and the irq redirection code is preemptible :)

E.g. the innermost loop of the IDE hardirq is preemptible, the innermost
loop of kswapd is preemptible and PREEMPT_REALTIME can preempt the
pagefault handler and the mouse handler - name any code and it's
preemptible.

the API latency target is not a big issue because the 'locking
independence' requirement (#2 above) directly corresponds to 'good
scalability on SMP and NUMA'. So anytime we improve high-end scalability
we often will also improve the API-latencies of PREEMPT_REALTIME!

(of course this is a much simplified reasoning ignoring a few issues.)

Ingo

2004-10-27 22:42:25

by Bill Huey

Subject: Re: [RFC][PATCH] Restricted hard realtime

On Tue, Oct 26, 2004 at 11:16:01PM -0400, Karim Yaghmour wrote:
> I've been trying not to get too involved in this, though I've been
> personally very interested in the topic of obtaining deterministic
> response times from Linux for quite some time. Ingo's work is
> certainly gathering a lot of interest, and he's certainly got the
> brains and the can-do mindset that warrant a wait-and-see attitude.

No, this needs to be discussed. And the development process is moving
faster than I've ever anticipated.

> I must admit though that I'm somewhat skeptical/worried. The issue
> for me isn't whether Linux can actually become deterministic. This
> kernel has reached heights which many of its detractors never
> believed it could, it has come a long way. So whether it _could_
> better/surpass existing RT-Unixes (such as LynxOS or QNX for example)
> in terms of real-time performance is for me in the realm of the
> possible.

It's possible. Our company (the makers of LynxOS, Lynuxworks) is
betting on it or else I would have no context to work here.

> That the Linux development community has to answer the question of
> "how do we provide deterministic behavior for users who need it?"

The biggest groups of folks I can identify is the multimedia folks.
Their applications are the most system-loaded of all of the
applications on this planet: deterministic latency, heavy CPU usage,
heavy disk load and maniacal temporal control via frame schedulers, etc...

This stuff has direct application to DVRs (digital video recorders)
and other things that only SGI and BeOS machines could do in their
day.

> was, as for the kernel developers of most popular Unixes, just a
> matter of time. And in this regard, this is a piece of history that
> is yet to be written: What is the _correct_ way to provide
> deterministic response times in a Unix environment?

It's going to be driven by the application. Never lose sight of the
fact that the kernel is supposed to support applications, not the other
way around. Kernel programmers forget that fact and app programmers
pay the price with inferior-performing kernels as a result of that
attitude.

[comments about Ingo's effort]

> Yet, I believe that this is a case where the concepts do actually
> matter a lot, and that no amount of published code will erase the
> fundamental question: What is the _correct_ way to provide
> deterministic response times in a Unix environment? I keep
> highlighting the word "correct" because it's around this word's
> definition that the answer probably lies.

A new scheduler would probably be needed to drive the system and I'm
betting on pervasively threaded QoS through all of the IO systems with
those QoS threads controlled by that scheduler. This is minimally
required for VoIP applications and the like. QoS is going to be king
in the next few years.

> My take on this has been that the "correct" way to provide
> deterministic response times in a Unix environment should minimize
> in as much as possible:
...
> a) the modifications to the targeted kernel's existing API,
> behavior, source code, and functionality in general;

IMO, little or no change should happen other than supporting basic
POSIX RT facilities and possibly userspace drivers.

> b) the burden for future mainstream kernel development efforts;
> c) the potential for accidental/casual use of the hard-rt
> capabilities, as this would in itself result in loss of
> deterministic behavior;

Like with any new system, it's going to take training and time
for folks to figure this out. These patches are some of the most
important in the history of the Linux kernel and it's going to
ultimately have a strong effect on the general kernel community
as well. Hopefully, it won't clash with it, but it's almost
unavoidable. It's probably better to minimize the clash as much
as possible to minimize the adjustment.

The only other project that I know of in open source that's doing
roughly what we're doing is FreeBSD SMPng. Many of the preemption/locking
issues have direct analogs to that project and I use concepts from
the BSD/OS SMPng code in it as a guide for future development. So
one can use that as a guide for how this project might unfold and
I've been doing that. The "WITNESS" facility, etc.. :)

> Also, it should be:

> a) architected in a way that enables straightforward
> extension of the real-time capabilities/services without requiring
> further modifications to the targeted kernel's existing API,
> behavior, sources, and functionality in general;

uber-preemption makes the entire kernel hard RT. The question here
for me is scheduling policy for non-RT tasks outside of the
deterministic application domain.

> b) truly deterministic, not simply time-bound by some upper
> limit found during a sample test run;

That's going to come in the future, you can bet on it. It's just that
stability and correctness are major issues at this time since the
kernel just went through some major surgery. It's expected.

> c) _very_ simple to use without, as I said above, having the
> potential of being accidentally or casually used (such a solution
> should strive, in as much as possible, to provide the same API as
> the targeted Unix kernel);

That, to me, is a system-wide scheduling policy. It doesn't exist
as of yet and I'm sure that it's going to be defined in the near
future.

> d) easily portable to new architectures, while remaining
> consistent, both in terms of API and in terms of behavior, from
> one architecture to the next;

Not a problem. The vast majority of changes are part of the core
kernel code, not architecturally bound to the x86. The PowerPC folks,
Benjamin and company, can trivially do a port in less than 2 weeks,
if not a single day.

[comments about the "Adeos/RT-nucleus/RTAI-fusion" approach]

This is a good approach, but the limitations to this are things that
require access to the full Linux XFS facilities from SGI. This requires
low-level IO subsystems, etc... to be preemptible so that deadlines
can be met, among other things.

> One argument that has been leveled against this approach by those
> who champion the vanilla-Linux-should-become-hard-rt cause (many
> of whom are now in the uber-preemption camp) is that it requires
> writing separate real-time drivers. Yet, this argument contains

Not a problem. Clean up is being handled now and drivers that are
abusing things by spinning under disabled preemption are the next
in line to be cleaned up.

> a fatal flaw: drivers do not become deterministic by virtue of
> running on an RTOS. IOW, even if Linux were to be made a Unix RTOS,
> every single driver in the Linux sources would still have to be
> rewritten with determinism in mind in order to be used in a
> system that requires hard-rt. This is therefore a non-issue.

Current latency tracing and bleeding edge reports should point
these places out. These problem places are getting hammered out
often within an hour. This isn't a problem since most drivers are
written well as far as I can see.

> Which brings me back to what you said above: "The problem is that
> the entire OS kernel must be modified to ensure that all code paths
> are deterministic." There are two possible paths here.
>
> Either:
> a) Most current kernel developers intend to eventually convert the
> entire existing code-base into one that contains deterministic
> code paths only, and therefore impose such constraints on all future
> contributors, in which case the path to follow is the one set by
> the uber-preemption folks;

The impact so far has been minimal. It's going to cramp on a few
style points, minimal impact, but the systems will be clearer in the
end.

> or:
> b) Most current kernel developers intend to keep Linux a general
> purpose Unix OS which mainly serves a user-base that does not need
> deterministic hard-rt behavior from Linux, and therefore changes
> for providing deterministic hard-rt behavior are acceptable only
> if they are demonstrably minimal, non-intrusive, yet flexible
> enough for those that demand hard-rt, in which case the path to
> follow is the one set by the nanokernel/hypervisor folks;
>
> So which is it?

This is a non-issue. The uber-preemption folks will continue to do
what they've/we've been doing and it just opens up more opportunities
for dual-domain RT folks. One doesn't exclude the other.

I'm aiming for single kernel RT since media applications are more
API demanding than classic RT applications.

> [ You'll have to excuse the pace of my participation in this thread;
> I'm giving 9-to-5 training all week. I'll respond as time permits. ]

I hope this was useful. :)

bill

2004-10-28 12:08:05

by Karim Yaghmour

Subject: Re: [RFC][PATCH] Restricted hard realtime


Andrew Morton wrote:
> uber-preemption is the chosen way for the mainline kernel mainly because
> its mechanisms can be largely hidden inside (increasingly ghastly) header
> files and most developers just don't have to worry about it.

Yet, this justification applies readily to the nanokernel approach. The
interrupt pipeline needed by Adeos, for example, is mostly hidden inside
headers, and developers have even less to care about it than about the
increased concurrency from the uber-preemption patches.

As for how useful the uber-preemption will be, there does not seem to be
any consensus.
This is your take:
> I have a sneaking suspicion that the day will come when we get nice
> sub-femtosecond latencies in all the trivial benchmarks but it turns out
> that the realtime processes won't be able to *do* anything useful because
> whenever they perform syscalls, those syscalls end up taking long-held
> locks.

This is Ingo's take:
> also, RT applications rarely hit the really complex kernel subsystems
> because 'complex' often means 'has the potential for IO' - on any
> hard-RT OS.

And this is Bill's take:
> The biggest groups of folks I can identify is the multimedia folks.
> Their applications are the most system loaded out of all of the
> applications on this planet, deterministic latency, heavy CPU usage,
> heavy disk load and manical temporal control via frame schedulers, etc...
>
> This stuff has direct application to DVRs (digital video recorders)
> and other things that only SGI and BeOS machines could do in their
> day.

In the 3 replies I got, there were 3 different interpretations of how
uber-preemption will be useful:
1- It's good for latency, but apps won't really get much out of it
because of non-deterministic syscalls.
2- Non-deterministic syscalls aren't a problem as long as they don't
hit complex subsystems.
3- End-applications that interact with many subsystems, especially
I/O is exactly where we want to go.

It appears that different parties draw the line at different places,
and if we follow this to its logical conclusion, it brings us to the
first choice I was highlighting:
> a) Most current kernel developers intend to eventually convert the
> entire existing code-base into one that contains deterministic
> code paths only, and therefore impose such constraints on all future
> contributors, in which case the path to follow is the one set by
> the uber-preemption folks;

IOW, those who need deterministic response times in Linux are unlikely
to be entirely satisfied with uber-preemption, and will want more.
If I read Ingo correctly, greater determinism will come over time as
Linux is made better for SMP systems, and I somewhat agree with this.
However, reduced latencies and increased threading does not in itself
make an OS deterministic in its behavior. For those who really need
a deterministic OS, uber-preemption is therefore but a stepping stone,
and more is likely to be asked for. Yet, more will not be possible
unless there is a change in the kernel's development philosophy (at
least, that's what I can make of it).

> Which does lead me to suggest that we need to identify the target
> application areas for Ingo's current work and confirm that those
> applications are seeing the results which they require. Empirical results
> from the field do seem to indicate success, but I doubt if they're
> sufficiently comprehensive.

Usually one of the litmus tests for this is to hook a function
generator to the system and inject a square wave through an
interrupt-generating I/O (ex.: parallel port), while measuring the
response time of an interrupt service routine and comparing it to
the input wave using an oscilloscope. One sign that the system is
indeed deterministic is that both square waves should appear
steady regardless of the load.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546

2004-10-28 12:12:39

by Karim Yaghmour

Subject: Re: [RFC][PATCH] Restricted hard realtime


Ingo Molnar wrote:
> with -RT-V0.3 i get lower than 20 usec _maximum_ latencies during
> './hackbench 20'. (the average latency is 1 usec) So while i'm not yet
> in the sub-femtosecond category, things are looking pretty good in
> PREEMPT_REALTIME land :)

Just curious: what's the setup here? (CPU speed, peripherals, distro,
applications being run to load the system, etc.) You may have described
this elsewhere on LKML and I may have missed it (sorry, I just can't
read everything that comes through).

I'm assuming that the timings are measured using the tracing
functionality currently in the patches.

Thanks,

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546

2004-10-28 13:06:47

by john cooper

Subject: Re: [RFC][PATCH] Restricted hard realtime

Karim Yaghmour wrote:
> Usually one of the litmus tests for this is to hook a function
> generator to the system and inject a square wave through an
> interrupt-generating I/O (ex.: parallel port), while measuring the
> response time of an interrupt service routine and comparing it to
> the input wave using an oscilloscope. One sign that the system is
> indeed deterministic is that both square waves should appear
> steady regardless of the load.

Ideally yes, but there will still be some phase modulation
due to the natural randomness of interrupt masking for hard
irqs and from scheduling preemption latency for irqs run in
task context. Also contributing to this will be latency
due to interrupt hardware which may not be constant.

One likely observation will be increased contention
from periodic interrupt sources (clock) with the injected
square wave interrupt when these frequencies (or their
harmonics) approach each other. The contention would
appear periodic at the difference of these interrupt
frequencies.

Other sources of phase bobble will include variable
CPU cache content, loading of the bus from other DMA
masters, SMP bus contention, etc.. which are much more
difficult to address.

-john


--
[email protected]

2004-10-28 13:16:35

by Ingo Molnar

Subject: Re: [RFC][PATCH] Restricted hard realtime


* Karim Yaghmour <[email protected]> wrote:

> >with -RT-V0.3 i get lower than 20 usec _maximum_ latencies during
> >'./hackbench 20'. (the average latency is 1 usec) So while i'm not yet
> >in the sub-femtosecond category, things are looking pretty good in
> >PREEMPT_REALTIME land :)
>
> Just curious: what's the setup here? (CPU speed, peripherals, distro,
> applications being run to load the system, etc.) [...]

2 GHz Athlon running stock Fedora Core. './hackbench 20' was the
workload.

> I'm assuming that the timings are measured using the tracing
> functionality currently in the patches.

no, i used a user-space timing app called 'realfeel', but the numbers
were corroborated by the in-kernel tracer too.

but ... the best test would be if you tried the patch, it's not hard ;)
There are newer versions since i did the above measurement and testing
feedback is always welcome.

Ingo