LinuxLists.cc - [ANNOUNCE] Linux 2.6 Real Time Kernel

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 01:59, Sven-Thorsten Dietrich wrote:
> Announcing the availability of prototype real-time (RT)
> enhancements to the Linux 2.6 kernel.
>

Does not compile:

CC arch/i386/kernel/semaphore.o
CC arch/i386/kernel/signal.o
AS arch/i386/kernel/entry.o
CC arch/i386/kernel/traps.o
CC arch/i386/kernel/irq.o
arch/i386/kernel/irq.c: In function `do_IRQ':
arch/i386/kernel/irq.c:582: error: too many arguments to function `note_interrupt'
arch/i386/kernel/irq.c:667: warning: ISO C90 forbids mixed declarations and code
arch/i386/kernel/irq.c:751: error: initializer element is not constant
arch/i386/kernel/irq.c:751: error: (near initialization for `__ksymtab_request_irq.value')
arch/i386/kernel/irq.c:809: error: initializer element is not constant
arch/i386/kernel/irq.c:809: error: (near initialization for `__ksymtab_free_irq.value')
arch/i386/kernel/irq.c:904: error: initializer element is not constant
arch/i386/kernel/irq.c:904: error: (near initialization for `__ksymtab_probe_irq_on.value')
arch/i386/kernel/irq.c:1004: error: initializer element is not constant
arch/i386/kernel/irq.c:1004: error: (near initialization for `__ksymtab_probe_irq_off.value')
arch/i386/kernel/irq.c:1246: error: initializer element is not constant
arch/i386/kernel/irq.c:1246: error: (near initialization for `__ksymtab_do_softirq.value')
arch/i386/kernel/irq.c:1246: error: parse error at end of input
arch/i386/kernel/irq.c:648: warning: label `out_no_end' defined but not used
arch/i386/kernel/irq.c:79: warning: 'register_irq_proc' declared `static' but never defined
arch/i386/kernel/irq.c:277: warning: 'report_bad_irq' defined but not used
make[1]: *** [arch/i386/kernel/irq.o] Error 1
make: *** [arch/i386/kernel] Error 2

I am using gcc 3.4. I accepted all the default settings except I
enabled "Run all IRQS in threads".

Lee

2004-10-09 07:34:13

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Do you have 4k stacks turned off? The docs make note of this.

Daniel Walker

On Fri, 2004-10-08 at 23:40, Lee Revell wrote:
> On Sat, 2004-10-09 at 01:59, Sven-Thorsten Dietrich wrote:
> > Announcing the availability of prototype real-time (RT)
> > enhancements to the Linux 2.6 kernel.
> >
>
> Does not compile:
>
> CC arch/i386/kernel/semaphore.o
> CC arch/i386/kernel/signal.o
> AS arch/i386/kernel/entry.o
> CC arch/i386/kernel/traps.o
> CC arch/i386/kernel/irq.o
> arch/i386/kernel/irq.c: In function `do_IRQ':
> arch/i386/kernel/irq.c:582: error: too many arguments to function `note_interrupt'
> arch/i386/kernel/irq.c:667: warning: ISO C90 forbids mixed declarations and code
> arch/i386/kernel/irq.c:751: error: initializer element is not constant
> arch/i386/kernel/irq.c:751: error: (near initialization for `__ksymtab_request_irq.value')
> arch/i386/kernel/irq.c:809: error: initializer element is not constant
> arch/i386/kernel/irq.c:809: error: (near initialization for `__ksymtab_free_irq.value')
> arch/i386/kernel/irq.c:904: error: initializer element is not constant
> arch/i386/kernel/irq.c:904: error: (near initialization for `__ksymtab_probe_irq_on.value')
> arch/i386/kernel/irq.c:1004: error: initializer element is not constant
> arch/i386/kernel/irq.c:1004: error: (near initialization for `__ksymtab_probe_irq_off.value')
> arch/i386/kernel/irq.c:1246: error: initializer element is not constant
> arch/i386/kernel/irq.c:1246: error: (near initialization for `__ksymtab_do_softirq.value')
> arch/i386/kernel/irq.c:1246: error: parse error at end of input
> arch/i386/kernel/irq.c:648: warning: label `out_no_end' defined but not used
> arch/i386/kernel/irq.c:79: warning: 'register_irq_proc' declared `static' but never defined
> arch/i386/kernel/irq.c:277: warning: 'report_bad_irq' defined but not used
> make[1]: *** [arch/i386/kernel/irq.o] Error 1
> make: *** [arch/i386/kernel] Error 2
>
> I am using gcc 3.4. I accepted all the default settings except I
> enabled "Run all IRQS in threads".
>
> Lee
>

2004-10-09 07:42:48

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 03:33, Daniel Walker wrote:
> Do you have 4k stacks turned off? The docs make note of this.
>

My mistake, it works now.

Lee

2004-10-09 08:52:47

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 03:33, Daniel Walker wrote:
> Do you have 4k stacks turned off? The docs make note of this.
>

OK after fixing this it builds OK, but several modules complain about
unresolved symbols:

Oct 9 04:43:23 krustophenia kernel: usbcore: Unknown symbol kmutex_unlock
Oct 9 04:43:23 krustophenia kernel: usbcore: Unknown symbol kmutex_lock
Oct 9 04:43:23 krustophenia kernel: usbcore: Unknown symbol kmutex_init
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_hcd_pci_probe
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_check_bandwidth
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_disabled
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_release_bandwidth
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_register_root_hub
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_put_dev
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_get_dev
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_claim_bandwidth
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_hcd_giveback_urb
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol kmutex_unlock
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol kmutex_lock
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_hcd_pci_remove
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol kmutex_init
Oct 9 04:43:23 krustophenia kernel: uhci_hcd: Unknown symbol usb_alloc_dev
Oct 9 04:43:23 krustophenia kernel: usbcore: Unknown symbol kmutex_unlock
Oct 9 04:43:23 krustophenia kernel: usbcore: Unknown symbol kmutex_lock
Oct 9 04:43:23 krustophenia kernel: usbcore: Unknown symbol kmutex_init
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_alloc_urb
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_free_urb
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_register
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_submit_urb
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_control_msg
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_deregister
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_string
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_unlink_urb
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol kmutex_unlock
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol kmutex_lock
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_kill_urb
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_buffer_free
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol kmutex_init
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol __usb_get_extra_descriptor
Oct 9 04:43:23 krustophenia kernel: usbhid: Unknown symbol usb_buffer_alloc
Oct 9 04:43:23 krustophenia kernel: via_rhine: Unknown symbol kmutex_unlock
Oct 9 04:43:23 krustophenia kernel: via_rhine: Unknown symbol kmutex_lock
Oct 9 04:43:23 krustophenia kernel: via_rhine: Unknown symbol kmutex_init

Lee

2004-10-09 10:51:23

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Sven-Thorsten Dietrich <[email protected]> writes:

> +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT)
> +/*
> + * This could be a long-held lock. If another CPU holds it for a long time,
> + * and that CPU is not asked to reschedule then *this* CPU will spin on the
> + * lock for a long time, even if *this* CPU is asked to reschedule.
> + *
> + * So what we do here, in the slow (contended) path is to spin on the lock by
> + * hand while permitting preemption.
> + *
> + * Called inside preempt_disable().
> + */
> +
> +/* these functions are only called from inside spin_lock
> + * and old_write_lock therefore under spinlock substitution
> + * they will only be passed old spinlocks or old rwlocks as parameter
> + * there are no issues with modified mutex behavior here. */
> +
> +#endif /* defined(CONFIG_SMP) && defined(CONFIG_PREEMPT) */

May I inquire as to the purpose of placing a couple of comments under
an #ifdef?

--
M?ns Rullg?rd
[email protected]

2004-10-09 12:54:20

by John Hedditch

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

By disabling compilation of usb, s2io and scsi I can get this to build and link, but it hangs immediately on getting
to init.

Cheers,
John

2004-10-09 13:15:33

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
Now it seems to be running quite well. I am, however, getting
occasional "bad: scheduling while atomic!" messages, all alike:

bad: scheduling while atomic!
[<c02ef301>] schedule+0x4e5/0x4ea
[<c0114cbe>] try_to_wake_up+0x99/0xa8
[<c01332e2>] __p_mutex_down+0xfe/0x190
[<c029e238>] alloc_skb+0x32/0xc3
[<c01335e0>] kmutex_is_locked+0x1f/0x33
[<c029fa63>] skb_queue_tail+0x1c/0x45
[<c02eb43e>] unix_stream_sendmsg+0x22c/0x38c
[<c029ab03>] sock_sendmsg+0xc9/0xe3
[<c029f95a>] skb_dequeue+0x4a/0x5b
[<c02eb9e6>] unix_stream_recvmsg+0x119/0x430
[<c0137f06>] __alloc_pages+0x1cc/0x33f
[<c01168ca>] autoremove_wake_function+0x0/0x43
[<c029af4b>] sock_readv_writev+0x6e/0x97
[<c029afec>] sock_writev+0x37/0x3e
[<c029afb5>] sock_writev+0x0/0x3e
[<c014ebb8>] do_readv_writev+0x1db/0x21f
[<c01168ca>] autoremove_wake_function+0x0/0x43
[<c014e5ca>] vfs_read+0xd0/0xf5
[<c014ec94>] vfs_writev+0x49/0x52
[<c014ed5a>] sys_writev+0x47/0x76
[<c0103f09>] sysenter_past_esp+0x52/0x71

USB, sound and wireless are all working nicely.

Now the patch:

--- kernel/kmutex.c~ 2004-10-09 12:51:37 +02:00
+++ kernel/kmutex.c 2004-10-09 13:50:43 +02:00
@@ -20,6 +20,7 @@
#include <linux/config.h>
#include <linux/kmutex.h>
#include <linux/sched.h>
+#include <linux/module.h>

# if defined CONFIG_PMUTEX
# include <linux/pmutex.h>
@@ -40,11 +41,14 @@
return p_mutex_trylock(&(lock->kmtx));
}

+EXPORT_SYMBOL(kmutex_trylock);

inline int kmutex_is_locked(struct kmutex *lock)
{
return p_mutex_is_locked(&(lock->kmtx));
}
+
+EXPORT_SYMBOL(kmutex_is_locked);
# endif

@@ -60,6 +64,7 @@
#endif
}

+EXPORT_SYMBOL(kmutex_init);

/*
* print warning is case kmutex_lock is called while preempt count is
@@ -88,6 +93,8 @@
#endif
}

+EXPORT_SYMBOL(kmutex_lock);
+
void kmutex_unlock(struct kmutex *lock)
{
#if defined CONFIG_KMUTEX_DEBUG
@@ -102,6 +109,7 @@
#endif
}

+EXPORT_SYMBOL(kmutex_unlock);

void kmutex_unlock_wait(struct kmutex * lock)
{
@@ -111,4 +119,4 @@
}
}

-
+EXPORT_SYMBOL(kmutex_unlock_wait);

--
M?ns Rullg?rd
[email protected]

2004-10-09 17:33:27

by Karim Yaghmour

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Sven-Thorsten Dietrich wrote:
> - Voluntary Preemption by Ingo Molnar
> - IRQ thread patches by Scott Wood and Ingo Molnar
> - BKL mutex patch by Ingo Molnar (with MV extensions)
> - PMutex from Germany's Universitaet der Bundeswehr, Munich
> - MontaVista mutex abstraction layer replacing spinlocks with mutexes

To the best of my understanding, this still doesn't provide deterministic
hard-real-time performance in Linux.

> There are several micro-kernel solutions available, which achieve
> the required performance, but there are two general concerns with
> such solutions:
>
> 1. Two separate kernel environments, creating more overall
> system complexity and application design complexity.
> 2. Legal controversy.

It's been quite a while since any of this has been true.

> In line with the above mentioned previous Kernel enhancements,
> our work is designed to be transparent to existing applications
> and drivers.

I guess you haven't taken a look at the work on RTAI/fusion lately.
Applications use the same Linux API, and get deterministic
hard-real-time response times. It's really much less complicated
to use than the above-suggested aggregate.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546

2004-10-09 18:30:40

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 13:41, Karim Yaghmour wrote:
> Sven-Thorsten Dietrich wrote:
> > - Voluntary Preemption by Ingo Molnar
> > - IRQ thread patches by Scott Wood and Ingo Molnar
> > - BKL mutex patch by Ingo Molnar (with MV extensions)
> > - PMutex from Germany's Universitaet der Bundeswehr, Munich
> > - MontaVista mutex abstraction layer replacing spinlocks with mutexes
>
> To the best of my understanding, this still doesn't provide deterministic
> hard-real-time performance in Linux.

Using only the VP+IRQ thread patch, I ran my RT app for 11 million
cycles yesterday, with a maximum delay of 190 usecs. How would this not
satisfy a 200 usec hard RT constraint?

PHB: "I've looked at your proposal and decided it can't be done"
Dilbert: "I just did it. It's working perfectly"

Lee

2004-10-09 19:24:59

by stefan.eletzhofer

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, Oct 09, 2004 at 02:30:28PM -0400, Lee Revell wrote:
> On Sat, 2004-10-09 at 13:41, Karim Yaghmour wrote:
> > Sven-Thorsten Dietrich wrote:
> > > - Voluntary Preemption by Ingo Molnar
> > > - IRQ thread patches by Scott Wood and Ingo Molnar
> > > - BKL mutex patch by Ingo Molnar (with MV extensions)
> > > - PMutex from Germany's Universitaet der Bundeswehr, Munich
> > > - MontaVista mutex abstraction layer replacing spinlocks with mutexes
> >
> > To the best of my understanding, this still doesn't provide deterministic
> > hard-real-time performance in Linux.
>
> Using only the VP+IRQ thread patch, I ran my RT app for 11 million
> cycles yesterday, with a maximum delay of 190 usecs. How would this not
> satisfy a 200 usec hard RT constraint?

I think the keyword here is "deterministic", isn't it?

>
> PHB: "I've looked at your proposal and decided it can't be done"
> Dilbert: "I just did it. It's working perfectly"
>
> Lee
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Stefan Eletzhofer
InQuant Data GBR
http://www.inquant.de
+49 (0) 751 35 44 112
+49 (0) 171 23 24 529 (Mobil)
+49 (0) 751 35 44 115 (FAX)

2004-10-09 19:31:06

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 17:26, [email protected] wrote:
> On Sat, Oct 09, 2004 at 02:30:28PM -0400, Lee Revell wrote:
> > On Sat, 2004-10-09 at 13:41, Karim Yaghmour wrote:
> > > Sven-Thorsten Dietrich wrote:
> > > > - Voluntary Preemption by Ingo Molnar
> > > > - IRQ thread patches by Scott Wood and Ingo Molnar
> > > > - BKL mutex patch by Ingo Molnar (with MV extensions)
> > > > - PMutex from Germany's Universitaet der Bundeswehr, Munich
> > > > - MontaVista mutex abstraction layer replacing spinlocks with mutexes
> > >
> > > To the best of my understanding, this still doesn't provide deterministic
> > > hard-real-time performance in Linux.
> >
> > Using only the VP+IRQ thread patch, I ran my RT app for 11 million
> > cycles yesterday, with a maximum delay of 190 usecs. How would this not
> > satisfy a 200 usec hard RT constraint?
>
> I think the keyword here is "deterministic", isn't it?

Well, depends what you mean by deterministic. Some RT apps only require
an upper bound on response time. This is a form of determinism.

Lee

2004-10-09 19:37:05

by stefan.eletzhofer

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, Oct 09, 2004 at 03:30:27PM -0400, Lee Revell wrote:
> On Sat, 2004-10-09 at 17:26, [email protected] wrote:
> > On Sat, Oct 09, 2004 at 02:30:28PM -0400, Lee Revell wrote:
> > > On Sat, 2004-10-09 at 13:41, Karim Yaghmour wrote:
> > > > Sven-Thorsten Dietrich wrote:
> > > > > - Voluntary Preemption by Ingo Molnar
> > > > > - IRQ thread patches by Scott Wood and Ingo Molnar
> > > > > - BKL mutex patch by Ingo Molnar (with MV extensions)
> > > > > - PMutex from Germany's Universitaet der Bundeswehr, Munich
> > > > > - MontaVista mutex abstraction layer replacing spinlocks with mutexes
> > > >
> > > > To the best of my understanding, this still doesn't provide deterministic
> > > > hard-real-time performance in Linux.
> > >
> > > Using only the VP+IRQ thread patch, I ran my RT app for 11 million
> > > cycles yesterday, with a maximum delay of 190 usecs. How would this not
> > > satisfy a 200 usec hard RT constraint?
> >
> > I think the keyword here is "deterministic", isn't it?
>
> Well, depends what you mean by deterministic. Some RT apps only require
> an upper bound on response time. This is a form of determinism.

Yes. But can you give that upper bound "a priori", that is w/o doing
measurements with your application?

Without that I think its impossible to get _guaranteed_ upper
bounds, regardles of the application running. I think thats what
"hard real-time" is all about.

Stefan

>
> Lee
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Stefan Eletzhofer
InQuant Data GBR
http://www.inquant.de
+49 (0) 751 35 44 112
+49 (0) 171 23 24 529 (Mobil)
+49 (0) 751 35 44 115 (FAX)

2004-10-09 19:39:15

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Lee Revell <[email protected]> writes:

>> > > To the best of my understanding, this still doesn't provide
>> > > deterministic hard-real-time performance in Linux.
>> >
>> > Using only the VP+IRQ thread patch, I ran my RT app for 11 million
>> > cycles yesterday, with a maximum delay of 190 usecs. How would this not
>> > satisfy a 200 usec hard RT constraint?
>>
>> I think the keyword here is "deterministic", isn't it?
>
> Well, depends what you mean by deterministic. Some RT apps only require
> an upper bound on response time. This is a form of determinism.

Sure, but running for a zillion cycles without breaking some limit
doesn't guarantee that it never will happen. Being able to give such
a guarantee is what determinism is about.

--
M?ns Rullg?rd
[email protected]

2004-10-09 19:47:24

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 17:38, [email protected] wrote:
> On Sat, Oct 09, 2004 at 03:30:27PM -0400, Lee Revell wrote:
> > On Sat, 2004-10-09 at 17:26, [email protected] wrote:
> > > On Sat, Oct 09, 2004 at 02:30:28PM -0400, Lee Revell wrote:
> > > > On Sat, 2004-10-09 at 13:41, Karim Yaghmour wrote:
> > > > > Sven-Thorsten Dietrich wrote:
> > > > > > - Voluntary Preemption by Ingo Molnar
> > > > > > - IRQ thread patches by Scott Wood and Ingo Molnar
> > > > > > - BKL mutex patch by Ingo Molnar (with MV extensions)
> > > > > > - PMutex from Germany's Universitaet der Bundeswehr, Munich
> > > > > > - MontaVista mutex abstraction layer replacing spinlocks with mutexes
> > > > >
> > > > > To the best of my understanding, this still doesn't provide deterministic
> > > > > hard-real-time performance in Linux.
> > > >
> > > > Using only the VP+IRQ thread patch, I ran my RT app for 11 million
> > > > cycles yesterday, with a maximum delay of 190 usecs. How would this not
> > > > satisfy a 200 usec hard RT constraint?
> > >
> > > I think the keyword here is "deterministic", isn't it?
> >
> > Well, depends what you mean by deterministic. Some RT apps only require
> > an upper bound on response time. This is a form of determinism.
>
> Yes. But can you give that upper bound "a priori", that is w/o doing
> measurements with your application?
>

Yes. The upper bound on the response time of an RT task is a function
of the longest non-preemptible code path in the kernel. Currently this
is the processing of a single packet by netif_receive_skb.

AIUI hard realtime is about bounded response times. How does this not
qualify?

Lee

2004-10-09 20:04:33

by Karim Yaghmour

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Lee Revell wrote:
> Yes. The upper bound on the response time of an RT task is a function
> of the longest non-preemptible code path in the kernel. Currently this
> is the processing of a single packet by netif_receive_skb.

And this has been demonstrated mathematically/algorithmically to be
true 100% of the time, regardless of the load and the driver set? IOW,
if I was building an automated industrial saw (based on a VP+IRQ-thread
kernel or a combination of the above-mentioned agregate) with a
safety mechanism that depended on the kernel's responsivness to
outside events to avoid bodily harm, would you be willing to put your
hand beneath it?

How about things like a hard-rt deterministic nanosleep() 100% of the
time with RTAI/fusion?

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546

2004-10-09 20:19:20

by Robert Love

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 15:47 -0400, Lee Revell wrote:

> Yes. The upper bound on the response time of an RT task is a function
> of the longest non-preemptible code path in the kernel. Currently this
> is the processing of a single packet by netif_receive_skb.
>
> AIUI hard realtime is about bounded response times. How does this not
> qualify?

I am actually in agreement with you, favoring this soft real-time
approach, but this is not bounded response time or determinism. There
are no guarantees, no measurements conducted with all possible inputs,
sizes, errors, and so on. This soft real-time approach gives great
average case--but the worst case is only a measurement on a specific
machine in a specific workload.

Robert Love

2004-10-09 20:18:49

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 16:11, Karim Yaghmour wrote:
> Lee Revell wrote:
> > Yes. The upper bound on the response time of an RT task is a function
> > of the longest non-preemptible code path in the kernel. Currently this
> > is the processing of a single packet by netif_receive_skb.
>
> And this has been demonstrated mathematically/algorithmically to be
> true 100% of the time, regardless of the load and the driver set? IOW,
> if I was building an automated industrial saw (based on a VP+IRQ-thread
> kernel or a combination of the above-mentioned agregate) with a
> safety mechanism that depended on the kernel's responsivness to
> outside events to avoid bodily harm, would you be willing to put your
> hand beneath it?
>

In theory, I think yes, if all IRQs on the system run in threads except
the saw interrupt, and the RT task that controls the saw runs at a
higher priority than all the IRQ threads. You can guarantee that other
interrupts won't delay the saw, because the saw irq is the only thing on
the system that runs in interrupt context. With the current VP
implementation you are still bounded by the longest non-preemptible code
path in the kernel AKA the longest time that a spinlock is held.
Replacing most spinlocks with mutexes reduces this to less than 20 code
paths according to Mvista, which then can be individually audited for
RT-safeness.

That being said, no way would I put my hand under the saw with the
current implementation. But, unless I am missing something, it seems
like this kind of determinism is possible with the Mvista design.

> How about things like a hard-rt deterministic nanosleep() 100% of the
> time with RTAI/fusion?

I will check that out, I have not looked at RTAI in over a year.

Lee

2004-10-09 20:27:52

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 16:20, Robert Love wrote:
> On Sat, 2004-10-09 at 15:47 -0400, Lee Revell wrote:
>
> > Yes. The upper bound on the response time of an RT task is a function
> > of the longest non-preemptible code path in the kernel. Currently this
> > is the processing of a single packet by netif_receive_skb.
> >
> > AIUI hard realtime is about bounded response times. How does this not
> > qualify?
>
> I am actually in agreement with you, favoring this soft real-time
> approach, but this is not bounded response time or determinism. There
> are no guarantees, no measurements conducted with all possible inputs,
> sizes, errors, and so on. This soft real-time approach gives great
> average case--but the worst case is only a measurement on a specific
> machine in a specific workload.

I did not mean to say that VP approach alone can do hard realtime, that
was just an example. But, when combined the MontaVista approach of
turning all but ~20 spinlocks into mutexes, it seems like the amount of
non-preemptible code is small enough that you could analyze it all and
start to make hard RT guarantees.

Lee

2004-10-09 20:51:35

by Karim Yaghmour

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Lee Revell wrote:
> In theory, I think yes, if all IRQs on the system run in threads except
> the saw interrupt, and the RT task that controls the saw runs at a
> higher priority than all the IRQ threads. You can guarantee that other
> interrupts won't delay the saw, because the saw irq is the only thing on
> the system that runs in interrupt context. With the current VP
> implementation you are still bounded by the longest non-preemptible code
> path in the kernel AKA the longest time that a spinlock is held.
> Replacing most spinlocks with mutexes reduces this to less than 20 code
> paths according to Mvista, which then can be individually audited for
> RT-safeness.
>
> That being said, no way would I put my hand under the saw with the
> current implementation. But, unless I am missing something, it seems
> like this kind of determinism is possible with the Mvista design.

It may be a question of taste, but even if that did work, which I am
not convinced of, it seems to me that it's awfully convoluted.
With the current interrupt pipeline mechanism part of Adeos, on
which RTAI and RTAI fusion are built, I can give you absolute hard-rt
deterministic guarantees while keeping the spinlocks intact, and not
having to check for the rt-safeness of any part of the kernel. You
just write the time-sensitive saw driver int handler in front of
Linux in the ipipe and you're done: 100% deterministic hard-rt,
regardless of the application load and the driver set.

> I will check that out, I have not looked at RTAI in over a year.

Here are some interesting links:

RTAI/fusion presentation by Philipppe Gerum last July (see slide 25
for some interesting numbers):
http://www.enseirb.fr/~kadionik/rmll2004/presentation/philippe_gerum.pdf
Here's a thread that explains the details about RTAI/fusion:
https://mail.rtai.org/pipermail/rtai/2004-June/thread.html#7909
Here's the ipipe core API:
http://home.gna.org/adeos/doc/api/interface_8h.html

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546

2004-10-09 20:59:26

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 16:53, Karim Yaghmour wrote:
> Lee Revell wrote:
> > In theory, I think yes, if all IRQs on the system run in threads except
> > the saw interrupt, and the RT task that controls the saw runs at a
> > higher priority than all the IRQ threads. You can guarantee that other
> > interrupts won't delay the saw, because the saw irq is the only thing on
> > the system that runs in interrupt context. With the current VP
> > implementation you are still bounded by the longest non-preemptible code
> > path in the kernel AKA the longest time that a spinlock is held.
> > Replacing most spinlocks with mutexes reduces this to less than 20 code
> > paths according to Mvista, which then can be individually audited for
> > RT-safeness.
> >
> > That being said, no way would I put my hand under the saw with the
> > current implementation. But, unless I am missing something, it seems
> > like this kind of determinism is possible with the Mvista design.
>
> It may be a question of taste, but even if that did work, which I am
> not convinced of, it seems to me that it's awfully convoluted.
> With the current interrupt pipeline mechanism part of Adeos, on
> which RTAI and RTAI fusion are built, I can give you absolute hard-rt
> deterministic guarantees while keeping the spinlocks intact, and not
> having to check for the rt-safeness of any part of the kernel. You
> just write the time-sensitive saw driver int handler in front of
> Linux in the ipipe and you're done: 100% deterministic hard-rt,
> regardless of the application load and the driver set.

True, there are probably too many "ifs" in my above statement for a saw
or an airplane or a power plant. There does seem to be a gray area
between soft and hard realtime, where either approach could be
reasonable. For example the Mt. St. Helens example, where you could
miss a sample and it would be really bad, but not kill anyone.

Lee

2004-10-09 21:20:39

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 09:15, M?ns Rullg?rd wrote:
> I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
> Now it seems to be running quite well. I am, however, getting
> occasional "bad: scheduling while atomic!" messages, all alike:
>

I am getting the same message. Also, leaving all the default debug
options on, I got this debug output, but it did not coincide with the
"bad" messages.

Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)

Lee

2004-10-09 21:35:34

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Lee Revell <[email protected]> writes:

> On Sat, 2004-10-09 at 09:15, M?ns Rullg?rd wrote:
>> I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
>> Now it seems to be running quite well. I am, however, getting
>> occasional "bad: scheduling while atomic!" messages, all alike:
>>
>
> I am getting the same message. Also, leaving all the default debug
> options on, I got this debug output, but it did not coincide with the
> "bad" messages.
>
> Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
> Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
> Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)

Well, those don't give me any clues.

I had the system running that kernel for a bit over an hour and got
five of the "bad" messages, approximately evenly spaced in a
two-minute interval about 20 minutes after boot.

I did notice one improvement compared to vanilla 2.6.8.1. The sound
didn't skip when I switched from X to a text console. However, my
keyboard no longer worked in X, but that seems to be due to some
recent changes to the input subsystem.

Did you build it with our without my patch, BTW?

--
M?ns Rullg?rd
[email protected]

2004-10-09 21:41:19

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 17:35, M?ns Rullg?rd wrote:

> I did notice one improvement compared to vanilla 2.6.8.1. The sound
> didn't skip when I switched from X to a text console. However, my
> keyboard no longer worked in X, but that seems to be due to some
> recent changes to the input subsystem.
>
> Did you build it with our without my patch, BTW?

With. Most of the modules did not work without your patch.

Lee

2004-10-09 21:45:27

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Lee Revell <[email protected]> writes:

> On Sat, 2004-10-09 at 17:35, M?ns Rullg?rd wrote:
>
>> I did notice one improvement compared to vanilla 2.6.8.1. The sound
>> didn't skip when I switched from X to a text console. However, my
>> keyboard no longer worked in X, but that seems to be due to some
>> recent changes to the input subsystem.
>>
>> Did you build it with our without my patch, BTW?
>
> With. Most of the modules did not work without your patch.

Do the Montavista folks build their kernels without modules?

--
M?ns Rullg?rd
[email protected]

2004-10-09 22:04:18

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 17:35, M?ns Rullg?rd wrote:
> Lee Revell <[email protected]> writes:
>
> > On Sat, 2004-10-09 at 09:15, M?ns Rullg?rd wrote:
> >> I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
> >> Now it seems to be running quite well. I am, however, getting
> >> occasional "bad: scheduling while atomic!" messages, all alike:
> >>
> >
> > I am getting the same message. Also, leaving all the default debug
> > options on, I got this debug output, but it did not coincide with the
> > "bad" messages.
> >
> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>
> Well, those don't give me any clues.
>
> I had the system running that kernel for a bit over an hour and got
> five of the "bad" messages, approximately evenly spaced in a
> two-minute interval about 20 minutes after boot.
>

I am getting these too:

bad: scheduling while atomic!
[<c0279c5a>] schedule+0x62a/0x630
[<c013b137>] kmutex_unlock+0x37/0x50
[<c013ab0d>] __p_mutex_down+0x1ed/0x360
[<c013b1e0>] kmutex_is_locked+0x20/0x40
[<c01cba47>] journal_dirty_data+0x77/0x230
[<c01bf2e2>] ext3_journal_dirty_data+0x12/0x40
[<c01bf150>] walk_page_buffers+0x60/0x70
[<c01bf7c7>] ext3_ordered_writepage+0xf7/0x160
[<c01bf6b0>] journal_dirty_data_fn+0x0/0x20
[<c018067d>] mpage_writepages+0x29d/0x3e0
[<c01bf6d0>] ext3_ordered_writepage+0x0/0x160
[<c0141c09>] do_writepages+0x39/0x50
[<c017ec5f>] __sync_single_inode+0x5f/0x220
[<c017f0b7>] sync_sb_inodes+0x1c7/0x2e0
[<c017f2c7>] writeback_inodes+0xf7/0x110
[<c0141a03>] wb_kupdate+0x93/0x100
[<c0142ccf>] __pdflush+0x2af/0x5a0
[<c0142fc0>] pdflush+0x0/0x30
[<c0142fde>] pdflush+0x1e/0x30
[<c0141970>] wb_kupdate+0x0/0x100
[<c0134af3>] kthread+0xa3/0xb0
[<c0134a50>] kthread+0x0/0xb0
[<c0103fe5>] kernel_thread_helper+0x5/0x10

Lee

2004-10-09 22:22:10

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Lee Revell <[email protected]> writes:

> On Sat, 2004-10-09 at 17:35, M?ns Rullg?rd wrote:
>> Lee Revell <[email protected]> writes:
>>
>> > On Sat, 2004-10-09 at 09:15, M?ns Rullg?rd wrote:
>> >> I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
>> >> Now it seems to be running quite well. I am, however, getting
>> >> occasional "bad: scheduling while atomic!" messages, all alike:
>> >>
>> >
>> > I am getting the same message. Also, leaving all the default debug
>> > options on, I got this debug output, but it did not coincide with the
>> > "bad" messages.
>> >
>> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
>> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
>> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
>> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>>
>> Well, those don't give me any clues.
>>
>> I had the system running that kernel for a bit over an hour and got
>> five of the "bad" messages, approximately evenly spaced in a
>> two-minute interval about 20 minutes after boot.
>>
>
> I am getting these too:
>
> bad: scheduling while atomic!
> [<c0279c5a>] schedule+0x62a/0x630
> [<c013b137>] kmutex_unlock+0x37/0x50
> [<c013ab0d>] __p_mutex_down+0x1ed/0x360
> [<c013b1e0>] kmutex_is_locked+0x20/0x40
> [<c01cba47>] journal_dirty_data+0x77/0x230
> [<c01bf2e2>] ext3_journal_dirty_data+0x12/0x40

My machine is mostly XFS, which might explain why I haven't seen any
of those. I've found XFS to perform better with the multi-gigabyte
files I often deal with.

--
M?ns Rullg?rd
[email protected]

2004-10-09 22:55:38

[permalink] [raw]

Subject: RE: [ANNOUNCE] Linux 2.6 Real Time Kernel

Thanks for giving it a try!

The "bad: scheduling while atomic!" are indicative
of blocking on a mutex while holding a spinlock.

You can see the __p_mutex_down in the trace.

See the notes in the original announce
regarding the partitioning : work in progress.

I can't offer a fix for that now, but I will
post an updated mutex patch for the
EXPORT_SYMBOLS / module build issue.

Sven

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of M?ns Rullg?rd
> Sent: Saturday, October 09, 2004 6:15 AM
> To: [email protected]
> Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel
>
>
>
> I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
> Now it seems to be running quite well. I am, however, getting
> occasional "bad: scheduling while atomic!" messages, all alike:
>
> bad: scheduling while atomic!
> [<c02ef301>] schedule+0x4e5/0x4ea
> [<c0114cbe>] try_to_wake_up+0x99/0xa8
> [<c01332e2>] __p_mutex_down+0xfe/0x190
> [<c029e238>] alloc_skb+0x32/0xc3
> [<c01335e0>] kmutex_is_locked+0x1f/0x33
> [<c029fa63>] skb_queue_tail+0x1c/0x45
> [<c02eb43e>] unix_stream_sendmsg+0x22c/0x38c
> [<c029ab03>] sock_sendmsg+0xc9/0xe3
> [<c029f95a>] skb_dequeue+0x4a/0x5b
> [<c02eb9e6>] unix_stream_recvmsg+0x119/0x430
> [<c0137f06>] __alloc_pages+0x1cc/0x33f
> [<c01168ca>] autoremove_wake_function+0x0/0x43
> [<c029af4b>] sock_readv_writev+0x6e/0x97
> [<c029afec>] sock_writev+0x37/0x3e
> [<c029afb5>] sock_writev+0x0/0x3e
> [<c014ebb8>] do_readv_writev+0x1db/0x21f
> [<c01168ca>] autoremove_wake_function+0x0/0x43
> [<c014e5ca>] vfs_read+0xd0/0xf5
> [<c014ec94>] vfs_writev+0x49/0x52
> [<c014ed5a>] sys_writev+0x47/0x76
> [<c0103f09>] sysenter_past_esp+0x52/0x71
>
> USB, sound and wireless are all working nicely.
>
> Now the patch:
>
> --- kernel/kmutex.c~ 2004-10-09 12:51:37 +02:00
> +++ kernel/kmutex.c 2004-10-09 13:50:43 +02:00
> @@ -20,6 +20,7 @@
> #include <linux/config.h>
> #include <linux/kmutex.h>
> #include <linux/sched.h>
> +#include <linux/module.h>
>
> # if defined CONFIG_PMUTEX
> # include <linux/pmutex.h>
> @@ -40,11 +41,14 @@
> return p_mutex_trylock(&(lock->kmtx));
> }
>
> +EXPORT_SYMBOL(kmutex_trylock);
>
> inline int kmutex_is_locked(struct kmutex *lock)
> {
> return p_mutex_is_locked(&(lock->kmtx));
> }
> +
> +EXPORT_SYMBOL(kmutex_is_locked);
> # endif
>
>
> @@ -60,6 +64,7 @@
> #endif
> }
>
> +EXPORT_SYMBOL(kmutex_init);
>
> /*
> * print warning is case kmutex_lock is called while preempt count is
> @@ -88,6 +93,8 @@
> #endif
> }
>
> +EXPORT_SYMBOL(kmutex_lock);
> +
> void kmutex_unlock(struct kmutex *lock)
> {
> #if defined CONFIG_KMUTEX_DEBUG
> @@ -102,6 +109,7 @@
> #endif
> }
>
> +EXPORT_SYMBOL(kmutex_unlock);
>
> void kmutex_unlock_wait(struct kmutex * lock)
> {
> @@ -111,4 +119,4 @@
> }
> }
>
> -
> +EXPORT_SYMBOL(kmutex_unlock_wait);
>
>
> --
> M?ns Rullg?rd
> [email protected]
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2004-10-09 23:21:07

by Dave Hansen

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 00:33, Daniel Walker wrote:
> Do you have 4k stacks turned off? The docs make note of this.

Isn't this a better thing to spell out in a Kconfig file than some
documentation?

-- Dave

2004-10-09 23:24:56

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 19:20, Dave Hansen wrote:
> On Sat, 2004-10-09 at 00:33, Daniel Walker wrote:
> > Do you have 4k stacks turned off? The docs make note of this.
>
> Isn't this a better thing to spell out in a Kconfig file than some
> documentation?

FWIW I did see this in the docs, it's clearly stated, I just forgot that
I had enabled 4k stacks.

Lee

2004-10-09 23:42:38

by Matthias Urlichs

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Hi, Lee Revell wrote:

> On Sat, 2004-10-09 at 03:33, Daniel Walker wrote:
>> Do you have 4k stacks turned off? The docs make note of this.
>>
>
> My mistake, it works now.
>
Actually, if 4k stacks don't work with RT turned on, this exclusion should
be encoded in the appropriate Kconfig file(s).

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | [email protected]

2004-10-09 23:52:48

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

2004-10-10 00:05:40

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

2004-10-10 00:42:20

by Micha

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, Oct 09, 2004 at 11:35:16PM +0200, M?ns Rullg?rd wrote:
> Lee Revell <[email protected]> writes:
>
> > On Sat, 2004-10-09 at 09:15, M?ns Rullg?rd wrote:
> >> I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
> >> Now it seems to be running quite well. I am, however, getting
> >> occasional "bad: scheduling while atomic!" messages, all alike:
> >>
> >
> > I am getting the same message. Also, leaving all the default debug
> > options on, I got this debug output, but it did not coincide with the
> > "bad" messages.
> >
> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>
> Well, those don't give me any clues.
>
> I had the system running that kernel for a bit over an hour and got
> five of the "bad" messages, approximately evenly spaced in a
> two-minute interval about 20 minutes after boot.
>
> I did notice one improvement compared to vanilla 2.6.8.1. The sound
> didn't skip when I switched from X to a text console. However, my
> keyboard no longer worked in X, but that seems to be due to some
> recent changes to the input subsystem.

There was some change in 2.6.9-pre-something that cause the mouse and
keyboard to exchange event interfaces between them, if it interests you.

>
> Did you build it with our without my patch, BTW?
>
> --
> M?ns Rullg?rd
> [email protected]
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
> +++++++++++++++++++++++++++++++++++++++++++
> This Mail Was Scanned By Mail-seCure System
> at the Tel-Aviv University CC.
>

2004-10-10 00:45:49

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 20:05, M?ns Rullg?rd wrote:
> Lee Revell <[email protected]> writes:
>
> > On Sat, 2004-10-09 at 17:35, M?ns Rullg?rd wrote:
> >> Lee Revell <[email protected]> writes:
> >>
> >> > On Sat, 2004-10-09 at 09:15, M?ns Rullg?rd wrote:
> >> >> I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
> >> >> Now it seems to be running quite well. I am, however, getting
> >> >> occasional "bad: scheduling while atomic!" messages, all alike:
> >> >>
> >> >
> >> > I am getting the same message. Also, leaving all the default debug
> >> > options on, I got this debug output, but it did not coincide with the
> >> > "bad" messages.
> >> >
> >> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> >> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
> >> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> >> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
> >> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
> >> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
> >>
> >> Well, those don't give me any clues.
> >
> > Pid 773 is the IRQ thread for eth0. I am using the via-rhine driver.
>
> I was using a prism54 wireless card.

OK, first bug: I lost my PS/2 keyboard, and had to reboot to get it
back. Unplugging and replugging it made Num Lock work again, but the
system did not respond to the keyboard at all. USB mouse continued to
work fine.

Lee

2004-10-10 01:06:15

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Lee Revell <[email protected]> writes:

> On Sat, 2004-10-09 at 20:05, M?ns Rullg?rd wrote:
>> Lee Revell <[email protected]> writes:
>>
>> > On Sat, 2004-10-09 at 17:35, M?ns Rullg?rd wrote:
>> >> Lee Revell <[email protected]> writes:
>> >>
>> >> > On Sat, 2004-10-09 at 09:15, M?ns Rullg?rd wrote:
>> >> >> I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
>> >> >> Now it seems to be running quite well. I am, however, getting
>> >> >> occasional "bad: scheduling while atomic!" messages, all alike:
>> >> >>
>> >> >
>> >> > I am getting the same message. Also, leaving all the default debug
>> >> > options on, I got this debug output, but it did not coincide with the
>> >> > "bad" messages.
>> >> >
>> >> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
>> >> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>> >> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
>> >> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>> >> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
>> >> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>> >>
>> >> Well, those don't give me any clues.
>> >
>> > Pid 773 is the IRQ thread for eth0. I am using the via-rhine driver.
>>
>> I was using a prism54 wireless card.
>
> OK, first bug: I lost my PS/2 keyboard, and had to reboot to get it
> back. Unplugging and replugging it made Num Lock work again, but the
> system did not respond to the keyboard at all. USB mouse continued to
> work fine.

I lost my keyboard as well, though only in X, but I figured that could
be caused by some changes to the input layer that went in between
2.6.9-rc2 and -rc3. My synaptics touchpad also stopped working
properly. USB keyboard and mouse worked properly.

--
M?ns Rullg?rd
[email protected]

2004-10-10 01:08:50

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Micha Feigin <[email protected]> writes:

> On Sat, Oct 09, 2004 at 11:35:16PM +0200, M?ns Rullg?rd wrote:
>> Lee Revell <[email protected]> writes:
>>
>> > On Sat, 2004-10-09 at 09:15, M?ns Rullg?rd wrote:
>> >> I got this thing to build by adding a few EXPORT_SYMBOL, patch below.
>> >> Now it seems to be running quite well. I am, however, getting
>> >> occasional "bad: scheduling while atomic!" messages, all alike:
>> >>
>> >
>> > I am getting the same message. Also, leaving all the default debug
>> > options on, I got this debug output, but it did not coincide with the
>> > "bad" messages.
>> >
>> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
>> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
>> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>> > Mtx: dd84e644 [773] pri (0) inherit from [3] pri(92)
>> > Mtx dd84e644 task [773] pri (92) restored pri(0). Next owner [3] pri (92)
>>
>> Well, those don't give me any clues.
>>
>> I had the system running that kernel for a bit over an hour and got
>> five of the "bad" messages, approximately evenly spaced in a
>> two-minute interval about 20 minutes after boot.
>>
>> I did notice one improvement compared to vanilla 2.6.8.1. The sound
>> didn't skip when I switched from X to a text console. However, my
>> keyboard no longer worked in X, but that seems to be due to some
>> recent changes to the input subsystem.
>
> There was some change in 2.6.9-pre-something that cause the mouse and
> keyboard to exchange event interfaces between them, if it interests you.

The keyboard doesn't have a device entry in my X config file, and I
suppose the mouse would still be at /dev/input/mice, no?

--
M?ns Rullg?rd
[email protected]

2004-10-10 01:09:19

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 21:05, M?ns Rullg?rd wrote:
> > OK, first bug: I lost my PS/2 keyboard, and had to reboot to get it
> > back. Unplugging and replugging it made Num Lock work again, but the
> > system did not respond to the keyboard at all. USB mouse continued to
> > work fine.
>
> I lost my keyboard as well, though only in X, but I figured that could
> be caused by some changes to the input layer that went in between
> 2.6.9-rc2 and -rc3. My synaptics touchpad also stopped working
> properly. USB keyboard and mouse worked properly.

Looks like the same areas that were problematic with the VP kernel will
be an issue here. I suspect many of the fixes already exist in Ingo's
patch or in -mm.

I think my keyboard issue is different because it worked fine, then I
lost it suddenly.

Lee

2004-10-10 01:15:49

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sat, 2004-10-09 at 01:59, Sven-Thorsten Dietrich wrote:
> Announcing the availability of prototype real-time (RT)
> enhancements to the Linux 2.6 kernel.
>

More "scheduling while atomic":

Oct 9 21:06:55 krustophenia kernel: bad: scheduling while atomic!
Oct 9 21:06:55 krustophenia kernel: [schedule+1578/1584] schedule+0x62a/0x630
Oct 9 21:06:55 krustophenia kernel: [__p_mutex_down+493/864] __p_mutex_down+0x1ed/0x360
Oct 9 21:06:55 krustophenia kernel: [kmutex_is_locked+32/64] kmutex_is_locked+0x20/0x40
Oct 9 21:06:55 krustophenia kernel: [pg0+509195408/1070195712] snd_emu10k1_ptr_read+0xc0/0xe0 [snd_emu10k1]
Oct 9 21:06:55 krustophenia kernel: [pg0+509190243/1070195712] snd_emu10k1_capture_pointer+0x33/0x70 [snd_emu10k1]
Oct 9 21:06:55 krustophenia kernel: [pg0+508937310/1070195712] snd_pcm_period_elapsed+0xde/0x3d0 [snd_pcm]
Oct 9 21:06:55 krustophenia kernel: [pg0+509178406/1070195712] snd_emu10k1_interrupt+0xd6/0x400 [snd_emu10k1]
Oct 9 21:06:55 krustophenia kernel: [generic_handle_IRQ_event+49/96] generic_handle_IRQ_event+0x31/0x60
Oct 9 21:06:55 krustophenia kernel: [do_IRQ+317/848] do_IRQ+0x13d/0x350
Oct 9 21:06:55 krustophenia kernel: [common_interrupt+24/32] common_interrupt+0x18/0x20
Oct 9 21:06:55 krustophenia kernel: [pg0+509195625/1070195712] snd_emu10k1_ptr_write+0xb9/0xc0 [snd_emu10k1]
Oct 9 21:06:55 krustophenia kernel: [pg0+509174891/1070195712] snd_emu10k1_voice_init+0x11b/0x1e0 [snd_emu10k1]
Oct 9 21:06:55 krustophenia kernel: [pg0+509182616/1070195712] snd_emu10k1_voice_free+0x38/0x70 [snd_emu10k1]
Oct 9 21:06:55 krustophenia kernel: [pg0+509187993/1070195712] snd_emu10k1_playback_hw_free+0x99/0xd0 [snd_emu10k1]
Oct 9 21:06:55 krustophenia kernel: [pg0+510049279/1070195712] snd_pcm_oss_release_file+0xbf/0x110 [snd_pcm_oss]
Oct 9 21:06:55 krustophenia kernel: [pg0+510051019/1070195712] snd_pcm_oss_release+0x4b/0x100 [snd_pcm_oss]
Oct 9 21:06:55 krustophenia kernel: [__fput+292/320] __fput+0x124/0x140
Oct 9 21:06:55 krustophenia kernel: [filp_close+67/112] filp_close+0x43/0x70
Oct 9 21:06:55 krustophenia kernel: [sys_close+88/112] sys_close+0x58/0x70
Oct 9 21:06:55 krustophenia kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Oct 9 21:06:56 krustophenia kernel: Mtx: ddfc1ed0 [1445] pri (0) inherit from [1495] pri(10)
Oct 9 21:06:56 krustophenia kernel: bad: scheduling while atomic!
Oct 9 21:06:56 krustophenia kernel: [schedule+1578/1584] schedule+0x62a/0x630
Oct 9 21:06:56 krustophenia kernel: [__p_mutex_down+493/864] __p_mutex_down+0x1ed/0x360
Oct 9 21:06:56 krustophenia kernel: [kmutex_is_locked+32/64] kmutex_is_locked+0x20/0x40
Oct 9 21:06:56 krustophenia kernel: [pg0+508918664/1070195712] snd_pcm_capture_poll+0x48/0x120 [snd_pcm]
Oct 9 21:06:56 krustophenia kernel: [do_pollfd+125/144] do_pollfd+0x7d/0x90
Oct 9 21:06:56 krustophenia kernel: [do_poll+95/192] do_poll+0x5f/0xc0
Oct 9 21:06:56 krustophenia kernel: [sys_poll+330/560] sys_poll+0x14a/0x230
Oct 9 21:06:56 krustophenia kernel: [__pollwait+0/160] __pollwait+0x0/0xa0
Oct 9 21:06:56 krustophenia kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Oct 9 21:09:27 krustophenia kernel: Mtx: cde58020 [1383] pri (0) inherit from [3] pri(92)
Oct 9 21:09:27 krustophenia kernel: bad: scheduling while atomic!
Oct 9 21:09:27 krustophenia kernel: [schedule+1578/1584] schedule+0x62a/0x630
Oct 9 21:09:27 krustophenia kernel: [__p_mutex_down+493/864] __p_mutex_down+0x1ed/0x360
Oct 9 21:09:27 krustophenia kernel: [kmutex_is_locked+32/64] kmutex_is_locked+0x20/0x40
Oct 9 21:09:27 krustophenia kernel: [tcp_v4_rcv+1207/2048] tcp_v4_rcv+0x4b7/0x800
Oct 9 21:09:27 krustophenia kernel: [ip_local_deliver+154/304] ip_local_deliver+0x9a/0x130
Oct 9 21:09:27 krustophenia kernel: [ip_rcv+729/992] ip_rcv+0x2d9/0x3e0
Oct 9 21:09:27 krustophenia kernel: [netif_receive_skb+264/448] netif_receive_skb+0x108/0x1c0
Oct 9 21:09:27 krustophenia kernel: [process_backlog+125/272] process_backlog+0x7d/0x110
Oct 9 21:09:27 krustophenia kernel: [ksoftirqd_high_prio+0/192] ksoftirqd_high_prio+0x0/0xc0
Oct 9 21:09:27 krustophenia kernel: [net_rx_action+108/256] net_rx_action+0x6c/0x100
Oct 9 21:09:27 krustophenia kernel: [__do_softirq+99/112] __do_softirq+0x63/0x70
Oct 9 21:09:27 krustophenia kernel: [do_softirq+53/64] do_softirq+0x35/0x40
Oct 9 21:09:27 krustophenia kernel: [ksoftirqd_high_prio+133/192] ksoftirqd_high_prio+0x85/0xc0
Oct 9 21:09:27 krustophenia kernel: [kthread+163/176] kthread+0xa3/0xb0
Oct 9 21:09:27 krustophenia kernel: [kthread+0/176] kthread+0x0/0xb0
Oct 9 21:09:27 krustophenia kernel: [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Oct 9 21:09:42 krustophenia kernel: Mtx: cbafd9a0 [1388] pri (0) inherit from [3] pri(92)
Oct 9 21:09:42 krustophenia kernel: bad: scheduling while atomic!
Oct 9 21:09:42 krustophenia kernel: [schedule+1578/1584] schedule+0x62a/0x630
Oct 9 21:09:42 krustophenia kernel: [__p_mutex_down+493/864] __p_mutex_down+0x1ed/0x360
Oct 9 21:09:42 krustophenia kernel: [kmutex_is_locked+32/64] kmutex_is_locked+0x20/0x40
Oct 9 21:09:42 krustophenia kernel: [tcp_v4_rcv+1207/2048] tcp_v4_rcv+0x4b7/0x800
Oct 9 21:09:42 krustophenia kernel: [ip_local_deliver+154/304] ip_local_deliver+0x9a/0x130
Oct 9 21:09:42 krustophenia kernel: [ip_rcv+729/992] ip_rcv+0x2d9/0x3e0
Oct 9 21:09:42 krustophenia kernel: [netif_receive_skb+264/448] netif_receive_skb+0x108/0x1c0
Oct 9 21:09:42 krustophenia kernel: [process_backlog+125/272] process_backlog+0x7d/0x110
Oct 9 21:09:42 krustophenia kernel: [ksoftirqd_high_prio+0/192] ksoftirqd_high_prio+0x0/0xc0
Oct 9 21:09:42 krustophenia kernel: [net_rx_action+108/256] net_rx_action+0x6c/0x100
Oct 9 21:09:42 krustophenia kernel: [__do_softirq+99/112] __do_softirq+0x63/0x70
Oct 9 21:09:42 krustophenia kernel: [do_softirq+53/64] do_softirq+0x35/0x40
Oct 9 21:09:42 krustophenia kernel: [ksoftirqd_high_prio+133/192] ksoftirqd_high_prio+0x85/0xc0
Oct 9 21:09:42 krustophenia kernel: [kthread+163/176] kthread+0xa3/0xb0
Oct 9 21:09:42 krustophenia kernel: [kthread+0/176] kthread+0x0/0xb0
Oct 9 21:09:42 krustophenia kernel: [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Oct 9 21:09:42 krustophenia kernel: Mtx cbafd9a0 task [1388] pri (92) restored pri(0). Next owner [3] pri (92)

Looks like the Mtx debug messages are related.

Lee

2004-10-10 08:45:13

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

* Sven-Thorsten Dietrich <[email protected]> wrote:

> Announcing the availability of prototype real-time (RT)
> enhancements to the Linux 2.6 kernel.
>
> We will submit 3 additional emails following this one, containing
> the remaining 3 patches (of 4) inline, with their descriptions.

cool! Basically the biggest problem is not the technology itself, but
its proper integration into Linux. As it can be seen from the 2.4 RT
patches (TimeSys and yours), just walking the path towards a fully
preemptible kernel is not fruitful because it generates lots of huge,
intrusive patches that end up being unmaintainable forks of the Linux
tree.

the other approach is what i'm currently doing with the
voluntary-preempt patchset: to improve the generic kernel for latency
purposes without actually adding too many extra features. Here is what
is happening in the -mm tree right now:

- the generic irq subsystem: irq threading is a simple ~200-lines,
architecture-independent add-on to this. It makes no sense to offer 3
different implementations - pick one and help make it work well.

- preemptible BKL. Related to this is new debugging infrastructure in
-mm that allows the safe and slow conversion of spinlocks to mutexes.
In the case of the BKL this conversion is expected to be permanent,
for most of the other spinlocks it will be optional - but the
debugging code can still be used.

- various fixes and latency improvements. A mutex based kernel is of
little use if the only code you can execute reliably is user-space
code and the moment you hit kernel-space your RT app is exposed to
high latencies.

A couple of suggestions wrt. how to speed up the integration effort: you
might want to rebase this stuff to the -mm tree. Also, what i dont see
in your (and others') patches (yet?) is some of the harder stuff:

- the handling of per-CPU data structures (get_cpu_var())

- RCU and softirq data structures

- the handling of the IRQ flag

These are basic correctness issues that affect UP just as much as SMP.
Without these the kernel is still not a "fully preemptible" kernel.
These need infrastructure changes too, so they must preceed any addition
of a spinlock -> mutex conversion feature.

So the mutex patch will probably the one that can go upstream _last_,
which will do the "final step" of making the kernel fully preemptible.

Ingo

2004-10-10 12:21:57

by John Richard Moser

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sven-Thorsten Dietrich wrote:
|
| Announcing the availability of prototype real-time (RT)
| enhancements to the Linux 2.6 kernel.
|
| We will submit 3 additional emails following this one, containing
| the remaining 3 patches (of 4) inline, with their descriptions.
|
| Download:
|
| Patches against the Linux-2.6.9-rc3 kernel are available at:
|
| ftp://source.mvista.com/pub/realtime/Linux-2.6.9-rc3-RT_irqthreads.patch
| ftp://source.mvista.com/pub/realtime/Linux-2.6.9-rc3-RT_mutex.patch
| ftp://source.mvista.com/pub/realtime/Linux-2.6.9-rc3-RT_spinlock1.patch
| ftp://source.mvista.com/pub/realtime/Linux-2.6.9-rc3-RT_spinlock2.patch
|
| The patches are to be applied to the linux-2.6.9-rc3 kernel in the
| order listed above.

Does any of this 'work' on x86_64 yet? I heard that Ingo's voluntary
pre-empt was x86 only and didn't work on amd64; this stuff's kinda new,
does it work outside x86 yet?

I'd like to see what these kinds of things do. :)

[...]

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBaSk6hDd4aOud5P8RAotcAJ9GgA3P1mAG/CpdlJDknGK6zwA92QCePZi4
AyNDvW6urtDNdvJAPDMZZfk=
=gVeZ
-----END PGP SIGNATURE-----

2004-10-10 17:29:47

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sun, 2004-10-10 at 05:21, John Richard Moser wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>
> Sven-Thorsten Dietrich wrote:
> |
> | Announcing the availability of prototype real-time (RT)
> | enhancements to the Linux 2.6 kernel.
> |
> | We will submit 3 additional emails following this one, containing
> | the remaining 3 patches (of 4) inline, with their descriptions.
> |
> | Download:
> |
> | Patches against the Linux-2.6.9-rc3 kernel are available at:
> |
> | ftp://source.mvista.com/pub/realtime/Linux-2.6.9-rc3-RT_irqthreads.patch
> | ftp://source.mvista.com/pub/realtime/Linux-2.6.9-rc3-RT_mutex.patch
> | ftp://source.mvista.com/pub/realtime/Linux-2.6.9-rc3-RT_spinlock1.patch
> | ftp://source.mvista.com/pub/realtime/Linux-2.6.9-rc3-RT_spinlock2.patch
> |
> | The patches are to be applied to the linux-2.6.9-rc3 kernel in the
> | order listed above.
>
> Does any of this 'work' on x86_64 yet? I heard that Ingo's voluntary
> pre-empt was x86 only and didn't work on amd64; this stuff's kinda new,
> does it work outside x86 yet?
>
> I'd like to see what these kinds of things do. :)

No it's x86 only right now. The mutex is partly in assembly, and the
IRQ threads that we are using are (both of them) x86 only.

Daniel Walker

2004-10-10 17:32:26

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sun, 2004-10-10 at 08:21, John Richard Moser wrote:

> Does any of this 'work' on x86_64 yet? I heard that Ingo's voluntary
> pre-empt was x86 only and didn't work on amd64; this stuff's kinda new,
> does it work outside x86 yet?
>
> I'd like to see what these kinds of things do. :)

The VP patches currently work on x86, x64, amd64, and ppc AFAIK. As
stated in the docs, the MontaVista stuff is x86 only right now.

My tests show the worst case latency with the MontaVista patches is
about twice that of the VP patches. Probably due to debug overhead and
a bug or two. But, as expected, the average case latency is _much_
better.

Here's the top of the VP histogram, delay is in usecs:

Delay #
0 5764433
1 3154867
2 461521
3 332445
4 403847
5 320120
6 237955
7 152418
8 94274
9 66496
10 52976
11 44605
12 38437
13 31620
14 27816
15 26845
16 23743
17 20648
18 21611
19 24853
20 30352
21 50046
22 101989
23 24843
24 28829
25 56247
26 42408
27 28228
28 20773
29 19521

Here's the top of the Mvista histogram:

Delay #
0 6771692
1 26
2 29
3 12
4 15
5 15
6 15
7 18
8 19
9 10
10 15
11 10
12 19
13 12
14 15
15 11
16 13
17 13
18 11
19 13
20 12
21 9
22 11
23 13
24 17
25 10
26 9
27 11
28 8
29 12

Lee

2004-10-10 18:45:14

by John Richard Moser

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lee Revell wrote:
| On Sun, 2004-10-10 at 08:21, John Richard Moser wrote:
|
|
|>Does any of this 'work' on x86_64 yet? I heard that Ingo's voluntary
|>pre-empt was x86 only and didn't work on amd64; this stuff's kinda new,
|>does it work outside x86 yet?
|>
|>I'd like to see what these kinds of things do. :)
|
|
| The VP patches currently work on x86, x64, amd64, and ppc AFAIK. As
| stated in the docs, the MontaVista stuff is x86 only right now.

Is there a stable amd64 voluntary pre-empt patch for 2.6.7? I'm using
PaX so I can't go up until the author catches up to the new VM changes
introduced in 2.6.8+.

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBaYM7hDd4aOud5P8RAl9tAJ9mJmKtt4p+I4iLh9u1hiFQXK1DlwCfbBhL
TTXwLyxVxwBNuZvnpfj5BN8=
=tbRd
-----END PGP SIGNATURE-----

2004-10-10 19:42:07

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sun, 2004-10-10 at 01:46, Ingo Molnar wrote:
> - the generic irq subsystem: irq threading is a simple ~200-lines,
> architecture-independent add-on to this. It makes no sense to offer 3
> different implementations - pick one and help make it work well.
>
> - preemptible BKL. Related to this is new debugging infrastructure in
> -mm that allows the safe and slow conversion of spinlocks to mutexes.
> In the case of the BKL this conversion is expected to be permanent,
> for most of the other spinlocks it will be optional - but the
> debugging code can still be used.

Are you referring to the lock metering? I've ported our changes to
-mm3-VP-T3 on top of lock metering. It needs some clean up but It will
be released soon. It's very similar to our rc3 release only without the
IRQ threads patch.

Daniel Walker

2004-10-10 20:19:22

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

* John Richard Moser <[email protected]> wrote:

> | The VP patches currently work on x86, x64, amd64, and ppc AFAIK. As
> | stated in the docs, the MontaVista stuff is x86 only right now.
>
> Is there a stable amd64 voluntary pre-empt patch for 2.6.7? I'm using
> PaX so I can't go up until the author catches up to the new VM changes
> introduced in 2.6.8+.

nope, latest -VP is against 2.6.9-rc3-mm3-ish kernels. Since half of -VP
is in -mm already in various forms of patches it would be quite hard to
extract all of that even against a vanilla 2.6.9-rc3 kernel - let alone
against 2.6.7.

Ingo

2004-10-10 20:44:11

by John Richard Moser

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ingo Molnar wrote:
| * John Richard Moser <[email protected]> wrote:
|
|

[...]

|
| nope, latest -VP is against 2.6.9-rc3-mm3-ish kernels. Since half of -VP
| is in -mm already in various forms of patches it would be quite hard to
| extract all of that even against a vanilla 2.6.9-rc3 kernel - let alone
| against 2.6.7.

Alright, I'll just wait for a new PaX patch then.

[...]

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBaZ8fhDd4aOud5P8RAjYlAJ98UZqYWigQacDJLg1BPHLgS9dxQgCggv0S
KDoa7bJJYso9DlRTwldbFlo=
=u2eR
-----END PGP SIGNATURE-----

2004-10-10 19:44:35

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

* Daniel Walker <[email protected]> wrote:

> On Sun, 2004-10-10 at 01:46, Ingo Molnar wrote:
> > - the generic irq subsystem: irq threading is a simple ~200-lines,
> > architecture-independent add-on to this. It makes no sense to offer 3
> > different implementations - pick one and help make it work well.
> >
> > - preemptible BKL. Related to this is new debugging infrastructure in
> > -mm that allows the safe and slow conversion of spinlocks to mutexes.
> > In the case of the BKL this conversion is expected to be permanent,
> > for most of the other spinlocks it will be optional - but the
> > debugging code can still be used.
>
> Are you referring to the lock metering? I've ported our changes
> to -mm3-VP-T3 on top of lock metering. It needs some clean up but It
> will be released soon. It's very similar to our rc3 release only
> without the IRQ threads patch.

no, i mean the smp_processor_id() debugger, and the other bits triggered
by CONFIG_DEBUG_PREEMPT.

Ingo

2004-10-10 21:22:22

by Andrew Morton

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Daniel Walker <[email protected]> wrote:
>
> On Sun, 2004-10-10 at 01:46, Ingo Molnar wrote:
> > - the generic irq subsystem: irq threading is a simple ~200-lines,
> > architecture-independent add-on to this. It makes no sense to offer 3
> > different implementations - pick one and help make it work well.
> >
> > - preemptible BKL. Related to this is new debugging infrastructure in
> > -mm that allows the safe and slow conversion of spinlocks to mutexes.
> > In the case of the BKL this conversion is expected to be permanent,
> > for most of the other spinlocks it will be optional - but the
> > debugging code can still be used.
>
> Are you referring to the lock metering? I've ported our changes to
> -mm3-VP-T3 on top of lock metering.

Lockmeter gets in the way of all this activity in a big way. I'll drop it.

2004-10-10 21:57:39

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

* Andrew Morton <[email protected]> wrote:

> Lockmeter gets in the way of all this activity in a big way. I'll
> drop it.

great. Daniel, would you mind to merge your patchkit against the
following base:

-mm3, minus lockmeter, plus the -T3 patch

? To make this easier i've uploaded a combined undo-lockmeter patch to:

http://redhat.com/~mingo/voluntary-preempt/undo-lockmeter-2.6.9-rc3-mm3-A1

which you should apply to vanilla -mm3, then apply the -T3 patch:

http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc3-mm3-T3

this will apply cleanly with some minor fuzz. The resulting kernel
builds & boots fine with my .config.

Ingo

2004-10-11 15:28:00

by Vadim Lebedev

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Sven-Thorsten Dietrich <[email protected]> wrote in message
news:<[email protected]>...
> Announcing the availability of prototype real-time (RT)
> enhancements to the Linux 2.6 kernel.

Reading the sources i believe that __p_mutex_up is not constant time
operation because of __p_mutex_down....

It is clear that
__p_mutex_down is not constant time operation because of insertion
into the priority-sorted sleepers list. However both __p_mutex_down
and __p_mutex_up are synchronize on the same global spinlock
(m_spin_lock) .... so if the __p_mutex_down is holding this spinlock
while inserting NO other process(or) is able to perform any __p_mutex
operation...

Maybe the better idea would be to have a per-mutex spinlock? or even
better, given that the task->rt_priority have a finite range maybe each
mutex can have a table of sleeper lists indexed by rt_priority?

Vadim

2004-10-11 16:02:07

by Eugeny S. Mints

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Vadim Lebedev wrote:
> Sven-Thorsten Dietrich <[email protected]> wrote in message
> news:<[email protected]>...
>
>>Announcing the availability of prototype real-time (RT)
>>enhancements to the Linux 2.6 kernel.
>
>
> Reading the sources i believe that __p_mutex_up is not constant time
> operation because of __p_mutex_down....
>
> It is clear that
> __p_mutex_down is not constant time operation because of insertion
> into the priority-sorted sleepers list. However both __p_mutex_down
> and __p_mutex_up are synchronize on the same global spinlock
> (m_spin_lock) .... so if the __p_mutex_down is holding this spinlock
> while inserting NO other process(or) is able to perform any __p_mutex
> operation...

Current pmutex implementation was chosen only as prototype
implementation. kmutex abstraction layer allows to switch easily between
any (alternative) mutex implementations and to choose optimal one on a
benchmarking basis.

>
> Maybe the better idea would be to have a per-mutex spinlock? or even
> better, given that the task->rt_priority have a finite range maybe each
> mutex can have a table of sleeper lists indexed by rt_priority?
>
>
> Vadim
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
>

2004-10-11 17:53:21

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Sun, 2004-10-10 at 14:59, Ingo Molnar wrote:
> * Andrew Morton <[email protected]> wrote:
>
> > Lockmeter gets in the way of all this activity in a big way. I'll
> > drop it.
>
> great. Daniel, would you mind to merge your patchkit against the
> following base:
>
> -mm3, minus lockmeter, plus the -T3 patch

No problem. Next release will be without lockmeter. Thanks for the
patches.

Daniel Walker

2004-10-11 20:48:33

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

* Daniel Walker <[email protected]> wrote:

> On Sun, 2004-10-10 at 14:59, Ingo Molnar wrote:
> > * Andrew Morton <[email protected]> wrote:
> >
> > > Lockmeter gets in the way of all this activity in a big way. I'll
> > > drop it.
> >
> > great. Daniel, would you mind to merge your patchkit against the
> > following base:
> >
> > -mm3, minus lockmeter, plus the -T3 patch
>
>
> No problem. Next release will be without lockmeter. Thanks for the
> patches.

what do you think about the PREEMPT_REALTIME stuff in -T4? Ideally, if
you agree with the generic approach, the next step would be to add your
priority inheritance handling code to Linux semaphores and
rw-semaphores. The sched.c bits for that looked pretty straightforward.
The list walking is a bit ugly but probably unavoidable - the only other
option would be 100 priority queues per semaphore -> yuck.

Ingo

2004-10-11 21:44:37

[permalink] [raw]

Subject: RE: [ANNOUNCE] Linux 2.6 Real Time Kernel

I think Daniel has some separate thoughts,
here are mine:

Regarding the list walking stuff:

There are a lot of hashing options, indexing,
etc. that could be done. We thought of it
as a future optimization. An easy fix would
be to insert RT processes at the front, non-RT
from the tail of the queue.

Regarding patch size: clearly this is
an issue. We are working on creating a
good map of spinlock nestings, to help
with this.

Will publish that ASAP.

IMO the number of raw_spinlocks should be
lower, I said teens before.

Theoretically, it should only need to be
around hardware registers and some memory maps
and cache code, plus interrupt controller
and other SMP-contended hardware.

Practically, its an efficiency judgement call.
Its not worth blocking for 5 instructions in
a critical section under any circumstance,
so the deepest nested locks should probably remain
spinlocks.

There are some concurrency issues in kernel threads,
and I think there is a lot of work here.
The abstraction for LOCK_OPS is a good alternative,
but like the spin_undefs, its difficult to tell
in the code whether you are dealing with a mutex
or a spinlock.

Regarding the use of the system semaphore:
We have WIP on PMUTEX modified to use atomic_t,
thereby eliminating the assembly for instant
portability.

Its slow, but optimizations are allowed for.

Of course for actual portability the
IRQ threads must also be running on those
other platforms.

Your IRQ abstraction is ideal for that.

Eventually, I think that we will see
optimization - the last touches would have
the final mutex code converted back to
assembly, for performance reasons.

There are a whole lot of caveats and race
conditions that have not yet been unearthed
by the brief LKML testing. A lot of them
have to do with wakeups of tasks blocked
on a mutex, and differentiating between
blocked "ready" and blocked "mutex" states.
Here the system semaphore may have an advantage.

With that, maybe we can work back towards
the abstraction, so that we can evaluate both
solutions for their specific advantages.

I'll have to take a look at the new T4 patch
in detail, but at first glance it seems
that both mutexes could coexist in the
abstraction.

We'll give it a test run, and look forward to
your thoughts.

Thanks,

Sven

> -----Original Message-----
> From: Ingo Molnar [mailto:[email protected]]
> Sent: Monday, October 11, 2004 1:50 PM
> To: Daniel Walker
> Cc: Andrew Morton; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel
>
>
>
> * Daniel Walker <[email protected]> wrote:
>
> > On Sun, 2004-10-10 at 14:59, Ingo Molnar wrote:
> > > * Andrew Morton <[email protected]> wrote:
> > >
> > > > Lockmeter gets in the way of all this activity in a big way. I'll
> > > > drop it.
> > >
> > > great. Daniel, would you mind to merge your patchkit against the
> > > following base:
> > >
> > > -mm3, minus lockmeter, plus the -T3 patch
> >
> >
> > No problem. Next release will be without lockmeter. Thanks for the
> > patches.
>
> what do you think about the PREEMPT_REALTIME stuff in -T4? Ideally, if
> you agree with the generic approach, the next step would be to add your
> priority inheritance handling code to Linux semaphores and
> rw-semaphores. The sched.c bits for that looked pretty straightforward.
> The list walking is a bit ugly but probably unavoidable - the only other
> option would be 100 priority queues per semaphore -> yuck.
>
> Ingo
>

2004-10-11 21:55:55

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

* Sven Dietrich <[email protected]> wrote:

> IMO the number of raw_spinlocks should be lower, I said teens before.
>
> Theoretically, it should only need to be around hardware registers and
> some memory maps and cache code, plus interrupt controller and other
> SMP-contended hardware.

yeah, fully agreed. Right now the 90 locks i have means roughly 20% of
all locking still happens as raw spinlocks.

But, there is a 'correctness' _minimum_ set of spinlocks that _must_ be
raw spinlocks - this i tried to map in the -T4 patch. The patch does run
on SMP systems for example. (it was developed as an SMP kernel - in fact
i never compiled it as UP :-|.) If code has per-CPU or preemption
assumptions then there is no choice but to make it a raw spinlock, until
those assumptions are fixed.

> There are some concurrency issues in kernel threads, and I think there
> is a lot of work here. The abstraction for LOCK_OPS is a good
> alternative, but like the spin_undefs, its difficult to tell in the
> code whether you are dealing with a mutex or a spinlock.

what do you mean by 'it's difficult to tell'? In -T4 you do the choice
of type in the data structure and the API adapts automatically. If the
type is raw_spinlock_t then a spin_lock() is turned into a
_raw_spin_lock(). If the type is spinlock_t then the spin_lock() is
redirected to mutex_lock(). It's all transparently done and always
correct.

> There are a whole lot of caveats and race conditions that have not yet
> been unearthed by the brief LKML testing. [...]

actually, have you tried your patchset on an SMP box? As far as i can
see the locking in it ignores SMP issues _completely_, which makes the
choice of locks much less useful.

Ingo

2004-10-11 23:07:54

[permalink] [raw]

Subject: RE: [ANNOUNCE] Linux 2.6 Real Time Kernel

>
> * Sven Dietrich <[email protected]> wrote:
>
> > IMO the number of raw_spinlocks should be lower, I said teens before.
> >
> > Theoretically, it should only need to be around hardware registers and
> > some memory maps and cache code, plus interrupt controller and other
> > SMP-contended hardware.
>
> yeah, fully agreed. Right now the 90 locks i have means roughly 20% of
> all locking still happens as raw spinlocks.
>
> But, there is a 'correctness' _minimum_ set of spinlocks that _must_ be
> raw spinlocks - this i tried to map in the -T4 patch. The patch does run
> on SMP systems for example. (it was developed as an SMP kernel - in fact
> i never compiled it as UP :-|.) If code has per-CPU or preemption
> assumptions then there is no choice but to make it a raw spinlock, until
> those assumptions are fixed.
>

The grunt work is in identifying those problem areas and coming up with
elegant, low-impact solutions. RCU locks is one example as mentioned
before. We had a fix to serialize RCU access, but weren't happy with that.
We were hoping to get some input on this, but these problems seem to show
up more readily on slow systems (we are also testing with a bunch of
old P1, P2 and K6 boxes all far sub 1 GHz)

> > There are some concurrency issues in kernel threads, and I think there
> > is a lot of work here. The abstraction for LOCK_OPS is a good
> > alternative, but like the spin_undefs, its difficult to tell in the
> > code whether you are dealing with a mutex or a spinlock.
>
> what do you mean by 'it's difficult to tell'? In -T4 you do the choice
> of type in the data structure and the API adapts automatically. If the
> type is raw_spinlock_t then a spin_lock() is turned into a
> _raw_spin_lock(). If the type is spinlock_t then the spin_lock() is
> redirected to mutex_lock(). It's all transparently done and always
> correct.
>

I was making this observation:
One can't look at an arbitrary piece of code and tell if it will
be a spinlock or a mutex. One has to go look elsewhere.
In the spin_undefs case one can look the top of the file and check for it,
in the LOCK_OPS case, you have to call up the data structure declaration.

> > There are a whole lot of caveats and race conditions that have not yet
> > been unearthed by the brief LKML testing. [...]
>
> actually, have you tried your patchset on an SMP box? As far as i can
> see the locking in it ignores SMP issues _completely_, which makes the
> choice of locks much less useful.
>

We stated that its been tested minimally on SMP. That means we have
had it up and running and found it to be unstable. I fully agree that
SMP is the superset to get it working on, and that PMutex is not
perfect at this point.

We will take a look at the T5 patch and see what we can do about
PI for the system semaphore, but I am not sure how portable it would
be without also touching the assembly. FWIW PMutex is already based
in part on the system semaphore, so we might get similar problems when
porting elsewhere.

I think we should try and eliminate the mutex as an issue ASAP so we can
move on to the real meat. We have spec'd some requirements in the
rttReleaseNotes, clearly not all are being met, but we hoped to capture
most of them.
I have copied Arndt Heursch and Witold Jaworski in Germany, maybe they
will also have some insights.

Sven

2004-10-12 05:49:33

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

* Sven Dietrich <[email protected]> wrote:

> > But, there is a 'correctness' _minimum_ set of spinlocks that _must_ be
> > raw spinlocks - this i tried to map in the -T4 patch. The patch does run
> > on SMP systems for example. (it was developed as an SMP kernel - in fact
> > i never compiled it as UP :-|.) If code has per-CPU or preemption
> > assumptions then there is no choice but to make it a raw spinlock, until
> > those assumptions are fixed.

> The grunt work is in identifying those problem areas and coming up
> with elegant, low-impact solutions. RCU locks is one example as
> mentioned before. We had a fix to serialize RCU access, but weren't
> happy with that. We were hoping to get some input on this, but these
> problems seem to show up more readily on slow systems (we are also
> testing with a bunch of old P1, P2 and K6 boxes all far sub 1 GHz)

identifying problem areas is near 100% automatic if you look at -T5: all
illegal sleeps and illegal smp_processor_id() assumptions are reported
when they happen. That's how i identified & fixed the core 90 locks in
the first wave, in just a couple of hours. The only minor annoyance when
doing a conversion is the inflexibility of SPIN_LOCK_UNLOCKED and
RW_LOCK_UNLOCKED initializer. If it werent for the initializers then a
'conversion' would be a matter of a 2-line change, the change of the
prototype and the change of the definition. Now it's a 3-line change
most of the time - still very fast and painless.

regarding RCU serialization - i think that is the way to go - i dont
think there is any sensible way to extend RCU to a fully preempted
model, RCU is all about per-CPU-ness and per-CPU-ness is quite limited
in a fully preemptible model.

could you send those RCU patches (no matter how incomplete/broken)? It's
the main issue that causes the dcache_lock to be raw still. (and a
number of dependent locks: fs-writeback.c, inode.c, etc.) We can make
those RCU changes not impact the normal !PREEMPT_REALTIME locking so it
might have a chance for upstream merging as well.

> I was making this observation: One can't look at an arbitrary piece of
> code and tell if it will be a spinlock or a mutex. One has to go look
> elsewhere. In the spin_undefs case one can look the top of the file
> and check for it, in the LOCK_OPS case, you have to call up the data
> structure declaration.

ok, i now understand what you mean. The way i drove it wasnt really via
code review but via: 'compile kernel, look at the bootlogs, fix the
first lock reported, repeat' iterations. This was much easier and much
more reliable than trying to figure out lock dependencies from the
source. The turnaround for a single lock was 2-3 minutes in the typical
case, allowing the conversion of 90 locks in a couple of hours.

> > > There are a whole lot of caveats and race conditions that have not yet
> > > been unearthed by the brief LKML testing. [...]
> >
> > actually, have you tried your patchset on an SMP box? As far as i can
> > see the locking in it ignores SMP issues _completely_, which makes the
> > choice of locks much less useful.
>
> We stated that its been tested minimally on SMP. That means we have
> had it up and running and found it to be unstable. I fully agree that
> SMP is the superset to get it working on, and that PMutex is not
> perfect at this point.

it's not just the problem of PMutex - i believe it's mainly the plain
inadequacy of the 30 raw locks you have identified - and identifying the
locks is the bigger work, not the semaphore implementation. I'm now at
90 locks (20% of all locking in this .config) and that's just to quiet
the DEBUG_PREEMPT violations on my testboxes.

and no matter how well UP works, to fix SMP one has to 'cover' all the
necessary locks first before fixing it, which (drastic) increase in raw
locks invalidates most of the UP efforts of getting rid of raw locks.
That's why i decided to go for SMP primarily - didnt see much point in
going for UP.

> We will take a look at the T5 patch and see what we can do about PI
> for the system semaphore, but I am not sure how portable it would be
> without also touching the assembly. FWIW PMutex is already based in
> part on the system semaphore, so we might get similar problems when
> porting elsewhere.

there are in-C variants of Linux mutexes and rw-semaphores in the kernel
source, so worst-case we could just make use of them in the
PREEMPT_REALTIME case. I'm not a big fan of assembly optimizations (or
having to touch assembly optimizations) at an early stage like this.

Ingo

2004-10-12 18:51:18

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Mon, 2004-10-11 at 13:49, Ingo Molnar wrote:
> * Daniel Walker <[email protected]> wrote:
>
> what do you think about the PREEMPT_REALTIME stuff in -T4? Ideally, if
> you agree with the generic approach, the next step would be to add your
> priority inheritance handling code to Linux semaphores and
> rw-semaphores. The sched.c bits for that looked pretty straightforward.
> The list walking is a bit ugly but probably unavoidable - the only other
> option would be 100 priority queues per semaphore -> yuck.

I think patch size is an issue, but I also think that , eventually, we
should change all spin_lock calls that actually lock a mutex to be more
distinct so it's obvious what is going on. Sven and I both agree that
this should be addressed. Is this a non-issue for you? What does the
community want? I don't find your code or ours acceptable in it's
current form , due to this issue.

With the addition of PREEMPT_REALTIME it looks like you more than
doubled the size of voluntary preempt. I really feel that it should
remain as two distinct patches. They are dependent , but the scope of
the changes are too vast to lump it all together.

Daniel Walker

2004-10-12 19:55:23

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, 2004-10-12 at 20:50, Daniel Walker wrote:
> > what do you think about the PREEMPT_REALTIME stuff in -T4? Ideally, if
> > you agree with the generic approach, the next step would be to add your
> > priority inheritance handling code to Linux semaphores and
> > rw-semaphores. The sched.c bits for that looked pretty straightforward.
> > The list walking is a bit ugly but probably unavoidable - the only other
> > option would be 100 priority queues per semaphore -> yuck.
>
> I think patch size is an issue, but I also think that , eventually, we
> should change all spin_lock calls that actually lock a mutex to be more
> distinct so it's obvious what is going on. Sven and I both agree that
> this should be addressed. Is this a non-issue for you? What does the
> community want? I don't find your code or ours acceptable in it's
> current form , due to this issue.
>
> With the addition of PREEMPT_REALTIME it looks like you more than
> doubled the size of voluntary preempt. I really feel that it should
> remain as two distinct patches. They are dependent , but the scope of
> the changes are too vast to lump it all together.
>

Both patches (MV & Ingos) have their good bits, but both share the same
ugliness and are hard to compare and harder to combine. The conversion
of spin_lock to _spin_lock and substitution of spin_lock by mutexes,
semaphores or what ever makes it more than hard to keep the code in a
readable form.

If there is the tendency to touch the concurrency controls in general
all over the kernel, then I would suggest a script driven overhaul of
all concurrency controls like spin_locks, mutexes and semaphores to
general macros like

enter_critical_section(TYPE, &var, &flags, whatever);
leave_critical_section(TYPE, &var, flags, whatever);

where TYPE might be SPIN_LOCK, SPIN_LOCK_IRQ, MUTEX, PMUTEX or whatever
we have and come up with in the future.

This could be done in a first step and then it is clearly identifiable
and it gives us more flexibility to wrap different implementations and
lets us change particular points in a more clear way.

I would be willing to provide some scripted conversion aid, if there is
enough interest to that. I started with some test files and the results
are quite encouraging.

Any thoughts ?

tglx

2004-10-12 20:32:15

[permalink] [raw]

Subject: RE: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

> >
> > I think patch size is an issue, but I also think that , eventually, we
> > should change all spin_lock calls that actually lock a mutex to be more
> > distinct so it's obvious what is going on. Sven and I both agree that
> > this should be addressed. Is this a non-issue for you? What does the
> > community want? I don't find your code or ours acceptable in it's
> > current form , due to this issue.
> >
> > With the addition of PREEMPT_REALTIME it looks like you more than
> > doubled the size of voluntary preempt. I really feel that it should
> > remain as two distinct patches. They are dependent , but the scope of
> > the changes are too vast to lump it all together.
> >
>
>
> If there is the tendency to touch the concurrency controls in general
> all over the kernel, then I would suggest a script driven overhaul of
> all concurrency controls like spin_locks, mutexes and semaphores to
> general macros like
>
> enter_critical_section(TYPE, &var, &flags, whatever);
> leave_critical_section(TYPE, &var, flags, whatever);
>
> where TYPE might be SPIN_LOCK, SPIN_LOCK_IRQ, MUTEX, PMUTEX or whatever
> we have and come up with in the future.
>
> This could be done in a first step and then it is clearly identifiable
> and it gives us more flexibility to wrap different implementations and
> lets us change particular points in a more clear way.
>
> I would be willing to provide some scripted conversion aid, if there is
> enough interest to that. I started with some test files and the results
> are quite encouraging.
>

Ideally we would eventually provide some level of tunability, i.e.
if you want the spinlocks all the way around it should be possible
to have that, or one could enable degrees of enhancements,
expanding on the existing sequence starting with PREEMPT, IRQ_THREADS,
BKL, MUTEX, etc. In addition to that, once the minim set of spinlocks
necessary for RT is established, additional layers, corresponding to
the lock nesting order, could be established, making the "mutex-depth"
somewhat configurable based on the performance requirements.

The entire effort would have the side effect of making the locking and
critical sections more distinct, and reveal soft spots in concurrency
code, as well as to raise awareness of the code density inside
critical sections.

The concept of tunable foreground / background responsiveness,
based on preemptability of low priority processes comes to mind.
A lot of folks would probably not mind making UI responsiveness
a little crisper, others will want the throughput.

I realize this is an early stage to be looking at it so high end,
but I think in general this type of script would not be a bad addition
to the patch kit(s).

Sven

2004-10-12 20:48:01

[permalink] [raw]

Subject: RE: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, 2004-10-12 at 22:31, Sven Dietrich wrote:
> > >
> > > I think patch size is an issue, but I also think that , eventually, we
> > > should change all spin_lock calls that actually lock a mutex to be more
> > > distinct so it's obvious what is going on. Sven and I both agree that
> > > this should be addressed. Is this a non-issue for you? What does the
> > > community want? I don't find your code or ours acceptable in it's
> > > current form , due to this issue.
> > >
> > > With the addition of PREEMPT_REALTIME it looks like you more than
> > > doubled the size of voluntary preempt. I really feel that it should
> > > remain as two distinct patches. They are dependent , but the scope of
> > > the changes are too vast to lump it all together.
> > >
> >
> >
> > If there is the tendency to touch the concurrency controls in general
> > all over the kernel, then I would suggest a script driven overhaul of
> > all concurrency controls like spin_locks, mutexes and semaphores to
> > general macros like
> >
> > enter_critical_section(TYPE, &var, &flags, whatever);
> > leave_critical_section(TYPE, &var, flags, whatever);
> >
> > where TYPE might be SPIN_LOCK, SPIN_LOCK_IRQ, MUTEX, PMUTEX or whatever
> > we have and come up with in the future.
> >
> > This could be done in a first step and then it is clearly identifiable
> > and it gives us more flexibility to wrap different implementations and
> > lets us change particular points in a more clear way.
> >
> > I would be willing to provide some scripted conversion aid, if there is
> > enough interest to that. I started with some test files and the results
> > are quite encouraging.
> >

> Ideally we would eventually provide some level of tunability, i.e.
> if you want the spinlocks all the way around it should be possible
> to have that, or one could enable degrees of enhancements,
> expanding on the existing sequence starting with PREEMPT, IRQ_THREADS,
> BKL, MUTEX, etc. In addition to that, once the minim set of spinlocks
> necessary for RT is established, additional layers, corresponding to
> the lock nesting order, could be established, making the "mutex-depth"
> somewhat configurable based on the performance requirements.
>
> The entire effort would have the side effect of making the locking and
> critical sections more distinct, and reveal soft spots in concurrency
> code, as well as to raise awareness of the code density inside
> critical sections.
>
> The concept of tunable foreground / background responsiveness,
> based on preemptability of low priority processes comes to mind.
> A lot of folks would probably not mind making UI responsiveness
> a little crisper, others will want the throughput.

Yup, and having a unique identifiable thing for all that stuff in the
code would make life easier for coders and for people who want to
experiment and change things.

> I realize this is an early stage to be looking at it so high end,
> but I think in general this type of script would not be a bad addition
> to the patch kit(s).

Ok, will try to make it work on more than two files and two patterns.

Any preferences on scripting language ?

tglx

2004-10-12 21:12:43

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, Oct 12, 2004 at 09:46:34PM +0200, Thomas Gleixner wrote:
> Both patches (MV & Ingos) have their good bits, but both share the same
> ugliness and are hard to compare and harder to combine. The conversion
> of spin_lock to _spin_lock and substitution of spin_lock by mutexes,
> semaphores or what ever makes it more than hard to keep the code in a
> readable form.
>
> If there is the tendency to touch the concurrency controls in general
> all over the kernel, then I would suggest a script driven overhaul of
> all concurrency controls like spin_locks, mutexes and semaphores to
> general macros like
>
> enter_critical_section(TYPE, &var, &flags, whatever);
> leave_critical_section(TYPE, &var, flags, whatever);

FreeBSD uses these things, but it they create severe pipeline stalls
since they toggle interrupt flags on entry and exit. The current scheme
in Linux with preempt_count use to be a curse when I was working on an
equivalent implementation of there stuff at:

http://mmlinux.sf.net

It's a project I've been working for a long time and I'm farther than
them in the area of stability and most likely the problem space in general.
They are 7 and I am a single engineer though.

I don't have the latest sources up and I'm going to up load them in a
couple of hours. I've been playing with it for about 2 months, late July,
since it was able to boot reliably and I've felt/measure how a fully
preemptable kernel like this can perform. I'm getting about 4-6 usecs
average latency in the system from interrupt exception frame to the start
of the irq-thread in question. Tons of events were at 2 usecs which I
thought was insane at the time, but a ndelay insert into the path verified
this to be correct. The majority of the spread was at 5 and 10 usecs,
pushing to about 12 usecs. That's fantastic latency performance and I
was floored when the measurements were validating my preemption ideas
at the time.

> where TYPE might be SPIN_LOCK, SPIN_LOCK_IRQ, MUTEX, PMUTEX or whatever
> we have and come up with in the future.

There's two problems that need to be solved at this moment regarding
this issue. One is long term which should have a clear differentiation
of what is a persistent spinlock across a compile .config context
(choice of preemptable or standard kernels) is useful since it clearly
identifies which devices and low level systems. The other is Ingo's need
to be able to rapidly change mutexes at the drop of a hat. Eventually,
the long term goal will impose on stylistic issues in the Linux kernel
community and papers/documentation will have to be written to describe
these changes across all kernel subsystems and drivers. It's complete
epic flame bait.

In my system, I do exactly what you just outlined. With a three character
"vim" command, I capitalize the entire word, spin_lock -> SPIN_LOCK
repeated with a ".". I choose this convention because capitals standout
broadly in the source code. It's good because having this kind of
visibility can show static/compile time sleep violations that are the
main source of instability, and almost certainly all of the deadlocks
in Monta Vista's current preemption release.

My tree is stable. I was able to hammer this machine for 2-3 days straight
(no networking, that's another major can of worms) with deadlocking
using multipule mass "find / -exec egrep" of some sort that stress both
process creation and all parts of the IO system.

The lock graph changes I made ironically outlined some serious Linux
structural problems as it concerns latency. Through my effort of fixing
all of the sleep violation, I came all of the way back to the start of
the project which is that all major systems have become non-preemptable
again.

That graph that I saw from Lee is consistent with my results in that a
deadlock prone system will have phenomenal latency performance at the
expense of being absolutely incorrect. It's just a flat out broken
system at this point that they've released.

> This could be done in a first step and then it is clearly identifiable
> and it gives us more flexibility to wrap different implementations and
> lets us change particular points in a more clear way.

Yes, I agree, but the convention needs to be standardized.

> I would be willing to provide some scripted conversion aid, if there is
> enough interest to that. I started with some test files and the results
> are quite encouraging.

No, all of this can only be manual at this time, either through static
analysis by a compiler, like what Ingo did over the weekend or by hand
with runtime sleep violation checks.

Give me a bit of time to upload those files. I was just given permission
to talk about this openly now. But I can definitely tell you that I had
this running months before Monta Vista's announcement over the weekend.

Full preemption has just heated up in serious way. :) It's going to be
interesting.

> Any thoughts ?

bill

2004-10-12 21:24:50

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, Oct 12, 2004 at 02:12:01PM -0700, Bill Huey wrote:
> On Tue, Oct 12, 2004 at 09:46:34PM +0200, Thomas Gleixner wrote:
> > enter_critical_section(TYPE, &var, &flags, whatever);
> > leave_critical_section(TYPE, &var, flags, whatever);
>
> FreeBSD uses these things, but it they create severe pipeline stalls
> since they toggle interrupt flags on entry and exit. The current scheme
> in Linux with preempt_count use to be a curse when I was working on an
> equivalent implementation of there stuff at:
>
> http://mmlinux.sf.net

Duh, I didn't finish the sentence. I meant this method above is nasty
filled with pipeline stalls. Don't know if that's what were saying, but
non-preemptable critical sections denoted by preempt_count must have some
kind of conceptual overlap with local_irq* functions. I use to curse the
seperation of the two since it made my own conception irregular, but I
have come to the conclusion that using relatively something light weight
like preempt_count() for that functionality instead. That's what I
meant. :)

bill

2004-10-12 21:40:17

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, 2004-10-12 at 23:24, Bill Huey wrote:
> On Tue, Oct 12, 2004 at 02:12:01PM -0700, Bill Huey wrote:
> > On Tue, Oct 12, 2004 at 09:46:34PM +0200, Thomas Gleixner wrote:
> > > enter_critical_section(TYPE, &var, &flags, whatever);
> > > leave_critical_section(TYPE, &var, flags, whatever);
> >
> > FreeBSD uses these things, but it they create severe pipeline stalls
> > since they toggle interrupt flags on entry and exit. The current scheme
> > in Linux with preempt_count use to be a curse when I was working on an
> > equivalent implementation of there stuff at:

You missed the point. TYPE decides whether to toggle interrupts or not.
It's a generic function equivivalent, which identifies sections of code,
which must be protected. The grade of protection is defined in TYPE.

> > http://mmlinux.sf.net
>
> Duh, I didn't finish the sentence. I meant this method above is nasty
> filled with pipeline stalls. Don't know if that's what were saying, but
> non-preemptable critical sections denoted by preempt_count must have some
> kind of conceptual overlap with local_irq* functions. I use to curse the
> seperation of the two since it made my own conception irregular, but I
> have come to the conclusion that using relatively something light weight
> like preempt_count() for that functionality instead. That's what I
> meant. :)

I dont see a drawback in the proposal of enter_critical_section and
leave_critical_section conversion.

They indicate a none preemptible region, which must be protected in one
or another way. Which way is choosen, must be evaluated by the
programmer.

There are several grades from preempt_disable over mutexes, spinlocks
and irq blocking. All those grades allow different implementations for
different goals.

Systems which are optimized for througput will use other mechanisms than
systems which are optimized for guaranteed repsonse times.

There is no generic sulotion available for those problems.

But having a generic identifiable expression is more suitable for
improvements, than struggling with substitutions of x,y and z.

tglx

2004-10-12 21:42:43

[permalink] [raw]

Subject: RE: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

I emailed the mmlinux project about 2 months ago,
telling you that we were doing this.

There was no response.

I am sorry that the early stage of our development upsets you.

It was intended to promote discussion, and that seems to be working.

We are aware of the issues you describe, and are making
every effort to raise awareness of these problems.

It is difficult to solve them for a team of 1 or N,
in a maintainable fashion, as it requires some level
of awareness by the maintainers that we are looking
at it from that angle.

Thanks for the insights, we look forward to seeing your
implementation added to the smorgasbord ;)

Sven

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Bill Huey (hui)
> Sent: Tuesday, October 12, 2004 2:12 PM
> To: Thomas Gleixner
> Cc: [email protected]; Ingo Molnar; Andrew Morton;
> [email protected]; [email protected]; LKML
> Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel
>
>
> On Tue, Oct 12, 2004 at 09:46:34PM +0200, Thomas Gleixner wrote:
> > Both patches (MV & Ingos) have their good bits, but both share the same
> > ugliness and are hard to compare and harder to combine. The conversion
> > of spin_lock to _spin_lock and substitution of spin_lock by mutexes,
> > semaphores or what ever makes it more than hard to keep the code in a
> > readable form.
> >
> > If there is the tendency to touch the concurrency controls in general
> > all over the kernel, then I would suggest a script driven overhaul of
> > all concurrency controls like spin_locks, mutexes and semaphores to
> > general macros like
> >
> > enter_critical_section(TYPE, &var, &flags, whatever);
> > leave_critical_section(TYPE, &var, flags, whatever);
>
> FreeBSD uses these things, but it they create severe pipeline stalls
> since they toggle interrupt flags on entry and exit. The current scheme
> in Linux with preempt_count use to be a curse when I was working on an
> equivalent implementation of there stuff at:
>
> http://mmlinux.sf.net
>
> It's a project I've been working for a long time and I'm farther than
> them in the area of stability and most likely the problem space in general.
> They are 7 and I am a single engineer though.
>
> I don't have the latest sources up and I'm going to up load them in a
> couple of hours. I've been playing with it for about 2 months, late July,
> since it was able to boot reliably and I've felt/measure how a fully
> preemptable kernel like this can perform. I'm getting about 4-6 usecs
> average latency in the system from interrupt exception frame to the start
> of the irq-thread in question. Tons of events were at 2 usecs which I
> thought was insane at the time, but a ndelay insert into the path verified
> this to be correct. The majority of the spread was at 5 and 10 usecs,
> pushing to about 12 usecs. That's fantastic latency performance and I
> was floored when the measurements were validating my preemption ideas
> at the time.
>
> > where TYPE might be SPIN_LOCK, SPIN_LOCK_IRQ, MUTEX, PMUTEX or whatever
> > we have and come up with in the future.
>
> There's two problems that need to be solved at this moment regarding
> this issue. One is long term which should have a clear differentiation
> of what is a persistent spinlock across a compile .config context
> (choice of preemptable or standard kernels) is useful since it clearly
> identifies which devices and low level systems. The other is Ingo's need
> to be able to rapidly change mutexes at the drop of a hat. Eventually,
> the long term goal will impose on stylistic issues in the Linux kernel
> community and papers/documentation will have to be written to describe
> these changes across all kernel subsystems and drivers. It's complete
> epic flame bait.
>
> In my system, I do exactly what you just outlined. With a three character
> "vim" command, I capitalize the entire word, spin_lock -> SPIN_LOCK
> repeated with a ".". I choose this convention because capitals standout
> broadly in the source code. It's good because having this kind of
> visibility can show static/compile time sleep violations that are the
> main source of instability, and almost certainly all of the deadlocks
> in Monta Vista's current preemption release.
>
> My tree is stable. I was able to hammer this machine for 2-3 days straight
> (no networking, that's another major can of worms) with deadlocking
> using multipule mass "find / -exec egrep" of some sort that stress both
> process creation and all parts of the IO system.
>
> The lock graph changes I made ironically outlined some serious Linux
> structural problems as it concerns latency. Through my effort of fixing
> all of the sleep violation, I came all of the way back to the start of
> the project which is that all major systems have become non-preemptable
> again.
>
> That graph that I saw from Lee is consistent with my results in that a
> deadlock prone system will have phenomenal latency performance at the
> expense of being absolutely incorrect. It's just a flat out broken
> system at this point that they've released.
>
> > This could be done in a first step and then it is clearly identifiable
> > and it gives us more flexibility to wrap different implementations and
> > lets us change particular points in a more clear way.
>
> Yes, I agree, but the convention needs to be standardized.
>
> > I would be willing to provide some scripted conversion aid, if there is
> > enough interest to that. I started with some test files and the results
> > are quite encouraging.
>
> No, all of this can only be manual at this time, either through static
> analysis by a compiler, like what Ingo did over the weekend or by hand
> with runtime sleep violation checks.
>
> Give me a bit of time to upload those files. I was just given permission
> to talk about this openly now. But I can definitely tell you that I had
> this running months before Monta Vista's announcement over the weekend.
>
> Full preemption has just heated up in serious way. :) It's going to be
> interesting.
>
> > Any thoughts ?
>
> bill
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2004-10-12 22:08:30

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, 2004-10-12 at 23:12, Bill Huey wrote:
> My tree is stable. I was able to hammer this machine for 2-3 days straight
> (no networking, that's another major can of worms) with deadlocking
> using multipule mass "find / -exec egrep" of some sort that stress both
> process creation and all parts of the IO system.

He, a system without networking is a real measurement ? Ever heard of
hackbench in combination with ping -f ?

> That graph that I saw from Lee is consistent with my results in that a
> deadlock prone system will have phenomenal latency performance at the
> expense of being absolutely incorrect. It's just a flat out broken
> system at this point that they've released.

Thats a major problem caused by "dumb" priority inheritence. The goal is
not priority inheritence at the very end. It's proxy execution, where
priority inheritence is a subset.

> > This could be done in a first step and then it is clearly identifiable
> > and it gives us more flexibility to wrap different implementations and
> > lets us change particular points in a more clear way.
>
> Yes, I agree, but the convention needs to be standardized.

That's all I was talking about.

> > I would be willing to provide some scripted conversion aid, if there is
> > enough interest to that. I started with some test files and the results
> > are quite encouraging.
>
> No, all of this can only be manual at this time, either through static
> analysis by a compiler, like what Ingo did over the weekend or by hand
> with runtime sleep violation checks.

I'm not talking about automatic conversion of rules. I'm talking about
automatic conversion of different concurrency controls into a
equivillance function, which lets you better identify the neccecary
manual changes and leaves room for simple and non intrusive replacement
implementations.

> Give me a bit of time to upload those files. I was just given permission
> to talk about this openly now. But I can definitely tell you that I had
> this running months before Monta Vista's announcement over the weekend.

There are a bunch of other efforts underway around the world, which
might be concentrated now into one.

tglx

2004-10-12 22:39:28

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Wed, Oct 13, 2004 at 12:00:16AM +0200, Thomas Gleixner wrote:
> On Tue, 2004-10-12 at 23:12, Bill Huey wrote:
> > My tree is stable. I was able to hammer this machine for 2-3 days straight
> > (no networking, that's another major can of worms) with deadlocking
> > using multipule mass "find / -exec egrep" of some sort that stress both
> > process creation and all parts of the IO system.
>
> He, a system without networking is a real measurement ? Ever heard of
> hackbench in combination with ping -f ?

The problem with doing this project is to create an identically
functioning system that's correct. The current track taking by Monta Vista
is highly unstable given the lack of locking throughout their kernel. It
has all of the complexities of mutex style conventions without any debugging
methodology attached to it. It's no longer the spinlock universe that
Linux is using since a deadlock situation just leaves use running in
cpu_idle wondering what is going on.

It's something that needs to be address in the large scheme of the project.

> > That graph that I saw from Lee is consistent with my results in that a
> > deadlock prone system will have phenomenal latency performance at the
> > expense of being absolutely incorrect. It's just a flat out broken
> > system at this point that they've released.
>
> Thats a major problem caused by "dumb" priority inheritence. The goal is
> not priority inheritence at the very end. It's proxy execution, where
> priority inheritence is a subset.

This has been articulate a couple of times by both me and Ingo (recent email).
The MV's system is highly unstable, not because of priority inheritance,
but because of basic lock violation in the lock graph itself. It's another kind
of SMP granularity problem. The hard problem was just what Ingo was saying and
it's higher, but higher in the graph.

> > Yes, I agree, but the convention needs to be standardized.
>
> That's all I was talking about.

Yeah, it needs to be done. I like the "_" methodology that both Monta Vista
and Ingo are using. I'll convert my stuff over to using it when I'm finished
with a couple of large items here.

> I'm not talking about automatic conversion of rules. I'm talking about
> automatic conversion of different concurrency controls into a
> equivillance function, which lets you better identify the neccecary
> manual changes and leaves room for simple and non intrusive replacement
> implementations.

This is kind of a sketchy problem. So far all of what I've seen really needs
to be done manually and can be done using the all of the normal Linux locking
and scheduler/interrupt masking primitives. I'd hate to see another system
added to this that solves a problem that may not exist. Please, correct
me if I'm not understanding you.

> > Give me a bit of time to upload those files. I was just given permission
> > to talk about this openly now. But I can definitely tell you that I had
> > this running months before Monta Vista's announcement over the weekend.

> There are a bunch of other efforts underway around the world, which
> might be concentrated now into one.

bill

2004-10-12 22:57:41

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, Oct 12, 2004 at 02:41:02PM -0700, Sven Dietrich wrote:
>
> I emailed the mmlinux project about 2 months ago,
> telling you that we were doing this.

I don't remember getting an email from you. I get tons of
email at times and I don't know if I had lost it or not.
I'm sorry if I didn't respond to you, but being in the
context of commerical development has a certain kind of
conflict with open source culture and balancing them with
competitors is tenuous and tense. I'm about as die hard
open source as it gets and it's a difficult balance if
one thinks of this problem in with these constraints.

> There was no response.
>
> I am sorry that the early stage of our development upsets you.

Well, it kind of forced a number of things to happen that
is premature from multipule folks, namely me (Ingo can
speak for himself). I didn't want to release these patches
until I had solved a number of really critical problems,
since it would have made the release rather useless.

But since this is in a commerical context we have to save
face by at least putting our cards on the table and establishing
a sort of role in this community. That commericial development
attitude the reason why I haven't been permitted to talk about
this stuff openly, only sort of on the side in various
preemption discussions.

> It was intended to promote discussion, and that seems to be working.

Yeah, for me a bit of freak out Saturday that is still
kind of happening since this has been a personal project
of mine for a long time. :) I interpreted it as a visibility
move on your company's part, which I hate to say is a bit
unnerving to know that another group was doing the same
work. TimeSys's Scott Wood and friends are doing something
like this as well. I'm only being fair by mentioning them. :)

BTW, I'm using their irq-thread patches with modifications.
I intuited that they were doing an incremental model, which,
since this problem space is a bit more known, is no longer
a clearly viable track for them, assuming they are going this
route, because of all of the recent work.

There's going to be tons of overlap here and I suspect
that Ingo is going to kick all of our respective commerical
butts. :)

> We are aware of the issues you describe, and are making
> every effort to raise awareness of these problems.

> It is difficult to solve them for a team of 1 or N,
> in a maintainable fashion, as it requires some level
> of awareness by the maintainers that we are looking
> at it from that angle.

> Thanks for the insights, we look forward to seeing your
> implementation added to the smorgasbord ;)

Well, uh, at least you're single kernel image folks like
us and not flaming us/me yet for corrupting the sancity
of Linux. Oh man, I feel a flame war coming. This is such
touchy material.

What's Monta Vista's attitude toward preemption development ?
open or closed ? I know this is a charged question, but
this has to be asked. :)

This commerical thing is going to be weird. I wish I was
an angry hippie instead of having a job at certain moments. :)

But the bay area is pretty damn cool, so... that makes up
for it. :)

bill

2004-10-12 23:14:19

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, Oct 12, 2004 at 11:32:18PM +0200, Thomas Gleixner wrote:
> You missed the point. TYPE decides whether to toggle interrupts or not.
> It's a generic function equivivalent, which identifies sections of code,
> which must be protected. The grade of protection is defined in TYPE.

Sorry, I misunderstood out of my impulsiveness. If I understand you,
you just want a gradual method of determining which critical sections
need to be preemptive or not depending if you need a server or RT
performance ?

I thought you were talking about something else if this is the case.

bill

2004-10-12 23:19:23

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Wed, 2004-10-13 at 00:36, Bill Huey wrote:
> On Wed, Oct 13, 2004 at 12:00:16AM +0200, Thomas Gleixner wrote:
> > On Tue, 2004-10-12 at 23:12, Bill Huey wrote:
> > > My tree is stable. I was able to hammer this machine for 2-3 days straight
> > > (no networking, that's another major can of worms) with deadlocking
> > > using multipule mass "find / -exec egrep" of some sort that stress both
> > > process creation and all parts of the IO system.
> >
> > He, a system without networking is a real measurement ? Ever heard of
> > hackbench in combination with ping -f ?
>
> The problem with doing this project is to create an identically
> functioning system that's correct. The current track taking by Monta Vista
> is highly unstable given the lack of locking throughout their kernel. It
> has all of the complexities of mutex style conventions without any debugging
> methodology attached to it. It's no longer the spinlock universe that
> Linux is using since a deadlock situation just leaves use running in
> cpu_idle wondering what is going on.
>
> It's something that needs to be address in the large scheme of the project.

Ack.

> > > That graph that I saw from Lee is consistent with my results in that a
> > > deadlock prone system will have phenomenal latency performance at the
> > > expense of being absolutely incorrect. It's just a flat out broken
> > > system at this point that they've released.
> >
> > Thats a major problem caused by "dumb" priority inheritence. The goal is
> > not priority inheritence at the very end. It's proxy execution, where
> > priority inheritence is a subset.
>
> This has been articulate a couple of times by both me and Ingo (recent email).
> The MV's system is highly unstable, not because of priority inheritance,
> but because of basic lock violation in the lock graph itself. It's another kind
> of SMP granularity problem. The hard problem was just what Ingo was saying and
> it's higher, but higher in the graph.

Can you point me a bit more clear on what you are talking about ?

> > > Yes, I agree, but the convention needs to be standardized.
> >
> > That's all I was talking about.
>
> Yeah, it needs to be done. I like the "_" methodology that both Monta Vista
> and Ingo are using. I'll convert my stuff over to using it when I'm finished
> with a couple of large items here.

That's totaly fucked up. Compile XFS with that and you are toast. That's
ugly and not understandable/fixable for anybody in the universe without
more ugly and less understandable hacks. Yes I managed to get XFS up,
but I refuse to show the patch, because it's making me barf when I look
into it.

_spinlock = spinlock
spinlock = mutex
_mutex = semaphore
semphore = whatever
....

That's violating every single aspect of software design. That's messing
up the whole kernel.

What have we at the very end ? A endless mess of non understandable
macros, which are resolved by compiler magic ? Where nobody can see on
the first look, which kind of concurrency control you are using ? That's
a nice thing to do some proof of concept implementation, but it can not
be a solution for something what is targeted to go into mainline. The
frequency of T4-T7 patches including the small fixes posted on LKML is
just a proof of this.

> > I'm not talking about automatic conversion of rules. I'm talking about
> > automatic conversion of different concurrency controls into a
> > equivillance function, which lets you better identify the neccecary
> > manual changes and leaves room for simple and non intrusive replacement
> > implementations.
>
> This is kind of a sketchy problem. So far all of what I've seen really needs
> to be done manually and can be done using the all of the normal Linux locking
> and scheduler/interrupt masking primitives. I'd hate to see another system
> added to this that solves a problem that may not exist. Please, correct
> me if I'm not understanding you.

We have spinlocks, mutexes, semaphores and preemption as types of
concurrency control implementations in the kernel. They represent
different grades of access exclusion control.

But all of them have one in common. Exclusive access to resources.

So the natural consequence is to convert _all_ concurrency control
mechanisms into a single identifiable one. That's a purely semantical
conversion, in terms of macro replacement, where no functional change
takes place.

After you have done this, it is much more easier to

a) identify the nested places, as you have to look for exactly one
pattern instead of N
b) to easy experiment with replacment functions
c) to make clear which changes to the code you are making

substituting

enter_critical_section(SPIN_LOCK,....) by
enter_critical_section(XYZ_MUTEX,....) is
understandable for most of the people.

Changing it by hidden gcc magic is not.

The bad thing of hidden gcc magic is that you will not be able to
analyse nested concurrency controls in one go. You have to figure out
what the heck spin_lock vs. _spin_lock vs. semaphore vs. _semaphore vs.
mutex vs. _mutex means.

So cleaning up the purely semantical (clear wording sense) way is the
first step to go instead of changing a bunch of macros over the place
and break the half of the kernel compile.

tglx

2004-10-12 23:22:24

by Adam Heath

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, 12 Oct 2004, Bill Huey wrote:

> Yeah, for me a bit of freak out Saturday that is still
> kind of happening since this has been a personal project
> of mine for a long time. :) I interpreted it as a visibility
> move on your company's part, which I hate to say is a bit
> unnerving to know that another group was doing the same
> work. TimeSys's Scott Wood and friends are doing something
> like this as well. I'm only being fair by mentioning them. :)

This is because companies and inviduals still think that developing things
privately is the correct way to go. Doing things this way will leave
open the possibility that someone else will do the same bit of work, and
the final output will clash.

Remember, release early, release often.

2004-10-12 23:33:10

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Wed, 2004-10-13 at 00:57, Bill Huey wrote:
> But since this is in a commerical context we have to save
> face by at least putting our cards on the table and establishing
> a sort of role in this community.

Yeah, a pretty good way to establish a role by keeping your mouth shut
and let others do redundant work.

> That commericial development
> attitude the reason why I haven't been permitted to talk about
> this stuff openly, only sort of on the side in various
> preemption discussions.

Discuss this with your company.

> Yeah, for me a bit of freak out Saturday that is still
> kind of happening since this has been a personal project
> of mine for a long time. :) I interpreted it as a visibility
> move on your company's part, which I hate to say is a bit
> unnerving to know that another group was doing the same
> work. TimeSys's Scott Wood and friends are doing something
> like this as well. I'm only being fair by mentioning them. :)

There are other people around who worked on similar things openly.

> Well, uh, at least you're single kernel image folks like
> us and not flaming us/me yet for corrupting the sancity
> of Linux. Oh man, I feel a flame war coming. This is such
> touchy material.

The flame war might come, ...

> What's Monta Vista's attitude toward preemption development ?
> open or closed ? I know this is a charged question, but
> this has to be asked. :)
>
> This commerical thing is going to be weird. I wish I was
> an angry hippie instead of having a job at certain moments. :)
>
> But the bay area is pretty damn cool, so... that makes up
> for it. :)

... but not about realtime improvements.

It might be about: " hey we putting this up to play a role attitude of
comapnies".

tglx

2004-10-12 23:33:42

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Wed, Oct 13, 2004 at 01:10:34AM +0200, Thomas Gleixner wrote:
> > This has been articulate a couple of times by both me and Ingo (recent email).
> > The MV's system is highly unstable, not because of priority inheritance,
> > but because of basic lock violation in the lock graph itself. It's another kind
> > of SMP granularity problem. The hard problem was just what Ingo was saying and
> > it's higher, but higher in the graph.
>
> Can you point me a bit more clear on what you are talking about ?

It's just a lock graph dependency problem. Things up top in the graph
force things below it to be non-preemptable. The things up top need
to be changed, so that things below it can also be preemptable. Sleeping
within an atomic critical section, local_irq* or preempt_count() > 0,
is a deadlock waiting to happen.

> So the natural consequence is to convert _all_ concurrency control
> mechanisms into a single identifiable one. That's a purely semantical
> conversion, in terms of macro replacement, where no functional change
> takes place.
...
> The bad thing of hidden gcc magic is that you will not be able to
> analyse nested concurrency controls in one go. You have to figure out
> what the heck spin_lock vs. _spin_lock vs. semaphore vs. _semaphore vs.
> mutex vs. _mutex means.

Yeah, I thought of it initially as a great idea, but ultimately this
is going to impose on the overall Linux development methodology if
these patches go into the mainstream.

I know what you're saying, but I ask you to be patient. All of this
stuff is going to get clean up when I get some critical parts in place.
And, yes, I do agree that this is unspeakably horrid. The static
type determination thing probably will have to be removed at some point,
but it's useful for rapid changing in the kernel at this time so that
Ingo can make changes to keep up with MontaVista.

All I can ask is for folks to be patient as all groups get synced up
to each other and then we'll be able to talk about it more meaningfully.
A bunch of things will fall into place once we all parties are mentally
synced up.

bill

2004-10-12 23:43:04

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, 2004-10-12 at 19:17, Adam Heath wrote:
> This is because companies and inviduals still think that developing things
> privately is the correct way to go. Doing things this way will leave
> open the possibility that someone else will do the same bit of work, and
> the final output will clash.
>
> Remember, release early, release often.

Except that none of the parties involved claim to have solved all the
priority inheritance issues etc. "Releasing early" if it doesn't work
yet just makes you look bad. There are perfectly valid reasons to do
kernel development privately. MontaVista was doing just that and they
saw that some of their work may be duplicated so they released it. I
don't see how this conflicts with the open source development process at
all.

Lee

2004-10-12 23:45:11

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Wed, 2004-10-13 at 01:33, Bill Huey wrote:
> Yeah, I thought of it initially as a great idea, but ultimately this
> is going to impose on the overall Linux development methodology if
> these patches go into the mainstream.
>
> I know what you're saying, but I ask you to be patient. All of this
> stuff is going to get clean up when I get some critical parts in place.
> And, yes, I do agree that this is unspeakably horrid. The static
> type determination thing probably will have to be removed at some point,
> but it's useful for rapid changing in the kernel at this time so that
> Ingo can make changes to keep up with MontaVista.
>
> All I can ask is for folks to be patient as all groups get synced up
> to each other and then we'll be able to talk about it more meaningfully.
> A bunch of things will fall into place once we all parties are mentally
> synced up.

Hey, what are you talking about ?

Everybody should shut up, until some people have decided that others can
participate in the development ?

I proposed this to stop this stupid race for the better solution, which
is ugly and horrid, as you accept yourself.

There is no rush to push those enhancements within no time and there is
no Nobel prize to win.

Both groups have published their incomplete solutions and now we should
stop and contemplate how to merge this effort in a less nerve racking
way so we can improve and investigate this further on a common base.

tglx

2004-10-12 23:52:41

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Wed, Oct 13, 2004 at 01:37:06AM +0200, Thomas Gleixner wrote:
> Hey, what are you talking about ?
>
> Everybody should shut up, until some people have decided that others can
> participate in the development ?

No, just wait and your (everybody's) concern should be address. It takes
time to work through all of the slop. I'm all for syncing to a single
solution, but there's a ton of problems that still need to be addressed.

> I proposed this to stop this stupid race for the better solution, which
> is ugly and horrid, as you accept yourself.

Yes, the efforts are distant from each other and it's going to take time
to resolve it. I'm probably going to use Ingo's stuff in 2.6.9+, but my
stuff in 2.6.7 is useful as a specialized kind of test harness. I'll
have to think about what's the best way of resolving this. I agree on
these points.

bill

2004-10-13 00:31:31

by George Anzinger

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Sven Dietrich wrote:
>>>I think patch size is an issue, but I also think that , eventually, we
>>>should change all spin_lock calls that actually lock a mutex to be more
>>>distinct so it's obvious what is going on. Sven and I both agree that
>>>this should be addressed. Is this a non-issue for you? What does the
>>>community want? I don't find your code or ours acceptable in it's
>>>current form , due to this issue.
>>>
>>>With the addition of PREEMPT_REALTIME it looks like you more than
>>>doubled the size of voluntary preempt. I really feel that it should
>>>remain as two distinct patches. They are dependent , but the scope of
>>>the changes are too vast to lump it all together.
>>>
>>
>>
>>If there is the tendency to touch the concurrency controls in general
>>all over the kernel, then I would suggest a script driven overhaul of
>>all concurrency controls like spin_locks, mutexes and semaphores to
>>general macros like
>>
>>enter_critical_section(TYPE, &var, &flags, whatever);
>>leave_critical_section(TYPE, &var, flags, whatever);

There is nothing here that can not be done with a macro. Don't really need a
script. The optimizer would drop out unused code...

-g
>>
>>where TYPE might be SPIN_LOCK, SPIN_LOCK_IRQ, MUTEX, PMUTEX or whatever
>>we have and come up with in the future.
>>
>>This could be done in a first step and then it is clearly identifiable
>>and it gives us more flexibility to wrap different implementations and
>>lets us change particular points in a more clear way.
>>
>>I would be willing to provide some scripted conversion aid, if there is
>>enough interest to that. I started with some test files and the results
>>are quite encouraging.
>>
>
>
>
>
> Ideally we would eventually provide some level of tunability, i.e.
> if you want the spinlocks all the way around it should be possible
> to have that, or one could enable degrees of enhancements,
> expanding on the existing sequence starting with PREEMPT, IRQ_THREADS,
> BKL, MUTEX, etc. In addition to that, once the minim set of spinlocks
> necessary for RT is established, additional layers, corresponding to
> the lock nesting order, could be established, making the "mutex-depth"
> somewhat configurable based on the performance requirements.
>
> The entire effort would have the side effect of making the locking and
> critical sections more distinct, and reveal soft spots in concurrency
> code, as well as to raise awareness of the code density inside
> critical sections.
>
> The concept of tunable foreground / background responsiveness,
> based on preemptability of low priority processes comes to mind.
> A lot of folks would probably not mind making UI responsiveness
> a little crisper, others will want the throughput.
>
> I realize this is an early stage to be looking at it so high end,
> but I think in general this type of script would not be a bad addition
> to the patch kit(s).
>
>
> Sven
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2004-10-13 01:01:52

by Valdis Klētnieks

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Wed, 13 Oct 2004 01:10:34 +0200, Thomas Gleixner said:

> What have we at the very end ? A endless mess of non understandable
> macros, which are resolved by compiler magic ? Where nobody can see on
> the first look, which kind of concurrency control you are using ? That's
> a nice thing to do some proof of concept implementation, but it can not
> be a solution for something what is targeted to go into mainline. The
> frequency of T4-T7 patches including the small fixes posted on LKML is
> just a proof of this.

I seem to remember Ingo saying that this *is* still somewhat "proof of concept",
and that the gcc preprocessor ad-crockery was just a *really* nice way of doing
it semi-automagically while minimizing the patch footprint and intrusiveness.

I'm sure that once we've got a non-moving target, at least 2 or 3 levels
of preprocessor redirection will get cleaned up and removed, to save
future programmer's sanity..

(Viewed alternatively - how many more flubs would the T4-T7 series have
if Ingo wasn't using the preprocessor to do the heavy lifting? For something
at the current level of cookedness, it's doing fairly well)...

Attachments:

(No filename) (226.00 B)

2004-10-13 02:02:29

by K.R. Foley

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

Bill Huey (hui) wrote:
<snip>
>
> Well, uh, at least you're single kernel image folks like
> us and not flaming us/me yet for corrupting the sancity
> of Linux. Oh man, I feel a flame war coming. This is such
> touchy material.
>
> What's Monta Vista's attitude toward preemption development ?
> open or closed ? I know this is a charged question, but
> this has to be asked. :)
>
> This commerical thing is going to be weird. I wish I was
> an angry hippie instead of having a job at certain moments. :)

Aside from being able to claim first to market, what is to be gained by
having this effort closed? If it is truly integrated into the Linux
kernel and not another segregated/multi-kernel solution, is there any
way to keep it closed?

>
> But the bay area is pretty damn cool, so... that makes up
> for it. :)
>
> bill
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2004-10-13 03:56:17

[permalink] [raw]

Subject: Re: [Ext-rt-dev] Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, Oct 12, 2004 at 02:41:02PM -0700, Sven Dietrich wrote:
> I emailed the mmlinux project about 2 months ago,
> telling you that we were doing this.

http://mmlinux.sourceforge.net/temp/

I'll do an official announcement tomorrow. It's party time for me. :)

bill

2004-10-14 05:06:22

by Dipankar Sarma

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Tue, Oct 12, 2004 at 07:50:29AM +0200, Ingo Molnar wrote:
>
> regarding RCU serialization - i think that is the way to go - i dont
> think there is any sensible way to extend RCU to a fully preempted
> model, RCU is all about per-CPU-ness and per-CPU-ness is quite limited
> in a fully preemptible model.

It seems that way to me too. Long ago I implemented preemptible
RCU, but did not follow it through because I believed it
was not a good idea. The original patch is here :

http://www.uwsg.iu.edu/hypermail/linux/kernel/0205.1/0026.html

This allows read-side critical sections of RCU to be preempted.
It will take a bit of work to re-use it in RCU as of now, but
I don't think it makes sense to do so. My primary concern is
DoS/OOM situation due to preempted tasks holding up RCU.

>
> could you send those RCU patches (no matter how incomplete/broken)? It's
> the main issue that causes the dcache_lock to be raw still. (and a
> number of dependent locks: fs-writeback.c, inode.c, etc.) We can make
> those RCU changes not impact the normal !PREEMPT_REALTIME locking so it
> might have a chance for upstream merging as well.

I would be interested in this too.

Thanks
Dipankar

2004-10-14 07:16:58

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

* Dipankar Sarma <[email protected]> wrote:

> On Tue, Oct 12, 2004 at 07:50:29AM +0200, Ingo Molnar wrote:
> >
> > regarding RCU serialization - i think that is the way to go - i dont
> > think there is any sensible way to extend RCU to a fully preempted
> > model, RCU is all about per-CPU-ness and per-CPU-ness is quite limited
> > in a fully preemptible model.
>
> It seems that way to me too. Long ago I implemented preemptible RCU,
> but did not follow it through because I believed it was not a good
> idea. The original patch is here :
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0205.1/0026.html

interesting!

> This allows read-side critical sections of RCU to be preempted. It
> will take a bit of work to re-use it in RCU as of now, but I don't
> think it makes sense to do so. [...]

note that meanwhile i have implemented another variant:

http://marc.theaimsgroup.com/?l=linux-kernel&m=109771365907797&w=2

i dont think this will be the final interface (the _rt postfix is
stupid, it should probably be _spin?), but i think this is roughly the
structure of how to attack it - a minimal extension to the RCU APIs to
allow for serialization. What do you think about this particular
approach?

> [...] My primary concern is DoS/OOM situation due to preempted tasks
> holding up RCU.

in the serialization solution in -U0 it would be possible to immediately
free the RCU entries and hence have no DoS/OOM situation - although the
-U0 patch does not do this yet.

Ingo

2004-10-15 15:04:17

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Thu, Oct 14, 2004 at 09:18:10AM +0200, Ingo Molnar wrote:
>
> * Dipankar Sarma <[email protected]> wrote:
>
> > On Tue, Oct 12, 2004 at 07:50:29AM +0200, Ingo Molnar wrote:
> > >
> > > regarding RCU serialization - i think that is the way to go - i dont
> > > think there is any sensible way to extend RCU to a fully preempted
> > > model, RCU is all about per-CPU-ness and per-CPU-ness is quite limited
> > > in a fully preemptible model.
> >
> > It seems that way to me too. Long ago I implemented preemptible RCU,
> > but did not follow it through because I believed it was not a good
> > idea. The original patch is here :
> >
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0205.1/0026.html
>
> interesting!
>
> > This allows read-side critical sections of RCU to be preempted. It
> > will take a bit of work to re-use it in RCU as of now, but I don't
> > think it makes sense to do so. [...]
>
> note that meanwhile i have implemented another variant:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109771365907797&w=2
>
> i dont think this will be the final interface (the _rt postfix is
> stupid, it should probably be _spin?), but i think this is roughly the
> structure of how to attack it - a minimal extension to the RCU APIs to
> allow for serialization. What do you think about this particular
> approach?

One caution (which you are no doubt already aware of) -- if an RCU
algorithm that reads (rcu_read_lock()/rcu_read_unlock()) in process
context and updates in softirq/bh/irq context, you can see deadlocks.

Thanx, Paul

> > [...] My primary concern is DoS/OOM situation due to preempted tasks
> > holding up RCU.
>
> in the serialization solution in -U0 it would be possible to immediately
> free the RCU entries and hence have no DoS/OOM situation - although the
> -U0 patch does not do this yet.
>
> Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2004-10-15 15:44:39

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

* Paul E. McKenney <[email protected]> wrote:

> One caution (which you are no doubt already aware of) -- if an RCU
> algorithm that reads (rcu_read_lock()/rcu_read_unlock()) in process
> context and updates in softirq/bh/irq context, you can see deadlocks.

yeah - but in the PREEMPT_REALTIME kernel there are simply no irq or
softirq contexts in process contexts - everything is a task. So
everything can (and does) block.

Ingo

2004-10-15 16:46:02

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Fri, Oct 15, 2004 at 05:45:42PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <[email protected]> wrote:
>
> > One caution (which you are no doubt already aware of) -- if an RCU
> > algorithm that reads (rcu_read_lock()/rcu_read_unlock()) in process
> > context and updates in softirq/bh/irq context, you can see deadlocks.
>
> yeah - but in the PREEMPT_REALTIME kernel there are simply no irq or
> softirq contexts in process contexts - everything is a task. So
> everything can (and does) block.

OK, am probably confused, but I thought that the whole point of your
PREEMPT_REALTIME implementation of rcu_read_lock_rt() was to enable
preemption in the RCU read-side critical section. If this is indeed
the case, then it looks to me like code that would run in softirq/bh/irq
context in a kernel compiled non-PREEMPT_REALTIME could now run during
the time that a code path running under rcu_read_lock_rt() was preempted.

If so, then the kernel can end up freeing a data item that the preempted
RCU read-side critical section is still referencing.

OK, so what am I missing here?

Thanx, Paul

2004-10-15 16:50:56

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel

On Fri, Oct 15, 2004 at 09:40:39AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 15, 2004 at 05:45:42PM +0200, Ingo Molnar wrote:
> >
> > * Paul E. McKenney <[email protected]> wrote:
> >
> > > One caution (which you are no doubt already aware of) -- if an RCU
> > > algorithm that reads (rcu_read_lock()/rcu_read_unlock()) in process
> > > context and updates in softirq/bh/irq context, you can see deadlocks.
> >
> > yeah - but in the PREEMPT_REALTIME kernel there are simply no irq or
> > softirq contexts in process contexts - everything is a task. So
> > everything can (and does) block.
>
> OK, am probably confused, but I thought that the whole point of your
> PREEMPT_REALTIME implementation of rcu_read_lock_rt() was to enable
> preemption in the RCU read-side critical section. If this is indeed
> the case, then it looks to me like code that would run in softirq/bh/irq
> context in a kernel compiled non-PREEMPT_REALTIME could now run during
> the time that a code path running under rcu_read_lock_rt() was preempted.
>
> If so, then the kernel can end up freeing a data item that the preempted
> RCU read-side critical section is still referencing.
>
> OK, so what am I missing here?

Never mind!!! You insert the mutex. Sorry for the noise!

Thanx, Paul

2004-10-17 17:10:47