2006-02-21 15:57:51

by Ingo Molnar

[permalink] [raw]
Subject: 2.6.15-rt17

i have released the 2.6.15-rt17 tree, which can be downloaded from the
usual place:

http://redhat.com/~mingo/realtime-preempt/

lots of changes all across the map. There are several bigger changes:

the biggest change is the new PI code from Esben Nielsen, Thomas
Gleixner and Steven Rostedt. This big rework simplifies and streamlines
the PI code, and fixes a couple of bugs and races:

- only the top priority waiter on a lock is enqueued into the pi_list
of the task which holds the lock. No more pi list walking in the
boost case.

- simpler locking rules

- fast Atomic acquire for the non contended case and atomic release
for non waiter case is fully functional now

- use task_t references instead of thread_info pointers

- BKL handling for semaphore style locks changed so that BKL is
dropped before the scheduler is entered and reaquired in the return
path. This solves a possible deadlock situation in the BKL reacquire
path of the scheduler.

another change is the reworking of the SLAB code: it now closely matches
the upstream SLAB code, and it should now work on NUMA systems too
(untested though).

the tasklet code was reworked too to be PREEMPT_RT friendly: the new PI
code unearthed a fundamental livelock scenario with PREEMPT_RT, and the
fix was to rework the tasklet code to get rid of the 'retrigger
softirqs' approach.

other changes: various hrtimers fixes, latency tracer enhancements - and
more. (Robust-futexes are not expected to work in this release.)

please report any new breakages, and re-report any old breakages as
well.

to build a 2.6.15-rt17 tree, the following patches should be applied:

http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.15.tar.bz2
http://redhat.com/~mingo/realtime-preempt/patch-2.6.15-rt17

Ingo


2006-02-21 16:11:33

by K.R. Foley

[permalink] [raw]
Subject: Re: 2.6.15-rt17

Ingo Molnar wrote:
> i have released the 2.6.15-rt17 tree, which can be downloaded from the
> usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
> lots of changes all across the map. There are several bigger changes:
>
> the biggest change is the new PI code from Esben Nielsen, Thomas
> Gleixner and Steven Rostedt. This big rework simplifies and streamlines
> the PI code, and fixes a couple of bugs and races:
>
> - only the top priority waiter on a lock is enqueued into the pi_list
> of the task which holds the lock. No more pi list walking in the
> boost case.
>
> - simpler locking rules
>
> - fast Atomic acquire for the non contended case and atomic release
> for non waiter case is fully functional now
>
> - use task_t references instead of thread_info pointers
>
> - BKL handling for semaphore style locks changed so that BKL is
> dropped before the scheduler is entered and reaquired in the return
> path. This solves a possible deadlock situation in the BKL reacquire
> path of the scheduler.
>
> another change is the reworking of the SLAB code: it now closely matches
> the upstream SLAB code, and it should now work on NUMA systems too
> (untested though).
>
> the tasklet code was reworked too to be PREEMPT_RT friendly: the new PI
> code unearthed a fundamental livelock scenario with PREEMPT_RT, and the
> fix was to rework the tasklet code to get rid of the 'retrigger
> softirqs' approach.
>
> other changes: various hrtimers fixes, latency tracer enhancements - and
> more. (Robust-futexes are not expected to work in this release.)
>
> please report any new breakages, and re-report any old breakages as
> well.
>
> to build a 2.6.15-rt17 tree, the following patches should be applied:
>
> http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.15.tar.bz2
> http://redhat.com/~mingo/realtime-preempt/patch-2.6.15-rt17
>
> Ingo

Ingo I think you fat fingered the name of the patchfile on your site. :)

--
kr

2006-02-21 16:21:39

by K.R. Foley

[permalink] [raw]
Subject: Re: 2.6.15-rt17

Ingo Molnar wrote:
> i have released the 2.6.15-rt17 tree, which can be downloaded from the
> usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
> lots of changes all across the map. There are several bigger changes:
>
> the biggest change is the new PI code from Esben Nielsen, Thomas
> Gleixner and Steven Rostedt. This big rework simplifies and streamlines
> the PI code, and fixes a couple of bugs and races:
>
> - only the top priority waiter on a lock is enqueued into the pi_list
> of the task which holds the lock. No more pi list walking in the
> boost case.
>
> - simpler locking rules
>
> - fast Atomic acquire for the non contended case and atomic release
> for non waiter case is fully functional now
>
> - use task_t references instead of thread_info pointers
>
> - BKL handling for semaphore style locks changed so that BKL is
> dropped before the scheduler is entered and reaquired in the return
> path. This solves a possible deadlock situation in the BKL reacquire
> path of the scheduler.
>
> another change is the reworking of the SLAB code: it now closely matches
> the upstream SLAB code, and it should now work on NUMA systems too
> (untested though).
>
> the tasklet code was reworked too to be PREEMPT_RT friendly: the new PI
> code unearthed a fundamental livelock scenario with PREEMPT_RT, and the
> fix was to rework the tasklet code to get rid of the 'retrigger
> softirqs' approach.
>
> other changes: various hrtimers fixes, latency tracer enhancements - and
> more. (Robust-futexes are not expected to work in this release.)
>
> please report any new breakages, and re-report any old breakages as
> well.
>
> to build a 2.6.15-rt17 tree, the following patches should be applied:
>
> http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.15.tar.bz2
> http://redhat.com/~mingo/realtime-preempt/patch-2.6.15-rt17
>
> Ingo
> -

Also would you apply the attached patch to fix the RTC_HISTOGRAM Kconfig
option. In it's current state "tristate" it allows building as a module,
which of course doesn't work.

--
kr


Attachments:
rtc-Kconfig.patch (360.00 B)

2006-02-21 17:09:28

by Timothy R. Chavez

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Tue, 2006-02-21 at 16:55 +0100, Ingo Molnar wrote:
> i have released the 2.6.15-rt17 tree, which can be downloaded from the
> usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
> lots of changes all across the map. There are several bigger changes:
>
> the biggest change is the new PI code from Esben Nielsen, Thomas
> Gleixner and Steven Rostedt. This big rework simplifies and streamlines
> the PI code, and fixes a couple of bugs and races:
>
> - only the top priority waiter on a lock is enqueued into the pi_list
> of the task which holds the lock. No more pi list walking in the
> boost case.
>
> - simpler locking rules
>
> - fast Atomic acquire for the non contended case and atomic release
> for non waiter case is fully functional now
>
> - use task_t references instead of thread_info pointers
>
> - BKL handling for semaphore style locks changed so that BKL is
> dropped before the scheduler is entered and reaquired in the return
> path. This solves a possible deadlock situation in the BKL reacquire
> path of the scheduler.
>
> another change is the reworking of the SLAB code: it now closely matches
> the upstream SLAB code, and it should now work on NUMA systems too
> (untested though).
>
> the tasklet code was reworked too to be PREEMPT_RT friendly: the new PI
> code unearthed a fundamental livelock scenario with PREEMPT_RT, and the
> fix was to rework the tasklet code to get rid of the 'retrigger
> softirqs' approach.
>
> other changes: various hrtimers fixes, latency tracer enhancements - and
> more. (Robust-futexes are not expected to work in this release.)
>
> please report any new breakages, and re-report any old breakages as
> well.
>
> to build a 2.6.15-rt17 tree, the following patches should be applied:
>
> http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.15.tar.bz2
> http://redhat.com/~mingo/realtime-preempt/patch-2.6.15-rt17

http://people.redhat.com/mingo/realtime-preempt/patch-2.6.16-rt17 ???

-tim

>
> Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2006-02-21 17:16:26

by Michal Piotrowski

[permalink] [raw]
Subject: Re: 2.6.15-rt17

Hi,

On 21/02/06, Ingo Molnar <[email protected]> wrote:
> i have released the 2.6.15-rt17 tree, which can be downloaded from the
> usual place:
[snip]
> to build a 2.6.15-rt17 tree, the following patches should be applied:
>
> http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.15.tar.bz2
> http://redhat.com/~mingo/realtime-preempt/patch-2.6.15-rt17
>
> Ingo

I have noticed some bugs

----------------------------->
| new stack-footprint maximum: swapper/0, 788 bytes (out of 4044 bytes).
------------|
{ 24} [<c0135e7d>] debug_stackoverflow+0x80/0xb2
{ 28} [<c01364e9>] __mcount+0x3b/0xb2
{ 20} [<c010ebd8>] mcount+0x14/0x18
{ 124} [<c01dc5cf>] number+0xe/0x204
{ 76} [<c01dcbc3>] vsnprintf+0x3fe/0x438
{ 28} [<c01dcc18>] vscnprintf+0x1b/0x2b
{ 128} [<c011bd04>] vprintk+0x7a/0x23d
{ 20} [<c011bc88>] printk+0x18/0x1a
{ 172} [<c010ec2a>] MP_processor_info+0x4a/0x1b4
{ 36} [<c010ee2e>] mp_register_lapic+0x9a/0xa0
{ 24} [<c0482d5e>] acpi_parse_lapic+0x45/0x56
{ 28} [<c048dda6>] acpi_table_parse_madt_family+0xb0/0x100
{ 28} [<c048de10>] acpi_table_parse_madt+0x1a/0x1c
{ 32} [<c048312a>] acpi_parse_madt_lapic_entries+0x49/0x9a
{ 8} [<c048327d>] acpi_process_madt+0x23/0xb3
{ 24} [<c0483540>] acpi_boot_init+0x3c/0x4c
{ 20} [<c04805d4>] setup_arch+0x1af/0x1ed
{ 24} [<c047c72f>] start_kernel+0x30/0x19a


----------------------------->
----------------------------->
| new stack-footprint maximum: rcS/332, 2224 bytes (out of 4044 bytes).
| new stack-footprint maximum: rcS/332, 2224 bytes (out of 4044 bytes).
------------|
------------|
{ 24} { 24} [<c0135e7d>] [<c0135e7d>]
debug_stackoverflow+0x80/0xb2debug_stackoverflow+0x80/0xb2

{ 28} { 28} [<c01364e9>] [<c01364e9>] __mcount+0x3b/0xb2__mcount+0x3b/0xb2

{ 20} { 20} [<c010ebd8>] [<c010ebd8>] mcount+0x14/0x18mcount+0x14/0x18

{ 12} { 12} [<c02c3404>] [<c02c3404>]
_raw_spin_lock+0x9/0x26_raw_spin_lock+0x9/0x26

{ 16} { 16} [<c02c2a18>] [<c02c2a18>]
__lock_text_start+0x20/0x55__lock_text_start+0x20/0x55

{ 32} { 32} [<c015ee47>] [<c015ee47>]
kmem_cache_alloc+0x50/0xb8kmem_cache_alloc+0x50/0xb8

{ 16} { 16} [<c0148603>] [<c0148603>]
mempool_alloc_slab+0x13/0x15mempool_alloc_slab+0x13/0x15

{ 52} { 52} [<c01484e3>] [<c01484e3>]
mempool_alloc+0x3f/0xdamempool_alloc+0x3f/0xda

{ 24} { 24} [<c01d815c>] [<c01d815c>]
as_set_request+0x21/0x76as_set_request+0x21/0x76

{ 24} { 24} [<c01cfea3>] [<c01cfea3>]
elv_set_request+0x24/0x34elv_set_request+0x24/0x34

{ 44} { 44} [<c01d21b9>] get_request+0x173/0x279
{ 60} [<c01d22de>] get_request_wait+0x1f/0xd2
{ 76} [<c01d2eeb>] __make_request+0x2c2/0x42d[<c01d21b9>]
{ 60} get_request+0x173/0x279[<c01d31e0>]
generic_make_request+0x108/0x11a{ 60}
[<c01d22de>] { 60} get_request_wait+0x1f/0xd2[<c01d32cb>]
submit_bio+0xd9/0xe2{ 76}
[<c01d2eeb>] { 36} __make_request+0x2c2/0x42d[<c0166a92>]
submit_bh+0x10f/0x131{ 60}
[<c01d31e0>] { 20} generic_make_request+0x108/0x11a[<c0164a80>]
__bread_slow+0x51/0x8d{ 60}
[<c01d32cb>] { 12} submit_bio+0xd9/0xe2
[<c0164d32>] { 36} __bread+0x2b/0x32[<c0166a92>]
submit_bh+0x10f/0x131{ 48}
[<c019bc1b>] { 20} ext3_get_branch+0x6a/0xea[<c0164a80>]
__bread_slow+0x51/0x8d{ 120}
[<c019c136>] { 12} ext3_get_block_handle+0xab/0x289[<c0164d32>]
__bread+0x2b/0x32{ 40}
[<c019c389>] { 48} ext3_get_block+0x75/0x7c[<c019bc1b>]
ext3_get_branch+0x6a/0xea{ 328}
[<c01834ca>] { 120} do_mpage_readpage+0x14d/0x383[<c019c136>]
ext3_get_block_handle+0xab/0x289{ 84}
[<c0183779>] { 40} mpage_readpages+0x79/0xf4[<c019c389>]
ext3_get_block+0x75/0x7c{ 24}
[<c019d0c4>] { 328} ext3_readpages+0x1b/0x1d[<c01834ca>]
do_mpage_readpage+0x14d/0x383{ 76}
[<c014bb7a>] { 84} read_pages+0x2a/0xcd[<c0183779>]
mpage_readpages+0x79/0xf4{ 56}
{ 24} [<c019d0c4>] [<c014bd6d>]
ext3_readpages+0x1b/0x1d__do_page_cache_readahead+0x150/0x185

{ 76} { 28} [<c014bb7a>] [<c014be56>]
read_pages+0x2a/0xcddo_page_cache_readahead+0x40/0x4c

{ 56} { 60} [<c014bd6d>] [<c0146943>]
__do_page_cache_readahead+0x150/0x185filemap_nopage+0x148/0x2e8

{ 28} { 60} [<c014be56>] [<c0151def>]
do_page_cache_readahead+0x40/0x4cdo_no_page+0x98/0x2a7

{ 60} { 48} [<c0146943>] [<c015212f>]
filemap_nopage+0x148/0x2e8__handle_mm_fault+0xc5/0x17d

{ 60} { 76} [<c0151def>] [<c02c4576>]
do_no_page+0x98/0x2a7do_page_fault+0x16b/0x494

{ 48} { 80} [<c015212f>] [<c01037df>]
__handle_mm_fault+0xc5/0x17derror_code+0x4f/0x54

{ 76} { 16} [<c02c4576>] [<c0188529>]
do_page_fault+0x16b/0x494padzero+0x23/0x34

{ 80} { 140} [<c01037df>] [<c0189484>]
error_code+0x4f/0x54load_elf_binary+0x752/0xa9f

{ 16} { 40} [<c0188529>] [<c016cd86>]
padzero+0x23/0x34search_binary_handler+0xcc/0x299

{ 140} { 164} [<c0189484>] [<c0188461>]
load_elf_binary+0x752/0xa9fload_script+0x195/0x1a8

{ 40} { 40} [<c016cd86>] [<c016cd86>]
search_binary_handler+0xcc/0x299search_binary_handler+0xcc/0x299

{ 164} { 32} [<c0188461>] [<c016d0c4>]
load_script+0x195/0x1a8do_execve+0x171/0x21c

{ 40} { 36} [<c016cd86>] [<c01019da>]
search_binary_handler+0xcc/0x299sys_execve+0x2f/0x74

{ 32} {=2212} [<c016d0c4>] [<c0102c7e>]
do_execve+0x171/0x21csyscall_call+0x7/0xb

{ 36} <---------------------------

[<c01019da>] sys_execve+0x2f/0x74
{=2212} [<c0102c7e>] syscall_call+0x7/0xb
<---------------------------

Here is config http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-config
Here is dmesg http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-dmesg

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

2006-02-21 18:25:04

by Daniel Walker

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Tue, 2006-02-21 at 16:55 +0100, Ingo Molnar wrote:
> i have released the 2.6.15-rt17 tree, which can be downloaded from the
> usual place:
>
> http://redhat.com/~mingo/realtime-preempt/

The patch at thie URL is patch-2.6.16-rt17 . But it doesn't look like
it's for 2.6.16-rc4 .

Daniel

2006-02-21 18:45:09

by Thomas Gleixner

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Tue, 2006-02-21 at 10:24 -0800, Daniel Walker wrote:
> On Tue, 2006-02-21 at 16:55 +0100, Ingo Molnar wrote:
> > i have released the 2.6.15-rt17 tree, which can be downloaded from the
> > usual place:
> >
> > http://redhat.com/~mingo/realtime-preempt/
>
> The patch at thie URL is patch-2.6.16-rt17 . But it doesn't look like
> it's for 2.6.16-rc4 .

It's agains 2.6.15

tglx


2006-02-21 18:53:42

by Michal Piotrowski

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On 21/02/06, Michal Piotrowski <[email protected]> wrote:
[snip]
> Here is config http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-config
> Here is dmesg http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-dmesg

Here is updated config
http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-config2
Here is updated dmesg
http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-dmesg2

BTW. thanks for fixing CONFIG_DEBUG_HIGHMEM=y

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

2006-02-21 20:36:28

by Thomas Gleixner

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Tue, 2006-02-21 at 19:53 +0100, Michal Piotrowski wrote:
> On 21/02/06, Michal Piotrowski <[email protected]> wrote:
> [snip]
> > Here is config http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-config
> > Here is dmesg http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-dmesg
>
> Here is updated config
> http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-config2
> Here is updated dmesg
> http://www.stardust.webpages.pl/files/rt/2.6.15-rt17/rt-dmesg2
>
> BTW. thanks for fixing CONFIG_DEBUG_HIGHMEM=y

The stack maximum messages are harmless. They just report a new
watermark high quite verbose. You can turn them off by disabling
DEBUG_STACKOVERFLOW.

skge eth0: Link is up at 100 Mbps, full duplex, flow control tx and rx
WARNING: softirq-tasklet/8 changed soft IRQ-flags.
[<c0103b33>] dump_stack+0x1b/0x1f (20)
[<c0134035>] illegal_API_call+0x41/0x46 (20)
[<c0134084>] local_irq_disable+0x1d/0x1f (8)
[<f8839bf3>] skge_extirq+0x117/0x138 [skge] (32)
[<c0120a73>] __tasklet_action+0xb8/0xfd (28)
[<c0120afc>] tasklet_action+0x44/0x4e (28)
[<c0120ce8>] ksoftirqd+0x10f/0x19e (32)
[<c012dfa6>] kthread+0x7b/0xa9 (36)
[<c01010dd>] kernel_thread_helper+0x5/0xb (1037991964)

This one is interesting. It brought my attention to that piece of code
which is a long standing problem on one of my boxen anyway. Patch
attached.

tglx







Attachments:
skge-fix.patch (1.49 kB)

2006-02-21 21:15:50

by Michal Piotrowski

[permalink] [raw]
Subject: Re: 2.6.15-rt17

Hi,

On 21/02/06, Thomas Gleixner <[email protected]> wrote:
> The stack maximum messages are harmless. They just report a new
> watermark high quite verbose. You can turn them off by disabling
> DEBUG_STACKOVERFLOW.

Sorry, I forgot about this.

>
> skge eth0: Link is up at 100 Mbps, full duplex, flow control tx and rx
> WARNING: softirq-tasklet/8 changed soft IRQ-flags.
> [<c0103b33>] dump_stack+0x1b/0x1f (20)
> [<c0134035>] illegal_API_call+0x41/0x46 (20)
> [<c0134084>] local_irq_disable+0x1d/0x1f (8)
> [<f8839bf3>] skge_extirq+0x117/0x138 [skge] (32)
> [<c0120a73>] __tasklet_action+0xb8/0xfd (28)
> [<c0120afc>] tasklet_action+0x44/0x4e (28)
> [<c0120ce8>] ksoftirqd+0x10f/0x19e (32)
> [<c012dfa6>] kthread+0x7b/0xa9 (36)
> [<c01010dd>] kernel_thread_helper+0x5/0xb (1037991964)
>
> This one is interesting. It brought my attention to that piece of code
> which is a long standing problem on one of my boxen anyway. Patch
> attached.
>
> tglx
>

Thanks, problem solved.

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

2006-02-22 07:53:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.15-rt17


* K.R. Foley <[email protected]> wrote:

> Also would you apply the attached patch to fix the RTC_HISTOGRAM
> Kconfig option. In it's current state "tristate" it allows building as
> a module, which of course doesn't work.

thanks, applied - should show up in -rt18.

Ingo

2006-02-22 12:22:39

by Steven Rostedt

[permalink] [raw]
Subject: Re: 2.6.15-rt17


On Tue, 21 Feb 2006, Michal Piotrowski wrote:

> Hi,
>
> On 21/02/06, Ingo Molnar <[email protected]> wrote:
> > i have released the 2.6.15-rt17 tree, which can be downloaded from the
> > usual place:
> [snip]
> > to build a 2.6.15-rt17 tree, the following patches should be applied:
> >
> > http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.15.tar.bz2
> > http://redhat.com/~mingo/realtime-preempt/patch-2.6.15-rt17
> >
> > Ingo
>
> I have noticed some bugs
>
> ----------------------------->
> | new stack-footprint maximum: swapper/0, 788 bytes (out of 4044 bytes).
> ------------|
> { 24} [<c0135e7d>] debug_stackoverflow+0x80/0xb2
> { 28} [<c01364e9>] __mcount+0x3b/0xb2
> { 20} [<c010ebd8>] mcount+0x14/0x18
> { 124} [<c01dc5cf>] number+0xe/0x204
> { 76} [<c01dcbc3>] vsnprintf+0x3fe/0x438
> { 28} [<c01dcc18>] vscnprintf+0x1b/0x2b
> { 128} [<c011bd04>] vprintk+0x7a/0x23d
> { 20} [<c011bc88>] printk+0x18/0x1a
> { 172} [<c010ec2a>] MP_processor_info+0x4a/0x1b4
> { 36} [<c010ee2e>] mp_register_lapic+0x9a/0xa0
> { 24} [<c0482d5e>] acpi_parse_lapic+0x45/0x56
> { 28} [<c048dda6>] acpi_table_parse_madt_family+0xb0/0x100
> { 28} [<c048de10>] acpi_table_parse_madt+0x1a/0x1c
> { 32} [<c048312a>] acpi_parse_madt_lapic_entries+0x49/0x9a
> { 8} [<c048327d>] acpi_process_madt+0x23/0xb3
> { 24} [<c0483540>] acpi_boot_init+0x3c/0x4c
> { 20} [<c04805d4>] setup_arch+0x1af/0x1ed
> { 24} [<c047c72f>] start_kernel+0x30/0x19a
>

Ingo,

Maybe the following patch is needed, so that people know that this is not
a bug.

-- Steve

Index: linux-2.6.15-rt17/kernel/latency.c
===================================================================
--- linux-2.6.15-rt17.orig/kernel/latency.c 2006-02-21 11:44:54.000000000 -0500
+++ linux-2.6.15-rt17/kernel/latency.c 2006-02-22 07:20:55.000000000 -0500
@@ -418,6 +418,8 @@ static notrace void __print_worst_stack(
printk("| new stack-footprint maximum: %s/%d, %ld bytes (out of %ld bytes).\n",
worst_stack_comm, worst_stack_pid,
MAX_STACK-worst_stack_left, (long)MAX_STACK);
+ printk("| This is not a BUG\n");
+ printk("| turn off CONFIG_DEBUG_STACK_OVERFLOW if you don't want this\n");
printk("------------|\n");

show_stackframe();

2006-02-22 13:47:09

by Michal Piotrowski

[permalink] [raw]
Subject: Re: 2.6.15-rt17

Hi,

On 22/02/06, Steven Rostedt <[email protected]> wrote:
>
> Ingo,
>
> Maybe the following patch is needed, so that people know that this is not
> a bug.
>
> -- Steve

It will be a very useful piece of information for kernel testers (aka.
not hackers ;).

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

2006-02-22 16:51:17

by Esben Nielsen

[permalink] [raw]
Subject: Re: 2.6.15-rt17



On Tue, 21 Feb 2006, Ingo Molnar wrote:

> i have released the 2.6.15-rt17 tree, which can be downloaded from the
> usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
> lots of changes all across the map. There are several bigger changes:
>
> the biggest change is the new PI code from Esben Nielsen, Thomas
> Gleixner and Steven Rostedt. This big rework simplifies and streamlines
> the PI code, and fixes a couple of bugs and races:
>
I didn't know anyone looked at my patch! I am ofcourse happy about it :-)

I switftly looked at 2.6.15-rt17 to see how it turned out. I found one
problem in get_blocked_on(task):

First the task->pi_lock is locked, lock = task->blocked_on->lock, then
task->pi_lock is unlocked. Now lock is used. This is not safe.
Once task->pi_lock is unlocked we can be preempted and wait for a long
while the lock can be freed (if forinstance a driver is unloaded). You
either have to hold task->pi_lock or lock->wait_lock while refering to
lock. Otherwise the lock can be deallocated behind your bag. I assume that
nobody deallocate a lock with waiters.

That was why I had _reversed_ the lock ordering relative to normal in the
original patch: First lock task->pi_lock. Assign lock. Lock
lock->wait_lock. Then unlock task->pi_lock. Now it is safe to refer to
lock. To avoid deadlocks I used _raw_spin_trylock() when locking the 2.
lock.

To make a long story short:

--- linux-2.6.15-rt17/kernel/rt.c.orig 2006-02-22 16:53:44.000000000
+0100
+++ linux-2.6.15-rt17/kernel/rt.c 2006-02-22 16:56:21.000000000
+0100
@@ -873,10 +873,17 @@
}

lock = task->blocked_on->lock;
+
+ /* Now we have to take lock->wait_lock _before_ releasing
+ task->pi_lock. Otherwise lock can be deallocated while we are
+ refering to it as the subsystem has no way of knowing about us
+ hanging around in here. */
+ if (!_raw_spin_trylock(&lock->wait_lock)) {
+ _raw_spin_unlock(&task->pi_lock);
+ goto try_again;
+ }
_raw_spin_unlock(&task->pi_lock);

- if (!_raw_spin_trylock(&lock->wait_lock))
- goto try_again;

owner = lock_owner(lock);
if (owner)

It compiles, but I have not tested it. I can't see why it shouldn't work.

Esben

2006-02-22 23:16:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Wed, 2006-02-22 at 17:50 +0100, Esben Nielsen wrote:
> I didn't know anyone looked at my patch! I am ofcourse happy about it :-)

Was just delayed due to other work in progress.

> That was why I had _reversed_ the lock ordering relative to normal in the
> original patch: First lock task->pi_lock. Assign lock. Lock
> lock->wait_lock. Then unlock task->pi_lock. Now it is safe to refer to
> lock. To avoid deadlocks I used _raw_spin_trylock() when locking the 2.
> lock.

Stupid me. I messed that one up. Should show up in the next -rt

Thanks

tglx


2006-02-23 13:49:36

by Bill Huey

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Tue, Feb 21, 2006 at 04:55:48PM +0100, Ingo Molnar wrote:
> another change is the reworking of the SLAB code: it now closely matches
> the upstream SLAB code, and it should now work on NUMA systems too
> (untested though).

SLAB you say ? What's going on with this error ?

----
mm/built-in.o: In function `drain_alien_cache':slab.c:(.text+0x1fb2c):
undefined reference to `slab_spin_unlock_irqrestore'
make: *** [.tmp_vmlinux1] Error 1
----

bill

2006-02-23 13:55:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.15-rt17


* Bill Huey <[email protected]> wrote:

> On Tue, Feb 21, 2006 at 04:55:48PM +0100, Ingo Molnar wrote:
> > another change is the reworking of the SLAB code: it now closely matches
> > the upstream SLAB code, and it should now work on NUMA systems too
> > (untested though).
>
> SLAB you say ? What's going on with this error ?
>
> ----
> mm/built-in.o: In function `drain_alien_cache':slab.c:(.text+0x1fb2c):
> undefined reference to `slab_spin_unlock_irqrestore'
> make: *** [.tmp_vmlinux1] Error 1
> ----

find the fix from S�bastien Dugu� below.

Ingo

Index: linux-2.6.15-rt17-numa/mm/slab.c
===================================================================
--- linux-2.6.15-rt17-numa.orig/mm/slab.c 2006-02-22 10:38:26.000000000 +0100
+++ linux-2.6.15-rt17-numa/mm/slab.c 2006-02-22 11:12:42.000000000 +0100
@@ -163,8 +163,8 @@
spin_unlock_irq(lock)
# define slab_spin_lock_irqsave(lock, flags, cpu) \
do { spin_lock_irqsave(lock, flags); (cpu) = smp_processor_id(); } while (0)
-# define slab_spin_lock_irqrestore(lock, flags, cpu) \
- do { spin_lock_irqrestore(lock, flags); } while (0)
+# define slab_spin_unlock_irqrestore(lock, flags, cpu) \
+ do { spin_unlock_irqrestore(lock, flags); } while (0)
#else
DEFINE_PER_CPU_LOCKED(int, slab_locks) = { 0, };
# define slab_irq_disable(cpu) get_cpu_var_locked(slab_locks, &(cpu))
@@ -183,8 +183,8 @@
do { spin_unlock(lock); slab_irq_enable(cpu); } while (0)
# define slab_spin_lock_irqsave(lock, flags, cpu) \
do { slab_irq_disable(cpu); spin_lock_irqsave(lock, flags); } while (0)
-# define slab_spin_lock_irqrestore(lock, flags, cpu) \
- do { spin_lock_irqrestore(lock, flags); slab_irq_enable(cpu); } while (0)
+# define slab_spin_unlock_irqrestore(lock, flags, cpu) \
+ do { spin_unlock_irqrestore(lock, flags); slab_irq_enable(cpu); } while (0)
#endif

/* Shouldn't this be in a header file somewhere? */

2006-02-23 14:05:58

by Bill Huey

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Thu, Feb 23, 2006 at 02:53:50PM +0100, Ingo Molnar wrote:
> * Bill Huey <[email protected]> wrote:
> > SLAB you say ? What's going on with this error ?
> > undefined reference to `slab_spin_unlock_irqrestore'
>
> find the fix from S???bastien Dugu??? below.

I recreate the fix since it was obvious and this new version fixes the previous
SLAB related crash in -rt16. Looking good.

bill

2006-02-24 06:39:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.15-rt17


* Steven Rostedt <[email protected]> wrote:

> + printk("| This is not a BUG\n");
> + printk("| turn off CONFIG_DEBUG_STACK_OVERFLOW if you don't want this\n");

i added the patch below - we want to know about too high stack
footprints.

Ingo

Index: linux-rt.q/kernel/latency.c
===================================================================
--- linux-rt.q.orig/kernel/latency.c
+++ linux-rt.q/kernel/latency.c
@@ -414,10 +414,17 @@ static void show_stackframe(void)

static notrace void __print_worst_stack(void)
{
+ unsigned long fill_ratio;
printk("----------------------------->\n");
- printk("| new stack-footprint maximum: %s/%d, %ld bytes (out of %ld bytes).\n",
+ printk("| new stack fill maximum: %s/%d, %ld bytes (out of %ld bytes).\n",
worst_stack_comm, worst_stack_pid,
MAX_STACK-worst_stack_left, (long)MAX_STACK);
+ fill_ratio = (MAX_STACK-worst_stack_left)*100/(long)MAX_STACK;
+ printk("| Stack fill ratio: %02ld%%", fill_ratio);
+ if (fill_ratio >= 90)
+ printk(" - BUG: that's quite high, please report this!\n");
+ else
+ printk(" - that's still OK, no need to report this.\n");
printk("------------|\n");

show_stackframe();

2006-02-24 07:09:37

by Steven Rostedt

[permalink] [raw]
Subject: Re: 2.6.15-rt17


On Fri, 24 Feb 2006, Ingo Molnar wrote:

>
> * Steven Rostedt <[email protected]> wrote:
>
> > + printk("| This is not a BUG\n");
> > + printk("| turn off CONFIG_DEBUG_STACK_OVERFLOW if you don't want this\n");
>
> i added the patch below - we want to know about too high stack
> footprints.

Yeah you're right, those should be reported.

>
> MAX_STACK-worst_stack_left, (long)MAX_STACK);
> + fill_ratio = (MAX_STACK-worst_stack_left)*100/(long)MAX_STACK;
> + printk("| Stack fill ratio: %02ld%%", fill_ratio);
> + if (fill_ratio >= 90)
> + printk(" - BUG: that's quite high, please report this!\n");
> + else
> + printk(" - that's still OK, no need to report this.\n");
> printk("------------|\n");

On the non bug case, I would still state the way to turn it off. It
becomes quite annoying if there are no bugs, and a non kernel hacker may
not know how to disable it.

-- Steve


2006-02-25 17:12:12

by Esben Nielsen

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Thu, 23 Feb 2006, Thomas Gleixner wrote:

> On Wed, 2006-02-22 at 17:50 +0100, Esben Nielsen wrote:
> > I didn't know anyone looked at my patch! I am ofcourse happy about it :-)
>
> Was just delayed due to other work in progress.
>

You didn't use the "TestRTMutex" I send along with the patch :-(

Since I learned to use unittesting that way I have been a big fan. It does
catch a lot of stupid bugs without having to wait for the program to get
there while keeping the ability to debug with gdb and fix it right away.
And most important: you can keep the tests and check if your program still
parses them after a rewrite. Very usefull to prevent repeating bugs.

So here is the issues I have found:

1) grablockBKL.tst failes. Apparently you can "grab" the BKL now? Is this
intended? I have updated the test to accept this.

2) 2locksdeadlock.tst loops forever. It is a livelock: When two RT-tasks
"deadlocks" on two mutexes they keep waking up each other in other. I
quickly fixed that bug.

3) While fixing that I came to think about what happens when you signal a
task blocked on a task blocked on a task. I thus wrote
2lock3tasksBoostSignal.tst. Well, the priorities wasn't set back
correctly. I fixed that too.

I have attached the patch againt 2.6.17-rt17 (and therefore also
included the previous patch) along with the updated tester and tests.

Esben


> > That was why I had _reversed_ the lock ordering relative to normal in the
> > original patch: First lock task->pi_lock. Assign lock. Lock
> > lock->wait_lock. Then unlock task->pi_lock. Now it is safe to refer to
> > lock. To avoid deadlocks I used _raw_spin_trylock() when locking the 2.
> > lock.
>
> Stupid me. I messed that one up. Should show up in the next -rt
>
> Thanks
>
> tglx
>
>


Attachments:
TestRTMutex.tgz (24.19 kB)
pi.patch (2.78 kB)
Download all attachments

2006-02-25 17:32:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: 2.6.15-rt17


On Sat, 25 Feb 2006, Esben Nielsen wrote:

> On Thu, 23 Feb 2006, Thomas Gleixner wrote:
>
> > On Wed, 2006-02-22 at 17:50 +0100, Esben Nielsen wrote:
> > > I didn't know anyone looked at my patch! I am ofcourse happy about it :-)
> >
> > Was just delayed due to other work in progress.
> >
>
> You didn't use the "TestRTMutex" I send along with the patch :-(
>
> Since I learned to use unittesting that way I have been a big fan. It does
> catch a lot of stupid bugs without having to wait for the program to get
> there while keeping the ability to debug with gdb and fix it right away.
> And most important: you can keep the tests and check if your program still
> parses them after a rewrite. Very usefull to prevent repeating bugs.
>
> So here is the issues I have found:
>
> 1) grablockBKL.tst failes. Apparently you can "grab" the BKL now? Is this
> intended? I have updated the test to accept this.

since the BKL is now released on semaphores, I guess this is ok.

>
> 2) 2locksdeadlock.tst loops forever. It is a livelock: When two RT-tasks
> "deadlocks" on two mutexes they keep waking up each other in other. I
> quickly fixed that bug.

I actually thought about that. But it still is an improvement, since it
doesn't deadlock the kernel. Just all processes that are of lower
priority. This still should be fixed, but it needs to not cause any
noticable overhead.

>
> 3) While fixing that I came to think about what happens when you signal a
> task blocked on a task blocked on a task. I thus wrote
> 2lock3tasksBoostSignal.tst. Well, the priorities wasn't set back
> correctly. I fixed that too.
>
> I have attached the patch againt 2.6.17-rt17 (and therefore also
> included the previous patch) along with the updated tester and tests.
>
> Esben

I'll take a look at this tomorrow.

-- Steve

>
>
> > > That was why I had _reversed_ the lock ordering relative to normal in the
> > > original patch: First lock task->pi_lock. Assign lock. Lock
> > > lock->wait_lock. Then unlock task->pi_lock. Now it is safe to refer to
> > > lock. To avoid deadlocks I used _raw_spin_trylock() when locking the 2.
> > > lock.
> >
> > Stupid me. I messed that one up. Should show up in the next -rt
> >
> > Thanks
> >
> > tglx
> >
> >
>
>

2006-02-25 21:30:17

by Esben Nielsen

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Sat, 25 Feb 2006, Steven Rostedt wrote:

>
> On Sat, 25 Feb 2006, Esben Nielsen wrote:
>
> > On Thu, 23 Feb 2006, Thomas Gleixner wrote:
> >
> > > On Wed, 2006-02-22 at 17:50 +0100, Esben Nielsen wrote:
> > > > I didn't know anyone looked at my patch! I am ofcourse happy about it :-)
> > >
> > > Was just delayed due to other work in progress.
> > >
> >
> > You didn't use the "TestRTMutex" I send along with the patch :-(
> >
> > Since I learned to use unittesting that way I have been a big fan. It does
> > catch a lot of stupid bugs without having to wait for the program to get
> > there while keeping the ability to debug with gdb and fix it right away.
> > And most important: you can keep the tests and check if your program still
> > parses them after a rewrite. Very usefull to prevent repeating bugs.
> >
> > So here is the issues I have found:
> >
> > 1) grablockBKL.tst failes. Apparently you can "grab" the BKL now? Is this
> > intended? I have updated the test to accept this.
>
> since the BKL is now released on semaphores, I guess this is ok.

When I started on the tester this was so as well. It is a basic feature of
CONFIG_PREEMPT_BKL that you release the BKL semaphore when you block. The
problem in PREEMT_RT is that if you block in spin_lock() you should _not_
give BKL away.
And now I remember why you couldn't grab BKL as any other lock before: The
pending owner flag was on the task. Now if you could grab BKL you could be
pending owner of both BKL and another mutex. With the new pending owner
implementation this is no longer a problem.

>
> >
> > 2) 2locksdeadlock.tst loops forever. It is a livelock: When two RT-tasks
> > "deadlocks" on two mutexes they keep waking up each other in other. I
> > quickly fixed that bug.
>
> I actually thought about that. But it still is an improvement, since it
> doesn't deadlock the kernel. Just all processes that are of lower
> priority.

Which are almost all the processes since only RT tasks can do this....

> This still should be fixed, but it needs to not cause any
> noticable overhead.

With this patch there is less overhead than before:
wake_up_process_mutex() is called less often.

>
> >
> > 3) While fixing that I came to think about what happens when you signal a
> > task blocked on a task blocked on a task. I thus wrote
> > 2lock3tasksBoostSignal.tst. Well, the priorities wasn't set back
> > correctly. I fixed that too.
> >

This time around wake_up_process_mutex() wasn't called when it ought to
be... Now that I think about it there still is a problem with the
patch I sent: First the priority is set down, then the task is woken up.
But then it can't continue to de-boost the next task... Let me write a
test with 4 tasks and 3 locks to demonstrate.


> > I have attached the patch againt 2.6.17-rt17 (and therefore also
> > included the previous patch) along with the updated tester and tests.
> >
> > Esben
>
> I'll take a look at this tomorrow.
>
> -- Steve
>
> >
> >
> > > > That was why I had _reversed_ the lock ordering relative to normal in the
> > > > original patch: First lock task->pi_lock. Assign lock. Lock
> > > > lock->wait_lock. Then unlock task->pi_lock. Now it is safe to refer to
> > > > lock. To avoid deadlocks I used _raw_spin_trylock() when locking the 2.
> > > > lock.
> > >
> > > Stupid me. I messed that one up. Should show up in the next -rt
> > >
> > > Thanks
> > >
> > > tglx
> > >
> > >
> >
> >
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2006-02-25 22:25:27

by Esben Nielsen

[permalink] [raw]
Subject: Re: 2.6.15-rt17

On Sat, 25 Feb 2006, Esben Nielsen wrote:

> On Sat, 25 Feb 2006, Steven Rostedt wrote:
> [...]
> This time around wake_up_process_mutex() wasn't called when it ought to
> be... Now that I think about it there still is a problem with the
> patch I sent: First the priority is set down, then the task is woken up.
> But then it can't continue to de-boost the next task... Let me write a
> test with 4 tasks and 3 locks to demonstrate.
>
Ok, attached is the test and a better patch which solves the problem.

Esben

>
> > > I have attached the patch againt 2.6.17-rt17 (and therefore also
> > > included the previous patch) along with the updated tester and tests.
> > >
> > > Esben
> >
> > I'll take a look at this tomorrow.
> >
> > -- Steve
> >
> > >
> > >
> > > > > That was why I had _reversed_ the lock ordering relative to normal in the
> > > > > original patch: First lock task->pi_lock. Assign lock. Lock
> > > > > lock->wait_lock. Then unlock task->pi_lock. Now it is safe to refer to
> > > > > lock. To avoid deadlocks I used _raw_spin_trylock() when locking the 2.
> > > > > lock.
> > > >
> > > > Stupid me. I messed that one up. Should show up in the next -rt
> > > >
> > > > Thanks
> > > >
> > > > tglx
> > > >
> > > >
> > >
> > >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


Attachments:
3locks4tasksBoostSignal.tst (711.00 B)
pi.patch (3.76 kB)
Download all attachments

2006-02-26 15:14:16

by Steven Rostedt

[permalink] [raw]
Subject: Re: 2.6.15-rt17


Please inline patches! it makes it a lot easier to comment on.


On Sat, 25 Feb 2006, Esben Nielsen wrote:
> On Sat, 25 Feb 2006, Steven Rostedt wrote:
> > On Sat, 25 Feb 2006, Esben Nielsen wrote:
> >
> > >
> > > You didn't use the "TestRTMutex" I send along with the patch :-(

These are good to have, but it should not be included in a kernel patch.
You can always host this tool too.

> > >
> > > Since I learned to use unittesting that way I have been a big fan. It does
> > > catch a lot of stupid bugs without having to wait for the program to get
> > > there while keeping the ability to debug with gdb and fix it right away.
> > > And most important: you can keep the tests and check if your program still
> > > parses them after a rewrite. Very usefull to prevent repeating bugs.
> > >
> > > So here is the issues I have found:
> > >
> > > 1) grablockBKL.tst failes. Apparently you can "grab" the BKL now? Is this
> > > intended? I have updated the test to accept this.
> >
> > since the BKL is now released on semaphores, I guess this is ok.
>
> When I started on the tester this was so as well. It is a basic feature of
> CONFIG_PREEMPT_BKL that you release the BKL semaphore when you block. The
> problem in PREEMT_RT is that if you block in spin_lock() you should _not_
> give BKL away.

And we don't. The BKL is held, we fool the scheduler by lowering setting
the lock depth to below zero, so the scheduler doesn't think we own it.
Or did you test program not realize that?

here in ____down_mutex:


if (waiter.task) {
unsigned long saved_flags = current->flags & PF_NOSCHED;
int saved_lock_depth = current->lock_depth;

/*
* Prevent schedule() to drop BKL, while waiting for
* the lock ! We restore lock_depth when we come back.
*/
current->lock_depth = -1;
current->flags &= ~PF_NOSCHED;
raw_local_irq_enable();

schedule();

raw_local_irq_disable();
current->flags |= saved_flags;
current->lock_depth = saved_lock_depth;
state = xchg(&current->state, TASK_UNINTERRUPTIBLE);
if (unlikely(state == TASK_RUNNING))
saved_state = TASK_RUNNING;
}


> And now I remember why you couldn't grab BKL as any other lock before: The
> pending owner flag was on the task. Now if you could grab BKL you could be
> pending owner of both BKL and another mutex. With the new pending owner
> implementation this is no longer a problem.

Correct, but it still causes problems in the logic. Say task A blocks on
semaphore X, which is owned by task B (which had the BKL and released it
on schedule). But B is also blocked on semaphore Y. Task A wakes B and
when B goes to grab the BKL it blocks. Thus you are boosting two lines of
pi. This is a really bad design, which was easily solved by releasing the
BKL outside, of the rt_mutex logic. When B really does have semaphore Y
it then grabs the BKL. This way if it fails, the pi logic is still sane.

>
> >
> > >
> > > 2) 2locksdeadlock.tst loops forever. It is a livelock: When two RT-tasks
> > > "deadlocks" on two mutexes they keep waking up each other in other. I
> > > quickly fixed that bug.
> >
> > I actually thought about that. But it still is an improvement, since it
> > doesn't deadlock the kernel. Just all processes that are of lower
> > priority.
>
> Which are almost all the processes since only RT tasks can do this....

And remember that this is also only a problem with futexes, since there
are no deadlocks in the kernel ;-) So if you have an RT task higher than
all the others, it will still run. Allowing you to at least shutdown the
machine nicely :)

>
> > This still should be fixed, but it needs to not cause any
> > noticable overhead.
>
> With this patch there is less overhead than before:
> wake_up_process_mutex() is called less often.
>

Does it detect deadlocks??

> >
> > >
> > > 3) While fixing that I came to think about what happens when you signal a
> > > task blocked on a task blocked on a task. I thus wrote
> > > 2lock3tasksBoostSignal.tst. Well, the priorities wasn't set back
> > > correctly. I fixed that too.
> > >
>
> This time around wake_up_process_mutex() wasn't called when it ought to
> be... Now that I think about it there still is a problem with the
> patch I sent: First the priority is set down, then the task is woken up.
> But then it can't continue to de-boost the next task... Let me write a
> test with 4 tasks and 3 locks to demonstrate.
>
>
> > > I have attached the patch againt 2.6.17-rt17 (and therefore also
> > > included the previous patch) along with the updated tester and tests.
> > >
> > > Esben
> >
> > I'll take a look at this tomorrow.
> >

I'm looking at the last patch you sent:

+static void adjust_prio_no_wakeup(task_t *task)
{
int prio = calc_pi_prio(task);

- if (task->prio != prio)
+ if (task->prio != prio) {
+ mutex_setprio(task, prio);
+ }
+}

Remove the brackets.

+static void adjust_prio_wakeup(task_t *task)
+{
+ int prio = calc_pi_prio(task);
+
+ if (task->prio < prio) {

will always be false, since cal_pi_prio returns the highest priority of
either the task or one of it's waiters (the lower prio is, the higher the
priority).

-- Steve

2006-02-26 22:38:22

by Esben Nielsen

[permalink] [raw]
Subject: Re: 2.6.15-rt17



On Sun, 26 Feb 2006, Steven Rostedt wrote:

>
> Please inline patches! it makes it a lot easier to comment on.
>
Ok, I set up the alternative editor in Pine now, so I can do it in the future.

>
> On Sat, 25 Feb 2006, Esben Nielsen wrote:
> > On Sat, 25 Feb 2006, Steven Rostedt wrote:
> > > On Sat, 25 Feb 2006, Esben Nielsen wrote:
> > >
> > > >
> > > > You didn't use the "TestRTMutex" I send along with the patch :-(
>
> These are good to have, but it should not be included in a kernel patch.
> You can always host this tool too.
>
On the contrary: The kernel should have a test/ directy at the top level and
a "make test" target which runs all sort of unit-tests. For all patches to be
accepted make test should be succesfull. Unit tests are best when maintained
along with the code it tests.

> > > >
> > > > Since I learned to use unittesting that way I have been a big fan. It does
> > > > catch a lot of stupid bugs without having to wait for the program to get
> > > > there while keeping the ability to debug with gdb and fix it right away.
> > > > And most important: you can keep the tests and check if your program still
> > > > parses them after a rewrite. Very usefull to prevent repeating bugs.
> > > >
> > > > So here is the issues I have found:
> > > >
> > > > 1) grablockBKL.tst failes. Apparently you can "grab" the BKL now? Is this
> > > > intended? I have updated the test to accept this.
> > >
> > > since the BKL is now released on semaphores, I guess this is ok.
> >
> > When I started on the tester this was so as well. It is a basic feature of
> > CONFIG_PREEMPT_BKL that you release the BKL semaphore when you block. The
> > problem in PREEMT_RT is that if you block in spin_lock() you should _not_
> > give BKL away.
>
> And we don't. The BKL is held, we fool the scheduler by lowering setting
> the lock depth to below zero, so the scheduler doesn't think we own it.
> Or did you test program not realize that?
>

Yes, it did. Here is teh test script (this time inlined):

#priorities:
threads: 101 102
#BKL is lock 0

lock 0 spinlock 1
test: + +
test: prio 101 prio 102

#Reverse the locks:
spinlock 1 lock 0

#You do _not_ give away the BKL on a spin lock. Therefore we deadlock.
test: - -
test: prio 101 prio 102


There is a test without spinlock, but just lock (meaning down). This will not
deadlock because BKL is given away.

> [...]
>
>
> > And now I remember why you couldn't grab BKL as any other lock before: The
> > pending owner flag was on the task. Now if you could grab BKL you could be
> > pending owner of both BKL and another mutex. With the new pending owner
> > implementation this is no longer a problem.
>
> Correct, but it still causes problems in the logic. Say task A blocks on
> semaphore X, which is owned by task B (which had the BKL and released it
> on schedule). But B is also blocked on semaphore Y. Task A wakes B and
> when B goes to grab the BKL it blocks. Thus you are boosting two lines of
> pi. This is a really bad design, which was easily solved by releasing the
> BKL outside, of the rt_mutex logic. When B really does have semaphore Y
> it then grabs the BKL. This way if it fails, the pi logic is still sane.

*nod*

>
> >
> > >
> > > >
> > > > 2) 2locksdeadlock.tst loops forever. It is a livelock: When two RT-tasks
> > > > "deadlocks" on two mutexes they keep waking up each other in other. I
> > > > quickly fixed that bug.
> > >
> > > I actually thought about that. But it still is an improvement, since it
> > > doesn't deadlock the kernel. Just all processes that are of lower
> > > priority.
> >
> > Which are almost all the processes since only RT tasks can do this....
>
> And remember that this is also only a problem with futexes, since there
> are no deadlocks in the kernel ;-) So if you have an RT task higher than
> all the others, it will still run. Allowing you to at least shutdown the
> machine nicely :)

Yeah, but futexes with PI will be something comming back in soonish I think.

>
> >
> > > This still should be fixed, but it needs to not cause any
> > > noticable overhead.
> >
> > With this patch there is less overhead than before:
> > wake_up_process_mutex() is called less often.
> >
>
> Does it detect deadlocks??

No, it just only wake up the blocked owner after changing the priortu
when the priority is actually changed.

> [...]
> > >
> > > I'll take a look at this tomorrow.
> > >
>
> I'm looking at the last patch you sent:
>
> +static void adjust_prio_no_wakeup(task_t *task)
> {
> int prio = calc_pi_prio(task);
>
> - if (task->prio != prio)
> + if (task->prio != prio) {
> + mutex_setprio(task, prio);
> + }
> +}
>
> Remove the brackets.
>
> +static void adjust_prio_wakeup(task_t *task)
> +{
> + int prio = calc_pi_prio(task);
> +
> + if (task->prio < prio) {
>
> will always be false, since cal_pi_prio returns the highest priority of
> either the task or one of it's waiters (the lower prio is, the higher the
> priority).
>

No:

/*
* Calculate task priority from the waiter list priority
*
* Return task->normal_prio when the waiter list is empty or when
* the waiter is not allowed to do priority boosting
*/
static inline int calc_pi_prio(task_t *task)
{
int prio = task->normal_prio;
...

When PI boosted task->prio < task->normal_prio and the condition can be true.
Notice that adjust_prio_wakeup() is called from remove_waiter().

> -- Steve
>

Esben

2006-02-27 07:15:46

by Steven Rostedt

[permalink] [raw]
Subject: Re: 2.6.15-rt17


On Sun, 26 Feb 2006, Esben Nielsen wrote:
>
> On Sun, 26 Feb 2006, Steven Rostedt wrote:
>
> >
> > Please inline patches! it makes it a lot easier to comment on.
> >
> Ok, I set up the alternative editor in Pine now, so I can do it in the future.
>

Hmm, I'm using pine now remotely. Doesn't ^R read in a file? Or is that
just with the default editor?

> >
> > On Sat, 25 Feb 2006, Esben Nielsen wrote:
> > > On Sat, 25 Feb 2006, Steven Rostedt wrote:
> > > > On Sat, 25 Feb 2006, Esben Nielsen wrote:
> > > >
> > > > >
> > > > > You didn't use the "TestRTMutex" I send along with the patch :-(
> >
> > These are good to have, but it should not be included in a kernel patch.
> > You can always host this tool too.
> >
> On the contrary: The kernel should have a test/ directy at the top level and
> a "make test" target which runs all sort of unit-tests. For all patches to be
> accepted make test should be succesfull. Unit tests are best when maintained
> along with the code it tests.

I actually agree with you here, but this issue is with the kernel
mantainers, and not the -rt ones. BTW, I wasn't able to get your test to
compile. But I don't have time to look at why.

>
> > > > >
> > > > > Since I learned to use unittesting that way I have been a big fan. It does
> > > > > catch a lot of stupid bugs without having to wait for the program to get
> > > > > there while keeping the ability to debug with gdb and fix it right away.
> > > > > And most important: you can keep the tests and check if your program still
> > > > > parses them after a rewrite. Very usefull to prevent repeating bugs.
> > > > >
> > > > > So here is the issues I have found:
> > > > >
> > > > > 1) grablockBKL.tst failes. Apparently you can "grab" the BKL now? Is this
> > > > > intended? I have updated the test to accept this.
> > > >
> > > > since the BKL is now released on semaphores, I guess this is ok.
> > >
> > > When I started on the tester this was so as well. It is a basic feature of
> > > CONFIG_PREEMPT_BKL that you release the BKL semaphore when you block. The
> > > problem in PREEMT_RT is that if you block in spin_lock() you should _not_
> > > give BKL away.
> >
> > And we don't. The BKL is held, we fool the scheduler by lowering setting
> > the lock depth to below zero, so the scheduler doesn't think we own it.
> > Or did you test program not realize that?
> >
>
> Yes, it did. Here is teh test script (this time inlined):
>
> #priorities:
> threads: 101 102
> #BKL is lock 0
>
> lock 0 spinlock 1
> test: + +
> test: prio 101 prio 102
>
> #Reverse the locks:
> spinlock 1 lock 0
>
> #You do _not_ give away the BKL on a spin lock. Therefore we deadlock.

I'm confused here. You're saying that we did not give away the BKL on
spin lock. But that _is_ the right behavior.

> test: - -
> test: prio 101 prio 102
>
>
> There is a test without spinlock, but just lock (meaning down). This will not
> deadlock because BKL is given away.

And we give it away on semaphore, which is also the right behavior.

>
> > [...]
> >
> >
> > > And now I remember why you couldn't grab BKL as any other lock before: The
> > > pending owner flag was on the task. Now if you could grab BKL you could be
> > > pending owner of both BKL and another mutex. With the new pending owner
> > > implementation this is no longer a problem.
> >
> > Correct, but it still causes problems in the logic. Say task A blocks on
> > semaphore X, which is owned by task B (which had the BKL and released it
> > on schedule). But B is also blocked on semaphore Y. Task A wakes B and
> > when B goes to grab the BKL it blocks. Thus you are boosting two lines of
> > pi. This is a really bad design, which was easily solved by releasing the
> > BKL outside, of the rt_mutex logic. When B really does have semaphore Y
> > it then grabs the BKL. This way if it fails, the pi logic is still sane.
>
> *nod*
>
> >
> > >
> > > >
> > > > >
> > > > > 2) 2locksdeadlock.tst loops forever. It is a livelock: When two RT-tasks
> > > > > "deadlocks" on two mutexes they keep waking up each other in other. I
> > > > > quickly fixed that bug.
> > > >
> > > > I actually thought about that. But it still is an improvement, since it
> > > > doesn't deadlock the kernel. Just all processes that are of lower
> > > > priority.
> > >
> > > Which are almost all the processes since only RT tasks can do this....
> >
> > And remember that this is also only a problem with futexes, since there
> > are no deadlocks in the kernel ;-) So if you have an RT task higher than
> > all the others, it will still run. Allowing you to at least shutdown the
> > machine nicely :)
>
> Yeah, but futexes with PI will be something comming back in soonish I think.

It may still be a while, since there's still lots of bugs and issues to
solve. I wasn't saying that the current solution is correct wrt the
spinning threads. This is most certainly a bug. I was just pointing out
that it's better than the previous hard deadlock that we had. But this
still needs a way to be fixed with low overhead.

> >
> > >
> > > > This still should be fixed, but it needs to not cause any
> > > > noticable overhead.
> > >
> > > With this patch there is less overhead than before:
> > > wake_up_process_mutex() is called less often.
> > >
> >
> > Does it detect deadlocks??
>
> No, it just only wake up the blocked owner after changing the priortu
> when the priority is actually changed.

I was talking about a low overhead wrt detecting deadlocks.

>
> > [...]
> > > >
> > > > I'll take a look at this tomorrow.
> > > >
> >
> > I'm looking at the last patch you sent:
> >
> > +static void adjust_prio_no_wakeup(task_t *task)
> > {
> > int prio = calc_pi_prio(task);
> >
> > - if (task->prio != prio)
> > + if (task->prio != prio) {
> > + mutex_setprio(task, prio);
> > + }
> > +}
> >
> > Remove the brackets.
> >
> > +static void adjust_prio_wakeup(task_t *task)
> > +{
> > + int prio = calc_pi_prio(task);
> > +
> > + if (task->prio < prio) {
> >
> > will always be false, since cal_pi_prio returns the highest priority of
> > either the task or one of it's waiters (the lower prio is, the higher the
> > priority).
> >
>
> No:
>
> /*
> * Calculate task priority from the waiter list priority
> *
> * Return task->normal_prio when the waiter list is empty or when
> * the waiter is not allowed to do priority boosting
> */
> static inline int calc_pi_prio(task_t *task)
> {
> int prio = task->normal_prio;
> ...
>
> When PI boosted task->prio < task->normal_prio and the condition can be true.
> Notice that adjust_prio_wakeup() is called from remove_waiter().
>

/me blushes

OK, I didn't notice the normal_prio part. You're right there. When I get
some more time I'll look more into the rest of your patch.

-- Steve

2006-02-28 19:16:13

by Karsten Wiese

[permalink] [raw]
Subject: Re: 2.6.15-rt17

> the tasklet code was reworked too to be PREEMPT_RT friendly: the new PI
> code unearthed a fundamental livelock scenario with PREEMPT_RT, and the
> fix was to rework the tasklet code to get rid of the 'retrigger
> softirqs' approach.

following patch re enables tasklet_hi.
needed by ALSA MIDI.

Karsten


--- /tmp/softirq.c~ 2006-02-28 20:17:03.000000000 +0100
+++ /tmp/softirq.c 2006-02-28 20:17:03.000000000 +0100
@@ -351,13 +351,13 @@
static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec) = { NULL };

static void inline
-__tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head)
+__tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
{
if (tasklet_trylock(t)) {
WARN_ON(t->next != NULL);
t->next = head->list;
head->list = t;
- raise_softirq_irqoff(TASKLET_SOFTIRQ);
+ raise_softirq_irqoff(nr);
tasklet_unlock(t);
}
}
@@ -367,7 +367,7 @@
unsigned long flags;

raw_local_irq_save(flags);
- __tasklet_common_schedule(t, &__get_cpu_var(tasklet_vec));
+ __tasklet_common_schedule(t, &__get_cpu_var(tasklet_vec), TASKLET_SOFTIRQ);
raw_local_irq_restore(flags);
}

@@ -378,7 +378,7 @@
unsigned long flags;

raw_local_irq_save(flags);
- __tasklet_common_schedule(t, &__get_cpu_var(tasklet_hi_vec));
+ __tasklet_common_schedule(t, &__get_cpu_var(tasklet_hi_vec), HI_SOFTIRQ);
raw_local_irq_restore(flags);
}

_





___________________________________________________________
Telefonate ohne weitere Kosten vom PC zum PC: http://messenger.yahoo.de

2006-02-28 19:44:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.15-rt17


* Karsten Wiese <[email protected]> wrote:

> > the tasklet code was reworked too to be PREEMPT_RT friendly: the new PI
> > code unearthed a fundamental livelock scenario with PREEMPT_RT, and the
> > fix was to rework the tasklet code to get rid of the 'retrigger
> > softirqs' approach.
>
> following patch re enables tasklet_hi.
> needed by ALSA MIDI.

ahh ... this indeed explains some of the bugreports! Applied.

Ingo