2007-11-16 15:52:13

by Remy Bohmer

[permalink] [raw]
Subject: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Hello All,

I have a problem with the RT-mutex code and signals. The problem is
very easily reproducible, but I do not have found the root-cause yet.
I hope someone can help me on this one.

This is what I am doing:
* I have a simple character driver with a read call.(called spi_read()
in the logging below )
* The read call blocks on a semaphore until some condition in hardware
is reached. (in the routine wait_for_io_level(), see logging below)
* I use a down_interruptible() call on a 'struct semaphore' type
semaphore, which eventually blocks on a mutex. (the semaphore is, of
course, initialised with sema_init() )

What happens is that when a user-space RT-thread is waiting on the
semaphore through the spi_read() call, and a signal arrives during the
wait at this thread (like e.g. CTRL-C), the kernel starts oopsing
until it is as good as brain-dead.

If I do NOT sent a posix-signal the code/mutex/semaphore is working
properly for days,
So, it seems to be related by waking up from a blocking situation,
because of a pending posix-signal.

I tried also the types 'struct compat_semaphore', and mutexes; none of
them work.
In fact the real Mutex type, declared with init_MUTEX() has the same problem.

Anyone an idea?
Below the kernel oops output. (I run on ARM, but I think that should
not matter for this problem)

Kind Regards,

Remy Bohmer

[ 52.113254] WARNING: at kernel/rtmutex.c:585 remove_waiter()
[ 52.113254] [<c0022d88>] (dump_stack+0x0/0x14) from [<c004db84>]
(remove_waiter+0x14c/0x1e0)
[ 52.113254] [<c004da38>] (remove_waiter+0x0/0x1e0) from
[<c0172ef4>] (rt_mutex_slowlock+0x278/0x2ec)
[ 52.113254] [<c0172c7c>] (rt_mutex_slowlock+0x0/0x2ec) from
[<c0172b20>] (rt_mutex_lock_interruptible+0x58/0x5c)
[ 52.113254] [<c0172ac8>] (rt_mutex_lock_interruptible+0x0/0x5c)
from [<c004e458>] (rt_down_interruptible+0x24/0x5c)
[ 52.113254] [<c004e434>] (rt_down_interruptible+0x0/0x5c) from
[<c00fd9a4>] (wait_for_io_level+0x7c/0xcc)
[ 52.113254] r6:00000000 r5:00000024 r4:c01d5a34
[ 52.113254] [<c00fd928>] (wait_for_io_level+0x0/0xcc) from
[<c00fe20c>] (spi_read+0x60/0x198)
[ 52.113254] r6:c3823f78 r5:c01d59b0 r4:00000000
[ 52.113254] [<c00fe1ac>] (spi_read+0x0/0x198) from [<c006e750>]
(vfs_read+0xb4/0xf4)
[ 52.113254] r8:c001f044 r7:c3823f78 r6:40967c29 r5:c3e2ae80 r4:000001ff
[ 52.113254] [<c006e69c>] (vfs_read+0x0/0xf4) from [<c006eb4c>]
(sys_read+0x4c/0x7c)
[ 52.113254] r7:00000003 r6:c3e2ae80 r5:00000000 r4:00000000
[ 52.113254] [<c006eb00>] (sys_read+0x0/0x7c) from [<c001eec0>]
(ret_fast_syscall+0x0/0x2c)
[ 52.113254] r6:40042000 r5:40969490 r4:409694d8
[ 52.791965] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
[ 52.791965] pgd = c3ff4000
[ 52.791965] [00000000] *pgd=23e3b031, *pte=00000000, *ppte=00000000
[ 52.791965] stopped custom tracer.
[ 52.791965] Internal error: Oops: 817 [#1] PREEMPT
[ 52.791965] CPU: 0 Not tainted (2.6.23.1-rt5 #23)
[ 52.791965] PC is at task_setprio+0x18/0x17c
[ 52.791965] LR is at __rt_mutex_adjust_prio+0x30/0x34
[ 52.791965] pc : [<c002a95c>] lr : [<c004d488>] psr: a0000093
[ 52.791965] sp : c3823d80 ip : c3823db0 fp : c3823dac
[ 52.791965] r10: ffffffff r9 : a0000013 r8 : c3823ed0
[ 52.791965] r7 : c036f544 r6 : c3823e08 r5 : a0000013 r4 : c02b6ca0
[ 52.791965] r3 : 00000000 r2 : c02b6ec4 r1 : c02b6ec4 r0 : c02b6ca0
[ 52.791965] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM
Segment user
[ 52.791965] Control: c000717f Table: 23ff4000 DAC: 00000015
[ 52.791965] Process ceidrvtest (pid: 301, stack limit = 0xc3822260)
[ 52.791965] Stack: (0xc3823d80 to 0xc3824000)
[ 52.791965] 3d80: 00000002 00000001 c3823da4 c02b6ca0 a0000013
c3823e08 c036f544 c3823ed0
[ 52.791965] 3da0: c3823dc4 c3823db0 c004d488 c002a954 c3823dd4
c3822000 c3823e04 c3823dc8
[ 52.791965] 3dc0: c004dc60 c004d468 c0298000 ffffffff 00000000
c02b63a0 c3823e0c c3822000
[ 52.791965] 3de0: a0000013 c036f544 00000000 c3823ed0 00000004
ffffffff c3823e64 c3823e08
[ 52.791965] 3e00: c017309c c004dc28 c003c144 c003c0e0 00000000
c01b5ff0 c3823e44 c3823e28
[ 52.791965] 3e20: c001eea8 c003c134 c3823e4c c3823e38 00000000
c004d128 c3823e64 c3822000
[ 52.791965] 3e40: 00000000 c3823fb0 c3822000 c3823ed0 00000000
c3823f50 c3823e74 c3823e68
[ 52.791965] 3e60: c0173420 c0172fc8 c3823e84 c3823e78 c0173434
c01733f0 c3823ebc c3823e88
[ 52.791965] 3e80: c003ac84 c0173434 c3823fb0 c02b6ec4 c02b6f28
c3822000 00000000 c3823fb0
[ 52.791965] 3ea0: c001f044 c001f044 c3822000 400298a8 c3823f9c
c3823ec0 c0021558 c003ac4c
[ 52.791965] 3ec0: c3822000 400298a8 c02b6ec4 c3823ed8 c0172b20
c0172c8c c3823f04 c3823ee8
[ 52.791965] 3ee0: c004e458 c0172ad8 c0171d70 c01d5a34 00000024
00000000 c3823f24 c3823f08
[ 52.791965] 3f00: c00fd9a4 c004e444 c3823f54 00000000 c01d59b0
c3823f78 c3823f54 c3823f28
[ 52.791965] 3f20: c00fe20c c00fd938 00000000 00000000 c001f044
000001ff c3e2ae80 40967c29
[ 52.791965] 3f40: c3823f78 c001f044 c3823f74 c3823f58 c006e750
c00fe1bc 00000000 00000000
[ 52.791965] 3f60: c3e2ae80 00000003 c3823fa4 c3823f78 c006eb70
409694d8 40969490 40042000
[ 52.791965] 3f80: 00000003 c001f044 c3822000 400298a8 c3823fac
c3823fa0 c00219a0 c00214fc
[ 52.791965] 3fa0: 00000000 c3823fb0 c001ef0c c0021984 fffffffc
40967c29 000001ff 000001ff
[ 52.791965] 3fc0: 409694d8 40969490 40042000 00000003 40042000
003d0f00 400298a8 40967c0c
[ 52.791965] 3fe0: 00000000 40967bd8 40033ea4 40033eb4 80000010
00000003 20002031 20002431
[ 52.791965] Backtrace:
[ 52.791965] [<c002a944>] (task_setprio+0x0/0x17c) from [<c004d488>]
(__rt_mutex_adjust_prio+0x30/0x34)
[ 52.791965] r8:c3823ed0 r7:c036f544 r6:c3823e08 r5:a0000013 r4:c02b6ca0
[ 52.791965] [<c004d458>] (__rt_mutex_adjust_prio+0x0/0x34) from
[<c004dc60>] (task_blocks_on_rt_mutex+0x48/0x204)
[ 52.791965] r4:c3822000
[ 52.791965] [<c004dc18>] (task_blocks_on_rt_mutex+0x0/0x204) from
[<c017309c>] (rt_spin_lock_slowlock+0xe4/0x1e8)
[ 52.791965] [<c0172fb8>] (rt_spin_lock_slowlock+0x0/0x1e8) from
[<c0173420>] (__rt_spin_lock+0x40/0x44)
[ 52.791965] [<c01733e0>] (__rt_spin_lock+0x0/0x44) from
[<c0173434>] (rt_spin_lock+0x10/0x14)
[ 52.791965] [<c0173424>] (rt_spin_lock+0x0/0x14) from [<c003ac84>]
(get_signal_to_deliver+0x48/0x3e8)
[ 52.791965] [<c003ac3c>] (get_signal_to_deliver+0x0/0x3e8) from
[<c0021558>] (do_signal+0x6c/0x488)
[ 52.791965] [<c00214ec>] (do_signal+0x0/0x488) from [<c00219a0>]
(do_notify_resume+0x2c/0x30)
[ 52.791965] [<c0021974>] (do_notify_resume+0x0/0x30) from
[<c001ef0c>] (work_pending+0x1c/0x20)
[ 52.791965] Code: e24cb004 e24dd00c e351008c 83a03000 (85833000)
[ 54.968721] note: ceidrvtest[301] exited with preempt_count 2
[ 55.002901] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
[ 55.002901] pgd = c0004000
[ 55.002901] [00000000] *pgd=00000000
[ 55.002901] Internal error: Oops: 817 [#2] PREEMPT
[ 55.002901] CPU: 0 Tainted: G D (2.6.23.1-rt5 #23)
[ 55.002901] PC is at task_setprio+0x18/0x17c
[ 55.002901] LR is at __rt_mutex_adjust_prio+0x30/0x34
[ 55.002901] pc : [<c002a95c>] lr : [<c004d488>] psr: a0000093
[ 55.002901] sp : c3823ae8 ip : c3823b18 fp : c3823b14
[ 55.002901] r10: ffffffff r9 : a0000013 r8 : 00000000
[ 55.002901] r7 : c01b2000 r6 : c3823b70 r5 : a0000013 r4 : c02b6ca0
[ 55.002901] r3 : 00000000 r2 : c02b6ec4 r1 : c02b6ec4 r0 : c02b6ca0
[ 55.002901] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM
Segment user
[ 55.002901] Control: c000717f Table: 23ff4000 DAC: 00000015
[ 55.002901] Process ceidrvtest (pid: 301, stack limit = 0xc3822260)
[ 55.002901] Stack: (0xc3823ae8 to 0xc3824000)
[ 55.002901] 3ae0: c3822000 000028cd 000028cd
c02b6ca0 a0000013 c3823b70
[ 55.002901] 3b00: c01b2000 00000000 c3823b2c c3823b18 c004d488
c002a954 c002e7c8 c3822000
[ 55.002901] 3b20: c3823b6c c3823b30 c004dc60 c004d468 00000001
00000000 00000000 c02b63a0
[ 55.002901] 3b40: c004cb38 c3822000 a0000013 c01b2000 00000004
00000000 00000004 ffffffff
[ 55.002901] 3b60: c3823bcc c3823b70 c017309c c004dc28 37383639
205d3132 c0033c00 00000000
[ 55.002901] 3b80: c02b6ca0 409694d8 c0018674 c0018674 c02b6ca0
00000000 00000000 c0018674
[ 55.002901] 3ba0: c02b6ca0 c02b6f10 c02b6ca0 c01b763c 00000000
00000000 c0018640 00000000
[ 55.002901] 3bc0: c3823bdc c3823bd0 c0173420 c0172fc8 c3823bec
c3823be0 c0173618 c01733f0
[ 55.002901] 3be0: c3823c0c c3823bf0 c00318e4 c0173618 c3823c0c
c3823c00 c002e0a4 c002dfe8
[ 55.002901] 3c00: c3823c1c c3823c10 c0022c8c c00314f0 c3823c3c
c3823c20 c0023f9c c0022acc
[ 55.002901] 3c20: ffffffff c01b422c c02b6ca0 c3823d38 c3823c84
c3823c40 c00241c8 c0023f40
[ 55.002901] 3c40: c3823c64 c3823c50 c0028b08 c00289b8 c3823d38
00000817 c3823c7c ffffffff
[ 55.002901] 3c60: c01b422c 00000817 c3823d38 00000000 a0000093
ffffffff c3823d34 c3823c88
[ 55.002901] 3c80: c001e264 c0023fbc c3823cc4 c3823c98 c002abcc
c0171db0 c3823ca4 80000093
[ 55.002901] 3ca0: c3823cec c028a040 00000000 c01ce5a8 c01b4d28
c01b4d50 c3823cd4 c3823cc8
[ 55.002901] 3cc0: c002ac94 c002aad0 c3823cec c3823cd8 c00337a8
c002ac88 c01c7b80 20000093
[ 55.002901] 3ce0: c3823d04 c3823cf0 c0034280 c0033740 c3823d14
c3823d00 c0051d50 c0052410
[ 55.002901] 3d00: c0038120 c02b6ca0 c3823d2c c3823d18 c00382d4
ffffffff c3823d6c c3823e08
[ 55.002901] 3d20: c036f544 c3823ed0 c3823dac c3823d38 c001ea80
c001e238 c02b6ca0 c02b6ec4
[ 55.002901] 3d40: c02b6ec4 00000000 c02b6ca0 a0000013 c3823e08
c036f544 c3823ed0 a0000013
[ 55.002901] 3d60: ffffffff c3823dac c3823db0 c3823d80 c004d488
c002a95c a0000093 ffffffff
[ 55.002901] 3d80: 00000002 00000001 c3823da4 c02b6ca0 a0000013
c3823e08 c036f544 c3823ed0
[ 55.002901] 3da0: c3823dc4 c3823db0 c004d488 c002a954 c3823dd4
c3822000 c3823e04 c3823dc8
[ 55.002901] 3dc0: c004dc60 c004d468 c0298000 ffffffff 00000000
c02b63a0 c3823e0c c3822000
[ 55.002901] 3de0: a0000013 c036f544 00000000 c3823ed0 00000004
ffffffff c3823e64 c3823e08
[ 55.002901] 3e00: c017309c c004dc28 c003c144 c003c0e0 00000000
c01b5ff0 c3823e44 c3823e28
[ 55.002901] 3e20: c001eea8 c003c134 c3823e4c c3823e38 00000000
c004d128 c3823e64 c3822000
[ 55.002901] 3e40: 00000000 c3823fb0 c3822000 c3823ed0 00000000
c3823f50 c3823e74 c3823e68
[ 55.002901] 3e60: c0173420 c0172fc8 c3823e84 c3823e78 c0173434
c01733f0 c3823ebc c3823e88
[ 55.002901] 3e80: c003ac84 c0173434 c3823fb0 c02b6ec4 c02b6f28
c3822000 00000000 c3823fb0
[ 55.002901] 3ea0: c001f044 c001f044 c3822000 400298a8 c3823f9c
c3823ec0 c0021558 c003ac4c
[ 55.002901] 3ec0: c3822000 400298a8 c02b6ec4 c3823ed8 c0172b20
c0172c8c c3823f04 c3823ee8
[ 55.002901] 3ee0: c004e458 c0172ad8 c0171d70 c01d5a34 00000024
00000000 c3823f24 c3823f08
[ 55.002901] 3f00: c00fd9a4 c004e444 c3823f54 00000000 c01d59b0
c3823f78 c3823f54 c3823f28
[ 55.002901] 3f20: c00fe20c c00fd938 00000000 00000000 c001f044
000001ff c3e2ae80 40967c29
[ 55.002901] 3f40: c3823f78 c001f044 c3823f74 c3823f58 c006e750
c00fe1bc 00000000 00000000
[ 55.002901] 3f60: c3e2ae80 00000003 c3823fa4 c3823f78 c006eb70
409694d8 40969490 40042000
[ 55.002901] 3f80: 00000003 c001f044 c3822000 400298a8 c3823fac
c3823fa0 c00219a0 c00214fc
[ 55.002901] 3fa0: 00000000 c3823fb0 c001ef0c c0021984 fffffffc
40967c29 000001ff 000001ff
[ 55.002901] 3fc0: 409694d8 40969490 40042000 00000003 40042000
003d0f00 400298a8 40967c0c
[ 55.002901] 3fe0: 00000000 40967bd8 40033ea4 40033eb4 80000010
00000003 20002031 20002431
[ 55.002901] Backtrace:
[ 55.002901] [<c002a944>] (task_setprio+0x0/0x17c) from [<c004d488>]
(__rt_mutex_adjust_prio+0x30/0x34)
[ 55.002901] r8:00000000 r7:c01b2000 r6:c3823b70 r5:a0000013 r4:c02b6ca0
[ 55.002901] [<c004d458>] (__rt_mutex_adjust_prio+0x0/0x34) from
[<c004dc60>] (task_blocks_on_rt_mutex+0x48/0x204)
[ 55.002901] r4:c3822000
[ 55.002901] [<c004dc18>] (task_blocks_on_rt_mutex+0x0/0x204) from
[<c017309c>] (rt_spin_lock_slowlock+0xe4/0x1e8)
[ 55.002901] [<c0172fb8>] (rt_spin_lock_slowlock+0x0/0x1e8) from
[<c0173420>] (__rt_spin_lock+0x40/0x44)
[ 55.002901] [<c01733e0>] (__rt_spin_lock+0x0/0x44) from
[<c0173618>] (rt_write_lock+0x10/0x14)
[ 55.002901] [<c0173608>] (rt_write_lock+0x0/0x14) from [<c00318e4>]
(do_exit+0x404/0x7fc)
[ 55.002901] [<c00314e0>] (do_exit+0x0/0x7fc) from [<c0022c8c>]
(die+0x1d0/0x21c)
[ 55.002901] [<c0022abc>] (die+0x0/0x21c) from [<c0023f9c>]
(__do_kernel_fault+0x6c/0x7c)
[ 55.002901] [<c0023f30>] (__do_kernel_fault+0x0/0x7c) from
[<c00241c8>] (do_page_fault+0x21c/0x238)
[ 55.002901] r7:c3823d38 r6:c02b6ca0 r5:c01b422c r4:ffffffff
[ 55.002901] [<c0023fac>] (do_page_fault+0x0/0x238) from
[<c001e264>] (do_DataAbort+0x3c/0xa0)
[ 55.002901] [<c001e228>] (do_DataAbort+0x0/0xa0) from [<c001ea80>]
(__dabt_svc+0x40/0x60)
[ 55.002901] Exception stack(0xc3823d38 to 0xc3823d80)
[ 55.002901] 3d20:
c02b6ca0 c02b6ec4
[ 55.002901] 3d40: c02b6ec4 00000000 c02b6ca0 a0000013 c3823e08
c036f544 c3823ed0 a0000013
[ 55.002901] 3d60: ffffffff c3823dac c3823db0 c3823d80 c004d488
c002a95c a0000093 ffffffff
[ 55.002901] r8:c3823ed0 r7:c036f544 r6:c3823e08 r5:c3823d6c r4:ffffffff
[ 55.002901] [<c002a944>] (task_setprio+0x0/0x17c) from [<c004d488>]
(__rt_mutex_adjust_prio+0x30/0x34)
[ 55.002901] r8:c3823ed0 r7:c036f544 r6:c3823e08 r5:a0000013 r4:c02b6ca0
[ 55.002901] [<c004d458>] (__rt_mutex_adjust_prio+0x0/0x34) from
[<c004dc60>] (task_blocks_on_rt_mutex+0x48/0x204)
[ 55.002901] r4:c3822000
[ 55.002901] [<c004dc18>] (task_blocks_on_rt_mutex+0x0/0x204) from
[<c017309c>] (rt_spin_lock_slowlock+0xe4/0x1e8)
[ 55.002901] [<c0172fb8>] (rt_spin_lock_slowlock+0x0/0x1e8) from
[<c0173420>] (__rt_spin_lock+0x40/0x44)
[ 55.002901] [<c01733e0>] (__rt_spin_lock+0x0/0x44) from
[<c0173434>] (rt_spin_lock+0x10/0x14)
[ 55.002901] [<c0173424>] (rt_spin_lock+0x0/0x14) from [<c003ac84>]
(get_signal_to_deliver+0x48/0x3e8)
[ 55.002901] [<c003ac3c>] (get_signal_to_deliver+0x0/0x3e8) from
[<c0021558>] (do_signal+0x6c/0x488)
[ 55.002901] [<c00214ec>] (do_signal+0x0/0x488) from [<c00219a0>]
(do_notify_resume+0x2c/0x30)
[ 55.002901] [<c0021974>] (do_notify_resume+0x0/0x30) from
[<c001ef0c>] (work_pending+0x1c/0x20)
[ 55.002901] Code: e24cb004 e24dd00c e351008c 83a03000 (85833000)
[ 59.099579] Fixing recursive fault but reboot is needed!
[ 59.130829] BUG: scheduling while atomic: ceidrvtest/0x00000005/301, CPU#0
[ 59.130829] [<c0022d88>] (dump_stack+0x0/0x14) from [<c0029bec>]
(__schedule_bug+0x3c/0x48)
[ 59.130829] [<c0029bb0>] (__schedule_bug+0x0/0x48) from
[<c0171820>] (__schedule+0x90/0x44c)
[ 59.130829] [<c0171790>] (__schedule+0x0/0x44c) from [<c0171fa4>]
(schedule+0xd0/0x118)
[ 59.130829] r7:00000817 r6:0000000b r5:c02b6ca0 r4:c3822000
[ 59.130829] [<c0171ed4>] (schedule+0x0/0x118) from [<c00315ec>]
(do_exit+0x10c/0x7fc)
[ 59.130829] r4:c3822000
[ 59.130829] [<c00314e0>] (do_exit+0x0/0x7fc) from [<c0022c8c>]
(die+0x1d0/0x21c)
[ 59.130829] [<c0022abc>] (die+0x0/0x21c) from [<c0023f9c>]
(__do_kernel_fault+0x6c/0x7c)
[ 59.130829] [<c0023f30>] (__do_kernel_fault+0x0/0x7c) from
[<c00241c8>] (do_page_fault+0x21c/0x238)
[ 59.130829] r7:c3823aa0 r6:c02b6ca0 r5:c01b422c r4:ffffffff
[ 59.130829] [<c0023fac>] (do_page_fault+0x0/0x238) from
[<c001e264>] (do_DataAbort+0x3c/0xa0)
[ 59.130829] [<c001e228>] (do_DataAbort+0x0/0xa0) from [<c001ea80>]
(__dabt_svc+0x40/0x60)
[ 59.130829] Exception stack(0xc3823aa0 to 0xc3823ae8)
[ 59.130829] 3aa0: c02b6ca0 c02b6ec4 c02b6ec4 00000000 c02b6ca0
a0000013 c3823b70 c01b2000
[ 59.130829] 3ac0: 00000000 a0000013 ffffffff c3823b14 c3823b18
c3823ae8 c004d488 c002a95c
[ 59.130829] 3ae0: a0000093 ffffffff
[ 59.130829] r8:00000000 r7:c01b2000 r6:c3823b70 r5:c3823ad4 r4:ffffffff
[ 59.130829] [<c002a944>] (task_setprio+0x0/0x17c) from [<c004d488>]
(__rt_mutex_adjust_prio+0x30/0x34)
[ 59.130829] r8:00000000 r7:c01b2000 r6:c3823b70 r5:a0000013 r4:c02b6ca0
[ 59.130829] [<c004d458>] (__rt_mutex_adjust_prio+0x0/0x34) from
[<c004dc60>] (task_blocks_on_rt_mutex+0x48/0x204)
[ 59.130829] r4:c3822000
[ 59.130829] [<c004dc18>] (task_blocks_on_rt_mutex+0x0/0x204) from
[<c017309c>] (rt_spin_lock_slowlock+0xe4/0x1e8)
[ 59.130829] [<c0172fb8>] (rt_spin_lock_slowlock+0x0/0x1e8) from
[<c0173420>] (__rt_spin_lock+0x40/0x44)
[ 59.130829] [<c01733e0>] (__rt_spin_lock+0x0/0x44) from
[<c0173618>] (rt_write_lock+0x10/0x14)
[ 59.130829] [<c0173608>] (rt_write_lock+0x0/0x14) from [<c00318e4>]
(do_exit+0x404/0x7fc)
[ 59.130829] [<c00314e0>] (do_exit+0x0/0x7fc) from [<c0022c8c>]
(die+0x1d0/0x21c)
[ 59.130829] [<c0022abc>] (die+0x0/0x21c) from [<c0023f9c>]
(__do_kernel_fault+0x6c/0x7c)
[ 59.130829] [<c0023f30>] (__do_kernel_fault+0x0/0x7c) from
[<c00241c8>] (do_page_fault+0x21c/0x238)
[ 59.130829] r7:c3823d38 r6:c02b6ca0 r5:c01b422c r4:ffffffff
[ 59.130829] [<c0023fac>] (do_page_fault+0x0/0x238) from
[<c001e264>] (do_DataAbort+0x3c/0xa0)
[ 59.130829] [<c001e228>] (do_DataAbort+0x0/0xa0) from [<c001ea80>]
(__dabt_svc+0x40/0x60)
[ 59.130829] Exception stack(0xc3823d38 to 0xc3823d80)
[ 59.130829] 3d20:
c02b6ca0 c02b6ec4
[ 59.130829] 3d40: c02b6ec4 00000000 c02b6ca0 a0000013 c3823e08
c036f544 c3823ed0 a0000013
[ 59.130829] 3d60: ffffffff c3823dac c3823db0 c3823d80 c004d488
c002a95c a0000093 ffffffff
[ 59.130829] r8:c3823ed0 r7:c036f544 r6:c3823e08 r5:c3823d6c r4:ffffffff
[ 59.130829] [<c002a944>] (task_setprio+0x0/0x17c) from [<c004d488>]
(__rt_mutex_adjust_prio+0x30/0x34)
[ 59.130829] r8:c3823ed0 r7:c036f544 r6:c3823e08 r5:a0000013 r4:c02b6ca0
[ 59.130829] [<c004d458>] (__rt_mutex_adjust_prio+0x0/0x34) from
[<c004dc60>] (task_blocks_on_rt_mutex+0x48/0x204)
[ 59.130829] r4:c3822000
[ 59.130829] [<c004dc18>] (task_blocks_on_rt_mutex+0x0/0x204) from
[<c017309c>] (rt_spin_lock_slowlock+0xe4/0x1e8)
[ 59.130829] [<c0172fb8>] (rt_spin_lock_slowlock+0x0/0x1e8) from
[<c0173420>] (__rt_spin_lock+0x40/0x44)
[ 59.130829] [<c01733e0>] (__rt_spin_lock+0x0/0x44) from
[<c0173434>] (rt_spin_lock+0x10/0x14)
[ 59.130829] [<c0173424>] (rt_spin_lock+0x0/0x14) from [<c003ac84>]
(get_signal_to_deliver+0x48/0x3e8)
[ 59.130829] [<c003ac3c>] (get_signal_to_deliver+0x0/0x3e8) from
[<c0021558>] (do_signal+0x6c/0x488)
[ 59.130829] [<c00214ec>] (do_signal+0x0/0x488) from [<c00219a0>]
(do_notify_resume+0x2c/0x30)
[ 59.130829] [<c0021974>] (do_notify_resume+0x0/0x30) from
[<c001ef0c>] (work_pending+0x1c/0x20)


2007-11-16 20:20:32

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals



On Fri, 16 Nov 2007, Remy Bohmer wrote:

> Hello All,
>
> I have a problem with the RT-mutex code and signals. The problem is
> very easily reproducible, but I do not have found the root-cause yet.
> I hope someone can help me on this one.
>
> This is what I am doing:
> * I have a simple character driver with a read call.(called spi_read()
> in the logging below )
> * The read call blocks on a semaphore until some condition in hardware
> is reached. (in the routine wait_for_io_level(), see logging below)
> * I use a down_interruptible() call on a 'struct semaphore' type
> semaphore, which eventually blocks on a mutex. (the semaphore is, of
> course, initialised with sema_init() )

The above sounds more like you need a completion. What's used to wake up
the caller of down_interruptible?

Can you post some code to see what you are doing. That would make it much
easier to analyze.

-- Steve


>
> What happens is that when a user-space RT-thread is waiting on the
> semaphore through the spi_read() call, and a signal arrives during the
> wait at this thread (like e.g. CTRL-C), the kernel starts oopsing
> until it is as good as brain-dead.
>
> If I do NOT sent a posix-signal the code/mutex/semaphore is working
> properly for days,
> So, it seems to be related by waking up from a blocking situation,
> because of a pending posix-signal.
>
> I tried also the types 'struct compat_semaphore', and mutexes; none of
> them work.
> In fact the real Mutex type, declared with init_MUTEX() has the same problem.
>
> Anyone an idea?
> Below the kernel oops output. (I run on ARM, but I think that should
> not matter for this problem)
>

2007-11-16 23:02:54

by Remy Bohmer

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Hello Steven,

Thanks for your reply

> The above sounds more like you need a completion.
Funny, I first started with using completion structures, but that did
not work either. I get similar OOPses on all these kind of locking
mechanisms, as long as I use the _interruptible() type. I tried every
work-around I can think of, but none worked :-((
Even if I block on an ordinary rt-mutex in the same routine, wait a
_interruptible() type, I get the same problem.

> What's used to wake up the caller of down_interruptible?
A call to up() is used from inside an interrupt(thread) context, but
this is not relevant for the problem, because only blocking on a
semaphore with down_interruptible() and waking the thread by CTRL-C is
enough to get this Oops.

I saw that the code is trying to wake 'other waiters', while there is
only 1 thread waiting on the semaphore at most. I feel that the root
cause of this problem has to be searched there.

I believe that executing any PI code on semaphores is a strange
behavior anyway, because a semaphore is never 'owned' by a thread, and
it is always another thread that wakes the thread that blocks on a
semaphore, and because the waker is unknown, the PI code will always
boost the prio of the wrong thread.

Strange is also, that I get different behavior on ARM if I use
sema_init(&sema, 1) versus sema_init(&sema,0). The latter seems to
crash less, it will not crash until the first up(); while the first
will crash even without any up().

Attached I have put a sample driver I just hacked together a few
minutes ago. It is NOT the driver that has generates the oops in the
previous mail, but I have stripped a scull-driver down that much that
it will be much easier to talk about, and to keep us focussed on the
part of the code that is causing this.
Besides: I tested this driver on X86 with 2.6.23.1-rt5 and I get the
also OOPSes although slightly different than on ARM. See the attached
dummy.txt file.


Beware: The up(&sema) is NOT required to get this OOPS, I get it even
without any up(&sema) !

I hope you can look at the attached driver source and help me with this...


Kind Regards,

Remy Bohmer



2007/11/16, Steven Rostedt <[email protected]>:
>
>
> On Fri, 16 Nov 2007, Remy Bohmer wrote:
>
> > Hello All,
> >
> > I have a problem with the RT-mutex code and signals. The problem is
> > very easily reproducible, but I do not have found the root-cause yet.
> > I hope someone can help me on this one.
> >
> > This is what I am doing:
> > * I have a simple character driver with a read call.(called spi_read()
> > in the logging below )
> > * The read call blocks on a semaphore until some condition in hardware
> > is reached. (in the routine wait_for_io_level(), see logging below)
> > * I use a down_interruptible() call on a 'struct semaphore' type
> > semaphore, which eventually blocks on a mutex. (the semaphore is, of
> > course, initialised with sema_init() )
>
> The above sounds more like you need a completion. What's used to wake up
> the caller of down_interruptible?
>
> Can you post some code to see what you are doing. That would make it much
> easier to analyze.
>
> -- Steve
>
>
> >
> > What happens is that when a user-space RT-thread is waiting on the
> > semaphore through the spi_read() call, and a signal arrives during the
> > wait at this thread (like e.g. CTRL-C), the kernel starts oopsing
> > until it is as good as brain-dead.
> >
> > If I do NOT sent a posix-signal the code/mutex/semaphore is working
> > properly for days,
> > So, it seems to be related by waking up from a blocking situation,
> > because of a pending posix-signal.
> >
> > I tried also the types 'struct compat_semaphore', and mutexes; none of
> > them work.
> > In fact the real Mutex type, declared with init_MUTEX() has the same problem.
> >
> > Anyone an idea?
> > Below the kernel oops output. (I run on ARM, but I think that should
> > not matter for this problem)
> >
>


Attachments:
(No filename) (3.84 kB)
dummy.tgz (1.32 kB)
dummy.txt (30.12 kB)
Download all attachments

2007-11-16 23:37:46

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals




On Sat, 17 Nov 2007, Remy Bohmer wrote:

> Hello Steven,
>
> Thanks for your reply
>
> > The above sounds more like you need a completion.
> Funny, I first started with using completion structures, but that did
> not work either. I get similar OOPses on all these kind of locking
> mechanisms, as long as I use the _interruptible() type. I tried every
> work-around I can think of, but none worked :-((
> Even if I block on an ordinary rt-mutex in the same routine, wait a
> _interruptible() type, I get the same problem.

The taker of a mutex must also be the one that releases it. I don't see
how you could use a mutex for this. It really requires some kind of
completion, or a compat_semaphore.

>
> > What's used to wake up the caller of down_interruptible?
> A call to up() is used from inside an interrupt(thread) context, but
> this is not relevant for the problem, because only blocking on a
> semaphore with down_interruptible() and waking the thread by CTRL-C is
> enough to get this Oops.
>
> I saw that the code is trying to wake 'other waiters', while there is
> only 1 thread waiting on the semaphore at most. I feel that the root
> cause of this problem has to be searched there.
>
> I believe that executing any PI code on semaphores is a strange
> behavior anyway, because a semaphore is never 'owned' by a thread, and
> it is always another thread that wakes the thread that blocks on a
> semaphore, and because the waker is unknown, the PI code will always
> boost the prio of the wrong thread.

Exactly why it should be a completion or compat semaphores. The reason we
did PI on semaphores is only because they were used as mutexes before Ingo
pushed to actually get a mutex primative into the kernel. Since then,
we've been trying to remove all semaphores with either a mutex or
completion.

>
> Strange is also, that I get different behavior on ARM if I use
> sema_init(&sema, 1) versus sema_init(&sema,0). The latter seems to
> crash less, it will not crash until the first up(); while the first
> will crash even without any up().
>
> Attached I have put a sample driver I just hacked together a few
> minutes ago. It is NOT the driver that has generates the oops in the
> previous mail, but I have stripped a scull-driver down that much that
> it will be much easier to talk about, and to keep us focussed on the
> part of the code that is causing this.
> Besides: I tested this driver on X86 with 2.6.23.1-rt5 and I get the
> also OOPSes although slightly different than on ARM. See the attached
> dummy.txt file.
>
>
> Beware: The up(&sema) is NOT required to get this OOPS, I get it even
> without any up(&sema) !
>
> I hope you can look at the attached driver source and help me with this...
>

down_interruptible(&dummy);
printk("We will block now, and if you press CTRL-C from here, we get an OOPS.\n");
down_interruptible(&dummy);


This double down is actually illegal with rt semaphores. Because we treat
semaphores as mutexes unless they are declared as compat_semaphores. In
which case we don't do PI.

Seems that you need to work out how to use a completion for your code. And
if that doesn't work, then use a compat_semaphore. But beware, that the
compat_semaphore can cause unbounded latencies. But then again, so can
completions.


-- Steve

2007-11-17 11:44:44

by Remy Bohmer

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Hello Steven,

> The taker of a mutex must also be the one that releases it. I don't see
> how you could use a mutex for this. It really requires some kind of
> completion, or a compat_semaphore.

I tried several ways of working around the bug, even tried
implementing it with kernel threads and protecting global data with
mutexes. Therefor I know that I have the same problem with mutexes. I
just created a simple example that showed the problem quickly, this
does not mean that this is the only case that does not work.

BTW: I am hacking around in the PREEMPT-RT kernel for years now, I
know the history very well, and I know what I am doing... Please, do
not find holes in the example I quickly hacked together, please focus
on the OOPS message, and help me figuring out what is causing this. I
can give you other examples of code that shows the same problem. But,
basically, every call that blocks with _inturruptible() on a rt-mutex
beneath the surface in the context of a user space process (and
receives a signal during the block) shows the same problem.

> Exactly why it should be a completion or compat semaphores. The reason we
> did PI on semaphores is only because they were used as mutexes before Ingo
> pushed to actually get a mutex primative into the kernel. Since then,
> we've been trying to remove all semaphores with either a mutex or
> completion.

Okay, sounds fair. But: the current implementation does counting the
number of up's and down's, suggesting that it really behaves like a
semaphore. It only does some special things during the transition of
the counter from 0->1 and from 1->0. If this counting is illegal use
of the mutex mechanism, it should report (compile) errors if:
* sema_init is used on 'struct semaphore' -> init_MUTEX() must be used instead.
* sema_init should only be used on 'struct compat_semaphore' types.
* calls to up() and down() in a row should report a BUG message,
* if up() is called from a different thread than the down() it should
report a BUG message. Further, the counting up's and down's are not
allowed on struct semphore types, so it should be removed from the
code.
* PI should only take place if it is for 100% sure that the 'struct
semaphore' is used as a mutex. And this is only the case when it is
initialised with init_MUTEX().

So, because all these items are not there, I doubt it is really true
that it is illegal to use 'struct semaphore' types as counting
semaphores across multiple threads. BESIDES: Everything works fine
UNTIL a signal is generated during a block on the semaphore. I think
Ingo tried to make the 'struct semaphore' type to behave like the
non-RT kernel 'struct semaphore', which actually does NOT show this
problem wtih my example driver!!!

So, this is a regression if exactly the same driver is used in both
non-preempt-rt patched kernel and preempt-rt patched kernels.

> down_interruptible(&dummy);
> printk("We will block now, and if you press CTRL-C from here, we get an OOPS.\n");
> down_interruptible(&dummy);
>
> This double down is actually illegal with rt semaphores. Because we treat
> semaphores as mutexes unless they are declared as compat_semaphores. In
> which case we don't do PI.

According to code there is a counting mechanism there, which suggest
that this is allowed to do. It works fine, until a signal arrives.the
SIGNAL is the only problem here!

> Seems that you need to work out how to use a completion for your code. And
> if that doesn't work, then use a compat_semaphore. But beware, that the
> compat_semaphore can cause unbounded latencies. But then again, so can
> completions.

I hope you get my point now. Other mechanisms like ordinary rt-mutexes
show the same problem, so either case: Please help me figuring out
which bug the signal is triggering here.


Kind Regards,

Remy Bohmer


2007/11/17, Steven Rostedt <[email protected]>:
>
>
>
> On Sat, 17 Nov 2007, Remy Bohmer wrote:
>
> > Hello Steven,
> >
> > Thanks for your reply
> >
> > > The above sounds more like you need a completion.
> > Funny, I first started with using completion structures, but that did
> > not work either. I get similar OOPses on all these kind of locking
> > mechanisms, as long as I use the _interruptible() type. I tried every
> > work-around I can think of, but none worked :-((
> > Even if I block on an ordinary rt-mutex in the same routine, wait a
> > _interruptible() type, I get the same problem.
>
> The taker of a mutex must also be the one that releases it. I don't see
> how you could use a mutex for this. It really requires some kind of
> completion, or a compat_semaphore.
>
> >
> > > What's used to wake up the caller of down_interruptible?
> > A call to up() is used from inside an interrupt(thread) context, but
> > this is not relevant for the problem, because only blocking on a
> > semaphore with down_interruptible() and waking the thread by CTRL-C is
> > enough to get this Oops.
> >
> > I saw that the code is trying to wake 'other waiters', while there is
> > only 1 thread waiting on the semaphore at most. I feel that the root
> > cause of this problem has to be searched there.
> >
> > I believe that executing any PI code on semaphores is a strange
> > behavior anyway, because a semaphore is never 'owned' by a thread, and
> > it is always another thread that wakes the thread that blocks on a
> > semaphore, and because the waker is unknown, the PI code will always
> > boost the prio of the wrong thread.
>
> Exactly why it should be a completion or compat semaphores. The reason we
> did PI on semaphores is only because they were used as mutexes before Ingo
> pushed to actually get a mutex primative into the kernel. Since then,
> we've been trying to remove all semaphores with either a mutex or
> completion.
>
> >
> > Strange is also, that I get different behavior on ARM if I use
> > sema_init(&sema, 1) versus sema_init(&sema,0). The latter seems to
> > crash less, it will not crash until the first up(); while the first
> > will crash even without any up().
> >
> > Attached I have put a sample driver I just hacked together a few
> > minutes ago. It is NOT the driver that has generates the oops in the
> > previous mail, but I have stripped a scull-driver down that much that
> > it will be much easier to talk about, and to keep us focussed on the
> > part of the code that is causing this.
> > Besides: I tested this driver on X86 with 2.6.23.1-rt5 and I get the
> > also OOPSes although slightly different than on ARM. See the attached
> > dummy.txt file.
> >
> >
> > Beware: The up(&sema) is NOT required to get this OOPS, I get it even
> > without any up(&sema) !
> >
> > I hope you can look at the attached driver source and help me with this...
> >
>
> down_interruptible(&dummy);
> printk("We will block now, and if you press CTRL-C from here, we get an OOPS.\n");
> down_interruptible(&dummy);
>
>
> This double down is actually illegal with rt semaphores. Because we treat
> semaphores as mutexes unless they are declared as compat_semaphores. In
> which case we don't do PI.
>
> Seems that you need to work out how to use a completion for your code. And
> if that doesn't work, then use a compat_semaphore. But beware, that the
> compat_semaphore can cause unbounded latencies. But then again, so can
> completions.
>
>
> -- Steve
>
>

2007-11-17 14:08:59

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals



On Sat, 17 Nov 2007, Remy Bohmer wrote:

> Hello Steven,
>
> > The taker of a mutex must also be the one that releases it. I don't see
> > how you could use a mutex for this. It really requires some kind of
> > completion, or a compat_semaphore.
>
> I tried several ways of working around the bug, even tried
> implementing it with kernel threads and protecting global data with
> mutexes. Therefor I know that I have the same problem with mutexes. I
> just created a simple example that showed the problem quickly, this
> does not mean that this is the only case that does not work.
>
> BTW: I am hacking around in the PREEMPT-RT kernel for years now, I
> know the history very well, and I know what I am doing... Please, do
> not find holes in the example I quickly hacked together, please focus
> on the OOPS message, and help me figuring out what is causing this. I
> can give you other examples of code that shows the same problem. But,
> basically, every call that blocks with _inturruptible() on a rt-mutex
> beneath the surface in the context of a user space process (and
> receives a signal during the block) shows the same problem.

Please don't think that I just took your example to find a hole in it.

Actually, the reason I pointed that out was not because of "Oh, this
example is broken", but because of the description of the problem you
are trying to solve. To me, it needs to be done with a completion or
compat_semaphore otherwise it will become very complex in solving.

>
> > Exactly why it should be a completion or compat semaphores. The reason we
> > did PI on semaphores is only because they were used as mutexes before Ingo
> > pushed to actually get a mutex primative into the kernel. Since then,
> > we've been trying to remove all semaphores with either a mutex or
> > completion.
>
> Okay, sounds fair. But: the current implementation does counting the
> number of up's and down's, suggesting that it really behaves like a
> semaphore. It only does some special things during the transition of
> the counter from 0->1 and from 1->0. If this counting is illegal use
> of the mutex mechanism, it should report (compile) errors if:
> * sema_init is used on 'struct semaphore' -> init_MUTEX() must be used instead.
> * sema_init should only be used on 'struct compat_semaphore' types.
> * calls to up() and down() in a row should report a BUG message,
> * if up() is called from a different thread than the down() it should
> report a BUG message. Further, the counting up's and down's are not
> allowed on struct semphore types, so it should be removed from the
> code.

It should print out warnings, do you have CONFIG_DEBUG_RT_MUTEXES set?

> * PI should only take place if it is for 100% sure that the 'struct
> semaphore' is used as a mutex. And this is only the case when it is
> initialised with init_MUTEX().

Well, we can't determine that with code ;-) Remember, there are still
drivers out in the world that use semaphores as mutexes. So the PI
on semaphores is really more of a compatibility issue.

>
> So, because all these items are not there, I doubt it is really true
> that it is illegal to use 'struct semaphore' types as counting
> semaphores across multiple threads. BESIDES: Everything works fine
> UNTIL a signal is generated during a block on the semaphore. I think
> Ingo tried to make the 'struct semaphore' type to behave like the
> non-RT kernel 'struct semaphore', which actually does NOT show this
> problem wtih my example driver!!!

Right, because a non-RT semaphore _is_ a compat_semaphore. I'm saying if
you see the bug in your driver with the compat_semaphore then lets debug
that. Because that _is_ a bug!


>
> So, this is a regression if exactly the same driver is used in both
> non-preempt-rt patched kernel and preempt-rt patched kernels.

Not really. There are things that the preempt-rt kernels require. One, is
that things that need to keep semaphores instead of using them as mutexes,
they should be converted to compat_semaphores. Perhaps now that we have
mutexes, we can remove the PI on semaphores, and out-of-tree drivers will
need to make sure they don't use semaphores as mutexes anymore.

>
> > down_interruptible(&dummy);
> > printk("We will block now, and if you press CTRL-C from here, we get an OOPS.\n");
> > down_interruptible(&dummy);
> >
> > This double down is actually illegal with rt semaphores. Because we treat
> > semaphores as mutexes unless they are declared as compat_semaphores. In
> > which case we don't do PI.
>
> According to code there is a counting mechanism there, which suggest
> that this is allowed to do. It works fine, until a signal arrives.the
> SIGNAL is the only problem here!

Yeah, that code is more of a hack to convert counting semaphores into
mutexes. But semaphores still need to have owners, and they should not
block on themselves. That may be where the bug is.

>
> > Seems that you need to work out how to use a completion for your code. And
> > if that doesn't work, then use a compat_semaphore. But beware, that the
> > compat_semaphore can cause unbounded latencies. But then again, so can
> > completions.
>
> I hope you get my point now. Other mechanisms like ordinary rt-mutexes
> show the same problem, so either case: Please help me figuring out
> which bug the signal is triggering here.

OK, I wont be able to work on this this weekend, but I'll try to get to it
on Monday. A better example to show the bug you are looking for is simply
create a mutex and create a thread that grabs that mutex and goes to
sleep. Have your driver read grab that mutex with
mutex_lock_interruptible. And if the signal code is broken with this, then
you definitely got a point that the inerruptible code is broken.

This will keep the semantics clean and not obfuscate it with the semaphore
code.

I'll write up that example on Monday if you don't have the time.

Note, that the unloading of the module should wake up the thread that
grabbed the mutex so it can release it.

-- Steve

2007-11-17 15:07:51

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals


On Sat, 17 Nov 2007, Steven Rostedt wrote:
> On Sat, 17 Nov 2007, Remy Bohmer wrote:
> > * PI should only take place if it is for 100% sure that the 'struct
> > semaphore' is used as a mutex. And this is only the case when it is
> > initialised with init_MUTEX().
>
> Well, we can't determine that with code ;-) Remember, there are still
> drivers out in the world that use semaphores as mutexes. So the PI
> on semaphores is really more of a compatibility issue.
>

Oh, I missed your point here. I was up late last night so I'm blaming that
for my _not_so_thorough_ reading ;-)

You are saying that we should change the semaphore to a mutex only if it
was initialized by init_MUTEX(). I see your point.

Actually, if something is initialized by init_MUTEX it really needs to be
a mutex :-)

Right now the compiler determines that something is a mutex or a
semaphore. If we make it a run time option, that will add more complexity
to the code and probably make it less efficient. The more complex part
is something we can deal with. The less efficient part we can not.

find . -name "*.c" ! -type d | xargs grep "init_MUTEX" | wc -l
100

Heh, nice number.

Well, I guess I could start sending patches to mainline to convert
semaphores to mutexes. I'm sure that will probably annoy akpm, but I'll do
it one at a time, with lots of thought behind each change.

BTW, drivers breaking with the RT patch is not always considered a
regression.

For example, this is fine in vanilla kernel:

local_irq_save(flags);
[...]
spin_lock(&lock);

Where as in the RT patch, that would break. And it would require either
changing the local_irq_save to a local_irq_save_nort, or find a way to do
the spin_lock_irqsave(&lock, flags) instead.

-- Steve


2007-11-17 15:36:20

by Remy Bohmer

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Hello Steven,

> It should print out warnings, do you have CONFIG_DEBUG_RT_MUTEXES set?

Nope, not yet... I will do that on Monday also. (On ARM I have as less
as debug options enabled per default, because it eats too much
CPU-power)

> > * PI should only take place if it is for 100% sure that the 'struct
> > semaphore' is used as a mutex. And this is only the case when it is
> > initialised with init_MUTEX().
>
> Well, we can't determine that with code ;-) Remember, there are still
> drivers out in the world that use semaphores as mutexes. So the PI
> on semaphores is really more of a compatibility issue.

Very strange, although there is now a real mutex in non-RT as in RT,
all semaphores are still converted to some
badly-implemented-recursive-but-not-recursive-callable-mutexes,
because the assumption is made that semaphores are intentionally
always used in the wrong way?!!
So, the code that is written nicely must be adapted to prevent
adaption of bad code?

I would expect a more logical solution like the introduction of raw_
types for the exceptions, just like what is done with spinlocks and so
on.

> > So, this is a regression if exactly the same driver is used in both
> > non-preempt-rt patched kernel and preempt-rt patched kernels.
>
> Not really. There are things that the preempt-rt kernels require. One, is
> that things that need to keep semaphores instead of using them as mutexes,
> they should be converted to compat_semaphores. Perhaps now that we have
> mutexes, we can remove the PI on semaphores, and out-of-tree drivers will
> need to make sure they don't use semaphores as mutexes anymore.

Up to this semaphore implementation the standard was always that a
driver written for a non-RT kernel should compile and work properly
without any adaption on a RT-kernel, until there are some specific
realtime requirements.
When it comes to semaphores this is thus completely not true.

And this is where the whole thing confused me completely.

> Yeah, that code is more of a hack to convert counting semaphores into
> mutexes.

Enough said...

> But semaphores still need to have owners, and they should not
> block on themselves. That may be where the bug is.

Or there should be a real recursive mutex implementation. But the need
for recursive mutexes is usually a sign for bad locking, and should
therefor be avoided.

> OK, I wont be able to work on this this weekend, but I'll try to get to it
> on Monday. A better example to show the bug you are looking for is simply
> create a mutex and create a thread that grabs that mutex and goes to
> sleep. Have your driver read grab that mutex with
> mutex_lock_interruptible. And if the signal code is broken with this, then
> you definitely got a point that the inerruptible code is broken.

I will work on an example like this on Monday.

> This will keep the semantics clean and not obfuscate it with the semaphore
> code.

I agree.

> I'll write up that example on Monday if you don't have the time.

I will make time ;-)

> Note, that the unloading of the module should wake up the thread that
> grabbed the mutex so it can release it.

Unloading of a module is not possible as long as there is still a
handle open to it. So, it should be safe without waking up the thread.
And returning from the read call and still have a mutex locked is not
nice either.

Thanks for the explanation. I will keep you informed on Monday.


Have a nice weekend !


Remy

2007-11-17 16:26:56

by Daniel Walker

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Sat, 2007-11-17 at 12:44 +0100, Remy Bohmer wrote:
> Hello Steven,
>
> > The taker of a mutex must also be the one that releases it. I don't see
> > how you could use a mutex for this. It really requires some kind of
> > completion, or a compat_semaphore.
>
> I tried several ways of working around the bug, even tried
> implementing it with kernel threads and protecting global data with
> mutexes. Therefor I know that I have the same problem with mutexes. I
> just created a simple example that showed the problem quickly, this
> does not mean that this is the only case that does not work.

I tried your example and I was able to reproduce the OOPS that you
found.. Although there is one problem, you don't have the same number of
up()'s to down() calls so you end up leaving the dummy_read function
with the lock still held ..

Reviewing the OOPS and the warnings it looks like your progressively
corrupting the mutex waiter list since remove_waiter() actually leaves
the stack based waiter object on the waiter list.. (That's what it looks
like anyway)..

So I converted your code to use a compat_semaphore, and no oops
happens.. Which makes sense because compat_semaphores are designed to
work the way your using them.

Daniel

2007-11-17 17:09:26

by Remy Bohmer

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Hello Daniel,

Thanks for looking into it also.
Steven already made clear to me that the 'struct semaphore' type on
the RT-kernel should not be used as a counting-semaphore, but as some
sort of legacy-mutex... (The confusion that this will cause is clear
by now...)

I still do not understand the problems I had with the
interruptible-waits on a real rt-mutex, but I have to figure that out
again on Monday. Maybe one confusion let to another...

(Note, A completion will not work for me, because they are not
designed for reuse across several threads. The read/write runs in user
context and as such it can be called by different threads, which would
require a init of a completion before waiting on it, but that would be
racy, I could miss the awake by the init)

> So I converted your code to use a compat_semaphore, and no oops
> happens.. Which makes sense because compat_semaphores are designed to
> work the way your using them.

Actually, IMO, compat_semaphores behave like semaphores should behave,
and thus the same as they behave on a non-RT kernel, and at the
locations where the semaphores are now misused as mutexes on RT, we
should replace them by differently-named-mutex-type-semaphores, or
better: real-RT-mutexes..
IMO this wrong usage of semaphores is solved by modifying the code
that actually made proper use of the semaphores, and I think that if
the naming matches the mainline kernel, we only need to patch the
files that really need to be patched during the integration in
mainline of the RT-patch.

Kind Regards,

Remy Bohmer

2007-11-17 17:33:56

by Daniel Walker

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Sat, 2007-11-17 at 18:09 +0100, Remy Bohmer wrote:

> Actually, IMO, compat_semaphores behave like semaphores should behave,
> and thus the same as they behave on a non-RT kernel, and at the
> locations where the semaphores are now misused as mutexes on RT, we
> should replace them by differently-named-mutex-type-semaphores, or
> better: real-RT-mutexes..

The vast majority of semaphore are actually binary semaphores in the
Linux kernel .. So it's easier to mass convert semaphores to mutexes,
then address the ones that don't conform.. Usually they are converted to
the complete API in mainline..

> IMO this wrong usage of semaphores is solved by modifying the code
> that actually made proper use of the semaphores, and I think that if
> the naming matches the mainline kernel, we only need to patch the
> files that really need to be patched during the integration in
> mainline of the RT-patch.

As I say above, it's happen already.. Code is slowly getting converted
to the complete API or the code gets converted to use a binary semaphore
(or a mutex)..

Daniel

2007-11-17 17:46:41

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals


* Daniel Walker <[email protected]> wrote:

> > Actually, IMO, compat_semaphores behave like semaphores should
> > behave, and thus the same as they behave on a non-RT kernel, and at
> > the locations where the semaphores are now misused as mutexes on RT,
> > we should replace them by differently-named-mutex-type-semaphores,
> > or better: real-RT-mutexes..
>
> The vast majority of semaphore are actually binary semaphores in the
> Linux kernel .. So it's easier to mass convert semaphores to mutexes,
> then address the ones that don't conform.. Usually they are converted
> to the complete API in mainline..

right now there are 3992 mutex_lock() critical sections in the kernel
and only 351 down() based critical sections are left.

fixing the top 20:

4 &vuart_bus_priv.probe_mutex
5 &connections_lock
5 &irq_ptr->setting_up_sema
5 &kbd->sem
5 &pnp_res_mutex
5 &port->port_lock
5 &tq_init_sem
6 &adb_handler_sem
6 &dev->parent->sem
6 &driver_lock
6 &ha->vport_sem
7 &big_buffer_sem
8 &dir_f->sem
9 &c->alloc_sem
11 &dev->sem
11 &usbvision->lock
12 &c->erase_free_sem
15 &u132->scheduler_lock
16 &zfcp_data.config_sema
17 &f->sem

would remove 164 of them, so it would convert half of the remaining
semaphore use in the kernel. So the job is almost finished - would
anyone like to go for the final grand feat: complete removal of
semaphores from the kernel? :-)

Ingo

2007-11-17 18:00:14

by Daniel Walker

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Sat, 2007-11-17 at 18:46 +0100, Ingo Molnar wrote:
> * Daniel Walker <[email protected]> wrote:
>
> > > Actually, IMO, compat_semaphores behave like semaphores should
> > > behave, and thus the same as they behave on a non-RT kernel, and at
> > > the locations where the semaphores are now misused as mutexes on RT,
> > > we should replace them by differently-named-mutex-type-semaphores,
> > > or better: real-RT-mutexes..
> >
> > The vast majority of semaphore are actually binary semaphores in the
> > Linux kernel .. So it's easier to mass convert semaphores to mutexes,
> > then address the ones that don't conform.. Usually they are converted
> > to the complete API in mainline..
>
> right now there are 3992 mutex_lock() critical sections in the kernel
> and only 351 down() based critical sections are left.
>
> fixing the top 20:
>
> 4 &vuart_bus_priv.probe_mutex
> 5 &connections_lock
> 5 &irq_ptr->setting_up_sema
> 5 &kbd->sem
> 5 &pnp_res_mutex
> 5 &port->port_lock
> 5 &tq_init_sem
> 6 &adb_handler_sem
> 6 &dev->parent->sem
> 6 &driver_lock
> 6 &ha->vport_sem
> 7 &big_buffer_sem
> 8 &dir_f->sem
> 9 &c->alloc_sem
> 11 &dev->sem
> 11 &usbvision->lock
> 12 &c->erase_free_sem
> 15 &u132->scheduler_lock
> 16 &zfcp_data.config_sema
> 17 &f->sem
>
> would remove 164 of them, so it would convert half of the remaining
> semaphore use in the kernel. So the job is almost finished - would
> anyone like to go for the final grand feat: complete removal of
> semaphores from the kernel? :-)

Sure, you want to split the list?

Daniel

2007-11-17 18:05:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals


* Daniel Walker <[email protected]> wrote:

> On Sat, 2007-11-17 at 18:46 +0100, Ingo Molnar wrote:
> > * Daniel Walker <[email protected]> wrote:
> >
> > > > Actually, IMO, compat_semaphores behave like semaphores should
> > > > behave, and thus the same as they behave on a non-RT kernel, and at
> > > > the locations where the semaphores are now misused as mutexes on RT,
> > > > we should replace them by differently-named-mutex-type-semaphores,
> > > > or better: real-RT-mutexes..
> > >
> > > The vast majority of semaphore are actually binary semaphores in the
> > > Linux kernel .. So it's easier to mass convert semaphores to mutexes,
> > > then address the ones that don't conform.. Usually they are converted
> > > to the complete API in mainline..
> >
> > right now there are 3992 mutex_lock() critical sections in the kernel
> > and only 351 down() based critical sections are left.
> >
> > fixing the top 20:
> >
> > 4 &vuart_bus_priv.probe_mutex
> > 5 &connections_lock
> > 5 &irq_ptr->setting_up_sema
> > 5 &kbd->sem
> > 5 &pnp_res_mutex
> > 5 &port->port_lock
> > 5 &tq_init_sem
> > 6 &adb_handler_sem
> > 6 &dev->parent->sem
> > 6 &driver_lock
> > 6 &ha->vport_sem
> > 7 &big_buffer_sem
> > 8 &dir_f->sem
> > 9 &c->alloc_sem
> > 11 &dev->sem
> > 11 &usbvision->lock
> > 12 &c->erase_free_sem
> > 15 &u132->scheduler_lock
> > 16 &zfcp_data.config_sema
> > 17 &f->sem
> >
> > would remove 164 of them, so it would convert half of the remaining
> > semaphore use in the kernel. So the job is almost finished - would
> > anyone like to go for the final grand feat: complete removal of
> > semaphores from the kernel? :-)
>
> Sure, you want to split the list?

split the list with you? Feel free to take any of those :-) dev->sem is
nontrivial and probably not possible right now - and some of the others
might be problematic too. But there might be fixable ones in the list.
This shouldnt become like the BKL conversion - never truly finished.

Ingo

2007-11-17 18:17:22

by Daniel Walker

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Sat, 2007-11-17 at 19:04 +0100, Ingo Molnar wrote:

>
> split the list with you? Feel free to take any of those :-) dev->sem is
> nontrivial and probably not possible right now - and some of the others
> might be problematic too. But there might be fixable ones in the list.
> This shouldnt become like the BKL conversion - never truly finished.

If I said I was going to do some set, at least it's more likely that the
work isn't duplicated..

What specifically is wrong with dev->sem ?

Daniel

2007-11-17 22:49:33

by Remy Bohmer

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

> > Sure, you want to split the list?
>
> split the list with you? Feel free to take any of those :-) dev->sem is
> nontrivial and probably not possible right now - and some of the others
> might be problematic too. But there might be fixable ones in the list.
> This shouldnt become like the BKL conversion - never truly finished.

Hey Ingo and Daniel, Leave some of the fun open for me :-)

I just looked at the list and I found a few that seem doable.


Remy

2007-11-18 12:35:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals


On Sat, 2007-11-17 at 10:12 -0800, Daniel Walker wrote:

> What specifically is wrong with dev->sem ?

Nothing really, other than that they use semaphores to avoid lockdep :-/

I think I know how to annotate this, after Alan Stern explained all the
use cases, but I haven't come around to implementing it. Hope to do that
soonish.

Another real semaphore user is XFS, they really use the down/up
asymmetry that semaphores allow, last time I spoke with Dave Chinner he
didn't know a way around this.

2007-11-18 22:26:58

by David Chinner

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Sun, Nov 18, 2007 at 01:33:50PM +0100, Peter Zijlstra wrote:
>
> On Sat, 2007-11-17 at 10:12 -0800, Daniel Walker wrote:
>
> > What specifically is wrong with dev->sem ?
>
> Nothing really, other than that they use semaphores to avoid lockdep :-/
>
> I think I know how to annotate this, after Alan Stern explained all the
> use cases, but I haven't come around to implementing it. Hope to do that
> soonish.
>
> Another real semaphore user is XFS, they really use the down/up
> asymmetry that semaphores allow, last time I spoke with Dave Chinner he
> didn't know a way around this.

Hasn't changed recently. We could convert some to completions, but
others are much more difficult.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2007-11-19 07:25:32

by Jon Masters

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Sat, 2007-11-17 at 09:55 -0800, Daniel Walker wrote:

> Sure, you want to split the list?

I'm happy to grab a few of these too. Let me know if either of you or
Ingo is working on the whole lot and about to dump it on us ;-)

Jon.



2007-11-19 12:55:59

by Remy Bohmer

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Hello Steven,

> > OK, I wont be able to work on this this weekend, but I'll try to get to it
> > on Monday. A better example to show the bug you are looking for is simply
> > create a mutex and create a thread that grabs that mutex and goes to
> > sleep. Have your driver read grab that mutex with
> > mutex_lock_interruptible. And if the signal code is broken with this, then
> > you definitely got a point that the inerruptible code is broken.

I removed the 'struct semaphore' completely from my driver, using real
mutexes now instead, replace the signalling semaphores by 'struct
completion' mechanisms and got rid of the possible race I saw with the
completions in a different way, and now the problem is completely
gone!

Posix Signals work properly now (no OOPS anymore), so the problem was
likely related to the way I used the 'struct semaphore' types, which
is thus different compared to the non-RT kernel and therefor quite
confusing.

So, thank you (and Daniel) for pointing me into the right direction.

Now lets get rid of the 'struct semaphore' completely in the kernel :-))


Kind Regards,

Remy Bohmer

2007-11-19 13:55:33

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals


On Mon, 19 Nov 2007, Remy Bohmer wrote:

> Hello Steven,
>
> > > OK, I wont be able to work on this this weekend, but I'll try to get to it
> > > on Monday. A better example to show the bug you are looking for is simply
> > > create a mutex and create a thread that grabs that mutex and goes to
> > > sleep. Have your driver read grab that mutex with
> > > mutex_lock_interruptible. And if the signal code is broken with this, then
> > > you definitely got a point that the inerruptible code is broken.
>
> I removed the 'struct semaphore' completely from my driver, using real
> mutexes now instead, replace the signalling semaphores by 'struct
> completion' mechanisms and got rid of the possible race I saw with the
> completions in a different way, and now the problem is completely
> gone!
>
> Posix Signals work properly now (no OOPS anymore), so the problem was
> likely related to the way I used the 'struct semaphore' types, which
> is thus different compared to the non-RT kernel and therefor quite
> confusing.
>
> So, thank you (and Daniel) for pointing me into the right direction.
>
> Now lets get rid of the 'struct semaphore' completely in the kernel :-))

Remy,

Thanks a lot for looking further into this. I'd like to join the fun in
removing the rest of the semaphores in the kernel, but with you, Ingo,
Daniel and Jon going to do that, one more cook will just spoil the stew.

Once we get rid of all the semaphores in the kernel that are being used as
mutexes, then we can change the code in the -rt patch to keep semaphores
default as compat_semaphores. And any out of tree driver would just need
to be fixed to use mutexes.

Well, read/write semaphores might have to stay special.

Thanks,

-- Steve

2007-11-19 15:36:43

by Daniel Walker

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Mon, 2007-11-19 at 02:24 -0500, Jon Masters wrote:
> On Sat, 2007-11-17 at 09:55 -0800, Daniel Walker wrote:
>
> > Sure, you want to split the list?
>
> I'm happy to grab a few of these too. Let me know if either of you or
> Ingo is working on the whole lot and about to dump it on us ;-)

I was planning to just try a few, so I think your safe..

Daniel

2007-11-19 15:58:26

by Remy Bohmer

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Hello Daniel,

Can we coordinate who is doing what somehow, to prevent double work?

Remy

2007/11/19, Daniel Walker <[email protected]>:
> On Mon, 2007-11-19 at 02:24 -0500, Jon Masters wrote:
> > On Sat, 2007-11-17 at 09:55 -0800, Daniel Walker wrote:
> >
> > > Sure, you want to split the list?
> >
> > I'm happy to grab a few of these too. Let me know if either of you or
> > Ingo is working on the whole lot and about to dump it on us ;-)
>
> I was planning to just try a few, so I think your safe..
>
> Daniel
>
>

2007-11-19 16:16:46

by Daniel Walker

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Mon, 2007-11-19 at 16:51 +0100, Remy Bohmer wrote:
> Hello Daniel,
>
> Can we coordinate who is doing what somehow, to prevent double work?

I actually haven't looked at any on Ingo's list (i8042tregs).. Feel free
to call out the ones your looking at tho..

Daniel

2007-11-20 16:48:35

by Daniel Walker

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Sat, 2007-11-17 at 18:46 +0100, Ingo Molnar wrote:

> fixing the top 20:
>

There are about 25 DECLARE_MUTEX() semaphores remaining .. One is the
BKL which I would guess can't be converted. The others I've looked at
appear to be trivial find/replace changes to get them to use the mutex
type.. Any reason those haven't been converted over?

Daniel

2007-11-20 20:38:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals


* Daniel Walker <[email protected]> wrote:

> On Sat, 2007-11-17 at 18:46 +0100, Ingo Molnar wrote:
>
> > fixing the top 20:
>
> There are about 25 DECLARE_MUTEX() semaphores remaining .. One is the
> BKL which I would guess can't be converted. The others I've looked at
> appear to be trivial find/replace changes to get them to use the mutex
> type.. Any reason those haven't been converted over?

i guess talking about it instead of posting patches is a factor ;-)

Ingo

2007-11-20 20:59:34

by Daniel Walker

[permalink] [raw]
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

On Tue, 2007-11-20 at 21:37 +0100, Ingo Molnar wrote:
> * Daniel Walker <[email protected]> wrote:
>
> > On Sat, 2007-11-17 at 18:46 +0100, Ingo Molnar wrote:
> >
> > > fixing the top 20:
> >
> > There are about 25 DECLARE_MUTEX() semaphores remaining .. One is the
> > BKL which I would guess can't be converted. The others I've looked at
> > appear to be trivial find/replace changes to get them to use the mutex
> > type.. Any reason those haven't been converted over?
>
> i guess talking about it instead of posting patches is a factor ;-)

There's several using init_MUTEX() (~70 or so), reviewing a few they
look fairly trivial ..

Daniel