2014-10-07 09:42:48

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH] sched: Do not try to replenish from a non deadline tasks

Hi Daniel,

On 24/09/14 14:24, Daniel Wagner wrote:
> When a PI mutex is shared between a deadline task and a normal task we
> might end up trying to replenish from the normal task. In this case none of
> dl_runtime, dl_period or dl_deadline is set. replenish_dl_entity() can't do
> anything useful.
>

Is this the same bug we have with rt_mutex_setprio or something
different? I'm sorry, but I don't remember anymore :/. It looks like
a different issue, though.

Anyway, the call path you talked about on IRC seems to make sense; does
the patch below fix the issue? Could you please point me again to where the
tests you are running are hosted, so that I can easily reproduce the
bug here?

Thanks a lot,

- Juri

From f39b7668aeca5c48be1d4baed217cdd6c8d61150 Mon Sep 17 00:00:00 2001
From: Juri Lelli <[email protected]>
Date: Tue, 7 Oct 2014 10:29:09 +0100
Subject: [PATCH] sched/deadline: fix double enqueue on dl_task_timer.

Signed-off-by: Juri Lelli <[email protected]>
---
kernel/sched/deadline.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 255ce13..d0beefa 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -520,10 +520,13 @@ again:
/*
* We need to take care of a possible races here. In fact, the
* task might have changed its scheduling policy to something
- * different from SCHED_DEADLINE or changed its reservation
- * parameters (through sched_setattr()).
+ * different from SCHED_DEADLINE, changed its reservation
+ * parameters (through sched_setattr()) or inherited priority
+ * (and parameters) from someone else (in this last case it is
+ * also outside of bandwidth enforcement, so we can safely bail
+ * out).
*/
- if (!dl_task(p) || dl_se->dl_new)
+ if (!dl_task(p) || dl_se->dl_new || dl_se->dl_boosted)
goto unlock;

sched_clock_tick();
--
2.1.0


2014-10-07 12:02:47

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH] sched: Do not try to replenish from a non deadline tasks

Hi Juri,

On 10/07/2014 11:43 AM, Juri Lelli wrote:
> Hi Daniel,
>
> On 24/09/14 14:24, Daniel Wagner wrote:
>> When a PI mutex is shared between a deadline task and a normal task we
>> might end up trying to replenish from the normal task. In this case none of
>> dl_runtime, dl_period or dl_deadline is set. replenish_dl_entity() can't do
>> anything useful.
>>
>
> Is this the same bug we have with rt_mutex_setprio or something different?
> I'm sorry, but I don't remember anymore :/. It looks like a different
> issue, though.

There are two independent bugs, but in both cases the BUG_ON in
enqueue_dl_entity() is triggered.

The first one (this patch) is triggered by trying to use the sched_attr
of the non-deadline task. The second one is the double enqueue.

> Anyway, the call path you talked about on IRC seems to make sense; does
> the patch below fix the issue?

I'll give it a spin.

> Could you please point me again to where the
> tests you are running are hosted, so that I can easily reproduce the
> bug here?

Sure. I reduced my original program to this:

http://www.monom.org/rt/pthread_test.c

If you let it run with 'pthread_test inherit' you should see the bugs
eventually.

cheers,
daniel

2014-10-07 12:11:08

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH] sched: Do not try to replenish from a non deadline tasks

On 10/07/2014 02:02 PM, Daniel Wagner wrote:
>> Anyway, the call path you talked about on IRC seems to make sense; does
>> the patch below fix the issue?

For the record that was:

16:30 < wagi> juril: rt_mutex_setprio() resets p->dl.dl_throttled. So the pattern is: start_dl_timer()
throttled = 1, rt_mutex_setprio() throlled = 0, sched_switch() -> enqueue_task(), dl_task_timer
-> enqueue_task() throttled is 0

> I'll give it a spin.

Not much fun.

echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
trace-cmd start -e sched -e syscalls:*_futex

with tracepoints at start_dl_timer, dequeue_dl_entity and enqueue_dl_entity


[ 36.689416] pthread_-1554 0...1 18486104us : sys_futex(uaddr: 6020e0, op: 87, val: 0, utime: 0, uaddr2: 6020e0, val3: 612)
[ 36.689416] pthread_-1554 0d..5 18486108us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=1
[ 36.689416] pthread_-1554 0d..5 18486109us : sched_wakeup: comm=pthread_test pid=1555 prio=-1 success=1 target_cpu=000
[ 36.689416] pthread_-1554 0d..4 18486111us : sched_pi_setprio: comm=pthread_test pid=1554 oldprio=-1 newprio=120
[ 36.689416] pthread_-1554 0d..4 18486111us : sched_dequeue_dl_entity: comm=pthread_test pid=1554 flags=0
[ 36.689416] pthread_-1554 0d..4 18486112us : sched_stat_runtime: comm=pthread_test pid=1554 runtime=851 [ns] vruntime=686604712 [ns]
[ 36.689416] pthread_-1554 0dN.3 18486113us : sched_stat_runtime: comm=pthread_test pid=1554 runtime=1714 [ns] vruntime=686606426 [ns]
[ 36.689416] pthread_-1554 0d..3 18486114us : sched_switch: prev_comm=pthread_test prev_pid=1554 prev_prio=120 prev_state=R ==> next_comm=pthread_test next_pid=1555 next_prio=-1
[ 36.689416] pthread_-1555 0...1 18486117us : sys_futex -> 0x0
[ 36.689416] pthread_-1555 0d..3 18486253us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
[ 36.689416] pthread_-1555 0d..3 18486254us : sched_start_dl_timer: comm=pthread_test pid=1555
[ 36.689416] pthread_-1555 0dN.3 18486255us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
[ 36.689416] pthread_-1555 0dN.3 18486256us : sched_stat_wait: comm=pthread_test pid=1554 delay=139764 [ns]
[ 36.689416] pthread_-1555 0d..3 18486256us : sched_switch: prev_comm=pthread_test prev_pid=1555 prev_prio=-1 prev_state=S ==> next_comm=pthread_test next_pid=1554 next_prio=120
[ 36.689416] pthread_-1554 0...1 18486257us : sys_futex -> 0x0
[ 36.689416] pthread_-1554 0...1 18486258us : sys_futex(uaddr: 6020e0, op: 86, val: 1, utime: 0, uaddr2: 0, val3: 612)
[ 36.689416] pthread_-1554 0d..4 18486262us : sched_pi_setprio: comm=pthread_test pid=1555 oldprio=-1 newprio=-1
[ 36.689416] pthread_-1554 0d..3 18486264us : sched_stat_runtime: comm=pthread_test pid=1554 runtime=10735 [ns] vruntime=686617161 [ns]
[ 36.689416] pthread_-1554 0d..3 18486265us : sched_stat_wait: comm=kworker/u2:0 pid=6 delay=304089 [ns]
[ 36.689416] pthread_-1554 0d..3 18486265us : sched_switch: prev_comm=pthread_test prev_pid=1554 prev_prio=120 prev_state=S ==> next_comm=kworker/u2:0 next_pid=6 next_prio=120
[ 36.689416] kworker/-6 0d.H5 18486392us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=1
[ 36.689416] kworker/-6 0dNH5 18486393us : sched_wakeup: comm=pthread_test pid=1555 prio=-1 success=1 target_cpu=000
[ 36.689416] kworker/-6 0dN.3 18486401us : sched_stat_runtime: comm=kworker/u2:0 pid=6 runtime=128451 [ns] vruntime=2964293245 [ns]
[ 36.689416] kworker/-6 0d..3 18486401us : sched_switch: prev_comm=kworker/u2:0 prev_pid=6 prev_prio=120 prev_state=R ==> next_comm=pthread_test next_pid=1555 next_prio=-1
[ 36.689416] pthread_-1555 0...1 18486403us : sys_futex(uaddr: 6020e0, op: 87, val: 0, utime: 0, uaddr2: 6020e0, val3: 613)
[ 36.689416] pthread_-1555 0d..5 18486408us : sched_stat_sleep: comm=pthread_test pid=1554 delay=143975 [ns]
[ 36.689416] pthread_-1555 0d..5 18486408us : sched_wakeup: comm=pthread_test pid=1554 prio=120 success=1 target_cpu=000
[ 36.689416] pthread_-1555 0d..4 18486420us : sched_pi_setprio: comm=pthread_test pid=1555 oldprio=-1 newprio=-1
[ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
[ 36.689416] pthread_-1555 0d..4 18486421us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8
[ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
[ 36.689416] pthread_-1555 0d..4 18486422us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=0
[ 36.689416] pthread_-1555 0d.H4 18486539us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8
[ 36.689416] ---------------------------------
[ 36.689416] Modules linked in:
[ 36.689416] CPU: 0 PID: 1555 Comm: pthread_test Not tainted 3.17.0-rc5+ #67
[ 36.689416] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 36.689416] task: ffff88007cbc28c0 ti: ffff88007a758000 task.ti: ffff88007a758000
[ 36.689416] RIP: 0010:[<ffffffff8106cc20>] [<ffffffff8106cc20>] enqueue_task_dl+0x2b0/0x330
[ 36.689416] RSP: 0018:ffffffff81e2baa8 EFLAGS: 00010082
[ 36.689416] RAX: 0000000000000000 RBX: ffff88007cbc28c0 RCX: ffff880078217000
[ 36.689416] RDX: 0000000000010104 RSI: 0000000000000046 RDI: ffff88007d041e00
[ 36.689416] RBP: ffffffff81e2bad8 R08: 0000000000000000 R09: ffff880078217eb4
[ 36.689416] R10: 000000088f9c5ab5 R11: 000000000000000d R12: ffff88007cbc2aa8
[ 36.689416] R13: 0000000000000008 R14: ffff88007cbc2aa8 R15: 0000000000000001
[ 36.689416] FS: 00007f82e9959700(0000) GS:ffffffff81e28000(0000) knlGS:0000000000000000
[ 36.689416] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 36.689416] CR2: 00007ff67cbd2000 CR3: 00000000780a2000 CR4: 00000000000006f0
[ 36.689416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 36.689416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 36.689416] Stack:
[ 36.689416] ffff88007cbc28c0 ffff88007cbc2b08 ffffffff81e2bb48 ffffffff82010ca8
[ 36.689416] ffffffff82010c20 0000000000000003 ffffffff81e2baf0 ffffffff8106cd17
[ 36.689416] ffff88007cbc2b08 ffffffff81e2bb30 ffffffff81094ecf ffffffff8106cca0
[ 36.689416] Call Trace:
[ 36.689416] <IRQ>
[ 36.689416] [<ffffffff8106cd17>] dl_task_timer+0x77/0xb0
[ 36.689416] [<ffffffff81094ecf>] __run_hrtimer+0x7f/0x200
[ 36.689416] [<ffffffff8106cca0>] ? enqueue_task_dl+0x330/0x330
[ 36.689416] [<ffffffff810952a7>] hrtimer_interrupt+0xd7/0x250
[ 36.689416] [<ffffffff8102b022>] local_apic_timer_interrupt+0x22/0x50
[ 36.689416] [<ffffffff8102b698>] smp_apic_timer_interrupt+0x38/0x50
[ 36.689416] [<ffffffff818bd17d>] apic_timer_interrupt+0x6d/0x80
[ 36.689416] [<ffffffff818bb7d3>] ? _raw_spin_unlock_irqrestore+0x33/0x50
[ 36.689416] [<ffffffff815289cd>] ata_scsi_queuecmd+0x13d/0x420
[ 36.689416] [<ffffffff815259d0>] ? ata_scsi_invalid_field+0x40/0x40
[ 36.689416] [<ffffffff814fd5bf>] scsi_dispatch_cmd+0x9f/0x190
[ 36.689416] [<ffffffff81505ce5>] scsi_request_fn+0x415/0x650
[ 36.689416] [<ffffffff81333c0e>] __blk_run_queue+0x2e/0x40
[ 36.689416] [<ffffffff81333c41>] blk_run_queue+0x21/0x40
[ 36.689416] [<ffffffff81503580>] scsi_run_queue+0x260/0x300
[ 36.689416] [<ffffffff814fd1d3>] ? scsi_put_command+0x73/0xc0
[ 36.689416] [<ffffffff81505f5b>] scsi_next_command+0x1b/0x30
[ 36.689416] [<ffffffff815060c5>] scsi_end_request+0x155/0x1d0
[ 36.689416] [<ffffffff8150628c>] scsi_io_completion+0xec/0x5e0
[ 36.689416] [<ffffffff814fd769>] scsi_finish_command+0xb9/0xf0
[ 36.689416] [<ffffffff815058aa>] scsi_softirq_done+0x10a/0x130
[ 36.689416] [<ffffffff8133cabb>] blk_done_softirq+0x7b/0x90
[ 36.689416] [<ffffffff81045bd4>] __do_softirq+0x114/0x2e0
[ 36.689416] [<ffffffff81046035>] irq_exit+0xa5/0xb0
[ 36.689416] [<ffffffff81003f90>] do_IRQ+0x50/0xe0
[ 36.689416] [<ffffffff818bcead>] common_interrupt+0x6d/0x6d
[ 36.689416] <EOI>
[ 36.689416] [<ffffffff818b6701>] ? __schedule+0x2e1/0x6d0
[ 36.689416] [<ffffffff8136167e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 36.689416] [<ffffffff818b6b13>] schedule+0x23/0x60
[ 36.689416] [<ffffffff818bc5f3>] int_careful+0x12/0x1e
[ 36.689416] Code: 38 02 00 00 00 00 00 00 48 89 83 20 02 00 00 eb 9b be 1a 01 00 00 48 c7 c7 1b f0 cb 81 e8 49 5c fd ff 48 8b 93 e8 01 00 00 eb bd <0f> 0b 0f 0b 48 c7 c7 f8 44 cb 81 31 c0 c6 05 31 1a 09 01 01 e8
[ 36.689416] RIP [<ffffffff8106cc20>] enqueue_task_dl+0x2b0/0x330
[ 36.689416] RSP <ffffffff81e2baa8>
[ 36.689416] ---[ end trace 823be5ba7376cc37 ]---
[ 36.689416] Kernel panic - not syncing: Fatal exception in interrupt
[ 36.689416] Dumping ftrace buffer:
[ 36.689416] (ftrace buffer empty)
[ 36.689416] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 36.689416] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

2014-10-07 13:20:56

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH] sched: Do not try to replenish from a non deadline tasks

On 10/07/2014 02:10 PM, Daniel Wagner wrote:
> [ 36.689416] pthread_-1555 0d..5 18486408us : sched_stat_sleep: comm=pthread_test pid=1554 delay=143975 [ns]
> [ 36.689416] pthread_-1555 0d..5 18486408us : sched_wakeup: comm=pthread_test pid=1554 prio=120 success=1 target_cpu=000
> [ 36.689416] pthread_-1555 0d..4 18486420us : sched_pi_setprio: comm=pthread_test pid=1555 oldprio=-1 newprio=-1
> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8
> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
> [ 36.689416] pthread_-1555 0d..4 18486422us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=0
> [ 36.689416] pthread_-1555 0d.H4 18486539us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8

I noticed that the last two lines are different. Maybe that is yet
another path into enqueue_task_dl().

2014-10-09 09:47:24

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH] sched: Do not try to replenish from a non deadline tasks

Hi Juri,

On 10/07/2014 03:20 PM, Daniel Wagner wrote:
> On 10/07/2014 02:10 PM, Daniel Wagner wrote:
>> [ 36.689416] pthread_-1555 0d..5 18486408us : sched_stat_sleep: comm=pthread_test pid=1554 delay=143975 [ns]
>> [ 36.689416] pthread_-1555 0d..5 18486408us : sched_wakeup: comm=pthread_test pid=1554 prio=120 success=1 target_cpu=000
>> [ 36.689416] pthread_-1555 0d..4 18486420us : sched_pi_setprio: comm=pthread_test pid=1555 oldprio=-1 newprio=-1
>> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
>> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8
>> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
>> [ 36.689416] pthread_-1555 0d..4 18486422us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=0
>> [ 36.689416] pthread_-1555 0d.H4 18486539us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8
>
> I noticed that the last two lines are different. Maybe that is yet
> another path into enqueue_task_dl().

So more testing revealed that the patch also starves both tasks
eventually. Both processes make no progress at all.

runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
systemd 1 170.771190 2147 120 170.771190 433.550134 359375.395748 /autogroup-1
kthreadd 2 12481.085697 66 120 12481.085697 1.062411 848978.893057 /
ksoftirqd/0 3 12486.586001 10125 120 12486.586001 48.522819 856220.673708 /
kworker/0:0H 5 1218.349308 6 100 1218.349308 0.106697 835.585066 /
kworker/u2:0 6 12483.710138 1947 120 12483.710138 45.712779 854218.654119 /
khelper 7 13.326326 2 100 13.326326 0.000000 0.000000 /
kdevtmpfs 8 2001.861157 139 120 2001.861157 1.787992 10775.571085 /
netns 9 17.326324 2 100 17.326324 0.000000 0.000000 /
kworker/u2:1 10 2001.760377 678 120 2001.760377 9.362114 10766.675796 /
writeback 19 32.293597 2 100 32.293597 0.003126 0.002243 /
crypto 21 33.494501 2 100 33.494501 0.002662 0.002051 /
bioset 23 34.995536 2 100 34.995536 0.002601 0.002002 /
kblockd 25 36.497285 2 100 36.497285 0.003835 0.002050 /
ata_sff 61 78.865458 2 100 78.865458 0.004438 0.002205 /
khubd 64 80.079505 2 120 80.079505 0.015444 0.003833 /
md 66 81.268188 2 100 81.268188 0.003050 0.002173 /
kworker/0:1 68 12484.564192 957 120 12484.564192 49.238817 855345.418329 /
cfg80211 69 82.636198 2 100 82.636198 0.003627 0.002132 /
rpciod 130 156.580245 2 100 156.580245 0.005374 0.004012 /
kswapd0 138 928.454330 3 120 928.454330 0.048229 99.749401 /
fsnotify_mark 143 1963.891522 14 120 1963.891522 0.065959 3496.886517 /
nfsiod 146 171.432785 2 100 171.432785 0.004305 0.003535 /
xfsalloc 149 174.366108 2 100 174.366108 0.002861 0.002284 /
xfs_mru_cache 151 175.567171 2 100 175.567171 0.002752 0.002090 /
xfslogd 153 177.068445 2 100 177.068445 0.003142 0.002171 /
acpi_thermal_pm 177 192.956400 2 100 192.956400 0.003639 0.002317 /
scsi_eh_0 203 1188.912866 8 120 1188.912866 2.029062 153.702869 /
scsi_tmf_0 205 223.187411 2 100 223.187411 0.003086 0.002421 /
scsi_eh_1 207 1531.958332 29 120 1531.958332 2.988008 1276.298054 /
scsi_tmf_1 209 226.190410 2 100 226.190410 0.004007 0.002328 /
kpsmoused 218 235.418718 2 100 235.418718 0.003621 0.002384 /
ipv6_addrconf 227 244.650047 2 100 244.650047 0.004004 0.004044 /
deferwq 232 247.754373 2 100 247.754373 0.003005 0.002112 /
kworker/0:1H 882 12482.321178 1873 100 12482.321178 51.220049 850000.764070 /
xfs-data/sda2 887 1230.066143 2 100 1230.066143 0.003715 0.002886 /
xfs-conv/sda2 888 1231.567109 2 100 1231.567109 0.002769 0.002330 /
xfs-cil/sda2 889 1233.568672 2 100 1233.568672 0.044412 0.002514 /
xfsaild/sda2 890 12484.343046 4544 120 12484.343046 74.991234 847492.165181 /
systemd-journal 902 1019.896708 410 120 1019.896708 1021.188821 848668.196674 /autogroup-5
kauditd 905 7231.521873 59 120 7231.521873 0.606636 358980.677314 /
systemd-udevd 971 60.572337 520 120 60.572337 53.544845 12283.357288 /autogroup-10
jbd2/sda1-8 1422 1957.029513 2 120 1957.029513 0.062018 0.004464 /
ext4-rsv-conver 1423 1959.032062 2 100 1959.032062 0.004310 0.002700 /
auditd 1428 6.142863 98 116 6.142863 10.602551 845312.563408 /autogroup-21
auditd 1436 5.502422 81 116 5.502422 2.335235 357502.433148 /autogroup-21
NetworkManager 1445 82.669958 562 120 82.669958 64.904912 844091.171520 /autogroup-26
NetworkManager 1467 20.709072 1 120 20.709072 0.014731 0.000000 /autogroup-26
gmain 1470 66.064602 6 120 66.064602 0.366758 3699.249128 /autogroup-26
gdbus 1474 79.637929 140 120 79.637929 4.511798 838151.788025 /autogroup-26
systemd-logind 1446 8.311585 106 120 8.311585 9.677422 355305.433254 /autogroup-27
dbus-daemon 1447 23.241715 345 120 23.241715 24.616880 844172.717523 /autogroup-28
crond 1449 3.123550 18 120 3.123550 4.516948 841485.755390 /autogroup-29
agetty 1455 2.936206 7 120 2.936206 7.052828 4305.395671 /autogroup-37
agetty 1456 12.159459 11 120 12.159459 16.314073 3286.999222 /autogroup-36
polkitd 1475 20.631235 93 120 20.631235 15.310572 349234.784825 /autogroup-32
gmain 1478 8.881526 2 120 8.881526 0.065824 0.000000 /autogroup-32
gdbus 1479 21.328501 38 120 21.328501 2.320225 349229.316491 /autogroup-32
JS GC Helper 1480 14.370672 1 120 14.370672 0.031663 0.000000 /autogroup-32
JS Sour~ Thread 1481 17.473336 7 120 17.473336 0.071963 3.840505 /autogroup-32
runaway-killer- 1482 20.472425 5 120 20.472425 0.051006 0.250475 /autogroup-32
sshd 1486 13.338015 67 120 13.338015 14.729391 349142.506840 /autogroup-35
dhclient 1490 66.072235 65 120 66.072235 17.853255 1029.655591 /autogroup-26
sshd 1505 27.497743 218 120 27.497743 25.950965 340666.314097 /autogroup-39
systemd 1507 9.239660 26 120 9.239660 10.300048 729.175924 /autogroup-40
bash 1508 4560.989053 99 120 4560.989053 14.807953 340582.911272 /autogroup-41
(sd-pam) 1511 3.251482 1 120 3.251482 0.050250 0.000000 /autogroup-40
kworker/0:0 1529 12482.288839 648 120 12482.288839 24.882158 535264.262021 /
pthread_test 1536 0.000000 1510185 -1 0.000000 7142.137941 61496.639384 /autogroup-41
pthread_test 1537 0.000000 1508994 -1 0.000000 19830.272655 0.000000 /autogroup-41
sshd 1538 45.693223 859 120 45.693223 44.510292 497299.440632 /autogroup-42
bash 1540 336.221438 292 120 336.221438 15.725550 496379.865890 /autogroup-43
kworker/0:2 1561 12486.603821 72 120 12486.603821 1.009523 8366.740527 /
R cat 1562 337.711978 0 120 337.711978 0.655400 0.000000 /autogroup-43


[ 478.792528] pthread_test R running task 14232 1536 1508 0x10000080
[ 478.793133] ffff880078d1bf60 0000000000000046 ffff88007caee5e0 ffff880078d1bfd8
[ 478.793774] ffffffff81e1d4c0 00007fffa9c5e7b0 ffffffff8136187e 0000000000000000
[ 478.794451] 00000000004009d0 00007fffa9c5e890 0000000000000000 0000000000000000
[ 478.795118] Call Trace:
[ 478.795326] [<ffffffff8136187e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 478.795846] [<ffffffff818b6d13>] schedule+0x23/0x60
[ 478.796256] [<ffffffff818bd20f>] retint_careful+0x12/0x2d
[ 478.796695] pthread_test S ffff88007caea8c0 14424 1537 1508 0x10000080
[ 478.797287] ffff880078e63bd8 0000000000000046 ffff88007caea8c0 ffff880078e63fd8
[ 478.797909] ffffffff81e1d4c0 ffff88007caea8c0 ffff880078e63be8 ffff880000158010
[ 478.798542] ffff880078e63c58 0000000000000001 0000000000000001 ffff88007caea8c0
[ 478.799175] Call Trace:
[ 478.799376] [<ffffffff818b6d13>] schedule+0x23/0x60
[ 478.799773] [<ffffffff818ba2fb>] __rt_mutex_slowlock+0x5b/0xe0
[ 478.800257] [<ffffffff818ba4ce>] rt_mutex_slowlock+0xbe/0x1c0
[ 478.800725] [<ffffffff8107cbea>] rt_mutex_timed_futex_lock+0x3a/0x40
[ 478.801252] [<ffffffff810a6437>] futex_lock_pi.isra.20+0x257/0x370
[ 478.801753] [<ffffffff81068cd5>] ? sched_clock_local.constprop.6+0x15/0x80
[ 478.802317] [<ffffffff81068ec5>] ? sched_clock_cpu+0x55/0x80
[ 478.802774] [<ffffffff81068f45>] ? local_clock+0x15/0x30
[ 478.803225] [<ffffffff81068cd5>] ? sched_clock_local.constprop.6+0x15/0x80
[ 478.803784] [<ffffffff810a720c>] do_futex+0x2bc/0xa90
[ 478.804209] [<ffffffff81068cd5>] ? sched_clock_local.constprop.6+0x15/0x80
[ 478.804766] [<ffffffff81068ec5>] ? sched_clock_cpu+0x55/0x80
[ 478.805236] [<ffffffff81068f45>] ? local_clock+0x15/0x30
[ 478.805669] [<ffffffff8109a936>] ? current_kernel_time+0x56/0xb0
[ 478.806165] [<ffffffff8107648d>] ? trace_hardirqs_on+0xd/0x10
[ 478.806629] [<ffffffff810a7a4c>] SyS_futex+0x6c/0x150
[ 478.807046] [<ffffffff8100f60f>] ? syscall_trace_enter+0x21f/0x230
[ 478.807549] [<ffffffff818bc75f>] tracesys+0xdc/0xe1

(gdb) p $lx_task_by_pid(1536).dl
$1 = {
rb_node = {
__rb_parent_color = 18446612134406055880,
rb_right = 0x0,
rb_left = 0x0
},
dl_runtime = 0,
dl_deadline = 0,
dl_period = 0,
dl_bw = 0,
runtime = -283620,
deadline = 441965145261,
flags = 0,
dl_throttled = 0,
dl_new = 0,
dl_boosted = 1,
dl_yielded = 0,
dl_timer = {
node = {
node = {
__rb_parent_color = 18446612134406055976,
rb_right = 0x0,
rb_left = 0x0
},
expires = {
tv64 = 0
}
},
_softexpires = {
tv64 = 0
},
function = 0x0,
base = 0xffffffff82010d68 <hrtimer_bases+136>,
state = 0,
start_pid = -1,
start_site = 0x0,
start_comm = '\000' <repeats 15 times>
}
}

(gdb) p $lx_task_by_pid(1537).dl
$2 = {
rb_node = {
__rb_parent_color = 18446612134406040232,
rb_right = 0x0,
rb_left = 0x0
},
dl_runtime = 100000,
dl_deadline = 200000,
dl_period = 200000,
dl_bw = 524288,
runtime = 70824,
deadline = 441965047345,
flags = 0,
dl_throttled = 0,
dl_new = 0,
dl_boosted = 0,
dl_yielded = 0,
dl_timer = {
node = {
node = {
__rb_parent_color = 18446612134406040328,
rb_right = 0x0,
rb_left = 0x0
},
expires = {
tv64 = 441824359503
}
},
_softexpires = {
tv64 = 441824359503
},
function = 0xffffffff8106ce50 <dl_task_timer>,
base = 0xffffffff82010d68 <hrtimer_bases+136>,
state = 0,
start_pid = 1537,
start_site = 0xffffffff8106d178 <update_curr_dl+552>,
start_comm = "pthread_test\000\000\000"
}
}

cheers,
daniel

2014-10-09 13:51:20

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH] sched: Do not try to replenish from a non deadline tasks

Hi Daniel,

On 09/10/14 10:47, Daniel Wagner wrote:
> Hi Juri,
>
> On 10/07/2014 03:20 PM, Daniel Wagner wrote:
>> On 10/07/2014 02:10 PM, Daniel Wagner wrote:
>>> [ 36.689416] pthread_-1555 0d..5 18486408us : sched_stat_sleep: comm=pthread_test pid=1554 delay=143975 [ns]
>>> [ 36.689416] pthread_-1555 0d..5 18486408us : sched_wakeup: comm=pthread_test pid=1554 prio=120 success=1 target_cpu=000
>>> [ 36.689416] pthread_-1555 0d..4 18486420us : sched_pi_setprio: comm=pthread_test pid=1555 oldprio=-1 newprio=-1
>>> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
>>> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8
>>> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
>>> [ 36.689416] pthread_-1555 0d..4 18486422us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=0
>>> [ 36.689416] pthread_-1555 0d.H4 18486539us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8
>>
>> I noticed that the last two lines are different. Maybe that is yet
>> another path into enqueue_task_dl().
>
> So more testing revealed that the patch also starves both tasks
> eventually. Both processes make no progress at all.
>

Mmm, that's bad.

Can you give this different patch a try (after applying
the other one I sent out)?

This thing is looking good on my box. I'd like to do more
testing, but I have to context switch for a bit now :/.

Thanks a lot,

- Juri

From f07e6373f89ad4d4173c8c5cd51c1595328888c2 Mon Sep 17 00:00:00 2001
From: Juri Lelli <[email protected]>
Date: Thu, 9 Oct 2014 11:36:55 +0100
Subject: [PATCH 2/2] sched/deadline: fix races between rt_mutex_setprio and
dl_task_timer

Signed-off-by: Juri Lelli <[email protected]>
---
kernel/sched/deadline.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e89c27b..16a10f0 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -518,12 +518,20 @@ again:
}

/*
- * We need to take care of a possible races here. In fact, the
- * task might have changed its scheduling policy to something
- * different from SCHED_DEADLINE or changed its reservation
- * parameters (through sched_setattr()).
+ * We need to take care of several possible races here:
+ *
+ * - the task might have changed its scheduling policy
+ * to something different from SCHED_DEADLINE
+ * - the task might have changed its reservation parameters
+ * (through sched_setattr())
+ * - the task might have been boosted by someone else and
+ * might be in the boosting/deboosting path
+ *
+ * In all these cases we bail out, as the task is already
+ * in the runqueue or is going to be enqueued back anyway.
*/
- if (!dl_task(p) || dl_se->dl_new)
+ if (!dl_task(p) || dl_se->dl_new ||
+ dl_se->dl_boosted || !dl_se->dl_throttled)
goto unlock;

sched_clock_tick();
--
2.1.0

> [ 478.804766] [<ffffffff81068ec5>] ? sched_clock_cpu+0x55/0x80
> [ 478.805236] [<ffffffff81068f45>] ? local_clock+0x15/0x30
> [ 478.805669] [<ffffffff8109a936>] ? current_kernel_time+0x56/0xb0
> [ 478.806165] [<ffffffff8107648d>] ? trace_hardirqs_on+0xd/0x10
> [ 478.806629] [<ffffffff810a7a4c>] SyS_futex+0x6c/0x150
> [ 478.807046] [<ffffffff8100f60f>] ? syscall_trace_enter+0x21f/0x230
> [ 478.807549] [<ffffffff818bc75f>] tracesys+0xdc/0xe1
>
> (gdb) p $lx_task_by_pid(1536).dl
> $1 = {
> rb_node = {
> __rb_parent_color = 18446612134406055880,
> rb_right = 0x0,
> rb_left = 0x0
> },
> dl_runtime = 0,
> dl_deadline = 0,
> dl_period = 0,
> dl_bw = 0,
> runtime = -283620,
> deadline = 441965145261,
> flags = 0,
> dl_throttled = 0,
> dl_new = 0,
> dl_boosted = 1,
> dl_yielded = 0,
> dl_timer = {
> node = {
> node = {
> __rb_parent_color = 18446612134406055976,
> rb_right = 0x0,
> rb_left = 0x0
> },
> expires = {
> tv64 = 0
> }
> },
> _softexpires = {
> tv64 = 0
> },
> function = 0x0,
> base = 0xffffffff82010d68 <hrtimer_bases+136>,
> state = 0,
> start_pid = -1,
> start_site = 0x0,
> start_comm = '\000' <repeats 15 times>
> }
> }
>
> (gdb) p $lx_task_by_pid(1537).dl
> $2 = {
> rb_node = {
> __rb_parent_color = 18446612134406040232,
> rb_right = 0x0,
> rb_left = 0x0
> },
> dl_runtime = 100000,
> dl_deadline = 200000,
> dl_period = 200000,
> dl_bw = 524288,
> runtime = 70824,
> deadline = 441965047345,
> flags = 0,
> dl_throttled = 0,
> dl_new = 0,
> dl_boosted = 0,
> dl_yielded = 0,
> dl_timer = {
> node = {
> node = {
> __rb_parent_color = 18446612134406040328,
> rb_right = 0x0,
> rb_left = 0x0
> },
> expires = {
> tv64 = 441824359503
> }
> },
> _softexpires = {
> tv64 = 441824359503
> },
> function = 0xffffffff8106ce50 <dl_task_timer>,
> base = 0xffffffff82010d68 <hrtimer_bases+136>,
> state = 0,
> start_pid = 1537,
> start_site = 0xffffffff8106d178 <update_curr_dl+552>,
> start_comm = "pthread_test\000\000\000"
> }
> }
>
> cheers,
> daniel
>

2014-10-10 08:17:39

by Daniel Wagner

Subject: Re: [PATCH] sched: Do not try to replenish from a non deadline tasks

Good Morning Juri,

On 10/09/2014 03:51 PM, Juri Lelli wrote:
> Hi Daniel,
>
> On 09/10/14 10:47, Daniel Wagner wrote:
>> Hi Juri,
>>
>> On 10/07/2014 03:20 PM, Daniel Wagner wrote:
>>> On 10/07/2014 02:10 PM, Daniel Wagner wrote:
>>>> [ 36.689416] pthread_-1555 0d..5 18486408us : sched_stat_sleep: comm=pthread_test pid=1554 delay=143975 [ns]
>>>> [ 36.689416] pthread_-1555 0d..5 18486408us : sched_wakeup: comm=pthread_test pid=1554 prio=120 success=1 target_cpu=000
>>>> [ 36.689416] pthread_-1555 0d..4 18486420us : sched_pi_setprio: comm=pthread_test pid=1555 oldprio=-1 newprio=-1
>>>> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
>>>> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8
>>>> [ 36.689416] pthread_-1555 0d..4 18486421us : sched_dequeue_dl_entity: comm=pthread_test pid=1555 flags=0
>>>> [ 36.689416] pthread_-1555 0d..4 18486422us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=0
>>>> [ 36.689416] pthread_-1555 0d.H4 18486539us : sched_enqueue_dl_entity: comm=pthread_test pid=1555 pi_comm=pthread_test pi_pid=1555 flags=8
>>>
>>> I noticed that the last two lines are different. Maybe that is yet
>>> another path into enqueue_task_dl().
>>
>> So more testing revealed that the patch also starves both tasks
>> eventually. Both processes make no progress at all.
>>
>
> Mmm, that's bad.
>
> Can you give this different patch a try (after applying
> the other one I sent out)?
>
> This thing is looking good on my box. I'd like to do more
> testing, but I have to context switch for a bit now :/.

I applied both patches and my test program has been running fine for a
couple of hours now. Before that, a panic triggered within minutes.

cheers,
daniel