Subject: Re: [PATCH] sched/deadline: Throttle a constrained task activated if overflow
From: Daniel Bristot de Oliveira
To: xlpang@redhat.com, linux-kernel@vger.kernel.org
Cc: Peter Zijlstra, Juri Lelli, Ingo Molnar, Luca Abeni, Steven Rostedt, Tommaso Cucinotta, Rômulo Silva de Oliveira, Mathieu Poirier
Date: Tue, 11 Apr 2017 09:40:12 +0200
Message-ID: <568b0e8b-403d-8646-44ab-a870b17cff23@redhat.com>
In-Reply-To: <58EC3317.80303@redhat.com>

On 04/11/2017 03:36 AM, Xunlei Pang wrote:
> On 04/11/2017 at 04:47 AM, Daniel Bristot de Oliveira wrote:
>> On 04/10/2017 11:22 AM, Xunlei Pang wrote:
>>> I was testing Daniel's changes with his test case in the commit
>>> df8eac8cafce ("sched/deadline: Throttle a constrained deadline
>>> task activated after the deadline"), and tweaked it a little.
>>>
>>> Instead of having the runtime equal to the deadline, I tweaked
>>> runtime, deadline and sleep value to ensure every time it calls
>>> dl_check_constrained_dl() with "dl_se->deadline > rq_clock(rq)"
>>> as well as true dl_entity_overflow(), so it does replenishing
>>> every wake up in update_dl_entity(), and break its bandwidth.
>>>
>>> Daniel's test case had:
>>>   attr.sched_runtime  = 2 * 1000 * 1000;        /* 2 ms */
>>>   attr.sched_deadline = 2 * 1000 * 1000;        /* 2 ms */
>>>   attr.sched_period   = 2 * 1000 * 1000 * 1000; /* 2 s */
>>>   ts.tv_sec = 0;
>>>   ts.tv_nsec = 2000 * 1000;                     /* 2 ms */
>>>
>>> I changed it to:
>>>   attr.sched_runtime  = 5 * 1000 * 1000;        /* 5 ms */
>>>   attr.sched_deadline = 7 * 1000 * 1000;        /* 7 ms */
>>>   attr.sched_period   = 1 * 1000 * 1000 * 1000; /* 1 s */
>>>   ts.tv_sec = 0;
>>>   ts.tv_nsec = 1000 * 1000;                     /* 1 ms */
>>>
>>> The change above can result in over 25% of the CPU on my machine.
>>>
>>> In order to avoid the breakage, we improve dl_check_constrained_dl()
>>> to prevent dl tasks from being activated until the next period if it
>>> runs out of bandwidth of the current period.
>> The problem now is that, with your patch, we will throttle the task
>> with some possible runtime. Moreover, the task did not break any
>> rule, like being awakened after the deadline - the user-space is not
>> misbehaving.
>>
>> That is +- what the reproducer is doing when using your patch,
>> (I put some trace_printk when noticing the overflow in the wakeup).
>>
>> <idle>-0 [007] d.h. 1505.066439: enqueue_task_dl: my current runtime is 3657361 and the deadline is 4613027 from now
>> <idle>-0 [007] d.h. 1505.066439: enqueue_task_dl: my dl_runtime is 5000000
>>
>> and so the task will be throttled with 3657361 ns runtime available.
>>
>> As we can see, it is really breaking the density:
>>
>> 5ms / 7ms (.714285) < 3657361 / 4613027 (.792833)
> For the runtime 5ms, deadline 7ms, 3657361 is the remaining runtime, so the actual runtime is
> (5000000 - 3657361) = 1342639, and the actual density is 1342639 / 7000000 (.191806),

The past density... we are not looking back here.

> not break the limited density 5ms / 7ms (.714285).
>
> "3657361 / 4613027 (.792833)" means available density greater than limited density(.714285),
> this means true dl_entity_overflow(), and can't meet the requirement in the current period any more,
> thus need to throttle it until next period.

This means that it has the potential to break the rules, but it has not
broken them yet. It will break them if it runs longer than:

  (dl_runtime / dl_deadline) * (deadline - now)

By throttling the task too early, you punish user-space even though
user-space is not breaking the rules.

I pointed out a less pessimistic solution, but there are more possible
solutions. Actually, from the theoretical standpoint this is an open
issue, that is, the perfect solution is unknown for self-suspending
constrained deadline tasks.

-- Daniel
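
For illustration only, the bound above can be evaluated with the numbers from the trace. This is a stand-alone user-space sketch, not kernel code: the helper names max_runtime_ns(), would_overflow() and trace_example_check() are hypothetical, and the overflow test is written as a plain cross-multiplication of the two densities rather than the way the kernel computes it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: longest time (ns) the task may still run before
 * exceeding its density, i.e. (dl_runtime / dl_deadline) * (deadline - now).
 * Multiplying before dividing avoids losing precision to integer division.
 */
static uint64_t max_runtime_ns(uint64_t dl_runtime, uint64_t dl_deadline,
                               uint64_t time_to_deadline)
{
    return dl_runtime * time_to_deadline / dl_deadline;
}

/*
 * Hypothetical helper: true when
 *   runtime / time_to_deadline > dl_runtime / dl_deadline,
 * compared by cross-multiplication so no division is needed.
 */
static int would_overflow(uint64_t runtime, uint64_t time_to_deadline,
                          uint64_t dl_runtime, uint64_t dl_deadline)
{
    return runtime * dl_deadline > dl_runtime * time_to_deadline;
}

/* Values taken from the trace_printk output quoted in this thread. */
static void trace_example_check(void)
{
    const uint64_t dl_runtime = 5000000;       /* 5 ms */
    const uint64_t dl_deadline = 7000000;      /* 7 ms */
    const uint64_t runtime = 3657361;          /* remaining runtime, ns */
    const uint64_t time_to_deadline = 4613027; /* deadline - now, ns */

    /* The remaining runtime exceeds what the density allows... */
    assert(would_overflow(runtime, time_to_deadline, dl_runtime, dl_deadline));
    /* ...but the task only breaks the rule if it runs past this bound. */
    assert(max_runtime_ns(dl_runtime, dl_deadline, time_to_deadline) == 3295019);
}
```

With these values the task could still run for about 3.29 ms of its 3.66 ms remaining runtime without exceeding the 5/7 density, which is the runtime an early throttle would take away.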