On 2022/3/13 17:02, Peter Zijlstra wrote:
> On Sun, Mar 13, 2022 at 01:37:37PM +0800, chenying wrote:
>> On 2022/3/12 20:03, Peter Zijlstra wrote:
>>> On Fri, Mar 11, 2022 at 03:58:47PM +0800, chenying wrote:
>>>> We add a time offset to se->vruntime when an idle sched_entity is
>>>> enqueued, so that the idle entity is always placed to the right of
>>>> the non-idle entities in the runqueue. This allows non-idle tasks
>>>> to be selected and run before idle ones.
>>>>
>>>> A use-case is using sched_idle for background tasks and non-idle
>>>> for foreground tasks. The foreground tasks are latency sensitive
>>>> and should not be disturbed by the background. It is well known
>>>> that idle tasks can be preempted by non-idle tasks on wakeup, but
>>>> the scheduler does not distinguish between idle and non-idle when
>>>> picking the next entity. This may cause background tasks to
>>>> disturb the foreground.
>>>>
>>>> Test results as below:
>>>>
>>>> ~$ ./loop.sh &
>>>> [1] 764
>>>> ~$ chrt -i 0 ./loop.sh &
>>>> [2] 765
>>>> ~$ taskset -p 04 764
>>>> ~$ taskset -p 04 765
>>>>
>>>> ~$ top -p 764 -p 765
>>>> top - 13:10:01 up 1 min, 2 users, load average: 1.30, 0.38, 0.13
>>>> Tasks: 2 total, 2 running, 0 sleeping, 0 stopped, 0 zombie
>>>> %Cpu(s): 12.5 us, 0.0 sy, 0.0 ni, 87.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>>> KiB Mem : 16393492 total, 16142256 free, 111028 used, 140208 buff/cache
>>>> KiB Swap: 385836 total, 385836 free, 0 used. 16037992 avail Mem
>>>>
>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>> 764 chenyin+ 20 0 12888 1144 1004 R 100.0 0.0 1:05.12 loop.sh
>>>> 765 chenyin+ 20 0 12888 1224 1080 R 0.0 0.0 0:16.21 loop.sh
>>>>
>>>> The non-idle process (764) runs at 100% without being disturbed by
>>>> the idle process (765).
>>>
>>> Did you just do a very complicated true idle time scheduler, with all
>>> the problems that brings?
>>
>> Colocating CPU-intensive jobs with latency-sensitive services can
>> improve CPU utilization, but it is difficult to meet the stringent
>> tail-latency requirements of the latency-sensitive services. We use
>> a true idle-time scheduler for the CPU-intensive jobs to minimize
>> their impact on the latency-sensitive services.
>
> Hard NAK on any true idle-time scheduler until you make the whole kernel
> immune to lock holder starvation issues.
If I set the sched_idle_vruntime_offset to a relatively small value
(e.g. 10 minutes), can this issue be avoided?
On Sun, Mar 13, 2022 at 3:07 AM chenying <[email protected]> wrote:
>
> If I set the sched_idle_vruntime_offset to a relatively small value
> (e.g. 10 minutes), can this issue be avoided?
That's still long enough to cause lockups.
Is the issue that you have a large number of sched_idle entities, and
the occasional latency-sensitive thing that wakes up for a short
duration? Have you considered approaching this from the other
direction (i.e. if we have a latency-sensitive thing wake onto a CPU
running only sched_idle stuff, we could change entity placement to
position the latency-sensitive thing further left on the timeline,
akin to !GENTLE_FAIR_SLEEPERS)?