From: Daniel Bristot de Oliveira
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider,
    linux-kernel@vger.kernel.org, Luca Abeni, Tommaso Cucinotta,
    Thomas Gleixner, Joel Fernandes, Vineeth Pillai, Shuah Khan,
    bristot@kernel.org, Phil Auld
Subject: [PATCH v5 0/7] SCHED_DEADLINE server infrastructure
Date: Sat, 4 Nov 2023 11:59:17 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

This is v5 of Peter's SCHED_DEADLINE server infrastructure
implementation [1].

SCHED_DEADLINE servers can help fix starvation of low-priority tasks
(e.g., SCHED_OTHER) when higher-priority tasks monopolize CPU cycles.
Today we have RT throttling; DEADLINE servers should be able to replace
and improve on it.

In v1, a discussion was raised about the consequences of using
deadline-based servers for fixed-priority workloads.
As a demonstration, here is the baseline timerlat scheduling latency
as-is, with a kernel build as background workload:

 # rtla timerlat top -u -d 10m

--------------------- %< ------------------------
                                     Timer Latency
  0 01:42:24  |   IRQ Timer Latency (us)    |  Thread Timer Latency (us)  | Ret user Timer Latency (us)
CPU COUNT     |    cur    min    avg    max |    cur    min    avg    max |    cur    min    avg    max
  0 #6143559  |      0      0      0     92 |      2      1      3     98 |      4      1      5    100
  1 #6143559  |      1      0      0     97 |      7      1      5    101 |      9      1      7    103
  2 #6143559  |      0      0      0     88 |      3      1      5     95 |      5      1      7     99
  3 #6143559  |      0      0      0     90 |      6      1      5    103 |     10      1      7    126
  4 #6143558  |      1      0      0     81 |      7      1      4     86 |      9      1      7     90
  5 #6143558  |      0      0      0     74 |      3      1      5     79 |      4      1      7     83
  6 #6143558  |      0      0      0     83 |      2      1      5     89 |      3      0      7    108
  7 #6143558  |      0      0      0     85 |      3      1      4    126 |      5      1      6    137
--------------------- >% ------------------------

And this is the same test with the DL server activating without any
delay:

--------------------- %< ------------------------
  0 00:10:01  |   IRQ Timer Latency (us)    |  Thread Timer Latency (us)  | Ret user Timer Latency (us)
CPU COUNT     |    cur    min    avg    max |    cur    min    avg    max |    cur    min    avg    max
  0 #579147   |      0      0      0     54 |      2      1     52  61095 |      2      2     56  61102
  1 #578766   |      0      0      0     83 |      2      1     49  55824 |      3      2     53  55831
  2 #578559   |      0      0      1     59 |      2      1     50  55760 |      3      2     54  55770
  3 #578318   |      0      0      0     76 |      2      1     49  55751 |      3      2     54  55760
  4 #578611   |      0      0      0     64 |      2      1     49  55811 |      3      2     53  55820
  5 #578347   |      0      0      1     40 |      2      1     50  56121 |      3      2     55  56133
  6 #578938   |      0      0      1     75 |      2      1     49  55755 |      3      2     53  55764
  7 #578631   |      0      0      1     36 |      3      1     51  55528 |      4      2     55  55541
--------------------- >% ------------------------

The problem with a DL-server-only implementation is that FIFO tasks
might be preempted by NORMAL tasks even when spare CPU cycles are
available. In fact, the fair deadline server is enqueued right away when
NORMAL tasks wake up, and they are first scheduled by the server, thus
potentially preempting a well-behaved FIFO task. This is of course not
ideal.

We had discussions about it, and one of the possibilities would be
using a different scheduling algorithm for this.
But IMHO that is overkill.

Juri and I discussed this and thought about delaying the server
activation until the 0-lag time, thus enabling the server only if the
fair scheduler is about to starve.

Patch 6/7 adds the possibility to defer the server start to the
(absolute deadline - runtime) point in time. This is achieved by
enqueuing the dl server throttled, with the next replenishing time set
to activate the server at (absolute deadline - runtime).

Differently from v4, the server is now enqueued with its runtime
replenished. As the fair scheduler runs without boost, its runtime is
consumed. If the fair server has its runtime consumed before the
0-laxity time, a new period is set, and the timer is armed for the new
(deadline - runtime).

Patch 7/7 adds a per-rq interface for the knobs:

    fair_server_runtime (950 ms)
    fair_server_period  (1s)
    fair_server_defer   (enabled)

With defer enabled on CPUs [0:3], the results get better, with behavior
similar to what we have with rt throttling.

--------------------- %< ------------------------
                                     Timer Latency
  0 00:10:01  |   IRQ Timer Latency (us)    |  Thread Timer Latency (us)  | Ret user Timer Latency (us)
CPU COUNT     |    cur    min    avg    max |    cur    min    avg    max |    cur    min    avg    max
  0 #599979   |      0      0      0     64 |      4      1      4     67 |      6      1      5     69
  1 #599979   |      0      0      1     17 |      6      1      5     50 |     10      2      7     71
  2 #599984   |      1      0      1     22 |      4      1      5     78 |      5      2      7    107
  3 #599986   |      0      0      1     72 |      7      1      5     79 |     10      2      7     82
  4 #581580   |      1      0      1     37 |      6      1     38  52797 |     10      2     41  52805
  5 #583270   |      1      0      1     41 |      9      1     36  52617 |     12      2     38  52623
  6 #581240   |      0      0      1     25 |      7      1     39  52870 |     11      2     41  52876
  7 #581208   |      0      0      1     69 |      6      1     39  52917 |      9      2     41  52923
--------------------- >% ------------------------

Here are some osnoise measurements, with osnoise threads running as
FIFO:1 with different setups (defer enabled):
 - CPU 2 isolated
 - CPU 3 isolated shared with a CFS busy loop task
 - CPU 8 non-isolated
 - CPU 9 non-isolated shared with a CFS busy loop task

--------------------- %< ------------------------
 ~# pgrep ktimer | while read pid; do
      chrt -p -f 2 $pid; done # for RT kernel
 ~# sysctl kernel.sched_rt_runtime_us=-1
 ~# tuna isolate -c 2
 ~# tuna isolate -c 3
 ~# taskset -c 3 ./f &
 ~# taskset -c 9 ./f &
 ~# osnoise -P f:1 -c 2,3,8,9 -T 1 -d 10m -H 1
                                          Operating System Noise
duration:   0 00:10:00 | time is in us
CPU  Period    Runtime     Noise  % CPU Aval  Max Noise  Max Single  HW  NMI      IRQ  Softirq  Thread
  2  #599    599000000       178    99.99997         18           2   0    0      270        0       0
  3  #598    598054434  31351553    94.75774     104442      104442   0    0  2837523        0    1794
  8  #599    599000001    567456    99.90526       3260        2375   2   89   620490        0   13539
  9  #598    598021196  31742537    94.69207      71707       53357   0   90  3411023        0    1762
--------------------- >% ------------------------

The system runs fine!
 - no crashes (famous last words)
 - FIFO property is kept
 - per-CPU interface, because it is more flexible and detaches this
   from the throttling concept.

Global is broken, but it will > /dev/null.

TODO:
 - Move rt throttling code to RT_GROUP_SCHED for now (then send it to
   the same place as global then).

Changes from v4:
 - Enable the server when nr fair tasks is > 0 (peter)
 - Consume runtime if the zerolax server is not boosted (peterz)
 - Adjust interface to deal with admission control (peterz)
 - Rebased to 6.6
Changes from v3:
 - Add the defer server (Daniel)
 - Add a per-rq interface (Daniel with peter's feedback)
 - Add an option not to defer the server (for Joel)
 - Typos and 1-liner fixes (Valentin, Luca, Peter)
 - Fair scheduler running on dl server does not account as RT task (Daniel)
 - Changed the condition to enable the server (RT & fair tasks) (Daniel)
Changes from v2:
 - Refactor/rephrase/typos changes
 - Deferrable server using throttling
 - The server starts when RT && Fair tasks are enqueued
 - Interface with runtime/period/defer option
Changes from v1:
 - rebased on 6.4-rc1 tip/sched/core

Daniel Bristot de Oliveira (2):
  sched/deadline: Deferrable dl server
  sched/fair: Fair server interface

Peter Zijlstra (5):
  sched: Unify runtime accounting across classes
  sched/deadline: Collect sched_dl_entity initialization
  sched/deadline: Move bandwidth accounting into {en,de}queue_dl_entity
  sched/deadline: Introduce deadline servers
  sched/fair: Add trivial fair server

 include/linux/sched.h    |  26 +-
 kernel/sched/core.c      |  23 +-
 kernel/sched/deadline.c  | 671 ++++++++++++++++++++++++++++-----------
 kernel/sched/debug.c     | 202 ++++++++++++
 kernel/sched/fair.c      |  87 ++++-
 kernel/sched/rt.c        |  15 +-
 kernel/sched/sched.h     |  56 +++-
 kernel/sched/stop_task.c |  13 +-
 8 files changed, 847 insertions(+), 246 deletions(-)

-- 
2.40.1
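
For readers who want the deferral arithmetic from the patch 6/7
description spelled out, here is a minimal sketch in Python. It is not
kernel code: the helper names are hypothetical, and it assumes the
absolute deadline of a freshly enqueued server is simply (now + period).
It only illustrates where the (absolute deadline - runtime) point falls
for the knob values listed above.

```python
"""Worked arithmetic for the deferred fair-server activation point.

Illustrative sketch only: the helper names below are hypothetical and
do not exist in the kernel.
"""

NSEC_PER_MSEC = 1_000_000


def zero_laxity_time(abs_deadline_ns: int, runtime_ns: int) -> int:
    """Return (absolute deadline - runtime): the latest instant at which
    the server can start and still run its full budget by the deadline."""
    return abs_deadline_ns - runtime_ns


def deferred_start(now_ns: int, runtime_ns: int, period_ns: int) -> int:
    """Return the time at which a server enqueued throttled at now_ns
    would be activated.

    Assumption: the absolute deadline is taken as now + period.
    """
    if not 0 < runtime_ns <= period_ns:
        raise ValueError("runtime must be in (0, period]")
    return zero_laxity_time(now_ns + period_ns, runtime_ns)


if __name__ == "__main__":
    runtime = 950 * NSEC_PER_MSEC   # fair_server_runtime as listed above
    period = 1000 * NSEC_PER_MSEC   # fair_server_period as listed above
    start = deferred_start(0, runtime, period)
    print(f"timer armed at t = {start // NSEC_PER_MSEC} ms")
```

Under these assumptions, a server enqueued at t = 0 with a 950 ms
runtime and a 1 s period would have its timer armed for t = 50 ms.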