From: "Vineeth Pillai (Google)"
To: Ben Segall, Borislav Petkov, Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann, H. Peter Anvin, Ingo Molnar, Juri Lelli, Mel Gorman, Paolo Bonzini, Andy Lutomirski, Peter Zijlstra, Sean Christopherson, Steven Rostedt, Thomas Gleixner, Valentin Schneider, Vincent Guittot, Vitaly Kuznetsov, Wanpeng Li
Cc: "Vineeth Pillai (Google)", Suleiman Souhlal, Masami Hiramatsu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH 0/8] Dynamic vcpu priority management in kvm
Date: Wed, 13 Dec 2023 21:47:17 -0500
Message-ID: <20231214024727.3503870-1-vineeth@bitbyteword.org>

Double scheduling is a concern with virtualization hosts: the host schedules vcpus without knowing what is run by the vcpu, and the guest schedules tasks without knowing where the vcpu is physically running. This causes issues related to latencies, power consumption, resource utilization, etc. An ideal solution would be a cooperative scheduling framework where the guest and host share scheduling-related information and make educated scheduling decisions to optimally handle the workloads.

As a first step, we are taking a stab at reducing latencies for latency-sensitive workloads in the guest. This series of patches aims to implement a framework for dynamically managing the priority of vcpu threads based on the needs of the workload running on the vcpu.
Latency-sensitive workloads (nmi, irq, softirq, critical sections, RT tasks, etc.) will get a boost from the host so as to minimize latency. The host can proactively boost the vcpu threads when it has enough information about what is going to run on the vcpu, e.g. when injecting interrupts. For the rest of the cases, the guest can request a boost if the vcpu is not already boosted, and subsequently request an unboost after the latency-sensitive workload completes.

A shared memory region is used to communicate the scheduling information. The guest shares its needs for priority boosting and the host shares the boosting status of the vcpu. The guest sets a flag when it needs a boost and continues running; the host reads this flag on the next VMEXIT and boosts the vcpu thread. Unboosting is done synchronously so that host workloads can fairly compete with guests when the guest is not running any latency-sensitive workload.

This RFC is x86 specific. It is mostly feature complete, but more work needs to be done in the following areas:
- Use of the paravirt ops framework.
- Optimizing critical paths for speed, cache efficiency, etc.
- Extensibility of this idea for sharing more scheduling information to make better educated scheduling decisions in guest and host.
- Preventing misuse by rogue/buggy guest kernels.

Tests
------
Real world workload on chromeos shows considerable improvement. Audio and video applications running on low end devices experience high latencies when the system is under load. This patch series helps in mitigating the audio and video glitches caused by scheduling latencies.

Following are the results from the oboetester app on an android vm running in chromeos. This app tests for audio glitches.
-------------------------------------------------------
|             |      Noload      ||       Busy        |
| Buffer Size |-----------------------------------------
|             | Vanilla | Patches || Vanilla | Patches |
-------------------------------------------------------
| 96 (2ms)    |    20   |    4    ||   1365  |   67    |
-------------------------------------------------------
| 256 (4ms)   |     3   |    1    ||    524  |   23    |
-------------------------------------------------------
| 512 (10ms)  |     0   |    0    ||     25  |   24    |
-------------------------------------------------------

Noload: Tests run on an idle system
Busy: Busy system simulated by the Speedometer benchmark

The test shows a considerable reduction in glitches, especially with smaller buffer sizes.

Following are data collected from a few micro-benchmark tests. cyclictest was run on a VM to measure the latency with and without the patches. We also took a baseline of the results with all vcpus statically boosted to RT (chrt). This is to observe the difference between dynamic and static boosting and its effect on the host as well. cyclictest on the guest observes the effect of the patches on the guest; cyclictest on the host checks whether the patches affect workloads on the host.

cyclictest cmdline: "cyclictest -q -D 90s -i 500 -d $INTERVAL" where $INTERVAL used was 500 and 1000 us.

Host is an Intel N4500, 4C/4T. Guest also has 4 vcpus.

In the following tables:
Vanilla: baseline: vanilla kernel
Dynamic: the patches applied
Static: baseline: all vcpus statically boosted to RT (chrt)

Idle tests
----------
The host is idle, and cyclictest is run on both host and guest.
-----------------------------------------------------------------------
|          |  Avg Latency(us): Guest    ||  Avg Latency(us): Host     |
-----------------------------------------------------------------------
| Interval | vanilla | dynamic | static || vanilla | dynamic | static |
-----------------------------------------------------------------------
| 500      |    9    |    9    |   10   ||    5    |    3    |    3   |
-----------------------------------------------------------------------
| 1000     |   34    |   35    |   35   ||    5    |    3    |    3   |
-----------------------------------------------------------------------

-----------------------------------------------------------------------
|          |  Max Latency(us): Guest    ||  Max Latency(us): Host     |
-----------------------------------------------------------------------
| Interval | vanilla | dynamic | static || vanilla | dynamic | static |
-----------------------------------------------------------------------
| 500      |  1577   |  1433   |   140  ||  1577   |  1526   | 15969  |
-----------------------------------------------------------------------
| 1000     |  6649   |   765   |   204  ||   697   |   174   |  2444  |
-----------------------------------------------------------------------

Busy Tests
----------
Here a busy host was simulated using stress-ng, and cyclictest was run on both host and guest.
-----------------------------------------------------------------------
|          |  Avg Latency(us): Guest    ||  Avg Latency(us): Host     |
-----------------------------------------------------------------------
| Interval | vanilla | dynamic | static || vanilla | dynamic | static |
-----------------------------------------------------------------------
| 500      |   887   |   21    |   25   ||    6    |    6    |    7   |
-----------------------------------------------------------------------
| 1000     |  6335   |   45    |   38   ||   11    |   11    |   14   |
-----------------------------------------------------------------------

-----------------------------------------------------------------------
|          |  Max Latency(us): Guest    ||  Max Latency(us): Host     |
-----------------------------------------------------------------------
| Interval | vanilla | dynamic | static || vanilla | dynamic | static |
-----------------------------------------------------------------------
| 500      | 216835  |  13978  |  1728  ||  2075   |  2114   |  2447  |
-----------------------------------------------------------------------
| 1000     | 199575  |  70651  |  1537  ||  1886   |  1285   | 27104  |
-----------------------------------------------------------------------

These patches are rebased on 6.5.10.

Patches 1-4: Implementation of the core host side feature
Patch 5: A naive throttling mechanism for limiting boosted duration for preemption disabled state in the guest.
             This is a placeholder for the throttling mechanism for now and would need to be implemented differently.
Patch 6: Enable/disable tunables - global and per-vm
Patches 7-8: Implementation of the core guest side feature

---

Vineeth Pillai (Google) (8):
  kvm: x86: MSR for setting up scheduler info shared memory
  sched/core: sched_setscheduler_pi_nocheck for interrupt context usage
  kvm: x86: vcpu boosting/unboosting framework
  kvm: x86: boost vcpu threads on latency sensitive paths
  kvm: x86: upper bound for preemption based boost duration
  kvm: x86: enable/disable global/per-guest vcpu boost feature
  sched/core: boost/unboost in guest scheduler
  irq: boost/unboost in irq/nmi entry/exit and softirq

 arch/x86/Kconfig                     |  13 +++
 arch/x86/include/asm/kvm_host.h      |  69 ++++++++++++
 arch/x86/include/asm/kvm_para.h      |   7 ++
 arch/x86/include/uapi/asm/kvm_para.h |  43 ++++++++
 arch/x86/kernel/kvm.c                |  16 +++
 arch/x86/kvm/Kconfig                 |  12 +++
 arch/x86/kvm/cpuid.c                 |   2 +
 arch/x86/kvm/i8259.c                 |   2 +-
 arch/x86/kvm/lapic.c                 |   8 +-
 arch/x86/kvm/svm/svm.c               |   2 +-
 arch/x86/kvm/vmx/vmx.c               |   2 +-
 arch/x86/kvm/x86.c                   | 154 +++++++++++++++++++++++++++
 include/linux/kvm_host.h             |  56 ++++++++++
 include/linux/sched.h                |  23 ++++
 include/uapi/linux/kvm.h             |   5 +
 kernel/entry/common.c                |  39 +++++++
 kernel/sched/core.c                  | 127 +++++++++++++++++++++-
 kernel/softirq.c                     |  11 ++
 virt/kvm/kvm_main.c                  | 150 ++++++++++++++++++++++++++
 19 files changed, 730 insertions(+), 11 deletions(-)

-- 
2.43.0