From: Vineeth Remanan Pillai
Date: Fri, 15 Dec 2023 12:40:24 -0500
Subject: Re: [RFC PATCH 0/8] Dynamic vcpu priority management in kvm
To: Sean Christopherson
Cc: Ben Segall, Borislav Petkov, Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann, "H. Peter Anvin", Ingo Molnar, Juri Lelli, Mel Gorman, Paolo Bonzini, Andy Lutomirski, Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Valentin Schneider, Vincent Guittot, Vitaly Kuznetsov, Wanpeng Li, Suleiman Souhlal, Masami Hiramatsu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, Tejun Heo, Josh Don, Barret Rhoden, David Vernet, Joel Fernandes

[...snip...]

> > > IMO, this has a significantly lower ceiling than what is possible with something
> > > like sched_ext, e.g. it requires a host tick to make scheduling decisions, and
> > > because it'd require a kernel-defined ABI, would essentially be limited to knobs
> > > that are broadly useful. I.e. every bit of information that you want to add to
> > > the guest/host ABI will need to get approval from at least the affected subsystems
> > > in the guest, from KVM, and possibly from the host scheduler too. That's going
> > > to make for a very high bar.
> > >
> > Just thinking out loud, the ABI could be very simple to start with: a
> > shared page with dedicated guest and host areas. The guest fills in details
> > about its priority requirements; the host fills in details about the actions
> > it took (boost/unboost, priority/sched class, etc.). Passing this
> > information could be in-band or out-of-band. Out-of-band could be used
> > by dedicated userland schedulers. If both guest and host agree on
> > in-band during guest startup, kvm could hand over the data to the
> > scheduler using a scheduler callback. I feel this small addition to
> > kvm could be maintainable, and by leaving the protocol for interpreting
> > shared memory to guest and host, this would be very generic and cater
> > to multiple use cases.
> > Something like the above could be used by both
> > low-end devices and high-end server-like systems, and guest and host
> > could have custom protocols to interpret the data and make decisions.
> >
> > In this RFC, we have a miniature form of the above, where we have a
> > shared memory area and the scheduler callback is basically
> > sched_setscheduler. But it could be made very generic as part of the ABI
> > design. For out-of-band schedulers, this callback could be set up by
> > sched_ext, a userland scheduler, or any similar out-of-band scheduler.
> >
> > I agree, getting a consensus and approval is non-trivial. IMHO, this
> > use case is compelling for such an ABI because out-of-band schedulers
> > might not give the desired results for low-end devices.
> >
> > > > Having a formal paravirt scheduling ABI is something we would want to
> > > > pursue (as I mentioned in the cover letter) and this could help not
> > > > only with latencies, but optimal task placement for efficiency, power
> > > > utilization, etc. kvm's role could be to set the stage and share
> > > > information with minimum delay and less resource overhead.
> > >
> > > Making KVM middle-man is most definitely not going to provide minimum delay or
> > > overhead. Minimum delay would be the guest directly communicating with the host
> > > scheduler. I get that convincing the sched folks to add a bunch of paravirt
> > > stuff is a tall order (for very good reason), but that's exactly why I Cc'd the
> > > sched_ext folks.
> > >
> > As mentioned above, the guest directly talking to the host scheduler without
> > involving kvm would mean an out-of-band scheduler, and the
> > effectiveness depends on how fast the scheduler gets to run.
>
> No, the "host scheduler" could very well be a dedicated in-kernel paravirt
> scheduler. It could be a sched_ext BPF program that for all intents and purposes
> is in-band.
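The following is one possible layout for the shared page with dedicated guest and host areas proposed in the quoted text. It is a minimal userspace C sketch under stated assumptions, not the RFC's actual ABI:

- Every name (struct pv_sched_page, pv_guest_request(), pv_host_handle(), PV_BOOST_THRESHOLD) is hypothetical, and so is the threshold-based boost/unboost rule.
- Each side writes only its own half of the page.
- A sequence counter tells the host that a new request is pending.
- Plain C11 seq_cst atomics stand in for whatever barriers a real guest/host ABI would specify.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PV_SCHED_PAGE_SIZE 4096
#define PV_SCHED_HALF      (PV_SCHED_PAGE_SIZE / 2)
#define PV_BOOST_THRESHOLD 50u /* hypothetical host policy knob */

/* Hypothetical actions the host reports back to the guest. */
enum pv_sched_action {
	PV_SCHED_NONE = 0,
	PV_SCHED_BOOST = 1,
	PV_SCHED_UNBOOST = 2,
};

/* Written only by the guest. */
struct pv_sched_guest_area {
	_Atomic uint32_t seq;      /* bumped on every new request */
	_Atomic uint32_t prio_req; /* meaning is up to the guest/host protocol */
};

/* Written only by the host. */
struct pv_sched_host_area {
	_Atomic uint32_t seq_ack;  /* last guest seq the host acted on */
	_Atomic uint32_t action;   /* enum pv_sched_action */
	_Atomic uint32_t prio;     /* priority the host applied */
};

/* One page, split into a guest half and a host half. */
struct pv_sched_page {
	struct pv_sched_guest_area guest;
	uint8_t guest_reserved[PV_SCHED_HALF - sizeof(struct pv_sched_guest_area)];
	struct pv_sched_host_area host;
	uint8_t host_reserved[PV_SCHED_HALF - sizeof(struct pv_sched_host_area)];
};

static_assert(sizeof(struct pv_sched_page) == PV_SCHED_PAGE_SIZE,
	      "shared area must fit exactly one page");

static void pv_page_init(struct pv_sched_page *pg)
{
	atomic_init(&pg->guest.seq, 0);
	atomic_init(&pg->guest.prio_req, 0);
	atomic_init(&pg->host.seq_ack, 0);
	atomic_init(&pg->host.action, PV_SCHED_NONE);
	atomic_init(&pg->host.prio, 0);
}

/* Guest side: store the request first, then publish it by bumping seq. */
static void pv_guest_request(struct pv_sched_page *pg, uint32_t prio)
{
	atomic_store(&pg->guest.prio_req, prio);
	atomic_fetch_add(&pg->guest.seq, 1);
}

/*
 * Host side: act on a pending request and record what was done.
 * Returns 1 if a new request was handled, 0 if nothing was pending.
 */
static int pv_host_handle(struct pv_sched_page *pg)
{
	uint32_t seq = atomic_load(&pg->guest.seq);
	uint32_t req;

	if (seq == atomic_load(&pg->host.seq_ack))
		return 0;

	req = atomic_load(&pg->guest.prio_req);
	if (req >= PV_BOOST_THRESHOLD) {
		atomic_store(&pg->host.action, PV_SCHED_BOOST);
		atomic_store(&pg->host.prio, req);
	} else {
		atomic_store(&pg->host.action, PV_SCHED_UNBOOST);
		atomic_store(&pg->host.prio, 0);
	}
	/* Acknowledge last, so the guest sees action/prio before seq_ack. */
	atomic_store(&pg->host.seq_ack, seq);
	return 1;
}
```

In this sketch, only the host-side function decides what prio_req means. That matches the idea above that KVM would only share the page and leave its interpretation to the guest/host protocol.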
Yes, if the scheduler is on the same physical cpu and acts on events like
VMEXIT/VMENTRY etc., this would work perfectly. Having the VM talk to a
scheduler that runs on another cpu and makes decisions there might not be
quick enough when we do not have enough cpu capacity.

> You are basically proposing that KVM bounce-buffer data between guest and host.
> I'm saying there's no _technical_ reason to use a bounce-buffer, just do zero copy.
>
I also meant zero copy. The help required from the kvm side is:
- Pass the address of the shared memory to bpf programs/scheduler once
  the guest sets it up.
- Invoke scheduler-registered callbacks on events like VMEXIT, VMENTRY,
  interrupt injection, etc.

It's the job of the guest and host paravirt scheduler to interpret the
shared memory contents and take actions.

I admit the current RFC doesn't strictly implement hooks and callbacks:
it calls sched_setscheduler in place of all the callbacks mentioned
above. I guess this was your strongest objection.

As you mentioned in the reply to Joel, if it is fine for kvm to allow
hooks into events (VMEXIT, VMENTRY, interrupt injection, etc.), then it
becomes easier to develop the ABI I was describing and have the hooks
implemented by a paravirt scheduler. We shall redesign the architecture
based on this for v2.

Thanks,
Vineeth
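The two KVM-side responsibilities listed above can be modeled as follows:

- KVM forwards the shared-memory address once the guest registers it.
- KVM then calls whatever callbacks a paravirt scheduler registered at VMEXIT, VMENTRY and interrupt injection, without interpreting the page itself.

This is a minimal userspace C model under stated assumptions. struct pv_sched_ops, pv_sched_register_ops(), kvm_pv_set_shared_mem() and kvm_pv_notify() are hypothetical names, not existing KVM, scheduler or sched_ext interfaces. The single global ops pointer would need RCU or locking in a real kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vCPU events, mirroring the ones listed in the mail. */
enum pv_vcpu_event {
	PV_EVT_VMEXIT,
	PV_EVT_VMENTRY,
	PV_EVT_IRQ_INJECT,
	PV_EVT_NR,
};

/* Stand-in for a vCPU; only what this sketch needs. */
struct pv_vcpu {
	int id;
	void *shared_mem; /* guest-registered shared page, NULL until set up */
};

/* Callbacks a paravirt scheduler (in-kernel, sched_ext, ...) could register. */
struct pv_sched_ops {
	void (*shm_registered)(struct pv_vcpu *vcpu, void *shm);
	void (*on_event)(struct pv_vcpu *vcpu, enum pv_vcpu_event evt);
};

/* A real kernel implementation would need RCU or locking around this. */
static const struct pv_sched_ops *pv_ops;

static int pv_sched_register_ops(const struct pv_sched_ops *ops)
{
	if (pv_ops)
		return -1; /* only one paravirt scheduler at a time */
	pv_ops = ops;
	return 0;
}

static void pv_sched_unregister_ops(void)
{
	pv_ops = NULL;
}

/*
 * KVM side, step 1: the guest has set up its shared page, so pass the
 * address on (zero copy: only the pointer moves, never the contents).
 */
static void kvm_pv_set_shared_mem(struct pv_vcpu *vcpu, void *shm)
{
	vcpu->shared_mem = shm;
	if (pv_ops && pv_ops->shm_registered)
		pv_ops->shm_registered(vcpu, shm);
}

/*
 * KVM side, step 2: called at VMEXIT, VMENTRY and interrupt injection.
 * KVM applies no policy; interpreting the page is the scheduler's job.
 */
static void kvm_pv_notify(struct pv_vcpu *vcpu, enum pv_vcpu_event evt)
{
	assert((unsigned int)evt < PV_EVT_NR);
	if (!vcpu->shared_mem || !pv_ops || !pv_ops->on_event)
		return;
	pv_ops->on_event(vcpu, evt);
}
```

Under this split, the current RFC's behaviour (calling sched_setscheduler directly) would become one possible on_event implementation rather than being hard-wired into KVM.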