From: Joel Fernandes
Date: Wed, 24 Jan 2024 20:08:56 -0500
Subject: Re: [RFC PATCH 0/8] Dynamic vcpu priority management in kvm
To: David Vernet
Cc: Sean Christopherson, "Vineeth Pillai (Google)", Ben Segall,
 Borislav Petkov, Daniel Bristot de Oliveira, Dave Hansen,
 Dietmar Eggemann, "H. Peter Anvin", Ingo Molnar, Juri Lelli,
 Mel Gorman, Paolo Bonzini, Andy Lutomirski, Peter Zijlstra,
 Steven Rostedt, Thomas Gleixner, Valentin Schneider, Vincent Guittot,
 Vitaly Kuznetsov, Wanpeng Li, Suleiman Souhlal, Masami Hiramatsu,
 kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org,
 Tejun Heo, Josh Don, Barret Rhoden, David Dunn, julia.lawall@inria.fr,
 himadrispandya@gmail.com, jean-pierre.lozi@inria.fr, ast@kernel.org,
 paulmck@kernel.org
References: <20231214024727.3503870-1-vineeth@bitbyteword.org>
 <20231215181014.GB2853@maniforge>
 <6595bee6.e90a0220.57b35.76e9@mx.google.com>
 <20240104223410.GE303539@maniforge>
 <052b0521-2273-4b1f-bd94-a3decceb9b05@joelfernandes.org>
 <20240124170648.GA249939@maniforge>
In-Reply-To: <20240124170648.GA249939@maniforge>

Hi David,

On Wed, Jan 24, 2024 at 12:06 PM David Vernet wrote:
> [...]
> > There might be a caveat to the unboosting path though needing a hypercall and I
> > need to check with Vineeth on his latest code whether it needs a hypercall, but
> > we could probably figure that out. In the latest design, one thing I know is
> > that we just have to force a VMEXIT for both boosting and unboosting. Well for
> > boosting, the VMEXIT just happens automatically due to vCPU preemption, but for
> > unboosting it may not.
>
> As mentioned above, I think we'd need to add UAPI for setting state from
> the guest scheduler, even if we didn't use a hypercall to induce a
> VMEXIT, right?

I see what you mean now. I'll think more about it. The immediate thought
is to load BPF programs to trigger at appropriate points in the guest.
For instance, we already have tracepoints for preemption disabling. I
added those upstream something like 8 years ago. And sched_switch
already knows when we switch to RT, which we could leverage in the
guest. What I'm envisioning is that the BPF program, when it runs, would
set some shared memory state in whatever format it desires.

By the way, one crazy idea about loading BPF programs into a guest...
Maybe KVM can pass along the BPF programs to be loaded to the guest? The
VMM can do that. The nice thing there is that the host would be solely
responsible for the BPF programs.
I am not sure if that makes sense, so please let me know what you think.
I guess the VMM should also be passing additional metadata, like which
tracepoints to hook in the guest, etc.

> > In any case, can we not just force a VMEXIT from relevant path within the guest,
> > again using a BPF program? I don't know what the BPF prog to do that would look
> > like, but I was envisioning we would call a BPF prog from within a guest if
> > needed at relevant point (example, return to guest userspace).
>
> I agree it would be useful to have a kfunc that could be used to force a
> VMEXIT if we e.g. need to trigger a resched or something. In general
> that seems like a pretty reasonable building block for something like
> this. I expect there are use cases where doing everything async would be
> useful as well. We'll have to see what works well in experimentation.

Sure.

> > >> Still there is a lot of merit to sharing memory with BPF and let BPF decide
> > >> the format of the shared memory, than baking it into the kernel... so thanks
> > >> for bringing this up! Let's talk more about it... Oh, and there's my LSFMMBPF
> > >> invitation request ;-) ;-).
> > >
> > > Discussing this BPF feature at LSFMMBPF is a great idea -- I'll submit a
> > > proposal for it and cc you. I looked and couldn't seem to find the
> > > thread for your LSFMMBPF proposal. Would you mind please sending a link?
> >
> > I actually have not even submitted one for LSFMM but my management is supportive
> > of my visit. Do you want to go ahead and submit one with all of us included in
> > the proposal? And I am again sorry for the late reply and hopefully we did not
> > miss any deadlines. Also on related note, there is interest in sched_ext for
>
> I see that you submitted a proposal in [2] yesterday. Thanks for writing
> it up, it looks great and I'll comment on that thread adding a +1 for
> the discussion.
>
> [2]: https://lore.kernel.org/all/653c2448-614e-48d6-af31-c5920d688f3e@joelfernandes.org/
>
> No worries at all about the reply latency. Thank you for being so open
> to discussing different approaches, and for driving the discussion. I
> think this could be a very powerful feature for the kernel so I'm
> pretty excited to further flesh out the design and figure out what makes
> the most sense here.

Great!

> > As mentioned above, for boosting, there is no hypercall. The VMEXIT is induced
> > by host preemption.
>
> I expect I am indeed missing something then, as mentioned above. VMEXIT
> aside, we still need some UAPI for the shared structure between the
> guest and host where the guest indicates its need for boosting, no?

Yes, you are right, it is clearer now what you were referring to with
UAPI. I think we need to figure that issue out. But if we can make the
VMM load BPF programs, then the host can completely decide how to
structure the shared memory.

> > > 2. What is the cost we're imposing on users if we force paravirt to be
> > > done through BPF? Is this prohibitively high?
> > >
> > > There is certainly a nonzero cost. As you pointed out, right now Android
> > > apparently doesn't use much BPF, and adding the requisite logic to use
> > > and manage BPF programs is not insignificant.
> > >
> > > Is that cost prohibitively high? I would say no. BPF should be fully
> > > supported on aarch64 at this point, so it's really a user space problem.
> > > Managing the system is what user space does best, and many other
> > > ecosystems have managed to integrate BPF to great effect. So while the
> > > cost is certainly nonzero, I think there's a reasonable argument to be
> > > made that it's not prohibitively high.
> >
> > Yes, I think it is doable.
> >
> > Glad to be able to finally reply, and I shall prioritize this thread more on my
> > side moving forward.
>
> Thanks for your detailed reply, and happy belated birthday :-)

Thank you!!! :-)

 - Joel
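
One rough way to picture the shared-memory idea from this thread, as a
toy user-space model and not anything taken from the patch series: a
guest-side hook (standing in for a BPF program attached to a
preempt-disable or sched_switch tracepoint) ORs a reason bit into a word
in a shared page, and the host looks at that word when it gets a VMEXIT.
Every name below (BoostReason, guest_set, guest_clear,
host_should_boost) and the single-u32 layout are assumptions made up for
illustration; the real format would be whatever the loaded programs
agree on.

```python
import struct
from enum import IntFlag


class BoostReason(IntFlag):
    """Hypothetical reason bits; not a real kernel or KVM definition."""
    NONE = 0
    PREEMPT_DISABLED = 1 << 0  # guest entered a preempt-disabled section
    RT_TASK_RUNNING = 1 << 1   # guest sched_switch picked an RT task


# Assumed layout: one little-endian u32 at offset 0 of the shared page.
FLAGS_FMT = "<I"


def guest_set(page: bytearray, reason: BoostReason) -> None:
    """Guest side: OR a reason bit into the shared word."""
    (cur,) = struct.unpack_from(FLAGS_FMT, page, 0)
    struct.pack_into(FLAGS_FMT, page, 0, cur | int(reason))


def guest_clear(page: bytearray, reason: BoostReason) -> None:
    """Guest side: clear a reason bit once the condition has ended."""
    (cur,) = struct.unpack_from(FLAGS_FMT, page, 0)
    struct.pack_into(FLAGS_FMT, page, 0, cur & ~int(reason))


def host_should_boost(page: bytearray) -> bool:
    """Host side: boost the vCPU while any reason bit is still set."""
    (cur,) = struct.unpack_from(FLAGS_FMT, page, 0)
    return cur != 0


if __name__ == "__main__":
    page = bytearray(4096)  # stands in for the page shared by guest and host
    guest_set(page, BoostReason.PREEMPT_DISABLED)
    print("boost after preempt_disable:", host_should_boost(page))
    guest_clear(page, BoostReason.PREEMPT_DISABLED)
    print("boost after preempt_enable:", host_should_boost(page))
```

In this model the layout lives only in the guest and host helper
functions, so changing FLAGS_FMT or the bit meanings touches nothing
else. The model is not concurrent and does not cover how the host would
learn about an unboost without a VMEXIT.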