Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp2090535rwl; Thu, 6 Apr 2023 05:53:18 -0700 (PDT) X-Google-Smtp-Source: AKy350bPvGcnCPrYXkkrs50HT209+kwIBk1wyNjw8+s6qyARYDUvRtsjX/i9foyG0GhyTc6ATPMB X-Received: by 2002:a05:6a20:6b96:b0:df:559a:96dc with SMTP id bu22-20020a056a206b9600b000df559a96dcmr2509601pzb.32.1680785598159; Thu, 06 Apr 2023 05:53:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680785598; cv=none; d=google.com; s=arc-20160816; b=iLd7SDPWA4WKCqDnoTvNaml626E6HsEZDKWZRbgBJ/du8WypfAKOL/OnW9EzXMHSpA PrAj+gxdtiAmrVu/m95oSTq9uPolWIvg6XPW5KyOqOqQpm/rpmnAZa1XqHN8dtqBtxf9 ml+6Ny66FRYUdpniwzDSIKNCLSTSHTg+fg5jf8OWM39aB/e+2Nf3fWb5VgoOxO1Fdn5B cTEdieFnO6qZl5hqfvhrUFVWcdsvZGsoYh9VXWMTz29kw3/Hp4uD+YQ8HccO7UeC04B9 dMaI6/K0Ci4tGsQDw2gWunSedFPS9kb9TIq4EfVbNmraGITcyMTyrk6yCIywGgRJ1eTJ xGmA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=IJscOEPUxerIocq5ivhaFnZ1SHI0paGFj+4RY7CI49c=; b=TIQKER8w8sbletWzSaPt8UaMgZW6vzS8y64O7weOQbpQaNCBGRDX8/BE2zoHAIX/UM O45Z+Cgsb4RpRenkbbB1thlESYrMRB3khTWXYGbY3vxySd3QJtUwbmPHOuQkdIlJY1Xl A6m1KTbnDER74ynTLQnqaT/fd/1sR3bFZxUgDGijsphk0yNTClrIW/jphr9RX9zU1B8c sRnhGv6ry4dpOei1NpNVwksOZOs0GtxxhQyHF5EaX8H8DJoqOBtgEp+KEOLsxS2dyYg7 +g9r5f5X1/Ww7GSv8nS6/Lyj6NPwZAl7jQd9D7c3DgZCTyT1ASr6KaZ05ytkqaBiQKEN dX1w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=mmi2ytKq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e21-20020a635455000000b0051344de1eb8si1203600pgm.276.2023.04.06.05.53.06; Thu, 06 Apr 2023 05:53:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=mmi2ytKq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236287AbjDFMxA (ORCPT + 99 others); Thu, 6 Apr 2023 08:53:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50032 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236599AbjDFMw7 (ORCPT ); Thu, 6 Apr 2023 08:52:59 -0400 Received: from mail-ej1-x635.google.com (mail-ej1-x635.google.com [IPv6:2a00:1450:4864:20::635]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E332F7ED0 for ; Thu, 6 Apr 2023 05:52:56 -0700 (PDT) Received: by mail-ej1-x635.google.com with SMTP id sg7so1129573ejc.9 for ; Thu, 06 Apr 2023 05:52:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1680785575; x=1683377575; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=IJscOEPUxerIocq5ivhaFnZ1SHI0paGFj+4RY7CI49c=; b=mmi2ytKq+NaQhoDUPRHBUEjRhCCPgZiy7rLgIJXfMFck1daXezUDFfW+f8Ya9qW8Nu 77o8Nv8mCPZJVOJ/Mw5IkzQ/lVA8K8pLYcnCheg3aUT4rIjJIAXX7Utgjw5MQqmXLhtW 435VTHVUt6RP3++nMGoeLtublMXBqOmF5FEHQb6o99n31lb7S6WZxLIvcTevt83tl6fV cKM/LYDc4weOyCwuDkg3btNjWsl6SyXFBeMGeyFU1GWXKNRtgmCulev2zmpW5HSdx9TQ WDUIoKFjlt3z2DEYShN6woFM3WBqXviuTCQLf58p6eWApFbTnPJA8N2vEAKJbmcOE3wx 8r5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680785575; x=1683377575; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=IJscOEPUxerIocq5ivhaFnZ1SHI0paGFj+4RY7CI49c=; b=cIxkNYtPTACZmDpnVE++cjaJjJ7yWaB28VFoNVI1KQrqvzU6/MI4GgnAXdVNkTqvmc pdyMaXjEXcASdgTnsM/LP4TjWl8pWBhUfyR1L7CRPXCSfBiJaSnQZv+AQe0EV0/s2mD3 HfTNQshcjthxEyTQv6p75wSJ32xvXjtIzL11ZqLz9iMWXLzQQPEmNaubC5L1UOUKf2dV HqRr8bOITEIuzzgW6vkJO/3hDGUsTb6N6A7V1hBywMtm4ejNgdqZLdLk4YSSc2hHL6fD Xcjr+aFPThbm7LH+UnFNnyJ0Fg1uq+8V0qVJv7hkdyQF7LyEj8CyFmCt2Y1xKE8JXZIw pzEQ== X-Gm-Message-State: AAQBX9eXEhEYwrzMtOFGGf8MHSBIz/n/7i29YjifvLowAELAU+3U4C2S xNJutu1+rbN0TEKXhN5cWND5rg== X-Received: by 2002:a17:906:9be2:b0:931:624b:680c with SMTP id de34-20020a1709069be200b00931624b680cmr6404112ejc.29.1680785575139; Thu, 06 Apr 2023 05:52:55 -0700 (PDT) Received: from google.com (64.227.90.34.bc.googleusercontent.com. [34.90.227.64]) by smtp.gmail.com with ESMTPSA id rj13-20020a1709076c8d00b00947ccb6150bsm657067ejc.102.2023.04.06.05.52.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Apr 2023 05:52:54 -0700 (PDT) Date: Thu, 6 Apr 2023 12:52:51 +0000 From: Quentin Perret To: Saravana Kannan Cc: Marc Zyngier , David Dai , Oliver Upton , "Rafael J. Wysocki" , Viresh Kumar , Rob Herring , Krzysztof Kozlowski , Paolo Bonzini , Jonathan Corbet , James Morse , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Mark Rutland , Lorenzo Pieralisi , Sudeep Holla , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , kernel-team@android.com, linux-pm@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev Subject: Re: [RFC PATCH 0/6] Improve VM DVFS and task placement behavior Message-ID: References: <20230330224348.1006691-1-davidai@google.com> <86sfdfv0e1.wl-maz@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-15.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wednesday 05 Apr 2023 at 14:07:18 (-0700), Saravana Kannan wrote: > On Wed, Apr 5, 2023 at 12:48 AM 'Quentin Perret' via kernel-team > > And I concur with all the above as well. Putting this in the kernel is > > not an obvious fit at all as that requires a number of assumptions about > > the VMM. > > > > As Oliver pointed out, the guest topology, and how it maps to the host > > topology (vcpu pinning etc) is very much a VMM policy decision and will > > be particularly important to handle guest frequency requests correctly. > > > > In addition to that, the VMM's software architecture may have an impact. > > Crosvm for example does device emulation in separate processes for > > security reasons, so it is likely that adjusting the scheduling > > parameters ('util_guest', uclamp, or else) only for the vCPU thread that > > issues frequency requests will be sub-optimal for performance, we may > > want to adjust those parameters for all the tasks that are on the > > critical path. > > > > And at an even higher level, assuming in the kernel a certain mapping of > > vCPU threads to host threads feels kinda wrong, this too is a host > > userspace policy decision I believe. Not that anybody in their right > > mind would want to do this, but I _think_ it would technically be > > feasible to serialize the execution of multiple vCPUs on the same host > > thread, at which point the util_guest thingy becomes entirely bogus. (I > > obviously don't want to conflate this use-case, it's just an example > > that shows the proposed abstraction in the series is not a perfect fit > > for the KVM userspace delegation model.) > > See my reply to Oliver and Marc. To me it looks like we are converging > towards having shared memory between guest, host kernel and VMM and > that should address all our concerns. Hmm, that is not at all my understanding of what has been the most important part of the feedback so far: this whole thing belongs to userspace. > The guest will see a MMIO device, writing to it will trigger the host > kernel to do the basic "set util_guest/uclamp for the vCPU thread that > corresponds to the vCPU" and then the VMM can do more on top as/if > needed (because it has access to the shared memory too). Does that > make sense? Not really no. I've given examples of why this doesn't make sense for the kernel to do this, which still seems to be the case with what you're suggesting here. > Even in the extreme example, the stuff the kernel would do would still > be helpful, but not sufficient. You can aggregate the > util_guest/uclamp and do whatever from the VMM. > Technically in the extreme example, you don't need any of this. The > normal util tracking of the vCPU thread on the host side would be > sufficient. > > Actually any time we have only 1 vCPU host thread per VM, we shouldn't > be using anything in this patch series and not instantiate the guest > device at all. > > So +1 from me to move this as a virtual device of some kind. And if the > > extra cost of exiting all the way back to userspace is prohibitive (is > > it btw?), > > I think the "13% increase in battery consumption for games" makes it > pretty clear that going to userspace is prohibitive. And that's just > one example. I beg to differ. We need to understand where these 13% come from in more details. Is it really the actual cost of the userspace exit? Or is it just that from userspace the only knob you can play with is uclamp and that didn't reach the expected level of performance? If that is the userspace exit, then we can work to optimize that -- it's a fairly common problem in the virt world, nothing special here. And if the issue is the lack of expressiveness in uclamp, then that too is something we should work on, but clearly giving vCPU threads more 'power' than normal host threads is a bit of a red flag IMO. vCPU threads must be constrained in the same way that userspace threads are, because they _are_ userspace threads. > > then we can try to work on that. Maybe something a la vhost > > can be done to optimize, I'll have a think. > > > > > The one thing I'd like to understand that the comment seems to imply > > > that there is a significant difference in overhead between a hypercall > > > and an MMIO. In my experience, both are pretty similar in cost for a > > > handling location (both in userspace or both in the kernel). MMIO > > > handling is a tiny bit more expensive due to a guaranteed TLB miss > > > followed by a walk of the in-kernel device ranges, but that's all. It > > > should hardly register. > > > > > > And if you really want some super-low latency, low overhead > > > signalling, maybe an exception is the wrong tool for the job. Shared > > > memory communication could be more appropriate. > > > > I presume some kind of signalling mechanism will be necessary to > > synchronously update host scheduling parameters in response to guest > > frequency requests, but if the volume of data requires it then a shared > > buffer + doorbell type of approach should do. > > Part of the communication doesn't need synchronous handling by the > host. So, what I said above. I've also replied to another message about the scale invariance issue, and I'm not convinced the frequency based interface proposed here really makes sense. An AMU-like interface is very likely to be superior. > > Thinking about it, using SCMI over virtio would implement exactly that. > > Linux-as-a-guest already supports it IIRC, so possibly the problem > > being addressed in this series could be 'simply' solved using an SCMI > > backend in the VMM... > > This will be worse than all the options we've tried so far because it > has the userspace overhead AND uclamp overhead. But it doesn't violate the whole KVM userspace delegation model, so we should start from there and then optimize further if need be. Thanks, Quentin