From: David Dai
Date: Thu, 6 Apr 2023 14:39:07 -0700
Subject: Re: [RFC PATCH 0/6] Improve VM DVFS and task placement behavior
References: <20230330224348.1006691-1-davidai@google.com> <86sfdfv0e1.wl-maz@kernel.org>
To: Quentin Perret
Wysocki" , Viresh Kumar , Rob Herring , Krzysztof Kozlowski , Paolo Bonzini , Jonathan Corbet , James Morse , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Mark Rutland , Lorenzo Pieralisi , Sudeep Holla , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , kernel-team@android.com, linux-pm@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-15.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 6, 2023 at 5:52=E2=80=AFAM Quentin Perret = wrote: > > On Wednesday 05 Apr 2023 at 14:07:18 (-0700), Saravana Kannan wrote: > > On Wed, Apr 5, 2023 at 12:48=E2=80=AFAM 'Quentin Perret' via kernel-tea= m > > > And I concur with all the above as well. Putting this in the kernel i= s > > > not an obvious fit at all as that requires a number of assumptions ab= out > > > the VMM. > > > > > > As Oliver pointed out, the guest topology, and how it maps to the hos= t > > > topology (vcpu pinning etc) is very much a VMM policy decision and wi= ll > > > be particularly important to handle guest frequency requests correctl= y. > > > > > > In addition to that, the VMM's software architecture may have an impa= ct. > > > Crosvm for example does device emulation in separate processes for > > > security reasons, so it is likely that adjusting the scheduling > > > parameters ('util_guest', uclamp, or else) only for the vCPU thread t= hat > > > issues frequency requests will be sub-optimal for performance, we may > > > want to adjust those parameters for all the tasks that are on the > > > critical path. > > > > > > And at an even higher level, assuming in the kernel a certain mapping= of > > > vCPU threads to host threads feels kinda wrong, this too is a host > > > userspace policy decision I believe. Not that anybody in their right > > > mind would want to do this, but I _think_ it would technically be > > > feasible to serialize the execution of multiple vCPUs on the same hos= t > > > thread, at which point the util_guest thingy becomes entirely bogus. = (I > > > obviously don't want to conflate this use-case, it's just an example > > > that shows the proposed abstraction in the series is not a perfect fi= t > > > for the KVM userspace delegation model.) > > > > See my reply to Oliver and Marc. To me it looks like we are converging > > towards having shared memory between guest, host kernel and VMM and > > that should address all our concerns. > > Hmm, that is not at all my understanding of what has been the most > important part of the feedback so far: this whole thing belongs to > userspace. > > > The guest will see a MMIO device, writing to it will trigger the host > > kernel to do the basic "set util_guest/uclamp for the vCPU thread that > > corresponds to the vCPU" and then the VMM can do more on top as/if > > needed (because it has access to the shared memory too). 
> > Does that make sense?
>
> Not really no. I've given examples of why this doesn't make sense for
> the kernel to do this, which still seems to be the case with what you're
> suggesting here.
>
> > Even in the extreme example, the stuff the kernel would do would still
> > be helpful, but not sufficient. You can aggregate the
> > util_guest/uclamp and do whatever from the VMM.
> > Technically in the extreme example, you don't need any of this. The
> > normal util tracking of the vCPU thread on the host side would be
> > sufficient.
> >
> > Actually any time we have only 1 vCPU host thread per VM, we shouldn't
> > be using anything in this patch series and not instantiate the guest
> > device at all.
>
> > > So +1 from me to move this as a virtual device of some kind. And if the
> > > extra cost of exiting all the way back to userspace is prohibitive (is
> > > it btw?),
> >
> > I think the "13% increase in battery consumption for games" makes it
> > pretty clear that going to userspace is prohibitive. And that's just
> > one example.
>

Hi Quentin,

Appreciate the feedback.

> I beg to differ. We need to understand where these 13% come from in more
> details. Is it really the actual cost of the userspace exit? Or is it
> just that from userspace the only knob you can play with is uclamp and
> that didn't reach the expected level of performance?

To clarify, the MMIO numbers shown in the cover letter were collected by
updating the vCPU task's util_guest rather than uclamp_min. In that
configuration, userspace (the VMM) handles the mmio_exit from the guest
and makes an ioctl on the host kernel to update util_guest for the vCPU
task.
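To make that flow concrete, the VMM side in that setup looks roughly like
the sketch below. This is illustrative only: the device address, the
frequency-to-util mapping and the per-vCPU util_guest interface are
placeholders, not the exact interface from this series. The uclamp-only
variant I mention further down would keep the same exit handling but
apply the value with sched_setattr() and SCHED_FLAG_UTIL_CLAMP_MIN on the
vCPU thread instead:

/* Illustrative VMM-side sketch, not the code from the series. */
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define DVFS_MMIO_BASE     0x10000000UL /* placeholder: guest address of the virtual cpufreq device */
#define MAX_GUEST_FREQ_KHZ 3000000U     /* placeholder: max frequency advertised to the guest */

/* Minimal sched_attr definition; glibc does not provide one. */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime, sched_deadline, sched_period;
	uint32_t sched_util_min, sched_util_max;
};
#define SCHED_FLAG_KEEP_ALL       0x18 /* KEEP_POLICY | KEEP_PARAMS */
#define SCHED_FLAG_UTIL_CLAMP_MIN 0x20

/* uclamp-only variant: raise the vCPU thread's minimum utilization clamp. */
static int set_vcpu_uclamp_min(pid_t vcpu_tid, uint32_t util)
{
	struct sched_attr attr = {
		.size           = sizeof(attr),
		.sched_flags    = SCHED_FLAG_KEEP_ALL | SCHED_FLAG_UTIL_CLAMP_MIN,
		.sched_util_min = util, /* 0..1024 */
	};
	return syscall(SYS_sched_setattr, vcpu_tid, &attr, 0);
}

/* Called from the vCPU loop when KVM_RUN returns with KVM_EXIT_MMIO. */
static void handle_dvfs_mmio(struct kvm_run *run, pid_t vcpu_tid)
{
	if (!run->mmio.is_write || run->mmio.len != sizeof(uint32_t) ||
	    run->mmio.phys_addr < DVFS_MMIO_BASE ||
	    run->mmio.phys_addr >= DVFS_MMIO_BASE + 0x1000)
		return;

	uint32_t freq_khz;
	memcpy(&freq_khz, run->mmio.data, sizeof(freq_khz));

	/* Map the requested frequency to a utilization value; the policy
	 * (linear scaling against the max guest frequency here) is entirely
	 * up to the VMM. */
	uint32_t util = (uint64_t)freq_khz * 1024 / MAX_GUEST_FREQ_KHZ;

	set_vcpu_uclamp_min(vcpu_tid, util);
	/* The util_guest configuration measured above would instead pass
	 * 'util' to the host kernel through a per-vCPU ioctl here. */
}
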
>
> If that is the userspace exit, then we can work to optimize that -- it's
> a fairly common problem in the virt world, nothing special here.
>

Ok, we're open to suggestions on how to better optimize here.

> And if the issue is the lack of expressiveness in uclamp, then that too
> is something we should work on, but clearly giving vCPU threads more
> 'power' than normal host threads is a bit of a red flag IMO. vCPU
> threads must be constrained in the same way that userspace threads are,
> because they _are_ userspace threads.
>
> > > then we can try to work on that. Maybe something a la vhost
> > > can be done to optimize, I'll have a think.
> > >
> > > > The one thing I'd like to understand that the comment seems to imply
> > > > that there is a significant difference in overhead between a hypercall
> > > > and an MMIO. In my experience, both are pretty similar in cost for a
> > > > handling location (both in userspace or both in the kernel). MMIO
> > > > handling is a tiny bit more expensive due to a guaranteed TLB miss
> > > > followed by a walk of the in-kernel device ranges, but that's all. It
> > > > should hardly register.
> > > >
> > > > And if you really want some super-low latency, low overhead
> > > > signalling, maybe an exception is the wrong tool for the job. Shared
> > > > memory communication could be more appropriate.
> > >
> > > I presume some kind of signalling mechanism will be necessary to
> > > synchronously update host scheduling parameters in response to guest
> > > frequency requests, but if the volume of data requires it then a shared
> > > buffer + doorbell type of approach should do.
> >
> > Part of the communication doesn't need synchronous handling by the
> > host. So, what I said above.
>
> I've also replied to another message about the scale invariance issue,
> and I'm not convinced the frequency based interface proposed here really
> makes sense. An AMU-like interface is very likely to be superior.

Some sort of AMU-based interface was discussed offline with Saravana, but
I'm not sure how best to implement that. If you have any pointers to get
started, that would be helpful.

> > > Thinking about it, using SCMI over virtio would implement exactly that.
> > > Linux-as-a-guest already supports it IIRC, so possibly the problem
> > > being addressed in this series could be 'simply' solved using an SCMI
> > > backend in the VMM...
> >
> > This will be worse than all the options we've tried so far because it
> > has the userspace overhead AND uclamp overhead.
>
> But it doesn't violate the whole KVM userspace delegation model, so we
> should start from there and then optimize further if need be.

Do you have any references we could use to get started with SCMI
(e.g. SCMI backend support in crosvm)?

For RFC v3, I'll post a cpufreq driver implementation that only uses
MMIO and requires no host kernel modifications (i.e. only uses uclamp
as a knob to tune the host), along with performance numbers, and then
work on optimizing from there.

Thanks,
David

> Thanks,
> Quentin