Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp1250416rwl; Wed, 5 Apr 2023 14:10:15 -0700 (PDT) X-Google-Smtp-Source: AKy350aItqBnON+c4QS3KRUbhpvGMi+TPzoJxWGXP48CBLHOtpXVERk2pkAo7DRGfQXRK9qx6vpX X-Received: by 2002:aa7:d8cc:0:b0:501:cde8:c523 with SMTP id k12-20020aa7d8cc000000b00501cde8c523mr3475168eds.6.1680729015452; Wed, 05 Apr 2023 14:10:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680729015; cv=none; d=google.com; s=arc-20160816; b=usk8tNSAWyeny94P7uw7xq5Kp6xG/40ekAUniBG0QLxlYNiFqsezTT1n11MDjzjpaw rUsl30yvBe6soZoQ9F0BbIWeBTaiWbzyridp3+C4rJYyvgwyR8J6uJZYlzFjSfKSOhDF OEoY1O19rJWFnBmyILqP9m/bNLs4zg1iLTU5CgzWZ/zKpdg2paMzj4KNWnNyFmMRGBTl MX6tnL8dDRMabimScUY8cetiuKiOLTtSety0KIlhdXDXmezcrJmdaPWrNXfiTObNfLez sDTMZIG2sf+4zmu4yX6uE7jnCV09LI37DN+Be+EnviHQxWb6sbgUe8TJ2BCGUHMBdZnI 954g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=sml6aLd7NKNtOPArQglRTFmPrwxLRHpfQGEdpLGQuKU=; b=fPufgaoKnytYAdNRjqKxFYIA39sUsFWVtpBVFZsoCj0/1Kz7hkKPrnz92SsoisBE5n RBE+drSGzhzotDiSWua38SXiryc+m/Q8U3W9+61KB33Xy4B7jsBsZQ38C2KJvGoFtqYY 8ZKpCPeMK6JGMsHOG+U0vb3YJcxE3sKw+JcpGpyj1d+fC8YpCdIOwyMmlk7+L0RDLbnI nPlNpjUDniW6yJiIkGPod73KoTQVhzHf9kcCEt8Ehgzoh/uUXDszRfXioklyxmL7Z1R7 Dg1Z2dq1YuM3SrY03Fbf/OBDEtaj38+olwvkEPv+Rzj33TzcZFVQVP1o7obB9AAhbPMO 2x7A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hzTkqmv9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ca21-20020aa7cd75000000b004aad91c6589si4091952edb.193.2023.04.05.14.09.48; Wed, 05 Apr 2023 14:10:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hzTkqmv9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231429AbjDEVIH (ORCPT + 99 others); Wed, 5 Apr 2023 17:08:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234135AbjDEVH7 (ORCPT ); Wed, 5 Apr 2023 17:07:59 -0400 Received: from mail-pf1-x42b.google.com (mail-pf1-x42b.google.com [IPv6:2607:f8b0:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 27E7B7AAB for ; Wed, 5 Apr 2023 14:07:56 -0700 (PDT) Received: by mail-pf1-x42b.google.com with SMTP id bm13so10422171pfb.5 for ; Wed, 05 Apr 2023 14:07:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1680728875; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=sml6aLd7NKNtOPArQglRTFmPrwxLRHpfQGEdpLGQuKU=; b=hzTkqmv9kRVCuEZOHA2Eqet8sLVNbtm1v12CcRdciL89msd39FaXGObgLEAtilXQyw 8/nTJHQyEKtHUXCZDNVNUwjpTKf3T8Bkx+UVT8ihgYKBi77tqtVuIBHbp6G5BxfUwt8G W5zxIeGHDJwNo9S7D5TYRs0cwadJf/BNQQs0a4O3I6poQik0dh2klYu6Z2u3dGzNnweb OgcKeZNhlCtp8r2xGYgU3R1uYe35oXfVzjM1MTAFlcQRsQqvtrJSbVphRz52PniweMYe mIyeb3OvhjOXZbUZJkXeD9VUUvZjh8RZ3FdimwT4HGsTjmkW282x/qfCTCsK7d/MGJDt 9q7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680728875; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=sml6aLd7NKNtOPArQglRTFmPrwxLRHpfQGEdpLGQuKU=; b=tEndlk/yDHQeAaZNXeEHIWpR8Xt2d9Qmclxnb+LBDI0ng0ubmx5+rFWBM3JDdfTjkn Lp4yiBMFK7Ns8ikWPPhingS594qlczRY9JfeIM7wY9LxLZuxAUbnLJBEsslj8SQnvKzG ponLPo+3HW9mwXrlmr8uoe0H8Zt/ZpiVyaLGCDuKDXK6x3b2uXvuE94jyPgm76ROQUQO 00e4EtasyxgLmJSN2kNg7tgw4SaBzvc9OyeTfLR8ehAg1IiWKB/QFwAhh6vwIuAGTGDO 9jUy8qV8ksuZS/v1BZeutfs0UiSBXh6WIChjbbiFditaTXsURR589CFE9DV3G98EbJzr 1lHQ== X-Gm-Message-State: AAQBX9dt9RiecVaYNcZFZYkijGkk5pXNUGcmxm2SgGsJIdNBJZvpIIQo gpuX4wrh/ttSkdruto+8xjghQku3uEmcD0wAa9CCmQ== X-Received: by 2002:a63:474b:0:b0:50f:ad00:3f92 with SMTP id w11-20020a63474b000000b0050fad003f92mr2366072pgk.3.1680728874841; Wed, 05 Apr 2023 14:07:54 -0700 (PDT) MIME-Version: 1.0 References: <20230330224348.1006691-1-davidai@google.com> <86sfdfv0e1.wl-maz@kernel.org> In-Reply-To: From: Saravana Kannan Date: Wed, 5 Apr 2023 14:07:18 -0700 Message-ID: Subject: Re: [RFC PATCH 0/6] Improve VM DVFS and task placement behavior To: Quentin Perret Cc: Marc Zyngier , David Dai , Oliver Upton , "Rafael J. Wysocki" , Viresh Kumar , Rob Herring , Krzysztof Kozlowski , Paolo Bonzini , Jonathan Corbet , James Morse , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Mark Rutland , Lorenzo Pieralisi , Sudeep Holla , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , kernel-team@android.com, linux-pm@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-15.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 5, 2023 at 12:48=E2=80=AFAM 'Quentin Perret' via kernel-team wrote: > > On Tuesday 04 Apr 2023 at 21:49:10 (+0100), Marc Zyngier wrote: > > On Tue, 04 Apr 2023 20:43:40 +0100, > > Oliver Upton wrote: > > > > > > Folks, > > > > > > On Thu, Mar 30, 2023 at 03:43:35PM -0700, David Dai wrote: > > > > > > > > > > > > > PCMark > > > > Higher is better > > > > +-------------------+----------+------------+--------+-------+-----= ---+ > > > > | Test Case (score) | Baseline | Hypercall | %delta | MMIO | %del= ta | > > > > +-------------------+----------+------------+--------+-------+-----= ---+ > > > > | Weighted Total | 6136 | 7274 | +19% | 6867 | +1= 2% | > > > > +-------------------+----------+------------+--------+-------+-----= ---+ > > > > | Web Browsing | 5558 | 6273 | +13% | 6035 | += 9% | > > > > +-------------------+----------+------------+--------+-------+-----= ---+ > > > > | Video Editing | 4921 | 5221 | +6% | 5167 | += 5% | > > > > +-------------------+----------+------------+--------+-------+-----= ---+ > > > > | Writing | 6864 | 8825 | +29% | 8529 | +2= 4% | > > > > +-------------------+----------+------------+--------+-------+-----= ---+ > > > > | Photo Editing | 7983 | 11593 | +45% | 10812 | +3= 5% | > > > > +-------------------+----------+------------+--------+-------+-----= ---+ > > > > | Data Manipulation | 5814 | 6081 | +5% | 5327 | -= 8% | > > > > +-------------------+----------+------------+--------+-------+-----= ---+ > > > > > > > > PCMark Performance/mAh > > > > Higher is better > > > > +-----------+----------+-----------+--------+------+--------+ > > > > | | Baseline | Hypercall | %delta | MMIO | %delta | > > > > +-----------+----------+-----------+--------+------+--------+ > > > > | Score/mAh | 79 | 88 | +11% | 83 | +7% | > > > > +-----------+----------+-----------+--------+------+--------+ > > > > > > > > Roblox > > > > Higher is better > > > > +-----+----------+------------+--------+-------+--------+ > > > > | | Baseline | Hypercall | %delta | MMIO | %delta | > > > > +-----+----------+------------+--------+-------+--------+ > > > > | FPS | 18.25 | 28.66 | +57% | 24.06 | +32% | > > > > +-----+----------+------------+--------+-------+--------+ > > > > > > > > Roblox Frames/mAh > > > > Higher is better > > > > +------------+----------+------------+--------+--------+--------+ > > > > | | Baseline | Hypercall | %delta | MMIO | %delta | > > > > +------------+----------+------------+--------+--------+--------+ > > > > | Frames/mAh | 91.25 | 114.64 | +26% | 103.11 | +13% | > > > > +------------+----------+------------+--------+--------+--------+ > > > > > > > > > > > > > Next steps: > > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > > > We are continuing to look into communication mechanisms other than > > > > hypercalls that are just as/more efficient and avoid switching into= the VMM > > > > userspace. Any inputs in this regard are greatly appreciated. > > > > > > We're highly unlikely to entertain such an interface in KVM. > > > > > > The entire feature is dependent on pinning vCPUs to physical cores, f= or which > > > userspace is in the driver's seat. That is a well established and doc= umented > > > policy which can be seen in the way we handle heterogeneous systems a= nd > > > vPMU. > > > > > > Additionally, this bloats the KVM PV ABI with highly VMM-dependent in= terfaces > > > that I would not expect to benefit the typical user of KVM. > > > > > > Based on the data above, it would appear that the userspace implement= ation is > > > in the same neighborhood as a KVM-based implementation, which only fu= rther > > > weakens the case for moving this into the kernel. > > > > > > I certainly can appreciate the motivation for the series, but this fe= ature > > > should be in userspace as some form of a virtual device. > > > > +1 on all of the above. > > And I concur with all the above as well. Putting this in the kernel is > not an obvious fit at all as that requires a number of assumptions about > the VMM. > > As Oliver pointed out, the guest topology, and how it maps to the host > topology (vcpu pinning etc) is very much a VMM policy decision and will > be particularly important to handle guest frequency requests correctly. > > In addition to that, the VMM's software architecture may have an impact. > Crosvm for example does device emulation in separate processes for > security reasons, so it is likely that adjusting the scheduling > parameters ('util_guest', uclamp, or else) only for the vCPU thread that > issues frequency requests will be sub-optimal for performance, we may > want to adjust those parameters for all the tasks that are on the > critical path. > > And at an even higher level, assuming in the kernel a certain mapping of > vCPU threads to host threads feels kinda wrong, this too is a host > userspace policy decision I believe. Not that anybody in their right > mind would want to do this, but I _think_ it would technically be > feasible to serialize the execution of multiple vCPUs on the same host > thread, at which point the util_guest thingy becomes entirely bogus. (I > obviously don't want to conflate this use-case, it's just an example > that shows the proposed abstraction in the series is not a perfect fit > for the KVM userspace delegation model.) See my reply to Oliver and Marc. To me it looks like we are converging towards having shared memory between guest, host kernel and VMM and that should address all our concerns. The guest will see a MMIO device, writing to it will trigger the host kernel to do the basic "set util_guest/uclamp for the vCPU thread that corresponds to the vCPU" and then the VMM can do more on top as/if needed (because it has access to the shared memory too). Does that make sense? Even in the extreme example, the stuff the kernel would do would still be helpful, but not sufficient. You can aggregate the util_guest/uclamp and do whatever from the VMM. Technically in the extreme example, you don't need any of this. The normal util tracking of the vCPU thread on the host side would be sufficient. Actually any time we have only 1 vCPU host thread per VM, we shouldn't be using anything in this patch series and not instantiate the guest device at all. > So +1 from me to move this as a virtual device of some kind. And if the > extra cost of exiting all the way back to userspace is prohibitive (is > it btw?), I think the "13% increase in battery consumption for games" makes it pretty clear that going to userspace is prohibitive. And that's just one example. > then we can try to work on that. Maybe something a la vhost > can be done to optimize, I'll have a think. > > > The one thing I'd like to understand that the comment seems to imply > > that there is a significant difference in overhead between a hypercall > > and an MMIO. In my experience, both are pretty similar in cost for a > > handling location (both in userspace or both in the kernel). MMIO > > handling is a tiny bit more expensive due to a guaranteed TLB miss > > followed by a walk of the in-kernel device ranges, but that's all. It > > should hardly register. > > > > And if you really want some super-low latency, low overhead > > signalling, maybe an exception is the wrong tool for the job. Shared > > memory communication could be more appropriate. > > I presume some kind of signalling mechanism will be necessary to > synchronously update host scheduling parameters in response to guest > frequency requests, but if the volume of data requires it then a shared > buffer + doorbell type of approach should do. Part of the communication doesn't need synchronous handling by the host. So, what I said above. > Thinking about it, using SCMI over virtio would implement exactly that. > Linux-as-a-guest already supports it IIRC, so possibly the problem > being addressed in this series could be 'simply' solved using an SCMI > backend in the VMM... This will be worse than all the options we've tried so far because it has the userspace overhead AND uclamp overhead. -Saravana