Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp366872rwl; Wed, 5 Apr 2023 01:39:41 -0700 (PDT) X-Google-Smtp-Source: AKy350aFId58VRBgqt8VpHGmH0ziukVo5hgcJCfRCZWVdkxclWjT4UDPKBIb3ng/k3T6cXXGabJd X-Received: by 2002:a05:6a20:7523:b0:da:5ef4:69f5 with SMTP id r35-20020a056a20752300b000da5ef469f5mr4157504pzd.21.1680683980882; Wed, 05 Apr 2023 01:39:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680683980; cv=none; d=google.com; s=arc-20160816; b=x1Lg/J3cwQUHBFLqcZhYv0ObPPTtCk7c65HMqcx4NZYbzU0Cq60ZYjeC2Tln0eIT4W tJMgpmYaXjXWmF5Cu4Y52u8khoGieUOJ/efE4smxEBaIA1XJWYXf4C12TDuws80Z+5uc ENrFcW4+ZbQMFlqVvVW3bk6DYxzzLUCds+UfNIeEQOymXT4mS6Vn/Qk8hZulU87KIJsn H6lDoLZ3y1tTLx0nEdzEaokiI9Y7c8ioSOrQmeDQUJf33hPxoscXt8DnRyLEgclfU740 t3tiksCiTwhPkrsty6Pxl4it48vT9g88jgkszViHzvQil0t6QaA3s/DYJHboikH9O30o dYDQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=uswkZCNmvnRlVRQCSju2fZnjPjabnOYplQP8HVJ2wjE=; b=d0jNj7UYdjoKr3GBcwIVpLg1E6XOgZGVLvhWIiRYeL7i95buF+jnRVuKGTwnFahwwV 5zPof6uY3t2unCotKQclKI0aReHFDkvpNKAyZV/sSCoC9PvvX2bS4YXAcNTzafJkQ7wi eM5hHxvRI6aHDcbAiRwWZIiyu4cW1bgLtoEgY6QNSaGJAoFB1nfVhJCGGbEdBr3NT5Qf aPJc6BbFOvMNzJksqdYxIxLO1U1ZqC6KUNdNEbhmCIGLLWauiV8goFNmHju7jZ70Ds1c 3uhqqzoBjnLKCaB7WWH+JGxxJ3EX35lYfkhnD/kB1lToXbAqh9RecftN7hb8NiOuciJX /dzA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=hDTg9cZ5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r19-20020a63e513000000b0050fa03c6d7dsi11566982pgh.37.2023.04.05.01.39.29; Wed, 05 Apr 2023 01:39:40 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=hDTg9cZ5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237541AbjDEIeU (ORCPT + 99 others); Wed, 5 Apr 2023 04:34:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56590 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237125AbjDEIeR (ORCPT ); Wed, 5 Apr 2023 04:34:17 -0400 Received: from mail-pj1-x1029.google.com (mail-pj1-x1029.google.com [IPv6:2607:f8b0:4864:20::1029]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 927AB4C22 for ; Wed, 5 Apr 2023 01:34:10 -0700 (PDT) Received: by mail-pj1-x1029.google.com with SMTP id p13-20020a17090a284d00b0023d2e945aebso3579381pjf.0 for ; Wed, 05 Apr 2023 01:34:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1680683650; x=1683275650; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=uswkZCNmvnRlVRQCSju2fZnjPjabnOYplQP8HVJ2wjE=; b=hDTg9cZ5dgqFve21IhvP0vjH/LT5Qn7NqlC7lp8kDbxvOzwqcjsmIaSgUiFkxF9/rA CX6RJ+FZxOinhHJXTmJWHeiM0PU12K7peccwTyhJ8SacTyOawUNsD711ZuiS/VYAiGjD JxibN6nFFF/mlX0xY9ioRXj8B1hj/IrRo1VRraKc0O5K/v3oPqUvBr2n5pgr62FnlVaU 2hRpKMOuYdJLVWz7IDpKKO6Go35uPSTQZUdpvVPnGKhPc+aWLP8aA0Tz8VCWrmG+oJjo p54SNKfSHgQ/MCAetv/xg7nmJSwNXLhATwlFTNFdgDWIHq16UuKOooTHU7SDn+oVRjSK JQPA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680683650; x=1683275650; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=uswkZCNmvnRlVRQCSju2fZnjPjabnOYplQP8HVJ2wjE=; b=MdNyou1gBPKF3a1rfvGyyrAB9ZVqvNWP0dA/19ct8hP8btv2YeCVTzr8tKs5uAeyAC af/za5qWa0xdvJapBLi9+PwmUM25Lc7tow9fIBiTajz/fqOY42bzotyYeMFOgaTFQZfP 8QkLv024TX84mgUq+5IqE6qW0tgcsXu9nCGQztNo3CkKXLy+Ak1RpOV2kATnjNGhMjDj fZZjB8yn5e1AIdr9IPw7WoZeOwQFZcI+h88AKdMCaa7ZYdzEqoS3zuKhm6xrJqj/EWGW Zd3DrVIMJSxs377lt5igQEDJ0qDBX99nd6pgHsBHI7mwENWS/677qsDq+xhixxKVJfHu +UvA== X-Gm-Message-State: AAQBX9cbAiBVRUJTYfjWAuLY814REL6cl07sPIeF8eyBgB8/MZuVAx0X OW1RrzcxXCRSQ4EDmpzdfaDYhrEllt806h0XTGZ+UQ== X-Received: by 2002:a17:90a:d201:b0:240:d8d8:12c4 with SMTP id o1-20020a17090ad20100b00240d8d812c4mr1918429pju.3.1680683649941; Wed, 05 Apr 2023 01:34:09 -0700 (PDT) MIME-Version: 1.0 References: <20230330224348.1006691-1-davidai@google.com> <86sfdfv0e1.wl-maz@kernel.org> In-Reply-To: From: Vincent Guittot Date: Wed, 5 Apr 2023 10:33:58 +0200 Message-ID: Subject: Re: [RFC PATCH 0/6] Improve VM DVFS and task placement behavior To: Quentin Perret Cc: Marc Zyngier , David Dai , Oliver Upton , "Rafael J. Wysocki" , Viresh Kumar , Rob Herring , Krzysztof Kozlowski , Paolo Bonzini , Jonathan Corbet , James Morse , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Mark Rutland , Lorenzo Pieralisi , Sudeep Holla , Ingo Molnar , Peter Zijlstra , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , kernel-team@android.com, linux-pm@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 5 Apr 2023 at 09:48, Quentin Perret wrote: > > On Tuesday 04 Apr 2023 at 21:49:10 (+0100), Marc Zyngier wrote: > > On Tue, 04 Apr 2023 20:43:40 +0100, > > Oliver Upton wrote: > > > > > > Folks, > > > > > > On Thu, Mar 30, 2023 at 03:43:35PM -0700, David Dai wrote: > > > > > > > > > > > > > PCMark > > > > Higher is better > > > > +-------------------+----------+------------+--------+-------+--------+ > > > > | Test Case (score) | Baseline | Hypercall | %delta | MMIO | %delta | > > > > +-------------------+----------+------------+--------+-------+--------+ > > > > | Weighted Total | 6136 | 7274 | +19% | 6867 | +12% | > > > > +-------------------+----------+------------+--------+-------+--------+ > > > > | Web Browsing | 5558 | 6273 | +13% | 6035 | +9% | > > > > +-------------------+----------+------------+--------+-------+--------+ > > > > | Video Editing | 4921 | 5221 | +6% | 5167 | +5% | > > > > +-------------------+----------+------------+--------+-------+--------+ > > > > | Writing | 6864 | 8825 | +29% | 8529 | +24% | > > > > +-------------------+----------+------------+--------+-------+--------+ > > > > | Photo Editing | 7983 | 11593 | +45% | 10812 | +35% | > > > > +-------------------+----------+------------+--------+-------+--------+ > > > > | Data Manipulation | 5814 | 6081 | +5% | 5327 | -8% | > > > > +-------------------+----------+------------+--------+-------+--------+ > > > > > > > > PCMark Performance/mAh > > > > Higher is better > > > > +-----------+----------+-----------+--------+------+--------+ > > > > | | Baseline | Hypercall | %delta | MMIO | %delta | > > > > +-----------+----------+-----------+--------+------+--------+ > > > > | Score/mAh | 79 | 88 | +11% | 83 | +7% | > > > > +-----------+----------+-----------+--------+------+--------+ > > > > > > > > Roblox > > > > Higher is better > > > > +-----+----------+------------+--------+-------+--------+ > > > > | | Baseline | Hypercall | %delta | MMIO | %delta | > > > > +-----+----------+------------+--------+-------+--------+ > > > > | FPS | 18.25 | 28.66 | +57% | 24.06 | +32% | > > > > +-----+----------+------------+--------+-------+--------+ > > > > > > > > Roblox Frames/mAh > > > > Higher is better > > > > +------------+----------+------------+--------+--------+--------+ > > > > | | Baseline | Hypercall | %delta | MMIO | %delta | > > > > +------------+----------+------------+--------+--------+--------+ > > > > | Frames/mAh | 91.25 | 114.64 | +26% | 103.11 | +13% | > > > > +------------+----------+------------+--------+--------+--------+ > > > > > > > > > > > > > Next steps: > > > > =========== > > > > We are continuing to look into communication mechanisms other than > > > > hypercalls that are just as/more efficient and avoid switching into the VMM > > > > userspace. Any inputs in this regard are greatly appreciated. > > > > > > We're highly unlikely to entertain such an interface in KVM. > > > > > > The entire feature is dependent on pinning vCPUs to physical cores, for which > > > userspace is in the driver's seat. That is a well established and documented > > > policy which can be seen in the way we handle heterogeneous systems and > > > vPMU. > > > > > > Additionally, this bloats the KVM PV ABI with highly VMM-dependent interfaces > > > that I would not expect to benefit the typical user of KVM. > > > > > > Based on the data above, it would appear that the userspace implementation is > > > in the same neighborhood as a KVM-based implementation, which only further > > > weakens the case for moving this into the kernel. > > > > > > I certainly can appreciate the motivation for the series, but this feature > > > should be in userspace as some form of a virtual device. > > > > +1 on all of the above. > > And I concur with all the above as well. Putting this in the kernel is > not an obvious fit at all as that requires a number of assumptions about > the VMM. > > As Oliver pointed out, the guest topology, and how it maps to the host > topology (vcpu pinning etc) is very much a VMM policy decision and will > be particularly important to handle guest frequency requests correctly. > > In addition to that, the VMM's software architecture may have an impact. > Crosvm for example does device emulation in separate processes for > security reasons, so it is likely that adjusting the scheduling > parameters ('util_guest', uclamp, or else) only for the vCPU thread that > issues frequency requests will be sub-optimal for performance, we may > want to adjust those parameters for all the tasks that are on the > critical path. > > And at an even higher level, assuming in the kernel a certain mapping of > vCPU threads to host threads feels kinda wrong, this too is a host > userspace policy decision I believe. Not that anybody in their right > mind would want to do this, but I _think_ it would technically be > feasible to serialize the execution of multiple vCPUs on the same host > thread, at which point the util_guest thingy becomes entirely bogus. (I > obviously don't want to conflate this use-case, it's just an example > that shows the proposed abstraction in the series is not a perfect fit > for the KVM userspace delegation model.) > > So +1 from me to move this as a virtual device of some kind. And if the > extra cost of exiting all the way back to userspace is prohibitive (is > it btw?), then we can try to work on that. Maybe something a la vhost > can be done to optimize, I'll have a think. > > > The one thing I'd like to understand that the comment seems to imply > > that there is a significant difference in overhead between a hypercall > > and an MMIO. In my experience, both are pretty similar in cost for a > > handling location (both in userspace or both in the kernel). MMIO > > handling is a tiny bit more expensive due to a guaranteed TLB miss > > followed by a walk of the in-kernel device ranges, but that's all. It > > should hardly register. > > > > And if you really want some super-low latency, low overhead > > signalling, maybe an exception is the wrong tool for the job. Shared > > memory communication could be more appropriate. > > I presume some kind of signalling mechanism will be necessary to > synchronously update host scheduling parameters in response to guest > frequency requests, but if the volume of data requires it then a shared > buffer + doorbell type of approach should do. > > Thinking about it, using SCMI over virtio would implement exactly that. > Linux-as-a-guest already supports it IIRC, so possibly the problem > being addressed in this series could be 'simply' solved using an SCMI > backend in the VMM... This is what was suggested at LPC: using virtio-scmi and scmi performance domain in the guest for cpufreq driver using a vhost user scmi backend in user space from this vhost userspace backend updates the uclamp min of the vCPU thread or use another method is this one is not good enough > > Thanks, > Quentin