From: Mingwei Zhang
Date: Mon, 22 Apr 2024 21:23:35 -0700
Subject: Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU state for Intel CPU
To: maobibo
Cc: "Mi, Dapeng", Sean Christopherson, Xiong Zhang, pbonzini@redhat.com,
 peterz@infradead.org, kan.liang@intel.com, zhenyuw@linux.intel.com,
 jmattson@google.com, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, zhiyuan.lv@intel.com, eranian@google.com,
 irogers@google.com, samantha.alt@intel.com, like.xu.linux@gmail.com,
 chao.gao@intel.com

On Mon, Apr 22, 2024 at 8:55 PM maobibo wrote:
>
>
>
> On 2024/4/23 11:13 AM, Mi, Dapeng wrote:
> >
> > On 4/23/2024 10:53 AM, maobibo wrote:
> >>
> >>
> >> On 2024/4/23 10:44 AM, Mi, Dapeng wrote:
> >>>
> >>> On 4/23/2024 9:01 AM, maobibo wrote:
> >>>>
> >>>>
> >>>> On 2024/4/23 1:01 AM, Sean Christopherson wrote:
> >>>>> On Mon, Apr 22, 2024, maobibo wrote:
> >>>>>> On 2024/4/16 6:45 AM, Sean Christopherson wrote:
> >>>>>>> On Mon, Apr 15, 2024, Mingwei Zhang wrote:
> >>>>>>>> On Mon, Apr 15, 2024 at 10:38 AM Sean Christopherson wrote:
> >>>>>>>>> One of my biggest complaints with the current vPMU code is
> >>>>>>>>> that the roles and responsibilities between KVM and perf are
> >>>>>>>>> poorly defined, which leads to suboptimal and hard to
> >>>>>>>>> maintain code.
> >>>>>>>>>
> >>>>>>>>> Case in point, I'm pretty sure leaving guest values in PMCs
> >>>>>>>>> _would_ leak guest state to userspace processes that have
> >>>>>>>>> RDPMC permissions, as the PMCs might not be dirty from perf's
> >>>>>>>>> perspective (see perf_clear_dirty_counters()).
> >>>>>>>>>
> >>>>>>>>> Blindly clearing PMCs in KVM "solves" that problem, but in
> >>>>>>>>> doing so makes the overall code brittle because it's not
> >>>>>>>>> clear whether KVM _needs_ to clear PMCs, or if KVM is just
> >>>>>>>>> being paranoid.
> >>>>>>>>
> >>>>>>>> So once this rolls out, perf and vPMU are clients directly to
> >>>>>>>> PMU HW.
> >>>>>>>
> >>>>>>> I don't think this is a statement we want to make, as it opens a
> >>>>>>> discussion that we won't win. Nor do I think it's one we *need*
> >>>>>>> to make. KVM doesn't need to be on equal footing with perf in
> >>>>>>> terms of owning/managing PMU hardware, KVM just needs a few APIs
> >>>>>>> to allow faithfully and accurately virtualizing a guest PMU.
> >>>>>>>
> >>>>>>>> Faithful cleaning (blind cleaning) has to be the baseline
> >>>>>>>> implementation, until both clients agree to a "deal" between
> >>>>>>>> them. Currently, there is no such deal, but I believe we could
> >>>>>>>> have one via future discussion.
> >>>>>>>
> >>>>>>> What I am saying is that there needs to be a "deal" in place
> >>>>>>> before this code is merged. It doesn't need to be anything
> >>>>>>> fancy, e.g. perf can still pave over PMCs it doesn't immediately
> >>>>>>> load, as opposed to using cpu_hw_events.dirty to lazily do the
> >>>>>>> clearing. But perf and KVM need to work together from the get
> >>>>>>> go, i.e. I don't want KVM doing something without regard to what
> >>>>>>> perf does, and vice versa.
> >>>>>>>
> >>>>>> There is a similar issue on LoongArch vPMU, where the VM can
> >>>>>> directly access pmu hardware and the pmu hw is shared between
> >>>>>> guest and host. Besides context switch there are other places
> >>>>>> where perf core will access pmu hw, such as tick
> >>>>>> timer/hrtimer/ipi function call, and KVM can only intercept
> >>>>>> context switch.
> >>>>>
> >>>>> Two questions:
> >>>>>
> >>>>>  1) Can KVM prevent the guest from accessing the PMU?
> >>>>>
> >>>>>  2) If so, can KVM grant partial access to the PMU, or is it all
> >>>>>     or nothing?
> >>>>>
> >>>>> If the answer to both questions is "yes", then it sounds like
> >>>>> LoongArch *requires* mediated/passthrough support in order to
> >>>>> virtualize its PMU.
> >>>>
> >>>> Hi Sean,
> >>>>
> >>>> Thanks for your quick response.
> >>>>
> >>>> Yes, kvm can prevent the guest from accessing the PMU and can grant
> >>>> partial or full access to the PMU. However, if one pmu event is
> >>>> granted to the VM, the host can not access this pmu event again.
> >>>> There must be a pmu event switch if the host wants to.
> >>>
> >>> PMU event is a software entity which won't be shared. Did you mean
> >>> if a PMU HW counter is granted to VM, then Host can't access the PMU
> >>> HW counter, right?
> >> Yes, if a PMU HW counter/control is granted to the VM. The value
> >> comes from the guest, and is not meaningful for the host. The host
> >> pmu core does not know that it is granted to the VM, the host still
> >> thinks that it owns the pmu.
> >
> > That's one issue this patchset tries to solve. The new mediated x86
> > vPMU framework doesn't allow Host and Guest to own the PMU HW
> > resource simultaneously. Only when there is no !exclude_guest event
> > on host, guest is allowed to exclusively own the PMU HW resource.
> >
> >
> >>
> >> Just like the FPU registers, it is shared by VM and host at
> >> different times and it is lazily switched. But if an IPI or timer
> >> interrupt uses the FPU registers on the host, there will be the same
> >> issue.
> >
> > I didn't fully get your point. When an IPI or timer interrupt
> > arrives, a VM-exit is triggered to make the CPU trap into the host
> > first and then the host
> yes, it is.

This is correct. And this is one of the points that we had debated
internally: whether we should do the PMU context switch at the vcpu loop
boundary or at the VM Enter/Exit boundary.
A (host-level) timer interrupt can force a VM Exit, which I think
happens every 4ms or 1ms, depending on configuration.

One of the key reasons we currently propose the VM Enter/Exit boundary
is that it is the same boundary as the legacy PMU, i.e., it is simple to
propose from the perf subsystem's perspective.

Performance wise, doing the PMU context switch at the vcpu boundary
would be way better in general. But the downside is that the perf
subsystem loses the capability to profile the majority of the KVM code
(functions) when the guest PMU is enabled.

>
> > interrupt handler is called. Or are you complaining about the
> > execution order of switching guest PMU MSRs and these interrupt
> > handlers?
> In our vPMU implementation, it is ok if the vPMU is switched in the vm
> exit path, however there is a problem if the vPMU is switched during
> the vcpu thread sched-out/sched-in path, since the IPI/timer irq
> handlers access pmu registers in host mode.

Oh, the IPI/timer irq handler will access PMU registers? I thought only
the host-level NMI handler would access the PMU MSRs, since PMI is
registered under NMI.

In that case, you should disable IRQs during the vcpu context switch.
For NMI, we prevent its handler from accessing the PMU registers. In
particular, we use a per-cpu variable to guard that. So, the host-level
PMI handler for the perf subsystem will check the variable before
proceeding.

>
> In general it will be better if the switch is done in vcpu thread
> sched-out/sched-in, unless there is a requirement to profile the kvm
> hypervisor. Even if there is such a requirement, it is only one
> option. In most conditions, it will be better if the VM exit time is
> small.
>

Performance wise, agreed, but there will be debate on the loss of perf
functionality at the host level.

Maybe (just maybe) it is possible to do the PMU context switch at the
vcpu boundary normally, but do it at the VM Enter/Exit boundary when the
host is profiling the KVM kernel module. So, dynamically adjusting the
PMU context switch location could be an option.
>
> >
> >
> >>
> >> Regards
> >> Bibo Mao
> >>>
> >>>
> >>>>
> >>>>>
> >>>>>> Can we add a callback handler in structure kvm_guest_cbs? Just
> >>>>>> like this:
> >>>>>> @@ -6403,6 +6403,7 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
> >>>>>>         .state                  = kvm_guest_state,
> >>>>>>         .get_ip                 = kvm_guest_get_ip,
> >>>>>>         .handle_intel_pt_intr   = NULL,
> >>>>>> +       .lose_pmu               = kvm_guest_lose_pmu,
> >>>>>>  };
> >>>>>>
> >>>>>> By the way, I do not know whether the callback handler should be
> >>>>>> triggered in perf core or in the specific pmu hw driver. In the
> >>>>>> ARM pmu hw driver, it is triggered in the pmu hw driver, such as
> >>>>>> function kvm_vcpu_pmu_resync_el0, but I think it will be better
> >>>>>> if it is done in perf core.
> >>>>>
> >>>>> I don't think we want to take the approach of perf and KVM guests
> >>>>> "fighting" over the PMU. That's effectively what we have today,
> >>>>> and it's a mess for KVM because it's impossible to provide
> >>>>> consistent, deterministic behavior for the guest. And it's just
> >>>>> as messy for perf, which ends up having weird, cumbersome flows
> >>>>> that exist purely to try to play nice with KVM.
> >>>> With the existing pmu core code, in a tick timer interrupt or IPI
> >>>> function call interrupt, the pmu hw may be accessed by the host
> >>>> when the VM is running and the pmu is already granted to the
> >>>> guest. KVM can not intercept host IPI/timer interrupts, so there
> >>>> is no pmu context switch and there will be a problem.
> >>>>
> >>>> Regards
> >>>> Bibo Mao
> >>>>
> >>
>