Subject: Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of
 PMU state for Intel CPU
To: "Mi, Dapeng", Sean Christopherson
Cc: Mingwei Zhang, Xiong Zhang, pbonzini@redhat.com, peterz@infradead.org,
 kan.liang@intel.com, zhenyuw@linux.intel.com, jmattson@google.com,
 kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, zhiyuan.lv@intel.com, eranian@google.com,
 irogers@google.com, samantha.alt@intel.com, like.xu.linux@gmail.com,
 chao.gao@intel.com
References: <18b19dd4-6d76-4ed8-b784-32436ab93d06@linux.intel.com>
 <4c47b975-ad30-4be9-a0a9-f0989d1fa395@linux.intel.com>
 <737f0c66-2237-4ed3-8999-19fe9cca9ecc@linux.intel.com>
 <4d60384a-11e0-2f2b-a568-517b40c91b25@loongson.cn>
 <1ec7a21c-71d0-4f3e-9fa3-3de8ca0f7315@linux.intel.com>
From: maobibo
Message-ID: <5f27b793-b19e-d429-190c-1c20a6d1c649@loongson.cn>
Date: Tue, 23 Apr 2024 11:26:12 +0800
User-Agent: Mozilla/5.0 (X11; Linux loongarch64;
 rv:68.0) Gecko/20100101 Thunderbird/68.7.0
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <1ec7a21c-71d0-4f3e-9fa3-3de8ca0f7315@linux.intel.com>

On 2024/4/23 11:13 AM, Mi, Dapeng wrote:
>
> On 4/23/2024 10:53 AM, maobibo wrote:
>>
>>
>> On 2024/4/23 10:44 AM, Mi, Dapeng wrote:
>>>
>>> On 4/23/2024 9:01 AM, maobibo wrote:
>>>>
>>>>
>>>> On 2024/4/23 1:01 AM, Sean Christopherson wrote:
>>>>> On Mon, Apr 22, 2024, maobibo wrote:
>>>>>> On 2024/4/16 6:45 AM, Sean Christopherson wrote:
>>>>>>> On Mon, Apr
>>>>>>> 15, 2024, Mingwei Zhang wrote:
>>>>>>>> On Mon, Apr 15, 2024 at 10:38 AM Sean Christopherson wrote:
>>>>>>>>> One of my biggest complaints with the current vPMU code is that
>>>>>>>>> the roles and responsibilities between KVM and perf are poorly
>>>>>>>>> defined, which leads to suboptimal and hard to maintain code.
>>>>>>>>>
>>>>>>>>> Case in point, I'm pretty sure leaving guest values in PMCs
>>>>>>>>> _would_ leak guest state to userspace processes that have RDPMC
>>>>>>>>> permissions, as the PMCs might not be dirty from perf's
>>>>>>>>> perspective (see perf_clear_dirty_counters()).
>>>>>>>>>
>>>>>>>>> Blindly clearing PMCs in KVM "solves" that problem, but in doing
>>>>>>>>> so makes the overall code brittle because it's not clear whether
>>>>>>>>> KVM _needs_ to clear PMCs, or if KVM is just being paranoid.
>>>>>>>>
>>>>>>>> So once this rolls out, perf and vPMU are clients directly to PMU
>>>>>>>> HW.
>>>>>>>
>>>>>>> I don't think this is a statement we want to make, as it opens a
>>>>>>> discussion that we won't win.  Nor do I think it's one we *need* to
>>>>>>> make.  KVM doesn't need to be on equal footing with perf in terms
>>>>>>> of owning/managing PMU hardware, KVM just needs a few APIs to allow
>>>>>>> faithfully and accurately virtualizing a guest PMU.
>>>>>>>
>>>>>>>> Faithful cleaning (blind cleaning) has to be the baseline
>>>>>>>> implementation, until both clients agree to a "deal" between them.
>>>>>>>> Currently, there is no such deal, but I believe we could have one
>>>>>>>> via future discussion.
>>>>>>>
>>>>>>> What I am saying is that there needs to be a "deal" in place before
>>>>>>> this code is merged.  It doesn't need to be anything fancy, e.g.
>>>>>>> perf can still pave over PMCs it doesn't immediately load, as
>>>>>>> opposed to using cpu_hw_events.dirty to lazily do the clearing.
>>>>>>> But perf and KVM need to work together from the get go, i.e. I
>>>>>>> don't want KVM doing something without regard to what perf does,
>>>>>>> and vice versa.
>>>>>>>
>>>>>> There is a similar issue on LoongArch vPMU, where the VM can
>>>>>> directly access the PMU hardware and the PMU HW is shared between
>>>>>> guest and host. Besides context switch, there are other places
>>>>>> where the perf core will access the PMU HW, such as tick
>>>>>> timer/hrtimer/IPI function call, and KVM can only intercept context
>>>>>> switch.
>>>>>
>>>>> Two questions:
>>>>>
>>>>>   1) Can KVM prevent the guest from accessing the PMU?
>>>>>
>>>>>   2) If so, can KVM grant partial access to the PMU, or is it all
>>>>>      or nothing?
>>>>>
>>>>> If the answer to both questions is "yes", then it sounds like
>>>>> LoongArch *requires* mediated/passthrough support in order to
>>>>> virtualize its PMU.
>>>>
>>>> Hi Sean,
>>>>
>>>> Thanks for your quick response.
>>>>
>>>> Yes, KVM can prevent the guest from accessing the PMU, and it can
>>>> grant partial or full access to the PMU. However, once a PMU event
>>>> is granted to the VM, the host cannot access this PMU event any
>>>> more. There must be a PMU event switch if the host wants to.
>>>
>>> PMU event is a software entity which won't be shared. Did you mean if
>>> a PMU HW counter is granted to VM, then Host can't access the PMU HW
>>> counter, right?
>> Yes, if a PMU HW counter/control is granted to the VM. The value comes
>> from the guest and is not meaningful for the host.  The host PMU core
>> does not know that it is granted to the VM; the host still thinks that
>> it owns the PMU.
>
> That's one issue this patchset tries to solve. The current new mediated
> x86 vPMU framework doesn't allow the host and guest to own the PMU HW
> resource simultaneously. Only when there is no !exclude_guest event on
> host, guest is allowed to exclusively own the PMU HW resource.
>
>
>>
>> Just like the FPU registers, it is shared by VM and host at different
>> times and it is lazily switched.
>> But if an IPI or timer interrupt uses the FPU registers on the host,
>> there will be the same issue.
>
> I didn't fully get your point. When an IPI or timer interrupt arrives, a
> VM-exit is triggered to make the CPU trap into the host first, and then
> the host interrupt handler is called. Or are you complaining about the
> execution sequence of switching guest PMU MSRs and these interrupt
> handlers?

It is not necessary to save/restore the PMU HW at every VM exit. It would
be better to save/restore it lazily, e.g. only when the vCPU thread is
scheduled out/in; otherwise the cost will be somewhat expensive.

I know little about the perf core. However, the PMU HW is also accessed
in interrupt context. That means PMU HW accesses in normal (non-interrupt)
context should be done with IRQs disabled, otherwise there may be nested
PMU HW accesses. Is that true?

>
>
>>
>> Regards
>> Bibo Mao
>>>
>>>
>>>>
>>>>>
>>>>>> Can we add a callback handler in structure kvm_guest_cbs?  Just
>>>>>> like this:
>>>>>> @@ -6403,6 +6403,7 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
>>>>>>          .state                  = kvm_guest_state,
>>>>>>          .get_ip                 = kvm_guest_get_ip,
>>>>>>          .handle_intel_pt_intr   = NULL,
>>>>>> +       .lose_pmu               = kvm_guest_lose_pmu,
>>>>>>   };
>>>>>>
>>>>>> By the way, I do not know whether the callback handler should be
>>>>>> triggered in the perf core or in the specific PMU HW driver. In the
>>>>>> ARM PMU HW driver, it is triggered in the PMU HW driver, such as in
>>>>>> function kvm_vcpu_pmu_resync_el0, but I think it will be better if
>>>>>> it is done in the perf core.
>>>>>
>>>>> I don't think we want to take the approach of perf and KVM guests
>>>>> "fighting" over the PMU.  That's effectively what we have today, and
>>>>> it's a mess for KVM because it's impossible to provide consistent,
>>>>> deterministic behavior for the guest.  And it's just as messy for
>>>>> perf, which ends up having weird, cumbersome flows that exist purely
>>>>> to try to play nice with KVM.
>>>> With the existing PMU core code, in a tick timer interrupt or an IPI
>>>> function call interrupt, the PMU HW may be accessed by the host while
>>>> the VM is running and the PMU is already granted to the guest. KVM
>>>> cannot intercept host IPI/timer interrupts, so there is no PMU
>>>> context switch and there will be a problem.
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>
>>
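
To make the cost argument about lazy save/restore concrete, here is a
minimal user-space sketch, not kernel code. Every name in it (pmu_hw,
vcpu_pmu, pmu_load_guest, pmu_put_guest, run_slice) is hypothetical and
only models the bookkeeping; it assumes a PMU with four plain counters
and ignores control registers, overflow and interrupts. It counts how
many full counter switches happen in one scheduling slice when switching
at every VM entry/exit versus only at sched-in/sched-out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_COUNTERS 4

/* Hypothetical stand-in for the physical PMU counter registers. */
struct pmu_hw {
	uint64_t counter[NR_COUNTERS];
};

/* Hypothetical per-vCPU bookkeeping; not a real KVM structure. */
struct vcpu_pmu {
	uint64_t guest[NR_COUNTERS];	/* guest values while not loaded */
	uint64_t host[NR_COUNTERS];	/* host values while guest is loaded */
	bool guest_loaded;
	unsigned long nr_switches;	/* number of full counter switches */
};

/* Save host counters and load guest counters, unless already loaded. */
static void pmu_load_guest(struct pmu_hw *hw, struct vcpu_pmu *v)
{
	if (v->guest_loaded)
		return;
	for (int i = 0; i < NR_COUNTERS; i++) {
		v->host[i] = hw->counter[i];
		hw->counter[i] = v->guest[i];
	}
	v->guest_loaded = true;
	v->nr_switches++;
}

/* Save guest counters and restore host counters, unless already done. */
static void pmu_put_guest(struct pmu_hw *hw, struct vcpu_pmu *v)
{
	if (!v->guest_loaded)
		return;
	for (int i = 0; i < NR_COUNTERS; i++) {
		v->guest[i] = hw->counter[i];
		hw->counter[i] = v->host[i];
	}
	v->guest_loaded = false;
	v->nr_switches++;
}

/*
 * Run one scheduling slice containing nr_exits guest entries/exits and
 * return the number of counter switches it needed.
 * Eager: switch around every VM entry/exit.
 * Lazy:  switch only at sched-in and sched-out.
 */
static unsigned long run_slice(struct pmu_hw *hw, struct vcpu_pmu *v,
			       bool lazy, int nr_exits)
{
	unsigned long before = v->nr_switches;

	if (lazy)
		pmu_load_guest(hw, v);		/* sched-in */
	for (int i = 0; i < nr_exits; i++) {
		if (!lazy)
			pmu_load_guest(hw, v);	/* VM entry */
		assert(v->guest_loaded);
		hw->counter[0]++;		/* guest counts one event */
		if (!lazy)
			pmu_put_guest(hw, v);	/* VM exit */
	}
	if (lazy)
		pmu_put_guest(hw, v);		/* sched-out */
	return v->nr_switches - before;
}
```

In this model the eager variant needs two switches per VM exit, while the
lazy variant needs two per slice regardless of the number of exits. The
trade-off is that in the lazy variant the hardware holds guest values
between VM exits, so any host access from tick/IPI context in that window
would see guest state, which is the concern raised above.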