From: "Liang, Kan"
Date: Thu, 25 Apr 2024 12:13:50 -0400
Subject: Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU state for Intel CPU
To: Mingwei Zhang, "Mi, Dapeng"
Cc: Sean Christopherson, maobibo, Xiong Zhang, pbonzini@redhat.com,
 peterz@infradead.org, kan.liang@intel.com, zhenyuw@linux.intel.com,
 jmattson@google.com, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, zhiyuan.lv@intel.com, eranian@google.com,
 irogers@google.com, samantha.alt@intel.com, like.xu.linux@gmail.com,
 chao.gao@intel.com
Message-ID: <6af2da05-cb47-46f7-b129-08463bc9469b@linux.intel.com>
References: <1ec7a21c-71d0-4f3e-9fa3-3de8ca0f7315@linux.intel.com>
 <5279eabc-ca46-ee1b-b80d-9a511ba90a36@loongson.cn>
 <7834a811-4764-42aa-8198-55c4556d947b@linux.intel.com>

On 2024-04-25 12:24 a.m., Mingwei Zhang wrote:
> On Wed, Apr 24, 2024 at 8:56 PM Mi, Dapeng wrote:
>>
>>
>> On 4/24/2024 11:00 PM, Sean Christopherson wrote:
>>> On Wed, Apr 24, 2024, Dapeng Mi wrote:
>>>> On 4/24/2024 1:02 AM, Mingwei Zhang wrote:
>>>>>>> Maybe, (just maybe), it is possible to do PMU context switch at vcpu
>>>>>>> boundary normally, but doing it at VM Enter/Exit boundary when host is
>>>>>>> profiling KVM kernel module. So, dynamically adjusting PMU context
>>>>>>> switch location could be an option.
>>>>>> If there are two VMs with pmu enabled both, however host PMU is not
>>>>>> enabled. PMU context switch should be done in vcpu thread sched-out path.
>>>>>>
>>>>>> If host pmu is used also, we can choose whether PMU switch should be
>>>>>> done in vm exit path or vcpu thread sched-out path.
>>>>>>
>>>>> host PMU is always enabled, ie., Linux currently does not support KVM
>>>>> PMU running standalone. I guess what you mean is there are no active
>>>>> perf_events on the host side. Allowing a PMU context switch drifting
>>>>> from vm-enter/exit boundary to vcpu loop boundary by checking host
>>>>> side events might be a good option. We can keep the discussion, but I
>>>>> won't propose that in v2.
>>>> I suspect if it's really doable to do this deferring.
>>>> This still makes host
>>>> lose the most of capability to profile KVM. Per my understanding, most of
>>>> KVM overhead happens in the vcpu loop, exactly speaking in VM-exit handling.
>>>> We have no idea when host want to create perf event to profile KVM, it could
>>>> be at any time.
>>> No, the idea is that KVM will load host PMU state asap, but only when host PMU
>>> state actually needs to be loaded, i.e. only when there are relevant host events.
>>>
>>> If there are no host perf events, KVM keeps guest PMU state loaded for the entire
>>> KVM_RUN loop, i.e. provides optimal behavior for the guest. But if a host perf
>>> events exists (or comes along), the KVM context switches PMU at VM-Enter/VM-Exit,
>>> i.e. lets the host profile almost all of KVM, at the cost of a degraded experience
>>> for the guest while host perf events are active.
>>
>> I see. So KVM needs to provide a callback which needs to be called in
>> the IPI handler. The KVM callback needs to be called to switch PMU state
>> before perf really enabling host event and touching PMU MSRs. And only
>> the perf event with exclude_guest attribute is allowed to create on
>> host. Thanks.
>
> Do we really need a KVM callback?

I think that is one option.

>
> Immediately after VMEXIT, KVM will check whether there are "host perf
> events". If so, do the PMU context switch immediately. Otherwise, keep
> deferring the context switch to the end of vPMU loop.
>
> Detecting if there are "host perf events" would be interesting. The
> "host perf events" refer to the perf_events on the host that are
> active and assigned with HW counters and that are saved when context
> switching to the guest PMU. I think getting those events could be done
> by fetching the bitmaps in cpuc.

The cpuc is an arch-specific structure. I don't think it can be accessed
from generic code. You would have to implement arch-specific functions to
fetch the bitmaps, and it's probably not worth it.

You may check the pinned_groups and flexible_groups to understand whether
there are host perf events which may be scheduled at VM-exit. But that
will not tell you the idx of the counters, which is only known once the
host event is actually scheduled.

> I have to look into the details. But
> at the time of VMEXIT, kvm should already have that information, so it
> can immediately decide whether to do the PMU context switch or not.
>
> oh, but when the control is executing within the run loop, a
> host-level profiling starts, say 'perf record -a ...', it will
> generate an IPI to all CPUs. Maybe that's when we need a callback so
> the KVM guest PMU context gets preempted for the host-level profiling.
> Gah..
>
> hmm, not a fan of that. That means the host can poke the guest PMU
> context at any time and cause higher overhead. But I admit it is much
> better than the current approach.
>
> The only thing is that: any command like 'perf record/stat -a' shot in
> dark corners of the host can preempt guest PMUs of _all_ running VMs.
> So, to alleviate that, maybe a module parameter that disables this
> "preemption" is possible? This should fit scenarios where we don't
> want guest PMU to be preempted outside of the vCPU loop?
>

It should not happen. In the current implementation, perf rejects the
creation of any !exclude_guest system-wide event while a guest with a
vPMU is running.

However, it's possible to create an exclude_guest system-wide event at
any time. So KVM cannot use information available at VM-entry to decide
whether there will be active host perf events at VM-exit.
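A rough, untested sketch of the VM-exit decision being discussed (the
helper names below are made up purely for illustration and do not exist
in the tree or in the RFC series):

    /*
     * Illustrative only: at VM-exit, restore the host PMU immediately
     * when active exclude_guest host events want the counters back,
     * otherwise keep the guest PMU loaded until the vcpu_run() boundary.
     */
    static void handle_pmu_on_vmexit(struct kvm_vcpu *vcpu)
    {
            if (perf_has_active_host_events())      /* hypothetical query */
                    kvm_pmu_switch_to_host(vcpu);    /* hypothetical: save guest, load host */
            else
                    kvm_pmu_defer_switch(vcpu);      /* hypothetical: defer to vcpu loop exit */
    }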
The perf_guest_exit() will reload the host state. It's impossible to
save the guest state after that. We may need a KVM callback, so perf
can tell KVM whether to save the guest state before perf reloads the
host state.

Thanks,
Kan

>>
>>
>>>
>>> My original sketch: https://lore.kernel.org/all/ZR3eNtP5IVAHeFNC@google.com
>
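[Illustrative sketch of the callback ordering Kan describes above; none
of these symbols exist in the current tree, and this is not code from
the RFC series -- it only shows that perf would have to give KVM a
chance to save the guest PMU state before perf reloads the host state.]

    struct kvm_pmu_guest_hooks {
            void (*save_guest_state)(void);         /* registered by KVM */
    };

    static struct kvm_pmu_guest_hooks *kvm_hooks;    /* hypothetical registration */

    static void guest_exit_pmu_sketch(void)
    {
            /* 1) Let KVM stash the guest counters/MSRs while they are live. */
            if (kvm_hooks && kvm_hooks->save_guest_state)
                    kvm_hooks->save_guest_state();

            /* 2) Only then reload the host PMU state. */
            reload_host_pmu_state();                 /* placeholder for the host reload path */
    }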