Date: Mon, 10 Jun 2024 22:54:05 +0200
From: Peter Zijlstra
To: Stephane Eranian
Cc: LKML, Ian Rogers, "Liang, Kan", Andi Kleen, Ingo Molnar,
	"Narayan, Ananth", "Bangoria, Ravikumar", Namhyung Kim,
	Mingwei Zhang, Dapeng Mi, Zhang Xiong
Subject: Re: [RFC] perf_events: exclude_guest impact on time_enabled/time_running
Message-ID: <20240610205405.GA8774@noisy.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 06, 2024 at 12:57:35AM -0700, Stephane Eranian wrote:
> Hi Peter,
>
> In the context of the new vPMU passthru patch series, we have to look
> closer at the definition and implementation of the exclude_guest
> filter in the perf_event_attr structure. This filter has been in the
> kernel for many years. See patch:
> https://lore.kernel.org/all/20240506053020.3911940-8-mizhang@google.com/
>
> The presumed definition of the filter is that the user does not want
> the event to count while the processor is running in guest mode (i.e.,
> inside the virtual machine guest OS or guest user code).
>
> The perf tool sets it by default on all core PMU events:
> $ perf stat -vv -e cycles sleep 0
> ------------------------------------------------------------
> perf_event_attr:
>   size                             112
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
>
> In the kernel, the way this is treated differs between AMD and Intel
> because AMD does provide a hardware filter for guest vs. host in the
> PMU counters whereas Intel does not.
> For the latter, the kernel simply disables the event in the hardware
> counters, i.e., the event is not descheduled. Both approaches produce
> pretty much the same desired effect: the event is not counted while in
> guest mode.
>
> The issue I would like to raise has to do with the effects on
> time_enabled and time_running for exclude_guest=1 events.
>
> Given the event is not scheduled out while in guest mode, even though
> it is stopped, both time_enabled and time_running continue ticking
> while in guest mode. If a measurement is 10s long but only 5s are in
> non-guest mode, then time_enabled=10s, time_running=10s. The count
> represents 10s worth of non-guest mode, of which only 5s were really
> actively monitoring, but the user has no way of determining this.
>
> If we look at vPMU passthru, the host event must have exclude_guest=1
> to avoid going into an error state on context switch to the vCPU
> thread (with vPMU enabled). But this time, the event is scheduled out,
> which means that time_enabled keeps counting, but time_running stops.
> On context switch back in, the host event is scheduled again and
> time_running restarts ticking. For a 10s measurement, where 5s were in
> the guest, the event will come out with time_enabled=10s,
> time_running=5s, and the tool will scale it up because it thinks the
> event was multiplexed, when in fact it was not. This is not the
> intended outcome here. The tool should not scale the count, it was not
> multiplexed, it was descheduled because the filter forced it out.
> Note that if the event had been multiplexed while running on the host,
> then the scaling would be appropriate.
>
> In that case, I argue, time_running should be updated to cover the
> time the event was not running. That would bring us back to the case I
> was describing earlier.
>
> It boils down to the exact definition of exclude_guest and expected
> impact on time_enabled and time_running.
> Then, with or without vPMU passthru, we can fix the kernel to ensure
> a uniform behavior.
>
> What are your thoughts on this problem?

So with those patches having explicit scheduling points, we can actually
do this time accounting accurately, so I don't see a reason to not do
the right thing here.

Hysterically this was left vague in order to be able to avoid the
scheduling for these scenarios -- performance raisins etc.

The thing is, if you push this to its limits, we should start time
accounting for the ring selectors too, and that's going to be painful.
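
For reference, the scaling issue from the quoted mail can be written out
as a small sketch. This is not kernel or perf tool code: the function
name scaled_count and the raw count of 1000 are made up for
illustration, and the only thing assumed about the tool is that it
scales a count by time_enabled / time_running to compensate for
multiplexing.

```python
def scaled_count(raw_count, time_enabled, time_running):
    """Scale a raw count by time_enabled / time_running.

    Hypothetical helper, illustrating the multiplexing compensation
    a tool applies when time_running < time_enabled.
    """
    if time_running == 0:
        # The event never ran, so there is nothing to scale.
        return 0.0
    return raw_count * time_enabled / time_running


# Hypothetical measurement: 10s long, 5s of it in guest mode,
# 1000 events counted while on the host.
raw = 1000

# Event stopped but not scheduled out in guest mode:
# time_enabled=10s, time_running=10s, so no scaling happens.
stopped_in_guest = scaled_count(raw, 10.0, 10.0)

# vPMU passthru, event scheduled out in guest mode:
# time_enabled=10s, time_running=5s, so the count is doubled.
scheduled_out = scaled_count(raw, 10.0, 5.0)

print(stopped_in_guest, scheduled_out)
```

The same 5s of host-side counting is reported as-is in the first case
and doubled in the second, which is the inconsistency being discussed.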