From: Adrian Hunter
To: Peter Zijlstra
Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Jiri Olsa,
    linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, x86@kernel.org, kvm@vger.kernel.org,
    H Peter Anvin, Mathieu Poirier, Suzuki K Poulose, Leo Yan
Subject: [PATCH V2 03/11] perf/x86: Add support for TSC in nanoseconds as a perf event clock
Date: Mon, 14 Feb 2022 13:09:06 +0200
Message-Id: <20220214110914.268126-4-adrian.hunter@intel.com>
In-Reply-To: <20220214110914.268126-1-adrian.hunter@intel.com>
References: <20220214110914.268126-1-adrian.hunter@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Currently, when Intel PT is used within a VM guest, it is not possible to
make use of TSC because perf clock is subject to paravirtualization.
If the hypervisor leaves rdtsc alone, the TSC value will be subject only to
the VMCS TSC Offset and Scaling, the same as the TSC packet from Intel PT.
The new clock is based on rdtsc and not subject to paravirtualization.

Hence it would be possible to use this new clock for Intel PT decoding
within a VM guest.

Signed-off-by: Adrian Hunter
---
 arch/x86/events/core.c            | 41 ++++++++++++++++++++-----------
 arch/x86/include/asm/perf_event.h |  2 ++
 include/uapi/linux/perf_event.h   |  6 +++++
 kernel/events/core.c              |  6 +++++
 4 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 51d5345de30a..905975a7d475 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "perf_event.h"
 
@@ -2728,18 +2729,26 @@ void arch_perf_update_userpage(struct perf_event *event,
 		!!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT);
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
-	if (event->attr.use_clockid &&
-	    event->attr.ns_clockid &&
-	    event->attr.clockid == CLOCK_PERF_HW_CLOCK) {
-		userpg->cap_user_time_zero = 1;
-		userpg->time_mult = 1;
-		userpg->time_shift = 0;
-		userpg->time_offset = 0;
-		userpg->time_zero = 0;
-		return;
+	if (event->attr.use_clockid && event->attr.ns_clockid) {
+		if (event->attr.clockid == CLOCK_PERF_HW_CLOCK) {
+			userpg->cap_user_time_zero = 1;
+			userpg->time_mult = 1;
+			userpg->time_shift = 0;
+			userpg->time_offset = 0;
+			userpg->time_zero = 0;
+			return;
+		}
+		if (event->attr.clockid == CLOCK_PERF_HW_CLOCK_NS)
+			userpg->cap_user_time_zero = 1;
+	}
+
+	if (using_native_sched_clock() && sched_clock_stable()) {
+		userpg->cap_user_time = 1;
+		if (!event->attr.use_clockid)
+			userpg->cap_user_time_zero = 1;
 	}
 
-	if (!using_native_sched_clock() || !sched_clock_stable())
+	if (!userpg->cap_user_time && !userpg->cap_user_time_zero)
 		return;
 
 	cyc2ns_read_begin(&data);
@@ -2750,19 +2759,16 @@ void arch_perf_update_userpage(struct perf_event *event,
 	 * Internal timekeeping for enabled/running/stopped times
 	 * is always in the local_clock domain.
 	 */
-	userpg->cap_user_time = 1;
 	userpg->time_mult = data.cyc2ns_mul;
 	userpg->time_shift = data.cyc2ns_shift;
 	userpg->time_offset = offset - now;
 
 	/*
 	 * cap_user_time_zero doesn't make sense when we're using a different
-	 * time base for the records.
+	 * time base for the records, except for CLOCK_PERF_HW_CLOCK_NS.
 	 */
-	if (!event->attr.use_clockid) {
-		userpg->cap_user_time_zero = 1;
+	if (userpg->cap_user_time_zero)
 		userpg->time_zero = offset;
-	}
 
 	cyc2ns_read_end();
 }
@@ -2996,6 +3002,11 @@ u64 perf_hw_clock(void)
 	return rdtsc_ordered();
 }
 
+u64 perf_hw_clock_ns(void)
+{
+	return native_sched_clock_from_tsc(perf_hw_clock());
+}
+
 void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 {
 	cap->version = x86_pmu.version;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 5288ea1ae2ba..46cbca90cdd1 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -453,6 +453,8 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 
 extern u64 perf_hw_clock(void);
 #define perf_hw_clock		perf_hw_clock
+extern u64 perf_hw_clock_ns(void);
+#define perf_hw_clock_ns	perf_hw_clock_ns
 
 #include
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e8617efd552b..0edc005f8ddf 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -298,6 +298,12 @@ enum {
  * paravirtualized.
  */
 #define CLOCK_PERF_HW_CLOCK		0x10000000
+/*
+ * Same as CLOCK_PERF_HW_CLOCK but in nanoseconds. Note support of
+ * CLOCK_PERF_HW_CLOCK_NS does not necessarily imply support of
+ * CLOCK_PERF_HW_CLOCK or vice versa.
+ */
+#define CLOCK_PERF_HW_CLOCK_NS	0x10000001
 
 /*
  * The format of the data returned by read() on a perf event fd,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 15dee265a5b9..65e70fb669fd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12019,6 +12019,12 @@ static int perf_event_set_clock(struct perf_event *event, clockid_t clk_id, bool
 		event->clock = &perf_hw_clock;
 		nmi_safe = true;
 		break;
+#endif
+#ifdef perf_hw_clock_ns
+	case CLOCK_PERF_HW_CLOCK_NS:
+		event->clock = &perf_hw_clock_ns;
+		nmi_safe = true;
+		break;
 #endif
 	default:
 		return -EINVAL;
-- 
2.25.1
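
For illustration only, here is a minimal user-space sketch of how a decoder
might use the time_zero, time_mult and time_shift values published in the
user page when cap_user_time_zero is set. The forward conversion follows the
formula documented next to cap_user_time_zero in
include/uapi/linux/perf_event.h. The inverse is an assumption about how a
tool could map a perf timestamp back to TSC. struct tsc_conv and both
function names are hypothetical and not part of this patch.

```c
#include <stdint.h>

/*
 * Hypothetical container for the conversion fields that the kernel
 * publishes in struct perf_event_mmap_page (time_zero, time_mult,
 * time_shift). Not a kernel or perf tools type.
 */
struct tsc_conv {
	uint64_t time_zero;
	uint32_t time_mult;
	uint16_t time_shift;
};

/*
 * TSC cycles -> perf time in nanoseconds.
 * The value is split into quotient and remainder so that the
 * multiplication does not overflow 64 bits for large cycle counts.
 */
static inline uint64_t tsc_to_perf_ns(const struct tsc_conv *c, uint64_t cyc)
{
	uint64_t quot = cyc >> c->time_shift;
	uint64_t rem = cyc & (((uint64_t)1 << c->time_shift) - 1);

	return c->time_zero + quot * c->time_mult +
	       ((rem * c->time_mult) >> c->time_shift);
}

/*
 * Perf time in nanoseconds -> TSC cycles (approximate inverse).
 * Integer division rounds down, so a round trip can lose a cycle.
 * Assumes time_mult is non-zero and ns >= time_zero.
 */
static inline uint64_t perf_ns_to_tsc(const struct tsc_conv *c, uint64_t ns)
{
	uint64_t t = ns - c->time_zero;
	uint64_t quot = t / c->time_mult;
	uint64_t rem = t % c->time_mult;

	return (quot << c->time_shift) +
	       (rem << c->time_shift) / c->time_mult;
}
```

With time_mult = 1, time_shift = 0 and time_zero = 0, the values the hunk
above sets for CLOCK_PERF_HW_CLOCK, both functions reduce to the identity,
so timestamps are raw TSC values. For CLOCK_PERF_HW_CLOCK_NS the cyc2ns
multiplier and shift apply instead.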