From: Reinette Chatre
To: tglx@linutronix.de, fenghua.yu@intel.com, tony.luck@intel.com, vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com, jithu.joseph@intel.com, dave.hansen@intel.com, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org, Reinette Chatre
Subject: [PATCH V7 37/41] x86/intel_rdt: More precise L2 hit/miss measurements
Date: Fri, 22 Jun 2018 15:42:28 -0700
Message-Id: <06b1456da65b543479dac8d9493e41f92f175d6c.1529706536.git.reinette.chatre@intel.com>

Intel Goldmont processors support non-architectural precise events that
can be used to give us more insight into the success of L2 cache
pseudo-locking on these platforms.

Introduce a new measurement trigger that enables two precise events,
MEM_LOAD_UOPS_RETIRED.L2_HIT and MEM_LOAD_UOPS_RETIRED.L2_MISS, while
accessing pseudo-locked data. Writing 1 to the debugfs trigger still runs
the existing latency measurement, writing 2 runs the new L2 hit/miss
measurement, and any other value is rejected with -EINVAL. A new
tracepoint, pseudo_lock_l2, is created to make these results visible to
the user.

Signed-off-by: Reinette Chatre
Signed-off-by: Thomas Gleixner
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/8ac0d22a62d419266bcc26fdd0e6c4fb6d320d5a.1527593971.git.reinette.chatre@intel.com
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c   | 145 ++++++++++++++++--
 .../kernel/cpu/intel_rdt_pseudo_lock_event.h  |  10 ++
 2 files changed, 146 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 652c95ab51c8..acaec07134c7 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 #include "intel_rdt.h"
 
 #define CREATE_TRACE_POINTS
@@ -60,6 +61,9 @@ static struct class *pseudo_lock_class;
  * hardware prefetch disable bits are included here as they are documented
  * in the SDM.
  *
+ * When adding a platform here also add support for its cache events to
+ * measure_cycles_perf_fn().
+ *
  * Return:
  * If platform is supported, the bits to disable hardware prefetchers, 0
  * if platform is not supported.
@@ -98,6 +102,16 @@ static u64 get_prefetch_disable_bits(void)
 	return 0;
 }
 
+/*
+ * Helper to write a 64-bit value to an MSR without tracing. Used when
+ * use of the cache should be restricted and use of registers used
+ * for local variables avoided.
+ */
+static inline void pseudo_wrmsrl_notrace(unsigned int msr, u64 val)
+{
+	__wrmsr(msr, (u32)(val & 0xffffffffULL), (u32)(val >> 32));
+}
+
 /**
  * pseudo_lock_minor_get - Obtain available minor number
  * @minor: Pointer to where new minor number will be stored
@@ -831,6 +845,107 @@ static int measure_cycles_lat_fn(void *_plr)
 	return 0;
 }
 
+static int measure_cycles_perf_fn(void *_plr)
+{
+	struct pseudo_lock_region *plr = _plr;
+	unsigned long long l2_hits, l2_miss;
+	u64 l2_hit_bits, l2_miss_bits;
+	u64 i;
+#ifdef CONFIG_KASAN
+	/*
+	 * The registers used for local register variables are also used
+	 * when KASAN is active. When KASAN is active we use regular variables
+	 * at the cost of including cache access latency to these variables
+	 * in the measurements.
+	 */
+	unsigned int line_size;
+	unsigned int size;
+	void *mem_r;
+#else
+	register unsigned int line_size asm("esi");
+	register unsigned int size asm("edi");
+#ifdef CONFIG_X86_64
+	register void *mem_r asm("rbx");
+#else
+	register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+	/*
+	 * Non-architectural event for the Goldmont Microarchitecture
+	 * from Intel x86 Architecture Software Developer Manual (SDM):
+	 * MEM_LOAD_UOPS_RETIRED D1H (event number)
+	 * Umask values:
+	 *     L1_HIT   01H
+	 *     L2_HIT   02H
+	 *     L1_MISS  08H
+	 *     L2_MISS  10H
+	 */
+
+	/*
+	 * Start by setting flags for IA32_PERFEVTSELx:
+	 *     OS  (Operating system mode)  0x2
+	 *     INT (APIC interrupt enable)  0x10
+	 *     EN  (Enable counter)         0x40
+	 *
+	 * Then add the Umask value and event number to select performance
+	 * event.
+	 */
+
+	switch (boot_cpu_data.x86_model) {
+	case INTEL_FAM6_ATOM_GOLDMONT:
+	case INTEL_FAM6_ATOM_GEMINI_LAKE:
+		l2_hit_bits = (0x52ULL << 16) | (0x2 << 8) | 0xd1;
+		l2_miss_bits = (0x52ULL << 16) | (0x10 << 8) | 0xd1;
+		break;
+	default:
+		goto out;
+	}
+
+	local_irq_disable();
+	/*
+	 * Call wrmsr directly to prevent the local register variables from
+	 * being overwritten due to reordering of their assignment with
+	 * the wrmsr calls.
+	 */
+	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	/* Disable events and reset counters */
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, 0x0);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x0);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0, 0x0);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 1, 0x0);
+	/* Set and enable the L2 counters */
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, l2_hit_bits);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, l2_miss_bits);
+	mem_r = plr->kmem;
+	size = plr->size;
+	line_size = plr->line_size;
+	for (i = 0; i < size; i += line_size) {
+		asm volatile("mov (%0,%1,1), %%eax\n\t"
+			     :
+			     : "r" (mem_r), "r" (i)
+			     : "%eax", "memory");
+	}
+	/*
+	 * Call wrmsr directly (no tracing) so that the cache access
+	 * counters are not influenced while they are being disabled.
+	 */
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0,
+			      l2_hit_bits & ~(0x40ULL << 16));
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1,
+			      l2_miss_bits & ~(0x40ULL << 16));
+	l2_hits = native_read_pmc(0);
+	l2_miss = native_read_pmc(1);
+	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	local_irq_enable();
+	trace_pseudo_lock_l2(l2_hits, l2_miss);
+
+out:
+	plr->thread_done = 1;
+	wake_up_interruptible(&plr->lock_thread_wq);
+	return 0;
+}
+
 /**
  * pseudo_lock_measure_cycles - Trigger latency measure to pseudo-locked region
  *
@@ -841,12 +956,12 @@ static int measure_cycles_lat_fn(void *_plr)
  *
  * Return: 0 on success, <0 on failure
  */
-static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp)
+static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp, int sel)
 {
 	struct pseudo_lock_region *plr = rdtgrp->plr;
 	struct task_struct *thread;
 	unsigned int cpu;
-	int ret;
+	int ret = -1;
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
@@ -863,9 +978,19 @@ static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp)
 		goto out;
 	}
 
-	thread = kthread_create_on_node(measure_cycles_lat_fn, plr,
-					cpu_to_node(cpu),
-					"pseudo_lock_measure/%u", cpu);
+	if (sel == 1)
+		thread = kthread_create_on_node(measure_cycles_lat_fn, plr,
+						cpu_to_node(cpu),
+						"pseudo_lock_measure/%u",
+						cpu);
+	else if (sel == 2)
+		thread = kthread_create_on_node(measure_cycles_perf_fn, plr,
+						cpu_to_node(cpu),
+						"pseudo_lock_measure/%u",
+						cpu);
+	else
+		goto out;
+
 	if (IS_ERR(thread)) {
 		ret = PTR_ERR(thread);
 		goto out;
@@ -894,19 +1019,21 @@ static ssize_t pseudo_lock_measure_trigger(struct file *file,
 	size_t buf_size;
 	char buf[32];
 	int ret;
-	bool bv;
+	int sel;
 
 	buf_size = min(count, (sizeof(buf) - 1));
 	if (copy_from_user(buf, user_buf, buf_size))
 		return -EFAULT;
 
 	buf[buf_size] = '\0';
-	ret = strtobool(buf, &bv);
-	if (ret == 0 && bv) {
+	ret = kstrtoint(buf, 10, &sel);
+	if (ret == 0) {
+		if (sel != 1 && sel != 2)
+			return -EINVAL;
 		ret = debugfs_file_get(file->f_path.dentry);
 		if (ret)
 			return ret;
-		ret = pseudo_lock_measure_cycles(rdtgrp);
+		ret = pseudo_lock_measure_cycles(rdtgrp, sel);
 		if (ret == 0)
 			ret = count;
 		debugfs_file_put(file->f_path.dentry);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
index 3cd0fa27d5fe..efad50d2ee2f 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
@@ -15,6 +15,16 @@ TRACE_EVENT(pseudo_lock_mem_latency,
 	TP_printk("latency=%u", __entry->latency)
 );
 
+TRACE_EVENT(pseudo_lock_l2,
+	    TP_PROTO(u64 l2_hits, u64 l2_miss),
+	    TP_ARGS(l2_hits, l2_miss),
+	    TP_STRUCT__entry(__field(u64, l2_hits)
+			     __field(u64, l2_miss)),
+	    TP_fast_assign(__entry->l2_hits = l2_hits;
+			   __entry->l2_miss = l2_miss;),
+	    TP_printk("hits=%llu miss=%llu",
+		      __entry->l2_hits, __entry->l2_miss));
+
 #endif /* _TRACE_PSEUDO_LOCK_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.17.0