Date: Thu, 29 Mar 2018 10:30:24 +0800
From: Alan Kao
To: Alex Solomatnikov
CC: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Jonathan Corbet, Nick Hu, Greentime Hu
Subject: Re: [PATCH 1/2] perf: riscv: preliminary RISC-V support
Message-ID: <20180329023024.GA32659@andestech.com>
References: <1522051075-6442-1-git-send-email-alankao@andestech.com> <1522051075-6442-2-git-send-email-alankao@andestech.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Alex,

I appreciate your reply and tests.

On Wed, Mar 28, 2018 at 03:58:41PM -0700, Alex Solomatnikov wrote:
> Did you test this code?

I did test this patch on QEMU's virt model with multiple harts, which is the only RISC-V machine I have for now. But as I mentioned in https://github.com/riscv/riscv-qemu/pull/115 , the hardware counter support in QEMU does not fully conform to the 1.10 Priv-Spec, so I had to tweak the code slightly to make the reads work. Specifically, the code that handles reads of cycle and instret in QEMU looks like this:

    ...
    case CSR_INSTRET:
    case CSR_CYCLE:
    //    if (ctr_ok) {
            return cpu_get_host_ticks();
    //    }
        break;
    ...

and the two commented-out lines are the tweak. In that environment I did not see anything unexpected: no matter which of the two counters is requested, QEMU returns the host's tick count.

>
> I got funny numbers when I tried to run it on HiFive Unleashed:
>
> perf stat mem-latency
> ...
>
> Performance counter stats for 'mem-latency':
>
> 157.907000 task-clock (msec) # 0.940 CPUs utilized
> 1 context-switches # 0.006 K/sec
> 1 cpu-migrations # 0.006 K/sec
> 4102 page-faults # 0.026 M/sec
> 157923752 cycles # 1.000 GHz
> 9223372034948899840 instructions # 58403957087.78 insn per cycle
> branches
> branch-misses
>
> 0.168046000 seconds time elapsed
>
>
> Tracing read_counter(), I see this:
>
> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.058809] CPU 3: read_counter idx=0 val=2528358954912
> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.063339] CPU 3: read_counter idx=1 val=53892244920
> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.118160] CPU 3: read_counter idx=0 val=2528418303035
> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.122694] CPU 3: read_counter idx=1 val=53906699665
> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.216736] CPU 1: read_counter idx=0 val=2528516878664
> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.221270] CPU 1: read_counter idx=1 val=51986369142
>
> It looks like the counter values from different cores are subtracted and
> wraparound occurs.
>

Thanks for the hint; it makes sense. 9223372034948899840 is 0x7fffffff8e66a400, which is consistent with a wraparound under the 63-bit mask I set in the code. I will look into this. Ideally, we can solve it by explicitly re-syncing hwc->prev_count when a CPU migration happens.

>
> Also, core IDs and socket IDs are wrong in perf report:
>

As Palmer has replied to this, I have no comment here.

> perf report --header -I
> Error:
> The perf.data file has no samples!
> # ========
> # captured on: Thu Jan 1 02:52:07 1970
> # hostname : buildroot
> # os release : 4.15.0-00045-g0d7c030-dirty
> # perf version : 4.15.0
> # arch : riscv64
> # nrcpus online : 4
> # nrcpus avail : 5
> # total memory : 8188340 kB
> # cmdline : /usr/bin/perf record -F 1000 lat_mem_rd -P 1 -W 1 -N 1 -t 10
> # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 1000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
> # sibling cores : 1
> # sibling cores : 2
> # sibling cores : 3
> # sibling cores : 4
> # sibling threads : 1
> # sibling threads : 2
> # sibling threads : 3
> # sibling threads : 4
> # CPU 0: Core ID -1, Socket ID -1
> # CPU 1: Core ID 0, Socket ID -1
> # CPU 2: Core ID 0, Socket ID -1
> # CPU 3: Core ID 0, Socket ID -1
> # CPU 4: Core ID 0, Socket ID -1
> # pmu mappings: cpu = 4, software = 1
> # CPU cache info:
> # L1 Instruction 32K [1]
> # L1 Data 32K [1]
> # L1 Instruction 32K [2]
> # L1 Data 32K [2]
> # L1 Instruction 32K [3]
> # L1 Data 32K [3]
> # missing features: TRACING_DATA BUILD_ID CPUDESC CPUID NUMA_TOPOLOGY BRANCH_STACK GROUP_DESC AUXTRACE STAT
> # ========
>
>
> Alex
>

Many thanks,
Alan

> On Mon, Mar 26, 2018 at 12:57 AM, Alan Kao wrote:
>
> > This patch provide a basic PMU, riscv_base_pmu, which supports two
> > general hardware event, instructions and cycles. Furthermore, this
> > PMU serves as a reference implementation to ease the portings in
> > the future.
> >
> > riscv_base_pmu should be able to run on any RISC-V machine that
> > conforms to the Priv-Spec. Note that the latest qemu model hasn't
> > fully support a proper behavior of Priv-Spec 1.10 yet, but work
> > around should be easy with very small fixes. Please check
> > https://github.com/riscv/riscv-qemu/pull/115 for future updates.
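Both Alan's QEMU tweak (the commented-out ctr_ok check) and the commit message above depend on the Priv-Spec 1.10 counter-enable rules. Here is a rough model. It assumes a hart that implements S-mode and ignores any later extensions. M-mode can always read cycle, time, instret and hpmcounterN. S-mode can read one only if the matching bit in mcounteren is set. U-mode additionally needs the same bit set in scounteren. The bit positions are cycle = 0, time = 1, instret = 2, and hpmcounterN = N. The patch's read_counter() runs in the kernel (S-mode), so this suggests firmware must set the CY and IR bits of mcounteren for those reads not to trap. counter_read_allowed() and enum priv_mode are hypothetical names. This is not QEMU's ctr_ok logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Privilege levels as encoded in the RISC-V privileged spec. */
enum priv_mode { PRV_U = 0, PRV_S = 1, PRV_M = 3 };

/*
 * Hypothetical helper: may a hart in `mode` read counter CSR `ctr`
 * (0 = cycle, 1 = time, 2 = instret, 3..31 = hpmcounter3..31)?
 * Models Priv-Spec 1.10 on a system that implements S-mode.
 */
static bool counter_read_allowed(enum priv_mode mode, unsigned int ctr,
				 uint32_t mcounteren, uint32_t scounteren)
{
	assert(ctr < 32);
	uint32_t bit = UINT32_C(1) << ctr;

	switch (mode) {
	case PRV_M:
		return true;                     /* M-mode is never gated */
	case PRV_S:
		return (mcounteren & bit) != 0;  /* gated by mcounteren only */
	case PRV_U:
		/* U-mode needs the bit in both mcounteren and scounteren */
		return (mcounteren & bit) && (scounteren & bit);
	}
	return false;
}
```

For example, with mcounteren = 0x5 (CY and IR set) the kernel may read cycle and instret but not time. A user-space read of instret would still trap unless scounteren also has bit 2 set.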
> >
> > Cc: Nick Hu
> > Cc: Greentime Hu
> > Signed-off-by: Alan Kao
> > ---
> >  arch/riscv/Kconfig                  |  12 +
> >  arch/riscv/include/asm/perf_event.h |  76 +++++-
> >  arch/riscv/kernel/Makefile          |   1 +
> >  arch/riscv/kernel/perf_event.c      | 469 ++++++++++++++++++++++++++++++++++++
> >  4 files changed, 554 insertions(+), 4 deletions(-)
> >  create mode 100644 arch/riscv/kernel/perf_event.c
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 310b9a5d6737..dd4aecfb5265 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -195,6 +195,18 @@ config RISCV_ISA_C
> >  config RISCV_ISA_A
> >  	def_bool y
> >
> > +menu "PMU type"
> > +	depends on PERF_EVENTS
> > +
> > +config RISCV_BASE_PMU
> > +	bool "Base Performance Monitoring Unit"
> > +	def_bool y
> > +	help
> > +	  A base PMU that serves as a reference implementation and has limited
> > +	  feature of perf.
> > +
> > +endmenu
> > +
> >  endmenu
> >
> >  menu "Kernel type"
> > diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h
> > index e13d2ff29e83..98e2efb02d25 100644
> > --- a/arch/riscv/include/asm/perf_event.h
> > +++ b/arch/riscv/include/asm/perf_event.h
> > @@ -1,13 +1,81 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> >  /*
> >   * Copyright (C) 2018 SiFive
> > + * Copyright (C) 2018 Andes Technology Corporation
> >   *
> > - * This program is free software; you can redistribute it and/or
> > - * modify it under the terms of the GNU General Public Licence
> > - * as published by the Free Software Foundation; either version
> > - * 2 of the Licence, or (at your option) any later version.
> >   */
> >
> >  #ifndef _ASM_RISCV_PERF_EVENT_H
> >  #define _ASM_RISCV_PERF_EVENT_H
> >
> > +#include
> > +#include
> > +
> > +#define RISCV_BASE_COUNTERS	2
> > +
> > +/*
> > + * The RISCV_MAX_COUNTERS parameter should be specified.
> > + */
> > +
> > +#ifdef CONFIG_RISCV_BASE_PMU
> > +#define RISCV_MAX_COUNTERS	2
> > +#endif
> > +
> > +#ifndef RISCV_MAX_COUNTERS
> > +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
> > +#endif
> > +
> > +/*
> > + * These are the indexes of bits in counteren register *minus* 1,
> > + * except for cycle. It would be coherent if it can directly mapped
> > + * to counteren bit definition, but there is a *time* register at
> > + * counteren[1]. Per-cpu structure is scarce resource here.
> > + *
> > + * According to the spec, an implementation can support counter up to
> > + * mhpmcounter31, but many high-end processors has at most 6 general
> > + * PMCs, we give the definition to MHPMCOUNTER8 here.
> > + */
> > +#define RISCV_PMU_CYCLE		0
> > +#define RISCV_PMU_INSTRET	1
> > +#define RISCV_PMU_MHPMCOUNTER3	2
> > +#define RISCV_PMU_MHPMCOUNTER4	3
> > +#define RISCV_PMU_MHPMCOUNTER5	4
> > +#define RISCV_PMU_MHPMCOUNTER6	5
> > +#define RISCV_PMU_MHPMCOUNTER7	6
> > +#define RISCV_PMU_MHPMCOUNTER8	7
> > +
> > +#define RISCV_OP_UNSUPP		(-EOPNOTSUPP)
> > +
> > +struct cpu_hw_events {
> > +	/* # currently enabled events*/
> > +	int n_events;
> > +	/* currently enabled events */
> > +	struct perf_event *events[RISCV_MAX_COUNTERS];
> > +	/* vendor-defined PMU data */
> > +	void *platform;
> > +};
> > +
> > +struct riscv_pmu {
> > +	struct pmu *pmu;
> > +
> > +	/* generic hw/cache events table */
> > +	const int *hw_events;
> > +	const int (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
> > +				 [PERF_COUNT_HW_CACHE_OP_MAX]
> > +				 [PERF_COUNT_HW_CACHE_RESULT_MAX];
> > +	/* method used to map hw/cache events */
> > +	int (*map_hw_event)(u64 config);
> > +	int (*map_cache_event)(u64 config);
> > +
> > +	/* max generic hw events in map */
> > +	int max_events;
> > +	/* number total counters, 2(base) + x(general) */
> > +	int num_counters;
> > +	/* the width of the counter */
> > +	int counter_width;
> > +
> > +	/* vendor-defined PMU features */
> > +	void *platform;
> > +};
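The header comment above says each RISCV_PMU_* value is its counteren bit position minus one, except for cycle, because bit 1 belongs to the time CSR. The sketch below shows that mapping, using only the RISCV_PMU_* values copied from the header. riscv_pmu_idx_to_counteren_bit() and riscv_pmu_counteren_mask() are hypothetical helpers and are not part of the patch.

```c
#include <assert.h>

/* Index values copied from the quoted asm/perf_event.h. */
#define RISCV_PMU_CYCLE		0
#define RISCV_PMU_INSTRET	1
#define RISCV_PMU_MHPMCOUNTER3	2
#define RISCV_PMU_MHPMCOUNTER8	7

/*
 * Hypothetical helper: RISCV_PMU_* index -> bit position in
 * mcounteren/scounteren. Bit 1 is the time CSR, which the patch does
 * not track, so every index except cycle is shifted up by one.
 */
static int riscv_pmu_idx_to_counteren_bit(int idx)
{
	assert(idx >= RISCV_PMU_CYCLE && idx <= RISCV_PMU_MHPMCOUNTER8);
	return idx == RISCV_PMU_CYCLE ? 0 : idx + 1;
}

/* Hypothetical helper: counteren bits covering indexes 0..num_counters-1. */
static unsigned int riscv_pmu_counteren_mask(int num_counters)
{
	unsigned int mask = 0;

	for (int idx = 0; idx < num_counters; idx++)
		mask |= 1u << riscv_pmu_idx_to_counteren_bit(idx);
	return mask;
}
```

For the base PMU (num_counters = RISCV_BASE_COUNTERS = 2) the mask works out to 0x5, which is the CY and IR bits. With all eight indexes it is 0x1FD, so bit 1 (time) is always left out.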
> > +
> >  #endif /* _ASM_RISCV_PERF_EVENT_H */
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 196f62ffc428..849c38d9105f 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -36,5 +36,6 @@ obj-$(CONFIG_SMP) += smp.o
> >  obj-$(CONFIG_MODULES) += module.o
> >  obj-$(CONFIG_FUNCTION_TRACER) += mcount.o
> >  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
> > +obj-$(CONFIG_PERF_EVENTS) += perf_event.o
> >
> >  clean:
> > diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c
> > new file mode 100644
> > index 000000000000..b78cb486683b
> > --- /dev/null
> > +++ b/arch/riscv/kernel/perf_event.c
> > @@ -0,0 +1,469 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2008 Thomas Gleixner
> > + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar
> > + * Copyright (C) 2009 Jaswinder Singh Rajput
> > + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter
> > + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra
> > + * Copyright (C) 2009 Intel Corporation,
> > + * Copyright (C) 2009 Google, Inc., Stephane Eranian
> > + * Copyright 2014 Tilera Corporation. All Rights Reserved.
> > + * Copyright (C) 2018 Andes Technology Corporation
> > + *
> > + * Perf_events support for RISC-V platforms.
> > + *
> > + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough
> > + * functionality for perf event to fully work, this file provides
> > + * the very basic framework only.
> > + *
> > + * For platform portings, please check Documentations/riscv/pmu.txt.
> > + *
> > + * The Copyright line includes x86 and tile ones.
> > + */
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +static const struct riscv_pmu *riscv_pmu __read_mostly;
> > +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
> > +
> > +/*
> > + * Hardware & cache maps and their methods
> > + */
> > +
> > +static const int riscv_hw_event_map[] = {
> > +	[PERF_COUNT_HW_CPU_CYCLES] = RISCV_PMU_CYCLE,
> > +	[PERF_COUNT_HW_INSTRUCTIONS] = RISCV_PMU_INSTRET,
> > +	[PERF_COUNT_HW_CACHE_REFERENCES] = RISCV_OP_UNSUPP,
> > +	[PERF_COUNT_HW_CACHE_MISSES] = RISCV_OP_UNSUPP,
> > +	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = RISCV_OP_UNSUPP,
> > +	[PERF_COUNT_HW_BRANCH_MISSES] = RISCV_OP_UNSUPP,
> > +	[PERF_COUNT_HW_BUS_CYCLES] = RISCV_OP_UNSUPP,
> > +};
> > +
> > +#define C(x) PERF_COUNT_HW_CACHE_##x
> > +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
> > +[PERF_COUNT_HW_CACHE_OP_MAX]
> > +[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
> > +	[C(L1D)] = {
> > +		[C(OP_READ)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_WRITE)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_PREFETCH)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +	},
> > +	[C(L1I)] = {
> > +		[C(OP_READ)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_WRITE)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_PREFETCH)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +	},
> > +	[C(LL)] = {
> > +		[C(OP_READ)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_WRITE)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_PREFETCH)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +	},
> > +	[C(DTLB)] = {
> > +		[C(OP_READ)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_WRITE)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_PREFETCH)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +	},
> > +	[C(ITLB)] = {
> > +		[C(OP_READ)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_WRITE)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_PREFETCH)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +	},
> > +	[C(BPU)] = {
> > +		[C(OP_READ)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_WRITE)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +		[C(OP_PREFETCH)] = {
> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> > +		},
> > +	},
> > +};
> > +
> > +static int riscv_map_hw_event(u64 config)
> > +{
> > +	if (config >= riscv_pmu->max_events)
> > +		return -EINVAL;
> > +
> > +	return riscv_pmu->hw_events[config];
> > +}
> > +
> > +int riscv_map_cache_decode(u64 config, unsigned int *type,
> > +			   unsigned int *op, unsigned int *result)
> > +{
> > +	return -ENOENT;
> > +}
> > +
> > +static int riscv_map_cache_event(u64 config)
> > +{
> > +	unsigned int type, op, result;
> > +	int err = -ENOENT;
> > +	int code;
> > +
> > +	err = riscv_map_cache_decode(config, &type, &op, &result);
> > +	if (!riscv_pmu->cache_events || err)
> > +		return err;
> > +
> > +	if (type >= PERF_COUNT_HW_CACHE_MAX ||
> > +	    op >= PERF_COUNT_HW_CACHE_OP_MAX ||
> > +	    result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
> > +		return -EINVAL;
> > +
> > +	code = (*riscv_pmu->cache_events)[type][op][result];
> > +	if (code == RISCV_OP_UNSUPP)
> > +		return -EINVAL;
> > +
> > +	return code;
> > +}
> > +
> > +/*
> > + * Low-level functions: reading/writing counters
> > + */
> > +
> > +static inline u64 read_counter(int idx)
> > +{
> > +	u64 val = 0;
> > +
> > +	switch (idx) {
> > +	case RISCV_PMU_CYCLE:
> > +		val = csr_read(cycle);
> > +		break;
> > +	case RISCV_PMU_INSTRET:
> > +		val = csr_read(instret);
> > +		break;
> > +	default:
> > +		WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS);
> > +		return -EINVAL;
> > +	}
> > +
> > +	return val;
> > +}
> > +
> > +static inline void write_counter(int idx, u64 value)
> > +{
> > +	/* currently not supported */
> > +}
> > +
> > +/*
> > + * pmu->read: read and update the counter
> > + *
> > + * Other architectures' implementation often have a xxx_perf_event_update
> > + * routine, which can return counter values when called in the IRQ, but
> > + * return void when being called by the pmu->read method.
> > + */
> > +static void riscv_pmu_read(struct perf_event *event)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	u64 prev_raw_count, new_raw_count;
> > +	u64 oldval;
> > +	int idx = hwc->idx;
> > +	u64 delta;
> > +
> > +	do {
> > +		prev_raw_count = local64_read(&hwc->prev_count);
> > +		new_raw_count = read_counter(idx);
> > +
> > +		oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> > +					 new_raw_count);
> > +	} while (oldval != prev_raw_count);
> > +
> > +	/*
> > +	 * delta is the value to update the counter we maintain in the kernel.
> > +	 */
> > +	delta = (new_raw_count - prev_raw_count) &
> > +		((1ULL << riscv_pmu->counter_width) - 1);
> > +	local64_add(delta, &event->count);
> > +	/*
> > +	 * Something like local64_sub(delta, &hwc->period_left) here is
> > +	 * needed if there is an interrupt for perf.
> > +	 */
> > +}
> > +
> > +/*
> > + * State transition functions:
> > + *
> > + * stop()/start() & add()/del()
> > + */
> > +
> > +/*
> > + * pmu->stop: stop the counter
> > + */
> > +static void riscv_pmu_stop(struct perf_event *event, int flags)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +
> > +	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
> > +	hwc->state |= PERF_HES_STOPPED;
> > +
> > +	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> > +		riscv_pmu_read(event);
> > +		hwc->state |= PERF_HES_UPTODATE;
> > +	}
> > +}
> > +
> > +/*
> > + * pmu->start: start the event.
> > + */
> > +static void riscv_pmu_start(struct perf_event *event, int flags)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +
> > +	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> > +		return;
> > +
> > +	if (flags & PERF_EF_RELOAD) {
> > +		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
> > +
> > +		/*
> > +		 * Set the counter to the period to the next interrupt here,
> > +		 * if you have any.
> > +		 */
> > +	}
> > +
> > +	hwc->state = 0;
> > +	perf_event_update_userpage(event);
> > +
> > +	/*
> > +	 * Since we cannot write to counters, this serves as an initialization
> > +	 * to the delta-mechanism in pmu->read(); otherwise, the delta would be
> > +	 * wrong when pmu->read is called for the first time.
> > +	 */
> > +	if (local64_read(&hwc->prev_count) == 0)
> > +		local64_set(&hwc->prev_count, read_counter(hwc->idx));
> > +}
> > +
> > +/*
> > + * pmu->add: add the event to PMU.
> > + */
> > +static int riscv_pmu_add(struct perf_event *event, int flags)
> > +{
> > +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > +	struct hw_perf_event *hwc = &event->hw;
> > +
> > +	if (cpuc->n_events == riscv_pmu->num_counters)
> > +		return -ENOSPC;
> > +
> > +	/*
> > +	 * We don't have general conunters, so no binding-event-to-counter
> > +	 * process here.
> > +	 *
> > +	 * Indexing using hwc->config generally not works, since config may
> > +	 * contain extra information, but here the only info we have in
> > +	 * hwc->config is the event index.
> > +	 */
> > +	hwc->idx = hwc->config;
> > +	cpuc->events[hwc->idx] = event;
> > +	cpuc->n_events++;
> > +
> > +	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
> > +
> > +	if (flags & PERF_EF_START)
> > +		riscv_pmu_start(event, PERF_EF_RELOAD);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * pmu->del: delete the event from PMU.
> > + */
> > +static void riscv_pmu_del(struct perf_event *event, int flags)
> > +{
> > +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > +	struct hw_perf_event *hwc = &event->hw;
> > +
> > +	cpuc->events[hwc->idx] = NULL;
> > +	cpuc->n_events--;
> > +	riscv_pmu_stop(event, PERF_EF_UPDATE);
> > +	perf_event_update_userpage(event);
> > +}
> > +
> > +/*
> > + * Interrupt
> > + */
> > +
> > +static DEFINE_MUTEX(pmc_reserve_mutex);
> > +typedef void (*perf_irq_t)(void *riscv_perf_irq);
> > +perf_irq_t perf_irq;
> > +
> > +void riscv_pmu_handle_irq(void *riscv_perf_irq)
> > +{
> > +}
> > +
> > +static perf_irq_t reserve_pmc_hardware(void)
> > +{
> > +	perf_irq_t old;
> > +
> > +	mutex_lock(&pmc_reserve_mutex);
> > +	old = perf_irq;
> > +	perf_irq = &riscv_pmu_handle_irq;
> > +	mutex_unlock(&pmc_reserve_mutex);
> > +
> > +	return old;
> > +}
> > +
> > +void release_pmc_hardware(void)
> > +{
> > +	mutex_lock(&pmc_reserve_mutex);
> > +	perf_irq = NULL;
> > +	mutex_unlock(&pmc_reserve_mutex);
> > +}
> > +
> > +/*
> > + * Event Initialization
> > + */
> > +
> > +static atomic_t riscv_active_events;
> > +
> > +static void riscv_event_destroy(struct perf_event *event)
> > +{
> > +	if (atomic_dec_return(&riscv_active_events) == 0)
> > +		release_pmc_hardware();
> > +}
> > +
> > +static int riscv_event_init(struct perf_event *event)
> > +{
> > +	struct perf_event_attr *attr = &event->attr;
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	perf_irq_t old_irq_handler = NULL;
> > +	int code;
> > +
> > +	if (atomic_inc_return(&riscv_active_events) == 1)
> > +		old_irq_handler = reserve_pmc_hardware();
> > +
> > +	if (old_irq_handler) {
> > +		pr_warn("PMC hardware busy (reserved by oprofile)\n");
> > +		atomic_dec(&riscv_active_events);
> > +		return -EBUSY;
> > +	}
> > +
> > +	switch (event->attr.type) {
> > +	case PERF_TYPE_HARDWARE:
> > +		code = riscv_pmu->map_hw_event(attr->config);
> > +		break;
> > +	case PERF_TYPE_HW_CACHE:
> > +		code = riscv_pmu->map_cache_event(attr->config);
> > +		break;
> > +	case PERF_TYPE_RAW:
> > +		return -EOPNOTSUPP;
> > +	default:
> > +		return -ENOENT;
> > +	}
> > +
> > +	event->destroy = riscv_event_destroy;
> > +	if (code < 0) {
> > +		event->destroy(event);
> > +		return code;
> > +	}
> > +
> > +	/*
> > +	 * idx is set to -1 because the index of a general event should not be
> > +	 * decided until binding to some counter in pmu->add().
> > +	 *
> > +	 * But since we don't have such support, later in pmu->add(), we just
> > +	 * use hwc->config as the index instead.
> > +	 */
> > +	hwc->config = code;
> > +	hwc->idx = -1;
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Initialization
> > + */
> > +
> > +static struct pmu min_pmu = {
> > +	.name = "riscv-base",
> > +	.event_init = riscv_event_init,
> > +	.add = riscv_pmu_add,
> > +	.del = riscv_pmu_del,
> > +	.start = riscv_pmu_start,
> > +	.stop = riscv_pmu_stop,
> > +	.read = riscv_pmu_read,
> > +};
> > +
> > +static const struct riscv_pmu riscv_base_pmu = {
> > +	.pmu = &min_pmu,
> > +	.max_events = ARRAY_SIZE(riscv_hw_event_map),
> > +	.map_hw_event = riscv_map_hw_event,
> > +	.hw_events = riscv_hw_event_map,
> > +	.map_cache_event = riscv_map_cache_event,
> > +	.cache_events = &riscv_cache_event_map,
> > +	.counter_width = 63,
> > +	.num_counters = RISCV_BASE_COUNTERS + 0,
> > +};
> > +
> > +struct pmu * __weak __init riscv_init_platform_pmu(void)
> > +{
> > +	riscv_pmu = &riscv_base_pmu;
> > +	return riscv_pmu->pmu;
> > +}
> > +
> > +int __init init_hw_perf_events(void)
> > +{
> > +	struct pmu *pmu = riscv_init_platform_pmu();
> > +
> > +	perf_irq = NULL;
> > +	perf_pmu_register(pmu, "cpu", PERF_TYPE_RAW);
> > +	return 0;
> > +}
> > +arch_initcall(init_hw_perf_events);
> > --
> > 2.16.2
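Alan's wraparound explanation can be checked against Alex's trace using the delta step of the quoted riscv_pmu_read(). In the trace, prev_count was last seeded from CPU 3's instret (idx=1, 53906699665). The next read came from CPU 1 (51986369142), so new - prev is negative. The 63-bit mask turns it into 2^63 - 1920330523 = 9223372034934445285, which is within about 15 million of the 9223372034948899840 instructions Alex reported. The trace is only an excerpt, so this is an illustration rather than an exact reconstruction. Also, the quoted riscv_pmu_start() only seeds prev_count while it is still zero, so a value left over from another hart appears to survive a migration.

The sketch below is a user-space model, not kernel code. struct model_event, model_read() and model_migrate() are hypothetical names illustrating the re-sync Alan proposes. Where such a re-sync would live in the real driver is left open.

```c
#include <assert.h>
#include <stdint.h>

/* Same mask that riscv_pmu_read() builds from .counter_width = 63. */
#define COUNTER_WIDTH 63
#define COUNTER_MASK ((UINT64_C(1) << COUNTER_WIDTH) - 1)

/* The delta step of riscv_pmu_read(), without the local64 cmpxchg loop. */
static uint64_t masked_delta(uint64_t prev_raw, uint64_t new_raw)
{
	/* Unsigned subtraction wraps modulo 2^64; the mask keeps 63 bits. */
	return (new_raw - prev_raw) & COUNTER_MASK;
}

/* Hypothetical stand-ins for hwc->prev_count and event->count. */
struct model_event {
	uint64_t prev_count;
	uint64_t count;
};

/* Hypothetical: accumulate the delta since the last read on this hart. */
static void model_read(struct model_event *ev, uint64_t raw_now)
{
	ev->count += masked_delta(ev->prev_count, raw_now);
	ev->prev_count = raw_now;
}

/*
 * Hypothetical re-sync on migration: seed prev_count from the new hart's
 * counter so that no value from the old hart is subtracted from it.
 * Call model_read() on the old hart first (the quoted riscv_pmu_del()
 * does an equivalent update via riscv_pmu_stop(PERF_EF_UPDATE)).
 */
static void model_migrate(struct model_event *ev, uint64_t raw_on_new_hart)
{
	ev->prev_count = raw_on_new_hart;
}
```

Without the model_migrate() call, a read of 51986370142 on CPU 1 would add masked_delta(53906699665, 51986370142), a value just below 2^63. That is the symptom seen in the perf stat output.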