Received: by 10.192.165.156 with SMTP id m28csp1583489imm; Tue, 17 Apr 2018 01:40:46 -0700 (PDT) X-Google-Smtp-Source: AIpwx48zJOQRIW7KXh2O/zaiNwg5SmoARa1NUPmyNxWxZAB1+dUgz2AbauOL3alQuJWAch9VX2fZ X-Received: by 10.98.155.12 with SMTP id r12mr1175833pfd.15.1523954446025; Tue, 17 Apr 2018 01:40:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1523954445; cv=none; d=google.com; s=arc-20160816; b=OOQl2cePPtAISmWTaylsel8u9026oAtqqiGIVRrDC/0qeku+K9C51Y4uoMwPI4/YhN GC/ebeMbgW8ImYKsMJC4DsmlOIyfqo0r+EvshGoM6nEwH62KqbdUj61wWLZWQ0/gzytd ZbMemhvPTFydmioTvHdlg2WXuuGvnpzQUGRkz2NmzgKZfXaMqoI5TiMRmUi+KR/OCNCK SWXbBTrcDL+hFf1CMAYc/WM6R3IGPaCWYvOqj1v0PZ8ZlU26RFaRFEFKmTjxV90XLA3A 1Uj3cdBjo8loanGU1s3fAgKZxpNfupWVBZi2K3dLlfrJnuZHH/gBROJJK6j9RFBFIIKk qbCQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:arc-authentication-results; bh=RhzsiCt8TujTzlUvpMSv9V1Vn32sJRNAE/HipmmTHKA=; b=B81G+5MHoCtJv0XeX9ljaTpemVa+Leu8dWBb2EkIdtkpzwQRgQW/nht936e5193usV FEB6i0S2z2OD6nZ3EdSMfbiZ2nxwLQnG3vIMIgVBVCdDqFjw/SzfrklZT2wLFNvRm00b rSwL6RLsrJT6UALMD20RzOhMX9gKw4xdB0JoiNWUN7agb1AfCSjWcq3Q6smTXGB73pzu 9dqNNOKgOOY6xMumRrjeStR0lMWzHXMX3XvFnkjegV7OVvA3/tyr6OYMJjwc42xPl53+ QwsWSdpFZFCzxqQmRJzcs84qRcnSPnI5+du6rCMImLICUHrDF3bXKOIldPmSZhrhe9TE QE5w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b61-v6si13998022plc.500.2018.04.17.01.40.30; Tue, 17 Apr 2018 01:40:45 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752568AbeDQIjX (ORCPT + 99 others); Tue, 17 Apr 2018 04:39:23 -0400 Received: from exmail.andestech.com ([59.124.169.137]:32291 "EHLO ATCSQR.andestech.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751238AbeDQIjU (ORCPT ); Tue, 17 Apr 2018 04:39:20 -0400 Received: from mail.andestech.com (atcpcs16.andestech.com [10.0.1.222]) by ATCSQR.andestech.com with ESMTP id w3H8Wv5W066876; Tue, 17 Apr 2018 16:32:57 +0800 (GMT-8) (envelope-from alankao@andestech.com) Received: from atcsqa06.andestech.com (10.0.1.85) by ATCPCS16.andestech.com (10.0.1.222) with Microsoft SMTP Server id 14.3.123.3; Tue, 17 Apr 2018 16:39:10 +0800 From: Alan Kao To: Palmer Dabbelt , Albert Ou , "Peter Zijlstra" , Ingo Molnar , "Arnaldo Carvalho de Melo" , Alexander Shishkin , Jiri Olsa , "Namhyung Kim" , Alex Solomatnikov , "Jonathan Corbet" , , , CC: Alan Kao , Nick Hu , Greentime Hu Subject: [PATCH v3 2/2] perf: riscv: Add Document for Future Porting Guide Date: Tue, 17 Apr 2018 16:39:00 +0800 Message-ID: <1523954340-8708-3-git-send-email-alankao@andestech.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1523954340-8708-1-git-send-email-alankao@andestech.com> References: <1523954340-8708-1-git-send-email-alankao@andestech.com> MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.0.1.85] X-DNSRBL: X-MAIL: ATCSQR.andestech.com w3H8Wv5W066876 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Reviewed-by: Alex Solomatnikov Cc: Nick Hu Cc: Greentime Hu Signed-off-by: Alan Kao --- Documentation/riscv/pmu.txt | 249 ++++++++++++++++++++++++++++++++++++ 1 file changed, 249 insertions(+) create mode 100644 Documentation/riscv/pmu.txt diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt new file mode 100644 index 000000000000..b29f03a6d82f --- /dev/null +++ b/Documentation/riscv/pmu.txt @@ -0,0 +1,249 @@ +Supporting PMUs on RISC-V platforms +========================================== +Alan Kao , Mar 2018 + +Introduction +------------ + +As of this writing, perf_event-related features mentioned in The RISC-V ISA +Privileged Version 1.10 are as follows: +(please check the manual for more details) + +* [m|s]counteren +* mcycle[h], cycle[h] +* minstret[h], instret[h] +* mhpeventx, mhpcounterx[h] + +With such function set only, porting perf would require a lot of work, due to +the lack of the following general architectural performance monitoring features: + +* Enabling/Disabling counters + Counters are just free-running all the time in our case. +* Interrupt caused by counter overflow + No such feature in the spec. +* Interrupt indicator + It is not possible to have many interrupt ports for all counters, so an + interrupt indicator is required for software to tell which counter has + just overflowed. +* Writing to counters + There will be an SBI to support this since the kernel cannot modify the + counters [1]. Alternatively, some vendor considers to implement + hardware-extension for M-S-U model machines to write counters directly. + +This document aims to provide developers a quick guide on supporting their +PMUs in the kernel. The following sections briefly explain perf' mechanism +and todos. + +You may check previous discussions here [1][2]. Also, it might be helpful +to check the appendix for related kernel structures. + + +1. Initialization +----------------- + +*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains +various methods according to perf's internal convention and PMU-specific +parameters. One should declare such instance to represent the PMU. By default, +*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very +basic support to a baseline QEMU model. + +Then he/she can either assign the instance's pointer to *riscv_pmu* so that +the minimal and already-implemented logic can be leveraged, or invent his/her +own *riscv_init_platform_pmu* implementation. + +In other words, existing sources of *riscv_base_pmu* merely provide a +reference implementation. Developers can flexibly decide how many parts they +can leverage, and in the most extreme case, they can customize every function +according to their needs. + + +2. Event Initialization +----------------------- + +When a user launches a perf command to monitor some events, it is first +interpreted by the userspace perf tool into multiple *perf_event_open* +system calls, and then each of them calls to the body of *event_init* +member function that was assigned in the previous step. In *riscv_base_pmu*'s +case, it is *riscv_event_init*. + +The main purpose of this function is to translate the event provided by user +into bitmap, so that HW-related control registers or counters can directly be +manipulated. The translation is based on the mappings and methods provided in +*riscv_pmu*. + +Note that some features can be done in this stage as well: + +(1) interrupt setting, which is stated in the next section; +(2) privilege level setting (user space only, kernel space only, both); +(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*; +(4) tweaks for non-sampling events, which will be utilized by functions such as +*perf_adjust_period*, usually something like the follows: + +if (!is_sampling_event(event)) { + hwc->sample_period = x86_pmu.max_period; + hwc->last_period = hwc->sample_period; + local64_set(&hwc->period_left, hwc->sample_period); +} + +In the case of *riscv_base_pmu*, only (3) is provided for now. + + +3. Interrupt +------------ + +3.1. Interrupt Initialization + +This often occurs at the beginning of the *event_init* method. In common +practice, this should be a code segment like + +int x86_reserve_hardware(void) +{ + int err = 0; + + if (!atomic_inc_not_zero(&pmc_refcount)) { + mutex_lock(&pmc_reserve_mutex); + if (atomic_read(&pmc_refcount) == 0) { + if (!reserve_pmc_hardware()) + err = -EBUSY; + else + reserve_ds_buffers(); + } + if (!err) + atomic_inc(&pmc_refcount); + mutex_unlock(&pmc_reserve_mutex); + } + + return err; +} + +And the magic is in *reserve_pmc_hardware*, which usually does atomic +operations to make implemented IRQ accessible from some global function pointer. +*release_pmc_hardware* serves the opposite purpose, and it is used in event +destructors mentioned in previous section. + +(Note: From the implementations in all the architectures, the *reserve/release* +pair are always IRQ settings, so the *pmc_hardware* seems somehow misleading. +It does NOT deal with the binding between an event and a physical counter, +which will be introduced in the next section.) + +3.2. IRQ Structure + +Basically, a IRQ runs the following pseudo code: + +for each hardware counter that triggered this overflow + + get the event of this counter + + // following two steps are defined as *read()*, + // check the section Reading/Writing Counters for details. + count the delta value since previous interrupt + update the event->count (# event occurs) by adding delta, and + event->hw.period_left by subtracting delta + + if the event overflows + sample data + set the counter appropriately for the next overflow + + if the event overflows again + too frequently, throttle this event + fi + fi + +end for + +However as of this writing, none of the RISC-V implementations have designed an +interrupt for perf, so the details are to be completed in the future. + +4. Reading/Writing Counters +--------------------------- + +They seem symmetric but perf treats them quite differently. For reading, there +is a *read* interface in *struct pmu*, but it serves more than just reading. +According to the context, the *read* function not only reads the content of the +counter (event->count), but also updates the left period to the next interrupt +(event->hw.period_left). + +But the core of perf does not need direct write to counters. Writing counters +is hidden behind the abstraction of 1) *pmu->start*, literally start counting so one +has to set the counter to a good value for the next interrupt; 2) inside the IRQ +it should set the counter to the same resonable value. + +Reading is not a problem in RISC-V but writing would need some effort, since +counters are not allowed to be written by S-mode. + + +5. add()/del()/start()/stop() +----------------------------- + +Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop() +starts/stop the counter of some event in the PMU. All of them take the same +arguments: *struct perf_event *event* and *int flag*. + +Consider perf as a state machine, then you will find that these functions serve +as the state transition process between those states. +Three states (event->hw.state) are defined: + +* PERF_HES_STOPPED: the counter is stopped +* PERF_HES_UPTODATE: the event->count is up-to-date +* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now + +A normal flow of these state transitions are as follows: + +* A user launches a perf event, resulting in calling to *event_init*. +* When being context-switched in, *add* is called by the perf core, with a flag + PERF_EF_START, which means that the event should be started after it is added. + At this stage, a general event is bound to a physical counter, if any. + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now + stopped, and the (software) event count does not need updating. +** *start* is then called, and the counter is enabled. + With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check + previous section for detail). + Nothing is written if the flag does not contain PERF_EF_RELOAD. + The state now is reset to none, because it is neither stopped nor updated + (the counting already started) +* When being context-switched out, *del* is called. It then checks out all the + events in the PMU and calls *stop* to update their counts. +** *stop* is called by *del* + and the perf core with flag PERF_EF_UPDATE, and it often shares the same + subroutine as *read* with the same logic. + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again. + +** Life cycle of these two pairs: *add* and *del* are called repeatedly as + tasks switch in-and-out; *start* and *stop* is also called when the perf core + needs a quick stop-and-start, for instance, when the interrupt period is being + adjusted. + +Current implementation is sufficient for now and can be easily extended to +features in the future. + +A. Related Structures +--------------------- + +* struct pmu: include/linux/perf_event.h +* struct riscv_pmu: arch/riscv/include/asm/perf_event.h + + Both structures are designed to be read-only. + + *struct pmu* defines some function pointer interfaces, and most of them take +*struct perf_event* as a main argument, dealing with perf events according to +perf's internal state machine (check kernel/events/core.c for details). + + *struct riscv_pmu* defines PMU-specific parameters. The naming follows the +convention of all other architectures. + +* struct perf_event: include/linux/perf_event.h +* struct hw_perf_event + + The generic structure that represents perf events, and the hardware-related +details. + +* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h + + The structure that holds the status of events, has two fixed members: +the number of events and the array of the events. + +References +---------- + +[1] https://github.com/riscv/riscv-linux/pull/124 +[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA -- 2.17.0