Received: by 10.213.65.68 with SMTP id h4csp215518imn; Tue, 3 Apr 2018 19:10:07 -0700 (PDT) X-Google-Smtp-Source: AIpwx4+JEUbAuEWW4zKt/3aeSZn7Vip19douPi2xiAexqRjqYESkmT/giuyLjSrTbimuirSR4SqD X-Received: by 10.99.167.6 with SMTP id d6mr10438367pgf.287.1522807807354; Tue, 03 Apr 2018 19:10:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1522807807; cv=none; d=google.com; s=arc-20160816; b=OqBKXJTq/KBoE/adCiyQ4J80jx66h4WaWDh3BJVcWvdctpQWu9CYdmJ3LwET5L/dZL MSlEvZXa3KHEjOb1klvN6rJwYDIHiAVftK6/bqs3VaJZPDUptO/jpy20uivyHTCZKBsi oCZJa37x/4a7Uvcyw4AAF2kiQ1ZptLnx2SkOhHTdyF1IyB3y5lze7YrxIUHqeSiZnjFP zAZO6WiNL/e1Fy20WXb7WWrxrZNto4FMruBRvGTLsSx7ok+Qbs4PB+RUeKTYbQQwDtql erZQZKy7z5TzScoUxCQe7Yaggbar5noWr+JdJmDt7Z1AdlH65n9E2f2IA62pis7C/6LR BPsw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=LNuNXr+yJhhVYt2O4zj55POvPgPTeijaj+ZZSKE1vHE=; b=qFfITiZU90wtdwgNj7wmFhxQL2N4I9fkwwi2fiFrTnLZZqdsmkU9lS2FwtAW0ACDmG I60LKRVbA0cBzRcTX37wk0dcIOy/7ekvyBtyClGaiD2BQQvs+eAsl+dSNkhBW6ieOhTj 1tyse6AZr26XyGm1FJtJIpkqR7VDgeO3mgHXss8IbcBedZo8l+gn9cM3gMGnRDqXmUsj t9x6aV0F//i8J92tTaFHs7erfqqj+WsoqOhtgDllL9RZBFzH9Zts7JHxvjrpZ9l7jAAA Ckg6A7P6gQNM53lqXOe6axrLa5yhddv9y74lKO/QNOVy55rhYgSJarUqSKl/6hrU1E8S tAMQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sifive.com header.s=google header.b=AaD5y3lB; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j3si2853141pga.364.2018.04.03.19.09.52; Tue, 03 Apr 2018 19:10:07 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@sifive.com header.s=google header.b=AaD5y3lB; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752822AbeDDCIr (ORCPT + 99 others); Tue, 3 Apr 2018 22:08:47 -0400 Received: from mail-wm0-f66.google.com ([74.125.82.66]:34966 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751295AbeDDCIp (ORCPT ); Tue, 3 Apr 2018 22:08:45 -0400 Received: by mail-wm0-f66.google.com with SMTP id r82so39041468wme.0 for ; Tue, 03 Apr 2018 19:08:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=LNuNXr+yJhhVYt2O4zj55POvPgPTeijaj+ZZSKE1vHE=; b=AaD5y3lBH1RgUXeGbHOqrb/zpbz2Cw9X5wJ0PrdJk0y+z2QKfiwlnKWCPMDBQ3o/mn 7df4T2AeqyyYvHBiZqvmBEHDIYJCeZ8Eye+mKbQML4cRG7r0LQCtPKTWPFb3oHBXc0Wg rsi7Ir61VC5Q52+toHaX+FlqTElVc+YNUjsukPu70Z67LDYtYtrZZm7YhD21f9w/St2e kzUWtzNFxpgY6sM5Ly7NhMEgH65fJMaoaPudzbL2Y+fGn9wtdAGpq9w1CqIXPgnMHSab IxPu4+OCnxTVomDRL7yo6xyWxzylH5jhUsgWTeRu1f2IBbKSURo5M6IwoWQzMnhH3Sm4 xpHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=LNuNXr+yJhhVYt2O4zj55POvPgPTeijaj+ZZSKE1vHE=; b=hWOcWRtQLEMxvoBOU2nkFN6n/BscNedJBV3X2JF1PrFOGMUm50cwMVMYszqL64vQZR VBH57DahGKGaSDO0QLw/2+XxKN7GDdZWg8TlYHNPf6zuvYRebfymfY5T58+S+EKQv+7A Vr3b5/8bOwEsz/meQN7WLGBlUu6T20iMD1S+ap6kMUXetBZg2Dqza8J7ErttQdU+dc1q YhT1ky/cLIkgaQY2P+ABTKz5WA2S3CI4J6FULNojvPhZupuQUrr14oFb1Hxwws/jO9iw ohnwnsYJ4HD7TXkvbO7NS9afxHBNL/ugq/TRVfY1WwVrS3r8zcYoadSBNQyzKsD0tjhS fM7Q== X-Gm-Message-State: ALQs6tDuTEaWm0dE2oDjsOrW3YtZN68+PMqfzXzM/D+idTUK4E/LIyHg MoR8wErKMJYu0P+L1+EM+HuqFY4v+g28IKfe+zCdow== X-Received: by 10.28.220.193 with SMTP id t184mr6081418wmg.138.1522807724302; Tue, 03 Apr 2018 19:08:44 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.15.140 with HTTP; Tue, 3 Apr 2018 19:08:43 -0700 (PDT) In-Reply-To: <1522672284-29593-3-git-send-email-alankao@andestech.com> References: <1522672284-29593-1-git-send-email-alankao@andestech.com> <1522672284-29593-3-git-send-email-alankao@andestech.com> From: Alex Solomatnikov Date: Tue, 3 Apr 2018 19:08:43 -0700 Message-ID: Subject: Re: [PATCH v2 2/2] perf: riscv: Add Document for Future Porting Guide To: Alan Kao Cc: Palmer Dabbelt , Albert Ou , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Jonathan Corbet , linux-riscv@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Nick Hu , Greentime Hu Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Doc fixes: diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt index a3e930e..ae90a5e 100644 --- a/Documentation/riscv/pmu.txt +++ b/Documentation/riscv/pmu.txt @@ -20,7 +20,7 @@ the lack of the following general architectural performance monitoring features: * Enabling/Disabling counters Counters are just free-running all the time in our case. * Interrupt caused by counter overflow - No such design in the spec. + No such feature in the spec. * Interrupt indicator It is not possible to have many interrupt ports for all counters, so an interrupt indicator is required for software to tell which counter has @@ -159,14 +159,14 @@ interrupt for perf, so the details are to be completed in the future. They seem symmetric but perf treats them quite differently. For reading, there is a *read* interface in *struct pmu*, but it serves more than just reading. -According to the context, the *read* function not only read the content of the -counter (event->count), but also update the left period to the next interrupt +According to the context, the *read* function not only reads the content of the +counter (event->count), but also updates the left period for the next interrupt (event->hw.period_left). But the core of perf does not need direct write to counters. Writing counters -hides behind the abstraction of 1) *pmu->start*, literally start counting so one +is hidden behind the abstraction of 1) *pmu->start*, literally start counting so one has to set the counter to a good value for the next interrupt; 2) inside the IRQ -it should set the counter with the same reason. +it should set the counter to the same reasonable value. Reading is not a problem in RISC-V but writing would need some effort, since counters are not allowed to be written by S-mode. @@ -190,37 +190,37 @@ Three states (event->hw.state) are defined: A normal flow of these state transitions are as follows: * A user launches a perf event, resulting in calling to *event_init*. -* When being context-switched in, *add* is called by the perf core, with flag - PERF_EF_START, which mean that the event should be started after it is added. - In this stage, an general event is binded to a physical counter, if any. +* When being context-switched in, *add* is called by the perf core, with a flag + PERF_EF_START, which means that the event should be started after it is added. + At this stage, a general event is bound to a physical counter, if any. The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now stopped, and the (software) event count does not need updating. ** *start* is then called, and the counter is enabled. - With flag PERF_EF_RELOAD, it write the counter to an appropriate value (check - previous section for detail). - No writing is made if the flag does not contain PERF_EF_RELOAD. - The state now is reset to none, because it is neither stopped nor update - (the counting already starts) -* When being context-switched out, *del* is called. It then checkout all the - events in the PMU and call *stop* to update their counts. + With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check + the previous section for details). + Nothing is written if the flag does not contain PERF_EF_RELOAD. + The state now is reset to none, because it is neither stopped nor updated + (the counting already started) +* When being context-switched out, *del* is called. It then checks out all the + events in the PMU and calls *stop* to update their counts. ** *stop* is called by *del* and the perf core with flag PERF_EF_UPDATE, and it often shares the same subroutine as *read* with the same logic. The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again. -** Life cycles of these two pairs: *add* and *del* are called repeatedly as +** Life cycle of these two pairs: *add* and *del* are called repeatedly as tasks switch in-and-out; *start* and *stop* is also called when the perf core needs a quick stop-and-start, for instance, when the interrupt period is being adjusted. -Current implementation is sufficient for now and can be easily extend to +Current implementation is sufficient for now and can be easily extended with new features in the future. A. Related Structures --------------------- -* struct pmu: include/linux/perf_events.h -* struct riscv_pmu: arch/riscv/include/asm/perf_events.h +* struct pmu: include/linux/perf_event.h +* struct riscv_pmu: arch/riscv/include/asm/perf_event.h Both structures are designed to be read-only. @@ -231,13 +231,13 @@ perf's internal state machine (check kernel/events/core.c for details). *struct riscv_pmu* defines PMU-specific parameters. The naming follows the convention of all other architectures. -* struct perf_event: include/linux/perf_events.h +* struct perf_event: include/linux/perf_event.h * struct hw_perf_event The generic structure that represents perf events, and the hardware-related details. -* struct riscv_hw_events: arch/riscv/include/asm/perf_events.h +* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h The structure that holds the status of events, has two fixed members: the number of events and the array of the events. On Mon, Apr 2, 2018 at 5:31 AM, Alan Kao wrote: > Cc: Nick Hu > Cc: Greentime Hu > Signed-off-by: Alan Kao > --- > Documentation/riscv/pmu.txt | 249 ++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 249 insertions(+) > create mode 100644 Documentation/riscv/pmu.txt > > diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt > new file mode 100644 > index 000000000000..a3e930ed5141 > --- /dev/null > +++ b/Documentation/riscv/pmu.txt > @@ -0,0 +1,249 @@ > +Supporting PMUs on RISC-V platforms > +========================================== > +Alan Kao , Mar 2018 > + > +Introduction > +------------ > + > +As of this writing, perf_event-related features mentioned in The RISC-V ISA > +Privileged Version 1.10 are as follows: > +(please check the manual for more details) > + > +* [m|s]counteren > +* mcycle[h], cycle[h] > +* minstret[h], instret[h] > +* mhpeventx, mhpcounterx[h] > + > +With such function set only, porting perf would require a lot of work, due to > +the lack of the following general architectural performance monitoring features: > + > +* Enabling/Disabling counters > + Counters are just free-running all the time in our case. > +* Interrupt caused by counter overflow > + No such design in the spec. > +* Interrupt indicator > + It is not possible to have many interrupt ports for all counters, so an > + interrupt indicator is required for software to tell which counter has > + just overflowed. > +* Writing to counters > + There will be an SBI to support this since the kernel cannot modify the > + counters [1]. Alternatively, some vendor considers to implement > + hardware-extension for M-S-U model machines to write counters directly. > + > +This document aims to provide developers a quick guide on supporting their > +PMUs in the kernel. The following sections briefly explain perf' mechanism > +and todos. > + > +You may check previous discussions here [1][2]. Also, it might be helpful > +to check the appendix for related kernel structures. > + > + > +1. Initialization > +----------------- > + > +*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains > +various methods according to perf's internal convention and PMU-specific > +parameters. One should declare such instance to represent the PMU. By default, > +*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very > +basic support to a baseline QEMU model. > + > +Then he/she can either assign the instance's pointer to *riscv_pmu* so that > +the minimal and already-implemented logic can be leveraged, or invent his/her > +own *riscv_init_platform_pmu* implementation. > + > +In other words, existing sources of *riscv_base_pmu* merely provide a > +reference implementation. Developers can flexibly decide how many parts they > +can leverage, and in the most extreme case, they can customize every function > +according to their needs. > + > + > +2. Event Initialization > +----------------------- > + > +When a user launches a perf command to monitor some events, it is first > +interpreted by the userspace perf tool into multiple *perf_event_open* > +system calls, and then each of them calls to the body of *event_init* > +member function that was assigned in the previous step. In *riscv_base_pmu*'s > +case, it is *riscv_event_init*. > + > +The main purpose of this function is to translate the event provided by user > +into bitmap, so that HW-related control registers or counters can directly be > +manipulated. The translation is based on the mappings and methods provided in > +*riscv_pmu*. > + > +Note that some features can be done in this stage as well: > + > +(1) interrupt setting, which is stated in the next section; > +(2) privilege level setting (user space only, kernel space only, both); > +(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*; > +(4) tweaks for non-sampling events, which will be utilized by functions such as > +*perf_adjust_period*, usually something like the follows: > + > +if (!is_sampling_event(event)) { > + hwc->sample_period = x86_pmu.max_period; > + hwc->last_period = hwc->sample_period; > + local64_set(&hwc->period_left, hwc->sample_period); > +} > + > +In the case of *riscv_base_pmu*, only (3) is provided for now. > + > + > +3. Interrupt > +------------ > + > +3.1. Interrupt Initialization > + > +This often occurs at the beginning of the *event_init* method. In common > +practice, this should be a code segment like > + > +int x86_reserve_hardware(void) > +{ > + int err = 0; > + > + if (!atomic_inc_not_zero(&pmc_refcount)) { > + mutex_lock(&pmc_reserve_mutex); > + if (atomic_read(&pmc_refcount) == 0) { > + if (!reserve_pmc_hardware()) > + err = -EBUSY; > + else > + reserve_ds_buffers(); > + } > + if (!err) > + atomic_inc(&pmc_refcount); > + mutex_unlock(&pmc_reserve_mutex); > + } > + > + return err; > +} > + > +And the magic is in *reserve_pmc_hardware*, which usually does atomic > +operations to make implemented IRQ accessible from some global function pointer. > +*release_pmc_hardware* serves the opposite purpose, and it is used in event > +destructors mentioned in previous section. > + > +(Note: From the implementations in all the architectures, the *reserve/release* > +pair are always IRQ settings, so the *pmc_hardware* seems somehow misleading. > +It does NOT deal with the binding between an event and a physical counter, > +which will be introduced in the next section.) > + > +3.2. IRQ Structure > + > +Basically, a IRQ runs the following pseudo code: > + > +for each hardware counter that triggered this overflow > + > + get the event of this counter > + > + // following two steps are defined as *read()*, > + // check the section Reading/Writing Counters for details. > + count the delta value since previous interrupt > + update the event->count (# event occurs) by adding delta, and > + event->hw.period_left by subtracting delta > + > + if the event overflows > + sample data > + set the counter appropriately for the next overflow > + > + if the event overflows again > + too frequently, throttle this event > + fi > + fi > + > +end for > + > +However as of this writing, none of the RISC-V implementations have designed an > +interrupt for perf, so the details are to be completed in the future. > + > +4. Reading/Writing Counters > +--------------------------- > + > +They seem symmetric but perf treats them quite differently. For reading, there > +is a *read* interface in *struct pmu*, but it serves more than just reading. > +According to the context, the *read* function not only read the content of the > +counter (event->count), but also update the left period to the next interrupt > +(event->hw.period_left). > + > +But the core of perf does not need direct write to counters. Writing counters > +hides behind the abstraction of 1) *pmu->start*, literally start counting so one > +has to set the counter to a good value for the next interrupt; 2) inside the IRQ > +it should set the counter with the same reason. > + > +Reading is not a problem in RISC-V but writing would need some effort, since > +counters are not allowed to be written by S-mode. > + > + > +5. add()/del()/start()/stop() > +----------------------------- > + > +Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop() > +starts/stop the counter of some event in the PMU. All of them take the same > +arguments: *struct perf_event *event* and *int flag*. > + > +Consider perf as a state machine, then you will find that these functions serve > +as the state transition process between those states. > +Three states (event->hw.state) are defined: > + > +* PERF_HES_STOPPED: the counter is stopped > +* PERF_HES_UPTODATE: the event->count is up-to-date > +* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now > + > +A normal flow of these state transitions are as follows: > + > +* A user launches a perf event, resulting in calling to *event_init*. > +* When being context-switched in, *add* is called by the perf core, with flag > + PERF_EF_START, which mean that the event should be started after it is added. > + In this stage, an general event is binded to a physical counter, if any. > + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now > + stopped, and the (software) event count does not need updating. > +** *start* is then called, and the counter is enabled. > + With flag PERF_EF_RELOAD, it write the counter to an appropriate value (check > + previous section for detail). > + No writing is made if the flag does not contain PERF_EF_RELOAD. > + The state now is reset to none, because it is neither stopped nor update > + (the counting already starts) > +* When being context-switched out, *del* is called. It then checkout all the > + events in the PMU and call *stop* to update their counts. > +** *stop* is called by *del* > + and the perf core with flag PERF_EF_UPDATE, and it often shares the same > + subroutine as *read* with the same logic. > + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again. > + > +** Life cycles of these two pairs: *add* and *del* are called repeatedly as > + tasks switch in-and-out; *start* and *stop* is also called when the perf core > + needs a quick stop-and-start, for instance, when the interrupt period is being > + adjusted. > + > +Current implementation is sufficient for now and can be easily extend to > +features in the future. > + > +A. Related Structures > +--------------------- > + > +* struct pmu: include/linux/perf_events.h > +* struct riscv_pmu: arch/riscv/include/asm/perf_events.h > + > + Both structures are designed to be read-only. > + > + *struct pmu* defines some function pointer interfaces, and most of them take > +*struct perf_event* as a main argument, dealing with perf events according to > +perf's internal state machine (check kernel/events/core.c for details). > + > + *struct riscv_pmu* defines PMU-specific parameters. The naming follows the > +convention of all other architectures. > + > +* struct perf_event: include/linux/perf_events.h > +* struct hw_perf_event > + > + The generic structure that represents perf events, and the hardware-related > +details. > + > +* struct riscv_hw_events: arch/riscv/include/asm/perf_events.h > + > + The structure that holds the status of events, has two fixed members: > +the number of events and the array of the events. > + > +References > +---------- > + > +[1] https://github.com/riscv/riscv-linux/pull/124 > +[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA > -- > 2.16.2 >