Received: by 2002:ab2:788f:0:b0:1ee:8f2e:70ae with SMTP id b15csp147814lqi; Wed, 6 Mar 2024 12:39:00 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVSLbgo5ol+oQNMrXGftLqdFhINZFBpUJBi8BBv8j0lMVBcDzur0IXqDbj8p5HCdhSuHkLqXNsjA6hSG7NVVaO6KW2Zjjk/VewMfpNLkQ== X-Google-Smtp-Source: AGHT+IEHqX3cJv+hxRGENZYNotmR/l7PJXGaNZ9ayZaBh4vXkD+PwAbuocnhL+bUkrEwAV6x5geu X-Received: by 2002:a17:902:c406:b0:1dc:af71:4d46 with SMTP id k6-20020a170902c40600b001dcaf714d46mr6258953plk.13.1709757540490; Wed, 06 Mar 2024 12:39:00 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709757540; cv=pass; d=google.com; s=arc-20160816; b=rLtaLk7KDXud5HzklZisbSR3ubyI3ypFrICsUgOOETuYWT2IubLuHJbkN5u2sWeqx8 CN7ok1qAACkGG3hbJTn2t3allFhpKb4tQsWCKhPEnCGskTO6hBo3Nz850D/3o0eW0kd1 PasgGNG9QqJIDDaflGUn8kJUuS2SphM7mvG5QkzFSRjmjJ3slleGdO+JMNaQETfVsdy3 g4Kxypfpjtcg/jT4srBUK+TXW4kaYeaDsKTNRkdCWk6hJG7ZUMDAWtyMhKhcc+vkMNnN +65Abk7GwhD8Qr1JoAmdYrB2G7UMIlATrmXnVuHXmJmtm1LQ8eZuD9Qd8KCOySDuzgbs 2vHw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:cc:references:to:from :content-language:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id; bh=SGJs3fAgrOrt4SIDU6iB5d/VXcdxQ+ueK2YAbXPI/Lc=; fh=LSVo7dGImZ0JFImEMQXi/lzCkwXO0F+YevDj9ZDxIGY=; b=Ngzh+XP3DjMs8ADeL0W6af5+GaFWTSef/iCNCnEeYcF+SUcY4pg10Gc3MdRcbtGxl8 lYPMe369PfWo9WwZcZM14pslrHXf01wo6mFv/mYTvl57wC6Gya9ZGH3n4f5hlg/n2hxl FPEKQhdjK4KKwn8Dxe3SJXn60aGZMQErox/g+/cWRwdgeow1dEJaeBAp3tGntkGVq5ex lCZeyncuyIeZnV/Kq4H1tIy4OnvDR2MupcAdCtPEWyuC4UkQfqeoPYJDstbBHsQTppn3 Nvdls/xqkk5vdwjRUAbc48GeAASH8LLT1bj7hJ+Wu5hMamyjVZErk4JRubZ4mjDgndGq 3fng==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=ghiti.fr); spf=pass (google.com: domain of linux-kernel+bounces-94562-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-94562-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id v5-20020a170903238500b001dca843f0bcsi12737126plh.443.2024.03.06.12.39.00 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Mar 2024 12:39:00 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-94562-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=ghiti.fr); spf=pass (google.com: domain of linux-kernel+bounces-94562-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-94562-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 31951285E80 for ; Wed, 6 Mar 2024 20:38:16 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id F0C4214A82; Wed, 6 Mar 2024 20:38:11 +0000 (UTC) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18695D531; Wed, 6 Mar 2024 20:38:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.183.197 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709757490; cv=none; b=HrUCPuL9S+B3fcuxDpE81uU551QFa+ijshtIQDWf9ScBvUYUaeUhuo6XBsvsO4bIBaylsXsPx8IeJVuSd+/nZdK9lIXQA4G0giM8o7c0ASolkzZGrCeREK2/G7X6IStosm2NDaoWSVshC7eIGXSbFaYJyQE+X9Q1OIfGB2/j1X8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709757490; c=relaxed/simple; bh=nCr9yN0DKHVMyRm+ySvDPc9U+ggnPJT9UTR6/iKlEzM=; h=Message-ID:Date:MIME-Version:Subject:From:To:References:Cc: In-Reply-To:Content-Type; b=Abyj38W5JGJoYHUFWIHBmO+1yJ+s62AAyXSGCYLXBnpvqRwdLaWqzFF7mE7/d6NODMhMaR9hz6+F04NWOzf3yDQRv1IWCPclfT1Ihuwyu6csPFExz7PJoZLEZ+PdkOqZ18vZhFfnBfSqwv15tmkyJ6j/u2uMw5aEQy5/zFPjG7E= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ghiti.fr; spf=pass smtp.mailfrom=ghiti.fr; arc=none smtp.client-ip=217.70.183.197 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ghiti.fr Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=ghiti.fr Received: by mail.gandi.net (Postfix) with ESMTPSA id 06D2B1C0004; Wed, 6 Mar 2024 20:38:01 +0000 (UTC) Message-ID: <60a1ad85-7bca-45ef-91be-8f1b3cfc9011@ghiti.fr> Date: Wed, 6 Mar 2024 21:38:01 +0100 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS Content-Language: en-US From: Alexandre Ghiti To: Puranjay Mohan , Paul Walmsley , Palmer Dabbelt , Albert Ou , Steven Rostedt , Masami Hiramatsu , Mark Rutland , Sami Tolvanen , Guo Ren , Ley Foon Tan , Deepak Gupta , Sia Jee Heng , =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= , Song Shuai , =?UTF-8?B?Q2zDqW1lbnQgTMOpZ2Vy?= , Al Viro , Jisheng Zhang , linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org References: <20240306165904.108141-1-puranjay12@gmail.com> <34c9d54a-0f19-4671-8b05-9adf3b6c9a1a@ghiti.fr> Cc: Evgenii Shatokhin , Andy Chiu In-Reply-To: <34c9d54a-0f19-4671-8b05-9adf3b6c9a1a@ghiti.fr> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-GND-Sasl: alex@ghiti.fr +cc Andy and Evgenii On 06/03/2024 21:35, Alexandre Ghiti wrote: > Hi Puranjay, > > On 06/03/2024 17:59, Puranjay Mohan wrote: >> This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V. >> This allows each ftrace callsite to provide an ftrace_ops to the common >> ftrace trampoline, allowing each callsite to invoke distinct tracer >> functions without the need to fall back to list processing or to >> allocate custom trampolines for each callsite. This significantly speeds >> up cases where multiple distinct trace functions are used and callsites >> are mostly traced by a single tracer. >> >> The idea and most of the implementation is taken from the ARM64's >> implementation of the same feature. The idea is to place a pointer to >> the ftrace_ops as a literal at a fixed offset from the function entry >> point, which can be recovered by the common ftrace trampoline. >> >> We use -fpatchable-function-entry to reserve 8 bytes above the function >> entry by emitting 2 4 byte or 4 2 byte  nops depending on the >> presence of >> CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime with a pointer >> to the associated ftrace_ops for that callsite. Functions are aligned to >> 8 bytes to make sure that the accesses to this literal are atomic. >> >> This approach allows for directly invoking ftrace_ops::func even for >> ftrace_ops which are dynamically-allocated (or part of a module), >> without going via ftrace_ops_list_func. >> >> I've benchamrked this with the ftrace_ops sample module on Qemu, with >> the next version, I will provide benchmarks on real hardware: >> >> Without this patch: >> >> +-----------------------+-----------------+----------------------------+ >> |  Number of tracers    | Total time (ns) | Per-call average time      | >> |-----------------------+-----------------+----------------------------| >> | Relevant | Irrelevant |    100000 calls | Total (ns) | Overhead (ns) | >> |----------+------------+-----------------+------------+---------------| >> |        0 |          0 |        15615700 |        156 |             - | >> |        0 |          1 |        15917600 |        159 |             - | >> |        0 |          2 |        15668000 |        156 |             - | >> |        0 |         10 |        14971500 |        149 |             - | >> |        0 |        100 |        15417600 |        154 |             - | >> |        0 |        200 |        15387000 |        153 |             - | >> |----------+------------+-----------------+------------+---------------| >> |        1 |          0 |       119906800 |       1199 |          1043 | >> |        1 |          1 |       137428600 |       1374 |          1218 | >> |        1 |          2 |       159562400 |       1374 |          1218 | >> |        1 |         10 |       302099900 |       3020 |          2864 | >> |        1 |        100 |      2008785500 |      20087 | 19931 | >> |        1 |        200 |      3965221900 |      39652 | 39496 | >> |----------+------------+-----------------+------------+---------------| >> |        1 |          0 |       119166700 |       1191 |          1035 | >> |        2 |          0 |       157999900 |       1579 |          1423 | >> |       10 |          0 |       425370100 |       4253 |          4097 | >> |      100 |          0 |      3595252100 |      35952 | 35796 | >> |      200 |          0 |      7023485700 |      70234 | 70078 | >> +----------+------------+-----------------+------------+---------------+ >> >> Note: per-call overhead is estimated relative to the baseline case with >> 0 relevant tracers and 0 irrelevant tracers. >> >> With this patch: >> >> +-----------------------+-----------------+----------------------------+ >> |   Number of tracers   | Total time (ns) | Per-call average time      | >> |-----------------------+-----------------+----------------------------| >> | Relevant | Irrelevant |    100000 calls | Total (ns) | Overhead (ns) | >> |----------+------------+-----------------+------------+---------------| >> |        0 |          0 |        15254600 |        152 |             - | >> |        0 |          1 |        16136700 |        161 |             - | >> |        0 |          2 |        15329500 |        153 |             - | >> |        0 |         10 |        15148800 |        151 |             - | >> |        0 |        100 |        15746900 |        157 |             - | >> |        0 |        200 |        15737400 |        157 |             - | >> |----------+------------+-----------------+------------+---------------| >> |        1 |          0 |        47909000 |        479 |           327 | >> |        1 |          1 |        48297400 |        482 |           330 | >> |        1 |          2 |        47314100 |        473 |           321 | >> |        1 |         10 |        47844900 |        478 |           326 | >> |        1 |        100 |        46591900 |        465 |           313 | >> |        1 |        200 |        47178900 |        471 |           319 | >> |----------+------------+-----------------+------------+---------------| >> |        1 |          0 |        46715800 |        467 |           315 | >> |        2 |          0 |       155134500 |       1551 |          1399 | >> |       10 |          0 |       442672800 |       4426 |          4274 | >> |      100 |          0 |      4092353900 |      40923 | 40771 | >> |      200 |          0 |      7135796400 |      71357 | 71205 | >> +----------+------------+-----------------+------------+---------------+ >> >> Note: per-call overhead is estimated relative to the baseline case with >> 0 relevant tracers and 0 irrelevant tracers. >> >> As can be seen from the above: >> >>   a) Whenever there is a single relevant tracer function associated >> with a >>      tracee, the overhead of invoking the tracer is constant, and >> does not >>      scale with the number of tracers which are *not* associated with >> that >>      tracee. >> >>   b) The overhead for a single relevant tracer has dropped to ~1/3 of >> the >>      overhead prior to this series (from 1035ns to 315ns). This is >> largely >>      due to permitting calls to dynamically-allocated ftrace_ops without >>      going through ftrace_ops_list_func. >> >> Why is this patch a RFC patch: >>   1. I saw some rcu stalls on Qemu and need to debug them and see if >> they >>      were introduced by this patch. > > > FYI, I'm currently working on debugging such issues (and other) with > the *current* ftrace implementation, so probably not caused by your > patchset. But keep debugging too, maybe this introduces other issues > or even better, you'll find the root cause :) > > >>   2. This needs to be tested thoroughly on real hardware. >>   3. Seeking reviews to fix any fundamental problems with this patch >> that I >>      might have missed due to my lack of RISC-V architecture knowledge. >>   4. I would like to benchmark this on real hardware and put the >> results in >>      the commit message. >> >> Signed-off-by: Puranjay Mohan >> --- >>   arch/riscv/Kconfig              |  2 ++ >>   arch/riscv/Makefile             |  8 +++++ >>   arch/riscv/include/asm/ftrace.h |  3 ++ >>   arch/riscv/kernel/asm-offsets.c |  3 ++ >>   arch/riscv/kernel/ftrace.c      | 59 +++++++++++++++++++++++++++++++++ >>   arch/riscv/kernel/mcount-dyn.S  | 42 ++++++++++++++++++++--- >>   6 files changed, 112 insertions(+), 5 deletions(-) >> >> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >> index 0bfcfec67ed5..e474742e23b2 100644 >> --- a/arch/riscv/Kconfig >> +++ b/arch/riscv/Kconfig >> @@ -78,6 +78,7 @@ config RISCV >>       select EDAC_SUPPORT >>       select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && >> !DYNAMIC_FTRACE) >>       select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if >> DYNAMIC_FTRACE >> +    select FUNCTION_ALIGNMENT_8B if DYNAMIC_FTRACE_WITH_CALL_OPS > > > A recent discussion [1] states that -falign-functions cannot guarantee > this alignment for all code and that gcc developers came up with a new > option [2]: WDYT? I have added Andy and Evgenii in +cc to help on that. > > [1] > https://lore.kernel.org/linux-riscv/4fe4567b-96be-4102-952b-7d39109b2186@yadro.com/ > [2] > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326 > > >>       select GENERIC_ARCH_TOPOLOGY >>       select GENERIC_ATOMIC64 if !64BIT >>       select GENERIC_CLOCKEVENTS_BROADCAST if SMP >> @@ -127,6 +128,7 @@ config RISCV >>       select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && >> (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE) >>       select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS >>       select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE >> +    select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if >> (DYNAMIC_FTRACE_WITH_REGS && !CFI_CLANG) >>       select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL >>       select HAVE_FUNCTION_GRAPH_TRACER >>       select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER >> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile >> index 252d63942f34..875ad5dc3d32 100644 >> --- a/arch/riscv/Makefile >> +++ b/arch/riscv/Makefile >> @@ -14,12 +14,20 @@ endif >>   ifeq ($(CONFIG_DYNAMIC_FTRACE),y) >>       LDFLAGS_vmlinux += --no-relax >>       KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY >> +ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS), y) >> +ifeq ($(CONFIG_RISCV_ISA_C),y) >> +    CC_FLAGS_FTRACE := -fpatchable-function-entry=8,4 >> +else >> +    CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2 >> +endif >> +else >>   ifeq ($(CONFIG_RISCV_ISA_C),y) >>       CC_FLAGS_FTRACE := -fpatchable-function-entry=4 >>   else >>       CC_FLAGS_FTRACE := -fpatchable-function-entry=2 >>   endif >>   endif >> +endif >>     ifeq ($(CONFIG_CMODEL_MEDLOW),y) >>   KBUILD_CFLAGS_MODULE += -mcmodel=medany >> diff --git a/arch/riscv/include/asm/ftrace.h >> b/arch/riscv/include/asm/ftrace.h >> index 329172122952..c9a84222c9ea 100644 >> --- a/arch/riscv/include/asm/ftrace.h >> +++ b/arch/riscv/include/asm/ftrace.h >> @@ -28,6 +28,9 @@ >>   void MCOUNT_NAME(void); >>   static inline unsigned long ftrace_call_adjust(unsigned long addr) >>   { >> +    if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)) >> +        return addr + 8; >> + >>       return addr; >>   } >>   diff --git a/arch/riscv/kernel/asm-offsets.c >> b/arch/riscv/kernel/asm-offsets.c >> index a03129f40c46..7d7c4b486852 100644 >> --- a/arch/riscv/kernel/asm-offsets.c >> +++ b/arch/riscv/kernel/asm-offsets.c >> @@ -488,4 +488,7 @@ void asm_offsets(void) >>       DEFINE(STACKFRAME_SIZE_ON_STACK, ALIGN(sizeof(struct >> stackframe), STACK_ALIGN)); >>       OFFSET(STACKFRAME_FP, stackframe, fp); >>       OFFSET(STACKFRAME_RA, stackframe, ra); >> +#ifdef CONFIG_FUNCTION_TRACER >> +    DEFINE(FTRACE_OPS_FUNC,        offsetof(struct ftrace_ops, func)); >> +#endif >>   } >> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c >> index f5aa24d9e1c1..e2e75e15d32e 100644 >> --- a/arch/riscv/kernel/ftrace.c >> +++ b/arch/riscv/kernel/ftrace.c >> @@ -82,9 +82,52 @@ static int __ftrace_modify_call(unsigned long >> hook_pos, unsigned long target, >>       return 0; >>   } >>   +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS >> +static const struct ftrace_ops *riscv64_rec_get_ops(struct >> dyn_ftrace *rec) >> +{ >> +    const struct ftrace_ops *ops = NULL; >> + >> +    if (rec->flags & FTRACE_FL_CALL_OPS_EN) { >> +        ops = ftrace_find_unique_ops(rec); >> +        WARN_ON_ONCE(!ops); >> +    } >> + >> +    if (!ops) >> +        ops = &ftrace_list_ops; >> + >> +    return ops; >> +} >> + >> +static int ftrace_rec_set_ops(const struct dyn_ftrace *rec, >> +                  const struct ftrace_ops *ops) >> +{ >> +    unsigned long literal = rec->ip - 8; >> + >> +    return patch_text_nosync((void *)literal, &ops, sizeof(ops)); >> +} >> + >> +static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) >> +{ >> +    return ftrace_rec_set_ops(rec, &ftrace_nop_ops); >> +} >> + >> +static int ftrace_rec_update_ops(struct dyn_ftrace *rec) >> +{ >> +    return ftrace_rec_set_ops(rec, riscv64_rec_get_ops(rec)); >> +} >> +#else >> +static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; } >> +static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; } >> +#endif >> + >>   int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) >>   { >>       unsigned int call[2]; >> +    int ret; >> + >> +    ret = ftrace_rec_update_ops(rec); >> +    if (ret) >> +        return ret; >>         make_call_t0(rec->ip, addr, call); >>   @@ -98,6 +141,11 @@ int ftrace_make_nop(struct module *mod, struct >> dyn_ftrace *rec, >>               unsigned long addr) >>   { >>       unsigned int nops[2] = {NOP4, NOP4}; >> +    int ret; >> + >> +    ret = ftrace_rec_set_nop_ops(rec); >> +    if (ret) >> +        return ret; >>         if (patch_text_nosync((void *)rec->ip, nops, MCOUNT_INSN_SIZE)) >>           return -EPERM; >> @@ -125,6 +173,13 @@ int ftrace_init_nop(struct module *mod, struct >> dyn_ftrace *rec) >>     int ftrace_update_ftrace_func(ftrace_func_t func) >>   { >> +    /* >> +     * When using CALL_OPS, the function to call is associated with the >> +     * call site, and we don't have a global function pointer to >> update. >> +     */ >> +    if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)) >> +        return 0; >> + >>       int ret = __ftrace_modify_call((unsigned long)&ftrace_call, >>                          (unsigned long)func, true, true); >>       if (!ret) { >> @@ -147,6 +202,10 @@ int ftrace_modify_call(struct dyn_ftrace *rec, >> unsigned long old_addr, >>       make_call_t0(caller, old_addr, call); >>       ret = ftrace_check_current_call(caller, call); >>   +    if (ret) >> +        return ret; >> + >> +    ret = ftrace_rec_update_ops(rec); >>       if (ret) >>           return ret; >>   diff --git a/arch/riscv/kernel/mcount-dyn.S >> b/arch/riscv/kernel/mcount-dyn.S >> index b7561288e8da..cb241e36e514 100644 >> --- a/arch/riscv/kernel/mcount-dyn.S >> +++ b/arch/riscv/kernel/mcount-dyn.S >> @@ -191,11 +191,35 @@ >>       .endm >>         .macro PREPARE_ARGS >> -    addi    a0, t0, -FENTRY_RA_OFFSET >> +    addi    a0, t0, -FENTRY_RA_OFFSET    // ip (callsite's auipc insn) >> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS >> +    /* >> +     * When CALL_OPS is enabled (2 or 4) nops [8B] are placed before >> the >> +     * function entry, these are later overwritten with the pointer >> to the >> +     * associated struct ftrace_ops. >> +     * >> +     * -8: &ftrace_ops of the associated tracer function. >> +     *: >> +     *  0: auipc  t0/ra, 0x? >> +     *  4: jalr   t0/ra, ?(t0/ra) >> +     * >> +     * -8: &ftrace_nop_ops >> +     *: >> +     *  0: nop >> +     *  4: nop >> +     * >> +     * t0 is set to ip+8 after the jalr is executed at the callsite, >> +     * so we find the associated op at t0-16. >> +     */ >> +    mv     a1, ra                // parent_ip >> +    REG_L   a2, -16(t0)            // op >> +    REG_L   ra, FTRACE_OPS_FUNC(a2)        // op->func >> +#else >>       la    a1, function_trace_op >> -    REG_L    a2, 0(a1) >> -    mv    a1, ra >> -    mv    a3, sp >> +    REG_L    a2, 0(a1)            // op >> +    mv    a1, ra                // parent_ip >> +#endif >> +    mv    a3, sp                // regs >>       .endm >>     #endif /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */ >> @@ -233,8 +257,12 @@ SYM_FUNC_START(ftrace_regs_caller) >>       SAVE_ABI_REGS 1 >>       PREPARE_ARGS >>   +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS >> +    jalr ra >> +#else >>   SYM_INNER_LABEL(ftrace_regs_call, SYM_L_GLOBAL) >>       call    ftrace_stub >> +#endif >>         RESTORE_ABI_REGS 1 >>       bnez    t1, .Ldirect >> @@ -247,9 +275,13 @@ SYM_FUNC_START(ftrace_caller) >>       SAVE_ABI_REGS 0 >>       PREPARE_ARGS >>   -SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL) >> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS >> +    jalr ra >> +#else >> +SYM_INNER_LABEL(ftrace_regs_call, SYM_L_GLOBAL) >>       call    ftrace_stub >>   +#endif >>       RESTORE_ABI_REGS 0 >>       jr    t0 >>   SYM_FUNC_END(ftrace_caller) > > > As I'm diving into ftrace right now, I'll give a proper review soon. > But as a note, I noticed that the function_graph tracer, when enabled, > makes the whole system unresponsive (but still up, just very slow). A > fix I sent recently seems to really improve that if you're interested > in testing it (I am :)). You can find it here: > https://lore.kernel.org/linux-riscv/20240229121056.203419-1-alexghiti@rivosinc.com/ > > Thanks, > > Alex > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv