Date: Thu, 9 Jan 2020 16:48:59 +0000
From: Mark Rutland
To: Cheng Jian
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	xiexiuqi@huawei.com, huawei.libin@huawei.com,
	bobo.shaobowang@huawei.com, catalin.marinas@arm.com, duwe@lst.de
Subject: Re: [RFC PATCH] arm64/ftrace: support dynamically allocated trampolines
Message-ID: <20200109164858.GH3112@lakrids.cambridge.arm.com>
References: <20200109142736.1122-1-cj.chengjian@huawei.com>
In-Reply-To: <20200109142736.1122-1-cj.chengjian@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org
On Thu, Jan 09, 2020 at 02:27:36PM +0000, Cheng Jian wrote:
> When we trace multiple functions, ftrace has to use the list
> function, which causes all the other functions being traced
> to check the hash of the ftrace_ops. This is very inefficient.

Just how bad is this, and when/where does it matter? How much does this
patch improve matters?

> We can call a dynamically allocated trampoline which calls the
> callback directly to solve this problem. This patch introduces
> dynamically allocated trampolines for arm64.
>
> If a callback is registered to a function and there's no other
> callback registered to that function, the ftrace_ops will get its
> own trampoline allocated for it that will call the function
> directly.
>
> We merge two functions (ftrace_caller/ftrace_regs_caller and
> ftrace_common) into one, so we no longer need a jump to
> ftrace_common; it is fixed to a NOP instead.
>
> Similar to x86_64, the local ftrace_ops is saved at the end.
>
> The ftrace trampoline layout:
>
> low
> ftrace_(regs_)caller  => +-----------------+
>                          |  ftrace_caller  |
> ftrace_common         => +-----------------+
>                          |  ftrace_common  |
> function_trace_op_ptr => |       ...       | ldr x2,
>                     |    |  b ftrace_stub  |
>                     |    |       nop       | fgraph call
>                     |    +-----------------+
>                     +--->|   ftrace_ops    |
>                          +-----------------+
>                          |   PLT entries   | (TODO)
>                          +-----------------+
> high
>
> Known issues:
>
> If KASLR is enabled, the address of the trampoline and the ftrace
> call site may be far apart, so long jump support is required. Here
> I intend to use the same solution as module relocation: reserve
> enough space for a PLT at the end when allocating, and use the PLT
> to complete these long jumps.

This can happen both ways; the callsite can also be too far from the
trampoline to be able to branch to it. I've had issues with that for
other reasons, and I think that we might be able to use
-fpatchable-function-entry=N,M to place a PLT immediately before each
function for that.
However, I'm wary of doing so because it makes it much harder to modify
the patch site itself.

> Signed-off-by: Cheng Jian
> ---
>  arch/arm64/kernel/entry-ftrace.S |   4 +
>  arch/arm64/kernel/ftrace.c       | 310 +++++++++++++++++++++++++++++++
>  2 files changed, 314 insertions(+)
>
> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> index 7d02f9966d34..f5ee797804ac 100644
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -77,17 +77,20 @@
>
>  ENTRY(ftrace_regs_caller)
>  	ftrace_regs_entry	1
> +GLOBAL(ftrace_regs_caller_end)
>  	b	ftrace_common
>  ENDPROC(ftrace_regs_caller)
>
>  ENTRY(ftrace_caller)
>  	ftrace_regs_entry	0
> +GLOBAL(ftrace_caller_end)
>  	b	ftrace_common
>  ENDPROC(ftrace_caller)
>
>  ENTRY(ftrace_common)
>  	sub	x0, x30, #AARCH64_INSN_SIZE	// ip (callsite's BL insn)
>  	mov	x1, x9				// parent_ip (callsite's LR)
> +GLOBAL(function_trace_op_ptr)
>  	ldr_l	x2, function_trace_op		// op
>  	mov	x3, sp				// regs
>
> @@ -121,6 +124,7 @@ ftrace_common_return:
>  	/* Restore the callsite's SP */
>  	add	sp, sp, #S_FRAME_SIZE + 16
>
> +GLOBAL(ftrace_common_end)
>  	ret	x9

This doesn't look right. Surely you want the RET, too?
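For context on the range limit being discussed: arm64's B/BL instructions encode a signed 26-bit word offset, giving a branch range of roughly +/-128 MiB. A minimal user-space sketch of that reachability check (branch_in_range() is a made-up helper name for illustration, not a kernel API):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * arm64 B/BL encode a signed 26-bit *word* offset, so the reachable
 * displacement from the branch instruction is [-2^27, 2^27 - 4] bytes
 * (about +/-128 MiB). This helper is hypothetical, for illustration.
 */
static bool branch_in_range(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	return offset >= -(1LL << 27) && offset < (1LL << 27);
}
```

When this check fails in either direction (callsite to trampoline, or trampoline to callback), a direct B/BL cannot be patched in, which is why the patch proposes reserving PLT space for long jumps.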
> ENDPROC(ftrace_common)
>
> diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
> index 8618faa82e6d..95ea68ef6228 100644
> --- a/arch/arm64/kernel/ftrace.c
> +++ b/arch/arm64/kernel/ftrace.c
> @@ -10,11 +10,13 @@
>  #include <...>
>  #include <...>
>  #include <...>
> +#include <...>
>
>  #include <...>
>  #include <...>
>  #include <...>
>  #include <...>
> +#include <...>
>
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  /*
> @@ -47,6 +49,314 @@ static int ftrace_modify_code(unsigned long pc, u32 old, u32 new,
>  	return 0;
>  }
>
> +/* ftrace dynamic trampolines */
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +#ifdef CONFIG_MODULES
> +#include <...>
> +
> +static inline void *alloc_tramp(unsigned long size)
> +{
> +	return module_alloc(size);
> +}
> +
> +static inline void tramp_free(void *tramp)
> +{
> +	module_memfree(tramp);
> +}
> +#else
> +static inline void *alloc_tramp(unsigned long size)
> +{
> +	return NULL;
> +}
> +
> +static inline void tramp_free(void *tramp) {}
> +#endif
> +
> +extern void ftrace_regs_caller_end(void);
> +extern void ftrace_caller_end(void);
> +extern void ftrace_common(void);
> +extern void ftrace_common_end(void);
> +
> +extern void function_trace_op_ptr(void);
> +
> +extern struct ftrace_ops *function_trace_op;
> +
> +/*
> + * ftrace_caller() or ftrace_regs_caller() trampoline
> + *                               +-----------------------+
> + * ftrace_(regs_)caller     =>   |        ......         |
> + * ftrace_(regs_)caller_end =>   | b ftrace_common       | => nop
> + *                               +-----------------------+
> + * ftrace_common            =>   |        ......         |
> + * function_trace_op_ptr    =>   | adrp x2, sym          | => nop
> + *                               | ldr x2,[x2,:lo12:sym] | => ldr x2
> + *                               |        ......         |
> + * ftrace_common_end        =>   | retq                  |

Copy-paste from x86? arm64 doesn't have a retq instruction.

> + *                               +-----------------------+
> + * ftrace_opt               =>   | ftrace_opt            |
> + *                               +-----------------------+

Typo: s/opt/ops/ ?
> + */
> +static unsigned long create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
> +{
> +	unsigned long start_offset_caller, end_offset_caller, caller_size;
> +	unsigned long start_offset_common, end_offset_common, common_size;
> +	unsigned long op_offset, offset, size, ip, npages;
> +	void *trampoline;
> +	unsigned long *ptr;
> +	/* ldr x2,
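[The quoted function is cut off above. Judging from the declared variables and the layout comment, it presumably starts by sizing the regions to copy; a stand-alone sketch of that arithmetic follows, with hypothetical symbol addresses standing in for ftrace_(regs_)caller, ftrace_common, and their *_end labels, and tramp_size_bytes() as a made-up name rather than a kernel function:]

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the trampoline sizing implied by the layout comment above:
 * a copy of ftrace_(regs_)caller up to its *_end label, a copy of
 * ftrace_common up to ftrace_common_end, plus room for the ftrace_ops
 * pointer stored at the end (PLT space not counted here).
 */
static size_t tramp_size_bytes(uintptr_t caller, uintptr_t caller_end,
			       uintptr_t common, uintptr_t common_end)
{
	size_t caller_size = caller_end - caller;
	size_t common_size = common_end - common;

	return caller_size + common_size + sizeof(void *);
}
```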