Date: Wed, 5 Oct 2022 11:30:19 -0400
From: Steven Rostedt
To: Florent Revest
Cc: Xu Kuohai, Mark Rutland, Catalin Marinas, Daniel Borkmann, Xu Kuohai, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, Will Deacon, Jean-Philippe Brucker, Ingo Molnar, Oleg Nesterov, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Zi Shen Lim, Pasha Tatashin, Ard Biesheuvel, Marc Zyngier, Guo Ren,
Masami Hiramatsu
Subject: Re: [PATCH bpf-next v2 0/4] Add ftrace direct call for arm64
Message-ID: <20221005113019.18aeda76@gandalf.local.home>

On Wed, 5 Oct 2022 17:10:33 +0200
Florent Revest wrote:

> On Wed, Oct 5, 2022 at 5:07 PM Steven Rostedt wrote:
> >
> > On Wed, 5 Oct 2022 22:54:15 +0800
> > Xu Kuohai wrote:
> >
> > > 1.3 attach bpf prog with direct call, bpftrace -e 'kfunc:vfs_write {}'
> > >
> > > # dd if=/dev/zero of=/dev/null count=1000000
> > > 1000000+0 records in
> > > 1000000+0 records out
> > > 512000000 bytes (512 MB, 488 MiB) copied, 1.72973 s, 296 MB/s
> > >
> > >
> > > 1.4 attach bpf prog with indirect call, bpftrace -e 'kfunc:vfs_write {}'
> > >
> > > # dd if=/dev/zero of=/dev/null count=1000000
> > > 1000000+0 records in
> > > 1000000+0 records out
> > > 512000000 bytes (512 MB, 488 MiB) copied, 1.99179 s, 257 MB/s
>
> Thanks for the measurements Xu!
>
> > Can you show the implementation of the indirect call you used?
>
> Xu used my development branch here
> https://github.com/FlorentRevest/linux/commits/fprobe-min-args

That looks like it could be optimized quite a bit too.
Specifically this part:

static bool bpf_fprobe_entry(struct fprobe *fp, unsigned long ip,
			     struct ftrace_regs *regs, void *private)
{
	struct bpf_fprobe_call_context *call_ctx = private;
	struct bpf_fprobe_context *fprobe_ctx = fp->ops.private;
	struct bpf_tramp_links *links = fprobe_ctx->links;
	struct bpf_tramp_links *fentry = &links[BPF_TRAMP_FENTRY];
	struct bpf_tramp_links *fmod_ret = &links[BPF_TRAMP_MODIFY_RETURN];
	struct bpf_tramp_links *fexit = &links[BPF_TRAMP_FEXIT];
	int i, ret;

	memset(&call_ctx->ctx, 0, sizeof(call_ctx->ctx));
	call_ctx->ip = ip;
	for (i = 0; i < fprobe_ctx->nr_args; i++)
		call_ctx->args[i] = ftrace_regs_get_argument(regs, i);

	for (i = 0; i < fentry->nr_links; i++)
		call_bpf_prog(fentry->links[i], &call_ctx->ctx, call_ctx->args);

	call_ctx->args[fprobe_ctx->nr_args] = 0;
	for (i = 0; i < fmod_ret->nr_links; i++) {
		ret = call_bpf_prog(fmod_ret->links[i], &call_ctx->ctx,
				    call_ctx->args);

		if (ret) {
			ftrace_regs_set_return_value(regs, ret);
			ftrace_override_function_with_return(regs);

			bpf_fprobe_exit(fp, ip, regs, private);
			return false;
		}
	}

	return fexit->nr_links;
}

There's a lot of low-hanging fruit to speed up there. I wouldn't be too
fast to throw out this solution if it hasn't had the care that direct
calls have had to speed that up.

For example, trampolines currently only allow attaching to functions
with 6 parameters or fewer (3 on x86_32). You could make 7 specific
callbacks, with zero to 6 parameters, and unroll the argument loop.

It would also be interesting to run perf to see where the overhead is.
There may be other locations to work on to make it almost as fast as
direct callers without the other baggage.

-- Steve

>
> As it stands, the performance impact of the fprobe based
> implementation would be too high for us. I wonder how much Mark's idea
> here https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/per-callsite-ops
> would help but it doesn't work right now.
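
The "7 specific callbacks" suggestion above could be sketched roughly as
follows. This is a minimal user-space model, not kernel code: struct
fake_regs, fake_get_argument(), copy_args_fn, copy_args_loop(),
copy_args_N() and select_copy_args() are all hypothetical names made up
for illustration, and only the argument copy is modelled, not the BPF
program calls. The assumption is that the arity is known at attach time,
so a copier with a constant loop bound can be picked once and the
compiler is free to unroll it.

```c
#include <stddef.h>

#define MAX_ARGS 6	/* the 6-parameter limit mentioned in the mail */

/* Hypothetical stand-in for struct ftrace_regs: argument registers only. */
struct fake_regs {
	unsigned long arg[MAX_ARGS];
};

/* Hypothetical stand-in for ftrace_regs_get_argument(). */
static inline unsigned long fake_get_argument(const struct fake_regs *regs,
					       int n)
{
	return regs->arg[n];
}

typedef void (*copy_args_fn)(unsigned long *dst, const struct fake_regs *regs);

/* Generic copy: the loop bound is only known at run time. */
static void copy_args_loop(unsigned long *dst, const struct fake_regs *regs,
			   int nr_args)
{
	for (int i = 0; i < nr_args; i++)
		dst[i] = fake_get_argument(regs, i);
}

/*
 * One copier per arity. The bound is a compile-time constant, so the
 * compiler can unroll the loop completely.
 */
#define DEFINE_COPY_ARGS(n)						\
static void copy_args_##n(unsigned long *dst,				\
			  const struct fake_regs *regs)			\
{									\
	for (int i = 0; i < (n); i++)					\
		dst[i] = fake_get_argument(regs, i);			\
}

DEFINE_COPY_ARGS(0)
DEFINE_COPY_ARGS(1)
DEFINE_COPY_ARGS(2)
DEFINE_COPY_ARGS(3)
DEFINE_COPY_ARGS(4)
DEFINE_COPY_ARGS(5)
DEFINE_COPY_ARGS(6)

/* Seven entries, indexed by the number of arguments (0 to 6). */
static const copy_args_fn copy_args_by_arity[MAX_ARGS + 1] = {
	copy_args_0, copy_args_1, copy_args_2, copy_args_3,
	copy_args_4, copy_args_5, copy_args_6,
};

/* Meant to be called once at attach time; NULL for unsupported arities. */
static copy_args_fn select_copy_args(int nr_args)
{
	if (nr_args < 0 || nr_args > MAX_ARGS)
		return NULL;
	return copy_args_by_arity[nr_args];
}
```

In this model the selection cost is paid once per attach, and the hot
path is a single indirect call to a straight-line copier. Whether that
actually closes the gap in the dd numbers quoted above would still need
to be measured, e.g. with perf as suggested.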