Received: by 2002:a05:7412:d008:b0:f9:6acb:47ec with SMTP id bd8csp274253rdb; Tue, 19 Dec 2023 17:01:12 -0800 (PST) X-Google-Smtp-Source: AGHT+IErySXjHNTh9jCoNv9CCWriIlUKDj/aSYiSs3Jn2d9scoGd6XsDlFpXydQ1XMqIXDcGu5x8 X-Received: by 2002:a05:6102:c46:b0:466:919a:a7c2 with SMTP id y6-20020a0561020c4600b00466919aa7c2mr3021970vss.9.1703034072476; Tue, 19 Dec 2023 17:01:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1703034072; cv=none; d=google.com; s=arc-20160816; b=AbKTBYpAt2ZPCeoT/Wcsa7u1CI+5LGLIJinbWzpqM+4erLTRKt8HQQD5h8SkzV/Mma Z7JKElCyy09CyYnupCFtQWlQJqKzNTvaWk08SXuwU+n8fJnebYtJJpH3o+K3eRvOPkHY pPtCinFUJ9gamf27v8r3n3Nm0KiLDmymH33W1aV4AaohR4V0KthvVHHlkNyjfOqY0gQn 8xTS/3ywpXI3p1bHh1P+vbfCSRdvdpUqOHCbWBZOMWCei2x+StKOuto1DYCcoOASG0in 3TWiO3BllHl99uCPVxMvITGV9hb/8auAP4aTGHaI3UBEGrpK/Yk9TQc45bEFhWv4GnTC HcPA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=aKHw6V3gMR4hknnrY0T6+gPqP55LzzkWkhXb8mcxFAc=; fh=MHFbjiEyIRiJDyyjklkE8bixMO1uD9yEJOq7UyhE4VE=; b=QcO+njkmaCJfTnKIwUn3K449SHRExsFlR6B/kUHWc+IyF68jQ6/d+SqFngSsNBCICz zOvUQ6VDGOaDGQO3tlEPFqschFeD/E9J5vB/qmQPSRnn3HswiUeVWMfYEUaWixRqFZpl mt0ZxVyg3htqKBPuIsMyNZZ3aBhVjvrFBxZhulZGQ2CEQ8su91TIZv9cSGRw34QlyWuT Xemp7YP0x/lzM/FFXVLt1J3aNqDjPed0PSz6H55jHw5KcaW5NvV8aeEFNBCjOK40FgSh 21cCLmN6/GVTuTqWhazBtkOvrcEGF8t8L6nr0YFqTrvgjWuW5Nd0GlZY+AxzrlBIvHBU 1Kig== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=ZdPqqC2m; spf=pass (google.com: domain of linux-kernel+bounces-6220-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-6220-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id w22-20020a05610205f600b00464893cc9bfsi781007vsf.260.2023.12.19.17.01.12 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 17:01:12 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-6220-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=ZdPqqC2m; spf=pass (google.com: domain of linux-kernel+bounces-6220-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-6220-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 2BD741C251E0 for ; Wed, 20 Dec 2023 01:01:12 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id A4F8D4697; Wed, 20 Dec 2023 01:00:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ZdPqqC2m" X-Original-To: linux-kernel@vger.kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CB86A79CC; Wed, 20 Dec 2023 01:00:54 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6587AC433C8; Wed, 20 Dec 2023 01:00:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1703034054; bh=4syDIv0RLwOoNUlwaRQGOzsRhe4oIXvWdi4qYIv7kc8=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=ZdPqqC2m65iV4CGkhx4G5oQ95lk6Am7JoKxR95gUf91AXAr38xSW/Ie3u6p57szTH +xQ/05BAVUzvNCEfnxIVTtKrKvUNqjTtLVoznBNSNL7V0ouLSV3RyteV2OkF2C/aFk AFp4E7cfVa1RSgP521RsfQUNr3QNvY9B74a+VJJvs2zJTdhc64rTr0z7Hu3Y8s9EoA sK81M+Iy3Ue0P2B6DyRg93Z/yx3yGeCj5G7h4yMCwfJ0nAmIA+GRjCQPf8CcKLPwuZ YGl0lNaIh3QT2wmVP04/E8jR7LoZRiRtKzpYvviV6/MBMGgHClSpSrOUcI890fk+Vo iblBm9KsE5Dug== Date: Wed, 20 Dec 2023 10:00:47 +0900 From: Masami Hiramatsu (Google) To: Jiri Olsa Cc: Alexei Starovoitov , Steven Rostedt , Florent Revest , linux-trace-kernel@vger.kernel.org, LKML , Martin KaFai Lau , bpf , Sven Schnelle , Alexei Starovoitov , Arnaldo Carvalho de Melo , Daniel Borkmann , Alan Maguire , Mark Rutland , Peter Zijlstra , Thomas Gleixner , Guo Ren Subject: Re: [PATCH v5 28/34] fprobe: Rewrite fprobe on function-graph tracer Message-Id: <20231220100047.e33d862cb869423c2a3a82bf@kernel.org> In-Reply-To: References: <170290509018.220107.1347127510564358608.stgit@devnote2> <170290542972.220107.9135357273431693988.stgit@devnote2> X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; x86_64-pc-linux-gnu) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Tue, 19 Dec 2023 15:39:03 +0100 Jiri Olsa wrote: > On Mon, Dec 18, 2023 at 10:17:10PM +0900, Masami Hiramatsu (Google) wrote: > > SNIP > > > -static void fprobe_exit_handler(struct rethook_node *rh, void *data, > > - unsigned long ret_ip, struct pt_regs *regs) > > +static int fprobe_entry(unsigned long func, unsigned long ret_ip, > > + struct ftrace_regs *fregs, struct fgraph_ops *gops) > > { > > - struct fprobe *fp = (struct fprobe *)data; > > - struct fprobe_rethook_node *fpr; > > - struct ftrace_regs *fregs = (struct ftrace_regs *)regs; > > - int bit; > > + struct fprobe_hlist_node *node, *first; > > + unsigned long *fgraph_data = NULL; > > + unsigned long header; > > + int reserved_words; > > + struct fprobe *fp; > > + int used, ret; > > > > - if (!fp || fprobe_disabled(fp)) > > - return; > > + if (WARN_ON_ONCE(!fregs)) > > + return 0; > > > > - fpr = container_of(rh, struct fprobe_rethook_node, node); > > + first = node = find_first_fprobe_node(func); > > + if (unlikely(!first)) > > + return 0; > > + > > + reserved_words = 0; > > + hlist_for_each_entry_from_rcu(node, hlist) { > > + if (node->addr != func) > > + break; > > + fp = READ_ONCE(node->fp); > > + if (!fp || !fp->exit_handler) > > + continue; > > + /* > > + * Since fprobe can be enabled until the next loop, we ignore the > > + * fprobe's disabled flag in this loop. > > + */ > > + reserved_words += > > + DIV_ROUND_UP(fp->entry_data_size, sizeof(long)) + 1; > > + } > > + node = first; > > + if (reserved_words) { > > + fgraph_data = fgraph_reserve_data(gops->idx, reserved_words * sizeof(long)); > > + if (unlikely(!fgraph_data)) { > > + hlist_for_each_entry_from_rcu(node, hlist) { > > + if (node->addr != func) > > + break; > > + fp = READ_ONCE(node->fp); > > + if (fp && !fprobe_disabled(fp)) > > + fp->nmissed++; > > + } > > + return 0; > > + } > > + } > > this looks expensive compared to what we do now.. IIUC due to the graph > ops limitations (16 ctive ops), you have just single graph ops for fprobe > and each fprobe registration stores ips into hash which you need to search > in here to get registered callbacks..? I think this is not so expensive. Most cases, it only hits 1 fprobe on the hash. And if the fprobe is only used to hook the entry, reserved_words == 0. > I wonder would it make sense to allow arbitrary number of active graph_ops > with the price some callback might fail because there's no stack space so > each fprobe instance would have its own graph_ops.. and we would get rid > of the code above (and below) ? Yeah, actually my first implementation is that. But I realized that doesn't work, this requires intermediate object which has refcounter because the "marker" on the shadow stack will be left after unregistering it. We need to identify which is still available and which is not. And for that purpose, we may need to introduce similar structure in the fgraph too. The current multi-fgraph does; - if CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n (called from dedicated mcount asm code), it has to loop on all fgraph_ops and check the hash, which is inefficient but it can easily push the return trace entry on the shadow stack. - if CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y (called from ftrace asm code), it does not need to loop (that will be done by ftrace) but each handler does NOT know who pushed the return trace entry on the shadow stack. Thus it has to decode the shadow stack and check it needs to push return trace entry or not. And this is hard if the traced function is self- recursive call or tail call. To check the recursive call, I introduced a bitmap entry on the shadow stack. This bitmap size limits the max number of fgraph. So, unlimit the number of fgraph, we may need to stack the number of fgraph on the stack and each fgraph callback has to unwind the shadow stack to check whether their own number is there instead of checking the bit in the bitmap. That will be more trusted way but maybe slow. Another option is introducing a pair of pre- and post-callbacks which is called before and after calling the list/direct call of ftrace_ops. And pre-callback will push the ret_stack on shadow stack and post-callback will commit or cancel it. (but this one is hard to design... maybe becomes ugly interface.) Thank you, -- Masami Hiramatsu (Google)