From: Björn Töpel
To: Puranjay Mohan, Mark Rutland, Andy Chiu
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Steven Rostedt,
 Masami Hiramatsu, Sami Tolvanen, Guo Ren, Ley Foon Tan, Deepak Gupta,
 Sia Jee Heng, Bjorn Topel, Song Shuai, Clément Léger, Al Viro,
 Jisheng Zhang, linux-riscv@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 Robbin Ehn
Subject: Re: [RFC PATCH] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
References: <20240306165904.108141-1-puranjay12@gmail.com>
 <87ttlhdeqb.fsf@all.your.base.are.belong.to.us>
 <8734suqsth.fsf@all.your.base.are.belong.to.us>
Date: Thu, 14 Mar 2024 16:07:33 +0100
Message-ID: <87zfv0onre.fsf@all.your.base.are.belong.to.us>

Puranjay Mohan writes:

> Björn Töpel writes:
>
>> Hmm, depending on RISC-V's CMODX path, the pros/cons of CALL_OPS vs
>> dynamic trampolines change quite a bit.
>>
>> The more I look at the pains of patching two instructions ("split
>> immediates"), the better "patch data" + one insn patching looks.
>
> I was looking at how dynamic trampolines would be implemented for RISC-V.
>
> With CALL-OPS we need to patch the auipc+jalr at function entry only, the
> ops pointer above the function can be patched atomically.
>
> With a dynamic trampoline we need an auipc+jalr pair at function entry to jump
> to the trampoline and then another auipc+jalr pair to jump from trampoline to
> ops->func. When the ops->func is modified, we would need to update the
> auipc+jalr in the trampoline.
>
> So, I am not sure how to move forward here, CALL-OPS or Dynamic trampolines?

Yeah. Honestly, we need to figure out the patching story prior to
choosing the path, so let's start there.

After reading Mark's reply, and discussing with OpenJDK folks (who do
the craziest text patching on all platforms), having to patch multiple
instructions (where the address materialization is split over multiple
instructions) is a no-go. It's just too big a can of worms. So, if we
can only patch one insn, it's CALL_OPS.

A couple of options (in addition to Andy's). All require a
per-function landing address à la CALL_OPS, tweaking what Mark is doing
on Arm (given the poor branch range).

..and maybe we'll get RISC-V rainbows/unicorns in the future getting
better reach (full 64b! ;-)).

A) Use auipc/jalr, only patch jalr to take us to a common
   dispatcher/trampoline

  | # probably on a data cache-line != func .text to avoid ping-pong
  | ...
  | func:
  |   ...make sure ra isn't messed up...
  |   auipc
  |   nop <=> jalr # Text patch point -> common_dispatch
  |   ACTUAL_FUNC
  |
  | common_dispatch:
  |   load based on ra
  |   jalr
  |   ...

The auipc is never touched, and will be overhead. Also, we need a mv to
store ra in a scratch register as well -- like Arm. We'll have two
insns of per-caller overhead for a disabled caller.

B) Use jal, which can only take us +/-1M, and requires multiple
   dispatchers (and tracking which one to use, and properly distributing
   them. Ick.)

  | # probably on a data cache-line != func .text to avoid ping-pong
  | ...
  | func:
  |   ...make sure ra isn't messed up...
  |   nop <=> jal # Text patch point -> within_1M_to_func_dispatch
  |   ACTUAL_FUNC
  |
  | within_1M_to_func_dispatch:
  |   load based on ra
  |   jalr

C) Use jal, which can only take us +/-1M, and use a per-function
   trampoline. Requires multiple dispatchers (and tracking which one to
   use). Blows up text size A LOT.

  | # somewhere, but probably on a different cacheline than the .text to avoid ping-pong
  | ...
  | per_func_dispatch:
  |   load based on ra
  |   jalr
  | func:
  |   ...make sure ra isn't messed up...
  |   nop <=> jal # Text patch point -> per_func_dispatch
  |   ACTUAL_FUNC

It's a bit sad that we'll always have to have a dispatcher/trampoline,
but it's still better than stop_machine(). (And we'll need a fence.i IPI
as well, but only one. ;-))

Today, I'm leaning towards A (which is what Mark suggested, and also
Robbin). Any other options?

[Now how do we implement OPTPROBES? I'm kidding. ;-)]


Björn
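
For reference, the reach and "split immediate" constraints discussed
above can be sketched in a few lines of Python. This is an illustrative
sketch only: fits_jal() and split_auipc_jalr() are hypothetical helper
names, not kernel functions. The ranges assume the base ISA encodings,
i.e. jal carries a 21-bit signed, 2-byte aligned offset, and auipc+jalr
carries a 20-bit upper immediate plus a sign-extended 12-bit lower
immediate.

```python
# Illustrative only: helper names are hypothetical, not kernel APIs.

JAL_MIN = -(1 << 20)       # jal: 21-bit signed offset, i.e. +/-1M
JAL_MAX = (1 << 20) - 2    # offsets are multiples of 2 bytes


def fits_jal(offset: int) -> bool:
    """Return True if a pc-relative offset is reachable with one jal."""
    return offset % 2 == 0 and JAL_MIN <= offset <= JAL_MAX


def split_auipc_jalr(offset: int) -> tuple:
    """Split a pc-relative offset into (hi20, lo12) for auipc+jalr.

    jalr sign-extends its 12-bit immediate, so the upper part is
    rounded by adding 0x800 before shifting.
    """
    if not -(1 << 31) - (1 << 11) <= offset < (1 << 31) - (1 << 11):
        raise ValueError("offset out of auipc+jalr range")
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)   # always in [-2048, 2047]
    return hi20, lo12


if __name__ == "__main__":
    for off in (0x1000, 1 << 20, 0x12345FFC):
        print(hex(off), fits_jal(off), split_auipc_jalr(off))
```

Because the auipc+jalr target is spread over two instructions,
retargeting generally means changing both hi20 and lo12, which is the
multi-instruction patching the mail wants to avoid. A single jal
carries the whole offset in one instruction, but only within +/-1M.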