From: Björn Töpel
To: Puranjay Mohan, Mark Rutland, Andy Chiu
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Steven Rostedt,
    Masami Hiramatsu, Sami Tolvanen, Guo Ren, Ley Foon Tan, Deepak Gupta,
    Sia Jee Heng, Bjorn Topel, Song Shuai, Clément Léger, Al Viro,
    Jisheng Zhang, linux-riscv@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
    Robbin Ehn, Brendan Sweeney
Subject: Re: [RFC PATCH] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
In-Reply-To: <87zfv0onre.fsf@all.your.base.are.belong.to.us>
References: <20240306165904.108141-1-puranjay12@gmail.com>
    <87ttlhdeqb.fsf@all.your.base.are.belong.to.us>
    <8734suqsth.fsf@all.your.base.are.belong.to.us>
    <87zfv0onre.fsf@all.your.base.are.belong.to.us>
Date: Thu, 14 Mar 2024 21:50:11 +0100
Message-ID: <87il1oedx8.fsf@all.your.base.are.belong.to.us>

Björn Töpel writes:

> Puranjay Mohan writes:
>
>> Björn Töpel writes:
>>
>>>
>>> Hmm, depending on RISC-V's CMODX path, the pro/cons CALL_OPS vs dynamic
>>> trampolines changes quite a bit.
>>>
>>> The more I look at the pains of patching two instruction ("split
>>> immediates"), the better "patch data" + one insn patching look.
>>
>> I was looking at how dynamic trampolines would be implemented for RISC-V.
>>
>> With CALL-OPS we need to patch the auipc+jalr at function entry only, the
>> ops pointer above the function can be patched atomically.
>>
>> With a dynamic trampoline we need a auipc+jalr pair at function entry to jump
>> to the trampoline and then another auipc+jalr pair to jump from trampoline to
>> ops->func.
>> When the ops->func is modified, we would need to update the
>> auipc+jalr in the trampoline.
>>
>> So, I am not sure how to move forward here, CALL-OPS or Dynamic trampolines?
>
> Yeah. Honestly, we need to figure out the patching story prior to
> choosing the path, so let's start there.
>
> After reading Mark's reply, and discussing with OpenJDK folks (who do
> the most crazy text patching on all platforms), having to patch multiple
> instructions (where the address materialization is split over multiple
> instructions) is a no-go. It's just too big a can of worms. So, if we
> can only patch one insn, it's CALL_OPS.
>
> A couple of options (in addition to Andy's, and all require a
> per-function landing address ala CALL_OPS), tweaking what Mark is doing
> on Arm (given the poor branch range).
>
> ...and maybe we'll get RISC-V rainbows/unicorns in the future getting
> better reach (full 64b! ;-)).
>
> A) Use auipc/jalr, only patch jalr to take us to a common
>    dispatcher/trampoline
>
> | # probably on a data cache-line != func text to avoid ping-pong
> | ...
> | func:
> |   ...make sure ra isn't messed up...
> |   auipc
> |   nop <=> jalr # Text patch point -> common_dispatch
> |   ACTUAL_FUNC
> |
> | common_dispatch:
> |   load based on ra
> |   jalr
> |   ...
>
> The auipc is never touched, and will be overhead. Also, we need a mv to
> store ra in a scratch register as well -- like Arm. We'll have two insn
> per-caller overhead for a disabled caller.
>
> B) Use jal, which can only take us +/-1M, and requires multiple
>    dispatchers (and tracking which one to use, and properly distribute
>    them. Ick.)
>
> | # probably on a data cache-line != func text to avoid ping-pong
> | ...
> | func:
> |   ...make sure ra isn't messed up...
> |   nop <=> jal # Text patch point -> within_1M_to_func_dispatch
> |   ACTUAL_FUNC
> |
> | within_1M_to_func_dispatch:
> |   load based on ra
> |   jalr
>
> C) Use jal, which can only take us +/-1M, and use a per-function
>    trampoline requires multiple dispatchers (and tracking which one to
>    use). Blows up text size A LOT.
>
> | # somewhere, but probably on a different cacheline than the .text to avoid ping-pong
> | ...
> | per_func_dispatch
> |   load based on ra
> |   jalr
> | func:
> |   ...make sure ra isn't messed up...
> |   nop <=> jal # Text patch point -> per_func_dispatch
> |   ACTUAL_FUNC

Brendan proposed yet another option, "in-function dispatch":

D)
  | # idk somewhere
  | ...
  | func:
  |   mv tmp1, ra
  |   auipc tmp2,
  |   mv tmp3, zero <=> ld tmp3, tmp2
  |   nop <=> jalr ra, tmp3
  |   ACTUAL_FUNC

There are 4 CMODX possibilities:
   mv, nop:  fully disabled, no problems
   mv, jalr: We will jump to zero. We would need to have the inst
             page/access fault handler take care of this case. Especially
             if we align the instructions so that they can be patched
             together, being interrupted in the middle and taking this
             path will be rare.
   ld, nop:  no problems
   ld, jalr: fully enabled, no problems

Patching is a 64b store/sd, and we only need a fence.i at the end, since
we can handle all 4 possibilities.

For the disabled case we'll have:
A) mv, auipc, nop
D) mv, auipc, mv, nop.

Puranjay, I've flipped. Let's go with Mark's CALL_OPS together with a new
text patch mechanism w/o stop_machine().


Björn
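
For anyone who wants to poke at the four CMODX combinations listed for
option D, here is a rough sketch that enumerates them mechanically. It is
only a toy model under stated assumptions: the slot tuples, function names
and outcome strings are made up for illustration and are not kernel code,
and the instruction text is copied from the pseudo-assembly above.

```python
from itertools import product

# Hypothetical model of option D's two patchable slots. The names are
# illustrative only. Each slot holds (disabled insn, enabled insn).
SLOT_LOAD = ("mv tmp3, zero", "ld tmp3, tmp2")
SLOT_CALL = ("nop", "jalr ra, tmp3")


def classify(load_insn: str, call_insn: str) -> str:
    """Describe what a hart does if it fetches this pair of instructions."""
    jumps = call_insn.startswith("jalr")
    has_target = load_insn.startswith("ld")
    if not jumps:
        # Without the jalr, tmp3 is never consumed, so either value of the
        # load slot is harmless.
        return "falls through to ACTUAL_FUNC"
    if has_target:
        return "calls the tracer"
    # mv + jalr: tmp3 is zero, so the jump faults and a handler must recover.
    return "jumps to zero, needs fault handler"


def enumerate_states() -> dict:
    """Map every (load slot, call slot) combination to its outcome."""
    return {
        (load, call): classify(load, call)
        for load, call in product(SLOT_LOAD, SLOT_CALL)
    }


if __name__ == "__main__":
    for (load, call), outcome in enumerate_states().items():
        print(f"{load:15} | {call:15} -> {outcome}")
```

In this model only one of the four pairs needs special handling (mv + jalr),
which matches the argument above that a single 64b store followed by a
fence.i is enough as long as the fault handler covers that one case.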