2024-02-26 11:33:13

by James Clark

Subject: [PATCH v6 0/8] kvm/coresight: Support exclude guest and exclude host

This is a combination of the RFC for nVHE here [1] and v3 of the VHE version
here [2]. Following some of the review comments, it seemed much simpler for
both versions to use the same interface and be in the same patchset.

FEAT_TRF is a Coresight feature that allows trace capture to be
completely filtered at different exception levels, unlike the existing
TRCVICTLR controls which may still emit target addresses of branches,
even if the following trace is filtered.

Without FEAT_TRF, it was possible to start a trace session on a host and
also collect trace from the guest as TRCVICTLR was never programmed to
exclude guests (and it could still emit target addresses even if it
was).

With FEAT_TRF, the current behavior of trace in guests depends on
whether nVHE or VHE is being used. Both of the examples below are from
the host's point of view, as Coresight isn't accessible from guests.
This patchset is only relevant when FEAT_TRF exists; otherwise there
is no change.
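
For a flavour of the filter bits involved, this is the kernel/user
filter helper added by the final patch in this series, reproduced here
with extra comments for illustration (see patch 8 for the full
context):

  static u64 etm4x_get_kern_user_filter(struct etmv4_drvdata *drvdata)
  {
          u64 trfcr = drvdata->trfcr;     /* ExTRE | E0TRE | timestamp bits */

          if (drvdata->config.mode & ETM_MODE_EXCL_KERN)
                  trfcr &= ~TRFCR_ELx_ExTRE;      /* exclude EL1/EL2 trace */
          if (drvdata->config.mode & ETM_MODE_EXCL_USER)
                  trfcr &= ~TRFCR_ELx_E0TRE;      /* exclude EL0 trace */

          return trfcr;
  }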

nVHE/pKVM:

Because the host and the guest are both using TRFCR_EL1, trace will be
generated in guests depending on the same filter rules the host is
using. For example if the host is tracing userspace only, then guest
userspace trace will also be collected.

(This is further limited by whether TRBE is used because an issue
with TRBE means that it's completely disabled in nVHE guests, but it's
possible to have other tracing components.)

VHE:

With VHE, the host filters will be in TRFCR_EL2, but the filters in
TRFCR_EL1 will be active when the guest is running. Because we don't
write to TRFCR_EL1, guest trace will be completely disabled.

With this change, the guest filtering rules from the Perf session are
now honored for both nVHE and VHE modes. This is done either by writing
to TRFCR_EL12 at the start of the Perf session and doing nothing
further (VHE), or by caching the guest value and writing it at guest
switch (nVHE). In pKVM, trace is now disabled for both protected and
unprotected guests.
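
As a rough sketch, the new interface boils down to the following
(condensed from patch 6, which also adds a feature check and a
preemptible() warning):

  void kvm_etm_set_guest_trfcr(u64 trfcr_guest)
  {
          if (has_vhe())
                  /* Host trace is controlled by TRFCR_EL2, write directly */
                  write_sysreg_s(trfcr_guest, SYS_TRFCR_EL12);
          else if (!is_protected_kvm_enabled())
                  /* nVHE: cache it, hyp writes TRFCR_EL1 at guest switch */
                  kvm_guest_trfcr[smp_processor_id()] = trfcr_guest;
  }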

---

Changes since V5 [4]:
* Sort new sysreg entries by encoding
* Add a comment about sorting arch/arm64/tools/sysreg
* Warn on preemptible() before calling smp_processor_id()
* Pickup tags
* Change TRFCR_EL2 from SysregFields to Sysreg because it was only
used once

Changes since V4 [3]:
* Remove all V3 changes that made it work in pKVM and just disable
trace there instead
* Restore PMU host/hyp state sharing back to how it was
(kvm_pmu_update_vcpu_events())
* Simplify some of the duplication in the comments and function docs
* Add a WARN_ON_ONCE() if kvm_etm_set_guest_trfcr() is called when
the trace filtering feature doesn't exist.
* Split sysreg change into a tools update followed by the new register
addition

Changes since V3:
* Create a new shared area to store the host state instead of copying
it before each VCPU run
* Drop commit that moved SPE and trace registers from host_debug_state
into the kvm sysregs array because the guest values were never used
* Document kvm_etm_set_guest_trfcr()
* Guard kvm_etm_set_guest_trfcr() with a feature check
* Drop Mark B and Suzuki's review tags on the sysreg patch because it
turned out that it broke the Perf build and needed some unconventional
changes to fix (as in: updating the tools copy of the headers in the
same commit as the kernel changes)

Changes since V2:

* Add a new iflag to signify presence of FEAT_TRF and keep the
existing TRBE iflag. This fixes the issue where TRBLIMITR_EL1 was
being accessed even if TRBE didn't exist
* Reword a commit message

Changes since V1:

* Squashed all the arm64/tools/sysreg changes into the first commit
* Add a new commit to move SPE and TRBE regs into the kvm sysreg array
* Add a comment above the TRFCR global that it's per host CPU rather
than vcpu

Changes since nVHE RFC [1]:

* Re-write just in terms of the register value to be written for the
host and the guest. This removes some logic from the hyp code and
a value of kvm_vcpu_arch:trfcr_el1 = 0 no longer means "don't
restore".
* Remove all the conditional compilation and new files.
* Change the kvm_etm_update_vcpu_events macro to a function.
* Re-use DEBUG_STATE_SAVE_TRFCR so iflags don't need to be expanded
anymore.
* Expand the cover letter.

Changes since VHE v3 [2]:

* Use the same interface as nVHE mode so TRFCR_EL12 is now written by
kvm.

[1]: https://lore.kernel.org/kvmarm/[email protected]/
[2]: https://lore.kernel.org/kvmarm/[email protected]/
[3]: https://lore.kernel.org/linux-arm-kernel/[email protected]/
[4]: https://lore.kernel.org/all/[email protected]/

James Clark (8):
arm64: KVM: Fix renamed function in comment
arm64/sysreg: Add a comment that the sysreg file should be sorted
tools: arm64: Update sysreg.h header files
arm64/sysreg/tools: Move TRFCR definitions to sysreg
arm64: KVM: Add iflag for FEAT_TRF
arm64: KVM: Add interface to set guest value for TRFCR register
arm64: KVM: Write TRFCR value on guest switch with nVHE
coresight: Pass guest TRFCR value to KVM

arch/arm64/include/asm/kvm_host.h | 7 +-
arch/arm64/include/asm/sysreg.h | 12 -
arch/arm64/kernel/image-vars.h | 1 +
arch/arm64/kvm/debug.c | 53 ++-
arch/arm64/kvm/hyp/nvhe/debug-sr.c | 53 ++-
arch/arm64/kvm/hyp/nvhe/setup.c | 2 +-
arch/arm64/tools/sysreg | 38 ++
.../coresight/coresight-etm4x-core.c | 42 +-
drivers/hwtracing/coresight/coresight-etm4x.h | 2 +-
drivers/hwtracing/coresight/coresight-priv.h | 3 +
tools/arch/arm64/include/asm/sysreg.h | 375 +++++++++++++++++-
tools/include/linux/kasan-tags.h | 15 +
12 files changed, 541 insertions(+), 62 deletions(-)
create mode 100644 tools/include/linux/kasan-tags.h

--
2.34.1



2024-02-26 11:33:28

by James Clark

Subject: [PATCH v6 1/8] arm64: KVM: Fix renamed function in comment

finalize_host_mappings() became fix_host_ownership() in
commit 0d16d12eb26e ("KVM: arm64: Fix-up hyp stage-1 refcounts for all
pages mapped at EL2") so update the comment.

Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 9fdcf5484dfe..f0ee75bd6ac3 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -147,7 +147,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
* can't be donated or shared with another entity.
*
* The ownership transition requires matching changes in the host
- * stage-2. This will be done later (see finalize_host_mappings()) once
+ * stage-2. This will be done later (see fix_host_ownership()) once
* the hyp_vmemmap is addressable.
*/
prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
--
2.34.1


2024-02-26 11:33:41

by James Clark

Subject: [PATCH v6 2/8] arm64/sysreg: Add a comment that the sysreg file should be sorted

There are a few entries particularly at the end of the file that aren't
in order. To avoid confusion, add a comment that might help new entries
to be added in the right place.

Signed-off-by: James Clark <[email protected]>
---
arch/arm64/tools/sysreg | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index fa3fe0856880..a84c19c111fa 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -48,6 +48,8 @@
# feature that introduces them (eg, FEAT_LS64_ACCDATA introduces enumeration
# item ACCDATA) though it may be more taseful to do something else.

+# Please try to keep entries in this file sorted by sysreg encoding.
+
Sysreg OSDTRRX_EL1 2 0 0 0 2
Res0 63:32
Field 31:0 DTRRX
--
2.34.1


2024-02-26 11:33:54

by James Clark

Subject: [PATCH v6 3/8] tools: arm64: Update sysreg.h header files

Created with the following:

cp include/linux/kasan-tags.h tools/include/linux/
cp arch/arm64/include/asm/sysreg.h tools/arch/arm64/include/asm/

Update the tools copy of sysreg.h so that the next commit to add a new
register doesn't have unrelated changes in it. Because the new version
of sysreg.h includes kasan-tags.h, that file also now needs to be copied
into tools.

Acked-by: Mark Brown <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
tools/arch/arm64/include/asm/sysreg.h | 363 +++++++++++++++++++++++++-
tools/include/linux/kasan-tags.h | 15 ++
2 files changed, 375 insertions(+), 3 deletions(-)
create mode 100644 tools/include/linux/kasan-tags.h

diff --git a/tools/arch/arm64/include/asm/sysreg.h b/tools/arch/arm64/include/asm/sysreg.h
index ccc13e991376..9e8999592f3a 100644
--- a/tools/arch/arm64/include/asm/sysreg.h
+++ b/tools/arch/arm64/include/asm/sysreg.h
@@ -11,6 +11,7 @@

#include <linux/bits.h>
#include <linux/stringify.h>
+#include <linux/kasan-tags.h>

#include <asm/gpr-num.h>

@@ -123,6 +124,37 @@
#define SYS_DC_CIGSW sys_insn(1, 0, 7, 14, 4)
#define SYS_DC_CIGDSW sys_insn(1, 0, 7, 14, 6)

+#define SYS_IC_IALLUIS sys_insn(1, 0, 7, 1, 0)
+#define SYS_IC_IALLU sys_insn(1, 0, 7, 5, 0)
+#define SYS_IC_IVAU sys_insn(1, 3, 7, 5, 1)
+
+#define SYS_DC_IVAC sys_insn(1, 0, 7, 6, 1)
+#define SYS_DC_IGVAC sys_insn(1, 0, 7, 6, 3)
+#define SYS_DC_IGDVAC sys_insn(1, 0, 7, 6, 5)
+
+#define SYS_DC_CVAC sys_insn(1, 3, 7, 10, 1)
+#define SYS_DC_CGVAC sys_insn(1, 3, 7, 10, 3)
+#define SYS_DC_CGDVAC sys_insn(1, 3, 7, 10, 5)
+
+#define SYS_DC_CVAU sys_insn(1, 3, 7, 11, 1)
+
+#define SYS_DC_CVAP sys_insn(1, 3, 7, 12, 1)
+#define SYS_DC_CGVAP sys_insn(1, 3, 7, 12, 3)
+#define SYS_DC_CGDVAP sys_insn(1, 3, 7, 12, 5)
+
+#define SYS_DC_CVADP sys_insn(1, 3, 7, 13, 1)
+#define SYS_DC_CGVADP sys_insn(1, 3, 7, 13, 3)
+#define SYS_DC_CGDVADP sys_insn(1, 3, 7, 13, 5)
+
+#define SYS_DC_CIVAC sys_insn(1, 3, 7, 14, 1)
+#define SYS_DC_CIGVAC sys_insn(1, 3, 7, 14, 3)
+#define SYS_DC_CIGDVAC sys_insn(1, 3, 7, 14, 5)
+
+/* Data cache zero operations */
+#define SYS_DC_ZVA sys_insn(1, 3, 7, 4, 1)
+#define SYS_DC_GVA sys_insn(1, 3, 7, 4, 3)
+#define SYS_DC_GZVA sys_insn(1, 3, 7, 4, 4)
+
/*
* Automatically generated definitions for system registers, the
* manual encodings below are in the process of being converted to
@@ -162,6 +194,84 @@
#define SYS_DBGDTRTX_EL0 sys_reg(2, 3, 0, 5, 0)
#define SYS_DBGVCR32_EL2 sys_reg(2, 4, 0, 7, 0)

+#define SYS_BRBINF_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 0))
+#define SYS_BRBINFINJ_EL1 sys_reg(2, 1, 9, 1, 0)
+#define SYS_BRBSRC_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 1))
+#define SYS_BRBSRCINJ_EL1 sys_reg(2, 1, 9, 1, 1)
+#define SYS_BRBTGT_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 2))
+#define SYS_BRBTGTINJ_EL1 sys_reg(2, 1, 9, 1, 2)
+#define SYS_BRBTS_EL1 sys_reg(2, 1, 9, 0, 2)
+
+#define SYS_BRBCR_EL1 sys_reg(2, 1, 9, 0, 0)
+#define SYS_BRBFCR_EL1 sys_reg(2, 1, 9, 0, 1)
+#define SYS_BRBIDR0_EL1 sys_reg(2, 1, 9, 2, 0)
+
+#define SYS_TRCITECR_EL1 sys_reg(3, 0, 1, 2, 3)
+#define SYS_TRCACATR(m) sys_reg(2, 1, 2, ((m & 7) << 1), (2 | (m >> 3)))
+#define SYS_TRCACVR(m) sys_reg(2, 1, 2, ((m & 7) << 1), (0 | (m >> 3)))
+#define SYS_TRCAUTHSTATUS sys_reg(2, 1, 7, 14, 6)
+#define SYS_TRCAUXCTLR sys_reg(2, 1, 0, 6, 0)
+#define SYS_TRCBBCTLR sys_reg(2, 1, 0, 15, 0)
+#define SYS_TRCCCCTLR sys_reg(2, 1, 0, 14, 0)
+#define SYS_TRCCIDCCTLR0 sys_reg(2, 1, 3, 0, 2)
+#define SYS_TRCCIDCCTLR1 sys_reg(2, 1, 3, 1, 2)
+#define SYS_TRCCIDCVR(m) sys_reg(2, 1, 3, ((m & 7) << 1), 0)
+#define SYS_TRCCLAIMCLR sys_reg(2, 1, 7, 9, 6)
+#define SYS_TRCCLAIMSET sys_reg(2, 1, 7, 8, 6)
+#define SYS_TRCCNTCTLR(m) sys_reg(2, 1, 0, (4 | (m & 3)), 5)
+#define SYS_TRCCNTRLDVR(m) sys_reg(2, 1, 0, (0 | (m & 3)), 5)
+#define SYS_TRCCNTVR(m) sys_reg(2, 1, 0, (8 | (m & 3)), 5)
+#define SYS_TRCCONFIGR sys_reg(2, 1, 0, 4, 0)
+#define SYS_TRCDEVARCH sys_reg(2, 1, 7, 15, 6)
+#define SYS_TRCDEVID sys_reg(2, 1, 7, 2, 7)
+#define SYS_TRCEVENTCTL0R sys_reg(2, 1, 0, 8, 0)
+#define SYS_TRCEVENTCTL1R sys_reg(2, 1, 0, 9, 0)
+#define SYS_TRCEXTINSELR(m) sys_reg(2, 1, 0, (8 | (m & 3)), 4)
+#define SYS_TRCIDR0 sys_reg(2, 1, 0, 8, 7)
+#define SYS_TRCIDR10 sys_reg(2, 1, 0, 2, 6)
+#define SYS_TRCIDR11 sys_reg(2, 1, 0, 3, 6)
+#define SYS_TRCIDR12 sys_reg(2, 1, 0, 4, 6)
+#define SYS_TRCIDR13 sys_reg(2, 1, 0, 5, 6)
+#define SYS_TRCIDR1 sys_reg(2, 1, 0, 9, 7)
+#define SYS_TRCIDR2 sys_reg(2, 1, 0, 10, 7)
+#define SYS_TRCIDR3 sys_reg(2, 1, 0, 11, 7)
+#define SYS_TRCIDR4 sys_reg(2, 1, 0, 12, 7)
+#define SYS_TRCIDR5 sys_reg(2, 1, 0, 13, 7)
+#define SYS_TRCIDR6 sys_reg(2, 1, 0, 14, 7)
+#define SYS_TRCIDR7 sys_reg(2, 1, 0, 15, 7)
+#define SYS_TRCIDR8 sys_reg(2, 1, 0, 0, 6)
+#define SYS_TRCIDR9 sys_reg(2, 1, 0, 1, 6)
+#define SYS_TRCIMSPEC(m) sys_reg(2, 1, 0, (m & 7), 7)
+#define SYS_TRCITEEDCR sys_reg(2, 1, 0, 2, 1)
+#define SYS_TRCOSLSR sys_reg(2, 1, 1, 1, 4)
+#define SYS_TRCPRGCTLR sys_reg(2, 1, 0, 1, 0)
+#define SYS_TRCQCTLR sys_reg(2, 1, 0, 1, 1)
+#define SYS_TRCRSCTLR(m) sys_reg(2, 1, 1, (m & 15), (0 | (m >> 4)))
+#define SYS_TRCRSR sys_reg(2, 1, 0, 10, 0)
+#define SYS_TRCSEQEVR(m) sys_reg(2, 1, 0, (m & 3), 4)
+#define SYS_TRCSEQRSTEVR sys_reg(2, 1, 0, 6, 4)
+#define SYS_TRCSEQSTR sys_reg(2, 1, 0, 7, 4)
+#define SYS_TRCSSCCR(m) sys_reg(2, 1, 1, (m & 7), 2)
+#define SYS_TRCSSCSR(m) sys_reg(2, 1, 1, (8 | (m & 7)), 2)
+#define SYS_TRCSSPCICR(m) sys_reg(2, 1, 1, (m & 7), 3)
+#define SYS_TRCSTALLCTLR sys_reg(2, 1, 0, 11, 0)
+#define SYS_TRCSTATR sys_reg(2, 1, 0, 3, 0)
+#define SYS_TRCSYNCPR sys_reg(2, 1, 0, 13, 0)
+#define SYS_TRCTRACEIDR sys_reg(2, 1, 0, 0, 1)
+#define SYS_TRCTSCTLR sys_reg(2, 1, 0, 12, 0)
+#define SYS_TRCVICTLR sys_reg(2, 1, 0, 0, 2)
+#define SYS_TRCVIIECTLR sys_reg(2, 1, 0, 1, 2)
+#define SYS_TRCVIPCSSCTLR sys_reg(2, 1, 0, 3, 2)
+#define SYS_TRCVISSCTLR sys_reg(2, 1, 0, 2, 2)
+#define SYS_TRCVMIDCCTLR0 sys_reg(2, 1, 3, 2, 2)
+#define SYS_TRCVMIDCCTLR1 sys_reg(2, 1, 3, 3, 2)
+#define SYS_TRCVMIDCVR(m) sys_reg(2, 1, 3, ((m & 7) << 1), 1)
+
+/* ETM */
+#define SYS_TRCOSLAR sys_reg(2, 1, 1, 0, 4)
+
+#define SYS_BRBCR_EL2 sys_reg(2, 4, 9, 0, 0)
+
#define SYS_MIDR_EL1 sys_reg(3, 0, 0, 0, 0)
#define SYS_MPIDR_EL1 sys_reg(3, 0, 0, 0, 5)
#define SYS_REVIDR_EL1 sys_reg(3, 0, 0, 0, 6)
@@ -202,8 +312,13 @@
#define SYS_ERXCTLR_EL1 sys_reg(3, 0, 5, 4, 1)
#define SYS_ERXSTATUS_EL1 sys_reg(3, 0, 5, 4, 2)
#define SYS_ERXADDR_EL1 sys_reg(3, 0, 5, 4, 3)
+#define SYS_ERXPFGF_EL1 sys_reg(3, 0, 5, 4, 4)
+#define SYS_ERXPFGCTL_EL1 sys_reg(3, 0, 5, 4, 5)
+#define SYS_ERXPFGCDN_EL1 sys_reg(3, 0, 5, 4, 6)
#define SYS_ERXMISC0_EL1 sys_reg(3, 0, 5, 5, 0)
#define SYS_ERXMISC1_EL1 sys_reg(3, 0, 5, 5, 1)
+#define SYS_ERXMISC2_EL1 sys_reg(3, 0, 5, 5, 2)
+#define SYS_ERXMISC3_EL1 sys_reg(3, 0, 5, 5, 3)
#define SYS_TFSR_EL1 sys_reg(3, 0, 5, 6, 0)
#define SYS_TFSRE0_EL1 sys_reg(3, 0, 5, 6, 1)

@@ -274,6 +389,8 @@
#define SYS_ICC_IGRPEN0_EL1 sys_reg(3, 0, 12, 12, 6)
#define SYS_ICC_IGRPEN1_EL1 sys_reg(3, 0, 12, 12, 7)

+#define SYS_ACCDATA_EL1 sys_reg(3, 0, 13, 0, 5)
+
#define SYS_CNTKCTL_EL1 sys_reg(3, 0, 14, 1, 0)

#define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7)
@@ -369,6 +486,7 @@

#define SYS_SCTLR_EL2 sys_reg(3, 4, 1, 0, 0)
#define SYS_ACTLR_EL2 sys_reg(3, 4, 1, 0, 1)
+#define SYS_SCTLR2_EL2 sys_reg(3, 4, 1, 0, 3)
#define SYS_HCR_EL2 sys_reg(3, 4, 1, 1, 0)
#define SYS_MDCR_EL2 sys_reg(3, 4, 1, 1, 1)
#define SYS_CPTR_EL2 sys_reg(3, 4, 1, 1, 2)
@@ -382,12 +500,15 @@
#define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2)

#define SYS_TRFCR_EL2 sys_reg(3, 4, 1, 2, 1)
-#define SYS_HDFGRTR_EL2 sys_reg(3, 4, 3, 1, 4)
-#define SYS_HDFGWTR_EL2 sys_reg(3, 4, 3, 1, 5)
+#define SYS_VNCR_EL2 sys_reg(3, 4, 2, 2, 0)
#define SYS_HAFGRTR_EL2 sys_reg(3, 4, 3, 1, 6)
#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
#define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1)
#define SYS_SP_EL1 sys_reg(3, 4, 4, 1, 0)
+#define SYS_SPSR_irq sys_reg(3, 4, 4, 3, 0)
+#define SYS_SPSR_abt sys_reg(3, 4, 4, 3, 1)
+#define SYS_SPSR_und sys_reg(3, 4, 4, 3, 2)
+#define SYS_SPSR_fiq sys_reg(3, 4, 4, 3, 3)
#define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
#define SYS_AFSR0_EL2 sys_reg(3, 4, 5, 1, 0)
#define SYS_AFSR1_EL2 sys_reg(3, 4, 5, 1, 1)
@@ -401,6 +522,18 @@

#define SYS_MAIR_EL2 sys_reg(3, 4, 10, 2, 0)
#define SYS_AMAIR_EL2 sys_reg(3, 4, 10, 3, 0)
+#define SYS_MPAMHCR_EL2 sys_reg(3, 4, 10, 4, 0)
+#define SYS_MPAMVPMV_EL2 sys_reg(3, 4, 10, 4, 1)
+#define SYS_MPAM2_EL2 sys_reg(3, 4, 10, 5, 0)
+#define __SYS__MPAMVPMx_EL2(x) sys_reg(3, 4, 10, 6, x)
+#define SYS_MPAMVPM0_EL2 __SYS__MPAMVPMx_EL2(0)
+#define SYS_MPAMVPM1_EL2 __SYS__MPAMVPMx_EL2(1)
+#define SYS_MPAMVPM2_EL2 __SYS__MPAMVPMx_EL2(2)
+#define SYS_MPAMVPM3_EL2 __SYS__MPAMVPMx_EL2(3)
+#define SYS_MPAMVPM4_EL2 __SYS__MPAMVPMx_EL2(4)
+#define SYS_MPAMVPM5_EL2 __SYS__MPAMVPMx_EL2(5)
+#define SYS_MPAMVPM6_EL2 __SYS__MPAMVPMx_EL2(6)
+#define SYS_MPAMVPM7_EL2 __SYS__MPAMVPMx_EL2(7)

#define SYS_VBAR_EL2 sys_reg(3, 4, 12, 0, 0)
#define SYS_RVBAR_EL2 sys_reg(3, 4, 12, 0, 1)
@@ -449,24 +582,49 @@

#define SYS_CONTEXTIDR_EL2 sys_reg(3, 4, 13, 0, 1)
#define SYS_TPIDR_EL2 sys_reg(3, 4, 13, 0, 2)
+#define SYS_SCXTNUM_EL2 sys_reg(3, 4, 13, 0, 7)
+
+#define __AMEV_op2(m) (m & 0x7)
+#define __AMEV_CRm(n, m) (n | ((m & 0x8) >> 3))
+#define __SYS__AMEVCNTVOFF0n_EL2(m) sys_reg(3, 4, 13, __AMEV_CRm(0x8, m), __AMEV_op2(m))
+#define SYS_AMEVCNTVOFF0n_EL2(m) __SYS__AMEVCNTVOFF0n_EL2(m)
+#define __SYS__AMEVCNTVOFF1n_EL2(m) sys_reg(3, 4, 13, __AMEV_CRm(0xA, m), __AMEV_op2(m))
+#define SYS_AMEVCNTVOFF1n_EL2(m) __SYS__AMEVCNTVOFF1n_EL2(m)

#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
#define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)
+#define SYS_CNTHP_TVAL_EL2 sys_reg(3, 4, 14, 2, 0)
+#define SYS_CNTHP_CTL_EL2 sys_reg(3, 4, 14, 2, 1)
+#define SYS_CNTHP_CVAL_EL2 sys_reg(3, 4, 14, 2, 2)
+#define SYS_CNTHV_TVAL_EL2 sys_reg(3, 4, 14, 3, 0)
+#define SYS_CNTHV_CTL_EL2 sys_reg(3, 4, 14, 3, 1)
+#define SYS_CNTHV_CVAL_EL2 sys_reg(3, 4, 14, 3, 2)

/* VHE encodings for architectural EL0/1 system registers */
+#define SYS_BRBCR_EL12 sys_reg(2, 5, 9, 0, 0)
#define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0)
+#define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2)
+#define SYS_SCTLR2_EL12 sys_reg(3, 5, 1, 0, 3)
+#define SYS_ZCR_EL12 sys_reg(3, 5, 1, 2, 0)
+#define SYS_TRFCR_EL12 sys_reg(3, 5, 1, 2, 1)
+#define SYS_SMCR_EL12 sys_reg(3, 5, 1, 2, 6)
#define SYS_TTBR0_EL12 sys_reg(3, 5, 2, 0, 0)
#define SYS_TTBR1_EL12 sys_reg(3, 5, 2, 0, 1)
#define SYS_TCR_EL12 sys_reg(3, 5, 2, 0, 2)
+#define SYS_TCR2_EL12 sys_reg(3, 5, 2, 0, 3)
#define SYS_SPSR_EL12 sys_reg(3, 5, 4, 0, 0)
#define SYS_ELR_EL12 sys_reg(3, 5, 4, 0, 1)
#define SYS_AFSR0_EL12 sys_reg(3, 5, 5, 1, 0)
#define SYS_AFSR1_EL12 sys_reg(3, 5, 5, 1, 1)
#define SYS_ESR_EL12 sys_reg(3, 5, 5, 2, 0)
#define SYS_TFSR_EL12 sys_reg(3, 5, 5, 6, 0)
+#define SYS_FAR_EL12 sys_reg(3, 5, 6, 0, 0)
+#define SYS_PMSCR_EL12 sys_reg(3, 5, 9, 9, 0)
#define SYS_MAIR_EL12 sys_reg(3, 5, 10, 2, 0)
#define SYS_AMAIR_EL12 sys_reg(3, 5, 10, 3, 0)
#define SYS_VBAR_EL12 sys_reg(3, 5, 12, 0, 0)
+#define SYS_CONTEXTIDR_EL12 sys_reg(3, 5, 13, 0, 1)
+#define SYS_SCXTNUM_EL12 sys_reg(3, 5, 13, 0, 7)
#define SYS_CNTKCTL_EL12 sys_reg(3, 5, 14, 1, 0)
#define SYS_CNTP_TVAL_EL02 sys_reg(3, 5, 14, 2, 0)
#define SYS_CNTP_CTL_EL02 sys_reg(3, 5, 14, 2, 1)
@@ -477,6 +635,165 @@

#define SYS_SP_EL2 sys_reg(3, 6, 4, 1, 0)

+/* AT instructions */
+#define AT_Op0 1
+#define AT_CRn 7
+
+#define OP_AT_S1E1R sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
+#define OP_AT_S1E1W sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
+#define OP_AT_S1E0R sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
+#define OP_AT_S1E0W sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
+#define OP_AT_S1E1RP sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
+#define OP_AT_S1E1WP sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
+#define OP_AT_S1E1A sys_insn(AT_Op0, 0, AT_CRn, 9, 2)
+#define OP_AT_S1E2R sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
+#define OP_AT_S1E2W sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
+#define OP_AT_S12E1R sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
+#define OP_AT_S12E1W sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
+#define OP_AT_S12E0R sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
+#define OP_AT_S12E0W sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
+
+/* TLBI instructions */
+#define OP_TLBI_VMALLE1OS sys_insn(1, 0, 8, 1, 0)
+#define OP_TLBI_VAE1OS sys_insn(1, 0, 8, 1, 1)
+#define OP_TLBI_ASIDE1OS sys_insn(1, 0, 8, 1, 2)
+#define OP_TLBI_VAAE1OS sys_insn(1, 0, 8, 1, 3)
+#define OP_TLBI_VALE1OS sys_insn(1, 0, 8, 1, 5)
+#define OP_TLBI_VAALE1OS sys_insn(1, 0, 8, 1, 7)
+#define OP_TLBI_RVAE1IS sys_insn(1, 0, 8, 2, 1)
+#define OP_TLBI_RVAAE1IS sys_insn(1, 0, 8, 2, 3)
+#define OP_TLBI_RVALE1IS sys_insn(1, 0, 8, 2, 5)
+#define OP_TLBI_RVAALE1IS sys_insn(1, 0, 8, 2, 7)
+#define OP_TLBI_VMALLE1IS sys_insn(1, 0, 8, 3, 0)
+#define OP_TLBI_VAE1IS sys_insn(1, 0, 8, 3, 1)
+#define OP_TLBI_ASIDE1IS sys_insn(1, 0, 8, 3, 2)
+#define OP_TLBI_VAAE1IS sys_insn(1, 0, 8, 3, 3)
+#define OP_TLBI_VALE1IS sys_insn(1, 0, 8, 3, 5)
+#define OP_TLBI_VAALE1IS sys_insn(1, 0, 8, 3, 7)
+#define OP_TLBI_RVAE1OS sys_insn(1, 0, 8, 5, 1)
+#define OP_TLBI_RVAAE1OS sys_insn(1, 0, 8, 5, 3)
+#define OP_TLBI_RVALE1OS sys_insn(1, 0, 8, 5, 5)
+#define OP_TLBI_RVAALE1OS sys_insn(1, 0, 8, 5, 7)
+#define OP_TLBI_RVAE1 sys_insn(1, 0, 8, 6, 1)
+#define OP_TLBI_RVAAE1 sys_insn(1, 0, 8, 6, 3)
+#define OP_TLBI_RVALE1 sys_insn(1, 0, 8, 6, 5)
+#define OP_TLBI_RVAALE1 sys_insn(1, 0, 8, 6, 7)
+#define OP_TLBI_VMALLE1 sys_insn(1, 0, 8, 7, 0)
+#define OP_TLBI_VAE1 sys_insn(1, 0, 8, 7, 1)
+#define OP_TLBI_ASIDE1 sys_insn(1, 0, 8, 7, 2)
+#define OP_TLBI_VAAE1 sys_insn(1, 0, 8, 7, 3)
+#define OP_TLBI_VALE1 sys_insn(1, 0, 8, 7, 5)
+#define OP_TLBI_VAALE1 sys_insn(1, 0, 8, 7, 7)
+#define OP_TLBI_VMALLE1OSNXS sys_insn(1, 0, 9, 1, 0)
+#define OP_TLBI_VAE1OSNXS sys_insn(1, 0, 9, 1, 1)
+#define OP_TLBI_ASIDE1OSNXS sys_insn(1, 0, 9, 1, 2)
+#define OP_TLBI_VAAE1OSNXS sys_insn(1, 0, 9, 1, 3)
+#define OP_TLBI_VALE1OSNXS sys_insn(1, 0, 9, 1, 5)
+#define OP_TLBI_VAALE1OSNXS sys_insn(1, 0, 9, 1, 7)
+#define OP_TLBI_RVAE1ISNXS sys_insn(1, 0, 9, 2, 1)
+#define OP_TLBI_RVAAE1ISNXS sys_insn(1, 0, 9, 2, 3)
+#define OP_TLBI_RVALE1ISNXS sys_insn(1, 0, 9, 2, 5)
+#define OP_TLBI_RVAALE1ISNXS sys_insn(1, 0, 9, 2, 7)
+#define OP_TLBI_VMALLE1ISNXS sys_insn(1, 0, 9, 3, 0)
+#define OP_TLBI_VAE1ISNXS sys_insn(1, 0, 9, 3, 1)
+#define OP_TLBI_ASIDE1ISNXS sys_insn(1, 0, 9, 3, 2)
+#define OP_TLBI_VAAE1ISNXS sys_insn(1, 0, 9, 3, 3)
+#define OP_TLBI_VALE1ISNXS sys_insn(1, 0, 9, 3, 5)
+#define OP_TLBI_VAALE1ISNXS sys_insn(1, 0, 9, 3, 7)
+#define OP_TLBI_RVAE1OSNXS sys_insn(1, 0, 9, 5, 1)
+#define OP_TLBI_RVAAE1OSNXS sys_insn(1, 0, 9, 5, 3)
+#define OP_TLBI_RVALE1OSNXS sys_insn(1, 0, 9, 5, 5)
+#define OP_TLBI_RVAALE1OSNXS sys_insn(1, 0, 9, 5, 7)
+#define OP_TLBI_RVAE1NXS sys_insn(1, 0, 9, 6, 1)
+#define OP_TLBI_RVAAE1NXS sys_insn(1, 0, 9, 6, 3)
+#define OP_TLBI_RVALE1NXS sys_insn(1, 0, 9, 6, 5)
+#define OP_TLBI_RVAALE1NXS sys_insn(1, 0, 9, 6, 7)
+#define OP_TLBI_VMALLE1NXS sys_insn(1, 0, 9, 7, 0)
+#define OP_TLBI_VAE1NXS sys_insn(1, 0, 9, 7, 1)
+#define OP_TLBI_ASIDE1NXS sys_insn(1, 0, 9, 7, 2)
+#define OP_TLBI_VAAE1NXS sys_insn(1, 0, 9, 7, 3)
+#define OP_TLBI_VALE1NXS sys_insn(1, 0, 9, 7, 5)
+#define OP_TLBI_VAALE1NXS sys_insn(1, 0, 9, 7, 7)
+#define OP_TLBI_IPAS2E1IS sys_insn(1, 4, 8, 0, 1)
+#define OP_TLBI_RIPAS2E1IS sys_insn(1, 4, 8, 0, 2)
+#define OP_TLBI_IPAS2LE1IS sys_insn(1, 4, 8, 0, 5)
+#define OP_TLBI_RIPAS2LE1IS sys_insn(1, 4, 8, 0, 6)
+#define OP_TLBI_ALLE2OS sys_insn(1, 4, 8, 1, 0)
+#define OP_TLBI_VAE2OS sys_insn(1, 4, 8, 1, 1)
+#define OP_TLBI_ALLE1OS sys_insn(1, 4, 8, 1, 4)
+#define OP_TLBI_VALE2OS sys_insn(1, 4, 8, 1, 5)
+#define OP_TLBI_VMALLS12E1OS sys_insn(1, 4, 8, 1, 6)
+#define OP_TLBI_RVAE2IS sys_insn(1, 4, 8, 2, 1)
+#define OP_TLBI_RVALE2IS sys_insn(1, 4, 8, 2, 5)
+#define OP_TLBI_ALLE2IS sys_insn(1, 4, 8, 3, 0)
+#define OP_TLBI_VAE2IS sys_insn(1, 4, 8, 3, 1)
+#define OP_TLBI_ALLE1IS sys_insn(1, 4, 8, 3, 4)
+#define OP_TLBI_VALE2IS sys_insn(1, 4, 8, 3, 5)
+#define OP_TLBI_VMALLS12E1IS sys_insn(1, 4, 8, 3, 6)
+#define OP_TLBI_IPAS2E1OS sys_insn(1, 4, 8, 4, 0)
+#define OP_TLBI_IPAS2E1 sys_insn(1, 4, 8, 4, 1)
+#define OP_TLBI_RIPAS2E1 sys_insn(1, 4, 8, 4, 2)
+#define OP_TLBI_RIPAS2E1OS sys_insn(1, 4, 8, 4, 3)
+#define OP_TLBI_IPAS2LE1OS sys_insn(1, 4, 8, 4, 4)
+#define OP_TLBI_IPAS2LE1 sys_insn(1, 4, 8, 4, 5)
+#define OP_TLBI_RIPAS2LE1 sys_insn(1, 4, 8, 4, 6)
+#define OP_TLBI_RIPAS2LE1OS sys_insn(1, 4, 8, 4, 7)
+#define OP_TLBI_RVAE2OS sys_insn(1, 4, 8, 5, 1)
+#define OP_TLBI_RVALE2OS sys_insn(1, 4, 8, 5, 5)
+#define OP_TLBI_RVAE2 sys_insn(1, 4, 8, 6, 1)
+#define OP_TLBI_RVALE2 sys_insn(1, 4, 8, 6, 5)
+#define OP_TLBI_ALLE2 sys_insn(1, 4, 8, 7, 0)
+#define OP_TLBI_VAE2 sys_insn(1, 4, 8, 7, 1)
+#define OP_TLBI_ALLE1 sys_insn(1, 4, 8, 7, 4)
+#define OP_TLBI_VALE2 sys_insn(1, 4, 8, 7, 5)
+#define OP_TLBI_VMALLS12E1 sys_insn(1, 4, 8, 7, 6)
+#define OP_TLBI_IPAS2E1ISNXS sys_insn(1, 4, 9, 0, 1)
+#define OP_TLBI_RIPAS2E1ISNXS sys_insn(1, 4, 9, 0, 2)
+#define OP_TLBI_IPAS2LE1ISNXS sys_insn(1, 4, 9, 0, 5)
+#define OP_TLBI_RIPAS2LE1ISNXS sys_insn(1, 4, 9, 0, 6)
+#define OP_TLBI_ALLE2OSNXS sys_insn(1, 4, 9, 1, 0)
+#define OP_TLBI_VAE2OSNXS sys_insn(1, 4, 9, 1, 1)
+#define OP_TLBI_ALLE1OSNXS sys_insn(1, 4, 9, 1, 4)
+#define OP_TLBI_VALE2OSNXS sys_insn(1, 4, 9, 1, 5)
+#define OP_TLBI_VMALLS12E1OSNXS sys_insn(1, 4, 9, 1, 6)
+#define OP_TLBI_RVAE2ISNXS sys_insn(1, 4, 9, 2, 1)
+#define OP_TLBI_RVALE2ISNXS sys_insn(1, 4, 9, 2, 5)
+#define OP_TLBI_ALLE2ISNXS sys_insn(1, 4, 9, 3, 0)
+#define OP_TLBI_VAE2ISNXS sys_insn(1, 4, 9, 3, 1)
+#define OP_TLBI_ALLE1ISNXS sys_insn(1, 4, 9, 3, 4)
+#define OP_TLBI_VALE2ISNXS sys_insn(1, 4, 9, 3, 5)
+#define OP_TLBI_VMALLS12E1ISNXS sys_insn(1, 4, 9, 3, 6)
+#define OP_TLBI_IPAS2E1OSNXS sys_insn(1, 4, 9, 4, 0)
+#define OP_TLBI_IPAS2E1NXS sys_insn(1, 4, 9, 4, 1)
+#define OP_TLBI_RIPAS2E1NXS sys_insn(1, 4, 9, 4, 2)
+#define OP_TLBI_RIPAS2E1OSNXS sys_insn(1, 4, 9, 4, 3)
+#define OP_TLBI_IPAS2LE1OSNXS sys_insn(1, 4, 9, 4, 4)
+#define OP_TLBI_IPAS2LE1NXS sys_insn(1, 4, 9, 4, 5)
+#define OP_TLBI_RIPAS2LE1NXS sys_insn(1, 4, 9, 4, 6)
+#define OP_TLBI_RIPAS2LE1OSNXS sys_insn(1, 4, 9, 4, 7)
+#define OP_TLBI_RVAE2OSNXS sys_insn(1, 4, 9, 5, 1)
+#define OP_TLBI_RVALE2OSNXS sys_insn(1, 4, 9, 5, 5)
+#define OP_TLBI_RVAE2NXS sys_insn(1, 4, 9, 6, 1)
+#define OP_TLBI_RVALE2NXS sys_insn(1, 4, 9, 6, 5)
+#define OP_TLBI_ALLE2NXS sys_insn(1, 4, 9, 7, 0)
+#define OP_TLBI_VAE2NXS sys_insn(1, 4, 9, 7, 1)
+#define OP_TLBI_ALLE1NXS sys_insn(1, 4, 9, 7, 4)
+#define OP_TLBI_VALE2NXS sys_insn(1, 4, 9, 7, 5)
+#define OP_TLBI_VMALLS12E1NXS sys_insn(1, 4, 9, 7, 6)
+
+/* Misc instructions */
+#define OP_GCSPUSHX sys_insn(1, 0, 7, 7, 4)
+#define OP_GCSPOPCX sys_insn(1, 0, 7, 7, 5)
+#define OP_GCSPOPX sys_insn(1, 0, 7, 7, 6)
+#define OP_GCSPUSHM sys_insn(1, 3, 7, 7, 0)
+
+#define OP_BRB_IALL sys_insn(1, 1, 7, 2, 4)
+#define OP_BRB_INJ sys_insn(1, 1, 7, 2, 5)
+#define OP_CFP_RCTX sys_insn(1, 3, 7, 3, 4)
+#define OP_DVP_RCTX sys_insn(1, 3, 7, 3, 5)
+#define OP_COSP_RCTX sys_insn(1, 3, 7, 3, 6)
+#define OP_CPP_RCTX sys_insn(1, 3, 7, 3, 7)
+
/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_ENTP2 (BIT(60))
#define SCTLR_ELx_DSSBS (BIT(44))
@@ -561,10 +878,12 @@

/* id_aa64mmfr0 */
#define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN 0x0
+#define ID_AA64MMFR0_EL1_TGRAN4_LPA2 ID_AA64MMFR0_EL1_TGRAN4_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX 0x7
#define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MIN 0x0
#define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MAX 0x7
#define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN 0x1
+#define ID_AA64MMFR0_EL1_TGRAN16_LPA2 ID_AA64MMFR0_EL1_TGRAN16_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX 0xf

#define ARM64_MIN_PARANGE_BITS 32
@@ -572,6 +891,7 @@
#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_DEFAULT 0x0
#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_NONE 0x1
#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MIN 0x2
+#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2 0x3
#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MAX 0x7

#ifdef CONFIG_ARM64_PA_BITS_52
@@ -582,11 +902,13 @@

#if defined(CONFIG_ARM64_4K_PAGES)
#define ID_AA64MMFR0_EL1_TGRAN_SHIFT ID_AA64MMFR0_EL1_TGRAN4_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2 ID_AA64MMFR0_EL1_TGRAN4_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX
#define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT
#elif defined(CONFIG_ARM64_16K_PAGES)
#define ID_AA64MMFR0_EL1_TGRAN_SHIFT ID_AA64MMFR0_EL1_TGRAN16_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2 ID_AA64MMFR0_EL1_TGRAN16_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX
#define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT ID_AA64MMFR0_EL1_TGRAN16_2_SHIFT
@@ -610,6 +932,19 @@
#define SYS_GCR_EL1_RRND (BIT(16))
#define SYS_GCR_EL1_EXCL_MASK 0xffffUL

+#ifdef CONFIG_KASAN_HW_TAGS
+/*
+ * KASAN always uses a whole byte for its tags. With CONFIG_KASAN_HW_TAGS it
+ * only uses tags in the range 0xF0-0xFF, which we map to MTE tags 0x0-0xF.
+ */
+#define __MTE_TAG_MIN (KASAN_TAG_MIN & 0xf)
+#define __MTE_TAG_MAX (KASAN_TAG_MAX & 0xf)
+#define __MTE_TAG_INCL GENMASK(__MTE_TAG_MAX, __MTE_TAG_MIN)
+#define KERNEL_GCR_EL1_EXCL (SYS_GCR_EL1_EXCL_MASK & ~__MTE_TAG_INCL)
+#else
+#define KERNEL_GCR_EL1_EXCL SYS_GCR_EL1_EXCL_MASK
+#endif
+
#define KERNEL_GCR_EL1 (SYS_GCR_EL1_RRND | KERNEL_GCR_EL1_EXCL)

/* RGSR_EL1 Definitions */
@@ -716,6 +1051,19 @@

#define PIRx_ELx_PERM(idx, perm) ((perm) << ((idx) * 4))

+/*
+ * Permission Overlay Extension (POE) permission encodings.
+ */
+#define POE_NONE UL(0x0)
+#define POE_R UL(0x1)
+#define POE_X UL(0x2)
+#define POE_RX UL(0x3)
+#define POE_W UL(0x4)
+#define POE_RW UL(0x5)
+#define POE_XW UL(0x6)
+#define POE_RXW UL(0x7)
+#define POE_MASK UL(0xf)
+
#define ARM64_FEATURE_FIELD_BITS 4

/* Defined for compatibility only, do not add new users. */
@@ -789,15 +1137,21 @@
/*
* For registers without architectural names, or simply unsupported by
* GAS.
+ *
+ * __check_r forces warnings to be generated by the compiler when
+ * evaluating r which wouldn't normally happen due to being passed to
+ * the assembler via __stringify(r).
*/
#define read_sysreg_s(r) ({ \
u64 __val; \
+ u32 __maybe_unused __check_r = (u32)(r); \
asm volatile(__mrs_s("%0", r) : "=r" (__val)); \
__val; \
})

#define write_sysreg_s(v, r) do { \
u64 __val = (u64)(v); \
+ u32 __maybe_unused __check_r = (u32)(r); \
asm volatile(__msr_s(r, "%x0") : : "rZ" (__val)); \
} while (0)

@@ -827,6 +1181,8 @@
par; \
})

+#define SYS_FIELD_VALUE(reg, field, val) reg##_##field##_##val
+
#define SYS_FIELD_GET(reg, field, val) \
FIELD_GET(reg##_##field##_MASK, val)

@@ -834,7 +1190,8 @@
FIELD_PREP(reg##_##field##_MASK, val)

#define SYS_FIELD_PREP_ENUM(reg, field, val) \
- FIELD_PREP(reg##_##field##_MASK, reg##_##field##_##val)
+ FIELD_PREP(reg##_##field##_MASK, \
+ SYS_FIELD_VALUE(reg, field, val))

#endif

diff --git a/tools/include/linux/kasan-tags.h b/tools/include/linux/kasan-tags.h
new file mode 100644
index 000000000000..4f85f562512c
--- /dev/null
+++ b/tools/include/linux/kasan-tags.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_KASAN_TAGS_H
+#define _LINUX_KASAN_TAGS_H
+
+#define KASAN_TAG_KERNEL 0xFF /* native kernel pointers tag */
+#define KASAN_TAG_INVALID 0xFE /* inaccessible memory tag */
+#define KASAN_TAG_MAX 0xFD /* maximum value for random tags */
+
+#ifdef CONFIG_KASAN_HW_TAGS
+#define KASAN_TAG_MIN 0xF0 /* minimum value for random tags */
+#else
+#define KASAN_TAG_MIN 0x00 /* minimum value for random tags */
+#endif
+
+#endif /* LINUX_KASAN_TAGS_H */
--
2.34.1


2024-02-26 11:34:09

by James Clark

Subject: [PATCH v6 4/8] arm64/sysreg/tools: Move TRFCR definitions to sysreg

Convert TRFCR to automatic generation. Add separate definitions for ELx
and EL2 as TRFCR_EL1 doesn't have CX. This also mirrors the previous
definition so no code change is required.

Also add TRFCR_EL12 which will start to be used in a later commit.

Unfortunately, to avoid breaking the Perf build with duplicate
definition errors, the tools copy of the sysreg.h header needs to be
updated in the same commit rather than in the usual follow-up commit.
This is because the generated version of sysreg
(arch/arm64/include/generated/asm/sysreg-defs.h) is currently shared
and tools/ does not have its own copy.

Signed-off-by: James Clark <[email protected]>
---
arch/arm64/include/asm/sysreg.h | 12 ---------
arch/arm64/tools/sysreg | 36 +++++++++++++++++++++++++++
tools/arch/arm64/include/asm/sysreg.h | 12 ---------
3 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 9e8999592f3a..35890cf3c49f 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -280,8 +280,6 @@
#define SYS_RGSR_EL1 sys_reg(3, 0, 1, 0, 5)
#define SYS_GCR_EL1 sys_reg(3, 0, 1, 0, 6)

-#define SYS_TRFCR_EL1 sys_reg(3, 0, 1, 2, 1)
-
#define SYS_TCR_EL1 sys_reg(3, 0, 2, 0, 2)

#define SYS_APIAKEYLO_EL1 sys_reg(3, 0, 2, 1, 0)
@@ -499,7 +497,6 @@
#define SYS_VTTBR_EL2 sys_reg(3, 4, 2, 1, 0)
#define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2)

-#define SYS_TRFCR_EL2 sys_reg(3, 4, 1, 2, 1)
#define SYS_VNCR_EL2 sys_reg(3, 4, 2, 2, 0)
#define SYS_HAFGRTR_EL2 sys_reg(3, 4, 3, 1, 6)
#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
@@ -961,15 +958,6 @@
/* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
#define SYS_MPIDR_SAFE_VAL (BIT(31))

-#define TRFCR_ELx_TS_SHIFT 5
-#define TRFCR_ELx_TS_MASK ((0x3UL) << TRFCR_ELx_TS_SHIFT)
-#define TRFCR_ELx_TS_VIRTUAL ((0x1UL) << TRFCR_ELx_TS_SHIFT)
-#define TRFCR_ELx_TS_GUEST_PHYSICAL ((0x2UL) << TRFCR_ELx_TS_SHIFT)
-#define TRFCR_ELx_TS_PHYSICAL ((0x3UL) << TRFCR_ELx_TS_SHIFT)
-#define TRFCR_EL2_CX BIT(3)
-#define TRFCR_ELx_ExTRE BIT(1)
-#define TRFCR_ELx_E0TRE BIT(0)
-
/* GIC Hypervisor interface registers */
/* ICH_MISR_EL2 bit definitions */
#define ICH_MISR_EOI (1 << 0)
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index a84c19c111fa..af0cf1ce5d03 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1919,6 +1919,22 @@ Sysreg CPACR_EL1 3 0 1 0 2
Fields CPACR_ELx
EndSysreg

+SysregFields TRFCR_ELx
+Res0 63:7
+UnsignedEnum 6:5 TS
+ 0b0001 VIRTUAL
+ 0b0010 GUEST_PHYSICAL
+ 0b0011 PHYSICAL
+EndEnum
+Res0 4:2
+Field 1 ExTRE
+Field 0 E0TRE
+EndSysregFields
+
+Sysreg TRFCR_EL1 3 0 1 2 1
+Fields TRFCR_ELx
+EndSysreg
+
Sysreg SMPRI_EL1 3 0 1 2 4
Res0 63:4
Field 3:0 PRIORITY
@@ -2396,6 +2412,22 @@ Field 1 ICIALLU
Field 0 ICIALLUIS
EndSysreg

+Sysreg TRFCR_EL2 3 4 1 2 1
+Res0 63:7
+UnsignedEnum 6:5 TS
+ 0b0000 USE_TRFCR_EL1_TS
+ 0b0001 VIRTUAL
+ 0b0010 GUEST_PHYSICAL
+ 0b0011 PHYSICAL
+EndEnum
+Res0 4
+Field 3 CX
+Res0 2
+Field 1 E2TRE
+Field 0 E0HTRE
+EndSysreg
+
+
Sysreg HDFGRTR_EL2 3 4 3 1 4
Field 63 PMBIDR_EL1
Field 62 nPMSNEVFR_EL1
@@ -2686,6 +2718,10 @@ Sysreg ZCR_EL12 3 5 1 2 0
Fields ZCR_ELx
EndSysreg

+Sysreg TRFCR_EL12 3 5 1 2 1
+Fields TRFCR_ELx
+EndSysreg
+
Sysreg SMCR_EL12 3 5 1 2 6
Fields SMCR_ELx
EndSysreg
diff --git a/tools/arch/arm64/include/asm/sysreg.h b/tools/arch/arm64/include/asm/sysreg.h
index 9e8999592f3a..35890cf3c49f 100644
--- a/tools/arch/arm64/include/asm/sysreg.h
+++ b/tools/arch/arm64/include/asm/sysreg.h
@@ -280,8 +280,6 @@
#define SYS_RGSR_EL1 sys_reg(3, 0, 1, 0, 5)
#define SYS_GCR_EL1 sys_reg(3, 0, 1, 0, 6)

-#define SYS_TRFCR_EL1 sys_reg(3, 0, 1, 2, 1)
-
#define SYS_TCR_EL1 sys_reg(3, 0, 2, 0, 2)

#define SYS_APIAKEYLO_EL1 sys_reg(3, 0, 2, 1, 0)
@@ -499,7 +497,6 @@
#define SYS_VTTBR_EL2 sys_reg(3, 4, 2, 1, 0)
#define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2)

-#define SYS_TRFCR_EL2 sys_reg(3, 4, 1, 2, 1)
#define SYS_VNCR_EL2 sys_reg(3, 4, 2, 2, 0)
#define SYS_HAFGRTR_EL2 sys_reg(3, 4, 3, 1, 6)
#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
@@ -961,15 +958,6 @@
/* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
#define SYS_MPIDR_SAFE_VAL (BIT(31))

-#define TRFCR_ELx_TS_SHIFT 5
-#define TRFCR_ELx_TS_MASK ((0x3UL) << TRFCR_ELx_TS_SHIFT)
-#define TRFCR_ELx_TS_VIRTUAL ((0x1UL) << TRFCR_ELx_TS_SHIFT)
-#define TRFCR_ELx_TS_GUEST_PHYSICAL ((0x2UL) << TRFCR_ELx_TS_SHIFT)
-#define TRFCR_ELx_TS_PHYSICAL ((0x3UL) << TRFCR_ELx_TS_SHIFT)
-#define TRFCR_EL2_CX BIT(3)
-#define TRFCR_ELx_ExTRE BIT(1)
-#define TRFCR_ELx_E0TRE BIT(0)
-
/* GIC Hypervisor interface registers */
/* ICH_MISR_EL2 bit definitions */
#define ICH_MISR_EOI (1 << 0)
--
2.34.1


2024-02-26 11:34:41

by James Clark

Subject: [PATCH v6 5/8] arm64: KVM: Add iflag for FEAT_TRF

Add an extra iflag to signify if the TRFCR register is accessible.
Because TRBE requires FEAT_TRF, DEBUG_STATE_SAVE_TRBE still has the same
behavior even though it's only set when FEAT_TRF is present.

The following holes are left in struct kvm_vcpu_arch, but there aren't
enough other 8 bit fields to rearrange it to leave any hole smaller than
7 bytes:

u8 cflags; /* 2292 1 */
/* XXX 1 byte hole, try to pack */
u16 iflags; /* 2294 2 */
u8 sflags; /* 2296 1 */
bool pause; /* 2297 1 */
/* XXX 6 bytes hole, try to pack */

Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 4 +++-
arch/arm64/kvm/debug.c | 24 ++++++++++++++++++++----
2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 21c57b812569..85b5477bd1b4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -569,7 +569,7 @@ struct kvm_vcpu_arch {
u8 cflags;

/* Input flags to the hypervisor code, potentially cleared after use */
- u8 iflags;
+ u16 iflags;

/* State flags for kernel bookkeeping, unused by the hypervisor code */
u8 sflags;
@@ -779,6 +779,8 @@ struct kvm_vcpu_arch {
#define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
/* vcpu running in HYP context */
#define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
+/* Save trace filter controls */
+#define DEBUG_STATE_SAVE_TRFCR __vcpu_single_flag(iflags, BIT(8))

/* SVE enabled for host EL0 */
#define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index ce8886122ed3..49a13e72ddd2 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -332,14 +332,30 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
!(read_sysreg_s(SYS_PMBIDR_EL1) & BIT(PMBIDR_EL1_P_SHIFT)))
vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_SPE);

- /* Check if we have TRBE implemented and available at the host */
- if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
- !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
- vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
+ /*
+ * Set SAVE_TRFCR flag if FEAT_TRF (TraceFilt) exists. This flag
+ * signifies that the exclude_host/exclude_guest settings of any active
+ * host Perf session on a core running a VCPU can be written into
+ * TRFCR_EL1 on guest switch.
+ */
+ if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceFilt_SHIFT)) {
+ vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
+ /*
+ * Check if we have TRBE implemented and available at the host.
+ * If it's in use at the time of guest switch then trace will
+ * need to be completely disabled. The architecture mandates
+ * FEAT_TRF with TRBE, so we only need to check for TRBE after
+ * TRF.
+ */
+ if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
+ !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
+ vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
+ }
}

void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
{
vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
+ vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
}
--
2.34.1


2024-02-26 11:35:12

by James Clark

Subject: [PATCH v6 6/8] arm64: KVM: Add interface to set guest value for TRFCR register

Add an interface for the Coresight driver to use to set the value of the
TRFCR register for the guest. This register controls the exclude
settings for trace at different exception levels, and is used to honor
the exclude_host and exclude_guest parameters from the Perf session.
This will be used to later write TRFCR_EL1 on nVHE at guest switch. For
VHE, the host trace is controlled by TRFCR_EL2 and thus we can write to
the TRFCR_EL1 immediately. Because guest writes to the register are
trapped, the value will persist and can't be modified.

Instead of adding a load of infrastructure to share the host's per-cpu
offsets with the hypervisor, just define the new storage as a NR_CPUS
array.
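
For context, this is roughly how the Coresight driver ends up calling
the new interface (taken from the etm4x_allow_trace() changes in the
final patch of this series, not part of this commit):

  /* Set filters for guests and pass to KVM */
  if (drvdata->config.mode & ETM_MODE_EXCL_GUEST)
          trfcr = drvdata->trfcr & ~(TRFCR_ELx_ExTRE | TRFCR_ELx_E0TRE);
  else
          trfcr = etm4x_get_kern_user_filter(drvdata);

  /* TRFCR_EL1 doesn't have CX so mask it out. */
  trfcr &= ~TRFCR_EL2_CX;
  kvm_etm_set_guest_trfcr(trfcr);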

Signed-off-by: James Clark <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 3 +++
arch/arm64/kernel/image-vars.h | 1 +
arch/arm64/kvm/debug.c | 29 +++++++++++++++++++++++++++++
3 files changed, 33 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 85b5477bd1b4..56b7f7eca195 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -509,6 +509,7 @@ struct kvm_host_psci_config {
bool psci_0_1_cpu_off_implemented;
bool psci_0_1_migrate_implemented;
};
+extern u64 ____cacheline_aligned kvm_guest_trfcr[NR_CPUS];

extern struct kvm_host_psci_config kvm_nvhe_sym(kvm_host_psci_config);
#define kvm_host_psci_config CHOOSE_NVHE_SYM(kvm_host_psci_config)
@@ -1174,6 +1175,7 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
void kvm_clr_pmu_events(u32 clr);
bool kvm_set_pmuserenr(u64 val);
+void kvm_etm_set_guest_trfcr(u64 trfcr_guest);
#else
static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
static inline void kvm_clr_pmu_events(u32 clr) {}
@@ -1181,6 +1183,7 @@ static inline bool kvm_set_pmuserenr(u64 val)
{
return false;
}
+static inline void kvm_etm_set_guest_trfcr(u64 trfcr_guest) {}
#endif

void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 31daa1da191c..fe9e2bd7f43a 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -59,6 +59,7 @@ KVM_NVHE_ALIAS(alt_cb_patch_nops);

/* Global kernel state accessed by nVHE hyp code. */
KVM_NVHE_ALIAS(kvm_vgic_global_state);
+KVM_NVHE_ALIAS(kvm_guest_trfcr);

/* Kernel symbols used to call panic() from nVHE hyp code (via ERET). */
KVM_NVHE_ALIAS(nvhe_hyp_panic_handler);
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 49a13e72ddd2..fe90bc7d6dd4 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -22,6 +22,7 @@
DBG_MDSCR_MDE)

static DEFINE_PER_CPU(u64, mdcr_el2);
+u64 ____cacheline_aligned kvm_guest_trfcr[NR_CPUS];

/*
* save/restore_guest_debug_regs
@@ -359,3 +360,31 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
}
+
+/*
+ * Interface for the Coresight driver to use to set the value of the TRFCR
+ * register for the guest. This register controls the exclude settings for trace
+ * at different exception levels, and is used to honor the exclude_host and
+ * exclude_guest parameters from the Perf session.
+ *
+ * This will be used to later write TRFCR_EL1 on nVHE at guest switch. For VHE,
+ * the host trace is controlled by TRFCR_EL2 and thus we can write to the
+ * TRFCR_EL1 immediately. Because guest writes to the register are trapped, the
+ * value will persist and can't be modified. For pKVM, kvm_guest_trfcr can't
+ * be read by the hypervisor, so don't bother writing it.
+ */
+void kvm_etm_set_guest_trfcr(u64 trfcr_guest)
+{
+ if (WARN_ON_ONCE(!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
+ ID_AA64DFR0_EL1_TraceFilt_SHIFT)))
+ return;
+
+ /* Warn on invalid use of smp_processor_id() */
+ WARN_ON_ONCE(preemptible());
+
+ if (has_vhe())
+ write_sysreg_s(trfcr_guest, SYS_TRFCR_EL12);
+ else if (!is_protected_kvm_enabled())
+ kvm_guest_trfcr[smp_processor_id()] = trfcr_guest;
+}
+EXPORT_SYMBOL_GPL(kvm_etm_set_guest_trfcr);
--
2.34.1


2024-02-26 11:35:49

by James Clark

Subject: [PATCH v6 7/8] arm64: KVM: Write TRFCR value on guest switch with nVHE

The guest value for TRFCR requested by the Coresight driver is saved in
kvm_guest_trfcr. On guest switch this value needs to be written to
the register. Currently TRFCR is only modified when we want to disable
trace completely in guests due to an issue with TRBE. Expand the
__debug_save_trace() function to always write to the register if a
different value for guests is required, but also keep the existing TRBE
disable behavior if that's required.

In pKVM, the kvm_guest_trfcr can't be read and the host isn't trusted,
so always disable trace.

__debug_restore_trace() now has to restore unconditionally, because even
a value of 0 needs to be written to overwrite whatever was set for the
guest.

Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/debug-sr.c | 53 +++++++++++++++++-------------
1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 4558c02eb352..3adac2e01908 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -51,30 +51,39 @@ static void __debug_restore_spe(u64 pmscr_el1)
write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
}

-static void __debug_save_trace(u64 *trfcr_el1)
+static void __debug_save_trace(struct kvm_vcpu *vcpu)
{
- *trfcr_el1 = 0;
-
- /* Check if the TRBE is enabled */
- if (!(read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E))
- return;
- /*
- * Prohibit trace generation while we are in guest.
- * Since access to TRFCR_EL1 is trapped, the guest can't
- * modify the filtering set by the host.
- */
- *trfcr_el1 = read_sysreg_s(SYS_TRFCR_EL1);
- write_sysreg_s(0, SYS_TRFCR_EL1);
- isb();
- /* Drain the trace buffer to memory */
- tsb_csync();
+ u64 host_trfcr_el1 = read_sysreg_s(SYS_TRFCR_EL1);
+ u64 guest_trfcr_el1;
+
+ vcpu->arch.host_debug_state.trfcr_el1 = host_trfcr_el1;
+
+ /* Check if the TRBE buffer or pKVM is enabled */
+ if (is_protected_kvm_enabled() ||
+ (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE) &&
+ read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E)) {
+ /*
+ * Prohibit trace generation while we are in guest. Since access
+ * to TRFCR_EL1 is trapped, the guest can't modify the filtering
+ * set by the host.
+ */
+ write_sysreg_s(0, SYS_TRFCR_EL1);
+ isb();
+ /* Drain the trace buffer to memory */
+ tsb_csync();
+ } else {
+ /*
+ * Tracing is allowed, apply the filters provided by the
+ * Coresight driver.
+ */
+ guest_trfcr_el1 = kvm_guest_trfcr[vcpu->cpu];
+ if (host_trfcr_el1 != guest_trfcr_el1)
+ write_sysreg_s(guest_trfcr_el1, SYS_TRFCR_EL1);
+ }
}

static void __debug_restore_trace(u64 trfcr_el1)
{
- if (!trfcr_el1)
- return;
-
/* Restore trace filter controls */
write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1);
}
@@ -85,8 +94,8 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
/* Disable and flush Self-Hosted Trace generation */
- if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
- __debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
+ if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRFCR))
+ __debug_save_trace(vcpu);
}

void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
@@ -98,7 +107,7 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
- if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
+ if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRFCR))
__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
}

--
2.34.1


2024-02-26 11:36:02

by James Clark

Subject: [PATCH v6 8/8] coresight: Pass guest TRFCR value to KVM

Currently the userspace and kernel filters for guests are never set, so
no trace will be generated for them. Add support for tracing guests by
passing the desired TRFCR value to KVM so it can be applied to the
guest.

By writing either E1TRE or E0TRE, filtering on either guest kernel or
guest userspace is also supported. And if both E1TRE and E0TRE are
cleared when exclude_guest is set, that option is supported too. This
change also brings exclude_host support, which would be difficult to
add as a separate commit without excess churn or an intermediate state
that produces no trace at all.

Testing
=======

The addresses were counted with the following:

$ perf report -D | grep -Eo 'EL2|EL1|EL0' | sort | uniq -c

Guest kernel only:

$ perf record -e cs_etm//Gk -a -- true
535 EL1
1 EL2

Guest user only (only 5 addresses because the guest runs slowly in the
model):

$ perf record -e cs_etm//Gu -a -- true
5 EL0

Host kernel only:

$ perf record -e cs_etm//Hk -a -- true
3501 EL2

Host userspace only:

$ perf record -e cs_etm//Hu -a -- true
408 EL0
1 EL2

Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
.../coresight/coresight-etm4x-core.c | 42 ++++++++++++++++---
drivers/hwtracing/coresight/coresight-etm4x.h | 2 +-
drivers/hwtracing/coresight/coresight-priv.h | 3 ++
3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index ce1995a2827f..45a69bfdc6b5 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -6,6 +6,7 @@
#include <linux/acpi.h>
#include <linux/bitops.h>
#include <linux/kernel.h>
+#include <linux/kvm_host.h>
#include <linux/moduleparam.h>
#include <linux/init.h>
#include <linux/types.h>
@@ -271,9 +272,22 @@ static void etm4x_prohibit_trace(struct etmv4_drvdata *drvdata)
/* If the CPU doesn't support FEAT_TRF, nothing to do */
if (!drvdata->trfcr)
return;
+ kvm_etm_set_guest_trfcr(0);
cpu_prohibit_trace();
}

+static u64 etm4x_get_kern_user_filter(struct etmv4_drvdata *drvdata)
+{
+ u64 trfcr = drvdata->trfcr;
+
+ if (drvdata->config.mode & ETM_MODE_EXCL_KERN)
+ trfcr &= ~TRFCR_ELx_ExTRE;
+ if (drvdata->config.mode & ETM_MODE_EXCL_USER)
+ trfcr &= ~TRFCR_ELx_E0TRE;
+
+ return trfcr;
+}
+
/*
* etm4x_allow_trace - Allow CPU tracing in the respective ELs,
* as configured by the drvdata->config.mode for the current
@@ -286,18 +300,28 @@ static void etm4x_prohibit_trace(struct etmv4_drvdata *drvdata)
*/
static void etm4x_allow_trace(struct etmv4_drvdata *drvdata)
{
- u64 trfcr = drvdata->trfcr;
+ u64 trfcr;

/* If the CPU doesn't support FEAT_TRF, nothing to do */
- if (!trfcr)
+ if (!drvdata->trfcr)
return;

- if (drvdata->config.mode & ETM_MODE_EXCL_KERN)
- trfcr &= ~TRFCR_ELx_ExTRE;
- if (drvdata->config.mode & ETM_MODE_EXCL_USER)
- trfcr &= ~TRFCR_ELx_E0TRE;
+ if (drvdata->config.mode & ETM_MODE_EXCL_HOST)
+ trfcr = drvdata->trfcr & ~(TRFCR_ELx_ExTRE | TRFCR_ELx_E0TRE);
+ else
+ trfcr = etm4x_get_kern_user_filter(drvdata);

write_trfcr(trfcr);
+
+ /* Set filters for guests and pass to KVM */
+ if (drvdata->config.mode & ETM_MODE_EXCL_GUEST)
+ trfcr = drvdata->trfcr & ~(TRFCR_ELx_ExTRE | TRFCR_ELx_E0TRE);
+ else
+ trfcr = etm4x_get_kern_user_filter(drvdata);
+
+ /* TRFCR_EL1 doesn't have CX so mask it out. */
+ trfcr &= ~TRFCR_EL2_CX;
+ kvm_etm_set_guest_trfcr(trfcr);
}

#ifdef CONFIG_ETM4X_IMPDEF_FEATURE
@@ -655,6 +679,12 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
if (attr->exclude_user)
config->mode = ETM_MODE_EXCL_USER;

+ if (attr->exclude_host)
+ config->mode |= ETM_MODE_EXCL_HOST;
+
+ if (attr->exclude_guest)
+ config->mode |= ETM_MODE_EXCL_GUEST;
+
/* Always start from the default config */
etm4_set_default_config(config);

diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index da17b6c49b0f..70c29e91f4b5 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -841,7 +841,7 @@ enum etm_impdef_type {
* @s_ex_level: Secure ELs where tracing is supported.
*/
struct etmv4_config {
- u32 mode;
+ u64 mode;
u32 pe_sel;
u32 cfg;
u32 eventctrl0;
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
index 767076e07970..727dd27ba800 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -39,6 +39,9 @@

#define ETM_MODE_EXCL_KERN BIT(30)
#define ETM_MODE_EXCL_USER BIT(31)
+#define ETM_MODE_EXCL_HOST BIT(32)
+#define ETM_MODE_EXCL_GUEST BIT(33)
+
struct cs_pair_attribute {
struct device_attribute attr;
u32 lo_off;
--
2.34.1


2024-02-26 13:19:34

by Mark Brown

Subject: Re: [PATCH v6 2/8] arm64/sysreg: Add a comment that the sysreg file should be sorted

On Mon, Feb 26, 2024 at 11:30:30AM +0000, James Clark wrote:
> There are a few entries particularly at the end of the file that aren't
> in order. To avoid confusion, add a comment that might help new entries
> to be added in the right place.
>
> Signed-off-by: James Clark <[email protected]>

Reviewed-by: Mark Brown <[email protected]>



2024-02-26 13:21:23

by Mark Brown

Subject: Re: [PATCH v6 4/8] arm64/sysreg/tools: Move TRFCR definitions to sysreg

On Mon, Feb 26, 2024 at 11:30:32AM +0000, James Clark wrote:
> Convert TRFCR to automatic generation. Add separate definitions for ELx
> and EL2 as TRFCR_EL1 doesn't have CX. This also mirrors the previous
> definition so no code change is required.
>
> Also add TRFCR_EL12 which will start to be used in a later commit.

Reviewed-by: Mark Brown <[email protected]>



2024-02-26 13:35:39

by Marc Zyngier

Subject: Re: [PATCH v6 5/8] arm64: KVM: Add iflag for FEAT_TRF

On Mon, 26 Feb 2024 11:30:33 +0000,
James Clark <[email protected]> wrote:
>
> Add an extra iflag to signify if the TRFCR register is accessible.

That's not what this flag means: it indicates whether TRFCR needs to
be saved. At least that's what the name suggests.

> Because TRBE requires FEAT_TRF, DEBUG_STATE_SAVE_TRBE still has the same
> behavior even though it's only set when FEAT_TRF is present.

This sentence seems completely out of context, because you didn't
explain that you were making TRBE *conditional* on TRF being
implemented, as per the architecture requirements.

>
> The following holes are left in struct kvm_vcpu_arch, but there aren't
> enough other 8 bit fields to rearrange it to leave any hole smaller than
> 7 bytes:
>
> u8 cflags; /* 2292 1 */
> /* XXX 1 byte hole, try to pack */
> u16 iflags; /* 2294 2 */
> u8 sflags; /* 2296 1 */
> bool pause; /* 2297 1 */
> /* XXX 6 bytes hole, try to pack */

I don't think that's particularly useful in a commit message, but more
relevant to the cover letter. However, see below.

>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Signed-off-by: James Clark <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 4 +++-
> arch/arm64/kvm/debug.c | 24 ++++++++++++++++++++----
> 2 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 21c57b812569..85b5477bd1b4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -569,7 +569,7 @@ struct kvm_vcpu_arch {
> u8 cflags;
>
> /* Input flags to the hypervisor code, potentially cleared after use */
> - u8 iflags;
> + u16 iflags;
>
> /* State flags for kernel bookkeeping, unused by the hypervisor code */
> u8 sflags;
> @@ -779,6 +779,8 @@ struct kvm_vcpu_arch {
> #define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
> /* vcpu running in HYP context */
> #define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
> +/* Save trace filter controls */
> +#define DEBUG_STATE_SAVE_TRFCR __vcpu_single_flag(iflags, BIT(8))

I'd rather you cherry-pick [1] and avoid expanding the iflags.

[1] https://lore.kernel.org/r/[email protected]

Now, I think the whole SPE/TRBE/TRCR flag management should be
improved, see below.

>
> /* SVE enabled for host EL0 */
> #define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index ce8886122ed3..49a13e72ddd2 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -332,14 +332,30 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
> !(read_sysreg_s(SYS_PMBIDR_EL1) & BIT(PMBIDR_EL1_P_SHIFT)))
> vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_SPE);
>
> - /* Check if we have TRBE implemented and available at the host */
> - if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
> - !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
> - vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> + /*
> + * Set SAVE_TRFCR flag if FEAT_TRF (TraceFilt) exists. This flag
> + * signifies that the exclude_host/exclude_guest settings of any active
> + * host Perf session on a core running a VCPU can be written into
> + * TRFCR_EL1 on guest switch.
> + */
> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceFilt_SHIFT)) {
> + vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);

Can we avoid doing this unconditionally? It only makes sense to save
the trace crud if it is going to be changed, right?

> + /*
> + * Check if we have TRBE implemented and available at the host.
> + * If it's in use at the time of guest switch then trace will
> + * need to be completely disabled. The architecture mandates
> + * FEAT_TRF with TRBE, so we only need to check for TRBE after
> + * TRF.
> + */
> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
> + !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
> + vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> + }
> }
>
> void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
> {
> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> + vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
> }

Dealing with flags that are strongly coupled in a disjointed way is a
pretty bad idea. Look at the generated code, and realise we flip the
preempt flag on each access.

Can we do better? You bet. The vcpu_{set,clear}_flags infrastructure
is capable of dealing with multiple flags at once, as demonstrated by
the way we deal with exception encoding.

Something like:

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index addf79ba8fa0..3e50e535fdd4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -885,6 +885,10 @@ struct kvm_vcpu_arch {
#define DEBUG_STATE_SAVE_SPE __vcpu_single_flag(iflags, BIT(5))
/* Save TRBE context if active */
#define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
+/* Save Trace Filter Controls */
+#define DEBUG_STATE_SAVE_TRFCR __vcpu_single_flag(iflags, BIT(7))
+/* Global debug mask */
+#define DEBUG_STATE_SAVE_MASK __vcpu_single_flag(iflags, GENMASK(7, 5))

/* SVE enabled for host EL0 */
#define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 8725291cb00a..f9b197a00582 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -339,6 +339,6 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)

void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
{
- vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
- vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
+ if (!has_vhe())
+ vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_MASK);
}

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2024-02-26 15:22:48

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v6 6/8] arm64: KVM: Add interface to set guest value for TRFCR register

On Mon, 26 Feb 2024 11:30:34 +0000,
James Clark <[email protected]> wrote:
>
> Add an interface for the Coresight driver to use to set the value of the
> TRFCR register for the guest. This register controls the exclude

This is not *for* the guest. It is the *host* value while running the
guest.

> settings for trace at different exception levels, and is used to honor
> the exclude_host and exclude_guest parameters from the Perf session.
> This will be used to later write TRFCR_EL1 on nVHE at guest switch. For
> VHE, the host trace is controlled by TRFCR_EL2 and thus we can write to
> the TRFCR_EL1 immediately. Because guest writes to the register are
> trapped, the value will persist and can't be modified.

See?

>
> Instead of adding a load of infrastructure to share the host's per-cpu
> offsets with the hypervisor, just define the new storage as a NR_CPUS
> array.
>
> Signed-off-by: James Clark <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 3 +++
> arch/arm64/kernel/image-vars.h | 1 +
> arch/arm64/kvm/debug.c | 29 +++++++++++++++++++++++++++++
> 3 files changed, 33 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 85b5477bd1b4..56b7f7eca195 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -509,6 +509,7 @@ struct kvm_host_psci_config {
> bool psci_0_1_cpu_off_implemented;
> bool psci_0_1_migrate_implemented;
> };
> +extern u64 ____cacheline_aligned kvm_guest_trfcr[NR_CPUS];

Great. So you are making it a guarantee that this is going to
ping-pong on every CPU that accesses this stuff. I'm sure my nVHE 64
core system is going to enjoy it. Not.

Look, we already have some per-CPU context: it's called kvm_host_data,
and we link to it from each and every vcpu. So as long as you're in
the context of a vcpu, you have access to it. Simples. We even have
accessors that pick the correct instance between VHE and (n/h)VHE.

What is wrong with using that?
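
Something along these lines, purely as a sketch (the field and helper
names here are made up, not from the series):

	struct kvm_host_data {
		struct kvm_cpu_context host_ctxt;
		/* TRFCR value the Coresight driver wants while in guest */
		u64 trfcr_while_in_guest;
	};

	/* Runs on the CPU that owns the Perf session, so no ping-pong */
	static inline void host_data_set_guest_trfcr(u64 val)
	{
		this_cpu_ptr(&kvm_host_data)->trfcr_while_in_guest = val;
	}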

>
> extern struct kvm_host_psci_config kvm_nvhe_sym(kvm_host_psci_config);
> #define kvm_host_psci_config CHOOSE_NVHE_SYM(kvm_host_psci_config)
> @@ -1174,6 +1175,7 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
> void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
> void kvm_clr_pmu_events(u32 clr);
> bool kvm_set_pmuserenr(u64 val);
> +void kvm_etm_set_guest_trfcr(u64 trfcr_guest);
> #else
> static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
> static inline void kvm_clr_pmu_events(u32 clr) {}
> @@ -1181,6 +1183,7 @@ static inline bool kvm_set_pmuserenr(u64 val)
> {
> return false;
> }
> +static inline void kvm_etm_set_guest_trfcr(u64 trfcr_guest) {}
> #endif
>
> void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index 31daa1da191c..fe9e2bd7f43a 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -59,6 +59,7 @@ KVM_NVHE_ALIAS(alt_cb_patch_nops);
>
> /* Global kernel state accessed by nVHE hyp code. */
> KVM_NVHE_ALIAS(kvm_vgic_global_state);
> +KVM_NVHE_ALIAS(kvm_guest_trfcr);
>
> /* Kernel symbols used to call panic() from nVHE hyp code (via ERET). */
> KVM_NVHE_ALIAS(nvhe_hyp_panic_handler);
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 49a13e72ddd2..fe90bc7d6dd4 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -22,6 +22,7 @@
> DBG_MDSCR_MDE)
>
> static DEFINE_PER_CPU(u64, mdcr_el2);
> +u64 ____cacheline_aligned kvm_guest_trfcr[NR_CPUS];
>
> /*
> * save/restore_guest_debug_regs
> @@ -359,3 +360,31 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
> }
> +
> +/*
> + * Interface for the Coresight driver to use to set the value of the TRFCR

nit: s/to use//

> + * register for the guest. This register controls the exclude settings for trace

s/for the guest/for *tracing* the guest/

> + * at different exception levels, and is used to honor the exclude_host and
> + * exclude_guest parameters from the Perf session.
> + *
> + * This will be used to later write TRFCR_EL1 on nVHE at guest switch. For VHE,
> + * the host trace is controlled by TRFCR_EL2 and thus we can write to the

s/to the/to/

> + * TRFCR_EL1 immediately. Because guest writes to the register are trapped, the
> + * value will persist and can't be modified. For pKVM, kvm_guest_trfcr can't
> + * be read by the hypervisor, so don't bother writing it.

I don't know what you mean by "can't be read". Because controlling all
of the EL1 memory is not enough?

> + */
> +void kvm_etm_set_guest_trfcr(u64 trfcr_guest)
> +{
> + if (WARN_ON_ONCE(!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> + ID_AA64DFR0_EL1_TraceFilt_SHIFT)))
> + return;
> +
> + /* Warn on invalid use of smp_processor_id() */
> + WARN_ON_ONCE(preemptible());

What does it buy us to WARN, but continue to do the *wrong* thing?

> +
> + if (has_vhe())
> + write_sysreg_s(trfcr_guest, SYS_TRFCR_EL12);

Please use write_sysreg_el1() instead.
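
(Presumably something like:

	write_sysreg_el1(trfcr_guest, SYS_TRFCR);

which resolves to the _EL12 encoding when running VHE -- though whether
the SYS_TRFCR alias is already wired up for the *_el1 accessors is an
assumption on my part.)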

> + else if (!is_protected_kvm_enabled())
> + kvm_guest_trfcr[smp_processor_id()] = trfcr_guest;
> +}
> +EXPORT_SYMBOL_GPL(kvm_etm_set_guest_trfcr);

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2024-02-26 15:42:04

by James Clark

[permalink] [raw]
Subject: Re: [PATCH v6 5/8] arm64: KVM: Add iflag for FEAT_TRF



On 26/02/2024 13:35, Marc Zyngier wrote:
> On Mon, 26 Feb 2024 11:30:33 +0000,
> James Clark <[email protected]> wrote:
>>
>> Add an extra iflag to signify if the TRFCR register is accessible.
>
> That's not what this flag means: it indicates whether TRFCR needs to
> be saved. At least that's what the name suggests.
>
>> Because TRBE requires FEAT_TRF, DEBUG_STATE_SAVE_TRBE still has the same
>> behavior even though it's only set when FEAT_TRF is present.
>
> This sentence seems completely out of context, because you didn't
> explain that you were making TRBE *conditional* on TRF being
> implemented, as per the architecture requirements.
>
>>
>> The following holes are left in struct kvm_vcpu_arch, but there aren't
>> enough other 8 bit fields to rearrange it to leave any hole smaller than
>> 7 bytes:
>>
>> u8 cflags; /* 2292 1 */
>> /* XXX 1 byte hole, try to pack */
>> u16 iflags; /* 2294 2 */
>> u8 sflags; /* 2296 1 */
>> bool pause; /* 2297 1 */
>> /* XXX 6 bytes hole, try to pack */
>
> I don't think that's particularly useful in a commit message; it's more
> relevant to the cover letter. However, see below.
>
>>
>> Reviewed-by: Suzuki K Poulose <[email protected]>
>> Signed-off-by: James Clark <[email protected]>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 4 +++-
>> arch/arm64/kvm/debug.c | 24 ++++++++++++++++++++----
>> 2 files changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 21c57b812569..85b5477bd1b4 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -569,7 +569,7 @@ struct kvm_vcpu_arch {
>> u8 cflags;
>>
>> /* Input flags to the hypervisor code, potentially cleared after use */
>> - u8 iflags;
>> + u16 iflags;
>>
>> /* State flags for kernel bookkeeping, unused by the hypervisor code */
>> u8 sflags;
>> @@ -779,6 +779,8 @@ struct kvm_vcpu_arch {
>> #define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
>> /* vcpu running in HYP context */
>> #define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
>> +/* Save trace filter controls */
>> +#define DEBUG_STATE_SAVE_TRFCR __vcpu_single_flag(iflags, BIT(8))
>
> I'd rather you cherry-pick [1] and avoid expanding the iflags.
>
> [1] https://lore.kernel.org/r/[email protected]
>
> Now, I think the whole SPE/TRBE/TRFCR flag management should be
> improved, see below.
>
>>
>> /* SVE enabled for host EL0 */
>> #define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> index ce8886122ed3..49a13e72ddd2 100644
>> --- a/arch/arm64/kvm/debug.c
>> +++ b/arch/arm64/kvm/debug.c
>> @@ -332,14 +332,30 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
>> !(read_sysreg_s(SYS_PMBIDR_EL1) & BIT(PMBIDR_EL1_P_SHIFT)))
>> vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_SPE);
>>
>> - /* Check if we have TRBE implemented and available at the host */
>> - if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
>> - !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
>> - vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
>> + /*
>> + * Set SAVE_TRFCR flag if FEAT_TRF (TraceFilt) exists. This flag
>> + * signifies that the exclude_host/exclude_guest settings of any active
>> + * host Perf session on a core running a VCPU can be written into
>> + * TRFCR_EL1 on guest switch.
>> + */
>> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceFilt_SHIFT)) {
>> + vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
>
> Can we avoid doing this unconditionally? It only makes sense to save
> the trace crud if it is going to be changed, right?
>

Do you mean to see if kvm_guest_trfcr was non-zero (and would have to be
changed) at VCPU load? I assumed that it could be modified between load
and switch. That would mean there is no way to do it conditionally.

I also assumed that's the reason SPE and TRBE were implemented like
this, with the feat check at load and the enabled check at switch. It
doesn't feel like TRFCR is any different to those two.

Or do you mean to only set DEBUG_STATE_SAVE_TRFCR on switch if tracing
was enabled?

I suppose the names DEBUG_STATE_SAVE_SPE and DEBUG_STATE_SAVE_TRBE are
slightly misleading because neither are actually saved if they weren't
enabled. They're more like DEBUG_STATE_HAS_SPE and DEBUG_STATE_HAS_TRBE.

>> + /*
>> + * Check if we have TRBE implemented and available at the host.
>> + * If it's in use at the time of guest switch then trace will
>> + * need to be completely disabled. The architecture mandates
>> + * FEAT_TRF with TRBE, so we only need to check for TRBE after
>> + * TRF.
>> + */
>> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
>> + !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
>> + vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
>> + }
>> }
>>
>> void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
>> {
>> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
>> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
>> + vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
>> }
>
> Dealing with flags that are strongly coupled in a disjointed way is a
> pretty bad idea. Look at the generated code, and realise we flip the
> preempt flag on each access.
>
> Can we do better? You bet. The vcpu_{set,clear}_flags infrastructure
> is capable of dealing with multiple flags at once, as demonstrated by
> the way we deal with exception encoding.
>

Oops yeah I didn't realize that this was more than a bit set/clear. I
will combine them. I think I could probably combine the TRBE/TRF set as
well.

Agree with the rest of the comments too.

Thanks
James

> Something like:
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index addf79ba8fa0..3e50e535fdd4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -885,6 +885,10 @@ struct kvm_vcpu_arch {
> #define DEBUG_STATE_SAVE_SPE __vcpu_single_flag(iflags, BIT(5))
> /* Save TRBE context if active */
> #define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
> +/* Save Trace Filter Controls */
> +#define DEBUG_STATE_SAVE_TRFCR __vcpu_single_flag(iflags, BIT(7))
> +/* Global debug mask */
> +#define DEBUG_STATE_SAVE_MASK __vcpu_single_flag(iflags, GENMASK(7, 5))
>
> /* SVE enabled for host EL0 */
> #define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 8725291cb00a..f9b197a00582 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -339,6 +339,6 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
>
> void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
> {
> - vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
> - vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> + if (!has_vhe())
> + vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_MASK);
> }
>
> Thanks,
>
> M.
>

2024-02-26 17:55:12

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v6 7/8] arm64: KVM: Write TRFCR value on guest switch with nVHE

On Mon, 26 Feb 2024 11:30:35 +0000,
James Clark <[email protected]> wrote:
>
> The guest value for TRFCR requested by the Coresight driver is saved in
> kvm_guest_trfcr. On guest switch this value needs to be written to
> the register. Currently TRFCR is only modified when we want to disable
> trace completely in guests due to an issue with TRBE. Expand the
> __debug_save_trace() function to always write to the register if a
> different value for guests is required, but also keep the existing TRBE
> disable behavior if that's required.
>
> In pKVM, the kvm_guest_trfcr can't be read and the host isn't trusted,
> so always disable trace.
>
> __debug_restore_trace() now has to restore unconditionally, because even
> a value of 0 needs to be written to overwrite whatever was set for the
> guest.
>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Signed-off-by: James Clark <[email protected]>
> ---
> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 53 +++++++++++++++++-------------
> 1 file changed, 31 insertions(+), 22 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 4558c02eb352..3adac2e01908 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -51,30 +51,39 @@ static void __debug_restore_spe(u64 pmscr_el1)
> write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
> }
>
> -static void __debug_save_trace(u64 *trfcr_el1)
> +static void __debug_save_trace(struct kvm_vcpu *vcpu)
> {
> - *trfcr_el1 = 0;
> -
> - /* Check if the TRBE is enabled */
> - if (!(read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E))
> - return;
> - /*
> - * Prohibit trace generation while we are in guest.
> - * Since access to TRFCR_EL1 is trapped, the guest can't
> - * modify the filtering set by the host.
> - */
> - *trfcr_el1 = read_sysreg_s(SYS_TRFCR_EL1);
> - write_sysreg_s(0, SYS_TRFCR_EL1);
> - isb();
> - /* Drain the trace buffer to memory */
> - tsb_csync();
> + u64 host_trfcr_el1 = read_sysreg_s(SYS_TRFCR_EL1);
> + u64 guest_trfcr_el1;
> +
> + vcpu->arch.host_debug_state.trfcr_el1 = host_trfcr_el1;

Huh, this madness has to stop. See patch below. The short story is
that we have to stop shoving host state in vcpus. This is gross, and a
stupid waste of memory.

> +
> + /* Check if the TRBE buffer or pKVM is enabled */
> + if (is_protected_kvm_enabled() ||
> + (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE) &&
> + read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E)) {
> + /*
> + * Prohibit trace generation while we are in guest. Since access
> + * to TRFCR_EL1 is trapped, the guest can't modify the filtering
> + * set by the host.
> + */
> + write_sysreg_s(0, SYS_TRFCR_EL1);
> + isb();
> + /* Drain the trace buffer to memory */
> + tsb_csync();
> + } else {
> + /*
> + * Tracing is allowed, apply the filters provided by the
> + * Coresight driver.
> + */
> + guest_trfcr_el1 = kvm_guest_trfcr[vcpu->cpu];
> + if (host_trfcr_el1 != guest_trfcr_el1)
> + write_sysreg_s(guest_trfcr_el1, SYS_TRFCR_EL1);

So we have 3 pieces of storage for TRFCR_EL1:

- the system register itself
- the copy in host_debug_state, which is only used transiently
- another version in kvm_guest_trfcr, provided by Coresight

Why do we need to save anything if nothing was enabled, which is *all
the time*? I'm sorry to break it to you, but nobody uses these
features. So I'd like them to have zero cost when not in use.

> + }
> }
>
> static void __debug_restore_trace(u64 trfcr_el1)
> {
> - if (!trfcr_el1)
> - return;
> -
> /* Restore trace filter controls */
> write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1);
> }
> @@ -85,8 +94,8 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
> if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
> __debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
> /* Disable and flush Self-Hosted Trace generation */
> - if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
> - __debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
> + if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRFCR))
> + __debug_save_trace(vcpu);

The more I read this code, the less I understand why we need these
flags. DEBUG_STATE_SAVE_TRFCR really means "I support TRF". But we
already have that information in the ID registers, and we could cache
it on a per-physical CPU basis instead of per-vcpu. Hell, on an
homogeneous system, this could even be a static key. Do we even have
systems out there where only half the CPUs support TRF?

Then we check whether TRBE is enabled. But if it isn't, we randomly
write whatever is in kvm_guest_trfcr[]? Why would we do that? Surely
there is something there that should say "yup, tracing" or not (such
as the enable bits), which would avoid hitting the sysreg pointlessly?

I really think that this logic should be:

- strictly based on ID registers or even better, static keys (see the
  sketch below)
- result in close to 0 system register access when not in use
- avoid storing state that doesn't need to be stored
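
For the static key part, something like this (completely untested, key
name made up, and assuming TRF support is symmetric across CPUs):

	DEFINE_STATIC_KEY_FALSE(kvm_trf_available);

	static int __init kvm_init_trf_key(void)
	{
		u64 dfr0 = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);

		/* Flip the key once at boot if FEAT_TRF is implemented */
		if (cpuid_feature_extract_unsigned_field(dfr0,
					ID_AA64DFR0_EL1_TraceFilt_SHIFT))
			static_branch_enable(&kvm_trf_available);

		return 0;
	}
	core_initcall(kvm_init_trf_key);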

Thanks,

M.

From a3e98d8428d854209f0e97aa38d1bee347c503f2 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <[email protected]>
Date: Mon, 26 Feb 2024 15:58:46 +0000
Subject: [PATCH] KVM: arm64: Exclude host_debug_data from vcpu_arch

Keeping host_debug_state on a per-vcpu basis is completely
pointless. The lifetime of this data is only that of the inner
run-loop, which means it is never accessed outside of the core
EL2 code.

Move the structure into kvm_host_data, and save over 500 bytes
per vcpu.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 31 +++++++++++++----------
arch/arm64/kvm/hyp/include/hyp/debug-sr.h | 4 +--
arch/arm64/kvm/hyp/nvhe/debug-sr.c | 8 +++---
3 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index addf79ba8fa0..599de77a232f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -601,6 +601,19 @@ struct kvm_cpu_context {

struct kvm_host_data {
struct kvm_cpu_context host_ctxt;
+
+ /*
+ * host_debug_state contains the host registers which are
+ * saved and restored during world switches.
+ */
+ struct {
+ /* {Break,watch}point registers */
+ struct kvm_guest_debug_arch regs;
+ /* Statistical profiling extension */
+ u64 pmscr_el1;
+ /* Self-hosted trace */
+ u64 trfcr_el1;
+ } host_debug_state;
};

struct kvm_host_psci_config {
@@ -695,11 +708,10 @@ struct kvm_vcpu_arch {
* We maintain more than a single set of debug registers to support
* debugging the guest from the host and to maintain separate host and
* guest state during world switches. vcpu_debug_state are the debug
- * registers of the vcpu as the guest sees them. host_debug_state are
- * the host registers which are saved and restored during
- * world switches. external_debug_state contains the debug
- * values we want to debug the guest. This is set via the
- * KVM_SET_GUEST_DEBUG ioctl.
+ * registers of the vcpu as the guest sees them.
+ *
+ * external_debug_state contains the debug values we want to debug the
+ * guest. This is set via the KVM_SET_GUEST_DEBUG ioctl.
*
* debug_ptr points to the set of debug registers that should be loaded
* onto the hardware when running the guest.
@@ -711,15 +723,6 @@ struct kvm_vcpu_arch {
struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
struct task_struct *parent_task;

- struct {
- /* {Break,watch}point registers */
- struct kvm_guest_debug_arch regs;
- /* Statistical profiling extension */
- u64 pmscr_el1;
- /* Self-hosted trace */
- u64 trfcr_el1;
- } host_debug_state;
-
/* VGIC state */
struct vgic_cpu vgic_cpu;
struct arch_timer_cpu timer_cpu;
diff --git a/arch/arm64/kvm/hyp/include/hyp/debug-sr.h b/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
index 961bbef104a6..d2a40eb82f15 100644
--- a/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
@@ -137,7 +137,7 @@ static inline void __debug_switch_to_guest_common(struct kvm_vcpu *vcpu)

host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
guest_ctxt = &vcpu->arch.ctxt;
- host_dbg = &vcpu->arch.host_debug_state.regs;
+ host_dbg = &this_cpu_ptr(&kvm_host_data)->host_debug_state.regs;
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);

__debug_save_state(host_dbg, host_ctxt);
@@ -156,7 +156,7 @@ static inline void __debug_switch_to_host_common(struct kvm_vcpu *vcpu)

host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
guest_ctxt = &vcpu->arch.ctxt;
- host_dbg = &vcpu->arch.host_debug_state.regs;
+ host_dbg = &this_cpu_ptr(&kvm_host_data)->host_debug_state.regs;
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);

__debug_save_state(guest_dbg, guest_ctxt);
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 4558c02eb352..8103f8c695b4 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -83,10 +83,10 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
/* Disable and flush SPE data generation */
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
- __debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
+ __debug_save_spe(&this_cpu_ptr(&kvm_host_data)->host_debug_state.pmscr_el1);
/* Disable and flush Self-Hosted Trace generation */
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
- __debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
+ __debug_save_trace(&this_cpu_ptr(&kvm_host_data)->host_debug_state.trfcr_el1);
}

void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
@@ -97,9 +97,9 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
- __debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
+ __debug_restore_spe(this_cpu_ptr(&kvm_host_data)->host_debug_state.pmscr_el1);
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
- __debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
+ __debug_restore_trace(this_cpu_ptr(&kvm_host_data)->host_debug_state.trfcr_el1);
}

void __debug_switch_to_host(struct kvm_vcpu *vcpu)
--
2.39.2


--
Without deviation from the norm, progress is not possible.

2024-02-26 18:04:07

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v6 5/8] arm64: KVM: Add iflag for FEAT_TRF

On Mon, 26 Feb 2024 15:41:02 +0000,
James Clark <[email protected]> wrote:
>
>
>
> On 26/02/2024 13:35, Marc Zyngier wrote:
> > On Mon, 26 Feb 2024 11:30:33 +0000,
> > James Clark <[email protected]> wrote:

[...]

> >> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> >> index ce8886122ed3..49a13e72ddd2 100644
> >> --- a/arch/arm64/kvm/debug.c
> >> +++ b/arch/arm64/kvm/debug.c
> >> @@ -332,14 +332,30 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
> >> !(read_sysreg_s(SYS_PMBIDR_EL1) & BIT(PMBIDR_EL1_P_SHIFT)))
> >> vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_SPE);
> >>
> >> - /* Check if we have TRBE implemented and available at the host */
> >> - if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
> >> - !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
> >> - vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> >> + /*
> >> + * Set SAVE_TRFCR flag if FEAT_TRF (TraceFilt) exists. This flag
> >> + * signifies that the exclude_host/exclude_guest settings of any active
> >> + * host Perf session on a core running a VCPU can be written into
> >> + * TRFCR_EL1 on guest switch.
> >> + */
> >> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceFilt_SHIFT)) {
> >> + vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
> >
> > Can we avoid doing this unconditionally? It only makes sense to save
> > the trace crud if it is going to be changed, right?
> >
>
> Do you mean to see if kvm_guest_trfcr was non-zero (and would have to be
> changed) at VCPU load? I assumed that it could be modified between load
> and switch. That would mean there is no way to do it conditionally.

What's the problem? If you change the value behind the vcpu's back,
you get what you deserve: garbage.

I'm baffled that you consider that randomly changing a value without
proper synchronisation (such as with an IPI) is a valid approach.
Please look at what is being done for the PMU in the same context.

> I also assumed that's the reason SPE and TRBE were implemented like
> this, with the feat check at load and the enabled check at switch. It
> doesn't feel like TRFCR is any different to those two.

Well, that doesn't make it right. Having just looked at the debug
stuff, I'm ashamed to have let that stuff in.

> Or do you mean to only set DEBUG_STATE_SAVE_TRFCR on switch if tracing
> was enabled?

I don't think there should be any flag. The discriminant should be:

- does the HW support TRF?
- is the in-guest tracing enabled?

If both are true, and that this requires a change of configuration,
*then* you perform the change. Same thing on exit. No flag. And a
static key for TRF support, which should really be valid on all CPUs.
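
Roughly, as an untested sketch (helper name invented, reusing the
static key idea from earlier in the thread):

	static bool __trfcr_needs_update(u64 guest_trfcr)
	{
		if (!static_branch_unlikely(&kvm_trf_available))
			return false;

		/* Only touch the sysreg if the configuration differs */
		return read_sysreg_s(SYS_TRFCR_EL1) != guest_trfcr;
	}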

M.

--
Without deviation from the norm, progress is not possible.

2024-02-26 19:05:41

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v6 7/8] arm64: KVM: Write TRFCR value on guest switch with nVHE

On Mon, 26 Feb 2024 11:30:35 +0000,
James Clark <[email protected]> wrote:
>
> The guest value for TRFCR requested by the Coresight driver is saved in
> kvm_guest_trfcr. On guest switch this value needs to be written to
> the register. Currently TRFCR is only modified when we want to disable
> trace completely in guests due to an issue with TRBE. Expand the
> __debug_save_trace() function to always write to the register if a
> different value for guests is required, but also keep the existing TRBE
> disable behavior if that's required.
>
> In pKVM, the kvm_guest_trfcr can't be read and the host isn't trusted,
> so always disable trace.
>
> __debug_restore_trace() now has to restore unconditionally, because even
> a value of 0 needs to be written to overwrite whatever was set for the
> guest.
>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Signed-off-by: James Clark <[email protected]>
> ---
> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 53 +++++++++++++++++-------------
> 1 file changed, 31 insertions(+), 22 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 4558c02eb352..3adac2e01908 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -51,30 +51,39 @@ static void __debug_restore_spe(u64 pmscr_el1)
> write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
> }
>
> -static void __debug_save_trace(u64 *trfcr_el1)
> +static void __debug_save_trace(struct kvm_vcpu *vcpu)
> {
> - *trfcr_el1 = 0;
> -
> - /* Check if the TRBE is enabled */
> - if (!(read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E))
> - return;
> - /*
> - * Prohibit trace generation while we are in guest.
> - * Since access to TRFCR_EL1 is trapped, the guest can't
> - * modify the filtering set by the host.
> - */
> - *trfcr_el1 = read_sysreg_s(SYS_TRFCR_EL1);
> - write_sysreg_s(0, SYS_TRFCR_EL1);
> - isb();
> - /* Drain the trace buffer to memory */
> - tsb_csync();
> + u64 host_trfcr_el1 = read_sysreg_s(SYS_TRFCR_EL1);

Ah, and on top of that, this is already broken with hVHE.

I'll send a patch.

M.

--
Without deviation from the norm, progress is not possible.