2024-01-25 09:44:17

by Anshuman Khandual

Subject: [PATCH V16 0/8] arm64/perf: Enable branch stack sampling

This series enables perf branch stack sampling support on the arm64 platform
via a new arch feature called Branch Record Buffer Extension (BRBE). All
the relevant register definitions can be accessed here.

https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers

This series applies on 6.8-rc1.

Also this series is being hosted below for quick access, review and test.

https://git.gitlab.arm.com/linux-arm/linux-anshuman.git (brbe_v16)

There are still some open questions regarding handling multiple perf events
with different privilege branch filters getting on the same PMU, supporting
guest branch stack tracing from the host etc. Finally also looking for some
suggestions regarding supporting BRBE inside the guest. The series has been
re-organized completely as suggested earlier.

- Anshuman

========== Perf Branch Stack Sampling Support (arm64 platforms) ===========

Currently the arm64 platform does not support perf branch stack sampling.
Hence any event requesting branch stack records, i.e. with
PERF_SAMPLE_BRANCH_STACK marked in event->attr.sample_type, will be rejected
in armpmu_event_init().

static int armpmu_event_init(struct perf_event *event)
{
	........
	/* does not support taken branch sampling */
	if (has_branch_stack(event))
		return -EOPNOTSUPP;
	........
}

$perf record -j any,u,k ls
Error:
cycles:P: PMU Hardware or event type doesn't support branch stack sampling.

-------------------- CONFIG_ARM64_BRBE and FEAT_BRBE ----------------------

After this series, the perf branch stack sampling feature gets enabled on
arm64 platforms where the FEAT_BRBE HW feature is supported and
CONFIG_ARM64_BRBE is also selected during build. Let's observe all possible
scenarios here.

1. Feature not built (!CONFIG_ARM64_BRBE):

Falls back to the current behaviour i.e event gets rejected.

2. Feature built but HW not supported (CONFIG_ARM64_BRBE && !FEAT_BRBE):

Falls back to the current behaviour i.e event gets rejected.

3. Feature built and HW supported (CONFIG_ARM64_BRBE && FEAT_BRBE):

Platform supports branch stack sampling requests. Let's observe through a
simple example here.

$perf record -j any_call,u,k,save_type ls

[Please refer to the perf-record man pages for all possible branch filter options]

$perf report
-------------------------- Snip ----------------------
# Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
# ........ ....... .................... ............................................ ............................................ ..................
#
3.52% ls [kernel.kallsyms] [k] sched_clock_noinstr [k] arch_counter_get_cntpct 16
3.52% ls [kernel.kallsyms] [k] sched_clock [k] sched_clock_noinstr 9
1.85% ls [kernel.kallsyms] [k] sched_clock_cpu [k] sched_clock 5
1.80% ls [kernel.kallsyms] [k] irqtime_account_irq [k] sched_clock_cpu 20
1.58% ls [kernel.kallsyms] [k] gic_handle_irq [k] generic_handle_domain_irq 19
1.58% ls [kernel.kallsyms] [k] call_on_irq_stack [k] gic_handle_irq 9
1.58% ls [kernel.kallsyms] [k] do_interrupt_handler [k] call_on_irq_stack 23
1.58% ls [kernel.kallsyms] [k] generic_handle_domain_irq [k] __irq_resolve_mapping 6
1.58% ls [kernel.kallsyms] [k] __irq_resolve_mapping [k] __rcu_read_lock 10
-------------------------- Snip ----------------------

$perf report -D | grep cycles
-------------------------- Snip ----------------------
.... 1: ffff800080dd3334 -> ffff800080dd759c 39 cycles P 0 IND_CALL
.... 2: ffff800080ffaea0 -> ffff800080ffb688 16 cycles P 0 IND_CALL
.... 3: ffff800080139918 -> ffff800080ffae64 9 cycles P 0 CALL
.... 4: ffff800080dd3324 -> ffff8000801398f8 7 cycles P 0 CALL
.... 5: ffff8000800f8548 -> ffff800080dd330c 21 cycles P 0 IND_CALL
.... 6: ffff8000800f864c -> ffff8000800f84ec 6 cycles P 0 CALL
.... 7: ffff8000800f86dc -> ffff8000800f8638 11 cycles P 0 CALL
.... 8: ffff8000800f86d4 -> ffff800081008630 16 cycles P 0 CALL
-------------------------- Snip ----------------------

perf script and other tooling can also be applied on the captured perf.data.
Similarly, branch stack sampling records can be collected via the direct
system call method, i.e. perf_event_open(), after setting up 'struct
perf_event_attr' as required.

event->attr.sample_type |= PERF_SAMPLE_BRANCH_STACK
event->attr.branch_sample_type |= PERF_SAMPLE_BRANCH_<FILTER_1> |
PERF_SAMPLE_BRANCH_<FILTER_2> |
PERF_SAMPLE_BRANCH_<FILTER_3> |
...............................

But all branch filters might not be supported on the platform.
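
The attribute setup above could be sketched from user space roughly as
follows. This is only an illustration, not code from this series: the helper
names init_branch_stack_attr() and open_branch_stack_event(), the choice of
a cycles event, and the sample period are all assumptions made for the
example.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: prepare a cycles event that also requests branch
 * stack records with the given PERF_SAMPLE_BRANCH_* filters.
 */
void init_branch_stack_attr(struct perf_event_attr *attr,
			    uint64_t branch_filters)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary value for illustration */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = branch_filters;
	attr->disabled = 1;		/* caller enables it via ioctl() later */
}

/*
 * Hypothetical helper: glibc provides no perf_event_open() wrapper, so the
 * raw system call is used. Returns a file descriptor, or -1 with errno set
 * (e.g. EOPNOTSUPP when the PMU cannot do branch stack sampling).
 */
int open_branch_stack_event(struct perf_event_attr *attr, pid_t pid, int cpu)
{
	return (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);
}
```

A caller would typically pass something like PERF_SAMPLE_BRANCH_ANY_CALL |
PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_KERNEL as the filters, which
corresponds to '-j any_call,u,k' on the perf command line.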

----------------------- BRBE Branch Filters Support -----------------------

- Following branch filters are supported on arm64.

PERF_SAMPLE_BRANCH_USER /* Branch privilege filters */
PERF_SAMPLE_BRANCH_KERNEL
PERF_SAMPLE_BRANCH_HV

PERF_SAMPLE_BRANCH_ANY /* Branch type filters */
PERF_SAMPLE_BRANCH_ANY_CALL
PERF_SAMPLE_BRANCH_ANY_RETURN
PERF_SAMPLE_BRANCH_IND_CALL
PERF_SAMPLE_BRANCH_COND
PERF_SAMPLE_BRANCH_IND_JUMP
PERF_SAMPLE_BRANCH_CALL

PERF_SAMPLE_BRANCH_NO_FLAGS /* Branch record flags */
PERF_SAMPLE_BRANCH_NO_CYCLES
PERF_SAMPLE_BRANCH_TYPE_SAVE
PERF_SAMPLE_BRANCH_HW_INDEX
PERF_SAMPLE_BRANCH_PRIV_SAVE

- Following branch filters are not supported on arm64.

PERF_SAMPLE_BRANCH_ABORT_TX
PERF_SAMPLE_BRANCH_IN_TX
PERF_SAMPLE_BRANCH_NO_TX
PERF_SAMPLE_BRANCH_CALL_STACK

Events requesting above non-supported branch filters get rejected.
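
As a rough illustration of that rejection rule, a check along the following
lines would be enough. The macro and function names are hypothetical and
this is not the validation code from the series; it simply tests a request
against the four filters listed above as not supported.

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>

/* The filters listed above as not supported on arm64 (hypothetical macro). */
#define TOY_UNSUPPORTED_BRANCH_FILTERS	(PERF_SAMPLE_BRANCH_ABORT_TX | \
					 PERF_SAMPLE_BRANCH_IN_TX    | \
					 PERF_SAMPLE_BRANCH_NO_TX    | \
					 PERF_SAMPLE_BRANCH_CALL_STACK)

/* Returns true when no unsupported filter bit has been requested. */
bool toy_branch_filters_supported(uint64_t branch_sample_type)
{
	return (branch_sample_type & TOY_UNSUPPORTED_BRANCH_FILTERS) == 0;
}
```

A request that fails such a check would be turned down at event
initialization time, before any hardware is configured.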

------------------ Possible 'branch_sample_type' Mismatch -----------------

The branch stack sampling attribute 'event->attr.branch_sample_type'
generally remains the same for all the events during a perf record session.

$perf record -e <event_1> -e <event_2> -j <branch_filters> [workload]

event_1->attr.branch_sample_type == event_2->attr.branch_sample_type

This 'branch_sample_type' is used to configure the BRBE hardware, when both
events i.e <event_1> and <event_2> get scheduled on a given PMU. But when
events inherit a privilege filter from their PMU HW event modifiers,
'branch_sample_type' does not remain the same for all events. Let's
consider the following example.

$perf record -e cycles:u -e instructions:k -j any,save_type ls

cycles->attr.branch_sample_type != instructions->attr.branch_sample_type

This is because the cycles event inherits PERF_SAMPLE_BRANCH_USER and the
instructions event inherits PERF_SAMPLE_BRANCH_KERNEL. The proposed solution
here configures the BRBE hardware with 'branch_sample_type' from the last
event to be added in the PMU, and hence captured branch records only get
passed on to matching events during a PMU interrupt.

static int
armpmu_add(struct perf_event *event, int flags)
{
	........
	if (has_branch_stack(event)) {
		/*
		 * Reset branch records buffer if a new task event gets
		 * scheduled on a PMU which might have existing records.
		 * Otherwise older branch records present in the buffer
		 * might leak into the new task event.
		 */
		if (event->ctx->task && hw_events->brbe_context != event->ctx) {
			hw_events->brbe_context = event->ctx;
			if (armpmu->branch_reset)
				armpmu->branch_reset();
		}
		hw_events->brbe_users++;
Here ------->	hw_events->brbe_sample_type = event->attr.branch_sample_type;
	}
	........
}

Instead of overriding existing 'branch_sample_type', both could be merged.
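
The difference between the two approaches can be shown with a small user
space model. Everything here is hypothetical: the toy_* names and the
TOY_BR_* bits are stand-ins made up for the example, not the uapi values or
the driver's data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for branch filter bits, not the uapi values. */
#define TOY_BR_USER	(1u << 0)
#define TOY_BR_KERNEL	(1u << 1)
#define TOY_BR_ANY	(1u << 2)

struct toy_pmu {
	uint64_t brbe_sample_type;	/* what the hardware is configured with */
};

/* Behaviour described above: the last event added overrides the config. */
void toy_add_override(struct toy_pmu *pmu, uint64_t branch_sample_type)
{
	pmu->brbe_sample_type = branch_sample_type;
}

/* Alternative: OR every request into one common configuration. */
void toy_add_merge(struct toy_pmu *pmu, uint64_t branch_sample_type)
{
	pmu->brbe_sample_type |= branch_sample_type;
}

/* Records are handed only to events whose request matches the config. */
bool toy_event_gets_records(const struct toy_pmu *pmu,
			    uint64_t branch_sample_type)
{
	return pmu->brbe_sample_type == branch_sample_type;
}
```

With toy_add_override(), adding a user-only event followed by a kernel-only
event leaves only the kernel-only event matching. With toy_add_merge() the
hardware would capture both privilege levels, but the records would then
presumably need per-event filtering before being delivered.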

--------------------------- Virtualisation support ------------------------

- Branch stack sampling is not currently supported inside the guest (TODO)

- FEAT_BRBE advertised as absent via clearing ID_AA64DFR0_EL1.BRBE
- Future support in guest requires emulating FEAT_BRBE

- Branch stack sampling of the guest is not supported in the host (TODO)

- Tracing the guest with event->attr.exclude_guest = 0
- There are multiple challenges involved regarding mixing events
with mismatched branch_sample_type and exclude_guest and passing
on captured BRBE records to intended events during PMU interrupt

- Guest access for BRBE registers and instructions has been blocked

- BRBE state save is not required for VHE host (EL2) guest (EL1) transition

- BRBE state is saved for NVHE host (EL1) guest (EL1) transition

-------------------------------- Testing ---------------------------------

- Cross compiled for both arm64 and arm32 platforms
- Passes all branch tests with 'perf test branch' on arm64

-------------------------------- Questions -------------------------------

- Instead of configuring the BRBE HW with branch_sample_type from the last
event to be added on the PMU as proposed, could those be merged together
e.g all privilege requests ORed, to form a common BRBE configuration and
all events get branch records after a PMU interrupt ?

Changes in V16

- Updated BRBINFx_EL1.TYPE = 0b110000 as field IMPDEF_TRAP_EL3
- Updated BRBCR_ELx[9] as field FZPSS
- Updated BRBINFINJ_EL1 to use sysreg field BRBINFx_EL1
- Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
- Renamed arm_brbe.h as arm_pmuv3_branch.h
- Updated perf_sample_save_brstack()'s new argument requirements with NULL
- Fixed typo (s/informations/information) in Documentation/arch/arm64/brbe.rst
- Added SPDX-License-Identifier in Documentation/arch/arm64/brbe.rst
- Added new PERF_SAMPLE_BRANCH_COUNTERS into BRBE_EXCLUDE_BRANCH_FILTERS
- Dropped BRBFCR_EL1 and BRBCR_EL1 from enum vcpu_sysreg
- Reverted the KVM NVHE patch - use host_debug_state based 'brbcr_el1'
element and dropped the previous dependency on James's coresight series

Changes in V15:

https://lore.kernel.org/all/[email protected]/

- Added a comment for armv8pmu_branch_probe() regarding single cpu probe
- Added a text in brbe.rst regarding single cpu probe
- Dropped runtime BRBE enable for setting DEBUG_STATE_SAVE_BRBE
- Dropped zero_branch_stack based zero branch records mechanism
- Replaced BRBFCR_EL1_DEFAULT_CONFIG with BRBFCR_EL1_CONFIG_MASK
- Added BRBFCR_EL1_CONFIG_MASK masking in branch_type_to_brbfcr()
- Moved BRBE helpers from arm_brbe.h into arm_brbe.c
- Moved armv8_pmu_xxx() declaration inside arm_brbe.h for arm64 (CONFIG_ARM64_BRBE)
- Moved armv8_pmu_xxx() stub definitions inside arm_brbe.h for arm32 (!CONFIG_ARM64_BRBE)
- Included arm_brbe.h header both in arm_pmuv3.c and arm_brbe.c
- Dropped BRBE custom pr_fmt()
- Dropped CONFIG_PERF_EVENTS wrapping from header entries
- Flush branch records when a cpu bound event follows a task bound event
- Dropped BRBFCR_EL1 from __debug_save_brbe()/__debug_restore_brbe()
- Always save the live SYS_BRBCR_EL1 in host context and then check if
BRBE was enabled before resetting SYS_BRBCR_EL1 for the host

Changes in V14:

https://lore.kernel.org/all/[email protected]/

- This series has been reorganised as suggested during V13
- There are just eight patches now i.e 5 enablement and 3 perf branch tests

- Fixed brackets problem in __SYS_BRBINFO/BRBSRC/BRBTGT() macros
- Renamed the macro i.e s/__SYS_BRBINFO/__SYS_BRBINF/
- Renamed s/BRB_IALL/BRB_IALL_INSN and s/BRBE_INJ/BRB_INJ_INSN
- Moved BRB_IALL_INSN and SYS_BRB_INSN instructions to sysreg patch
- Changed E1BRE as ExBRE in sysreg fields inside BRBCR_ELx
- Used BRBCR_ELx for defining all BRBCR_EL1, BRBCR_EL2, and BRBCR_EL12 (new)

- Folded the following five patches into a single patch i.e [PATCH 3/8]

drivers: perf: arm_pmu: Add new sched_task() callback
arm64/perf: Add branch stack support in struct arm_pmu
arm64/perf: Add branch stack support in struct pmu_hw_events
arm64/perf: Add branch stack support in ARMV8 PMU
arm64/perf: Add PERF_ATTACH_TASK_DATA to events with has_branch_stack()

- All armv8pmu_branch_xxxx() stub definitions have been moved inside
include/linux/perf/arm_pmuv3.h for easy access from both arm32 and arm64
- Added brbe_users, brbe_context and brbe_sample_type in struct pmu_hw_events
- Added comments for all the above new elements in struct pmu_hw_events
- Added branch_reset() and sched_task() callbacks
- Changed and optimized branch records processing during a PMU IRQ
- NO branch records get captured for event with mismatched brbe_sample_type
- Branch record context is tracked from armpmu_del() & armpmu_add()
- Branch record hardware is driven from armv8pmu_start() & armv8pmu_stop()
- Dropped NULL check for 'pmu_ctx' inside armv8pmu_sched_task()
- Moved down PERF_ATTACH_TASK_DATA assignment with a preceding comment
- In conflicting branch sample type requests, first event takes precedence

- Folded the following four patches from V13 into a single patch i.e
[PATCH 4/8]

arm64/perf: Enable branch stack events via FEAT_BRBE
arm64/perf: Add struct brbe_regset helper functions
arm64/perf: Implement branch records save on task sched out
arm64/perf: Implement branch records save on PMU IRQ

- Fixed the year in copyright statement
- Added Documentation/arch/arm64/brbe.rst
- Updated Documentation/arch/arm64/booting.rst (BRBCR_EL2.CC for EL1 entry)
- Added __init_el2_brbe() which enables branch record cycle count support
- Disabled EL2 traps in __init_el2_fgt() while accessing BRBE registers and
executing instructions
- Changed CONFIG_ARM64_BRBE user visible description
- Fixed a typo in CONFIG_ARM64_BRBE config option description text
- Added BUILD_BUG_ON() co-relating BRBE_BANK_MAX_ENTRIES and MAX_BRANCH_RECORDS
- Dropped arm64_create_brbe_task_ctx_kmem_cache()
- Moved down comment for PERF_SAMPLE_BRANCH_KERNEL in branch_type_to_brbcr()
- Renamed BRBCR_ELx_DEFAULT_CONFIG as BRBCR_ELx_CONFIG_MASK
- Replaced BRBCR_ELx_DEFAULT_TS with BRBCR_ELx_TS_MASK in BRBCR_ELx_CONFIG_MASK
- Replaced BRBCR_ELx_E1BRE instances with BRBCR_ELx_ExBRE

- Added BRBE specific branch stack sampling perf test patches into the series
- Added a patch to prevent guest accesses into BRBE registers and instructions
- Added a patch to save the BRBE host context in NVHE environment
- Updated most commit messages

Changes in V13:

https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/

- Added branch callback stubs for aarch32 pmuv3 based platforms
- Updated the comments for capture_brbe_regset()
- Deleted the comments in __read_brbe_regset()
- Reversed the arguments order in capture_brbe_regset() and brbe_branch_save()
- Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in armv8pmu_branch_read()
- Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in capture_brbe_regset()

Changes in V12:

https://lore.kernel.org/all/[email protected]/

- Replaced branch types with complete DIRECT/INDIRECT prefixes/suffixes
- Replaced branch types with complete INSN/ALIGN prefixes/suffixes
- Replaced return branch types as simple RET/ERET
- Replaced time field GST_PHYSICAL as GUEST_PHYSICAL
- Added 0 padding for BRBIDR0_EL1.NUMREC enum values
- Dropped helper arm_pmu_branch_stack_supported()
- Renamed armv8pmu_branch_valid() as armv8pmu_branch_attr_valid()
- Separated perf_task_ctx_cache setup from arm_pmu private allocation
- Collected changes to branch_records_alloc() in a single patch [5/10]
- Reworked and cleaned up branch_records_alloc()
- Reworked armv8pmu_branch_read() with new loop iterations in patch [6/10]
- Reworked capture_brbe_regset() with new loop iterations in patch [8/10]
- Updated the comment in branch_type_to_brbcr()
- Fixed the comment before stitch_stored_live_entries()
- Fixed BRBINFINJ_EL1 definition for VALID_FULL enum field
- Factored out helper __read_brbe_regset() from capture_brbe_regset()
- Dropped the helper copy_brbe_regset()
- Simplified stitch_stored_live_entries() with memcpy(), memmove()
- Reworked armv8pmu_probe_pmu() to bail out early with !probe.present
- Rework brbe_attributes_probe() without 'struct brbe_hw_attr'
- Dropped 'struct brbe_hw_attr' argument from capture_brbe_regset()
- Dropped 'struct brbe_hw_attr' argument from brbe_branch_save()
- Dropped arm_pmu->private and added arm_pmu->reg_trbidr instead

Changes in V11:

https://lore.kernel.org/all/[email protected]/

- Fixed the crash for per-cpu events without event->pmu_ctx->task_ctx_data

Changes in V10:

https://lore.kernel.org/all/[email protected]/

- Rebased the series on v6.4-rc2
- Moved ARMV8 PMUV3 changes inside drivers/perf/arm_pmuv3.c
- Moved BRBE driver changes inside drivers/perf/arm_brbe.[c|h]
- Moved the WARN_ON() inside the if condition in armv8pmu_handle_irq()

Changes in V9:

https://lore.kernel.org/all/[email protected]/

- Fixed build problem with has_branch_stack() in arm64 header
- BRBINF_EL1 definition has been changed from 'Sysreg' to 'SysregFields'
- Renamed all BRBINF_EL1 call sites as BRBINFx_EL1
- Dropped static const char branch_filter_error_msg[]
- Implemented a positive list check for BRBE supported perf branch filters
- Added a comment in armv8pmu_handle_irq()
- Implemented per-cpu allocation for struct branch_record records
- Skipped looping through bank 1 if an invalid record is detected in bank 0
- Added comment in armv8pmu_branch_read() explaining prohibited region etc
- Added comment warning about erroneously marking transactions as aborted
- Replaced the first argument (perf_branch_entry) in capture_brbe_flags()
- Dropped the last argument (idx) in capture_brbe_flags()
- Dropped the brbcr argument from capture_brbe_flags()
- Used perf_sample_save_brstack() to capture branch records for perf_sample_data
- Added comment explaining rationale for setting BRBCR_EL1_FZP for user only traces
- Dropped BRBE prohibited state mechanism while in armv8pmu_branch_read()
- Implemented event task context based branch records save mechanism

Changes in V8:

https://lore.kernel.org/all/[email protected]/

- Replaced arm_pmu->features as arm_pmu->has_branch_stack, updated its helper
- Added a comment and line break before arm_pmu->private element
- Added WARN_ON_ONCE() in helpers i.e armv8pmu_branch_[read|valid|enable|disable]()
- Dropped comments in armv8pmu_enable_event() and armv8pmu_disable_event()
- Replaced open bank encoding in BRBFCR_EL1 with SYS_FIELD_PREP()
- Changed brbe_hw_attr->brbe_version from 'bool' to 'int'
- Updated pr_warn() as pr_warn_once() with values in brbe_get_perf_[type|priv]()
- Replaced all pr_warn_once() as pr_debug_once() in armv8pmu_branch_valid()
- Added a comment in branch_type_to_brbcr() for the BRBCR_EL1 privilege settings
- Modified the comment related to BRBINFx_EL1.LASTFAILED in capture_brbe_flags()
- Modified brbe_get_perf_entry_type() as brbe_set_perf_entry_type()
- Renamed brbe_valid() as brbe_record_is_complete()
- Renamed brbe_source() as brbe_record_is_source_only()
- Renamed brbe_target() as brbe_record_is_target_only()
- Inverted checks for !brbe_record_is_[target|source]_only() for info capture
- Replaced 'fetch' with 'get' in all helpers that extract field value
- Dropped 'static int brbe_current_bank' optimization in select_brbe_bank()
- Dropped select_brbe_bank_index() completely, added capture_branch_entry()
- Process captured branch entries in two separate loops one for each BRBE bank
- Moved branch_records_alloc() inside armv8pmu_probe_pmu()
- Added a forward declaration for the helper has_branch_stack()
- Added new callbacks armv8pmu_private_alloc() and armv8pmu_private_free()
- Updated armv8pmu_probe_pmu() to allocate the private structure before SMP call

Changes in V7:

https://lore.kernel.org/all/[email protected]/

- Folded [PATCH 7/7] into [PATCH 3/7] which enables branch stack sampling event
- Defined BRBFCR_EL1_BRANCH_FILTERS, BRBCR_EL1_DEFAULT_CONFIG in the header
- Defined BRBFCR_EL1_DEFAULT_CONFIG in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_FZP
- Defined BRBCR_EL1_DEFAULT_TS in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_DEFAULT_TS
- Moved BRBCR_EL1_DEFAULT_CONFIG check inside branch_type_to_brbcr()
- Moved down BRBCR_EL1_CC, BRBCR_EL1_MPRED later in branch_type_to_brbcr()
- Also set BRBE in paused state in armv8pmu_branch_disable()
- Dropped brbe_paused(), set_brbe_paused() helpers
- Extracted error string via branch_filter_error_msg[] for armv8pmu_branch_valid()
- Replaced brbe_v1p1 with brbe_version in struct brbe_hw_attr
- Added valid_brbe_[cc, format, version]() helpers
- Split a separate brbe_attributes_probe() from armv8pmu_branch_probe()
- Capture event->attr.branch_sample_type earlier in armv8pmu_branch_valid()
- Defined enum brbe_bank_idx with possible values for BRBE bank indices
- Changed armpmu->hw_attr into armpmu->private
- Added missing space in stub definition for armv8pmu_branch_valid()
- Replaced both kmalloc() with kzalloc()
- Added BRBE_BANK_MAX_ENTRIES
- Updated comment for capture_brbe_flags()
- Updated comment for struct brbe_hw_attr
- Dropped space after type cast in couple of places
- Replaced inverse with negation for testing BRBCR_EL1_FZP in armv8pmu_branch_read()
- Captured cpuc->branches->branch_entries[idx] in a local variable
- Dropped saved_priv from armv8pmu_branch_read()
- Reorganize PERF_SAMPLE_BRANCH_NO_[CYCLES|NO_FLAGS] related configuration
- Replaced with FIELD_GET() and FIELD_PREP() wherever applicable
- Replaced BRBCR_EL1_TS_PHYSICAL with BRBCR_EL1_TS_VIRTUAL
- Moved valid_brbe_nr(), valid_brbe_cc(), valid_brbe_format(), valid_brbe_version()
select_brbe_bank(), select_brbe_bank_index() helpers inside the C implementation
- Reorganized brbe_valid_nr() and dropped the pr_warn() message
- Changed probe sequence in brbe_attributes_probe()
- Added 'brbcr' argument into capture_brbe_flags() to ascertain correct state
- Disable BRBE before disabling the PMU event counter
- Enable PERF_SAMPLE_BRANCH_HV filters when is_kernel_in_hyp_mode()
- Guard armv8pmu_reset() & armv8pmu_sched_task() with arm_pmu_branch_stack_supported()

Changes in V6:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Restore the exception level privilege after reading the branch records
- Unpause the buffer after reading the branch records
- Decouple BRBCR_EL1_EXCEPTION/ERTN from perf event privilege level
- Reworked BRBE implementation and branch stack sampling support on arm pmu
- BRBE implementation is now part of overall ARMV8 PMU implementation
- BRBE implementation moved from drivers/perf/ to inside arch/arm64/kernel/
- CONFIG_ARM_BRBE_PMU renamed as CONFIG_ARM64_BRBE in arch/arm64/Kconfig
- File moved - drivers/perf/arm_pmu_brbe.c -> arch/arm64/kernel/brbe.c
- File moved - drivers/perf/arm_pmu_brbe.h -> arch/arm64/kernel/brbe.h
- BRBE name has been dropped from struct arm_pmu and struct hw_pmu_events
- BRBE name has been abstracted out as 'branches' in arm_pmu and hw_pmu_events
- BRBE name has been abstracted out as 'branches' in ARMV8 PMU implementation
- Added sched_task() callback into struct arm_pmu
- Added 'hw_attr' into struct arm_pmu encapsulating possible PMU HW attributes
- Dropped explicit attributes brbe_(v1p1, nr, cc, format) from struct arm_pmu
- Dropped brbfcr, brbcr, registers scratch area from struct hw_pmu_events
- Dropped brbe_users, brbe_context tracking in struct hw_pmu_events
- Added 'features' tracking into struct arm_pmu with ARM_PMU_BRANCH_STACK flag
- armpmu->hw_attr maps into 'struct brbe_hw_attr' inside BRBE implementation
- Set ARM_PMU_BRANCH_STACK in 'arm_pmu->features' after successful BRBE probe
- Added armv8pmu_branch_reset() inside armv8pmu_branch_enable()
- Dropped brbe_supported() as events will be rejected via ARM_PMU_BRANCH_STACK
- Dropped set_brbe_disabled() as well
- Reformatted armv8pmu_branch_valid() warnings while rejecting unsupported events

Changes in V5:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Changed BRBCR_EL1.VIRTUAL from 0b1 to 0b01
- Changed BRBFCR_EL1.EnL into BRBFCR_EL1.EnI
- Changed config ARM_BRBE_PMU from 'tristate' to 'bool'

Changes in V4:

https://lore.kernel.org/all/[email protected]/

- Changed ../tools/sysreg declarations as suggested
- Set PERF_SAMPLE_BRANCH_STACK in data.sample_flags
- Dropped perfmon_capable() check in armpmu_event_init()
- s/pr_warn_once/pr_info in armpmu_event_init()
- Added brbe_format element into struct pmu_hw_events
- Changed v1p1 as brbe_v1p1 in struct pmu_hw_events
- Dropped pr_info() from arm64_pmu_brbe_probe(), solved LOCKDEP warning

Changes in V3:

https://lore.kernel.org/all/[email protected]/

- Moved brbe_stack from the stack and now dynamically allocated
- Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv()
- Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg
- Created dummy BRBINF_EL1 field definitions in tools/sysreg
- Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable()
- Both exception and exception return branch records are now captured
only if the event has PERF_SAMPLE_BRANCH_KERNEL which would already
have been checked in generic perf via perf_allow_kernel()

Changes in V2:

https://lore.kernel.org/all/[email protected]/

- Dropped branch sample filter helpers consolidation patch from this series
- Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
- Use cached perfmon_capable() while configuring BRBE branch record filters

Changes in V1:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
- Process new perf branch types via PERF_BR_EXTEND_ABI

Changes in RFC V2:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Added branch_sample_priv() while consolidating other branch sample filter helpers
- Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
- Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
- Added documentation for struct arm_pmu changes, updated commit message
- Updated commit message for BRBE detection infrastructure patch
- PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
- Branch privilege state capture mechanism has now moved inside the driver

Changes in RFC V1:

https://lore.kernel.org/all/[email protected]/

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: James Clark <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Suzuki Poulose <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Anshuman Khandual (5):
arm64/sysreg: Add BRBE registers and fields
KVM: arm64: Prevent guest accesses into BRBE system registers/instructions
drivers: perf: arm_pmuv3: Enable branch stack sampling framework
drivers: perf: arm_pmuv3: Enable branch stack sampling via FEAT_BRBE
KVM: arm64: nvhe: Disable branch generation in nVHE guests

James Clark (3):
perf: test: Speed up running brstack test on an Arm model
perf: test: Remove empty lines from branch filter test output
perf: test: Extend branch stack sampling test for Arm64 BRBE

Documentation/arch/arm64/booting.rst | 6 +
Documentation/arch/arm64/brbe.rst | 158 ++++
arch/arm64/include/asm/el2_setup.h | 113 ++-
arch/arm64/include/asm/kvm_host.h | 5 +-
arch/arm64/include/asm/sysreg.h | 109 +++
arch/arm64/kvm/debug.c | 5 +
arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 +
arch/arm64/kvm/sys_regs.c | 56 ++
arch/arm64/tools/sysreg | 131 ++++
drivers/perf/Kconfig | 11 +
drivers/perf/Makefile | 1 +
drivers/perf/arm_brbe.c | 986 +++++++++++++++++++++++++
drivers/perf/arm_pmu.c | 57 +-
drivers/perf/arm_pmuv3.c | 141 +++-
drivers/perf/arm_pmuv3_branch.h | 63 ++
include/linux/perf/arm_pmu.h | 34 +-
include/linux/perf/arm_pmuv3.h | 1 -
tools/perf/tests/builtin-test.c | 1 +
tools/perf/tests/shell/test_brstack.sh | 57 +-
tools/perf/tests/tests.h | 1 +
tools/perf/tests/workloads/Build | 2 +
tools/perf/tests/workloads/traploop.c | 39 +
22 files changed, 1995 insertions(+), 15 deletions(-)
create mode 100644 Documentation/arch/arm64/brbe.rst
create mode 100644 drivers/perf/arm_brbe.c
create mode 100644 drivers/perf/arm_pmuv3_branch.h
create mode 100644 tools/perf/tests/workloads/traploop.c

--
2.25.1



2024-01-25 09:44:28

by Anshuman Khandual

Subject: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions

Currently BRBE feature is not supported in a guest environment. This hides
BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field. This also
blocks guest accesses into BRBE system registers and instructions as if the
underlying hardware never implemented FEAT_BRBE feature.
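
The masking step can be pictured with the small user space sketch below. It
assumes the BRBE field sits in bits [55:52] of ID_AA64DFR0_EL1; the toy_*
names are hypothetical and are not the kernel's generated sysreg macros.

```c
#include <stdint.h>

/* Assumed layout: BRBE occupies bits [55:52] of ID_AA64DFR0_EL1. */
#define TOY_BRBE_SHIFT	52
#define TOY_BRBE_MASK	(UINT64_C(0xf) << TOY_BRBE_SHIFT)

/* Clear the BRBE field so the feature reads as not implemented. */
uint64_t toy_hide_brbe(uint64_t id_aa64dfr0)
{
	return id_aa64dfr0 & ~TOY_BRBE_MASK;
}

/* Extract the BRBE field; 0 means FEAT_BRBE is not advertised. */
unsigned int toy_brbe_field(uint64_t id_aa64dfr0)
{
	return (unsigned int)((id_aa64dfr0 & TOY_BRBE_MASK) >> TOY_BRBE_SHIFT);
}
```

All other ID fields pass through unchanged, so a guest reading the sanitised
value would see the same register minus the BRBE advertisement.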

Cc: Marc Zyngier <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: James Morse <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
Changes in V16:

- Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion

arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 30253bd19917..6a06dc2f0c06 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1304,6 +1304,11 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
return 0;
}

+#define BRB_INF_SRC_TGT_EL1(n) \
+ { SYS_DESC(SYS_BRBINF##n##_EL1), undef_access }, \
+ { SYS_DESC(SYS_BRBSRC##n##_EL1), undef_access }, \
+ { SYS_DESC(SYS_BRBTGT##n##_EL1), undef_access } \
+
/* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
#define DBG_BCR_BVR_WCR_WVR_EL1(n) \
{ SYS_DESC(SYS_DBGBVRn_EL1(n)), \
@@ -1707,6 +1712,9 @@ static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
/* Hide SPE from guests */
val &= ~ID_AA64DFR0_EL1_PMSVer_MASK;

+ /* Hide BRBE from guests */
+ val &= ~ID_AA64DFR0_EL1_BRBE_MASK;
+
return val;
}

@@ -2195,6 +2203,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_DC_CISW), access_dcsw },
{ SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
{ SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
+ { SYS_DESC(OP_BRB_IALL), undef_access },
+ { SYS_DESC(OP_BRB_INJ), undef_access },

DBG_BCR_BVR_WCR_WVR_EL1(0),
DBG_BCR_BVR_WCR_WVR_EL1(1),
@@ -2225,6 +2235,52 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_DBGCLAIMCLR_EL1), trap_raz_wi },
{ SYS_DESC(SYS_DBGAUTHSTATUS_EL1), trap_dbgauthstatus_el1 },

+ /*
+ * BRBE branch record sysreg address space is interleaved between
+ * corresponding BRBINF<N>_EL1, BRBSRC<N>_EL1, and BRBTGT<N>_EL1.
+ */
+ BRB_INF_SRC_TGT_EL1(0),
+ BRB_INF_SRC_TGT_EL1(16),
+ BRB_INF_SRC_TGT_EL1(1),
+ BRB_INF_SRC_TGT_EL1(17),
+ BRB_INF_SRC_TGT_EL1(2),
+ BRB_INF_SRC_TGT_EL1(18),
+ BRB_INF_SRC_TGT_EL1(3),
+ BRB_INF_SRC_TGT_EL1(19),
+ BRB_INF_SRC_TGT_EL1(4),
+ BRB_INF_SRC_TGT_EL1(20),
+ BRB_INF_SRC_TGT_EL1(5),
+ BRB_INF_SRC_TGT_EL1(21),
+ BRB_INF_SRC_TGT_EL1(6),
+ BRB_INF_SRC_TGT_EL1(22),
+ BRB_INF_SRC_TGT_EL1(7),
+ BRB_INF_SRC_TGT_EL1(23),
+ BRB_INF_SRC_TGT_EL1(8),
+ BRB_INF_SRC_TGT_EL1(24),
+ BRB_INF_SRC_TGT_EL1(9),
+ BRB_INF_SRC_TGT_EL1(25),
+ BRB_INF_SRC_TGT_EL1(10),
+ BRB_INF_SRC_TGT_EL1(26),
+ BRB_INF_SRC_TGT_EL1(11),
+ BRB_INF_SRC_TGT_EL1(27),
+ BRB_INF_SRC_TGT_EL1(12),
+ BRB_INF_SRC_TGT_EL1(28),
+ BRB_INF_SRC_TGT_EL1(13),
+ BRB_INF_SRC_TGT_EL1(29),
+ BRB_INF_SRC_TGT_EL1(14),
+ BRB_INF_SRC_TGT_EL1(30),
+ BRB_INF_SRC_TGT_EL1(15),
+ BRB_INF_SRC_TGT_EL1(31),
+
+ /* Remaining BRBE sysreg addresses space */
+ { SYS_DESC(SYS_BRBCR_EL1), undef_access },
+ { SYS_DESC(SYS_BRBFCR_EL1), undef_access },
+ { SYS_DESC(SYS_BRBTS_EL1), undef_access },
+ { SYS_DESC(SYS_BRBINFINJ_EL1), undef_access },
+ { SYS_DESC(SYS_BRBSRCINJ_EL1), undef_access },
+ { SYS_DESC(SYS_BRBTGTINJ_EL1), undef_access },
+ { SYS_DESC(SYS_BRBIDR0_EL1), undef_access },
+
{ SYS_DESC(SYS_MDCCSR_EL0), trap_raz_wi },
{ SYS_DESC(SYS_DBGDTR_EL0), trap_raz_wi },
// DBGDTR[TR]X_EL0 share the same encoding
--
2.25.1


2024-01-25 09:51:09

by Anshuman Khandual

Subject: [PATCH V16 3/8] drivers: perf: arm_pmuv3: Enable branch stack sampling framework

Branch stack sampling support, i.e. capturing branch records during execution
in core perf, rides along with normal HW events being scheduled on the PMU.
This prepares the ARMv8 PMU framework for branch stack support on relevant
PMUs with the required HW implementation.

ARMv8 PMU hardware support for branch stack sampling is indicated via a new
feature flag called 'has_branch_stack' that can be ascertained via probing.
This modifies the current gate in armpmu_event_init(), which blocks branch
stack sampling based perf events unconditionally. Instead, such perf events
are now allowed to be initialized on supporting PMU hardware.
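
The modified gate can be illustrated with a minimal user-space sketch. The
struct and function names below (sketch_pmu, sketch_event,
sketch_event_init_gate) are hypothetical stand-ins and not kernel API; only
the condition itself is taken from the armpmu_event_init() hunk in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins holding only the fields the gate looks at */
struct sketch_pmu {
	bool has_branch_stack;		/* set by probing on supporting HW */
};

struct sketch_event {
	bool wants_branch_stack;	/* models has_branch_stack(event) */
};

/*
 * Returns 0 when the event may proceed with initialization, or
 * -EOPNOTSUPP when it requests branch records on a PMU without
 * the required support.
 */
static int sketch_event_init_gate(const struct sketch_pmu *pmu,
				  const struct sketch_event *event)
{
	assert(pmu && event);

	if (event->wants_branch_stack && !pmu->has_branch_stack)
		return -EOPNOTSUPP;

	return 0;
}
```

Events that do not request branch records are unaffected by the flag, so
existing users of the PMU should see no behaviour change.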

Branch stack sampling is enabled and disabled along with regular PMU events.
This adds the required function callbacks in armv8pmu_branch_xxx() format to
drive the PMU branch stack hardware when supported. This also adds fallback
stub definitions of these callbacks for PMUs which do not have the required
support.

If a task gets scheduled out, the current branch records get saved in the
task's context data, which can be later used to fill in the records upon an
event overflow. Hence, we enable PERF_ATTACH_TASK_DATA (event->attach_state
based flag) for branch stack requesting perf events. But this also requires
adding support for pmu::sched_task() callback to arm_pmu.
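
As a rough sketch of the decision made in the sched_task() callback, the
function below mirrors the branches of armv8pmu_sched_task() from this patch
but returns the chosen action instead of touching hardware. The enum and the
function name are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical action codes standing in for the real callbacks */
enum sketch_action {
	SKETCH_NONE,	/* nothing to do */
	SKETCH_SAVE,	/* models armv8pmu_branch_save() */
	SKETCH_RESET,	/* models armv8pmu_branch_reset() */
};

static enum sketch_action sketch_sched_task(bool has_branch_stack,
					    const void *task_ctx,
					    bool sched_in)
{
	if (!has_branch_stack)
		return SKETCH_NONE;

	/* Save branch records in task_ctx on sched out */
	if (task_ctx && !sched_in)
		return SKETCH_SAVE;

	/* Reset branch records on sched in */
	if (sched_in)
		return SKETCH_RESET;

	return SKETCH_NONE;
}
```

Note that a sched out without task context data results in no action, which
is why branch stack events set PERF_ATTACH_TASK_DATA.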

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
Changes in V16:

- Renamed arm_brbe.h as arm_pmuv3_branch.h
- Updated perf_sample_save_brstack()'s new argument requirements with NULL

drivers/perf/arm_pmu.c | 57 ++++++++++++-
drivers/perf/arm_pmuv3.c | 141 +++++++++++++++++++++++++++++++-
drivers/perf/arm_pmuv3_branch.h | 50 +++++++++++
include/linux/perf/arm_pmu.h | 29 ++++++-
include/linux/perf/arm_pmuv3.h | 1 -
5 files changed, 273 insertions(+), 5 deletions(-)
create mode 100644 drivers/perf/arm_pmuv3_branch.h

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 8458fe2cebb4..16f488ae7747 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -317,6 +317,15 @@ armpmu_del(struct perf_event *event, int flags)
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;

+ if (has_branch_stack(event)) {
+ WARN_ON_ONCE(!hw_events->brbe_users);
+ hw_events->brbe_users--;
+ if (!hw_events->brbe_users) {
+ hw_events->brbe_context = NULL;
+ hw_events->brbe_sample_type = 0;
+ }
+ }
+
armpmu_stop(event, PERF_EF_UPDATE);
hw_events->events[idx] = NULL;
armpmu->clear_event_idx(hw_events, event);
@@ -333,6 +342,38 @@ armpmu_add(struct perf_event *event, int flags)
struct hw_perf_event *hwc = &event->hw;
int idx;

+ if (has_branch_stack(event)) {
+ /*
+ * Reset branch records buffer if a new CPU bound event
+ * gets scheduled on a PMU. Otherwise existing branch
+ * records present in the buffer might just leak into
+ * such events.
+ *
+ * Also reset current 'hw_events->brbe_context' because
+ * any previous task bound event now would have lost an
+ * opportunity for continuous branch records.
+ */
+ if (!event->ctx->task) {
+ hw_events->brbe_context = NULL;
+ if (armpmu->branch_reset)
+ armpmu->branch_reset();
+ }
+
+ /*
+ * Reset branch records buffer if a new task event gets
+ * scheduled on a PMU which might have existing records.
+ * Otherwise older branch records present in the buffer
+ * might leak into the new task event.
+ */
+ if (event->ctx->task && hw_events->brbe_context != event->ctx) {
+ hw_events->brbe_context = event->ctx;
+ if (armpmu->branch_reset)
+ armpmu->branch_reset();
+ }
+ hw_events->brbe_users++;
+ hw_events->brbe_sample_type = event->attr.branch_sample_type;
+ }
+
/* An event following a process won't be stopped earlier */
if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
return -ENOENT;
@@ -511,13 +552,24 @@ static int armpmu_event_init(struct perf_event *event)
!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
return -ENOENT;

- /* does not support taken branch sampling */
- if (has_branch_stack(event))
+ /*
+ * Branch stack sampling events are allowed
+ * only on PMU which has required support.
+ */
+ if (has_branch_stack(event) && !armpmu->has_branch_stack)
return -EOPNOTSUPP;

return __hw_perf_event_init(event);
}

+static void armpmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
+
+ if (armpmu->sched_task)
+ armpmu->sched_task(pmu_ctx, sched_in);
+}
+
static void armpmu_enable(struct pmu *pmu)
{
struct arm_pmu *armpmu = to_arm_pmu(pmu);
@@ -864,6 +916,7 @@ struct arm_pmu *armpmu_alloc(void)
}

pmu->pmu = (struct pmu) {
+ .sched_task = armpmu_sched_task,
.pmu_enable = armpmu_enable,
.pmu_disable = armpmu_disable,
.event_init = armpmu_event_init,
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 23fa6c5da82c..9e17764a0929 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -26,6 +26,7 @@
#include <linux/nmi.h>

#include <asm/arm_pmuv3.h>
+#include "arm_pmuv3_branch.h"

/* ARMv8 Cortex-A53 specific event types. */
#define ARMV8_A53_PERFCTR_PREF_LINEFILL 0xC2
@@ -829,14 +830,56 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu)
armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);

kvm_vcpu_pmu_resync_el0();
+ if (cpu_pmu->has_branch_stack)
+ armv8pmu_branch_enable(cpu_pmu);
}

static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
{
+ if (cpu_pmu->has_branch_stack)
+ armv8pmu_branch_disable();
+
/* Disable all counters */
armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
}

+static void read_branch_records(struct pmu_hw_events *cpuc,
+ struct perf_event *event,
+ struct perf_sample_data *data,
+ bool *branch_captured)
+{
+ /*
+ * CPU specific branch records buffer must have been allocated already
+ * for the hardware records to be captured and processed further.
+ */
+ if (WARN_ON(!cpuc->branches))
+ return;
+
+ /*
+ * Overflowed event's branch_sample_type does not match the configured
+ * branch filters in the BRBE HW. So the captured branch records here
+ * cannot be correlated with the overflowed event. Report to the user as
+ * if no branch records have been captured, and flush branch records.
+ * The same scenario is applicable when the current task context does
+ * not match the overflowed event's context.
+ */
+ if ((cpuc->brbe_sample_type != event->attr.branch_sample_type) ||
+ (event->ctx->task && cpuc->brbe_context != event->ctx))
+ return;
+
+ /*
+ * Read the branch records from the hardware only once after the PMU
+ * IRQ has been triggered. The same records can subsequently be used
+ * for other events that might have overflowed simultaneously, thus
+ * saving many CPU cycles.
+ */
+ if (!*branch_captured) {
+ armv8pmu_branch_read(cpuc, event);
+ *branch_captured = true;
+ }
+ perf_sample_save_brstack(data, event, &cpuc->branches->branch_stack, NULL);
+}
+
static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
{
u32 pmovsr;
@@ -844,6 +887,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
struct pt_regs *regs;
int idx;
+ bool branch_captured = false;

/*
* Get and reset the IRQ flags
@@ -887,6 +931,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
if (!armpmu_event_set_period(event))
continue;

+ /*
+ * PMU IRQ should remain asserted until all branch records
+ * are captured and processed into struct perf_sample_data.
+ */
+ if (has_branch_stack(event) && cpu_pmu->has_branch_stack)
+ read_branch_records(cpuc, event, &data, &branch_captured);
+
/*
* Perf event overflow will queue the processing of the event as
* an irq_work which will be taken care of in the handling of
@@ -896,6 +947,8 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
cpu_pmu->disable(event);
}
armv8pmu_start(cpu_pmu);
+ if (cpu_pmu->has_branch_stack)
+ armv8pmu_branch_reset();

return IRQ_HANDLED;
}
@@ -985,6 +1038,24 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
return event->hw.idx;
}

+static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
+ void *task_ctx = pmu_ctx->task_ctx_data;
+
+ if (armpmu->has_branch_stack) {
+ /* Save branch records in task_ctx on sched out */
+ if (task_ctx && !sched_in) {
+ armv8pmu_branch_save(armpmu, task_ctx);
+ return;
+ }
+
+ /* Reset branch records on sched in */
+ if (sched_in)
+ armv8pmu_branch_reset();
+ }
+}
+
/*
* Add an event filter to a given event.
*/
@@ -1077,6 +1148,9 @@ static void armv8pmu_reset(void *info)
pmcr |= ARMV8_PMU_PMCR_LP;

armv8pmu_pmcr_write(pmcr);
+
+ if (cpu_pmu->has_branch_stack)
+ armv8pmu_branch_reset();
}

static int __armv8_pmuv3_map_event_id(struct arm_pmu *armpmu,
@@ -1114,6 +1188,20 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,

hw_event_id = __armv8_pmuv3_map_event_id(armpmu, event);

+ if (has_branch_stack(event)) {
+ if (!armv8pmu_branch_attr_valid(event))
+ return -EOPNOTSUPP;
+
+ /*
+ * If a task gets scheduled out, the current branch records
+ * get saved in the task's context data, which can be later
+ * used to fill in the records upon an event overflow. Let's
+ * enable PERF_ATTACH_TASK_DATA in 'event->attach_state' for
+ * all branch stack sampling perf events.
+ */
+ event->attach_state |= PERF_ATTACH_TASK_DATA;
+ }
+
/*
* CHAIN events only work when paired with an adjacent counter, and it
* never makes sense for a user to open one in isolation, as they'll be
@@ -1229,6 +1317,41 @@ static void __armv8pmu_probe_pmu(void *info)
cpu_pmu->reg_pmmir = read_pmmir();
else
cpu_pmu->reg_pmmir = 0;
+
+ /*
+ * BRBE is being probed on a single cpu for a
+ * given PMU. The remaining cpus are assumed
+ * to have the exact same BRBE implementation.
+ */
+ armv8pmu_branch_probe(cpu_pmu);
+}
+
+static int branch_records_alloc(struct arm_pmu *armpmu)
+{
+ struct branch_records __percpu *records;
+ int cpu;
+
+ records = alloc_percpu_gfp(struct branch_records, GFP_KERNEL);
+ if (!records)
+ return -ENOMEM;
+
+ /*
+ * percpu memory allocated for 'records' gets completely consumed
+ * here, and is never required to be freed up later. So permanently
+ * losing access to this anchor i.e 'records' is acceptable.
+ *
+ * Otherwise this allocation handle would have to be saved up for
+ * free_percpu() release later if required.
+ */
+ for_each_possible_cpu(cpu) {
+ struct pmu_hw_events *events_cpu;
+ struct branch_records *records_cpu;
+
+ events_cpu = per_cpu_ptr(armpmu->hw_events, cpu);
+ records_cpu = per_cpu_ptr(records, cpu);
+ events_cpu->branches = records_cpu;
+ }
+ return 0;
}

static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
@@ -1245,7 +1368,21 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
if (ret)
return ret;

- return probe.present ? 0 : -ENODEV;
+ if (!probe.present)
+ return -ENODEV;
+
+ if (cpu_pmu->has_branch_stack) {
+ ret = armv8pmu_task_ctx_cache_alloc(cpu_pmu);
+ if (ret)
+ return ret;
+
+ ret = branch_records_alloc(cpu_pmu);
+ if (ret) {
+ armv8pmu_task_ctx_cache_free(cpu_pmu);
+ return ret;
+ }
+ }
+ return 0;
}

static void armv8pmu_disable_user_access_ipi(void *unused)
@@ -1304,6 +1441,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
cpu_pmu->set_event_filter = armv8pmu_set_event_filter;

cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
+ cpu_pmu->sched_task = armv8pmu_sched_task;
+ cpu_pmu->branch_reset = armv8pmu_branch_reset;

cpu_pmu->name = name;
cpu_pmu->map_event = map_event;
diff --git a/drivers/perf/arm_pmuv3_branch.h b/drivers/perf/arm_pmuv3_branch.h
new file mode 100644
index 000000000000..609e4d4ccac6
--- /dev/null
+++ b/drivers/perf/arm_pmuv3_branch.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Branch Record Buffer Extension Helpers.
+ *
+ * Copyright (C) 2022-2023 ARM Limited
+ *
+ * Author: Anshuman Khandual <[email protected]>
+ */
+#include <linux/perf/arm_pmu.h>
+
+static inline void armv8pmu_branch_reset(void)
+{
+}
+
+static inline void armv8pmu_branch_probe(struct arm_pmu *arm_pmu)
+{
+}
+
+static inline bool armv8pmu_branch_attr_valid(struct perf_event *event)
+{
+ WARN_ON_ONCE(!has_branch_stack(event));
+ return false;
+}
+
+static inline void armv8pmu_branch_enable(struct arm_pmu *arm_pmu)
+{
+}
+
+static inline void armv8pmu_branch_disable(void)
+{
+}
+
+static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc,
+ struct perf_event *event)
+{
+ WARN_ON_ONCE(!has_branch_stack(event));
+}
+
+static inline void armv8pmu_branch_save(struct arm_pmu *arm_pmu, void *ctx)
+{
+}
+
+static inline int armv8pmu_task_ctx_cache_alloc(struct arm_pmu *arm_pmu)
+{
+ return 0;
+}
+
+static inline void armv8pmu_task_ctx_cache_free(struct arm_pmu *arm_pmu)
+{
+}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index b3b34f6670cf..8cfcc735c0f7 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -46,6 +46,18 @@ static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_63BIT) == ARMPMU_EVT_63BIT);
}, \
}

+/*
+ * Maximum branch record entries which could be processed
+ * for core perf branch stack sampling support, regardless
+ * of the hardware support available on a given ARM PMU.
+ */
+#define MAX_BRANCH_RECORDS 64
+
+struct branch_records {
+ struct perf_branch_stack branch_stack;
+ struct perf_branch_entry branch_entries[MAX_BRANCH_RECORDS];
+};
+
/* The events for a given PMU register set. */
struct pmu_hw_events {
/*
@@ -66,6 +78,17 @@ struct pmu_hw_events {
struct arm_pmu *percpu_pmu;

int irq;
+
+ struct branch_records *branches;
+
+ /* Active context for task events */
+ void *brbe_context;
+
+ /* Active events requesting branch records */
+ unsigned int brbe_users;
+
+ /* Active branch sample type filters */
+ unsigned long brbe_sample_type;
};

enum armpmu_attr_groups {
@@ -96,8 +119,12 @@ struct arm_pmu {
void (*stop)(struct arm_pmu *);
void (*reset)(void *);
int (*map_event)(struct perf_event *event);
+ void (*sched_task)(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
+ void (*branch_reset)(void);
int num_events;
- bool secure_access; /* 32-bit ARM only */
+ unsigned int secure_access : 1, /* 32-bit ARM only */
+ has_branch_stack: 1, /* 64-bit ARM only */
+ reserved : 30;
#define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
#define ARMV8_PMUV3_EXT_COMMON_EVENT_BASE 0x4000
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index 46377e134d67..c3e7d2cfb737 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -308,5 +308,4 @@
default: WARN(1, "Invalid PMEV* index\n"); \
} \
} while (0)
-
#endif
--
2.25.1


2024-01-25 09:51:26

by Anshuman Khandual

Subject: [PATCH V16 5/8] KVM: arm64: nvhe: Disable branch generation in nVHE guests

Disable the BRBE before we enter the guest, saving its state, and re-enable
it once we get out of the guest. This avoids capturing branch records in the
guest kernel or userspace, which would otherwise pollute the host samples.
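
The save and restore sequence can be sketched in user space with a plain
variable standing in for BRBCR_EL1. Everything named sketch_* or SKETCH_*
below is hypothetical, including the enable bit positions, which are chosen
for illustration only. The isb() barriers of the real code are not modelled.

```c
#include <stdint.h>

/* Assumed enable bits, standing in for BRBCR_ELx_E0BRE and BRBCR_ELx_ExBRE */
#define SKETCH_E0BRE	(UINT64_C(1) << 0)
#define SKETCH_EXBRE	(UINT64_C(1) << 1)

/* Simulated control register, standing in for SYS_BRBCR_EL1 */
static uint64_t sketch_brbcr;

/* Called before entering the guest */
static void sketch_save_brbe(uint64_t *saved)
{
	*saved = 0;

	/* Nothing to do when branch recording is not enabled */
	if (!(sketch_brbcr & (SKETCH_E0BRE | SKETCH_EXBRE)))
		return;

	/* Remember the host controls, then prohibit record generation */
	*saved = sketch_brbcr;
	sketch_brbcr = 0;
}

/* Called after leaving the guest */
static void sketch_restore_brbe(uint64_t saved)
{
	/* A zero value means the host had nothing enabled at save time */
	if (!saved)
		return;

	sketch_brbcr = saved;
}
```

A saved value of zero doubles as the "BRBE was not enabled" marker, so the
register is left untouched in both directions in that case.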

Cc: Marc Zyngier <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: James Morse <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
Changes in V16:

- Dropped BRBCR_EL1 and BRBFCR_EL1 from enum vcpu_sysreg
- Reverted back the KVM NVHE patch - used host_debug_state based 'brbcr_el1'
element, and dropped the previous dependency on James's coresight series

arch/arm64/include/asm/kvm_host.h | 5 ++++-
arch/arm64/kvm/debug.c | 5 +++++
arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 ++++++++++++++++++++++++++++++
3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 21c57b812569..bce8792092af 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -569,7 +569,7 @@ struct kvm_vcpu_arch {
u8 cflags;

/* Input flags to the hypervisor code, potentially cleared after use */
- u8 iflags;
+ u16 iflags;

/* State flags for kernel bookkeeping, unused by the hypervisor code */
u8 sflags;
@@ -610,6 +610,7 @@ struct kvm_vcpu_arch {
u64 pmscr_el1;
/* Self-hosted trace */
u64 trfcr_el1;
+ u64 brbcr_el1;
} host_debug_state;

/* VGIC state */
@@ -779,6 +780,8 @@ struct kvm_vcpu_arch {
#define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
/* vcpu running in HYP context */
#define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
+/* Save BRBE context if active */
+#define DEBUG_STATE_SAVE_BRBE __vcpu_single_flag(iflags, BIT(8))

/* SVE enabled for host EL0 */
#define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 8725291cb00a..99f85d8acbf3 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -335,10 +335,15 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
!(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
+
+ /* Check if we have BRBE implemented and available at the host */
+ if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT))
+ vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
}

void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
{
vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
+ vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
}
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 4558c02eb352..79bcf0fb1326 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -79,6 +79,34 @@ static void __debug_restore_trace(u64 trfcr_el1)
write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1);
}

+static void __debug_save_brbe(u64 *brbcr_el1)
+{
+ *brbcr_el1 = 0;
+
+ /* Check if the BRBE is enabled */
+ if (!(read_sysreg_s(SYS_BRBCR_EL1) & (BRBCR_ELx_E0BRE | BRBCR_ELx_ExBRE)))
+ return;
+
+ /*
+ * Prohibit branch record generation while we are in guest.
+ * Since access to BRBCR_EL1 is trapped, the guest can't
+ * modify the filtering set by the host.
+ */
+ *brbcr_el1 = read_sysreg_s(SYS_BRBCR_EL1);
+ write_sysreg_s(0, SYS_BRBCR_EL1);
+ isb();
+}
+
+static void __debug_restore_brbe(u64 brbcr_el1)
+{
+ if (!brbcr_el1)
+ return;
+
+ /* Restore BRBE controls */
+ write_sysreg_s(brbcr_el1, SYS_BRBCR_EL1);
+ isb();
+}
+
void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
/* Disable and flush SPE data generation */
@@ -87,6 +115,9 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
/* Disable and flush Self-Hosted Trace generation */
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
__debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
+ /* Disable BRBE branch records */
+ if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_BRBE))
+ __debug_save_brbe(&vcpu->arch.host_debug_state.brbcr_el1);
}

void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
@@ -100,6 +131,8 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
+ if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_BRBE))
+ __debug_restore_brbe(vcpu->arch.host_debug_state.brbcr_el1);
}

void __debug_switch_to_host(struct kvm_vcpu *vcpu)
--
2.25.1


2024-01-25 09:51:42

by Anshuman Khandual

Subject: [PATCH V16 6/8] perf: test: Speed up running brstack test on an Arm model

From: James Clark <[email protected]>

The test runs quite slowly in the model, so replace "xargs -n1" with
"tr ' ' '\n'" which does the same thing but in single digit minutes
instead of double digit minutes.

Also reduce the number of loops in the test application. Unfortunately
this causes intermittent failures on x86, presumably because the
sampling interval is too big to pick up any loops, so keep it the same
there.

Cc: Mark Rutland <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: James Clark <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
---
tools/perf/tests/shell/test_brstack.sh | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/shell/test_brstack.sh b/tools/perf/tests/shell/test_brstack.sh
index 5f14d0cb013f..5ea64d0c4a6f 100755
--- a/tools/perf/tests/shell/test_brstack.sh
+++ b/tools/perf/tests/shell/test_brstack.sh
@@ -18,7 +18,6 @@ fi
skip_test_missing_symbol brstack_bench

TMPDIR=$(mktemp -d /tmp/__perf_test.program.XXXXX)
-TESTPROG="perf test -w brstack"

cleanup() {
rm -rf $TMPDIR
@@ -26,11 +25,21 @@ cleanup() {

trap cleanup EXIT TERM INT

+is_arm64() {
+ uname -m | grep -q aarch64
+}
+
+if is_arm64; then
+ TESTPROG="perf test -w brstack 5000"
+else
+ TESTPROG="perf test -w brstack"
+fi
+
test_user_branches() {
echo "Testing user branch stack sampling"

perf record -o $TMPDIR/perf.data --branch-filter any,save_type,u -- ${TESTPROG} > /dev/null 2>&1
- perf script -i $TMPDIR/perf.data --fields brstacksym | xargs -n1 > $TMPDIR/perf.script
+ perf script -i $TMPDIR/perf.data --fields brstacksym | tr ' ' '\n' > $TMPDIR/perf.script

# example of branch entries:
# brstack_foo+0x14/brstack_bar+0x40/P/-/-/0/CALL
@@ -59,7 +68,7 @@ test_filter() {
echo "Testing branch stack filtering permutation ($test_filter_filter,$test_filter_expect)"

perf record -o $TMPDIR/perf.data --branch-filter $test_filter_filter,save_type,u -- ${TESTPROG} > /dev/null 2>&1
- perf script -i $TMPDIR/perf.data --fields brstack | xargs -n1 > $TMPDIR/perf.script
+ perf script -i $TMPDIR/perf.data --fields brstack | tr ' ' '\n' > $TMPDIR/perf.script

# fail if we find any branch type that doesn't match any of the expected ones
# also consider UNKNOWN branch types (-)
--
2.25.1


2024-01-25 09:52:32

by Anshuman Khandual

Subject: [PATCH V16 8/8] perf: test: Extend branch stack sampling test for Arm64 BRBE

From: James Clark <[email protected]>

Add Arm64 BRBE-specific testing to the existing branch stack sampling test.
The test currently passes on the Arm FVP RevC model, but no hardware has
been tested yet.

Cc: Mark Rutland <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Co-developed-by: German Gomez <[email protected]>
Signed-off-by: German Gomez <[email protected]>
Signed-off-by: James Clark <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
---
tools/perf/tests/builtin-test.c | 1 +
tools/perf/tests/shell/test_brstack.sh | 42 ++++++++++++++++++++++++--
tools/perf/tests/tests.h | 1 +
tools/perf/tests/workloads/Build | 2 ++
tools/perf/tests/workloads/traploop.c | 39 ++++++++++++++++++++++++
5 files changed, 82 insertions(+), 3 deletions(-)
create mode 100644 tools/perf/tests/workloads/traploop.c

diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 4a5973f9bb9b..bd7202ff5cca 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -139,6 +139,7 @@ static struct test_workload *workloads[] = {
&workload__sqrtloop,
&workload__brstack,
&workload__datasym,
+ &workload__traploop
};

static int num_subtests(const struct test_suite *t)
diff --git a/tools/perf/tests/shell/test_brstack.sh b/tools/perf/tests/shell/test_brstack.sh
index 928790f35747..6a4069c930e8 100755
--- a/tools/perf/tests/shell/test_brstack.sh
+++ b/tools/perf/tests/shell/test_brstack.sh
@@ -53,12 +53,43 @@ test_user_branches() {
grep -E -m1 "^brstack_foo\+[^ ]*/brstack_bench\+[^ ]*/RET/.*$" $TMPDIR/perf.script
grep -E -m1 "^brstack_bench\+[^ ]*/brstack_bench\+[^ ]*/COND/.*$" $TMPDIR/perf.script
grep -E -m1 "^brstack\+[^ ]*/brstack\+[^ ]*/UNCOND/.*$" $TMPDIR/perf.script
+
+ if is_arm64; then
+ # in arm64 with BRBE, we get IRQ entries that correspond
+ # to any point in the process
+ grep -m1 "/IRQ/" $TMPDIR/perf.script
+ fi
set +x

# some branch types are still not being tested:
# IND COND_CALL COND_RET SYSCALL SYSRET IRQ SERROR NO_TX
}

+test_arm64_trap_eret_branches() {
+ echo "Testing trap & eret branches (arm64 brbe)"
+ perf record -o $TMPDIR/perf.data --branch-filter any,save_type,u -- \
+ perf test -w traploop 250
+ perf script -i $TMPDIR/perf.data --fields brstacksym | tr ' ' '\n' > $TMPDIR/perf.script
+ set -x
+ # BRBINF<n>.TYPE == TRAP is mapped to PERF_BR_SYSCALL by the BRBE driver
+ grep -E -m1 "^trap_bench\+[^ ]*/\[unknown\][^ ]*/SYSCALL/" $TMPDIR/perf.script
+ grep -E -m1 "^\[unknown\][^ ]*/trap_bench\+[^ ]*/ERET/" $TMPDIR/perf.script
+ set +x
+}
+
+test_arm64_kernel_branches() {
+ echo "Testing kernel branches (arm64 brbe)"
+ # skip if perf doesn't have enough privileges
+ if ! perf record --branch-filter any,k -o- -- true > /dev/null; then
+ echo "[skipped: not enough privileges]"
+ return 0
+ fi
+ perf record -o $TMPDIR/perf.data --branch-filter any,k -- uname -a
+ perf script -i $TMPDIR/perf.data --fields brstack | tr ' ' '\n' > $TMPDIR/perf.script
+ grep -E -m1 "0xffff[0-9a-f]{12}" $TMPDIR/perf.script
+ ! grep -E -m1 "0x0000[0-9a-f]{12}" $TMPDIR/perf.script
+}
+
# first argument <arg0> is the argument passed to "--branch-stack <arg0>,save_type,u"
# second argument are the expected branch types for the given filter
test_filter() {
@@ -81,11 +112,16 @@ set -e

test_user_branches

-test_filter "any_call" "CALL|IND_CALL|COND_CALL|SYSCALL|IRQ"
+if is_arm64; then
+ test_arm64_trap_eret_branches
+ test_arm64_kernel_branches
+fi
+
+test_filter "any_call" "CALL|IND_CALL|COND_CALL|SYSCALL|IRQ|FAULT_DATA|FAULT_INST"
test_filter "call" "CALL|SYSCALL"
test_filter "cond" "COND"
test_filter "any_ret" "RET|COND_RET|SYSRET|ERET"

test_filter "call,cond" "CALL|SYSCALL|COND"
-test_filter "any_call,cond" "CALL|IND_CALL|COND_CALL|IRQ|SYSCALL|COND"
-test_filter "cond,any_call,any_ret" "COND|CALL|IND_CALL|COND_CALL|SYSCALL|IRQ|RET|COND_RET|SYSRET|ERET"
+test_filter "any_call,cond" "CALL|IND_CALL|COND_CALL|IRQ|SYSCALL|COND|FAULT_DATA|FAULT_INST"
+test_filter "cond,any_call,any_ret" "COND|CALL|IND_CALL|COND_CALL|SYSCALL|IRQ|RET|COND_RET|SYSRET|ERET|FAULT_DATA|FAULT_INST"
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index dad3d7414142..6d3d575352d5 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -205,6 +205,7 @@ DECLARE_WORKLOAD(leafloop);
DECLARE_WORKLOAD(sqrtloop);
DECLARE_WORKLOAD(brstack);
DECLARE_WORKLOAD(datasym);
+DECLARE_WORKLOAD(traploop);

extern const char *dso_to_test;
extern const char *test_objdump_path;
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index a1f34d5861e3..a9dc93d8468b 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -6,8 +6,10 @@ perf-y += leafloop.o
perf-y += sqrtloop.o
perf-y += brstack.o
perf-y += datasym.o
+perf-y += traploop.o

CFLAGS_sqrtloop.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE
CFLAGS_leafloop.o = -g -O0 -fno-inline -fno-omit-frame-pointer -U_FORTIFY_SOURCE
CFLAGS_brstack.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE
CFLAGS_datasym.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE
+CFLAGS_traploop.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE
diff --git a/tools/perf/tests/workloads/traploop.c b/tools/perf/tests/workloads/traploop.c
new file mode 100644
index 000000000000..7dac94897e49
--- /dev/null
+++ b/tools/perf/tests/workloads/traploop.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdlib.h>
+#include "../tests.h"
+
+#define BENCH_RUNS 999999
+
+static volatile int cnt;
+
+#ifdef __aarch64__
+static void trap_bench(void)
+{
+ unsigned long val;
+
+ asm volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r" (val)); /* TRAP + ERET */
+}
+#else
+static void trap_bench(void)
+{
+
+}
+#endif
+
+static int traploop(int argc, const char **argv)
+{
+ int num_loops = BENCH_RUNS;
+
+ if (argc > 0)
+ num_loops = atoi(argv[0]);
+
+ while (1) {
+ if ((cnt++) > num_loops)
+ break;
+
+ trap_bench();
+ }
+ return 0;
+}
+
+DEFINE_WORKLOAD(traploop);
--
2.25.1


2024-01-25 10:11:49

by Anshuman Khandual

Subject: [PATCH V16 7/8] perf: test: Remove empty lines from branch filter test output

From: James Clark <[email protected]>

In the perf script command, spaces are turned into newlines. But when
there is a double space this results in empty lines which fail the
following inverse grep test, so strip the empty lines.
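
To see why a double space matters, here is a small sketch that mimics the
effect of tr ' ' '\n' on a line of output and counts how many of the
resulting lines are empty. The helper and struct names are hypothetical; this
is only a model of the text transformation, not part of the test script.

```c
#include <stddef.h>

struct sketch_counts {
	size_t lines;	/* lines produced once spaces become newlines */
	size_t empty;	/* how many of those lines are empty */
};

static struct sketch_counts sketch_split_on_spaces(const char *s)
{
	struct sketch_counts c = { 0, 0 };
	size_t len = 0;	/* length of the field currently being scanned */

	for (; *s; s++) {
		if (*s == ' ' || *s == '\n') {
			/* Each separator terminates one output line */
			c.lines++;
			if (!len)
				c.empty++;
			len = 0;
		} else {
			len++;
		}
	}

	/* A final field without a trailing newline still forms a line */
	if (len)
		c.lines++;

	return c;
}
```

With single spaces between entries no empty line is produced, while each
extra space adds one. Dropping the empty lines, as the sed expression in this
patch does, leaves only the branch entries for the inverse grep to inspect.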

Cc: Mark Rutland <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: James Clark <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
---
tools/perf/tests/shell/test_brstack.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/shell/test_brstack.sh b/tools/perf/tests/shell/test_brstack.sh
index 5ea64d0c4a6f..928790f35747 100755
--- a/tools/perf/tests/shell/test_brstack.sh
+++ b/tools/perf/tests/shell/test_brstack.sh
@@ -68,7 +68,7 @@ test_filter() {
echo "Testing branch stack filtering permutation ($test_filter_filter,$test_filter_expect)"

perf record -o $TMPDIR/perf.data --branch-filter $test_filter_filter,save_type,u -- ${TESTPROG} > /dev/null 2>&1
- perf script -i $TMPDIR/perf.data --fields brstack | tr ' ' '\n' > $TMPDIR/perf.script
+ perf script -i $TMPDIR/perf.data --fields brstack | tr ' ' '\n' | sed '/^[[:space:]]*$/d' > $TMPDIR/perf.script

# fail if we find any branch type that doesn't match any of the expected ones
# also consider UNKNOWN branch types (-)
--
2.25.1


2024-01-25 10:14:23

by Anshuman Khandual

Subject: [PATCH V16 1/8] arm64/sysreg: Add BRBE registers and fields

This adds BRBE related register definitions and various other related field
macros therein. These will be used subsequently in a BRBE driver, which is
being added later on.
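
The record register encodings follow from the __SYS_BRBINF(), __SYS_BRBSRC()
and __SYS_BRBTGT() helpers added by this patch, which place record n at
sys_reg(2, 1, 8, CRm, op2). A minimal sketch of that arithmetic is shown
below; the struct, enum and function names are hypothetical and serve only to
make the CRm and op2 computation visible.

```c
/* Hypothetical container for the two encoding fields that vary */
struct sketch_enc {
	unsigned int crm;
	unsigned int op2;
};

/* Offsets added to op2 by the INF, SRC and TGT helpers respectively */
enum sketch_kind {
	SKETCH_BRBINF = 0,
	SKETCH_BRBSRC = 1,
	SKETCH_BRBTGT = 2,
};

static struct sketch_enc sketch_brb_enc(unsigned int n, enum sketch_kind kind)
{
	struct sketch_enc e;

	/* Low four bits of the record index select CRm */
	e.crm = n & 0xf;
	/* Bit 4 of the record index moves op2 up by 4 */
	e.op2 = ((n & 0x10) >> 2) + (unsigned int)kind;

	return e;
}
```

Records n and n + 16 therefore share the same CRm and differ only in op2,
which matches the interleaved 0, 16, 1, 17, ... ordering used when these
registers are listed in encoding order.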

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
Changes in V16:

- Updated BRBINFx_EL1.TYPE = 0b110000 as field IMPDEF_TRAP_EL3
- Updated BRBCR_ELx[9] as field FZPSS
- Updated BRBINFINJ_EL1 to use sysreg field BRBINFx_EL1

arch/arm64/include/asm/sysreg.h | 109 ++++++++++++++++++++++++++
arch/arm64/tools/sysreg | 131 ++++++++++++++++++++++++++++++++
2 files changed, 240 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index c3b19b376c86..72544b5c4951 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -272,6 +272,109 @@

#define SYS_BRBCR_EL2 sys_reg(2, 4, 9, 0, 0)

+#define __SYS_BRBINF(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 0))
+#define __SYS_BRBSRC(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 1))
+#define __SYS_BRBTGT(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 2))
+
+#define SYS_BRBINF0_EL1 __SYS_BRBINF(0)
+#define SYS_BRBINF1_EL1 __SYS_BRBINF(1)
+#define SYS_BRBINF2_EL1 __SYS_BRBINF(2)
+#define SYS_BRBINF3_EL1 __SYS_BRBINF(3)
+#define SYS_BRBINF4_EL1 __SYS_BRBINF(4)
+#define SYS_BRBINF5_EL1 __SYS_BRBINF(5)
+#define SYS_BRBINF6_EL1 __SYS_BRBINF(6)
+#define SYS_BRBINF7_EL1 __SYS_BRBINF(7)
+#define SYS_BRBINF8_EL1 __SYS_BRBINF(8)
+#define SYS_BRBINF9_EL1 __SYS_BRBINF(9)
+#define SYS_BRBINF10_EL1 __SYS_BRBINF(10)
+#define SYS_BRBINF11_EL1 __SYS_BRBINF(11)
+#define SYS_BRBINF12_EL1 __SYS_BRBINF(12)
+#define SYS_BRBINF13_EL1 __SYS_BRBINF(13)
+#define SYS_BRBINF14_EL1 __SYS_BRBINF(14)
+#define SYS_BRBINF15_EL1 __SYS_BRBINF(15)
+#define SYS_BRBINF16_EL1 __SYS_BRBINF(16)
+#define SYS_BRBINF17_EL1 __SYS_BRBINF(17)
+#define SYS_BRBINF18_EL1 __SYS_BRBINF(18)
+#define SYS_BRBINF19_EL1 __SYS_BRBINF(19)
+#define SYS_BRBINF20_EL1 __SYS_BRBINF(20)
+#define SYS_BRBINF21_EL1 __SYS_BRBINF(21)
+#define SYS_BRBINF22_EL1 __SYS_BRBINF(22)
+#define SYS_BRBINF23_EL1 __SYS_BRBINF(23)
+#define SYS_BRBINF24_EL1 __SYS_BRBINF(24)
+#define SYS_BRBINF25_EL1 __SYS_BRBINF(25)
+#define SYS_BRBINF26_EL1 __SYS_BRBINF(26)
+#define SYS_BRBINF27_EL1 __SYS_BRBINF(27)
+#define SYS_BRBINF28_EL1 __SYS_BRBINF(28)
+#define SYS_BRBINF29_EL1 __SYS_BRBINF(29)
+#define SYS_BRBINF30_EL1 __SYS_BRBINF(30)
+#define SYS_BRBINF31_EL1 __SYS_BRBINF(31)
+
+#define SYS_BRBSRC0_EL1 __SYS_BRBSRC(0)
+#define SYS_BRBSRC1_EL1 __SYS_BRBSRC(1)
+#define SYS_BRBSRC2_EL1 __SYS_BRBSRC(2)
+#define SYS_BRBSRC3_EL1 __SYS_BRBSRC(3)
+#define SYS_BRBSRC4_EL1 __SYS_BRBSRC(4)
+#define SYS_BRBSRC5_EL1 __SYS_BRBSRC(5)
+#define SYS_BRBSRC6_EL1 __SYS_BRBSRC(6)
+#define SYS_BRBSRC7_EL1 __SYS_BRBSRC(7)
+#define SYS_BRBSRC8_EL1 __SYS_BRBSRC(8)
+#define SYS_BRBSRC9_EL1 __SYS_BRBSRC(9)
+#define SYS_BRBSRC10_EL1 __SYS_BRBSRC(10)
+#define SYS_BRBSRC11_EL1 __SYS_BRBSRC(11)
+#define SYS_BRBSRC12_EL1 __SYS_BRBSRC(12)
+#define SYS_BRBSRC13_EL1 __SYS_BRBSRC(13)
+#define SYS_BRBSRC14_EL1 __SYS_BRBSRC(14)
+#define SYS_BRBSRC15_EL1 __SYS_BRBSRC(15)
+#define SYS_BRBSRC16_EL1 __SYS_BRBSRC(16)
+#define SYS_BRBSRC17_EL1 __SYS_BRBSRC(17)
+#define SYS_BRBSRC18_EL1 __SYS_BRBSRC(18)
+#define SYS_BRBSRC19_EL1 __SYS_BRBSRC(19)
+#define SYS_BRBSRC20_EL1 __SYS_BRBSRC(20)
+#define SYS_BRBSRC21_EL1 __SYS_BRBSRC(21)
+#define SYS_BRBSRC22_EL1 __SYS_BRBSRC(22)
+#define SYS_BRBSRC23_EL1 __SYS_BRBSRC(23)
+#define SYS_BRBSRC24_EL1 __SYS_BRBSRC(24)
+#define SYS_BRBSRC25_EL1 __SYS_BRBSRC(25)
+#define SYS_BRBSRC26_EL1 __SYS_BRBSRC(26)
+#define SYS_BRBSRC27_EL1 __SYS_BRBSRC(27)
+#define SYS_BRBSRC28_EL1 __SYS_BRBSRC(28)
+#define SYS_BRBSRC29_EL1 __SYS_BRBSRC(29)
+#define SYS_BRBSRC30_EL1 __SYS_BRBSRC(30)
+#define SYS_BRBSRC31_EL1 __SYS_BRBSRC(31)
+
+#define SYS_BRBTGT0_EL1 __SYS_BRBTGT(0)
+#define SYS_BRBTGT1_EL1 __SYS_BRBTGT(1)
+#define SYS_BRBTGT2_EL1 __SYS_BRBTGT(2)
+#define SYS_BRBTGT3_EL1 __SYS_BRBTGT(3)
+#define SYS_BRBTGT4_EL1 __SYS_BRBTGT(4)
+#define SYS_BRBTGT5_EL1 __SYS_BRBTGT(5)
+#define SYS_BRBTGT6_EL1 __SYS_BRBTGT(6)
+#define SYS_BRBTGT7_EL1 __SYS_BRBTGT(7)
+#define SYS_BRBTGT8_EL1 __SYS_BRBTGT(8)
+#define SYS_BRBTGT9_EL1 __SYS_BRBTGT(9)
+#define SYS_BRBTGT10_EL1 __SYS_BRBTGT(10)
+#define SYS_BRBTGT11_EL1 __SYS_BRBTGT(11)
+#define SYS_BRBTGT12_EL1 __SYS_BRBTGT(12)
+#define SYS_BRBTGT13_EL1 __SYS_BRBTGT(13)
+#define SYS_BRBTGT14_EL1 __SYS_BRBTGT(14)
+#define SYS_BRBTGT15_EL1 __SYS_BRBTGT(15)
+#define SYS_BRBTGT16_EL1 __SYS_BRBTGT(16)
+#define SYS_BRBTGT17_EL1 __SYS_BRBTGT(17)
+#define SYS_BRBTGT18_EL1 __SYS_BRBTGT(18)
+#define SYS_BRBTGT19_EL1 __SYS_BRBTGT(19)
+#define SYS_BRBTGT20_EL1 __SYS_BRBTGT(20)
+#define SYS_BRBTGT21_EL1 __SYS_BRBTGT(21)
+#define SYS_BRBTGT22_EL1 __SYS_BRBTGT(22)
+#define SYS_BRBTGT23_EL1 __SYS_BRBTGT(23)
+#define SYS_BRBTGT24_EL1 __SYS_BRBTGT(24)
+#define SYS_BRBTGT25_EL1 __SYS_BRBTGT(25)
+#define SYS_BRBTGT26_EL1 __SYS_BRBTGT(26)
+#define SYS_BRBTGT27_EL1 __SYS_BRBTGT(27)
+#define SYS_BRBTGT28_EL1 __SYS_BRBTGT(28)
+#define SYS_BRBTGT29_EL1 __SYS_BRBTGT(29)
+#define SYS_BRBTGT30_EL1 __SYS_BRBTGT(30)
+#define SYS_BRBTGT31_EL1 __SYS_BRBTGT(31)
+
#define SYS_MIDR_EL1 sys_reg(3, 0, 0, 0, 0)
#define SYS_MPIDR_EL1 sys_reg(3, 0, 0, 0, 5)
#define SYS_REVIDR_EL1 sys_reg(3, 0, 0, 0, 6)
@@ -794,6 +897,12 @@
#define OP_COSP_RCTX sys_insn(1, 3, 7, 3, 6)
#define OP_CPP_RCTX sys_insn(1, 3, 7, 3, 7)

+/*
+ * BRBE Instructions
+ */
+#define BRB_IALL_INSN __emit_inst(0xd5000000 | OP_BRB_IALL | (0x1f))
+#define BRB_INJ_INSN __emit_inst(0xd5000000 | OP_BRB_INJ | (0x1f))
+
/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_ENTP2 (BIT(60))
#define SCTLR_ELx_DSSBS (BIT(44))
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 4c9b67934367..caf851ba5dc0 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1023,6 +1023,137 @@ UnsignedEnum 3:0 MTEPERM
EndEnum
EndSysreg

+
+SysregFields BRBINFx_EL1
+Res0 63:47
+Field 46 CCU
+Field 45:32 CC
+Res0 31:18
+Field 17 LASTFAILED
+Field 16 T
+Res0 15:14
+Enum 13:8 TYPE
+ 0b000000 UNCOND_DIRECT
+ 0b000001 INDIRECT
+ 0b000010 DIRECT_LINK
+ 0b000011 INDIRECT_LINK
+ 0b000101 RET
+ 0b000111 ERET
+ 0b001000 COND_DIRECT
+ 0b100001 DEBUG_HALT
+ 0b100010 CALL
+ 0b100011 TRAP
+ 0b100100 SERROR
+ 0b100110 INSN_DEBUG
+ 0b100111 DATA_DEBUG
+ 0b101010 ALIGN_FAULT
+ 0b101011 INSN_FAULT
+ 0b101100 DATA_FAULT
+ 0b101110 IRQ
+ 0b101111 FIQ
+ 0b110000 IMPDEF_TRAP_EL3
+ 0b111001 DEBUG_EXIT
+EndEnum
+Enum 7:6 EL
+ 0b00 EL0
+ 0b01 EL1
+ 0b10 EL2
+ 0b11 EL3
+EndEnum
+Field 5 MPRED
+Res0 4:2
+Enum 1:0 VALID
+ 0b00 NONE
+ 0b01 TARGET
+ 0b10 SOURCE
+ 0b11 FULL
+EndEnum
+EndSysregFields
+
+SysregFields BRBCR_ELx
+Res0 63:24
+Field 23 EXCEPTION
+Field 22 ERTN
+Res0 21:10
+Field 9 FZPSS
+Field 8 FZP
+Res0 7
+Enum 6:5 TS
+ 0b01 VIRTUAL
+ 0b10 GUEST_PHYSICAL
+ 0b11 PHYSICAL
+EndEnum
+Field 4 MPRED
+Field 3 CC
+Res0 2
+Field 1 ExBRE
+Field 0 E0BRE
+EndSysregFields
+
+Sysreg BRBCR_EL2 2 4 9 0 0
+Fields BRBCR_ELx
+EndSysreg
+
+Sysreg BRBCR_EL1 2 1 9 0 0
+Fields BRBCR_ELx
+EndSysreg
+
+Sysreg BRBCR_EL12 2 5 9 0 0
+Fields BRBCR_ELx
+EndSysreg
+
+Sysreg BRBFCR_EL1 2 1 9 0 1
+Res0 63:30
+Enum 29:28 BANK
+ 0b0 FIRST
+ 0b1 SECOND
+EndEnum
+Res0 27:23
+Field 22 CONDDIR
+Field 21 DIRCALL
+Field 20 INDCALL
+Field 19 RTN
+Field 18 INDIRECT
+Field 17 DIRECT
+Field 16 EnI
+Res0 15:8
+Field 7 PAUSED
+Field 6 LASTFAILED
+Res0 5:0
+EndSysreg
+
+Sysreg BRBTS_EL1 2 1 9 0 2
+Field 63:0 TS
+EndSysreg
+
+Sysreg BRBINFINJ_EL1 2 1 9 1 0
+Fields BRBINFx_EL1
+EndSysreg
+
+Sysreg BRBSRCINJ_EL1 2 1 9 1 1
+Field 63:0 ADDRESS
+EndSysreg
+
+Sysreg BRBTGTINJ_EL1 2 1 9 1 2
+Field 63:0 ADDRESS
+EndSysreg
+
+Sysreg BRBIDR0_EL1 2 1 9 2 0
+Res0 63:16
+Enum 15:12 CC
+ 0b101 20_BIT
+EndEnum
+Enum 11:8 FORMAT
+ 0b0 0
+EndEnum
+Enum 7:0 NUMREC
+ 0b0001000 8
+ 0b0010000 16
+ 0b0100000 32
+ 0b1000000 64
+EndEnum
+EndSysreg
+
Sysreg ID_AA64ZFR0_EL1 3 0 0 4 4
Res0 63:60
UnsignedEnum 59:56 F64MM
--
2.25.1
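
For readers checking the __SYS_BRBINF()/__SYS_BRBSRC()/__SYS_BRBTGT() macros in the patch above: the record index n (0..31) is split so that its low four bits become CRm, while bit 4 is shifted down into op2 bit 2 and the register family (INF, SRC, TGT) is added as an offset of 0, 1 or 2. A minimal stand-alone sketch of that arithmetic follows. The helper names brb_crm() and brb_op2() and the enum are hypothetical and exist only for this illustration; they are not part of the series.

```c
#include <assert.h>

/* Hypothetical selector for the three per-record register families. */
enum brb_kind { BRB_INF = 0, BRB_SRC = 1, BRB_TGT = 2 };

/* CRm field: the low four bits of the record index, i.e. ((n) & 0xf). */
static unsigned int brb_crm(unsigned int n)
{
	assert(n < 32);
	return n & 0xf;
}

/* op2 field: bit 4 of the index moved to bit 2, plus the family offset. */
static unsigned int brb_op2(unsigned int n, enum brb_kind kind)
{
	assert(n < 32);
	return ((n & 0x10) >> 2) + (unsigned int)kind;
}
```

Under this split, records 0..15 use op2 values 0..2 and records 16..31 reuse the same CRm values with op2 values 4..6.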


2024-01-25 13:49:22

by Suzuki K Poulose

Subject: Re: [PATCH V16 3/8] drivers: perf: arm_pmuv3: Enable branch stack sampling framework

On 25/01/2024 09:41, Anshuman Khandual wrote:
> Branch stack sampling support i.e capturing branch records during execution
> in core perf, rides along with normal HW events being scheduled on the PMU.
> This prepares ARMV8 PMU framework for branch stack support on relevant PMUs
> with required HW implementation.
>
> ARMV8 PMU hardware support for branch stack sampling is indicated via a new
> feature flag called 'has_branch_stack' that can be ascertained via probing.
> This modifies current gate in armpmu_event_init() which blocks branch stack
> sampling based perf events unconditionally. Instead allows such perf events
> getting initialized on supporting PMU hardware.
>
> Branch stack sampling is enabled and disabled along with regular PMU events
> . This adds required function callbacks in armv8pmu_branch_xxx() format, to
> drive the PMU branch stack hardware when supported. This also adds fallback
> stub definitions for these callbacks for PMUs which would not have required
> support.
>
> If a task gets scheduled out, the current branch records get saved in the
> task's context data, which can be later used to fill in the records upon an
> event overflow. Hence, we enable PERF_ATTACH_TASK_DATA (event->attach_state
> based flag) for branch stack requesting perf events. But this also requires
> adding support for pmu::sched_task() callback to arm_pmu.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> Changes in V16:
>
> - Renamed arm_brbe.h as arm_pmuv3_branch.h
> - Updated perf_sample_save_brstack()'s new argument requirements with NULL
>
> drivers/perf/arm_pmu.c | 57 ++++++++++++-
> drivers/perf/arm_pmuv3.c | 141 +++++++++++++++++++++++++++++++-
> drivers/perf/arm_pmuv3_branch.h | 50 +++++++++++
> include/linux/perf/arm_pmu.h | 29 ++++++-
> include/linux/perf/arm_pmuv3.h | 1 -
> 5 files changed, 273 insertions(+), 5 deletions(-)
> create mode 100644 drivers/perf/arm_pmuv3_branch.h
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 8458fe2cebb4..16f488ae7747 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -317,6 +317,15 @@ armpmu_del(struct perf_event *event, int flags)
> struct hw_perf_event *hwc = &event->hw;
> int idx = hwc->idx;
>
> + if (has_branch_stack(event)) {
> + WARN_ON_ONCE(!hw_events->brbe_users);
> + hw_events->brbe_users--;

^^ Should we do this only if the event matches the sample type? Put
the other way around, what does brbe_users track: the number of events
that "share" the BRBE instance, or the number of active events for
which has_branch_stack() is true?

> + if (!hw_events->brbe_users) {
> + hw_events->brbe_context = NULL;
> + hw_events->brbe_sample_type = 0;
> + }
> + }
> +
> armpmu_stop(event, PERF_EF_UPDATE);
> hw_events->events[idx] = NULL;
> armpmu->clear_event_idx(hw_events, event);
> @@ -333,6 +342,38 @@ armpmu_add(struct perf_event *event, int flags)
> struct hw_perf_event *hwc = &event->hw;
> int idx;
>
> + if (has_branch_stack(event)) {
> + /*
> + * Reset branch records buffer if a new CPU bound event
> + * gets scheduled on a PMU. Otherwise existing branch
> + * records present in the buffer might just leak into
> + * such events.
> + *
> + * Also reset current 'hw_events->brbe_context' because
> + * any previous task bound event now would have lost an
> + * opportunity for continuous branch records.
> + */
> + if (!event->ctx->task) {
> + hw_events->brbe_context = NULL;
> + if (armpmu->branch_reset)
> + armpmu->branch_reset();
> + }
> +
> + /*
> + * Reset branch records buffer if a new task event gets
> + * scheduled on a PMU which might have existing records.
> + * Otherwise older branch records present in the buffer
> + * might leak into the new task event.
> + */
> + if (event->ctx->task && hw_events->brbe_context != event->ctx) {
> + hw_events->brbe_context = event->ctx;
> + if (armpmu->branch_reset)
> + armpmu->branch_reset();
> + }
> + hw_events->brbe_users++;
> + hw_events->brbe_sample_type = event->attr.branch_sample_type;
> + }
> +
> /* An event following a process won't be stopped earlier */
> if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
> return -ENOENT;
> @@ -511,13 +552,24 @@ static int armpmu_event_init(struct perf_event *event)
> !cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
> return -ENOENT;
>
> - /* does not support taken branch sampling */
> - if (has_branch_stack(event))
> + /*
> + * Branch stack sampling events are allowed
> + * only on PMU which has required support.
> + */
> + if (has_branch_stack(event) && !armpmu->has_branch_stack)
> return -EOPNOTSUPP;
>
> return __hw_perf_event_init(event);
> }
>
> +static void armpmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
> +{
> + struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
> +
> + if (armpmu->sched_task)
> + armpmu->sched_task(pmu_ctx, sched_in);
> +}
> +
> static void armpmu_enable(struct pmu *pmu)
> {
> struct arm_pmu *armpmu = to_arm_pmu(pmu);
> @@ -864,6 +916,7 @@ struct arm_pmu *armpmu_alloc(void)
> }
>
> pmu->pmu = (struct pmu) {
> + .sched_task = armpmu_sched_task,
> .pmu_enable = armpmu_enable,
> .pmu_disable = armpmu_disable,
> .event_init = armpmu_event_init,
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 23fa6c5da82c..9e17764a0929 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -26,6 +26,7 @@
> #include <linux/nmi.h>
>
> #include <asm/arm_pmuv3.h>
> +#include "arm_pmuv3_branch.h"
>
> /* ARMv8 Cortex-A53 specific event types. */
> #define ARMV8_A53_PERFCTR_PREF_LINEFILL 0xC2
> @@ -829,14 +830,56 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu)
> armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
>
> kvm_vcpu_pmu_resync_el0();
> + if (cpu_pmu->has_branch_stack)
> + armv8pmu_branch_enable(cpu_pmu);

Is there a reason why we do this after kvm_vcpu_pmu_resync_el0()?
Ideally, shouldn't we start counting as soon as the PMU is on?

> }
>
> static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
> {
> + if (cpu_pmu->has_branch_stack)
> + armv8pmu_branch_disable();
> +
> /* Disable all counters */
> armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
> }
>
> +static void read_branch_records(struct pmu_hw_events *cpuc,
> + struct perf_event *event,
> + struct perf_sample_data *data,
> + bool *branch_captured)
> +{
> + /*
> + * CPU specific branch records buffer must have been allocated already
> + * for the hardware records to be captured and processed further.
> + */
> + if (WARN_ON(!cpuc->branches))
> + return;
> +
> + /*
> + * Overflowed event's branch_sample_type does not match the configured
> + * branch filters in the BRBE HW. So the captured branch records here
> + * cannot be co-related to the overflowed event. Report to the user as
> + * if no branch records have been captured, and flush branch records.
> + * The same scenario is applicable when the current task context does
> + * not match with overflown event.
> + */
> + if ((cpuc->brbe_sample_type != event->attr.branch_sample_type) ||
> + (event->ctx->task && cpuc->brbe_context != event->ctx))
> + return;
> +
> + /*
> + * Read the branch records from the hardware once after the PMU IRQ
> + * has been triggered but subsequently same records can be used for
> + * other events that might have been overflowed simultaneously thus
> + * saving much CPU cycles.
> + */
> + if (!*branch_captured) {
> + armv8pmu_branch_read(cpuc, event);
> + *branch_captured = true;
> + }
> + perf_sample_save_brstack(data, event, &cpuc->branches->branch_stack, NULL);
> +}
> +
> static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
> {
> u32 pmovsr;
> @@ -844,6 +887,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
> struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
> struct pt_regs *regs;
> int idx;
> + bool branch_captured = false;
>
> /*
> * Get and reset the IRQ flags
> @@ -887,6 +931,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
> if (!armpmu_event_set_period(event))
> continue;
>
> + /*
> + * PMU IRQ should remain asserted until all branch records
> + * are captured and processed into struct perf_sample_data.
> + */
> + if (has_branch_stack(event) && cpu_pmu->has_branch_stack)

nit: Do we really need the cpu_pmu->has_branch_stack check? The event
wouldn't reach here if the PMU doesn't support it.

> + read_branch_records(cpuc, event, &data, &branch_captured);
> +
> /*
> * Perf event overflow will queue the processing of the event as
> * an irq_work which will be taken care of in the handling of
> @@ -896,6 +947,8 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
> cpu_pmu->disable(event);
> }
> armv8pmu_start(cpu_pmu);
> + if (cpu_pmu->has_branch_stack)
> + armv8pmu_branch_reset();
>
> return IRQ_HANDLED;
> }
> @@ -985,6 +1038,24 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
> return event->hw.idx;
> }
>
> +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
> +{
> + struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
> + void *task_ctx = pmu_ctx->task_ctx_data;
> +
> + if (armpmu->has_branch_stack) {
> + /* Save branch records in task_ctx on sched out */
> + if (task_ctx && !sched_in) {
> + armv8pmu_branch_save(armpmu, task_ctx);
> + return;
> + }
> +
> + /* Reset branch records on sched in */
> + if (sched_in)
> + armv8pmu_branch_reset();
> + }
> +}
> +
> /*
> * Add an event filter to a given event.
> */
> @@ -1077,6 +1148,9 @@ static void armv8pmu_reset(void *info)
> pmcr |= ARMV8_PMU_PMCR_LP;
>
> armv8pmu_pmcr_write(pmcr);
> +
> + if (cpu_pmu->has_branch_stack)
> + armv8pmu_branch_reset();
> }
>
> static int __armv8_pmuv3_map_event_id(struct arm_pmu *armpmu,
> @@ -1114,6 +1188,20 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
>
> hw_event_id = __armv8_pmuv3_map_event_id(armpmu, event);
>
> + if (has_branch_stack(event)) {
> + if (!armv8pmu_branch_attr_valid(event))
> + return -EOPNOTSUPP;
> +
> + /*
> + * If a task gets scheduled out, the current branch records
> + * get saved in the task's context data, which can be later
> + * used to fill in the records upon an event overflow. Let's
> + * enable PERF_ATTACH_TASK_DATA in 'event->attach_state' for
> + * all branch stack sampling perf events.
> + */
> + event->attach_state |= PERF_ATTACH_TASK_DATA;
> + }
> +
> /*
> * CHAIN events only work when paired with an adjacent counter, and it
> * never makes sense for a user to open one in isolation, as they'll be
> @@ -1229,6 +1317,41 @@ static void __armv8pmu_probe_pmu(void *info)
> cpu_pmu->reg_pmmir = read_pmmir();
> else
> cpu_pmu->reg_pmmir = 0;
> +
> + /*
> + * BRBE is being probed on a single cpu for a
> + * given PMU. The remaining cpus, are assumed
> + * to have the exact same BRBE implementation.
> + */
> + armv8pmu_branch_probe(cpu_pmu);
> +}
> +
> +static int branch_records_alloc(struct arm_pmu *armpmu)
> +{
> + struct branch_records __percpu *records;
> + int cpu;
> +
> + records = alloc_percpu_gfp(struct branch_records, GFP_KERNEL);
> + if (!records)
> + return -ENOMEM;
> +
> + /*
> + * percpu memory allocated for 'records' gets completely consumed
> + * here, and never required to be freed up later. So permanently
> + * losing access to this anchor i.e 'records' is acceptable.
> + *
> + * Otherwise this allocation handle would have to be saved up for
> + * free_percpu() release later if required.
> + */
> + for_each_possible_cpu(cpu) {
> + struct pmu_hw_events *events_cpu;
> + struct branch_records *records_cpu;
> +
> + events_cpu = per_cpu_ptr(armpmu->hw_events, cpu);
> + records_cpu = per_cpu_ptr(records, cpu);
> + events_cpu->branches = records_cpu;
> + }
> + return 0;
> }
>
> static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
> @@ -1245,7 +1368,21 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
> if (ret)
> return ret;
>
> - return probe.present ? 0 : -ENODEV;
> + if (!probe.present)
> + return -ENODEV;
> +
> + if (cpu_pmu->has_branch_stack) {
> + ret = armv8pmu_task_ctx_cache_alloc(cpu_pmu);
> + if (ret)
> + return ret;
> +
> + ret = branch_records_alloc(cpu_pmu);
> + if (ret) {
> + armv8pmu_task_ctx_cache_free(cpu_pmu);
> + return ret;
> + }
> + }
> + return 0;
> }
>
> static void armv8pmu_disable_user_access_ipi(void *unused)
> @@ -1304,6 +1441,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
> cpu_pmu->set_event_filter = armv8pmu_set_event_filter;
>
> cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
> + cpu_pmu->sched_task = armv8pmu_sched_task;
> + cpu_pmu->branch_reset = armv8pmu_branch_reset;
>
> cpu_pmu->name = name;
> cpu_pmu->map_event = map_event;
> diff --git a/drivers/perf/arm_pmuv3_branch.h b/drivers/perf/arm_pmuv3_branch.h
> new file mode 100644
> index 000000000000..609e4d4ccac6
> --- /dev/null
> +++ b/drivers/perf/arm_pmuv3_branch.h
> @@ -0,0 +1,50 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Branch Record Buffer Extension Helpers.
> + *
> + * Copyright (C) 2022-2023 ARM Limited
> + *
> + * Author: Anshuman Khandual <[email protected]>
> + */
> +#include <linux/perf/arm_pmu.h>
> +
> +static inline void armv8pmu_branch_reset(void)
> +{
> +}
> +
> +static inline void armv8pmu_branch_probe(struct arm_pmu *arm_pmu)
> +{
> +}
> +
> +static inline bool armv8pmu_branch_attr_valid(struct perf_event *event)
> +{
> + WARN_ON_ONCE(!has_branch_stack(event));
> + return false;
> +}
> +
> +static inline void armv8pmu_branch_enable(struct arm_pmu *arm_pmu)
> +{
> +}
> +
> +static inline void armv8pmu_branch_disable(void)
> +{
> +}
> +
> +static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc,
> + struct perf_event *event)
> +{
> + WARN_ON_ONCE(!has_branch_stack(event));
> +}
> +
> +static inline void armv8pmu_branch_save(struct arm_pmu *arm_pmu, void *ctx)
> +{
> +}
> +
> +static inline int armv8pmu_task_ctx_cache_alloc(struct arm_pmu *arm_pmu)
> +{
> + return 0;
> +}
> +
> +static inline void armv8pmu_task_ctx_cache_free(struct arm_pmu *arm_pmu)
> +{
> +}
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index b3b34f6670cf..8cfcc735c0f7 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -46,6 +46,18 @@ static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_63BIT) == ARMPMU_EVT_63BIT);
> }, \
> }
>
> +/*
> + * Maximum branch record entries which could be processed
> + * for core perf branch stack sampling support, regardless
> + * of the hardware support available on a given ARM PMU.
> + */
> +#define MAX_BRANCH_RECORDS 64
> +
> +struct branch_records {
> + struct perf_branch_stack branch_stack;
> + struct perf_branch_entry branch_entries[MAX_BRANCH_RECORDS];
> +};
> +
> /* The events for a given PMU register set. */
> struct pmu_hw_events {
> /*
> @@ -66,6 +78,17 @@ struct pmu_hw_events {
> struct arm_pmu *percpu_pmu;
>
> int irq;
> +
> + struct branch_records *branches;
> +
> + /* Active context for task events */
> + void *brbe_context;
> +
> + /* Active events requesting branch records */

Please see my comment above.

> + unsigned int brbe_users;
> +
> + /* Active branch sample type filters */
> + unsigned long brbe_sample_type;
> };
>
> enum armpmu_attr_groups {
> @@ -96,8 +119,12 @@ struct arm_pmu {
> void (*stop)(struct arm_pmu *);
> void (*reset)(void *);
> int (*map_event)(struct perf_event *event);
> + void (*sched_task)(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
> + void (*branch_reset)(void);
> int num_events;
> - bool secure_access; /* 32-bit ARM only */
> + unsigned int secure_access : 1, /* 32-bit ARM only */
> + has_branch_stack: 1, /* 64-bit ARM only */
> + reserved : 30;
> #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
> DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
> #define ARMV8_PMUV3_EXT_COMMON_EVENT_BASE 0x4000
> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index 46377e134d67..c3e7d2cfb737 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -308,5 +308,4 @@
> default: WARN(1, "Invalid PMEV* index\n"); \
> } \
> } while (0)
> -

nit: Unrelated change?

Suzuki

> #endif


2024-01-25 14:21:19

by Mark Brown

Subject: Re: [PATCH V16 1/8] arm64/sysreg: Add BRBE registers and fields

On Thu, Jan 25, 2024 at 03:11:12PM +0530, Anshuman Khandual wrote:
> This adds BRBE related register definitions and various other related field
> macros there in. These will be used subsequently in a BRBE driver, which is
> being added later on.

Checked against DDI0601 2023-12.

Reviewed-by: Mark Brown <[email protected]>
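
As an illustration of the BRBINFx_EL1 layout added to arch/arm64/tools/sysreg in this patch, the following stand-alone sketch pulls the individual fields out of a raw 64-bit value using the bit positions listed there (VALID 1:0, MPRED 5, EL 7:6, TYPE 13:8, T 16, LASTFAILED 17, CC 45:32, CCU 46). The struct and the helper names bits() and brbinf_decode() are hypothetical; the kernel itself would use the generated field macros instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of one BRBINF<n>_EL1 value. */
struct brbinf_fields {
	unsigned int valid;      /* bits 1:0   */
	unsigned int mpred;      /* bit 5      */
	unsigned int el;         /* bits 7:6   */
	unsigned int type;       /* bits 13:8  */
	unsigned int t;          /* bit 16     */
	unsigned int lastfailed; /* bit 17     */
	unsigned int cc;         /* bits 45:32 */
	unsigned int ccu;        /* bit 46     */
};

/* Extract bits hi:lo (inclusive); only used for fields narrower than 64 bits. */
static uint64_t bits(uint64_t v, unsigned int hi, unsigned int lo)
{
	assert(hi >= lo && hi - lo < 63);
	return (v >> lo) & ((UINT64_C(1) << (hi - lo + 1)) - 1);
}

static struct brbinf_fields brbinf_decode(uint64_t v)
{
	struct brbinf_fields f;

	f.valid      = (unsigned int)bits(v, 1, 0);
	f.mpred      = (unsigned int)bits(v, 5, 5);
	f.el         = (unsigned int)bits(v, 7, 6);
	f.type       = (unsigned int)bits(v, 13, 8);
	f.t          = (unsigned int)bits(v, 16, 16);
	f.lastfailed = (unsigned int)bits(v, 17, 17);
	f.cc         = (unsigned int)bits(v, 45, 32);
	f.ccu        = (unsigned int)bits(v, 46, 46);
	return f;
}
```

For example, a value with VALID = 0b11 (FULL), EL = 0b01 (EL1) and TYPE = 0b000101 (RET) decodes to valid == 3, el == 1 and type == 5.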


2024-01-29 04:43:20

by Anshuman Khandual

Subject: Re: [PATCH V16 3/8] drivers: perf: arm_pmuv3: Enable branch stack sampling framework



On 1/25/24 19:14, Suzuki K Poulose wrote:
> On 25/01/2024 09:41, Anshuman Khandual wrote:
>> Branch stack sampling support i.e capturing branch records during execution
>> in core perf, rides along with normal HW events being scheduled on the PMU.
>> This prepares ARMV8 PMU framework for branch stack support on relevant PMUs
>> with required HW implementation.
>>
>> ARMV8 PMU hardware support for branch stack sampling is indicated via a new
>> feature flag called 'has_branch_stack' that can be ascertained via probing.
>> This modifies current gate in armpmu_event_init() which blocks branch stack
>> sampling based perf events unconditionally. Instead allows such perf events
>> getting initialized on supporting PMU hardware.
>>
>> Branch stack sampling is enabled and disabled along with regular PMU events
>> . This adds required function callbacks in armv8pmu_branch_xxx() format, to
>> drive the PMU branch stack hardware when supported. This also adds fallback
>> stub definitions for these callbacks for PMUs which would not have required
>> support.
>>
>> If a task gets scheduled out, the current branch records get saved in the
>> task's context data, which can be later used to fill in the records upon an
>> event overflow. Hence, we enable PERF_ATTACH_TASK_DATA (event->attach_state
>> based flag) for branch stack requesting perf events. But this also requires
>> adding support for pmu::sched_task() callback to arm_pmu.
>>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> Changes in V16:
>>
>> - Renamed arm_brbe.h as arm_pmuv3_branch.h
>> - Updated perf_sample_save_brstack()'s new argument requirements with NULL
>>
>>   drivers/perf/arm_pmu.c          |  57 ++++++++++++-
>>   drivers/perf/arm_pmuv3.c        | 141 +++++++++++++++++++++++++++++++-
>>   drivers/perf/arm_pmuv3_branch.h |  50 +++++++++++
>>   include/linux/perf/arm_pmu.h    |  29 ++++++-
>>   include/linux/perf/arm_pmuv3.h  |   1 -
>>   5 files changed, 273 insertions(+), 5 deletions(-)
>>   create mode 100644 drivers/perf/arm_pmuv3_branch.h
>>
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 8458fe2cebb4..16f488ae7747 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -317,6 +317,15 @@ armpmu_del(struct perf_event *event, int flags)
>>       struct hw_perf_event *hwc = &event->hw;
>>       int idx = hwc->idx;
>>   +    if (has_branch_stack(event)) {
>> +        WARN_ON_ONCE(!hw_events->brbe_users);
>> +        hw_events->brbe_users--;
>
> ^^ Should we do this only if the event matches the sample type ? Put the other way around, what does brbe_users track ? The number of events that
> "share" the BRBE instance ? Or the number of active events that
> has_branch_stack() ?

It tracks the number of active perf events with has_branch_stack(),
irrespective of whether their branch_sample_type matches the currently
configured one, i.e. regardless of whether the captured branch records
end up being consumed.
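
To make those semantics concrete, here is a reduced user-space model of the bookkeeping in armpmu_add(), armpmu_del() and read_branch_records() as quoted in this thread. It is only a sketch: the model_* types and functions are hypothetical stand-ins, a CPU bound event is represented by a NULL task_ctx, and the buffer resets are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct model_event {
	int has_branch_stack;             /* branch stack sampling requested */
	unsigned long branch_sample_type; /* requested branch filters */
	void *task_ctx;                   /* NULL models a CPU bound event */
};

struct model_hw_events {
	unsigned int brbe_users;
	unsigned long brbe_sample_type;
	void *brbe_context;
};

/* Every branch stack event is counted, whether or not its filters match. */
static void model_add(struct model_hw_events *hw, const struct model_event *ev)
{
	if (!ev->has_branch_stack)
		return;
	hw->brbe_context = ev->task_ctx; /* buffer reset omitted in this model */
	hw->brbe_users++;
	hw->brbe_sample_type = ev->branch_sample_type;
}

/* State is cleared only when the last branch stack event goes away. */
static void model_del(struct model_hw_events *hw, const struct model_event *ev)
{
	if (!ev->has_branch_stack)
		return;
	assert(hw->brbe_users > 0);
	if (--hw->brbe_users == 0) {
		hw->brbe_context = NULL;
		hw->brbe_sample_type = 0;
	}
}

/* Records are handed to an event only on a filter and context match. */
static int model_can_consume(const struct model_hw_events *hw,
			     const struct model_event *ev)
{
	if (hw->brbe_sample_type != ev->branch_sample_type)
		return 0;
	if (ev->task_ctx && hw->brbe_context != ev->task_ctx)
		return 0;
	return 1;
}
```

In this model two events with different filters both bump brbe_users, but only the one added last finds a matching brbe_sample_type at overflow time.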

>
>> +        if (!hw_events->brbe_users) {
>> +            hw_events->brbe_context = NULL;
>> +            hw_events->brbe_sample_type = 0;
>> +        }
>> +    }
>> +
>>       armpmu_stop(event, PERF_EF_UPDATE);
>>       hw_events->events[idx] = NULL;
>>       armpmu->clear_event_idx(hw_events, event);
>> @@ -333,6 +342,38 @@ armpmu_add(struct perf_event *event, int flags)
>>       struct hw_perf_event *hwc = &event->hw;
>>       int idx;
>>   +    if (has_branch_stack(event)) {
>> +        /*
>> +         * Reset branch records buffer if a new CPU bound event
>> +         * gets scheduled on a PMU. Otherwise existing branch
>> +         * records present in the buffer might just leak into
>> +         * such events.
>> +         *
>> +         * Also reset current 'hw_events->brbe_context' because
>> +         * any previous task bound event now would have lost an
>> +         * opportunity for continuous branch records.
>> +         */
>> +        if (!event->ctx->task) {
>> +            hw_events->brbe_context = NULL;
>> +            if (armpmu->branch_reset)
>> +                armpmu->branch_reset();
>> +        }
>> +
>> +        /*
>> +         * Reset branch records buffer if a new task event gets
>> +         * scheduled on a PMU which might have existing records.
>> +         * Otherwise older branch records present in the buffer
>> +         * might leak into the new task event.
>> +         */
>> +        if (event->ctx->task && hw_events->brbe_context != event->ctx) {
>> +            hw_events->brbe_context = event->ctx;
>> +            if (armpmu->branch_reset)
>> +                armpmu->branch_reset();
>> +        }
>> +        hw_events->brbe_users++;
>> +        hw_events->brbe_sample_type = event->attr.branch_sample_type;
>> +    }
>> +
>>       /* An event following a process won't be stopped earlier */
>>       if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
>>           return -ENOENT;
>> @@ -511,13 +552,24 @@ static int armpmu_event_init(struct perf_event *event)
>>           !cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
>>           return -ENOENT;
>>   -    /* does not support taken branch sampling */
>> -    if (has_branch_stack(event))
>> +    /*
>> +     * Branch stack sampling events are allowed
>> +     * only on PMU which has required support.
>> +     */
>> +    if (has_branch_stack(event) && !armpmu->has_branch_stack)
>>           return -EOPNOTSUPP;
>>         return __hw_perf_event_init(event);
>>   }
>>   +static void armpmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
>> +{
>> +    struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
>> +
>> +    if (armpmu->sched_task)
>> +        armpmu->sched_task(pmu_ctx, sched_in);
>> +}
>> +
>>   static void armpmu_enable(struct pmu *pmu)
>>   {
>>       struct arm_pmu *armpmu = to_arm_pmu(pmu);
>> @@ -864,6 +916,7 @@ struct arm_pmu *armpmu_alloc(void)
>>       }
>>         pmu->pmu = (struct pmu) {
>> +        .sched_task    = armpmu_sched_task,
>>           .pmu_enable    = armpmu_enable,
>>           .pmu_disable    = armpmu_disable,
>>           .event_init    = armpmu_event_init,
>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>> index 23fa6c5da82c..9e17764a0929 100644
>> --- a/drivers/perf/arm_pmuv3.c
>> +++ b/drivers/perf/arm_pmuv3.c
>> @@ -26,6 +26,7 @@
>>   #include <linux/nmi.h>
>>     #include <asm/arm_pmuv3.h>
>> +#include "arm_pmuv3_branch.h"
>>     /* ARMv8 Cortex-A53 specific event types. */
>>   #define ARMV8_A53_PERFCTR_PREF_LINEFILL                0xC2
>> @@ -829,14 +830,56 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu)
>>       armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
>>         kvm_vcpu_pmu_resync_el0();
>> +    if (cpu_pmu->has_branch_stack)
>> +        armv8pmu_branch_enable(cpu_pmu);
>
> Is there a reason why do this after kvm_vcpu_pmu_resync_el0() ?
> Ideally, we should get counting as soon as the PMU is on ?

But if the kernel is also being traced, branches from kvm_vcpu_pmu_resync_el0()
might get into the final BRBE samples as well. Placing armv8pmu_branch_enable()
last helps avoid such situations. That said, the same could be argued about the
normal PMU event counters, which are enabled via armv8pmu_pmcr_write() before
that call. So armv8pmu_branch_enable() could be moved before
kvm_vcpu_pmu_resync_el0().

>
>>   }
>>     static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
>>   {
>> +    if (cpu_pmu->has_branch_stack)
>> +        armv8pmu_branch_disable();
>> +
>>       /* Disable all counters */
>>       armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
>>   }
>>   +static void read_branch_records(struct pmu_hw_events *cpuc,
>> +                struct perf_event *event,
>> +                struct perf_sample_data *data,
>> +                bool *branch_captured)
>> +{
>> +    /*
>> +     * CPU specific branch records buffer must have been allocated already
>> +     * for the hardware records to be captured and processed further.
>> +     */
>> +    if (WARN_ON(!cpuc->branches))
>> +        return;
>> +
>> +    /*
>> +     * Overflowed event's branch_sample_type does not match the configured
>> +     * branch filters in the BRBE HW. So the captured branch records here
>> +     * cannot be co-related to the overflowed event. Report to the user as
>> +     * if no branch records have been captured, and flush branch records.
>> +     * The same scenario is applicable when the current task context does
>> +     * not match with overflown event.
>> +     */
>> +    if ((cpuc->brbe_sample_type != event->attr.branch_sample_type) ||
>> +        (event->ctx->task && cpuc->brbe_context != event->ctx))
>> +        return;
>> +
>> +    /*
>> +     * Read the branch records from the hardware once after the PMU IRQ
>> +     * has been triggered but subsequently same records can be used for
>> +     * other events that might have been overflowed simultaneously thus
>> +     * saving much CPU cycles.
>> +     */
>> +    if (!*branch_captured) {
>> +        armv8pmu_branch_read(cpuc, event);
>> +        *branch_captured = true;
>> +    }
>> +    perf_sample_save_brstack(data, event, &cpuc->branches->branch_stack, NULL);
>> +}
>> +
>>   static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>>   {
>>       u32 pmovsr;
>> @@ -844,6 +887,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>>       struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
>>       struct pt_regs *regs;
>>       int idx;
>> +    bool branch_captured = false;
>>         /*
>>        * Get and reset the IRQ flags
>> @@ -887,6 +931,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>>           if (!armpmu_event_set_period(event))
>>               continue;
>>   +        /*
>> +         * PMU IRQ should remain asserted until all branch records
>> +         * are captured and processed into struct perf_sample_data.
>> +         */
>> +        if (has_branch_stack(event) && cpu_pmu->has_branch_stack)
>
> nit: Do we really need the cpu_pmu->has_branch_stack check ? The event
> wouldn't reach here if the PMU doesn't support it ?

This is just an additional check. An event with has_branch_stack() should
not have reached here otherwise, i.e. without the PMU supporting branch
records.

>
>> +            read_branch_records(cpuc, event, &data, &branch_captured);
>> +
>>           /*
>>            * Perf event overflow will queue the processing of the event as
>>            * an irq_work which will be taken care of in the handling of
>> @@ -896,6 +947,8 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>>               cpu_pmu->disable(event);
>>       }
>>       armv8pmu_start(cpu_pmu);
>> +    if (cpu_pmu->has_branch_stack)
>> +        armv8pmu_branch_reset();
>>         return IRQ_HANDLED;
>>   }
>> @@ -985,6 +1038,24 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
>>       return event->hw.idx;
>>   }
>>   +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
>> +{
>> +    struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
>> +    void *task_ctx = pmu_ctx->task_ctx_data;
>> +
>> +    if (armpmu->has_branch_stack) {
>> +        /* Save branch records in task_ctx on sched out */
>> +        if (task_ctx && !sched_in) {
>> +            armv8pmu_branch_save(armpmu, task_ctx);
>> +            return;
>> +        }
>> +
>> +        /* Reset branch records on sched in */
>> +        if (sched_in)
>> +            armv8pmu_branch_reset();
>> +    }
>> +}
>> +
>>   /*
>>    * Add an event filter to a given event.
>>    */
>> @@ -1077,6 +1148,9 @@ static void armv8pmu_reset(void *info)
>>           pmcr |= ARMV8_PMU_PMCR_LP;
>>         armv8pmu_pmcr_write(pmcr);
>> +
>> +    if (cpu_pmu->has_branch_stack)
>> +        armv8pmu_branch_reset();
>>   }
>>     static int __armv8_pmuv3_map_event_id(struct arm_pmu *armpmu,
>> @@ -1114,6 +1188,20 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
>>         hw_event_id = __armv8_pmuv3_map_event_id(armpmu, event);
>>   +    if (has_branch_stack(event)) {
>> +        if (!armv8pmu_branch_attr_valid(event))
>> +            return -EOPNOTSUPP;
>> +
>> +        /*
>> +         * If a task gets scheduled out, the current branch records
>> +         * get saved in the task's context data, which can be later
>> +         * used to fill in the records upon an event overflow. Let's
>> +         * enable PERF_ATTACH_TASK_DATA in 'event->attach_state' for
>> +         * all branch stack sampling perf events.
>> +         */
>> +        event->attach_state |= PERF_ATTACH_TASK_DATA;
>> +    }
>> +
>>       /*
>>        * CHAIN events only work when paired with an adjacent counter, and it
>>        * never makes sense for a user to open one in isolation, as they'll be
>> @@ -1229,6 +1317,41 @@ static void __armv8pmu_probe_pmu(void *info)
>>           cpu_pmu->reg_pmmir = read_pmmir();
>>       else
>>           cpu_pmu->reg_pmmir = 0;
>> +
>> +    /*
>> +     * BRBE is being probed on a single cpu for a
>> +     * given PMU. The remaining cpus, are assumed
>> +     * to have the exact same BRBE implementation.
>> +     */
>> +    armv8pmu_branch_probe(cpu_pmu);
>> +}
>> +
>> +static int branch_records_alloc(struct arm_pmu *armpmu)
>> +{
>> +    struct branch_records __percpu *records;
>> +    int cpu;
>> +
>> +    records = alloc_percpu_gfp(struct branch_records, GFP_KERNEL);
>> +    if (!records)
>> +        return -ENOMEM;
>> +
>> +    /*
>> +     * percpu memory allocated for 'records' gets completely consumed
>> +     * here, and never required to be freed up later. So permanently
>> +     * losing access to this anchor i.e 'records' is acceptable.
>> +     *
>> +     * Otherwise this allocation handle would have to be saved up for
>> +     * free_percpu() release later if required.
>> +     */
>> +    for_each_possible_cpu(cpu) {
>> +        struct pmu_hw_events *events_cpu;
>> +        struct branch_records *records_cpu;
>> +
>> +        events_cpu = per_cpu_ptr(armpmu->hw_events, cpu);
>> +        records_cpu = per_cpu_ptr(records, cpu);
>> +        events_cpu->branches = records_cpu;
>> +    }
>> +    return 0;
>>   }
>>     static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
>> @@ -1245,7 +1368,21 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
>>       if (ret)
>>           return ret;
>>   -    return probe.present ? 0 : -ENODEV;
>> +    if (!probe.present)
>> +        return -ENODEV;
>> +
>> +    if (cpu_pmu->has_branch_stack) {
>> +        ret = armv8pmu_task_ctx_cache_alloc(cpu_pmu);
>> +        if (ret)
>> +            return ret;
>> +
>> +        ret = branch_records_alloc(cpu_pmu);
>> +        if (ret) {
>> +            armv8pmu_task_ctx_cache_free(cpu_pmu);
>> +            return ret;
>> +        }
>> +    }
>> +    return 0;
>>   }
>>     static void armv8pmu_disable_user_access_ipi(void *unused)
>> @@ -1304,6 +1441,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
>>       cpu_pmu->set_event_filter    = armv8pmu_set_event_filter;
>>         cpu_pmu->pmu.event_idx        = armv8pmu_user_event_idx;
>> +    cpu_pmu->sched_task        = armv8pmu_sched_task;
>> +    cpu_pmu->branch_reset        = armv8pmu_branch_reset;
>>         cpu_pmu->name            = name;
>>       cpu_pmu->map_event        = map_event;
>> diff --git a/drivers/perf/arm_pmuv3_branch.h b/drivers/perf/arm_pmuv3_branch.h
>> new file mode 100644
>> index 000000000000..609e4d4ccac6
>> --- /dev/null
>> +++ b/drivers/perf/arm_pmuv3_branch.h
>> @@ -0,0 +1,50 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Branch Record Buffer Extension Helpers.
>> + *
>> + * Copyright (C) 2022-2023 ARM Limited
>> + *
>> + * Author: Anshuman Khandual <[email protected]>
>> + */
>> +#include <linux/perf/arm_pmu.h>
>> +
>> +static inline void armv8pmu_branch_reset(void)
>> +{
>> +}
>> +
>> +static inline void armv8pmu_branch_probe(struct arm_pmu *arm_pmu)
>> +{
>> +}
>> +
>> +static inline bool armv8pmu_branch_attr_valid(struct perf_event *event)
>> +{
>> +    WARN_ON_ONCE(!has_branch_stack(event));
>> +    return false;
>> +}
>> +
>> +static inline void armv8pmu_branch_enable(struct arm_pmu *arm_pmu)
>> +{
>> +}
>> +
>> +static inline void armv8pmu_branch_disable(void)
>> +{
>> +}
>> +
>> +static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc,
>> +                    struct perf_event *event)
>> +{
>> +    WARN_ON_ONCE(!has_branch_stack(event));
>> +}
>> +
>> +static inline void armv8pmu_branch_save(struct arm_pmu *arm_pmu, void *ctx)
>> +{
>> +}
>> +
>> +static inline int armv8pmu_task_ctx_cache_alloc(struct arm_pmu *arm_pmu)
>> +{
>> +    return 0;
>> +}
>> +
>> +static inline void armv8pmu_task_ctx_cache_free(struct arm_pmu *arm_pmu)
>> +{
>> +}
>> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
>> index b3b34f6670cf..8cfcc735c0f7 100644
>> --- a/include/linux/perf/arm_pmu.h
>> +++ b/include/linux/perf/arm_pmu.h
>> @@ -46,6 +46,18 @@ static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_63BIT) == ARMPMU_EVT_63BIT);
>>       },                                \
>>   }
>>   +/*
>> + * Maximum branch record entries which could be processed
>> + * for core perf branch stack sampling support, regardless
>> + * of the hardware support available on a given ARM PMU.
>> + */
>> +#define MAX_BRANCH_RECORDS 64
>> +
>> +struct branch_records {
>> +    struct perf_branch_stack    branch_stack;
>> +    struct perf_branch_entry    branch_entries[MAX_BRANCH_RECORDS];
>> +};
>> +
>>   /* The events for a given PMU register set. */
>>   struct pmu_hw_events {
>>       /*
>> @@ -66,6 +78,17 @@ struct pmu_hw_events {
>>       struct arm_pmu        *percpu_pmu;
>>         int irq;
>> +
>> +    struct branch_records    *branches;
>> +
>> +    /* Active context for task events */
>> +    void            *brbe_context;
>> +
>> +    /* Active events requesting branch records */
>
> Please see my comment above.

This is true - brbe_users tracks the number of active perf events requesting
branch records, irrespective of whether their branch_sample_type values match
with each other or not.
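
To illustrate that bookkeeping, here is a minimal user-space sketch. The
struct and helper names below are hypothetical (they are not the driver's),
and which event's filter ends up programmed (the most recently added one
here) is an assumption made only for this example. The point is that the
user count moves for every requesting event, while records are attributed
only to an event whose branch_sample_type matches the programmed filter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the BRBE members of struct pmu_hw_events. */
struct brbe_state {
	unsigned int  users;        /* events requesting branch records */
	unsigned long sample_type;  /* filter currently programmed */
};

/* Count every requesting event, whatever its filter is. */
static void brbe_add_user(struct brbe_state *s, unsigned long sample_type)
{
	s->users++;
	/* Assumption for this sketch: the latest event's filter wins. */
	s->sample_type = sample_type;
}

static void brbe_del_user(struct brbe_state *s)
{
	if (s->users)
		s->users--;
}

/* Records are only usable for an event matching the programmed filter. */
static bool brbe_records_usable(const struct brbe_state *s,
				unsigned long sample_type)
{
	return s->users > 0 && s->sample_type == sample_type;
}
```

With two events using different filters, users becomes 2 but only the event
matching the programmed filter would get records reported.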

>
>> +    unsigned int        brbe_users;
>> +
>> +    /* Active branch sample type filters */
>> +    unsigned long        brbe_sample_type;
>>   };
>>     enum armpmu_attr_groups {
>> @@ -96,8 +119,12 @@ struct arm_pmu {
>>       void        (*stop)(struct arm_pmu *);
>>       void        (*reset)(void *);
>>       int        (*map_event)(struct perf_event *event);
>> +    void        (*sched_task)(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
>> +    void        (*branch_reset)(void);
>>       int        num_events;
>> -    bool        secure_access; /* 32-bit ARM only */
>> +    unsigned int    secure_access    : 1, /* 32-bit ARM only */
>> +            has_branch_stack: 1, /* 64-bit ARM only */
>> +            reserved    : 30;
>>   #define ARMV8_PMUV3_MAX_COMMON_EVENTS        0x40
>>       DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
>>   #define ARMV8_PMUV3_EXT_COMMON_EVENT_BASE    0x4000
>> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
>> index 46377e134d67..c3e7d2cfb737 100644
>> --- a/include/linux/perf/arm_pmuv3.h
>> +++ b/include/linux/perf/arm_pmuv3.h
>> @@ -308,5 +308,4 @@
>>           default: WARN(1, "Invalid PMEV* index\n");    \
>>           }                        \
>>       } while (0)
>> -
>
> nit: Unrelated change ?

Right, will fix and fold.

2024-01-29 12:16:13

by Suzuki K Poulose

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions

On 25/01/2024 09:41, Anshuman Khandual wrote:
> Currently BRBE feature is not supported in a guest environment. This hides
> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field. This also
> blocks guest accesses into BRBE system registers and instructions as if the
> underlying hardware never implemented FEAT_BRBE feature.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Oliver Upton <[email protected]>
> Cc: James Morse <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> Changes in V16:
>
> - Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
>
> arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 56 insertions(+)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 30253bd19917..6a06dc2f0c06 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1304,6 +1304,11 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
> return 0;
> }
>
> +#define BRB_INF_SRC_TGT_EL1(n) \
> + { SYS_DESC(SYS_BRBINF##n##_EL1), undef_access }, \
> + { SYS_DESC(SYS_BRBSRC##n##_EL1), undef_access }, \
> + { SYS_DESC(SYS_BRBTGT##n##_EL1), undef_access } \
> +
> /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
> #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
> { SYS_DESC(SYS_DBGBVRn_EL1(n)), \
> @@ -1707,6 +1712,9 @@ static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
> /* Hide SPE from guests */
> val &= ~ID_AA64DFR0_EL1_PMSVer_MASK;
>
> + /* Hide BRBE from guests */
> + val &= ~ID_AA64DFR0_EL1_BRBE_MASK;
> +
> return val;
> }
>
> @@ -2195,6 +2203,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> { SYS_DESC(SYS_DC_CISW), access_dcsw },
> { SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
> { SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
> + { SYS_DESC(OP_BRB_IALL), undef_access },
> + { SYS_DESC(OP_BRB_INJ), undef_access },
>

heads up: This may conflict with Marc's patches to move the sys
instructions to a separate table. But otherwise, looks good to me.


Suzuki


2024-01-29 12:20:26

by Suzuki K Poulose

Subject: Re: [PATCH V16 5/8] KVM: arm64: nvhe: Disable branch generation in nVHE guests

On 25/01/2024 09:41, Anshuman Khandual wrote:
> Disable the BRBE before we enter the guest, saving the status and enable it
> back once we get out of the guest. This avoids capturing branch records in
> the guest kernel or userspace, which would be confusing the host samples.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Oliver Upton <[email protected]>
> Cc: James Morse <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> CC: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> Changes in V16:
>
> - Dropped BRBCR_EL1 and BRBFCR_EL1 from enum vcpu_sysreg
> - Reverted back the KVM NVHE patch - used host_debug_state based 'brbcr_el1'
> element, and dropped the previous dependency on James's coresight series
>
> arch/arm64/include/asm/kvm_host.h | 5 ++++-
> arch/arm64/kvm/debug.c | 5 +++++
> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 ++++++++++++++++++++++++++++++
> 3 files changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 21c57b812569..bce8792092af 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -569,7 +569,7 @@ struct kvm_vcpu_arch {
> u8 cflags;
>
> /* Input flags to the hypervisor code, potentially cleared after use */
> - u8 iflags;
> + u16 iflags;
>
> /* State flags for kernel bookkeeping, unused by the hypervisor code */
> u8 sflags;
> @@ -610,6 +610,7 @@ struct kvm_vcpu_arch {
> u64 pmscr_el1;
> /* Self-hosted trace */
> u64 trfcr_el1;
> + u64 brbcr_el1;
> } host_debug_state;
>
> /* VGIC state */
> @@ -779,6 +780,8 @@ struct kvm_vcpu_arch {
> #define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
> /* vcpu running in HYP context */
> #define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
> +/* Save BRBE context if active */
> +#define DEBUG_STATE_SAVE_BRBE __vcpu_single_flag(iflags, BIT(8))
>
> /* SVE enabled for host EL0 */
> #define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 8725291cb00a..99f85d8acbf3 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -335,10 +335,15 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
> if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
> !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
> vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> +
> + /* Check if we have BRBE implemented and available at the host */
> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT))
> + vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
> }
>
> void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
> {
> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> + vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
> }
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 4558c02eb352..79bcf0fb1326 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -79,6 +79,34 @@ static void __debug_restore_trace(u64 trfcr_el1)
> write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1);
> }
>
> +static void __debug_save_brbe(u64 *brbcr_el1)
> +{
> + *brbcr_el1 = 0;
> +
> + /* Check if the BRBE is enabled */
> + if (!(read_sysreg_s(SYS_BRBCR_EL1) & (BRBCR_ELx_E0BRE | BRBCR_ELx_ExBRE)))
> + return;
> +
> + /*
> + * Prohibit branch record generation while we are in guest.
> + * Since access to BRBCR_EL1 is trapped, the guest can't
> + * modify the filtering set by the host.
> + */
> + *brbcr_el1 = read_sysreg_s(SYS_BRBCR_EL1);
> + write_sysreg_s(0, SYS_BRBCR_EL1);
> + isb();

Is this isb() required here ? This can be synchronised with the Guest
entry ?

> +}
> +
> +static void __debug_restore_brbe(u64 brbcr_el1)
> +{
> + if (!brbcr_el1)
> + return;
> +
> + /* Restore BRBE controls */
> + write_sysreg_s(brbcr_el1, SYS_BRBCR_EL1);
> + isb();

Similarly here, exit back to EL1 host can synchronise the setting ?

Suzuki

> +}
> +
> void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
> {
> /* Disable and flush SPE data generation */
> @@ -87,6 +115,9 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
> /* Disable and flush Self-Hosted Trace generation */
> if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
> __debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
> + /* Disable BRBE branch records */
> + if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_BRBE))
> + __debug_save_brbe(&vcpu->arch.host_debug_state.brbcr_el1);
> }
>
> void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
> @@ -100,6 +131,8 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
> __debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
> if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
> __debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
> + if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_BRBE))
> + __debug_restore_brbe(vcpu->arch.host_debug_state.brbcr_el1);
> }
>
> void __debug_switch_to_host(struct kvm_vcpu *vcpu)


2024-01-30 03:41:51

by Anshuman Khandual

Subject: Re: [PATCH V16 5/8] KVM: arm64: nvhe: Disable branch generation in nVHE guests



On 1/29/24 17:50, Suzuki K Poulose wrote:
> On 25/01/2024 09:41, Anshuman Khandual wrote:
>> Disable the BRBE before we enter the guest, saving the status and enable it
>> back once we get out of the guest. This avoids capturing branch records in
>> the guest kernel or userspace, which would be confusing the host samples.
>>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Oliver Upton <[email protected]>
>> Cc: James Morse <[email protected]>
>> Cc: Suzuki K Poulose <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> CC: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> Changes in V16:
>>
>> - Dropped BRBCR_EL1 and BRBFCR_EL1 from enum vcpu_sysreg
>> - Reverted back the KVM NVHE patch - used host_debug_state based 'brbcr_el1'
>>    element, and dropped the previous dependency on James's coresight series
>>
>>   arch/arm64/include/asm/kvm_host.h  |  5 ++++-
>>   arch/arm64/kvm/debug.c             |  5 +++++
>>   arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 ++++++++++++++++++++++++++++++
>>   3 files changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 21c57b812569..bce8792092af 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -569,7 +569,7 @@ struct kvm_vcpu_arch {
>>       u8 cflags;
>>         /* Input flags to the hypervisor code, potentially cleared after use */
>> -    u8 iflags;
>> +    u16 iflags;
>>         /* State flags for kernel bookkeeping, unused by the hypervisor code */
>>       u8 sflags;
>> @@ -610,6 +610,7 @@ struct kvm_vcpu_arch {
>>           u64 pmscr_el1;
>>           /* Self-hosted trace */
>>           u64 trfcr_el1;
>> +        u64 brbcr_el1;
>>       } host_debug_state;
>>         /* VGIC state */
>> @@ -779,6 +780,8 @@ struct kvm_vcpu_arch {
>>   #define DEBUG_STATE_SAVE_TRBE    __vcpu_single_flag(iflags, BIT(6))
>>   /* vcpu running in HYP context */
>>   #define VCPU_HYP_CONTEXT    __vcpu_single_flag(iflags, BIT(7))
>> +/* Save BRBE context if active  */
>> +#define DEBUG_STATE_SAVE_BRBE    __vcpu_single_flag(iflags, BIT(8))
>>     /* SVE enabled for host EL0 */
>>   #define HOST_SVE_ENABLED    __vcpu_single_flag(sflags, BIT(0))
>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> index 8725291cb00a..99f85d8acbf3 100644
>> --- a/arch/arm64/kvm/debug.c
>> +++ b/arch/arm64/kvm/debug.c
>> @@ -335,10 +335,15 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
>>       if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
>>           !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
>>           vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
>> +
>> +    /* Check if we have BRBE implemented and available at the host */
>> +    if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT))
>> +        vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
>>   }
>>     void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
>>   {
>>       vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
>>       vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
>> +    vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
>>   }
>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> index 4558c02eb352..79bcf0fb1326 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> @@ -79,6 +79,34 @@ static void __debug_restore_trace(u64 trfcr_el1)
>>       write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1);
>>   }
>>   +static void __debug_save_brbe(u64 *brbcr_el1)
>> +{
>> +    *brbcr_el1 = 0;
>> +
>> +    /* Check if the BRBE is enabled */
>> +    if (!(read_sysreg_s(SYS_BRBCR_EL1) & (BRBCR_ELx_E0BRE | BRBCR_ELx_ExBRE)))
>> +        return;
>> +
>> +    /*
>> +     * Prohibit branch record generation while we are in guest.
>> +     * Since access to BRBCR_EL1 is trapped, the guest can't
>> +     * modify the filtering set by the host.
>> +     */
>> +    *brbcr_el1 = read_sysreg_s(SYS_BRBCR_EL1);
>> +    write_sysreg_s(0, SYS_BRBCR_EL1);
>> +    isb();
>
> Is this isb() required here ? This can be synchronised with the Guest entry ?
>
>> +}
>> +
>> +static void __debug_restore_brbe(u64 brbcr_el1)
>> +{
>> +    if (!brbcr_el1)
>> +        return;
>> +
>> +    /* Restore BRBE controls */
>> +    write_sysreg_s(brbcr_el1, SYS_BRBCR_EL1);
>> +    isb();
>
> Similarly here, exit back to EL1 host can synchronise the setting ?

Sure, will drop both isb() calls here.
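
With the barriers gone, the save/restore logic reduces to roughly the
following. This is a user-space model only: fake_brbcr_el1 stands in for the
real register, the model_* names are hypothetical, and the bit positions are
taken from the BRBCR_ELx field description in patch 1. It does not model
synchronisation, which would be left to guest entry and exit.

```c
#include <stdint.h>

#define BRBCR_E0BRE	(UINT64_C(1) << 0)	/* bit 0 in BRBCR_ELx */
#define BRBCR_ExBRE	(UINT64_C(1) << 1)	/* bit 1 in BRBCR_ELx */

/* Simulated BRBCR_EL1; the real code uses read/write_sysreg_s(). */
static uint64_t fake_brbcr_el1;

static void model_save_brbe(uint64_t *saved)
{
	*saved = 0;

	/* Nothing to do if BRBE is not enabled at EL0 or EL1 */
	if (!(fake_brbcr_el1 & (BRBCR_E0BRE | BRBCR_ExBRE)))
		return;

	/* Remember the host controls, then prohibit record generation */
	*saved = fake_brbcr_el1;
	fake_brbcr_el1 = 0;
}

static void model_restore_brbe(uint64_t saved)
{
	/* A zero value means BRBE was not enabled when saving */
	if (!saved)
		return;

	fake_brbcr_el1 = saved;
}
```

Note that a saved value of zero doubles as the "was not enabled" marker, so
a register with only non-enable bits set is left untouched on both paths.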

2024-01-30 03:43:49

by Anshuman Khandual

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions



On 1/29/24 17:45, Suzuki K Poulose wrote:
> On 25/01/2024 09:41, Anshuman Khandual wrote:
>> Currently BRBE feature is not supported in a guest environment. This hides
>> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field. This also
>> blocks guest accesses into BRBE system registers and instructions as if the
>> underlying hardware never implemented FEAT_BRBE feature.
>>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Oliver Upton <[email protected]>
>> Cc: James Morse <[email protected]>
>> Cc: Suzuki K Poulose <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> Changes in V16:
>>
>> - Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
>>
>>   arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 56 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 30253bd19917..6a06dc2f0c06 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -1304,6 +1304,11 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
>>       return 0;
>>   }
>>   +#define BRB_INF_SRC_TGT_EL1(n)                    \
>> +    { SYS_DESC(SYS_BRBINF##n##_EL1), undef_access },    \
>> +    { SYS_DESC(SYS_BRBSRC##n##_EL1), undef_access },    \
>> +    { SYS_DESC(SYS_BRBTGT##n##_EL1), undef_access }        \
>> +
>>   /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
>>   #define DBG_BCR_BVR_WCR_WVR_EL1(n)                    \
>>       { SYS_DESC(SYS_DBGBVRn_EL1(n)),                    \
>> @@ -1707,6 +1712,9 @@ static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
>>       /* Hide SPE from guests */
>>       val &= ~ID_AA64DFR0_EL1_PMSVer_MASK;
>>   +    /* Hide BRBE from guests */
>> +    val &= ~ID_AA64DFR0_EL1_BRBE_MASK;
>> +
>>       return val;
>>   }
>>   @@ -2195,6 +2203,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>       { SYS_DESC(SYS_DC_CISW), access_dcsw },
>>       { SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
>>       { SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
>> +    { SYS_DESC(OP_BRB_IALL), undef_access },
>> +    { SYS_DESC(OP_BRB_INJ), undef_access },
>>  
>
> heads up: This may conflict with Marc's patches to move the sys instructions to a separate table. But otherwise, looks good to me.

Sure, will rebase this on recent changes.

2024-02-21 13:52:55

by Mark Rutland

Subject: Re: [PATCH V16 1/8] arm64/sysreg: Add BRBE registers and fields

On Thu, Jan 25, 2024 at 03:11:12PM +0530, Anshuman Khandual wrote:
> This adds BRBE related register definitions and various other related field
> macros there in. These will be used subsequently in a BRBE driver, which is
> being added later on.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> Changes in V16:
>
> - Updated BRBINFx_EL1.TYPE = 0b110000 as field IMPDEF_TRAP_EL3
> - Updated BRBCR_ELx[9] as field FZPSS
> - Updated BRBINFINJ_EL1 to use sysreg field BRBINFx_EL1
>
> arch/arm64/include/asm/sysreg.h | 109 ++++++++++++++++++++++++++
> arch/arm64/tools/sysreg | 131 ++++++++++++++++++++++++++++++++
> 2 files changed, 240 insertions(+)
>
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index c3b19b376c86..72544b5c4951 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -272,6 +272,109 @@
>
> #define SYS_BRBCR_EL2 sys_reg(2, 4, 9, 0, 0)
>
> +#define __SYS_BRBINF(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 0))
> +#define __SYS_BRBSRC(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 1))
> +#define __SYS_BRBTGT(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 2))

We already have definitions for these since v6.5, added in commit:

57596c8f991c9aac ("arm64: Add debug registers affected by HDFGxTR_EL2")

That commit also added register encoding definitions:

| #define SYS_BRBINF_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 0))
| #define SYS_BRBINFINJ_EL1 sys_reg(2, 1, 9, 1, 0)
| #define SYS_BRBSRC_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 1))
| #define SYS_BRBSRCINJ_EL1 sys_reg(2, 1, 9, 1, 1)
| #define SYS_BRBTGT_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 2))
| #define SYS_BRBTGTINJ_EL1 sys_reg(2, 1, 9, 1, 2)
| #define SYS_BRBTS_EL1 sys_reg(2, 1, 9, 0, 2)

I don't think we need to add new encoding definitions for BRBINF<n>_EL1,
BRBSRC<n>_EL1, or BRBTGT<n>_EL1; we can just use those existing definitions
directly. That also means we don't need to add all of the expanded 0..31
definitions; the driver can use SYS_BRBINF_EL1(n) and friends directly.
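
As a sanity check on those encodings, the sketch below models them in user
space. sys_reg_enc() is a hypothetical stand-in for the kernel's sys_reg()
helper, with shift values that I believe match asm/sysreg.h but which should
be treated as an assumption; the CRm/op2 derivation is copied from the macros
quoted above. It shows how n in 0..31 maps onto CRm (low four bits of n) and
op2 (bit 4 of n moved to bit 2, plus a 0/1/2 selector).

```c
#include <stdint.h>

/* Hypothetical model of sys_reg(); shift values are an assumption. */
static uint32_t sys_reg_enc(uint32_t op0, uint32_t op1, uint32_t crn,
			    uint32_t crm, uint32_t op2)
{
	return (op0 << 19) | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5);
}

/* BRBINF<n>_EL1: op2 selector 0 */
static uint32_t brbinf_enc(unsigned int n)
{
	return sys_reg_enc(2, 1, 8, n & 15, ((n & 16) >> 2) | 0);
}

/* BRBSRC<n>_EL1: op2 selector 1 */
static uint32_t brbsrc_enc(unsigned int n)
{
	return sys_reg_enc(2, 1, 8, n & 15, ((n & 16) >> 2) | 1);
}

/* BRBTGT<n>_EL1: op2 selector 2 */
static uint32_t brbtgt_enc(unsigned int n)
{
	return sys_reg_enc(2, 1, 8, n & 15, ((n & 16) >> 2) | 2);
}
```

For example, n = 17 gives CRm = 1 and op2 = 4 | selector, so a single
parameterised macro covers all 32 records without expanded definitions.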

[...]

> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> index 4c9b67934367..caf851ba5dc0 100644
> --- a/arch/arm64/tools/sysreg
> +++ b/arch/arm64/tools/sysreg
> @@ -1023,6 +1023,137 @@ UnsignedEnum 3:0 MTEPERM
> EndEnum
> EndSysreg
>
> +
> +SysregFields BRBINFx_EL1
> +Res0 63:47
> +Field 46 CCU
> +Field 45:32 CC
> +Res0 31:18
> +Field 17 LASTFAILED
> +Field 16 T
> +Res0 15:14
> +Enum 13:8 TYPE
> + 0b000000 UNCOND_DIRECT
> + 0b000001 INDIRECT
> + 0b000010 DIRECT_LINK
> + 0b000011 INDIRECT_LINK
> + 0b000101 RET
> + 0b000111 ERET
> + 0b001000 COND_DIRECT

Minor nit, but for consistency with DIRECT_LINK, could we please use
DIRECT_UNCOND and DIRECT_COND?

> + 0b100001 DEBUG_HALT
> + 0b100010 CALL
> + 0b100011 TRAP
> + 0b100100 SERROR
> + 0b100110 INSN_DEBUG
> + 0b100111 DATA_DEBUG
> + 0b101010 ALIGN_FAULT
> + 0b101011 INSN_FAULT
> + 0b101100 DATA_FAULT
> + 0b101110 IRQ
> + 0b101111 FIQ
> + 0b110000 IMPDEF_TRAP_EL3
> + 0b111001 DEBUG_EXIT

That IMPDEF_TRAP_EL3 encoding doesn't seem to exist in the latest ARM ARM (ARM
DDI 0487J.a), and I see Mark Brown checked against the "Arm A-profile
Architecture Registers" document (ARM DDI 0601 ID121123, AKA 2023-12).

Could you please mention that in the commit message, and link to that version
of the document (https://developer.arm.com/documentation/ddi0601/2023-12/) ?
That'll make it easier for anyone else to review this, and it'll be good in
case anyone needs to figure out where this came from in future.

> +EndEnum
> +Enum 7:6 EL
> + 0b00 EL0
> + 0b01 EL1
> + 0b10 EL2
> + 0b11 EL3
> +EndEnum
> +Field 5 MPRED
> +Res0 4:2
> +Enum 1:0 VALID
> + 0b00 NONE
> + 0b01 TARGET
> + 0b10 SOURCE
> + 0b11 FULL
> +EndEnum
> +EndSysregFields

The other fields here all look good per the ARM ARM and sysreg document.
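
For anyone cross-checking the layout, the field positions above can be
exercised with a small user-space decoder. This is only a sketch: the struct
and function names are hypothetical, and the bit ranges are copied from the
SysregFields block quoted above rather than from generated kernel macros.

```c
#include <stdint.h>

/* Extract bits [hi:lo] of a 64-bit value (field width below 64). */
static uint64_t bits(uint64_t v, unsigned int hi, unsigned int lo)
{
	return (v >> lo) & ((UINT64_C(1) << (hi - lo + 1)) - 1);
}

/* Hypothetical decoded view of a BRBINF<n>_EL1 value. */
struct brbinf {
	unsigned int valid;		/* 1:0   */
	unsigned int mpred;		/* 5     */
	unsigned int el;		/* 7:6   */
	unsigned int type;		/* 13:8  */
	unsigned int t;			/* 16    */
	unsigned int lastfailed;	/* 17    */
	unsigned int cc;		/* 45:32 */
	unsigned int ccu;		/* 46    */
};

static struct brbinf brbinf_decode(uint64_t v)
{
	struct brbinf r;

	r.valid      = (unsigned int)bits(v, 1, 0);
	r.mpred      = (unsigned int)bits(v, 5, 5);
	r.el         = (unsigned int)bits(v, 7, 6);
	r.type       = (unsigned int)bits(v, 13, 8);
	r.t          = (unsigned int)bits(v, 16, 16);
	r.lastfailed = (unsigned int)bits(v, 17, 17);
	r.cc         = (unsigned int)bits(v, 45, 32);
	r.ccu        = (unsigned int)bits(v, 46, 46);
	return r;
}
```

A value with VALID = 0b11 and TYPE = 0b001000 would then decode as a full
record for a conditional direct branch.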

> +SysregFields BRBCR_ELx
> +Res0 63:24
> +Field 23 EXCEPTION
> +Field 22 ERTN
> +Res0 21:10
> +Field 9 FZPSS
> +Field 8 FZP
> +Res0 7
> +Enum 6:5 TS
> + 0b01 VIRTUAL
> + 0b10 GUEST_PHYSICAL
> + 0b11 PHYSICAL
> +EndEnum
> +Field 4 MPRED
> +Field 3 CC
> +Res0 2
> +Field 1 ExBRE
> +Field 0 E0BRE
> +EndSysregFields

This looks good per the ARM ARM and sysreg document.

> +Sysreg BRBCR_EL2 2 4 9 0 0
> +Fields BRBCR_ELx
> +EndSysreg
> +
> +Sysreg BRBCR_EL1 2 1 9 0 0
> +Fields BRBCR_ELx
> +EndSysreg
> +
> +Sysreg BRBCR_EL12 2 5 9 0 0
> +Fields BRBCR_ELx
> +EndSysreg

These all look good per the ARM ARM and sysreg document.

Minor nit, but could we please list these in order:

BRBCR_EL1
BRBCR_EL12
BRBCR_EL2

... since that way the names are ordered alphanumerically, which is what we've
done for other groups (e.g. PIR_EL{1,12,2}), and it's the way the ARM ARM
happens to be ordered.

> +Sysreg BRBFCR_EL1 2 1 9 0 1
> +Res0 63:30
> +Enum 29:28 BANK
> + 0b0 FIRST
> + 0b1 SECOND

Nit: since this is a 2-bit field, please pad these as '0b00' and '0b01'.

Could we please use BANK_0 and BANK_1 rather than FIRST and SECOND?

That'd also be easier to use behind macros.
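
To show what I mean by "behind macros", something along these lines would
read naturally with numbered bank names. The macro and function names here
are made up for illustration (the generated sysreg names will differ); only
the field position, bits 29:28, comes from the definition above.

```c
#include <stdint.h>

/* Hypothetical names; BANK occupies bits 29:28 of BRBFCR_EL1. */
#define BRBFCR_BANK_SHIFT	28
#define BRBFCR_BANK_MASK	(UINT64_C(3) << BRBFCR_BANK_SHIFT)
#define BRBFCR_BANK_0		UINT64_C(0)
#define BRBFCR_BANK_1		UINT64_C(1)

/* Return 'brbfcr' with only the BANK field replaced by 'bank'. */
static uint64_t brbfcr_select_bank(uint64_t brbfcr, uint64_t bank)
{
	brbfcr &= ~BRBFCR_BANK_MASK;
	brbfcr |= (bank << BRBFCR_BANK_SHIFT) & BRBFCR_BANK_MASK;
	return brbfcr;
}
```

A caller can then pass BRBFCR_BANK_0 or BRBFCR_BANK_1 directly, or build the
name by token pasting a bank number.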

> +EndEnum
> +Res0 27:23
> +Field 22 CONDDIR
> +Field 21 DIRCALL
> +Field 20 INDCALL
> +Field 19 RTN
> +Field 18 INDIRECT
> +Field 17 DIRECT
> +Field 16 EnI
> +Res0 15:8
> +Field 7 PAUSED
> +Field 6 LASTFAILED
> +Res0 5:0
> +EndSysreg

Other than the nit, this looks good per the ARM ARM and sysreg document.

[...]

> +Sysreg BRBIDR0_EL1 2 1 9 2 0
> +Res0 63:16
> +Enum 15:12 CC
> + 0b101 20_BIT
> +EndEnum
> +Enum 11:8 FORMAT
> + 0b0 0
> +EndEnum
> +Enum 7:0 NUMREC
> + 0b0001000 8
> + 0b0010000 16
> + 0b0100000 32
> + 0b1000000 64

This is an 8-bit field; please pad these to 8 bits (they all need a leading
'0').
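
As a quick cross-check of the field widths, here is a user-space sketch that
pulls the three fields out of a BRBIDR0_EL1 value. The helper names are
hypothetical; the bit ranges and the set of NUMREC values are taken from the
definition quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* NUMREC is the full low byte, bits 7:0 */
static unsigned int brbidr0_numrec(uint64_t v)
{
	return (unsigned int)(v & 0xff);
}

/* FORMAT is bits 11:8 */
static unsigned int brbidr0_format(uint64_t v)
{
	return (unsigned int)((v >> 8) & 0xf);
}

/* CC is bits 15:12 */
static unsigned int brbidr0_cc(uint64_t v)
{
	return (unsigned int)((v >> 12) & 0xf);
}

/* Only the record counts listed in the enum are accepted. */
static bool brbidr0_numrec_valid(unsigned int numrec)
{
	return numrec == 8 || numrec == 16 || numrec == 32 || numrec == 64;
}
```

Because NUMREC spans eight bits, a value such as 0x80 is representable in
the field but is not one of the listed record counts.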

> +EndEnum
> +EndSysreg

Aside from the comments above, this looks good to me.

Mark.

2024-02-21 13:59:24

by Mark Brown

Subject: Re: [PATCH V16 1/8] arm64/sysreg: Add BRBE registers and fields

On Wed, Feb 21, 2024 at 01:52:38PM +0000, Mark Rutland wrote:
> On Thu, Jan 25, 2024 at 03:11:12PM +0530, Anshuman Khandual wrote:

> Minor nit, but could we please list these in order:

> BRBCR_EL1
> BRBCR_EL12
> BRBCR_EL2

> ... since that way the names are ordered alphanumerically, which is what we've
> done for other groups (e.g. PIR_EL{1,12,2}), and it's the way the ARM ARM
> happens to be ordered.

It's a good point about the sorting, though the file is currently mostly
sorted by encoding rather than alphanumerically (similarly to how
sysreg.h was done).


2024-02-21 14:03:22

by Mark Rutland

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions

On Thu, Jan 25, 2024 at 03:11:13PM +0530, Anshuman Khandual wrote:
> Currently BRBE feature is not supported in a guest environment. This hides
> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field.

Does that mean that a guest can currently see BRBE advertised in the
ID_AA64DFR0_EL1.BRBE field, or is that hidden by the regular cpufeature code
today?

> This also blocks guest accesses into BRBE system registers and instructions
> as if the underlying hardware never implemented FEAT_BRBE feature.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Oliver Upton <[email protected]>
> Cc: James Morse <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> Changes in V16:
>
> - Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
>
> arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 56 insertions(+)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 30253bd19917..6a06dc2f0c06 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1304,6 +1304,11 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
> return 0;
> }
>
> +#define BRB_INF_SRC_TGT_EL1(n) \
> + { SYS_DESC(SYS_BRBINF##n##_EL1), undef_access }, \
> + { SYS_DESC(SYS_BRBSRC##n##_EL1), undef_access }, \
> + { SYS_DESC(SYS_BRBTGT##n##_EL1), undef_access } \

With the changes suggested on the previous patch, this would need to change to be:

#define BRB_INF_SRC_TGT_EL1(n) \
{ SYS_DESC(SYS_BRBINF_EL1(n)), undef_access }, \
{ SYS_DESC(SYS_BRBSRC_EL1(n)), undef_access }, \
{ SYS_DESC(SYS_BRBTGT_EL1(n)), undef_access } \


... which would also be easier for backporting (if necessary), since those
definitions have existed for a while.
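
For reference, the existing SYS_BRB{INF,SRC,TGT}_EL1(n) definitions put the low four bits of n in CRm and bit 4 of n in op2[2], which is why the table in this patch lists the records as 0, 16, 1, 17, and so on when kept in encoding order. A small sketch of that arithmetic follows. The sys_reg() shifts match asm/sysreg.h as far as I can tell, and brbinf_sorts_before() is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Encoding layout as in asm/sysreg.h: op0[20:19] op1[18:16] CRn[15:12] CRm[11:8] op2[7:5]. */
#define sys_reg(op0, op1, crn, crm, op2) \
	(((uint32_t)(op0) << 19) | ((uint32_t)(op1) << 16) | \
	 ((uint32_t)(crn) << 12) | ((uint32_t)(crm) << 8) | \
	 ((uint32_t)(op2) << 5))

/* Same shape as the definitions quoted in this thread (argument parenthesised). */
#define SYS_BRBINF_EL1(n) sys_reg(2, 1, 8, ((n) & 15), ((((n) & 16) >> 2) | 0))
#define SYS_BRBSRC_EL1(n) sys_reg(2, 1, 8, ((n) & 15), ((((n) & 16) >> 2) | 1))
#define SYS_BRBTGT_EL1(n) sys_reg(2, 1, 8, ((n) & 15), ((((n) & 16) >> 2) | 2))

/* Hypothetical helper: nonzero if BRBINF<a>_EL1 sorts before BRBINF<b>_EL1. */
static int brbinf_sorts_before(unsigned int a, unsigned int b)
{
	assert(a < 32 && b < 32);
	return SYS_BRBINF_EL1(a) < SYS_BRBINF_EL1(b);
}
```

Records n and n + 16 share a CRm value and differ only in op2, so each group of three registers for record n is followed by the group for record n + 16 before CRm advances.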

Otherwise (modulo Suzuki's comment about rebasing), this looks good to me.

Mark.

> /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
> #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
> { SYS_DESC(SYS_DBGBVRn_EL1(n)), \
> @@ -1707,6 +1712,9 @@ static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
> /* Hide SPE from guests */
> val &= ~ID_AA64DFR0_EL1_PMSVer_MASK;
>
> + /* Hide BRBE from guests */
> + val &= ~ID_AA64DFR0_EL1_BRBE_MASK;
> +
> return val;
> }
>
> @@ -2195,6 +2203,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> { SYS_DESC(SYS_DC_CISW), access_dcsw },
> { SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
> { SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
> + { SYS_DESC(OP_BRB_IALL), undef_access },
> + { SYS_DESC(OP_BRB_INJ), undef_access },
>
> DBG_BCR_BVR_WCR_WVR_EL1(0),
> DBG_BCR_BVR_WCR_WVR_EL1(1),
> @@ -2225,6 +2235,52 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> { SYS_DESC(SYS_DBGCLAIMCLR_EL1), trap_raz_wi },
> { SYS_DESC(SYS_DBGAUTHSTATUS_EL1), trap_dbgauthstatus_el1 },
>
> + /*
> + * BRBE branch record sysreg address space is interleaved between
> + * corresponding BRBINF<N>_EL1, BRBSRC<N>_EL1, and BRBTGT<N>_EL1.
> + */
> + BRB_INF_SRC_TGT_EL1(0),
> + BRB_INF_SRC_TGT_EL1(16),
> + BRB_INF_SRC_TGT_EL1(1),
> + BRB_INF_SRC_TGT_EL1(17),
> + BRB_INF_SRC_TGT_EL1(2),
> + BRB_INF_SRC_TGT_EL1(18),
> + BRB_INF_SRC_TGT_EL1(3),
> + BRB_INF_SRC_TGT_EL1(19),
> + BRB_INF_SRC_TGT_EL1(4),
> + BRB_INF_SRC_TGT_EL1(20),
> + BRB_INF_SRC_TGT_EL1(5),
> + BRB_INF_SRC_TGT_EL1(21),
> + BRB_INF_SRC_TGT_EL1(6),
> + BRB_INF_SRC_TGT_EL1(22),
> + BRB_INF_SRC_TGT_EL1(7),
> + BRB_INF_SRC_TGT_EL1(23),
> + BRB_INF_SRC_TGT_EL1(8),
> + BRB_INF_SRC_TGT_EL1(24),
> + BRB_INF_SRC_TGT_EL1(9),
> + BRB_INF_SRC_TGT_EL1(25),
> + BRB_INF_SRC_TGT_EL1(10),
> + BRB_INF_SRC_TGT_EL1(26),
> + BRB_INF_SRC_TGT_EL1(11),
> + BRB_INF_SRC_TGT_EL1(27),
> + BRB_INF_SRC_TGT_EL1(12),
> + BRB_INF_SRC_TGT_EL1(28),
> + BRB_INF_SRC_TGT_EL1(13),
> + BRB_INF_SRC_TGT_EL1(29),
> + BRB_INF_SRC_TGT_EL1(14),
> + BRB_INF_SRC_TGT_EL1(30),
> + BRB_INF_SRC_TGT_EL1(15),
> + BRB_INF_SRC_TGT_EL1(31),
> +
> + /* Remaining BRBE sysreg addresses space */
> + { SYS_DESC(SYS_BRBCR_EL1), undef_access },
> + { SYS_DESC(SYS_BRBFCR_EL1), undef_access },
> + { SYS_DESC(SYS_BRBTS_EL1), undef_access },
> + { SYS_DESC(SYS_BRBINFINJ_EL1), undef_access },
> + { SYS_DESC(SYS_BRBSRCINJ_EL1), undef_access },
> + { SYS_DESC(SYS_BRBTGTINJ_EL1), undef_access },
> + { SYS_DESC(SYS_BRBIDR0_EL1), undef_access },
> +
> { SYS_DESC(SYS_MDCCSR_EL0), trap_raz_wi },
> { SYS_DESC(SYS_DBGDTR_EL0), trap_raz_wi },
> // DBGDTR[TR]X_EL0 share the same encoding
> --
> 2.25.1
>

2024-02-21 14:06:44

by Mark Rutland

Subject: Re: [PATCH V16 1/8] arm64/sysreg: Add BRBE registers and fields

On Wed, Feb 21, 2024 at 01:59:03PM +0000, Mark Brown wrote:
> On Wed, Feb 21, 2024 at 01:52:38PM +0000, Mark Rutland wrote:
> > On Thu, Jan 25, 2024 at 03:11:12PM +0530, Anshuman Khandual wrote:
>
> > Minor nit, but could we please list these in order:
>
> > BRBCR_EL1
> > BRBCR_EL12
> > BRBCR_EL2
>
> > ... since that way the names are ordered alphanumerically, which is what we've
> > done for other groups (e.g. PIR_EL{1,12,2}), and it's the way the ARM ARM
> > happens to be ordered.
>
> It's a good point about the sorting, though the file is currently mostly
> sorted by encoding rather than alphanumerically (similarly to how
> sysreg.h was done).

Sure, we're inconsistent. I'd just prefer that there's *some* local ordering
here, as the patch is neither ordered as above nor by encoding:

Sysreg BRBCR_EL2 2 4 9 0 0
..
Sysreg BRBCR_EL1 2 1 9 0 0
..
Sysreg BRBCR_EL12 2 5 9 0 0
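
To make the "by encoding" ordering concrete, here is a rough sketch that sorts the three definitions by their packed encoding. The sys_reg() layout is the one from asm/sysreg.h as I understand it. struct sysreg_def, cmp_by_encoding() and sort_by_encoding() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Encoding layout as in asm/sysreg.h: op0[20:19] op1[18:16] CRn[15:12] CRm[11:8] op2[7:5]. */
#define sys_reg(op0, op1, crn, crm, op2) \
	(((uint32_t)(op0) << 19) | ((uint32_t)(op1) << 16) | \
	 ((uint32_t)(crn) << 12) | ((uint32_t)(crm) << 8) | \
	 ((uint32_t)(op2) << 5))

/* Hypothetical container pairing a register name with its encoding. */
struct sysreg_def {
	const char *name;
	uint32_t enc;
};

/* qsort() comparator: ascending by packed encoding. */
static int cmp_by_encoding(const void *a, const void *b)
{
	uint32_t x = ((const struct sysreg_def *)a)->enc;
	uint32_t y = ((const struct sysreg_def *)b)->enc;

	return (x > y) - (x < y);
}

/* Sort an array of definitions in place by encoding. */
static void sort_by_encoding(struct sysreg_def *regs, size_t n)
{
	assert(regs != NULL);
	qsort(regs, n, sizeof(*regs), cmp_by_encoding);
}
```

Since only op1 differs between the three (1, 4 and 5), sorting by encoding gives BRBCR_EL1, BRBCR_EL2, BRBCR_EL12, whereas sorting by name gives BRBCR_EL1, BRBCR_EL12, BRBCR_EL2. The order in the patch matches neither.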

Mark.

2024-02-21 14:11:56

by Mark Brown

Subject: Re: [PATCH V16 1/8] arm64/sysreg: Add BRBE registers and fields

On Wed, Feb 21, 2024 at 02:05:24PM +0000, Mark Rutland wrote:
> On Wed, Feb 21, 2024 at 01:59:03PM +0000, Mark Brown wrote:
> > On Wed, Feb 21, 2024 at 01:52:38PM +0000, Mark Rutland wrote:

> > It's a good point about the sorting, though the file is currently mostly
> > sorted by encoding rather than alphanumerically (similarly to how
> > sysreg.h was done).

> Sure, we're inconsistent. I'd just prefer that there's *some* local ordering
> here, as the patch is neither ordered as above nor by encoding:

I agree, I'm just saying that if we're going to fix the ordering it'd
probably be better to go along with what the rest of the file is doing.


2024-02-21 17:25:26

by Mark Rutland

Subject: Re: [PATCH V16 3/8] drivers: perf: arm_pmuv3: Enable branch stack sampling framework

Hi Anshuman,

On Thu, Jan 25, 2024 at 03:11:14PM +0530, Anshuman Khandual wrote:
> Branch stack sampling support i.e capturing branch records during execution
> in core perf, rides along with normal HW events being scheduled on the PMU.
> This prepares ARMV8 PMU framework for branch stack support on relevant PMUs
> with required HW implementation.

Please can we start a bit more clearly, e.g.

| drivers: perf: arm_pmu: add infrastructure for branch stack sampling
|
| In order to support the Branch Record Buffer Extension (BRBE), we need to
| extend the arm_pmu framework with some basic infrastructure for branch stack
| sampling which arm_pmu drivers can opt-in to using. Subsequent patches will
| use this to add support for BRBE in the PMUv3 driver.

> ARMV8 PMU hardware support for branch stack sampling is indicated via a new
> feature flag called 'has_branch_stack' that can be ascertained via probing.
> This modifies current gate in armpmu_event_init() which blocks branch stack
> sampling based perf events unconditionally. Instead allows such perf events
> getting initialized on supporting PMU hardware.

This paragraph can be deleted. The addition of 'has_branch_stack' and its use
in armpmu_event_init() is trivial and obvious in-context, and this distracts
from the important parts of this patch.

> Branch stack sampling is enabled and disabled along with regular PMU events.
> This adds required function callbacks in armv8pmu_branch_xxx() format, to
> drive the PMU branch stack hardware when supported. This also adds fallback
> stub definitions for these callbacks for PMUs which would not have required
> support.

Those additions to the PMUv3 driver should all be in the next patch.

We don't add anything for the other PMU drivers that don't support branch
sampling, so why do we need to do *anything* to the PMUv3 driver here, given we
add the support in the next patch? Those additions only make this patch bigger
and more confusing (and hence more painful to review).

> If a task gets scheduled out, the current branch records get saved in the
> task's context data, which can be later used to fill in the records upon an
> event overflow. Hence, we enable PERF_ATTACH_TASK_DATA (event->attach_state
> based flag) for branch stack requesting perf events. But this also requires
> adding support for pmu::sched_task() callback to arm_pmu.

I think what this is trying to say is:

| With BRBE, the hardware records branches into a hardware FIFO, which will be
| sampled by software when perf events overflow. A task may be context-switched
| an arbitrary number of times between overflows, and to avoid losing samples
| we need to save the current records when a task is context-switched out. To
| do this we'll need to use the pmu::sched_task() callback, and we'll need to
| allocate some per-task storage space using PERF_ATTACH_TASK_DATA.
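
As a rough model of what that paragraph describes, the sketch below copies live records into per-task storage at sched-out and then clears the hardware view. Everything here is hypothetical (the demo_* names, the tiny buffer size, and the simple append-and-truncate policy). It makes no claim about how BRBE orders records or how the driver should merge them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_RECORDS 4	/* deliberately small; real buffers are larger */

/* Hypothetical branch record: source and target address only. */
struct demo_record {
	unsigned long from;
	unsigned long to;
};

/* Hypothetical buffer, used both for the "hardware" view and per-task storage. */
struct demo_buffer {
	struct demo_record rec[DEMO_MAX_RECORDS];
	size_t nr;
};

/*
 * Model of the sched-out half of a sched_task() callback: append the live
 * records to the task's saved copy (dropping any that do not fit), then
 * clear the hardware view so the next task starts with an empty buffer.
 */
static void demo_sched_out(struct demo_buffer *hw, struct demo_buffer *task_saved)
{
	size_t room, n;

	assert(hw != NULL && task_saved != NULL);
	room = DEMO_MAX_RECORDS - task_saved->nr;
	n = hw->nr < room ? hw->nr : room;
	memcpy(&task_saved->rec[task_saved->nr], hw->rec, n * sizeof(hw->rec[0]));
	task_saved->nr += n;
	hw->nr = 0;
}
```

At overflow time the saved records would be combined with whatever is live in hardware, which is the part that needs PERF_ATTACH_TASK_DATA for the storage.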

> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> Changes in V16:
>
> - Renamed arm_brbe.h as arm_pmuv3_branch.h
> - Updated perf_sample_save_brstack()'s new argument requirements with NULL
>
> drivers/perf/arm_pmu.c | 57 ++++++++++++-
> drivers/perf/arm_pmuv3.c | 141 +++++++++++++++++++++++++++++++-
> drivers/perf/arm_pmuv3_branch.h | 50 +++++++++++
> include/linux/perf/arm_pmu.h | 29 ++++++-
> include/linux/perf/arm_pmuv3.h | 1 -
> 5 files changed, 273 insertions(+), 5 deletions(-)
> create mode 100644 drivers/perf/arm_pmuv3_branch.h
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 8458fe2cebb4..16f488ae7747 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -317,6 +317,15 @@ armpmu_del(struct perf_event *event, int flags)
> struct hw_perf_event *hwc = &event->hw;
> int idx = hwc->idx;
>
> + if (has_branch_stack(event)) {
> + WARN_ON_ONCE(!hw_events->brbe_users);
> + hw_events->brbe_users--;
> + if (!hw_events->brbe_users) {
> + hw_events->brbe_context = NULL;
> + hw_events->brbe_sample_type = 0;
> + }
> + }
> +

If this is going to leak into the core arm_pmu code, use "branch_stack" rather
than "brbe" for these field names.

However, I reckon we could just have two new callbacks on arm_pmu:

branch_stack_add(struct perf_event *event, ...);
branch_stack_del(struct perf_event *event, ...);

... and hide all of the details in the PMUv3 (or BRBE) driver for now, and the
code above can just do:

	if (has_branch_stack(event))
		branch_stack_del(event, ...);

... and likewise in armpmu_add().

That way the actual management logic for the context and so on can be added in
the next patch, where the lifetime would be *much* clearer.
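
A minimal, self-contained model of that split might look like the following. The struct layouts are stand-ins rather than the kernel's, the elided callback parameters are represented by a single assumed `struct pmu_hw_events *` argument, and the demo_* and core_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for kernel types; only what the sketch needs. */
struct perf_event {
	bool wants_branch_stack;
};

struct pmu_hw_events {
	unsigned int branch_users;
};

struct arm_pmu {
	/* Hypothetical callbacks; the second argument is an assumption. */
	void (*branch_stack_add)(struct perf_event *event, struct pmu_hw_events *hw);
	void (*branch_stack_del)(struct perf_event *event, struct pmu_hw_events *hw);
};

static bool has_branch_stack(const struct perf_event *event)
{
	return event->wants_branch_stack;
}

/* Driver side: all bookkeeping stays behind the callbacks. */
static void demo_branch_stack_add(struct perf_event *event, struct pmu_hw_events *hw)
{
	(void)event;
	hw->branch_users++;
}

static void demo_branch_stack_del(struct perf_event *event, struct pmu_hw_events *hw)
{
	(void)event;
	assert(hw->branch_users > 0);
	hw->branch_users--;
}

/* Core side: only dispatches, and knows nothing about contexts or filters. */
static void core_event_add(struct arm_pmu *pmu, struct perf_event *event,
			   struct pmu_hw_events *hw)
{
	if (has_branch_stack(event) && pmu->branch_stack_add)
		pmu->branch_stack_add(event, hw);
}

static void core_event_del(struct arm_pmu *pmu, struct perf_event *event,
			   struct pmu_hw_events *hw)
{
	if (has_branch_stack(event) && pmu->branch_stack_del)
		pmu->branch_stack_del(event, hw);
}
```

A PMU driver without branch stack support would simply leave both callbacks NULL.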

> armpmu_stop(event, PERF_EF_UPDATE);
> hw_events->events[idx] = NULL;
> armpmu->clear_event_idx(hw_events, event);
> @@ -333,6 +342,38 @@ armpmu_add(struct perf_event *event, int flags)
> struct hw_perf_event *hwc = &event->hw;
> int idx;
>
> + if (has_branch_stack(event)) {
> + /*
> + * Reset branch records buffer if a new CPU bound event
> + * gets scheduled on a PMU. Otherwise existing branch
> + * records present in the buffer might just leak into
> + * such events.
> + *
> + * Also reset current 'hw_events->brbe_context' because
> + * any previous task bound event now would have lost an
> + * opportunity for continuous branch records.
> + */

Doesn't this mean some user silently loses events? Why is that ok?

> + if (!event->ctx->task) {
> + hw_events->brbe_context = NULL;
> + if (armpmu->branch_reset)
> + armpmu->branch_reset();
> + }
> +
> + /*
> + * Reset branch records buffer if a new task event gets
> + * scheduled on a PMU which might have existing records.
> + * Otherwise older branch records present in the buffer
> + * might leak into the new task event.
> + */
> + if (event->ctx->task && hw_events->brbe_context != event->ctx) {
> + hw_events->brbe_context = event->ctx;
> + if (armpmu->branch_reset)
> + armpmu->branch_reset();
> + }

Same question here.

How does this work on other architectures?

What do we do if the CPU-bound and task-bound events want different filters,
etc?

This is the sort of gnarly detail that should be explained (or at least
introduced) in the commit message.

> + hw_events->brbe_users++;
> + hw_events->brbe_sample_type = event->attr.branch_sample_type;

What exactly is brbe_sample_type, and why does it get overridden *every time* we
add a new event? What happens when events have different values for
brbe_sample_type? Or is that forbidden somehow?

> + }
> +
> /* An event following a process won't be stopped earlier */
> if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
> return -ENOENT;

Unless this cpumask check has been made redundant, it means that the code above
it is obviously wrong, since that pokes the BRBE HW and increments brbe_users
*before* we decide whether the event can be installed on this CPU. That'll blow
up on big.LITTLE, e.g. we try and install a 'big' CPU event on a 'little' CPU,
poke the BRBE HW and increment brbe_users, then *after* that we abort
installing the event.

Even ignoring big.LITTLE, we can fail immediately after this when we don't have
enough counters, since the following code is:

| /* An event following a process won't be stopped earlier */
| if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
| return -ENOENT;
|
| /* If we don't have a space for the counter then finish early. */
| idx = armpmu->get_event_idx(hw_events, event);
| if (idx < 0)
| return idx;

... which'll go wrong if you try to open 1 more event than the CPU has
counters.
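
To spell out the ordering being asked for, here is a small model in which every check that can fail runs before any branch-stack state is touched. The demo_* names, the two-counter limit, and the -EAGAIN return for "no free counter" are assumptions made for the example. This is not the arm_pmu code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NUM_COUNTERS 2	/* hypothetical counter budget */

/* Hypothetical per-CPU state: counter allocation plus branch-stack users. */
struct demo_hw_events {
	bool counter_used[DEMO_NUM_COUNTERS];
	unsigned int branch_users;
};

/* Return the index of a free counter, or -EAGAIN if none is left. */
static int demo_get_event_idx(const struct demo_hw_events *hw)
{
	int i;

	for (i = 0; i < DEMO_NUM_COUNTERS; i++)
		if (!hw->counter_used[i])
			return i;
	return -EAGAIN;
}

/*
 * Model of an add() path: fail early on an unsupported CPU or a full PMU,
 * and only then claim the counter and bump the branch-stack user count.
 */
static int demo_event_add(struct demo_hw_events *hw, bool cpu_supported,
			  bool branch_stack)
{
	int idx;

	assert(hw != NULL);
	if (!cpu_supported)
		return -ENOENT;

	idx = demo_get_event_idx(hw);
	if (idx < 0)
		return idx;

	hw->counter_used[idx] = true;
	if (branch_stack)
		hw->branch_users++;
	return 0;
}
```

With this ordering a failed add leaves branch_users untouched, so there is nothing to unwind on the error paths.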

> @@ -511,13 +552,24 @@ static int armpmu_event_init(struct perf_event *event)
> !cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
> return -ENOENT;
>
> - /* does not support taken branch sampling */
> - if (has_branch_stack(event))
> + /*
> + * Branch stack sampling events are allowed
> + * only on PMU which has required support.
> + */
> + if (has_branch_stack(event) && !armpmu->has_branch_stack)
> return -EOPNOTSUPP;
> return __hw_perf_event_init(event);
> }
>

I think we can delete the comment entirely here, but the code itself looks
fine here.

> +static void armpmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
> +{
> + struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
> +
> + if (armpmu->sched_task)
> + armpmu->sched_task(pmu_ctx, sched_in);
> +}

This looks fine.

> static void armpmu_enable(struct pmu *pmu)
> {
> struct arm_pmu *armpmu = to_arm_pmu(pmu);
> @@ -864,6 +916,7 @@ struct arm_pmu *armpmu_alloc(void)
> }
>
> pmu->pmu = (struct pmu) {
> + .sched_task = armpmu_sched_task,
> .pmu_enable = armpmu_enable,
> .pmu_disable = armpmu_disable,
> .event_init = armpmu_event_init,
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 23fa6c5da82c..9e17764a0929 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -26,6 +26,7 @@
> #include <linux/nmi.h>
>
> #include <asm/arm_pmuv3.h>
> +#include "arm_pmuv3_branch.h"

As above, I do not think that the PMUv3 driver should change at all in this
patch. As of this patch it achieves nothing, and it makes it really hard to
understand what's going on because the important aspects are spread randomly
across this patch and the next patch which actually adds the BRBE management.

Please factor the PMUv3 changes out into the patch adding the actual BRBE code.

[...]

> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index 46377e134d67..c3e7d2cfb737 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -308,5 +308,4 @@
> default: WARN(1, "Invalid PMEV* index\n"); \
> } \
> } while (0)
> -
> #endif

Unrelated whitespace change.

Mark.

2024-02-23 05:28:44

by Anshuman Khandual

Subject: Re: [PATCH V16 1/8] arm64/sysreg: Add BRBE registers and fields



On 2/21/24 19:37, Mark Brown wrote:
> On Wed, Feb 21, 2024 at 02:05:24PM +0000, Mark Rutland wrote:
>> On Wed, Feb 21, 2024 at 01:59:03PM +0000, Mark Brown wrote:
>>> On Wed, Feb 21, 2024 at 01:52:38PM +0000, Mark Rutland wrote:
>
>>> It's a good point about the sorting, though the file is currently mostly
>>> sorted by encoding rather than alphanumerically (similarly to how
>>> sysreg.h was done).
>
>> Sure, we're inconsistent. I'd just prefer that there's *some* local ordering
>> here, as the patch is neither ordered as above nor by encoding:
>
> I agree, I'm just saying that if we're going to fix the ordering it'd
> probably be better to go along with what the rest of the file is doing.

Sure, will change the register order as suggested earlier, i.e.
alphanumerically, because ordering the registers by encoding would push
BRBCR_EL2/12 after all other BRBE registers, including BRBIDR0_EL1.

After the change

BRBCR_EL1
BRBCR_EL12
BRBCR_EL2

2024-02-23 06:38:04

by Anshuman Khandual

Subject: Re: [PATCH V16 1/8] arm64/sysreg: Add BRBE registers and fields

On 2/21/24 19:22, Mark Rutland wrote:
> On Thu, Jan 25, 2024 at 03:11:12PM +0530, Anshuman Khandual wrote:
>> This adds BRBE related register definitions and various other related field
>> macros there in. These will be used subsequently in a BRBE driver, which is
>> being added later on.
>>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> Changes in V16:
>>
>> - Updated BRBINFx_EL1.TYPE = 0b110000 as field IMPDEF_TRAP_EL3
>> - Updated BRBCR_ELx[9] as field FZPSS
>> - Updated BRBINFINJ_EL1 to use sysreg field BRBINFx_EL1
>>
>> arch/arm64/include/asm/sysreg.h | 109 ++++++++++++++++++++++++++
>> arch/arm64/tools/sysreg | 131 ++++++++++++++++++++++++++++++++
>> 2 files changed, 240 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index c3b19b376c86..72544b5c4951 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -272,6 +272,109 @@
>>
>> #define SYS_BRBCR_EL2 sys_reg(2, 4, 9, 0, 0)
>>
>> +#define __SYS_BRBINF(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 0))
>> +#define __SYS_BRBSRC(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 1))
>> +#define __SYS_BRBTGT(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 2))
>
> We already have definitions for these since v6.5, added in commit:
>
> > 57596c8f991c9aac ("arm64: Add debug registers affected by HDFGxTR_EL2")
>
> That commit also added register encoding definitions:
>
> | #define SYS_BRBINF_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 0))
> | #define SYS_BRBINFINJ_EL1 sys_reg(2, 1, 9, 1, 0)
> | #define SYS_BRBSRC_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 1))
> | #define SYS_BRBSRCINJ_EL1 sys_reg(2, 1, 9, 1, 1)
> | #define SYS_BRBTGT_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 2))
> | #define SYS_BRBTGTINJ_EL1 sys_reg(2, 1, 9, 1, 2)
> | #define SYS_BRBTS_EL1 sys_reg(2, 1, 9, 0, 2)
>
> I don't think we need to add new encoding definitions for BRBINF<n>_EL1,
> BRBSRC<n>_EL1, or BRBTGT<n>_EL1; we can just use those existing definitions
> directly. That also means we don't need to add all of the expanded 0..31
> definitions; the driver can use SYS_BRBINF_EL1(n) and friends directly.

Right, that seems feasible. With the following change to the BRBE driver and
arm64 KVM, we can switch to the existing SYS_BRBXXX_EL1(n) format.

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6a06dc2f0c06..739d861b9ef3 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1304,10 +1304,10 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
return 0;
}

-#define BRB_INF_SRC_TGT_EL1(n) \
- { SYS_DESC(SYS_BRBINF##n##_EL1), undef_access }, \
- { SYS_DESC(SYS_BRBSRC##n##_EL1), undef_access }, \
- { SYS_DESC(SYS_BRBTGT##n##_EL1), undef_access } \
+#define BRB_INF_SRC_TGT_EL1(n) \
+ { SYS_DESC(SYS_BRBINF_EL1(n)), undef_access }, \
+ { SYS_DESC(SYS_BRBSRC_EL1(n)), undef_access }, \
+ { SYS_DESC(SYS_BRBTGT_EL1(n)), undef_access } \

/* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
#define DBG_BCR_BVR_WCR_WVR_EL1(n) \
diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
index 22924023e0f1..dfaf098432ff 100644
--- a/drivers/perf/arm_brbe.c
+++ b/drivers/perf/arm_brbe.c
@@ -104,13 +104,13 @@ enum brbe_bank_idx {
};

#define RETURN_READ_BRBSRCN(n) \
- read_sysreg_s(SYS_BRBSRC##n##_EL1)
+ read_sysreg_s(SYS_BRBSRC_EL1(n))

#define RETURN_READ_BRBTGTN(n) \
- read_sysreg_s(SYS_BRBTGT##n##_EL1)
+ read_sysreg_s(SYS_BRBTGT_EL1(n))

#define RETURN_READ_BRBINFN(n) \
- read_sysreg_s(SYS_BRBINF##n##_EL1)
+ read_sysreg_s(SYS_BRBINF_EL1(n))

#define BRBE_REGN_CASE(n, case_macro) \
case n: return case_macro(n); break

While here, I will also drop the other BRBE registers previously added
to arch/arm64/include/asm/sysreg.h, as they are now being added via
arch/arm64/tools/sysreg instead, which is the right place.

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 9a919a102cf1..481c7d186dfa 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -195,16 +195,8 @@
#define SYS_DBGVCR32_EL2 sys_reg(2, 4, 0, 7, 0)

#define SYS_BRBINF_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 0))
-#define SYS_BRBINFINJ_EL1 sys_reg(2, 1, 9, 1, 0)
#define SYS_BRBSRC_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 1))
-#define SYS_BRBSRCINJ_EL1 sys_reg(2, 1, 9, 1, 1)
#define SYS_BRBTGT_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 2))
-#define SYS_BRBTGTINJ_EL1 sys_reg(2, 1, 9, 1, 2)
-#define SYS_BRBTS_EL1 sys_reg(2, 1, 9, 0, 2)
-
-#define SYS_BRBCR_EL1 sys_reg(2, 1, 9, 0, 0)
-#define SYS_BRBFCR_EL1 sys_reg(2, 1, 9, 0, 1)
-#define SYS_BRBIDR0_EL1 sys_reg(2, 1, 9, 2, 0)

#define SYS_TRCITECR_EL1 sys_reg(3, 0, 1, 2, 3)
#define SYS_TRCACATR(m) sys_reg(2, 1, 2, ((m & 7) << 1), (2 | (m >> 3)))
@@ -270,8 +262,6 @@
/* ETM */
#define SYS_TRCOSLAR sys_reg(2, 1, 1, 0, 4)

-#define SYS_BRBCR_EL2 sys_reg(2, 4, 9, 0, 0)
-
#define SYS_MIDR_EL1 sys_reg(3, 0, 0, 0, 0)
#define SYS_MPIDR_EL1 sys_reg(3, 0, 0, 0, 5)
#define SYS_REVIDR_EL1 sys_reg(3, 0, 0, 0, 6)
@@ -601,7 +591,6 @@
#define SYS_CNTHV_CVAL_EL2 sys_reg(3, 4, 14, 3, 2)

/* VHE encodings for architectural EL0/1 system registers */
-#define SYS_BRBCR_EL12 sys_reg(2, 5, 9, 0, 0)
#define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0)
#define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2)
#define SYS_SCTLR2_EL12 sys_reg(3, 5, 1, 0, 3)


>
> [...]
>
>> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
>> index 4c9b67934367..caf851ba5dc0 100644
>> --- a/arch/arm64/tools/sysreg
>> +++ b/arch/arm64/tools/sysreg
>> @@ -1023,6 +1023,137 @@ UnsignedEnum 3:0 MTEPERM
>> EndEnum
>> EndSysreg
>>
>> +
>> +SysregFields BRBINFx_EL1
>> +Res0 63:47
>> +Field 46 CCU
>> +Field 45:32 CC
>> +Res0 31:18
>> +Field 17 LASTFAILED
>> +Field 16 T
>> +Res0 15:14
>> +Enum 13:8 TYPE
>> + 0b000000 UNCOND_DIRECT
>> + 0b000001 INDIRECT
>> + 0b000010 DIRECT_LINK
>> + 0b000011 INDIRECT_LINK
>> + 0b000101 RET
>> + 0b000111 ERET
>> + 0b001000 COND_DIRECT
>
> Minor nit, but for consistency with DIRECT_LINK, could we please use
> DIRECT_UNCOND and DIRECT_COND?

Sure, will change as above.

>
>> + 0b100001 DEBUG_HALT
>> + 0b100010 CALL
>> + 0b100011 TRAP
>> + 0b100100 SERROR
>> + 0b100110 INSN_DEBUG
>> + 0b100111 DATA_DEBUG
>> + 0b101010 ALIGN_FAULT
>> + 0b101011 INSN_FAULT
>> + 0b101100 DATA_FAULT
>> + 0b101110 IRQ
>> + 0b101111 FIQ
>> + 0b110000 IMPDEF_TRAP_EL3
>> + 0b111001 DEBUG_EXIT
>
> That IMPDEF_TRAP_EL3 encoding doesn't seem to exist in the latest ARM ARM (ARM
> DDI 0487J.a), and I see Mark Brown checked against the "Arm A-profile
> Architecture Registers" document (ARM DDI 0601 ID121123, AKA 2023-12).

That's correct.

>
> Could you please mention that in the commit message, and link to that version
> of the document (https://developer.arm.com/documentation/ddi0601/2023-12/) ?
> That'll make it easier for anyone else to review this, and it'll be good in
> case anyone needs to figure out where this came from in future.
>

Sure, will do that.

>> +EndEnum
>> +Enum 7:6 EL
>> + 0b00 EL0
>> + 0b01 EL1
>> + 0b10 EL2
>> + 0b11 EL3
>> +EndEnum
>> +Field 5 MPRED
>> +Res0 4:2
>> +Enum 1:0 VALID
>> + 0b00 NONE
>> + 0b01 TARGET
>> + 0b10 SOURCE
>> + 0b11 FULL
>> +EndEnum
>> +EndSysregFields
>
> The other fields here all look good per the ARM ARM and sysreg document.
>
>> +SysregFields BRBCR_ELx
>> +Res0 63:24
>> +Field 23 EXCEPTION
>> +Field 22 ERTN
>> +Res0 21:10
>> +Field 9 FZPSS
>> +Field 8 FZP
>> +Res0 7
>> +Enum 6:5 TS
>> + 0b01 VIRTUAL
>> + 0b10 GUEST_PHYSICAL
>> + 0b11 PHYSICAL
>> +EndEnum
>> +Field 4 MPRED
>> +Field 3 CC
>> +Res0 2
>> +Field 1 ExBRE
>> +Field 0 E0BRE
>> +EndSysregFields
>
> This looks good per the ARM ARM and sysreg document.
>
>> +Sysreg BRBCR_EL2 2 4 9 0 0
>> +Fields BRBCR_ELx
>> +EndSysreg
>> +
>> +Sysreg BRBCR_EL1 2 1 9 0 0
>> +Fields BRBCR_ELx
>> +EndSysreg
>> +
>> +Sysreg BRBCR_EL12 2 5 9 0 0
>> +Fields BRBCR_ELx
>> +EndSysreg
>
> These all look good per the ARM ARM and sysreg document.
>
> Minor nit, but could we please list these in order:
>
> BRBCR_EL1
> BRBCR_EL12
> BRBCR_EL2
>
> ... since that way the names are ordered alphanumerically, which is what we've
> done for other groups (e.g. PIR_EL{1,12,2}), and it's the way the ARM ARM
> happens to be ordered.
>
>> +Sysreg BRBFCR_EL1 2 1 9 0 1
>> +Res0 63:30
>> +Enum 29:28 BANK
>> + 0b0 FIRST
>> + 0b1 SECOND
>
> Nit: since this is a 2-bit field, please pad these as '0b00' and '0b01'.
>
> Could we please use BANK_0 and BANK_1 rather than FIRST and SECOND?
>
> That'd also be easier to use behind macros.

Sure, will change as above.

>
>> +EndEnum
>> +Res0 27:23
>> +Field 22 CONDDIR
>> +Field 21 DIRCALL
>> +Field 20 INDCALL
>> +Field 19 RTN
>> +Field 18 INDIRECT
>> +Field 17 DIRECT
>> +Field 16 EnI
>> +Res0 15:8
>> +Field 7 PAUSED
>> +Field 6 LASTFAILED
>> +Res0 5:0
>> +EndSysreg
>
> Other than the nit, this looks good per the ARM ARM and sysreg document.

Okay

>
> [...]
>
>> +Sysreg BRBIDR0_EL1 2 1 9 2 0
>> +Res0 63:16
>> +Enum 15:12 CC
>> + 0b101 20_BIT
>> +EndEnum
>> +Enum 11:8 FORMAT
>> + 0b0 0
>> +EndEnum
>> +Enum 7:0 NUMREC
>> + 0b0001000 8
>> + 0b0010000 16
>> + 0b0100000 32
>> + 0b1000000 64
>
> This is an 8-bit field; please pad these to 8 bits (they all need a leading
> '0').

Sure, will change as above.

>
>> +EndEnum
>> +EndSysreg
>
> Aside from the comments above, this looks good to me.
>
> Mark.

2024-02-23 07:29:07

by Anshuman Khandual

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions



On 2/21/24 19:31, Mark Rutland wrote:
> On Thu, Jan 25, 2024 at 03:11:13PM +0530, Anshuman Khandual wrote:
>> Currently BRBE feature is not supported in a guest environment. This hides
>> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field.
>
> Does that mean that a guest can currently see BRBE advertised in the
> ID_AA64DFR0_EL1.BRBE field, or is that hidden by the regular cpufeature code
> today?

IIRC it is hidden, but I will have to double check. When experimenting with
BRBE guest support earlier, the following changes were needed for the feature
to be visible in ID_AA64DFR0_EL1.

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 646591c67e7a..f258568535a8 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -445,6 +445,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
};

static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
+ S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_IMP),
S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),

Should we add the following entry - explicitly hiding BRBE from the guest
as a prerequisite patch ?

S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_NI)

>
>> This also blocks guest accesses into BRBE system registers and instructions
>> as if the underlying hardware never implemented FEAT_BRBE feature.
>>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Oliver Upton <[email protected]>
>> Cc: James Morse <[email protected]>
>> Cc: Suzuki K Poulose <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> Changes in V16:
>>
>> - Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
>>
>> arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 56 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 30253bd19917..6a06dc2f0c06 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -1304,6 +1304,11 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
>> return 0;
>> }
>>
>> +#define BRB_INF_SRC_TGT_EL1(n) \
>> + { SYS_DESC(SYS_BRBINF##n##_EL1), undef_access }, \
>> + { SYS_DESC(SYS_BRBSRC##n##_EL1), undef_access }, \
>> + { SYS_DESC(SYS_BRBTGT##n##_EL1), undef_access } \
>
> With the changes suggested on the previous patch, this would need to change to be:
>
> #define BRB_INF_SRC_TGT_EL1(n) \
> { SYS_DESC(SYS_BRBINF_EL1(n)), undef_access }, \
> { SYS_DESC(SYS_BRBSRC_EL1(n)), undef_access }, \
> { SYS_DESC(SYS_BRBTGT_EL1(n)), undef_access } \

Sure, I have already folded in the above changes.

>
>
> ... which would also be easier for backporting (if necessary), since those
> definitions have existed for a while.
>
> Otherwise (modulo Suzuki's comment about rebasing), this looks good to me.

Okay.

>
> Mark.
>
>> /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
>> #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
>> { SYS_DESC(SYS_DBGBVRn_EL1(n)), \
>> @@ -1707,6 +1712,9 @@ static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
>> /* Hide SPE from guests */
>> val &= ~ID_AA64DFR0_EL1_PMSVer_MASK;
>>
>> + /* Hide BRBE from guests */
>> + val &= ~ID_AA64DFR0_EL1_BRBE_MASK;
>> +
>> return val;
>> }
>>
>> @@ -2195,6 +2203,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>> { SYS_DESC(SYS_DC_CISW), access_dcsw },
>> { SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
>> { SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
>> + { SYS_DESC(OP_BRB_IALL), undef_access },
>> + { SYS_DESC(OP_BRB_INJ), undef_access },
>>
>> DBG_BCR_BVR_WCR_WVR_EL1(0),
>> DBG_BCR_BVR_WCR_WVR_EL1(1),
>> @@ -2225,6 +2235,52 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>> { SYS_DESC(SYS_DBGCLAIMCLR_EL1), trap_raz_wi },
>> { SYS_DESC(SYS_DBGAUTHSTATUS_EL1), trap_dbgauthstatus_el1 },
>>
>> + /*
>> + * BRBE branch record sysreg address space is interleaved between
>> + * corresponding BRBINF<N>_EL1, BRBSRC<N>_EL1, and BRBTGT<N>_EL1.
>> + */
>> + BRB_INF_SRC_TGT_EL1(0),
>> + BRB_INF_SRC_TGT_EL1(16),
>> + BRB_INF_SRC_TGT_EL1(1),
>> + BRB_INF_SRC_TGT_EL1(17),
>> + BRB_INF_SRC_TGT_EL1(2),
>> + BRB_INF_SRC_TGT_EL1(18),
>> + BRB_INF_SRC_TGT_EL1(3),
>> + BRB_INF_SRC_TGT_EL1(19),
>> + BRB_INF_SRC_TGT_EL1(4),
>> + BRB_INF_SRC_TGT_EL1(20),
>> + BRB_INF_SRC_TGT_EL1(5),
>> + BRB_INF_SRC_TGT_EL1(21),
>> + BRB_INF_SRC_TGT_EL1(6),
>> + BRB_INF_SRC_TGT_EL1(22),
>> + BRB_INF_SRC_TGT_EL1(7),
>> + BRB_INF_SRC_TGT_EL1(23),
>> + BRB_INF_SRC_TGT_EL1(8),
>> + BRB_INF_SRC_TGT_EL1(24),
>> + BRB_INF_SRC_TGT_EL1(9),
>> + BRB_INF_SRC_TGT_EL1(25),
>> + BRB_INF_SRC_TGT_EL1(10),
>> + BRB_INF_SRC_TGT_EL1(26),
>> + BRB_INF_SRC_TGT_EL1(11),
>> + BRB_INF_SRC_TGT_EL1(27),
>> + BRB_INF_SRC_TGT_EL1(12),
>> + BRB_INF_SRC_TGT_EL1(28),
>> + BRB_INF_SRC_TGT_EL1(13),
>> + BRB_INF_SRC_TGT_EL1(29),
>> + BRB_INF_SRC_TGT_EL1(14),
>> + BRB_INF_SRC_TGT_EL1(30),
>> + BRB_INF_SRC_TGT_EL1(15),
>> + BRB_INF_SRC_TGT_EL1(31),
>> +
>> + /* Remaining BRBE sysreg addresses space */
>> + { SYS_DESC(SYS_BRBCR_EL1), undef_access },
>> + { SYS_DESC(SYS_BRBFCR_EL1), undef_access },
>> + { SYS_DESC(SYS_BRBTS_EL1), undef_access },
>> + { SYS_DESC(SYS_BRBINFINJ_EL1), undef_access },
>> + { SYS_DESC(SYS_BRBSRCINJ_EL1), undef_access },
>> + { SYS_DESC(SYS_BRBTGTINJ_EL1), undef_access },
>> + { SYS_DESC(SYS_BRBIDR0_EL1), undef_access },
>> +
>> { SYS_DESC(SYS_MDCCSR_EL0), trap_raz_wi },
>> { SYS_DESC(SYS_DBGDTR_EL0), trap_raz_wi },
>> // DBGDTR[TR]X_EL0 share the same encoding
>> --
>> 2.25.1
>>

2024-02-23 13:32:13

by Mark Brown

Subject: Re: [PATCH V16 1/8] arm64/sysreg: Add BRBE registers and fields

On Fri, Feb 23, 2024 at 10:58:12AM +0530, Anshuman Khandual wrote:
> On 2/21/24 19:37, Mark Brown wrote:
> > On Wed, Feb 21, 2024 at 02:05:24PM +0000, Mark Rutland wrote:

> >> Sure, we're inconsistent. I'd just prefer that there's *some* local ordering
> >> here, as the patch is neither ordered as above nor by encoding:

> > I agree, I'm just saying that if we're going to fix the ordering it'd
> > probably be better to go along with what the rest of the file is doing.

> Sure, will change the register order to alphanumeric instead, as suggested
> earlier, because ordering the registers by encoding would push BRBCR_EL2/12
> after all other BRBE registers, including BRBIDR0_EL1.

> After the change

> BRBCR_EL1
> BRBCR_EL12
> BRBCR_EL2

The _EL2/12 registers generally come at the end of the file due to the
way the encodings work?



2024-02-26 04:23:07

by Anshuman Khandual

Subject: [PATCH] arm64/hw_breakpoint: Determine lengths from generic perf breakpoint macros

Both the platform (ARM_BREAKPOINT_LEN_X) and the generic (HW_BREAKPOINT_LEN_X)
macros are used interchangeably when converting between event->attr.bp_len and
the platform breakpoint control arch_hw_breakpoint_ctrl->len. Let's be
consistent while deriving one from the other. This does not cause any
functional change.

Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
This applies on v6.8-rc5

arch/arm64/kernel/hw_breakpoint.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 35225632d70a..1ab9fc865ddd 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -301,28 +301,28 @@ static int get_hbp_len(u8 hbp_len)

switch (hbp_len) {
case ARM_BREAKPOINT_LEN_1:
- len_in_bytes = 1;
+ len_in_bytes = HW_BREAKPOINT_LEN_1;
break;
case ARM_BREAKPOINT_LEN_2:
- len_in_bytes = 2;
+ len_in_bytes = HW_BREAKPOINT_LEN_2;
break;
case ARM_BREAKPOINT_LEN_3:
- len_in_bytes = 3;
+ len_in_bytes = HW_BREAKPOINT_LEN_3;
break;
case ARM_BREAKPOINT_LEN_4:
- len_in_bytes = 4;
+ len_in_bytes = HW_BREAKPOINT_LEN_4;
break;
case ARM_BREAKPOINT_LEN_5:
- len_in_bytes = 5;
+ len_in_bytes = HW_BREAKPOINT_LEN_5;
break;
case ARM_BREAKPOINT_LEN_6:
- len_in_bytes = 6;
+ len_in_bytes = HW_BREAKPOINT_LEN_6;
break;
case ARM_BREAKPOINT_LEN_7:
- len_in_bytes = 7;
+ len_in_bytes = HW_BREAKPOINT_LEN_7;
break;
case ARM_BREAKPOINT_LEN_8:
- len_in_bytes = 8;
+ len_in_bytes = HW_BREAKPOINT_LEN_8;
break;
}

--
2.25.1


2024-02-26 04:25:00

by Anshuman Khandual

Subject: [PATCH] arm64/sysreg: Add BRBE registers and fields

This adds BRBE related register definitions and various other related field
macros therein. These will be used subsequently in a BRBE driver, which is
being added later on. While here, this drops redundant register definitions
from the header, i.e. arch/arm64/include/asm/sysreg.h.

The BRBINFx_EL1_TYPE_IMPDEF_TRAP_EL3 register field value has been derived
from the latest ARM DDI 0601 (ID121123, aka 2023-12) rather than the latest
ARM ARM, i.e. ARM DDI 0487J.a. Please find the definition here.

https://developer.arm.com/documentation/ddi0601/2023-12/

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
Please find the modified patch here for a quick review and do let me know
if this looks good for the next version i.e V17. BRBCR_EL1/12/2 organized
per their encoding. Thanks !

arch/arm64/include/asm/sysreg.h | 17 ++---
arch/arm64/tools/sysreg | 131 ++++++++++++++++++++++++++++++++
2 files changed, 137 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index c3b19b376c86..481c7d186dfa 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -195,16 +195,8 @@
#define SYS_DBGVCR32_EL2 sys_reg(2, 4, 0, 7, 0)

#define SYS_BRBINF_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 0))
-#define SYS_BRBINFINJ_EL1 sys_reg(2, 1, 9, 1, 0)
#define SYS_BRBSRC_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 1))
-#define SYS_BRBSRCINJ_EL1 sys_reg(2, 1, 9, 1, 1)
#define SYS_BRBTGT_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 2))
-#define SYS_BRBTGTINJ_EL1 sys_reg(2, 1, 9, 1, 2)
-#define SYS_BRBTS_EL1 sys_reg(2, 1, 9, 0, 2)
-
-#define SYS_BRBCR_EL1 sys_reg(2, 1, 9, 0, 0)
-#define SYS_BRBFCR_EL1 sys_reg(2, 1, 9, 0, 1)
-#define SYS_BRBIDR0_EL1 sys_reg(2, 1, 9, 2, 0)

#define SYS_TRCITECR_EL1 sys_reg(3, 0, 1, 2, 3)
#define SYS_TRCACATR(m) sys_reg(2, 1, 2, ((m & 7) << 1), (2 | (m >> 3)))
@@ -270,8 +262,6 @@
/* ETM */
#define SYS_TRCOSLAR sys_reg(2, 1, 1, 0, 4)

-#define SYS_BRBCR_EL2 sys_reg(2, 4, 9, 0, 0)
-
#define SYS_MIDR_EL1 sys_reg(3, 0, 0, 0, 0)
#define SYS_MPIDR_EL1 sys_reg(3, 0, 0, 0, 5)
#define SYS_REVIDR_EL1 sys_reg(3, 0, 0, 0, 6)
@@ -601,7 +591,6 @@
#define SYS_CNTHV_CVAL_EL2 sys_reg(3, 4, 14, 3, 2)

/* VHE encodings for architectural EL0/1 system registers */
-#define SYS_BRBCR_EL12 sys_reg(2, 5, 9, 0, 0)
#define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0)
#define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2)
#define SYS_SCTLR2_EL12 sys_reg(3, 5, 1, 0, 3)
@@ -794,6 +783,12 @@
#define OP_COSP_RCTX sys_insn(1, 3, 7, 3, 6)
#define OP_CPP_RCTX sys_insn(1, 3, 7, 3, 7)

+/*
+ * BRBE Instructions
+ */
+#define BRB_IALL_INSN __emit_inst(0xd5000000 | OP_BRB_IALL | (0x1f))
+#define BRB_INJ_INSN __emit_inst(0xd5000000 | OP_BRB_INJ | (0x1f))
+
/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_ENTP2 (BIT(60))
#define SCTLR_ELx_DSSBS (BIT(44))
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 4c9b67934367..60d288cbd5eb 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1023,6 +1023,137 @@ UnsignedEnum 3:0 MTEPERM
EndEnum
EndSysreg

+
+SysregFields BRBINFx_EL1
+Res0 63:47
+Field 46 CCU
+Field 45:32 CC
+Res0 31:18
+Field 17 LASTFAILED
+Field 16 T
+Res0 15:14
+Enum 13:8 TYPE
+ 0b000000 DIRECT_UNCOND
+ 0b000001 INDIRECT
+ 0b000010 DIRECT_LINK
+ 0b000011 INDIRECT_LINK
+ 0b000101 RET
+ 0b000111 ERET
+ 0b001000 DIRECT_COND
+ 0b100001 DEBUG_HALT
+ 0b100010 CALL
+ 0b100011 TRAP
+ 0b100100 SERROR
+ 0b100110 INSN_DEBUG
+ 0b100111 DATA_DEBUG
+ 0b101010 ALIGN_FAULT
+ 0b101011 INSN_FAULT
+ 0b101100 DATA_FAULT
+ 0b101110 IRQ
+ 0b101111 FIQ
+ 0b110000 IMPDEF_TRAP_EL3
+ 0b111001 DEBUG_EXIT
+EndEnum
+Enum 7:6 EL
+ 0b00 EL0
+ 0b01 EL1
+ 0b10 EL2
+ 0b11 EL3
+EndEnum
+Field 5 MPRED
+Res0 4:2
+Enum 1:0 VALID
+ 0b00 NONE
+ 0b01 TARGET
+ 0b10 SOURCE
+ 0b11 FULL
+EndEnum
+EndSysregFields
+
+SysregFields BRBCR_ELx
+Res0 63:24
+Field 23 EXCEPTION
+Field 22 ERTN
+Res0 21:10
+Field 9 FZPSS
+Field 8 FZP
+Res0 7
+Enum 6:5 TS
+ 0b01 VIRTUAL
+ 0b10 GUEST_PHYSICAL
+ 0b11 PHYSICAL
+EndEnum
+Field 4 MPRED
+Field 3 CC
+Res0 2
+Field 1 ExBRE
+Field 0 E0BRE
+EndSysregFields
+
+Sysreg BRBCR_EL1 2 1 9 0 0
+Fields BRBCR_ELx
+EndSysreg
+
+Sysreg BRBFCR_EL1 2 1 9 0 1
+Res0 63:30
+Enum 29:28 BANK
+ 0b00 BANK_0
+ 0b01 BANK_1
+EndEnum
+Res0 27:23
+Field 22 CONDDIR
+Field 21 DIRCALL
+Field 20 INDCALL
+Field 19 RTN
+Field 18 INDIRECT
+Field 17 DIRECT
+Field 16 EnI
+Res0 15:8
+Field 7 PAUSED
+Field 6 LASTFAILED
+Res0 5:0
+EndSysreg
+
+Sysreg BRBTS_EL1 2 1 9 0 2
+Field 63:0 TS
+EndSysreg
+
+Sysreg BRBINFINJ_EL1 2 1 9 1 0
+Fields BRBINFx_EL1
+EndSysreg
+
+Sysreg BRBSRCINJ_EL1 2 1 9 1 1
+Field 63:0 ADDRESS
+EndSysreg
+
+Sysreg BRBTGTINJ_EL1 2 1 9 1 2
+Field 63:0 ADDRESS
+EndSysreg
+
+Sysreg BRBIDR0_EL1 2 1 9 2 0
+Res0 63:16
+Enum 15:12 CC
+ 0b101 20_BIT
+EndEnum
+Enum 11:8 FORMAT
+ 0b0 0
+EndEnum
+Enum 7:0 NUMREC
+ 0b00001000 8
+ 0b00010000 16
+ 0b00100000 32
+ 0b01000000 64
+EndEnum
+EndSysreg
+
+Sysreg BRBCR_EL2 2 4 9 0 0
+Fields BRBCR_ELx
+EndSysreg
+
+Sysreg BRBCR_EL12 2 5 9 0 0
+Fields BRBCR_ELx
+EndSysreg
+
Sysreg ID_AA64ZFR0_EL1 3 0 0 4 4
Res0 63:60
UnsignedEnum 59:56 F64MM
--
2.25.1
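
As a side note on the ordering discussed above, here is a minimal sketch of why
sorting by encoding interleaves the branch record registers and places
BRBCR_EL2/EL12 after BRBIDR0_EL1. The (Op0, Op1, CRn, CRm, Op2) tuples are the
ones from the patch; the shift values are assumed to match the kernel's
sys_reg() macro, and the helper names are made up for illustration only.

```c
#include <stdint.h>

/* Assumed to match the field shifts used by the kernel's sys_reg() macro. */
#define OP0_SHIFT	19
#define OP1_SHIFT	16
#define CRN_SHIFT	12
#define CRM_SHIFT	8
#define OP2_SHIFT	5

/* Pack (Op0, Op1, CRn, CRm, Op2) into one value that sorts by encoding. */
static inline uint32_t sys_reg(uint32_t op0, uint32_t op1, uint32_t crn,
			       uint32_t crm, uint32_t op2)
{
	return (op0 << OP0_SHIFT) | (op1 << OP1_SHIFT) | (crn << CRN_SHIFT) |
	       (crm << CRM_SHIFT) | (op2 << OP2_SHIFT);
}

/* Same arithmetic as SYS_BRBINF_EL1(n), SYS_BRBSRC_EL1(n), SYS_BRBTGT_EL1(n). */
static inline uint32_t brbinf_el1(uint32_t n)
{
	return sys_reg(2, 1, 8, n & 15, ((n & 16) >> 2) | 0);
}

static inline uint32_t brbsrc_el1(uint32_t n)
{
	return sys_reg(2, 1, 8, n & 15, ((n & 16) >> 2) | 1);
}

static inline uint32_t brbtgt_el1(uint32_t n)
{
	return sys_reg(2, 1, 8, n & 15, ((n & 16) >> 2) | 2);
}

/* Encodings copied from the Sysreg lines in the patch. */
static inline uint32_t brbcr_el1(void)   { return sys_reg(2, 1, 9, 0, 0); }
static inline uint32_t brbidr0_el1(void) { return sys_reg(2, 1, 9, 2, 0); }
static inline uint32_t brbcr_el2(void)   { return sys_reg(2, 4, 9, 0, 0); }
static inline uint32_t brbcr_el12(void)  { return sys_reg(2, 5, 9, 0, 0); }
```

Bit 4 of n lands in Op2 while the low four bits land in CRm, so record 16 sorts
directly after record 0, record 17 after record 1, and so on. Op1 sits above
CRm in the packed value, which is why the EL2 (Op1 = 4) and EL12 (Op1 = 5)
aliases sort after every EL1 BRBE register.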


2024-02-26 04:26:46

by Anshuman Khandual

Subject: Re: [PATCH] arm64/hw_breakpoint: Determine lengths from generic perf breakpoint macros



On 2/26/24 09:52, Anshuman Khandual wrote:
> Both platform i.e ARM_BREAKPOINT_LEN_X and generic i.e HW_BREAKPOINT_LEN_X
> macros are used interchangeably to convert event->attr.bp_len and platform
> breakpoint control arch_hw_breakpoint_ctrl->len. Let's be consistent while
> deriving one from the other. This does not cause any functional changes.
>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> This applies on v6.8-rc5
>
> arch/arm64/kernel/hw_breakpoint.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
> index 35225632d70a..1ab9fc865ddd 100644
> --- a/arch/arm64/kernel/hw_breakpoint.c
> +++ b/arch/arm64/kernel/hw_breakpoint.c
> @@ -301,28 +301,28 @@ static int get_hbp_len(u8 hbp_len)
>
> switch (hbp_len) {
> case ARM_BREAKPOINT_LEN_1:
> - len_in_bytes = 1;
> + len_in_bytes = HW_BREAKPOINT_LEN_1;
> break;
> case ARM_BREAKPOINT_LEN_2:
> - len_in_bytes = 2;
> + len_in_bytes = HW_BREAKPOINT_LEN_2;
> break;
> case ARM_BREAKPOINT_LEN_3:
> - len_in_bytes = 3;
> + len_in_bytes = HW_BREAKPOINT_LEN_3;
> break;
> case ARM_BREAKPOINT_LEN_4:
> - len_in_bytes = 4;
> + len_in_bytes = HW_BREAKPOINT_LEN_4;
> break;
> case ARM_BREAKPOINT_LEN_5:
> - len_in_bytes = 5;
> + len_in_bytes = HW_BREAKPOINT_LEN_5;
> break;
> case ARM_BREAKPOINT_LEN_6:
> - len_in_bytes = 6;
> + len_in_bytes = HW_BREAKPOINT_LEN_6;
> break;
> case ARM_BREAKPOINT_LEN_7:
> - len_in_bytes = 7;
> + len_in_bytes = HW_BREAKPOINT_LEN_7;
> break;
> case ARM_BREAKPOINT_LEN_8:
> - len_in_bytes = 8;
> + len_in_bytes = HW_BREAKPOINT_LEN_8;
> break;
> }
>

Please ignore this. The wrong patch got picked up by git send-email :)

2024-02-26 13:18:33

by Mark Brown

Subject: Re: [PATCH] arm64/sysreg: Add BRBE registers and fields

On Mon, Feb 26, 2024 at 09:54:41AM +0530, Anshuman Khandual wrote:

> Please find the modified patch here for a quick review and do let me know
> if this looks good for the next version i.e V17. BRBCR_EL1/12/2 organized
> per their encoding. Thanks !

That looks good to me from a quick scan.



2024-02-27 10:05:48

by Mark Rutland

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions

On Fri, Feb 23, 2024 at 12:58:48PM +0530, Anshuman Khandual wrote:
>
>
> On 2/21/24 19:31, Mark Rutland wrote:
> > On Thu, Jan 25, 2024 at 03:11:13PM +0530, Anshuman Khandual wrote:
> >> Currently BRBE feature is not supported in a guest environment. This hides
> >> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field.
> >
> > Does that mean that a guest can currently see BRBE advertised in the
> > ID_AA64DFR0_EL1.BRBE field, or is that hidden by the regular cpufeature code
> > today?
>
> IIRC it is hidden, but I will have to double check. When experimenting with
> BRBE guest support enablement earlier, the following changes were needed for
> the feature to be visible in ID_AA64DFR0_EL1.
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 646591c67e7a..f258568535a8 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -445,6 +445,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
> };
>
> static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
> + S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_IMP),
> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
> ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
> ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),
>
> Should we add the following entry - explicitly hiding BRBE from the guest
> as a prerequisite patch ?
>
> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_NI)

Is it visible currently, or is it hidden currently?

* If it is visible before this patch, that's a latent bug that we need to go
fix first, and that'll require more coordination.

* If it is not visible before this patch, there's no problem in the code, but
the commit message needs to explicitly mention that's the case as the commit
message currently implies it is visible by only mentioning hiding it.

... so can you please double check as you suggested above? We should be able to
explain why it is or is not visible today.

Mark.

> >> This also blocks guest accesses into BRBE system registers and instructions
> >> as if the underlying hardware never implemented FEAT_BRBE feature.
> >>
> >> Cc: Marc Zyngier <[email protected]>
> >> Cc: Oliver Upton <[email protected]>
> >> Cc: James Morse <[email protected]>
> >> Cc: Suzuki K Poulose <[email protected]>
> >> Cc: Catalin Marinas <[email protected]>
> >> Cc: Will Deacon <[email protected]>
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Signed-off-by: Anshuman Khandual <[email protected]>
> >> ---
> >> Changes in V16:
> >>
> >> - Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
> >>
> >> arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 56 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >> index 30253bd19917..6a06dc2f0c06 100644
> >> --- a/arch/arm64/kvm/sys_regs.c
> >> +++ b/arch/arm64/kvm/sys_regs.c
> >> @@ -1304,6 +1304,11 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
> >> return 0;
> >> }
> >>
> >> +#define BRB_INF_SRC_TGT_EL1(n) \
> >> + { SYS_DESC(SYS_BRBINF##n##_EL1), undef_access }, \
> >> + { SYS_DESC(SYS_BRBSRC##n##_EL1), undef_access }, \
> >> + { SYS_DESC(SYS_BRBTGT##n##_EL1), undef_access } \
> >
> > With the changes suggested on the previous patch, this would need to change to be:
> >
> > #define BRB_INF_SRC_TGT_EL1(n) \
> > { SYS_DESC(SYS_BRBINF_EL1(n)), undef_access }, \
> > { SYS_DESC(SYS_BRBSRC_EL1(n)), undef_access }, \
> > { SYS_DESC(SYS_BRBTGT_EL1(n)), undef_access } \
>
> Sure, already folded back in these above changes.
>
> >
> >
> > ... which would also be easier for backporting (if necessary), since those
> > definitions have existed for a while.
> >
> > Otherwise (modulo Suzuki's comment about rebasing), this looks good to me.
>
> Okay.
>
> >
> > Mark.
> >
> >> /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
> >> #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
> >> { SYS_DESC(SYS_DBGBVRn_EL1(n)), \
> >> @@ -1707,6 +1712,9 @@ static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
> >> /* Hide SPE from guests */
> >> val &= ~ID_AA64DFR0_EL1_PMSVer_MASK;
> >>
> >> + /* Hide BRBE from guests */
> >> + val &= ~ID_AA64DFR0_EL1_BRBE_MASK;
> >> +
> >> return val;
> >> }
> >>
> >> @@ -2195,6 +2203,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >> { SYS_DESC(SYS_DC_CISW), access_dcsw },
> >> { SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
> >> { SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
> >> + { SYS_DESC(OP_BRB_IALL), undef_access },
> >> + { SYS_DESC(OP_BRB_INJ), undef_access },
> >>
> >> DBG_BCR_BVR_WCR_WVR_EL1(0),
> >> DBG_BCR_BVR_WCR_WVR_EL1(1),
> >> @@ -2225,6 +2235,52 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >> { SYS_DESC(SYS_DBGCLAIMCLR_EL1), trap_raz_wi },
> >> { SYS_DESC(SYS_DBGAUTHSTATUS_EL1), trap_dbgauthstatus_el1 },
> >>
> >> + /*
> >> + * BRBE branch record sysreg address space is interleaved between
> >> + * corresponding BRBINF<N>_EL1, BRBSRC<N>_EL1, and BRBTGT<N>_EL1.
> >> + */
> >> + BRB_INF_SRC_TGT_EL1(0),
> >> + BRB_INF_SRC_TGT_EL1(16),
> >> + BRB_INF_SRC_TGT_EL1(1),
> >> + BRB_INF_SRC_TGT_EL1(17),
> >> + BRB_INF_SRC_TGT_EL1(2),
> >> + BRB_INF_SRC_TGT_EL1(18),
> >> + BRB_INF_SRC_TGT_EL1(3),
> >> + BRB_INF_SRC_TGT_EL1(19),
> >> + BRB_INF_SRC_TGT_EL1(4),
> >> + BRB_INF_SRC_TGT_EL1(20),
> >> + BRB_INF_SRC_TGT_EL1(5),
> >> + BRB_INF_SRC_TGT_EL1(21),
> >> + BRB_INF_SRC_TGT_EL1(6),
> >> + BRB_INF_SRC_TGT_EL1(22),
> >> + BRB_INF_SRC_TGT_EL1(7),
> >> + BRB_INF_SRC_TGT_EL1(23),
> >> + BRB_INF_SRC_TGT_EL1(8),
> >> + BRB_INF_SRC_TGT_EL1(24),
> >> + BRB_INF_SRC_TGT_EL1(9),
> >> + BRB_INF_SRC_TGT_EL1(25),
> >> + BRB_INF_SRC_TGT_EL1(10),
> >> + BRB_INF_SRC_TGT_EL1(26),
> >> + BRB_INF_SRC_TGT_EL1(11),
> >> + BRB_INF_SRC_TGT_EL1(27),
> >> + BRB_INF_SRC_TGT_EL1(12),
> >> + BRB_INF_SRC_TGT_EL1(28),
> >> + BRB_INF_SRC_TGT_EL1(13),
> >> + BRB_INF_SRC_TGT_EL1(29),
> >> + BRB_INF_SRC_TGT_EL1(14),
> >> + BRB_INF_SRC_TGT_EL1(30),
> >> + BRB_INF_SRC_TGT_EL1(15),
> >> + BRB_INF_SRC_TGT_EL1(31),
> >> +
> >> + /* Remaining BRBE sysreg addresses space */
> >> + { SYS_DESC(SYS_BRBCR_EL1), undef_access },
> >> + { SYS_DESC(SYS_BRBFCR_EL1), undef_access },
> >> + { SYS_DESC(SYS_BRBTS_EL1), undef_access },
> >> + { SYS_DESC(SYS_BRBINFINJ_EL1), undef_access },
> >> + { SYS_DESC(SYS_BRBSRCINJ_EL1), undef_access },
> >> + { SYS_DESC(SYS_BRBTGTINJ_EL1), undef_access },
> >> + { SYS_DESC(SYS_BRBIDR0_EL1), undef_access },
> >> +
> >> { SYS_DESC(SYS_MDCCSR_EL0), trap_raz_wi },
> >> { SYS_DESC(SYS_DBGDTR_EL0), trap_raz_wi },
> >> // DBGDTR[TR]X_EL0 share the same encoding
> >> --
> >> 2.25.1
> >>

2024-02-27 10:06:31

by Mark Rutland

Subject: Re: [PATCH] arm64/sysreg: Add BRBE registers and fields

On Mon, Feb 26, 2024 at 09:54:41AM +0530, Anshuman Khandual wrote:
> This adds BRBE related register definitions and various other related field
> macros there in. These will be used subsequently in a BRBE driver, which is
> being added later on. While here, this drops redundant register definitions
> from the header i.e (arch/arm64/include/asm/sysreg.h).
>
> BRBINFx_EL1_TYPE_IMPDEF_TRAP_EL3 register field value has been derived from
> latest ARM DDI 0601 ID121123, AKA 2023-12 instead of latest ARM ARM i.e ARM
> DDI 0487J.a. Please find the definition here.
>
> https://developer.arm.com/documentation/ddi0601/2023-12/
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> Please find the modified patch here for a quick review and do let me know
> if this looks good for the next version i.e V17. BRBCR_EL1/12/2 organized
> per their encoding. Thanks !

Superficially that looks fine to me.

Mark.

> arch/arm64/include/asm/sysreg.h | 17 ++---
> arch/arm64/tools/sysreg | 131 ++++++++++++++++++++++++++++++++
> 2 files changed, 137 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index c3b19b376c86..481c7d186dfa 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -195,16 +195,8 @@
> #define SYS_DBGVCR32_EL2 sys_reg(2, 4, 0, 7, 0)
>
> #define SYS_BRBINF_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 0))
> -#define SYS_BRBINFINJ_EL1 sys_reg(2, 1, 9, 1, 0)
> #define SYS_BRBSRC_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 1))
> -#define SYS_BRBSRCINJ_EL1 sys_reg(2, 1, 9, 1, 1)
> #define SYS_BRBTGT_EL1(n) sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 2))
> -#define SYS_BRBTGTINJ_EL1 sys_reg(2, 1, 9, 1, 2)
> -#define SYS_BRBTS_EL1 sys_reg(2, 1, 9, 0, 2)
> -
> -#define SYS_BRBCR_EL1 sys_reg(2, 1, 9, 0, 0)
> -#define SYS_BRBFCR_EL1 sys_reg(2, 1, 9, 0, 1)
> -#define SYS_BRBIDR0_EL1 sys_reg(2, 1, 9, 2, 0)
>
> #define SYS_TRCITECR_EL1 sys_reg(3, 0, 1, 2, 3)
> #define SYS_TRCACATR(m) sys_reg(2, 1, 2, ((m & 7) << 1), (2 | (m >> 3)))
> @@ -270,8 +262,6 @@
> /* ETM */
> #define SYS_TRCOSLAR sys_reg(2, 1, 1, 0, 4)
>
> -#define SYS_BRBCR_EL2 sys_reg(2, 4, 9, 0, 0)
> -
> #define SYS_MIDR_EL1 sys_reg(3, 0, 0, 0, 0)
> #define SYS_MPIDR_EL1 sys_reg(3, 0, 0, 0, 5)
> #define SYS_REVIDR_EL1 sys_reg(3, 0, 0, 0, 6)
> @@ -601,7 +591,6 @@
> #define SYS_CNTHV_CVAL_EL2 sys_reg(3, 4, 14, 3, 2)
>
> /* VHE encodings for architectural EL0/1 system registers */
> -#define SYS_BRBCR_EL12 sys_reg(2, 5, 9, 0, 0)
> #define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0)
> #define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2)
> #define SYS_SCTLR2_EL12 sys_reg(3, 5, 1, 0, 3)
> @@ -794,6 +783,12 @@
> #define OP_COSP_RCTX sys_insn(1, 3, 7, 3, 6)
> #define OP_CPP_RCTX sys_insn(1, 3, 7, 3, 7)
>
> +/*
> + * BRBE Instructions
> + */
> +#define BRB_IALL_INSN __emit_inst(0xd5000000 | OP_BRB_IALL | (0x1f))
> +#define BRB_INJ_INSN __emit_inst(0xd5000000 | OP_BRB_INJ | (0x1f))
> +
> /* Common SCTLR_ELx flags. */
> #define SCTLR_ELx_ENTP2 (BIT(60))
> #define SCTLR_ELx_DSSBS (BIT(44))
> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> index 4c9b67934367..60d288cbd5eb 100644
> --- a/arch/arm64/tools/sysreg
> +++ b/arch/arm64/tools/sysreg
> @@ -1023,6 +1023,137 @@ UnsignedEnum 3:0 MTEPERM
> EndEnum
> EndSysreg
>
> +
> +SysregFields BRBINFx_EL1
> +Res0 63:47
> +Field 46 CCU
> +Field 45:32 CC
> +Res0 31:18
> +Field 17 LASTFAILED
> +Field 16 T
> +Res0 15:14
> +Enum 13:8 TYPE
> + 0b000000 DIRECT_UNCOND
> + 0b000001 INDIRECT
> + 0b000010 DIRECT_LINK
> + 0b000011 INDIRECT_LINK
> + 0b000101 RET
> + 0b000111 ERET
> + 0b001000 DIRECT_COND
> + 0b100001 DEBUG_HALT
> + 0b100010 CALL
> + 0b100011 TRAP
> + 0b100100 SERROR
> + 0b100110 INSN_DEBUG
> + 0b100111 DATA_DEBUG
> + 0b101010 ALIGN_FAULT
> + 0b101011 INSN_FAULT
> + 0b101100 DATA_FAULT
> + 0b101110 IRQ
> + 0b101111 FIQ
> + 0b110000 IMPDEF_TRAP_EL3
> + 0b111001 DEBUG_EXIT
> +EndEnum
> +Enum 7:6 EL
> + 0b00 EL0
> + 0b01 EL1
> + 0b10 EL2
> + 0b11 EL3
> +EndEnum
> +Field 5 MPRED
> +Res0 4:2
> +Enum 1:0 VALID
> + 0b00 NONE
> + 0b01 TARGET
> + 0b10 SOURCE
> + 0b11 FULL
> +EndEnum
> +EndSysregFields
> +
> +SysregFields BRBCR_ELx
> +Res0 63:24
> +Field 23 EXCEPTION
> +Field 22 ERTN
> +Res0 21:10
> +Field 9 FZPSS
> +Field 8 FZP
> +Res0 7
> +Enum 6:5 TS
> + 0b01 VIRTUAL
> + 0b10 GUEST_PHYSICAL
> + 0b11 PHYSICAL
> +EndEnum
> +Field 4 MPRED
> +Field 3 CC
> +Res0 2
> +Field 1 ExBRE
> +Field 0 E0BRE
> +EndSysregFields
> +
> +Sysreg BRBCR_EL1 2 1 9 0 0
> +Fields BRBCR_ELx
> +EndSysreg
> +
> +Sysreg BRBFCR_EL1 2 1 9 0 1
> +Res0 63:30
> +Enum 29:28 BANK
> + 0b00 BANK_0
> + 0b01 BANK_1
> +EndEnum
> +Res0 27:23
> +Field 22 CONDDIR
> +Field 21 DIRCALL
> +Field 20 INDCALL
> +Field 19 RTN
> +Field 18 INDIRECT
> +Field 17 DIRECT
> +Field 16 EnI
> +Res0 15:8
> +Field 7 PAUSED
> +Field 6 LASTFAILED
> +Res0 5:0
> +EndSysreg
> +
> +Sysreg BRBTS_EL1 2 1 9 0 2
> +Field 63:0 TS
> +EndSysreg
> +
> +Sysreg BRBINFINJ_EL1 2 1 9 1 0
> +Fields BRBINFx_EL1
> +EndSysreg
> +
> +Sysreg BRBSRCINJ_EL1 2 1 9 1 1
> +Field 63:0 ADDRESS
> +EndSysreg
> +
> +Sysreg BRBTGTINJ_EL1 2 1 9 1 2
> +Field 63:0 ADDRESS
> +EndSysreg
> +
> +Sysreg BRBIDR0_EL1 2 1 9 2 0
> +Res0 63:16
> +Enum 15:12 CC
> + 0b101 20_BIT
> +EndEnum
> +Enum 11:8 FORMAT
> + 0b0 0
> +EndEnum
> +Enum 7:0 NUMREC
> + 0b00001000 8
> + 0b00010000 16
> + 0b00100000 32
> + 0b01000000 64
> +EndEnum
> +EndSysreg
> +
> +Sysreg BRBCR_EL2 2 4 9 0 0
> +Fields BRBCR_ELx
> +EndSysreg
> +
> +Sysreg BRBCR_EL12 2 5 9 0 0
> +Fields BRBCR_ELx
> +EndSysreg
> +
> Sysreg ID_AA64ZFR0_EL1 3 0 0 4 4
> Res0 63:60
> UnsignedEnum 59:56 F64MM
> --
> 2.25.1
>

2024-02-27 11:13:45

by Anshuman Khandual

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions



On 2/27/24 15:34, Mark Rutland wrote:
> On Fri, Feb 23, 2024 at 12:58:48PM +0530, Anshuman Khandual wrote:
>>
>>
>> On 2/21/24 19:31, Mark Rutland wrote:
>>> On Thu, Jan 25, 2024 at 03:11:13PM +0530, Anshuman Khandual wrote:
>>>> Currently BRBE feature is not supported in a guest environment. This hides
>>>> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field.
>>>
>>> Does that mean that a guest can currently see BRBE advertised in the
>>> ID_AA64DFR0_EL1.BRBE field, or is that hidden by the regular cpufeature code
>>> today?
>>
>> IIRC it is hidden, but I will have to double check. When experimenting with
>> BRBE guest support enablement earlier, the following changes were needed for
>> the feature to be visible in ID_AA64DFR0_EL1.
>>
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 646591c67e7a..f258568535a8 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -445,6 +445,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
>> };
>>
>> static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
>> + S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_IMP),
>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),
>>
>> Should we add the following entry - explicitly hiding BRBE from the guest
>> as a prerequisite patch ?
>>
>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_NI)
>
> Is it visible currently, or is it hidden currently?
>
> * If it is visible before this patch, that's a latent bug that we need to go
> fix first, and that'll require more coordination.
>
> * If it is not visible before this patch, there's no problem in the code, but
> the commit message needs to explicitly mention that's the case as the commit
> message currently implies it is visible by only mentioning hiding it.
>
> ... so can you please double check as you suggested above? We should be able to
> explain why it is or is not visible today.

It is currently hidden, i.e. the following code returns 1 in the host
but returns 0 inside the guest.

aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
brbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT);

Hence, I will update the commit message here as suggested.
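
For reference, here is a minimal standalone sketch of what the check above and
the "Hide BRBE from guests" masking amount to. It assumes the BRBE field sits
at bits [55:52] of ID_AA64DFR0_EL1; the shift constant and both helper names
are illustrative stand-ins for this note, not the kernel's own definitions.

```c
#include <stdint.h>

/* Assumption: ID_AA64DFR0_EL1.BRBE is the 4-bit field at bits [55:52]. */
#define BRBE_SHIFT	52
#define BRBE_MASK	(UINT64_C(0xf) << BRBE_SHIFT)

/* Illustrative stand-in for extracting an unsigned 4-bit ID field. */
static inline unsigned int extract_unsigned_field(uint64_t reg,
						  unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/* Illustrative stand-in for the masking step: clear BRBE, keep the rest. */
static inline uint64_t hide_brbe(uint64_t val)
{
	return val & ~BRBE_MASK;
}
```

With a register value that advertises BRBE, the extracted field is non-zero
before masking and zero afterwards, while all other fields are left untouched.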

>
> Mark.
>
>>>> This also blocks guest accesses into BRBE system registers and instructions
>>>> as if the underlying hardware never implemented FEAT_BRBE feature.
>>>>
>>>> Cc: Marc Zyngier <[email protected]>
>>>> Cc: Oliver Upton <[email protected]>
>>>> Cc: James Morse <[email protected]>
>>>> Cc: Suzuki K Poulose <[email protected]>
>>>> Cc: Catalin Marinas <[email protected]>
>>>> Cc: Will Deacon <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Signed-off-by: Anshuman Khandual <[email protected]>
>>>> ---
>>>> Changes in V16:
>>>>
>>>> - Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
>>>>
>>>> arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 56 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>>> index 30253bd19917..6a06dc2f0c06 100644
>>>> --- a/arch/arm64/kvm/sys_regs.c
>>>> +++ b/arch/arm64/kvm/sys_regs.c
>>>> @@ -1304,6 +1304,11 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
>>>> return 0;
>>>> }
>>>>
>>>> +#define BRB_INF_SRC_TGT_EL1(n) \
>>>> + { SYS_DESC(SYS_BRBINF##n##_EL1), undef_access }, \
>>>> + { SYS_DESC(SYS_BRBSRC##n##_EL1), undef_access }, \
>>>> + { SYS_DESC(SYS_BRBTGT##n##_EL1), undef_access } \
>>>
>>> With the changes suggested on the previous patch, this would need to change to be:
>>>
>>> #define BRB_INF_SRC_TGT_EL1(n) \
>>> { SYS_DESC(SYS_BRBINF_EL1(n)), undef_access }, \
>>> { SYS_DESC(SYS_BRBSRC_EL1(n)), undef_access }, \
>>> { SYS_DESC(SYS_BRBTGT_EL1(n)), undef_access } \
>>
>> Sure, already folded back in these above changes.
>>
>>>
>>>
>>> ... which would also be easier for backporting (if necessary), since those
>>> definitions have existed for a while.
>>>
>>> Otherwise (modulo Suzuki's comment about rebasing), this looks good to me.
>>
>> Okay.
>>
>>>
>>> Mark.
>>>
>>>> /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
>>>> #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
>>>> { SYS_DESC(SYS_DBGBVRn_EL1(n)), \
>>>> @@ -1707,6 +1712,9 @@ static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
>>>> /* Hide SPE from guests */
>>>> val &= ~ID_AA64DFR0_EL1_PMSVer_MASK;
>>>>
>>>> + /* Hide BRBE from guests */
>>>> + val &= ~ID_AA64DFR0_EL1_BRBE_MASK;
>>>> +
>>>> return val;
>>>> }
>>>>
>>>> @@ -2195,6 +2203,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>>> { SYS_DESC(SYS_DC_CISW), access_dcsw },
>>>> { SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
>>>> { SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
>>>> + { SYS_DESC(OP_BRB_IALL), undef_access },
>>>> + { SYS_DESC(OP_BRB_INJ), undef_access },
>>>>
>>>> DBG_BCR_BVR_WCR_WVR_EL1(0),
>>>> DBG_BCR_BVR_WCR_WVR_EL1(1),
>>>> @@ -2225,6 +2235,52 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>>> { SYS_DESC(SYS_DBGCLAIMCLR_EL1), trap_raz_wi },
>>>> { SYS_DESC(SYS_DBGAUTHSTATUS_EL1), trap_dbgauthstatus_el1 },
>>>>
>>>> + /*
>>>> + * BRBE branch record sysreg address space is interleaved between
>>>> + * corresponding BRBINF<N>_EL1, BRBSRC<N>_EL1, and BRBTGT<N>_EL1.
>>>> + */
>>>> + BRB_INF_SRC_TGT_EL1(0),
>>>> + BRB_INF_SRC_TGT_EL1(16),
>>>> + BRB_INF_SRC_TGT_EL1(1),
>>>> + BRB_INF_SRC_TGT_EL1(17),
>>>> + BRB_INF_SRC_TGT_EL1(2),
>>>> + BRB_INF_SRC_TGT_EL1(18),
>>>> + BRB_INF_SRC_TGT_EL1(3),
>>>> + BRB_INF_SRC_TGT_EL1(19),
>>>> + BRB_INF_SRC_TGT_EL1(4),
>>>> + BRB_INF_SRC_TGT_EL1(20),
>>>> + BRB_INF_SRC_TGT_EL1(5),
>>>> + BRB_INF_SRC_TGT_EL1(21),
>>>> + BRB_INF_SRC_TGT_EL1(6),
>>>> + BRB_INF_SRC_TGT_EL1(22),
>>>> + BRB_INF_SRC_TGT_EL1(7),
>>>> + BRB_INF_SRC_TGT_EL1(23),
>>>> + BRB_INF_SRC_TGT_EL1(8),
>>>> + BRB_INF_SRC_TGT_EL1(24),
>>>> + BRB_INF_SRC_TGT_EL1(9),
>>>> + BRB_INF_SRC_TGT_EL1(25),
>>>> + BRB_INF_SRC_TGT_EL1(10),
>>>> + BRB_INF_SRC_TGT_EL1(26),
>>>> + BRB_INF_SRC_TGT_EL1(11),
>>>> + BRB_INF_SRC_TGT_EL1(27),
>>>> + BRB_INF_SRC_TGT_EL1(12),
>>>> + BRB_INF_SRC_TGT_EL1(28),
>>>> + BRB_INF_SRC_TGT_EL1(13),
>>>> + BRB_INF_SRC_TGT_EL1(29),
>>>> + BRB_INF_SRC_TGT_EL1(14),
>>>> + BRB_INF_SRC_TGT_EL1(30),
>>>> + BRB_INF_SRC_TGT_EL1(15),
>>>> + BRB_INF_SRC_TGT_EL1(31),
>>>> +
>>>> + /* Remaining BRBE sysreg addresses space */
>>>> + { SYS_DESC(SYS_BRBCR_EL1), undef_access },
>>>> + { SYS_DESC(SYS_BRBFCR_EL1), undef_access },
>>>> + { SYS_DESC(SYS_BRBTS_EL1), undef_access },
>>>> + { SYS_DESC(SYS_BRBINFINJ_EL1), undef_access },
>>>> + { SYS_DESC(SYS_BRBSRCINJ_EL1), undef_access },
>>>> + { SYS_DESC(SYS_BRBTGTINJ_EL1), undef_access },
>>>> + { SYS_DESC(SYS_BRBIDR0_EL1), undef_access },
>>>> +
>>>> { SYS_DESC(SYS_MDCCSR_EL0), trap_raz_wi },
>>>> { SYS_DESC(SYS_DBGDTR_EL0), trap_raz_wi },
>>>> // DBGDTR[TR]X_EL0 share the same encoding
>>>> --
>>>> 2.25.1
>>>>

2024-02-29 11:45:56

by Suzuki K Poulose

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions

On 27/02/2024 11:13, Anshuman Khandual wrote:
>
>
> On 2/27/24 15:34, Mark Rutland wrote:
>> On Fri, Feb 23, 2024 at 12:58:48PM +0530, Anshuman Khandual wrote:
>>>
>>>
>>> On 2/21/24 19:31, Mark Rutland wrote:
>>>> On Thu, Jan 25, 2024 at 03:11:13PM +0530, Anshuman Khandual wrote:
>>>>> Currently BRBE feature is not supported in a guest environment. This hides
>>>>> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field.
>>>>
>>>> Does that mean that a guest can currently see BRBE advertised in the
>>>> ID_AA64DFR0_EL1.BRBE field, or is that hidden by the regular cpufeature code
>>>> today?
>>>
>>> IIRC it is hidden, but will have to double check. When experimenting for BRBE
>>> guest support enablement earlier, the following changes were needed for the feature
>>> to be visible in ID_AA64DFR0_EL1.
>>>
>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>>> index 646591c67e7a..f258568535a8 100644
>>> --- a/arch/arm64/kernel/cpufeature.c
>>> +++ b/arch/arm64/kernel/cpufeature.c
>>> @@ -445,6 +445,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
>>> };
>>>
>>> static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
>>> + S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_IMP),
>>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
>>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
>>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),
>>>
>>> Should we add the following entry - explicitly hiding BRBE from the guest
>>> as a prerequisite patch ?

This has nothing to do with the Guest visibility of the BRBE. This is
specifically for host "userspace" (via MRS emulation).

>>>
>>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_NI)
>>
>> Is it visible currently, or is it hidden currently?
>>
>> * If it is visible before this patch, that's a latent bug that we need to go
>> fix first, and that'll require more coordination.
>>
>> * If it is not visible before this patch, there's no problem in the code, but
>> the commit message needs to explicitly mention that's the case as the commit
>> message currently implies it is visible by only mentioning hiding it.
>>
>> ... so can you please double check as you suggested above? We should be able to
>> explain why it is or is not visible today.
>
> It is currently hidden i.e following code returns 1 in the host
> but returns 0 inside the guest.
>
> aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> brbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT);
>
> Hence - will update the commit message here as suggested.

This is by virtue of the masking we do in kvm/sys_regs.c below.

>
>>
>> Mark.
>>
>>>>> This also blocks guest accesses into BRBE system registers and instructions
>>>>> as if the underlying hardware never implemented FEAT_BRBE feature.
>>>>>
>>>>> Cc: Marc Zyngier <[email protected]>
>>>>> Cc: Oliver Upton <[email protected]>
>>>>> Cc: James Morse <[email protected]>
>>>>> Cc: Suzuki K Poulose <[email protected]>
>>>>> Cc: Catalin Marinas <[email protected]>
>>>>> Cc: Will Deacon <[email protected]>
>>>>> Cc: [email protected]
>>>>> Cc: [email protected]
>>>>> Cc: [email protected]
>>>>> Signed-off-by: Anshuman Khandual <[email protected]>
>>>>> ---
>>>>> Changes in V16:
>>>>>
>>>>> - Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
>>>>>
>>>>> arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++
>>>>> 1 file changed, 56 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>>>> index 30253bd19917..6a06dc2f0c06 100644
>>>>> --- a/arch/arm64/kvm/sys_regs.c
>>>>> +++ b/arch/arm64/kvm/sys_regs.c
>>>>> @@ -1304,6 +1304,11 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +#define BRB_INF_SRC_TGT_EL1(n) \
>>>>> + { SYS_DESC(SYS_BRBINF##n##_EL1), undef_access }, \
>>>>> + { SYS_DESC(SYS_BRBSRC##n##_EL1), undef_access }, \
>>>>> + { SYS_DESC(SYS_BRBTGT##n##_EL1), undef_access } \
>>>>
>>>> With the changes suggested on the previous patch, this would need to change to be:
>>>>
>>>> #define BRB_INF_SRC_TGT_EL1(n) \
>>>> { SYS_DESC(SYS_BRBINF_EL1(n)), undef_access }, \
>>>> { SYS_DESC(SYS_BRBSRC_EL1(n)), undef_access }, \
>>>> { SYS_DESC(SYS_BRBTGT_EL1(n)), undef_access } \
>>>
>>> Sure, already folded back in these above changes.
>>>
>>>>
>>>>
>>>> ... which would also be easier for backporting (if necessary), since those
>>>> definitions have existed for a while.
>>>>
>>>> Otherwise (modulo Suzuki's comment about rebasing), this looks good to me.
>>>
>>> Okay.
>>>
>>>>
>>>> Mark.
>>>>
>>>>> /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
>>>>> #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
>>>>> { SYS_DESC(SYS_DBGBVRn_EL1(n)), \
>>>>> @@ -1707,6 +1712,9 @@ static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
>>>>> /* Hide SPE from guests */
>>>>> val &= ~ID_AA64DFR0_EL1_PMSVer_MASK;
>>>>>
>>>>> + /* Hide BRBE from guests */
>>>>> + val &= ~ID_AA64DFR0_EL1_BRBE_MASK;
>>>>> +

This controls what the guest sees.

Suzuki


>>>>> return val;
>>>>> }
>>>>>
>>>>> @@ -2195,6 +2203,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>>>> { SYS_DESC(SYS_DC_CISW), access_dcsw },
>>>>> { SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
>>>>> { SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
>>>>> + { SYS_DESC(OP_BRB_IALL), undef_access },
>>>>> + { SYS_DESC(OP_BRB_INJ), undef_access },
>>>>>
>>>>> DBG_BCR_BVR_WCR_WVR_EL1(0),
>>>>> DBG_BCR_BVR_WCR_WVR_EL1(1),
>>>>> @@ -2225,6 +2235,52 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>>>> { SYS_DESC(SYS_DBGCLAIMCLR_EL1), trap_raz_wi },
>>>>> { SYS_DESC(SYS_DBGAUTHSTATUS_EL1), trap_dbgauthstatus_el1 },
>>>>>
>>>>> + /*
>>>>> + * BRBE branch record sysreg address space is interleaved between
>>>>> + * corresponding BRBINF<N>_EL1, BRBSRC<N>_EL1, and BRBTGT<N>_EL1.
>>>>> + */
>>>>> + BRB_INF_SRC_TGT_EL1(0),
>>>>> + BRB_INF_SRC_TGT_EL1(16),
>>>>> + BRB_INF_SRC_TGT_EL1(1),
>>>>> + BRB_INF_SRC_TGT_EL1(17),
>>>>> + BRB_INF_SRC_TGT_EL1(2),
>>>>> + BRB_INF_SRC_TGT_EL1(18),
>>>>> + BRB_INF_SRC_TGT_EL1(3),
>>>>> + BRB_INF_SRC_TGT_EL1(19),
>>>>> + BRB_INF_SRC_TGT_EL1(4),
>>>>> + BRB_INF_SRC_TGT_EL1(20),
>>>>> + BRB_INF_SRC_TGT_EL1(5),
>>>>> + BRB_INF_SRC_TGT_EL1(21),
>>>>> + BRB_INF_SRC_TGT_EL1(6),
>>>>> + BRB_INF_SRC_TGT_EL1(22),
>>>>> + BRB_INF_SRC_TGT_EL1(7),
>>>>> + BRB_INF_SRC_TGT_EL1(23),
>>>>> + BRB_INF_SRC_TGT_EL1(8),
>>>>> + BRB_INF_SRC_TGT_EL1(24),
>>>>> + BRB_INF_SRC_TGT_EL1(9),
>>>>> + BRB_INF_SRC_TGT_EL1(25),
>>>>> + BRB_INF_SRC_TGT_EL1(10),
>>>>> + BRB_INF_SRC_TGT_EL1(26),
>>>>> + BRB_INF_SRC_TGT_EL1(11),
>>>>> + BRB_INF_SRC_TGT_EL1(27),
>>>>> + BRB_INF_SRC_TGT_EL1(12),
>>>>> + BRB_INF_SRC_TGT_EL1(28),
>>>>> + BRB_INF_SRC_TGT_EL1(13),
>>>>> + BRB_INF_SRC_TGT_EL1(29),
>>>>> + BRB_INF_SRC_TGT_EL1(14),
>>>>> + BRB_INF_SRC_TGT_EL1(30),
>>>>> + BRB_INF_SRC_TGT_EL1(15),
>>>>> + BRB_INF_SRC_TGT_EL1(31),
>>>>> +
>>>>> + /* Remaining BRBE sysreg addresses space */
>>>>> + { SYS_DESC(SYS_BRBCR_EL1), undef_access },
>>>>> + { SYS_DESC(SYS_BRBFCR_EL1), undef_access },
>>>>> + { SYS_DESC(SYS_BRBTS_EL1), undef_access },
>>>>> + { SYS_DESC(SYS_BRBINFINJ_EL1), undef_access },
>>>>> + { SYS_DESC(SYS_BRBSRCINJ_EL1), undef_access },
>>>>> + { SYS_DESC(SYS_BRBTGTINJ_EL1), undef_access },
>>>>> + { SYS_DESC(SYS_BRBIDR0_EL1), undef_access },
>>>>> +
>>>>> { SYS_DESC(SYS_MDCCSR_EL0), trap_raz_wi },
>>>>> { SYS_DESC(SYS_DBGDTR_EL0), trap_raz_wi },
>>>>> // DBGDTR[TR]X_EL0 share the same encoding
>>>>> --
>>>>> 2.25.1
>>>>>
>


2024-02-29 12:55:55

by Mark Rutland

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions

Hi Suzuki,

On Thu, Feb 29, 2024 at 11:45:08AM +0000, Suzuki K Poulose wrote:
> On 27/02/2024 11:13, Anshuman Khandual wrote:
> > On 2/27/24 15:34, Mark Rutland wrote:
> > > On Fri, Feb 23, 2024 at 12:58:48PM +0530, Anshuman Khandual wrote:
> > > > On 2/21/24 19:31, Mark Rutland wrote:
> > > > > On Thu, Jan 25, 2024 at 03:11:13PM +0530, Anshuman Khandual wrote:
> > > > > > Currently BRBE feature is not supported in a guest environment. This hides
> > > > > > BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field.
> > > > >
> > > > > > Does that mean that a guest can currently see BRBE advertised in the
> > > > > > ID_AA64DFR0_EL1.BRBE field, or is that hidden by the regular cpufeature code
> > > > > today?
> > > >
> > > > IIRC it is hidden, but will have to double check. When experimenting for BRBE
> > > > guest support enablement earlier, following changes were need for the feature
> > > > to be visible in ID_AA64DFR0_EL1.
> > > >
> > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > > index 646591c67e7a..f258568535a8 100644
> > > > --- a/arch/arm64/kernel/cpufeature.c
> > > > +++ b/arch/arm64/kernel/cpufeature.c
> > > > @@ -445,6 +445,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
> > > > };
> > > > static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
> > > > + S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_IMP),
> > > > S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
> > > > ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
> > > > ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),
> > > >
> > > > Should we add the following entry - explicitly hiding BRBE from the guest
> > > > as a prerequisite patch ?
>
> This has nothing to do with the Guest visibility of the BRBE. This is
> specifically for host "userspace" (via MRS emulation).
>
> > > >
> > > > S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_NI)
> > >
> > > Is it visible currently, or is it hidden currently?
> > >
> > > * If it is visible before this patch, that's a latent bug that we need to go
> > > fix first, and that'll require more coordination.
> > >
> > > * If it is not visible before this patch, there's no problem in the code, but
> > > the commit message needs to explicitly mention that's the case as the commit
> > > message currently implies it is visible by only mentioning hiding it.
> > >
> > > ... so can you please double check as you suggested above? We should be able to
> > > explain why it is or is not visible today.
> >
> > It is currently hidden i.e following code returns 1 in the host
> > but returns 0 inside the guest.
> >
> > aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> > brbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT);
> >
> > Hence - will update the commit message here as suggested.
>
> This is by virtue of the masking we do in kvm/sys_regs.c below.

Yep, once this patch is applied.

I think we might have some crossed wires here; I'm only really asking for the
commit message (and title) to be updated and clarified.

Ignoring the patchlet above, and just considering the original patch:

IIUC before the patch is applied, the ID_AA64DFR0_EL1.BRBE field is zero for
the guest because we don't have an arm64_ftr_bits entry for the
ID_AA64DFR0_EL1.BRBE field, and so init_cpu_ftr_reg() will leave that as zero
in arm64_ftr_reg::sys_val, and hence when read_sanitised_id_aa64dfr0_el1()
calls read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1), the BRBE field will be zero.

This series as-is doesn't add an arm64_ftr_bits entry for ID_AA64DFR0_EL1.BRBE,
so it'd still be hidden from a guest regardless of whether we add explicit
masking in read_sanitised_id_aa64dfr0_el1(). The reason to add that masking is
to be explicit, so that if/when we add an arm64_ftr_bits entry for
ID_AA64DFR0_EL1.BRBE, it isn't exposed to a guest unexpectedly.
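
To make the two layers described above concrete, here is a minimal user-space sketch, not kernel code. It assumes a heavily simplified model in which the sanitiser copies only fields that have a table entry; the names struct ftr_field, sanitise(), guest_view() and brbe_field() are hypothetical. The real init_cpu_ftr_reg() also handles safe values, strictness and cross-CPU sanitisation, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for struct arm64_ftr_bits. */
struct ftr_field {
	unsigned int shift;
	unsigned int width;
};

/* ID_AA64DFR0_EL1 field positions: PMSVer is [35:32], BRBE is [55:52]. */
#define PMSVER_SHIFT	32
#define BRBE_SHIFT	52

static uint64_t field_mask(const struct ftr_field *f)
{
	return ((UINT64_C(1) << f->width) - 1) << f->shift;
}

/*
 * Layer 1 (cpufeature model): only fields with a table entry are copied
 * from the hardware value; every other field is left as zero.
 */
static uint64_t sanitise(uint64_t hw_val, const struct ftr_field *tbl, size_t n)
{
	uint64_t val = 0;

	for (size_t i = 0; i < n; i++)
		val |= hw_val & field_mask(&tbl[i]);
	return val;
}

/*
 * Layer 2 (KVM model): clear BRBE explicitly, so the guest view stays
 * zero even if a table entry for BRBE is added later.
 */
static uint64_t guest_view(uint64_t sanitised)
{
	return sanitised & ~(UINT64_C(0xf) << BRBE_SHIFT);
}

/* Extract the 4-bit BRBE field from an ID_AA64DFR0_EL1-style value. */
static unsigned int brbe_field(uint64_t val)
{
	return (unsigned int)((val >> BRBE_SHIFT) & 0xf);
}
```

In this model, a table without a BRBE entry already yields a zero BRBE field, and the explicit mask only changes the result once such an entry exists, which is the "be explicit" argument made above.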

Similarly, IIUC the BRBE register accesses are *already* trapped, and
emulate_sys_reg() will log a warning and inject an UNDEFINED exception into the
guest if the guest tries to access the BRBE registers. Any well-behaved guest
*shouldn't* do that, but a poorly-behaved guest could do that and (slowly) spam
dmesg with messages about the unhandled sysreg traps. The reason to handle
those regs is largely to suppress that warning, and to make it clear that we
intend for those to be handled as undef.
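
The difference between an unhandled trap and an explicit undef entry can be sketched with a small table-lookup model. This is an illustrative user-space approximation only: struct reg_desc, emulate(), and the warnings/undefs counters are hypothetical stand-ins, and the encodings used are arbitrary numbers rather than real sysreg encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct sys_reg_desc. */
struct reg_desc {
	uint32_t encoding;
	bool (*access)(void);
};

static unsigned int warnings;	/* models messages logged to dmesg */
static unsigned int undefs;	/* models UNDEFINED exceptions injected */

/* Explicit handler: inject UNDEFINED without logging anything. */
static bool undef_access(void)
{
	undefs++;
	return false;
}

/*
 * Model of the trap path: a register with a table entry is handled by
 * its handler; a register with no entry is logged and then undef'd.
 */
static void emulate(uint32_t enc, const struct reg_desc *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].encoding == enc) {
			tbl[i].access();
			return;
		}
	}
	warnings++;
	undefs++;
}
```

Either way the guest sees an UNDEFINED exception; the table entry only removes the log message and documents the intent.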

So the commit title should be something like:

KVM: arm64: explicitly handle BRBE register accesses as UNDEFINED

... and the message should mention the key points from the above.

Suzuki, does that sound right to you?

Anshuman, can you go re-write the commit message with that in mind?

Mark.

2024-02-29 15:43:38

by Suzuki K Poulose

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions

On 29/02/2024 12:50, Mark Rutland wrote:
> Hi Suzuki,
>
> On Thu, Feb 29, 2024 at 11:45:08AM +0000, Suzuki K Poulose wrote:
>> On 27/02/2024 11:13, Anshuman Khandual wrote:
>>> On 2/27/24 15:34, Mark Rutland wrote:
>>>> On Fri, Feb 23, 2024 at 12:58:48PM +0530, Anshuman Khandual wrote:
>>>>> On 2/21/24 19:31, Mark Rutland wrote:
>>>>>> On Thu, Jan 25, 2024 at 03:11:13PM +0530, Anshuman Khandual wrote:
>>>>>>> Currently BRBE feature is not supported in a guest environment. This hides
>>>>>>> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field.
>>>>>>
>>>>>> Does that mean that a guest can currently see BRBE advertised in the
>>>>>> ID_AA64DFR0_EL1.BRBE field, or is that hidden by the regular cpufeature code
>>>>>> today?
>>>>>
>>>>> IIRC it is hidden, but will have to double check. When experimenting for BRBE
>>>>> guest support enablement earlier, following changes were need for the feature
>>>>> to be visible in ID_AA64DFR0_EL1.
>>>>>
>>>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>>>>> index 646591c67e7a..f258568535a8 100644
>>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>>> @@ -445,6 +445,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
>>>>> };
>>>>> static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
>>>>> + S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_IMP),
>>>>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
>>>>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
>>>>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),
>>>>>
>>>>> Should we add the following entry - explicitly hiding BRBE from the guest
>>>>> as a prerequisite patch ?
>>
>> This has nothing to do with the Guest visibility of the BRBE. This is
>> specifically for host "userspace" (via MRS emulation).
>>
>>>>>
>>>>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_NI)
>>>>
>>>> Is it visible currently, or is it hidden currently?
>>>>
>>>> * If it is visible before this patch, that's a latent bug that we need to go
>>>> fix first, and that'll require more coordination.
>>>>
>>>> * If it is not visible before this patch, there's no problem in the code, but
>>>> the commit message needs to explicitly mention that's the case as the commit
>>>> message currently implies it is visible by only mentioning hiding it.
>>>>
>>>> ... so can you please double check as you suggested above? We should be able to
>>>> explain why it is or is not visible today.
>>>
>>> It is currently hidden i.e following code returns 1 in the host
>>> but returns 0 inside the guest.
>>>
>>> aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
>>> brbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT);
>>>
>>> Hence - will update the commit message here as suggested.
>>
>> This is by virtue of the masking we do in kvm/sys_regs.c below.
>
> Yep, once this patch is applied.
>
> I think we might have some crossed wires here; I'm only really asking for the
> commit message (and title) to be updated and clarified.
>
> Ignoring the patchlet above, and just considering the original patch:
>
> IIUC before the patch is applied, the ID_AA64DFR0_EL1.BRBE field is zero for
> the guest because we don't have an arm64_ftr_bits entry for the
> ID_AA64DFR0_EL1.BRBE field, and so init_cpu_ftr_reg() will leave that as zero
> in arm64_ftr_reg::sys_val, and hence when read_sanitised_id_aa64dfr0_el1()
> calls read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1), the BRBE field will be zero.
>
> This series as-is doesn't add an arm64_ftr_bits entry for ID_AA64DFR0_EL1.BRBE,
> so it'd still be hidden from a guest regardless of whether we add explicit
> masking in read_sanitised_id_aa64dfr0_el1(). The reason to add that masking is
> to be explicit, so that if/when we add an arm64_ftr_bits entry for
> ID_AA64DFR0_EL1.BRBE, it isn't exposed to a guest unexpectedly.
>
> Similarly, IIUC the BRBE register accesses are *already* trapped, and
> emulate_sys_reg() will log a warning and inject an UNDEFINED exception into the
> guest if the guest tries to access the BRBE registers. Any well-behaved guest
> *shouldn't* do that, but a poorly-behaved guest could do that and (slowly) spam
> dmesg with messages about the unhandled sysreg traps. The reason to handle
> those regs is largely to suppress that warning, and to make it clear that we
> intend for those to be handled as undef.
>
> So the commit title should be something like:
>
> KVM: arm64: explicitly handle BRBE register accesses as UNDEFINED
>
> ... and the message should mention the key points from the above.
>
> Suzuki, does that sound right to you?

Yes, that makes perfect sense to me. Thanks for clarifying.

Suzuki

>
> Anshuman, can you go re-write the commit message with that in mind?
>
> Mark.


2024-02-29 18:41:07

by Marc Zyngier

Subject: Re: [PATCH V16 5/8] KVM: arm64: nvhe: Disable branch generation in nVHE guests

On Thu, 25 Jan 2024 09:41:16 +0000,
Anshuman Khandual <[email protected]> wrote:
>
> Disable the BRBE before we enter the guest, saving the status and enable it
> back once we get out of the guest. This avoids capturing branch records in
> the guest kernel or userspace, which would be confusing the host samples.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Oliver Upton <[email protected]>
> Cc: James Morse <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> CC: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> Changes in V16:
>
> - Dropped BRBCR_EL1 and BRBFCR_EL1 from enum vcpu_sysreg
> - Reverted back the KVM NVHE patch - used host_debug_state based 'brbcr_el1'
> element, and dropped the previous dependency on James's coresight series
>
> arch/arm64/include/asm/kvm_host.h | 5 ++++-
> arch/arm64/kvm/debug.c | 5 +++++
> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 ++++++++++++++++++++++++++++++
> 3 files changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 21c57b812569..bce8792092af 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -569,7 +569,7 @@ struct kvm_vcpu_arch {
> u8 cflags;
>
> /* Input flags to the hypervisor code, potentially cleared after use */
> - u8 iflags;
> + u16 iflags;
>
> /* State flags for kernel bookkeeping, unused by the hypervisor code */
> u8 sflags;
> @@ -610,6 +610,7 @@ struct kvm_vcpu_arch {
> u64 pmscr_el1;
> /* Self-hosted trace */
> u64 trfcr_el1;
> + u64 brbcr_el1;
> } host_debug_state;
>
> /* VGIC state */
> @@ -779,6 +780,8 @@ struct kvm_vcpu_arch {
> #define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
> /* vcpu running in HYP context */
> #define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
> +/* Save BRBE context if active */
> +#define DEBUG_STATE_SAVE_BRBE __vcpu_single_flag(iflags, BIT(8))
>
> /* SVE enabled for host EL0 */
> #define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 8725291cb00a..99f85d8acbf3 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -335,10 +335,15 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
> if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
> !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
> vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> +
> + /* Check if we have BRBE implemented and available at the host */
> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT))
> + vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
> }
>
> void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
> {
> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
> + vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
> }
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 4558c02eb352..79bcf0fb1326 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -79,6 +79,34 @@ static void __debug_restore_trace(u64 trfcr_el1)
> write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1);
> }
>
> +static void __debug_save_brbe(u64 *brbcr_el1)
> +{
> + *brbcr_el1 = 0;
> +
> + /* Check if the BRBE is enabled */
> + if (!(read_sysreg_s(SYS_BRBCR_EL1) & (BRBCR_ELx_E0BRE | BRBCR_ELx_ExBRE)))
> + return;
> +
> + /*
> + * Prohibit branch record generation while we are in guest.
> + * Since access to BRBCR_EL1 is trapped, the guest can't
> + * modify the filtering set by the host.
> + */
> + *brbcr_el1 = read_sysreg_s(SYS_BRBCR_EL1);
> + write_sysreg_s(0, SYS_BRBCR_EL1);

As for TRFCR and PMSCR, this is broken on hVHE.

Please see [1]

M.

[1] https://lore.kernel.org/r/[email protected]

--
Without deviation from the norm, progress is not possible.

2024-03-01 02:24:06

by Anshuman Khandual

Subject: Re: [PATCH V16 5/8] KVM: arm64: nvhe: Disable branch generation in nVHE guests



On 3/1/24 00:10, Marc Zyngier wrote:
> On Thu, 25 Jan 2024 09:41:16 +0000,
> Anshuman Khandual <[email protected]> wrote:
>>
>> Disable the BRBE before we enter the guest, saving the status and enable it
>> back once we get out of the guest. This avoids capturing branch records in
>> the guest kernel or userspace, which would be confusing the host samples.
>>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Oliver Upton <[email protected]>
>> Cc: James Morse <[email protected]>
>> Cc: Suzuki K Poulose <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> CC: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> Changes in V16:
>>
>> - Dropped BRBCR_EL1 and BRBFCR_EL1 from enum vcpu_sysreg
>> - Reverted back the KVM NVHE patch - used host_debug_state based 'brbcr_el1'
>> element, and dropped the previous dependency on James's coresight series
>>
>> arch/arm64/include/asm/kvm_host.h | 5 ++++-
>> arch/arm64/kvm/debug.c | 5 +++++
>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 ++++++++++++++++++++++++++++++
>> 3 files changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 21c57b812569..bce8792092af 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -569,7 +569,7 @@ struct kvm_vcpu_arch {
>> u8 cflags;
>>
>> /* Input flags to the hypervisor code, potentially cleared after use */
>> - u8 iflags;
>> + u16 iflags;
>>
>> /* State flags for kernel bookkeeping, unused by the hypervisor code */
>> u8 sflags;
>> @@ -610,6 +610,7 @@ struct kvm_vcpu_arch {
>> u64 pmscr_el1;
>> /* Self-hosted trace */
>> u64 trfcr_el1;
>> + u64 brbcr_el1;
>> } host_debug_state;
>>
>> /* VGIC state */
>> @@ -779,6 +780,8 @@ struct kvm_vcpu_arch {
>> #define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
>> /* vcpu running in HYP context */
>> #define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
>> +/* Save BRBE context if active */
>> +#define DEBUG_STATE_SAVE_BRBE __vcpu_single_flag(iflags, BIT(8))
>>
>> /* SVE enabled for host EL0 */
>> #define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> index 8725291cb00a..99f85d8acbf3 100644
>> --- a/arch/arm64/kvm/debug.c
>> +++ b/arch/arm64/kvm/debug.c
>> @@ -335,10 +335,15 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
>> if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
>> !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
>> vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
>> +
>> + /* Check if we have BRBE implemented and available at the host */
>> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT))
>> + vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
>> }
>>
>> void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
>> {
>> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
>> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
>> + vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_BRBE);
>> }
>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> index 4558c02eb352..79bcf0fb1326 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> @@ -79,6 +79,34 @@ static void __debug_restore_trace(u64 trfcr_el1)
>> write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1);
>> }
>>
>> +static void __debug_save_brbe(u64 *brbcr_el1)
>> +{
>> + *brbcr_el1 = 0;
>> +
>> + /* Check if the BRBE is enabled */
>> + if (!(read_sysreg_s(SYS_BRBCR_EL1) & (BRBCR_ELx_E0BRE | BRBCR_ELx_ExBRE)))
>> + return;
>> +
>> + /*
>> + * Prohibit branch record generation while we are in guest.
>> + * Since access to BRBCR_EL1 is trapped, the guest can't
>> + * modify the filtering set by the host.
>> + */
>> + *brbcr_el1 = read_sysreg_s(SYS_BRBCR_EL1);
>> + write_sysreg_s(0, SYS_BRBCR_EL1);
>
> As for TRFCR and PMSCR, this is broken on hVHE.
>
> Please see [1]
>
> M.
>
> [1] https://lore.kernel.org/r/[email protected]
>

Ahh I see, so the unified accessors read_sysreg_el1()/write_sysreg_el1()
need to be used here, which will choose between BRBCR_EL1 and BRBCR_EL12
as required. Will make the changes, thanks for pointing that out.
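
As a rough illustration of why the encoding matters, the sketch below models the architectural redirection in plain user-space C: with HCR_EL2.E2H set, an EL2 access using the _EL1 encoding reaches the EL2 register, while the _EL12 encoding reaches the real EL1 register. All names here (struct cpu_model, resolve(), write_brbcr_raw_el1(), write_brbcr_unified()) are hypothetical, and the model ignores that _EL12 encodings are UNDEFINED when E2H is clear. It is not how the kernel accessors are implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two physical registers and the E2H setting. */
struct cpu_model {
	bool e2h;		/* models HCR_EL2.E2H */
	uint64_t brbcr_el1;	/* register that controls the guest-side context */
	uint64_t brbcr_el2;
};

enum encoding { ENC_EL1, ENC_EL12 };

/* Which register an EL2 access with the given encoding actually reaches. */
static uint64_t *resolve(struct cpu_model *c, enum encoding e)
{
	if (e == ENC_EL12)
		return &c->brbcr_el1;
	return c->e2h ? &c->brbcr_el2 : &c->brbcr_el1;
}

/* Always uses the _EL1 encoding, like a plain write_sysreg_s(). */
static void write_brbcr_raw_el1(struct cpu_model *c, uint64_t val)
{
	*resolve(c, ENC_EL1) = val;
}

/* Picks the encoding so that the EL1 register is always the one written. */
static void write_brbcr_unified(struct cpu_model *c, uint64_t val)
{
	*resolve(c, c->e2h ? ENC_EL12 : ENC_EL1) = val;
}
```

In this model the raw write silently clears the EL2 register when E2H is set and leaves the EL1 register untouched, which is the class of problem reported for hVHE above.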

2024-03-01 05:40:52

by Anshuman Khandual

Subject: Re: [PATCH V16 3/8] drivers: perf: arm_pmuv3: Enable branch stack sampling framework

On 2/21/24 22:55, Mark Rutland wrote:
> Hi Anshuman,
>
> On Thu, Jan 25, 2024 at 03:11:14PM +0530, Anshuman Khandual wrote:
>> Branch stack sampling support i.e capturing branch records during execution
>> in core perf, rides along with normal HW events being scheduled on the PMU.
>> This prepares ARMV8 PMU framework for branch stack support on relevant PMUs
>> with required HW implementation.
>
> Please can we start a bit more clearly, e.g.
>
> | drivers: perf: arm_pmu: add infrastructure for branch stack sampling
> |
> | In order to support the Branch Record Buffer Extension (BRBE), we need to
> | extend the arm_pmu framework with some basic infrastructure for branch stack
> | sampling which arm_pmu drivers can opt-in to using. Subsequent patches will
> | use this to add support for BRBE in the PMUv3 driver.

Added this paragraph at the very beginning.

>
>> ARMV8 PMU hardware support for branch stack sampling is indicated via a new
>> feature flag called 'has_branch_stack' that can be ascertained via probing.
>> This modifies current gate in armpmu_event_init() which blocks branch stack
>> sampling based perf events unconditionally. Instead allows such perf events
>> getting initialized on supporting PMU hardware.
>
> This paragraph can be deleted. The addition of 'has_branch_stack' and its use
> in armpmu_event_init() is trivial and obvious in-context, and this distracts
> from the important parts of this patch.

Okay, dropped the above paragraph.

>
>> Branch stack sampling is enabled and disabled along with regular PMU events.
>> This adds required function callbacks in armv8pmu_branch_xxx() format, to
>> drive the PMU branch stack hardware when supported. This also adds fallback
>> stub definitions for these callbacks for PMUs which would not have required
>> support.
>
> Those additions to the PMUv3 driver should all be in the next patch.

Sure, will do that.

>
> We don't add anything for the other PMU drivers that don't support branch
> sampling, so why do we need to do *anything* to the PMUv3 driver here, given we
> add the support in the next patch? Those additions only make this patch bigger
> and more confusing (and hence more painful to review).

Understood.

>
>> If a task gets scheduled out, the current branch records get saved in the
>> task's context data, which can be later used to fill in the records upon an
>> event overflow. Hence, we enable PERF_ATTACH_TASK_DATA (event->attach_state
>> based flag) for branch stack requesting perf events. But this also requires
>> adding support for pmu::sched_task() callback to arm_pmu.
>
> I think what this is trying to say is:
>
> | With BRBE, the hardware records branches into a hardware FIFO, which will be
> | sampled by software when perf events overflow. A task may be context-switched
> | an arbitrary number of times between overflows, and to avoid losing samples
> | we need to save the current records when a task is context-switched out. To
> | do this we'll need to use the pmu::sched_task() callback, and we'll need to
> | allocate some per-task storage space using PERF_ATTACH_TASK_DATA.

Replaced as suggested.
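
For illustration only (this is not the driver code), the save-on-switch scheme described in the replacement text can be modelled in user space. Every name below (the `sketch_*` functions, the two structs, the buffer sizes) is hypothetical; the real BRBE state lives in system registers and the per-task storage would come from PERF_ATTACH_TASK_DATA.

```c
#include <stddef.h>
#include <string.h>

#define NR_HW_RECORDS 4		/* hypothetical FIFO depth */
#define NR_SAVED      16	/* hypothetical per-task storage size */

struct branch_record {
	unsigned long from, to;
};

/* Stand-in for the hardware FIFO. */
struct hw_fifo {
	struct branch_record rec[NR_HW_RECORDS];
	size_t nr;
};

/* Stand-in for the per-task storage. */
struct task_branch_data {
	struct branch_record rec[NR_SAVED];
	size_t nr;
};

/* Task is switched out: drain the FIFO into the task's storage. */
static void sketch_sched_out(struct hw_fifo *hw, struct task_branch_data *td)
{
	size_t room = NR_SAVED - td->nr;
	size_t n = hw->nr < room ? hw->nr : room;

	memcpy(&td->rec[td->nr], hw->rec, n * sizeof(hw->rec[0]));
	td->nr += n;
	hw->nr = 0;	/* the next task must not see these records */
}

/* Counter overflow: return saved records first, then live ones; clear both. */
static size_t sketch_overflow(struct hw_fifo *hw, struct task_branch_data *td,
			      struct branch_record *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < td->nr && n < max; i++)
		out[n++] = td->rec[i];
	for (size_t i = 0; i < hw->nr && n < max; i++)
		out[n++] = hw->rec[i];
	td->nr = 0;
	hw->nr = 0;
	return n;
}
```

In this model a task that is switched out between two overflows still reports the records captured before the switch, which is the property the sched_task() callback is meant to provide.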

>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> Changes in V16:
>>
>> - Renamed arm_brbe.h as arm_pmuv3_branch.h
>> - Updated perf_sample_save_brstack()'s new argument requirements with NULL
>>
>> drivers/perf/arm_pmu.c | 57 ++++++++++++-
>> drivers/perf/arm_pmuv3.c | 141 +++++++++++++++++++++++++++++++-
>> drivers/perf/arm_pmuv3_branch.h | 50 +++++++++++
>> include/linux/perf/arm_pmu.h | 29 ++++++-
>> include/linux/perf/arm_pmuv3.h | 1 -
>> 5 files changed, 273 insertions(+), 5 deletions(-)
>> create mode 100644 drivers/perf/arm_pmuv3_branch.h
>>
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 8458fe2cebb4..16f488ae7747 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -317,6 +317,15 @@ armpmu_del(struct perf_event *event, int flags)
>> struct hw_perf_event *hwc = &event->hw;
>> int idx = hwc->idx;
>>
>> + if (has_branch_stack(event)) {
>> + WARN_ON_ONCE(!hw_events->brbe_users);
>> + hw_events->brbe_users--;
>> + if (!hw_events->brbe_users) {
>> + hw_events->brbe_context = NULL;
>> + hw_events->brbe_sample_type = 0;
>> + }
>> + }
>> +
>
> If this is going to leak into the core arm_pmu code, use "branch_stack" rather
> than "brbe" for these field names.

Right, makes sense. I was also considering that rename; will change.

>
> However, I reckon we could just have two new callbacks on arm_pmu:
>
> branch_stack_add(struct perf_event *event, ...);
> branch_stack_del(struct perf_event *event, ...);
>
> ... and hide all of the details in the PMUv3 (or BRBE) driver for now, and the
> code above can just do:
>
> if (has_branch_stack(event))
> branch_stack_del(event, ...);
>
> ... and likewise in armpmu_add().
>
> That way the actual management logic for the context and so on can be added in
> the next patch, where the lifetime would be *much* clearer.

Right, will change as required.
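
A rough user-space sketch of that split, assuming the core only dispatches and the driver owns all branch bookkeeping. The struct and function names here are made up for illustration and are not the arm_pmu definitions.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_event {
	bool wants_branch_stack;
};

struct sketch_pmu {
	/* Optional hooks; a driver without branch support leaves them NULL. */
	void (*branch_stack_add)(struct sketch_pmu *pmu, struct sketch_event *ev);
	void (*branch_stack_del)(struct sketch_pmu *pmu, struct sketch_event *ev);
	void *priv;	/* driver-private state, opaque to the core */
};

/* Core side: no knowledge of users, contexts or filters. */
static void sketch_core_add(struct sketch_pmu *pmu, struct sketch_event *ev)
{
	if (ev->wants_branch_stack && pmu->branch_stack_add)
		pmu->branch_stack_add(pmu, ev);
}

static void sketch_core_del(struct sketch_pmu *pmu, struct sketch_event *ev)
{
	if (ev->wants_branch_stack && pmu->branch_stack_del)
		pmu->branch_stack_del(pmu, ev);
}

/* Driver side: the only place that tracks branch stack users. */
struct sketch_brbe_state {
	int users;
};

static void sketch_drv_add(struct sketch_pmu *pmu, struct sketch_event *ev)
{
	struct sketch_brbe_state *st = pmu->priv;

	(void)ev;
	st->users++;
}

static void sketch_drv_del(struct sketch_pmu *pmu, struct sketch_event *ev)
{
	struct sketch_brbe_state *st = pmu->priv;

	(void)ev;
	st->users--;
}
```

With this shape, a PMU driver that leaves both hooks NULL is unaffected, and the lifetime rules for the branch state are readable in one file.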

>
>> armpmu_stop(event, PERF_EF_UPDATE);
>> hw_events->events[idx] = NULL;
>> armpmu->clear_event_idx(hw_events, event);
>> @@ -333,6 +342,38 @@ armpmu_add(struct perf_event *event, int flags)
>> struct hw_perf_event *hwc = &event->hw;
>> int idx;
>>
>> + if (has_branch_stack(event)) {
>> + /*
>> + * Reset branch records buffer if a new CPU bound event
>> + * gets scheduled on a PMU. Otherwise existing branch
>> + * records present in the buffer might just leak into
>> + * such events.
>> + *
>> + * Also reset current 'hw_events->brbe_context' because
>> + * any previous task bound event now would have lost an
>> + * opportunity for continuous branch records.
>> + */
>
> Doesn't this mean some user silently loses events? Why is that ok?

A previous task-bound event that has been on the CPU will lose the branch
records currently held in BRBE when this happens. The buffer needs to be reset
to preserve record integrity for the upcoming CPU-bound event. The following
options are available in such cases.

- Let it lose some samples; this is going to be rare anyway (proposed here)
- Call armv8pmu_branch_save() to save them off on the event, before reset
- Tell the event that it has lost some samples - PERF_RECORD_LOST ?

Please suggest which would be the better solution, or whether there is some
other approach for this scenario.

>
>> + if (!event->ctx->task) {
>> + hw_events->brbe_context = NULL;
>> + if (armpmu->branch_reset)
>> + armpmu->branch_reset();
>> + }
>> +
>> + /*
>> + * Reset branch records buffer if a new task event gets
>> + * scheduled on a PMU which might have existing records.
>> + * Otherwise older branch records present in the buffer
>> + * might leak into the new task event.
>> + */
>> + if (event->ctx->task && hw_events->brbe_context != event->ctx) {
>> + hw_events->brbe_context = event->ctx;
>> + if (armpmu->branch_reset)
>> + armpmu->branch_reset();
>> + }
>
> Same question here.

As explained above.

>
> How does this work on other architectures?

I had gone through some of them before but don't recollect the details right now.
I will get back on this later.

>
> What do we do if the CPU-bound and task-bound events want different filters,
> etc?

Unless the same task comes back on the CPU, the buffer needs to be reset in all
other scenarios to protect the integrity of the branch records captured for the
target event. Filter differences then do not really matter.

>
> This is the sort of gnarly detail that should be explained (or at least
> introduced) in the commit message.

Understood, will try and update the commit message accordingly.

>
>> + hw_events->brbe_users++;
>> + hw_events->brbe_sample_type = event->attr.branch_sample_type;
>
> What exactly is brbe_sample_type, and why does it get overridden *every time* we
> add a new event? What happens when events have different values for
> brbe_sample_type? Or is that forbidden somehow?

brbe_sample_type contains the final perf branch filter that gets into BRBE HW for
the recording session. The proposed solution here goes with the last perf event's
'attr.branch_sample_type' when they get collected for the given PMU via pmu_add()
callback.

hw_events->brbe_sample_type = event->attr.branch_sample_type

So in a scenario where multiple branch events are programmed with different filter
requests, the captured branch records during PMU IRQ might not match the requests
for many events that were scheduled together. Hence we only give the branch records
to the matching events.

static void read_branch_records()
{
	..
	/*
	 * Overflowed event's branch_sample_type does not match the configured
	 * branch filters in the BRBE HW. So the captured branch records here
	 * cannot be co-related to the overflowed event. Report to the user as
	 * if no branch records have been captured, and flush branch records.
	 * The same scenario is applicable when the current task context does
	 * not match with overflown event.
	 */
	if ((cpuc->branch_sample_type != event->attr.branch_sample_type) ||
	    (event->ctx->task && cpuc->branch_context != event->ctx))
		return;
	..
}

Please note that we don't prohibit the events from being grouped together on the
PMU, i.e. pmu_add() does not fail when filters do not match. But there are some
other approaches that could be taken in such scenarios.

A. Fail pmu_add() when branch_sample_type does not match

B. OR together every event's event->attr.branch_sample_type on a given PMU

- The captured records then need to be post-processed to find the samples
matching each event's original filter request

- But it might add some more latency to PMU IRQ handling ?

But please do let me know if there are better solutions that can be taken up.
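
To make option B concrete, here is a small user-space sketch, not taken from the series. It assumes a simplified filter model in which every record carries exactly one privilege bit and one type bit; the `SK_*` flags and `sk_*` helpers are hypothetical and do not correspond to the perf UAPI values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter bits, for illustration only. */
#define SK_PRIV_USER	(1u << 0)
#define SK_PRIV_KERNEL	(1u << 1)
#define SK_TYPE_CALL	(1u << 2)
#define SK_TYPE_RETURN	(1u << 3)
#define SK_PRIV_MASK	(SK_PRIV_USER | SK_PRIV_KERNEL)
#define SK_TYPE_MASK	(SK_TYPE_CALL | SK_TYPE_RETURN)

struct sk_record {
	uint64_t from, to;
	uint32_t attrs;		/* one privilege bit plus one type bit */
};

/* The hardware is programmed with the union of all requested filters. */
static uint32_t sk_union_filter(const uint32_t *filters, size_t nr)
{
	uint32_t hw = 0;

	for (size_t i = 0; i < nr; i++)
		hw |= filters[i];
	return hw;
}

/* Per event: keep only records matching that event's own filter. */
static size_t sk_filter_records(const struct sk_record *in, size_t nr,
				uint32_t filter, struct sk_record *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		uint32_t a = in[i].attrs;

		if ((a & filter & SK_PRIV_MASK) && (a & filter & SK_TYPE_MASK))
			out[n++] = in[i];
	}
	return n;
}
```

Note that the union can capture records nobody asked for (e.g. a user-space return when one event wants user calls and another wants kernel returns), which is why the per-event pass is required rather than optional. Its cost is one linear walk over the captured records per overflowed event.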

>
>> + }
>> +
>> /* An event following a process won't be stopped earlier */
>> if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
>> return -ENOENT;
>
> Unless this cpumask check has been made redundant, it means that the code above
> it is obviously wrong, since that pokes the BRBE HW and increments brbe_users
> *before* we decide whether the event can be installed on this CPU. That'll blow
> up on big.LITTLE, e.g. we try and install a 'big' CPU event on a 'little' CPU,
> poke the BRBE HW and increment brbe_users, then *after* that we abort
> installing the event.

Agreed, aborting the event installation on the CPU after incrementing
brbe_users will be problematic.

>
> Even ignoring big.LITTLE, we can fail immediately after this when we don't have
> enough counters, since the following code is:
>
> | /* An event following a process won't be stopped earlier */
> | if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
> | return -ENOENT;
> |
> | /* If we don't have a space for the counter then finish early. */
> | idx = armpmu->get_event_idx(hw_events, event);
> | if (idx < 0)
> | return idx;
>
> ... which'll go wrong if you try to open 1 more event than the CPU has
> counters.

Agreed, the event needs to clear that test as well before incrementing brbe_users.

Should the branch stack context be installed only after the event has cleared
get_event_idx() successfully, i.e. after the HW counter availability check,
before proceeding to install on the CPU ? IOW, just move the block a bit down:

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index ac07911263a9..d657ce337f10 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -336,9 +336,6 @@ armpmu_add(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx;

-	if (has_branch_stack(event))
-		armpmu->branch_stack_add(event, hw_events);
-
 	/* An event following a process won't be stopped earlier */
 	if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
 		return -ENOENT;
@@ -348,6 +345,9 @@ armpmu_add(struct perf_event *event, int flags)
 	if (idx < 0)
 		return idx;

+	if (has_branch_stack(event))
+		armpmu->branch_stack_add(event, hw_events);
+
 	/*
 	 * If there is an event in the counter we are going to use then make
 	 * sure it is disabled.
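
The effect of that reordering can be checked with a toy model. This is only a sketch under simplifying assumptions: `supported` stands in for the supported_cpus test, `free_counters` for get_event_idx() succeeding, and the error values are local placeholders rather than the kernel's errno definitions.

```c
#include <stdbool.h>

#define SK_ENOENT 2	/* placeholder error values for this sketch */
#define SK_EAGAIN 11

struct sk_cpu {
	bool supported;		/* stands in for the supported_cpus check */
	int free_counters;	/* stands in for get_event_idx() succeeding */
	int branch_users;
};

static int sk_add(struct sk_cpu *cpu, bool wants_branch_stack)
{
	if (!cpu->supported)
		return -SK_ENOENT;
	if (cpu->free_counters == 0)
		return -SK_EAGAIN;
	cpu->free_counters--;

	/* Only touch branch state once the event is certain to be installed. */
	if (wants_branch_stack)
		cpu->branch_users++;
	return 0;
}

static void sk_del(struct sk_cpu *cpu, bool wants_branch_stack)
{
	if (wants_branch_stack)
		cpu->branch_users--;
	cpu->free_counters++;
}
```

Both failure paths (wrong CPU type, no free counter) now return before the user count changes, so every increment is paired with a later sk_del().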


>
>> @@ -511,13 +552,24 @@ static int armpmu_event_init(struct perf_event *event)
>> !cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
>> return -ENOENT;
>>
>> - /* does not support taken branch sampling */
>> - if (has_branch_stack(event))
>> + /*
>> + * Branch stack sampling events are allowed
>> + * only on PMU which has required support.
>> + */
>> + if (has_branch_stack(event) && !armpmu->has_branch_stack)
>> return -EOPNOTSUPP;
>> return __hw_perf_event_init(event);
>> }
>>
>
> I think we can delete the comment entirely here, but the code itself looks
> fine here.

Sure, will delete the above comment.

>
>> +static void armpmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
>> +{
>> + struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
>> +
>> + if (armpmu->sched_task)
>> + armpmu->sched_task(pmu_ctx, sched_in);
>> +}
>
> This looks fine.
>
>> static void armpmu_enable(struct pmu *pmu)
>> {
>> struct arm_pmu *armpmu = to_arm_pmu(pmu);
>> @@ -864,6 +916,7 @@ struct arm_pmu *armpmu_alloc(void)
>> }
>>
>> pmu->pmu = (struct pmu) {
>> + .sched_task = armpmu_sched_task,
>> .pmu_enable = armpmu_enable,
>> .pmu_disable = armpmu_disable,
>> .event_init = armpmu_event_init,
>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>> index 23fa6c5da82c..9e17764a0929 100644
>> --- a/drivers/perf/arm_pmuv3.c
>> +++ b/drivers/perf/arm_pmuv3.c
>> @@ -26,6 +26,7 @@
>> #include <linux/nmi.h>
>>
>> #include <asm/arm_pmuv3.h>
>> +#include "arm_pmuv3_branch.h"
>
> As above, I do not think that the PMUv3 driver should change at all in this
> patch. As of this patch it achieves nothing, and it makes it really hard to
> understand what's going on because the important aspects are spread randomly
> across this patch and the next patch which actually adds the BRBE management.
>
> Please factor the PMUv3 changes out into the patch adding the actual BRBE code.

Sure, will keep the following changes in this patch.

A. drivers/perf/arm_pmu.c

- armpmu_add() --> armpmu->branch_stack_add()
- armpmu_del() --> armpmu->branch_stack_del()
- Allowing has_branch_stack() events in armpmu_event_init()
- Adding callback arm_pmu->pmu->sched_task = armpmu_sched_task

B. include/linux/perf/arm_pmu.h

- Adding branch elements into pmu_hw_events
- Adding branch callbacks into arm_pmu
- Adding sched_task() into arm_pmu
- Adding has_branch_stack into arm_pmu

Move everything else into the next patch implementing BRBE.

drivers/perf/arm_pmuv3.c
drivers/perf/arm_pmuv3_branch.h

>
> [...]
>
>> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
>> index 46377e134d67..c3e7d2cfb737 100644
>> --- a/include/linux/perf/arm_pmuv3.h
>> +++ b/include/linux/perf/arm_pmuv3.h
>> @@ -308,5 +308,4 @@
>> default: WARN(1, "Invalid PMEV* index\n"); \
>> } \
>> } while (0)
>> -
>> #endif
>
> Unrelated whitespace change.

Already folded this in.

2024-03-01 07:56:56

by Anshuman Khandual

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions


On 2/29/24 18:20, Mark Rutland wrote:
> Hi Suzuki,
>
> On Thu, Feb 29, 2024 at 11:45:08AM +0000, Suzuki K Poulose wrote:
>> On 27/02/2024 11:13, Anshuman Khandual wrote:
>>> On 2/27/24 15:34, Mark Rutland wrote:
>>>> On Fri, Feb 23, 2024 at 12:58:48PM +0530, Anshuman Khandual wrote:
>>>>> On 2/21/24 19:31, Mark Rutland wrote:
>>>>>> On Thu, Jan 25, 2024 at 03:11:13PM +0530, Anshuman Khandual wrote:
>>>>>>> Currently BRBE feature is not supported in a guest environment. This hides
>>>>>>> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field.
>>>>>>
>>>>>> Does that mean that a guest can currently see BRBE advertised in the
>>>>>> ID_AA64DFR0_EL1.BRBE field, or is that hidden by the regular cpufeature code
>>>>>> today?
>>>>>
>>>>> IIRC it is hidden, but will have to double check. When experimenting for BRBE
>>>>> guest support enablement earlier, following changes were needed for the feature
>>>>> to be visible in ID_AA64DFR0_EL1.
>>>>>
>>>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>>>>> index 646591c67e7a..f258568535a8 100644
>>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>>> @@ -445,6 +445,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
>>>>> };
>>>>> static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
>>>>> + S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_IMP),
>>>>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
>>>>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
>>>>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),
>>>>>
>>>>> Should we add the following entry - explicitly hiding BRBE from the guest
>>>>> as a prerequisite patch ?
>>
>> This has nothing to do with the Guest visibility of the BRBE. This is
>> specifically for host "userspace" (via MRS emulation).
>>
>>>>>
>>>>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_NI)
>>>>
>>>> Is it visible currently, or is it hidden currently?
>>>>
>>>> * If it is visible before this patch, that's a latent bug that we need to go
>>>> fix first, and that'll require more coordination.
>>>>
>>>> * If it is not visible before this patch, there's no problem in the code, but
>>>> the commit message needs to explicitly mention that's the case as the commit
>>>> message currently implies it is visible by only mentioning hiding it.
>>>>
>>>> ... so can you please double check as you suggested above? We should be able to
>>>> explain why it is or is not visible today.
>>>
>>> It is currently hidden i.e following code returns 1 in the host
>>> but returns 0 inside the guest.
>>>
>>> aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
>>> brbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT);
>>>
>>> Hence - will update the commit message here as suggested.
>>
>> This is by virtue of the masking we do in the kvm/sysreg.c below.
>
> Yep, once this patch is applied.
>
> I think we might have some crossed wires here; I'm only really asking for the
> commit message (and title) to be updated and clarified.

Understood.

>
> Ignoring the patchlet above, and just considering the original patch:
>
> IIUC before the patch is applied, the ID_AA64DFR0_EL1.BRBE field is zero for
> the guest because we don't have an arm64_ftr_bits entry for the
> ID_AA64DFR0_EL1.BRBE field, and so init_cpu_ftr_reg() will leave that as zero
> in arm64_ftr_reg::sys_val, and hence when read_sanitised_id_aa64dfr0_el1()
> calls read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1), the BRBE field will be zero.

Makes sense, but shouldn't arm64_ftr_reg::sys_val be explicitly set to '0' (i.e.
ID_AA64DFR0_EL1_BRBE_NI) by adding an S_ARM64_FTR_BITS() entry to ftr_id_aa64dfr0[] ?
Or, because the field is going to be made visible via
S_ARM64_FTR_BITS(FTR_VISIBLE, ...., ID_AA64DFR0_EL1_BRBE_IMP) when enabling it in
the guest, this might not be necessary for now. Besides, it is also being blocked
explicitly by this patch in read_sanitised_id_aa64dfr0_el1().

>
> This series as-is doesn't add an arm64_ftr_bits entry for ID_AA64DFR0_EL1.BRBE,
> so it'd still be hidden from a guest regardless of whether we add explicit
> masking in read_sanitised_id_aa64dfr0_el1(). The reason to add that masking is
> to be explicit, so that if/when we add an arm64_ftr_bits entry for
> ID_AA64DFR0_EL1.BRBE, it isn't exposed to a guest unexpectedly.

>
> Similarly, IIUC the BRBE register accesses are *already* trapped, and
> emulate_sys_reg() will log a warning and inject an UNDEFINED exception into the
> guest if the guest tries to access the BRBE registers. Any well-behaved guest
> *shouldn't* do that, but a poorly-behaved guest could do that and (slowly) spam
> dmesg with messages about the unhandled sysreg traps. The reason to handle
> those regs is largely to suppress that warning, and to make it clear that we
> intend for those to be handled as undef.

Understood.

>
> So the commit title should be something like:
>
> KVM: arm64: explicitly handle BRBE register accesses as UNDEFINED
>
> ... and the message should mention the key points from the above.
>
> Suzuki, does that sound right to you?
>
> Anshuman, can you go re-write the commit message with that in mind?

Sure, will something like the following be okay ?

KVM: arm64: Explicitly handle BRBE register accesses as UNDEFINED

The ID_AA64DFR0_EL1.BRBE field is already zero for the guest, because there is
no arm64_ftr_bits[] entry for it when read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1)
is processed. This patch nevertheless masks the BRBE field explicitly, which
prevents unexpected exposure of the BRBE feature to the guest if
arm64_ftr_bits[] changes for ID_AA64DFR0_EL1. This also explicitly handles all
guest accesses to BRBE registers and instructions as UNDEFINED.

2024-03-01 12:49:40

by Mark Rutland

Subject: Re: [PATCH V16 2/8] KVM: arm64: Prevent guest accesses into BRBE system registers/instructions

On Fri, Mar 01, 2024 at 01:16:10PM +0530, Anshuman Khandual wrote:
>
> On 2/29/24 18:20, Mark Rutland wrote:
> > Hi Suzuki,
> >
> > On Thu, Feb 29, 2024 at 11:45:08AM +0000, Suzuki K Poulose wrote:
> >> On 27/02/2024 11:13, Anshuman Khandual wrote:
> >>> On 2/27/24 15:34, Mark Rutland wrote:
> >>>> On Fri, Feb 23, 2024 at 12:58:48PM +0530, Anshuman Khandual wrote:
> >>>>> On 2/21/24 19:31, Mark Rutland wrote:
> >>>>>> On Thu, Jan 25, 2024 at 03:11:13PM +0530, Anshuman Khandual wrote:
> >>>>>>> Currently BRBE feature is not supported in a guest environment. This hides
> >>>>>>> BRBE feature availability via masking ID_AA64DFR0_EL1.BRBE field.
> >>>>>>
> >>>>>> Does that mean that a guest can currently see BRBE advertised in the
> >>>>>> ID_AA64DFR0_EL1.BRBE field, or is that hidden by the regular cpufeature code
> >>>>>> today?
> >>>>>
> >>>>> IIRC it is hidden, but will have to double check. When experimenting for BRBE
> >>>>> guest support enablement earlier, following changes were needed for the feature
> >>>>> to be visible in ID_AA64DFR0_EL1.
> >>>>>
> >>>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> >>>>> index 646591c67e7a..f258568535a8 100644
> >>>>> --- a/arch/arm64/kernel/cpufeature.c
> >>>>> +++ b/arch/arm64/kernel/cpufeature.c
> >>>>> @@ -445,6 +445,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
> >>>>> };
> >>>>> static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
> >>>>> + S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_IMP),
> >>>>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
> >>>>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
> >>>>> ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),
> >>>>>
> >>>>> Should we add the following entry - explicitly hiding BRBE from the guest
> >>>>> as a prerequisite patch ?
> >>
> >> This has nothing to do with the Guest visibility of the BRBE. This is
> >> specifically for host "userspace" (via MRS emulation).
> >>
> >>>>>
> >>>>> S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_BRBE_SHIFT, 4, ID_AA64DFR0_EL1_BRBE_NI)
> >>>>
> >>>> Is it visible currently, or is it hidden currently?
> >>>>
> >>>> * If it is visible before this patch, that's a latent bug that we need to go
> >>>> fix first, and that'll require more coordination.
> >>>>
> >>>> * If it is not visible before this patch, there's no problem in the code, but
> >>>> the commit message needs to explicitly mention that's the case as the commit
> >>>> message currently implies it is visible by only mentioning hiding it.
> >>>>
> >>>> ... so can you please double check as you suggested above? We should be able to
> >>>> explain why it is or is not visible today.
> >>>
> >>> It is currently hidden i.e following code returns 1 in the host
> >>> but returns 0 inside the guest.
> >>>
> >>> aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> >>> brbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT);
> >>>
> >>> Hence - will update the commit message here as suggested.
> >>
> >> This is by virtue of the masking we do in the kvm/sysreg.c below.
> >
> > Yep, once this patch is applied.
> >
> > I think we might have some crossed wires here; I'm only really asking for the
> > commit message (and title) to be updated and clarified.
>
> Understood.
>
> > Ignoring the patchlet above, and just considering the original patch:
> >
> > IIUC before the patch is applied, the ID_AA64DFR0_EL1.BRBE field is zero for
> > the guest because we don't have an arm64_ftr_bits entry for the
> > ID_AA64DFR0_EL1.BRBE field, and so init_cpu_ftr_reg() will leave that as zero
> > in arm64_ftr_reg::sys_val, and hence when read_sanitised_id_aa64dfr0_el1()
> > calls read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1), the BRBE field will be zero.
>
> Makes sense, but should not arm64_ftr_reg::sys_val be explicitly set to '0' via
> ID_AA64DFR0_EL1_BRBE_NI via adding a S_ARM64_FTR_BITS() into ftr_id_aa64dfr0[] ?

I don't understand what you're asking here -- there's no way that an
arm64_ftr_bits entry can explicitly zero a field.

> OR because it's going to be made visible via S_ARM64_FTR_BITS(FTR_VISIBLE,
> ...., ID_AA64DFR0_EL1_BRBE_IMP) for enabling it in the guest, this might not be
> necessary for now. Besides it is also being blocked explicitly now via this patch
> in read_sanitised_id_aa64dfr0_el1().

We are not going to add a FTR_VISIBLE entry -- as Suzuki already pointed out,
that means *visible to userspace*.

We currently have no need for an arm64_ftr_bits entry for BRBE. We can add one
for the sake of documenting our policy for that field, like we do for PMUVer,
but that's the only reason to do so, and doing that requires that we also mask
the field within read_sanitised_id_aa64dfr0_el1().
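
As a sketch of what "mask the field" amounts to, assuming the BRBE field is the 4-bit field at bits [55:52] of ID_AA64DFR0_EL1 (worth checking against the register description). The `SK_*` and `sk_*` names are hypothetical stand-ins, not the kernel helpers.

```c
#include <stdint.h>

#define SK_BRBE_SHIFT		52	/* assumed field position */
#define SK_FIELD_MASK(shift)	(UINT64_C(0xf) << (shift))

/* Extract an unsigned 4-bit ID register field. */
static unsigned int sk_extract_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/* Clear the BRBE field so it reads as "not implemented"; keep all other bits. */
static uint64_t sk_hide_brbe(uint64_t reg)
{
	return reg & ~SK_FIELD_MASK(SK_BRBE_SHIFT);
}
```

Applying sk_hide_brbe() to the sanitised value is a no-op while the table has no BRBE entry, and starts to matter only once an entry is added.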

> > So the commit title should be something like:
> >
> > KVM: arm64: explicitly handle BRBE register accesses as UNDEFINED
> >
> > ... and the message should mention the key points from the above.
> >
> > Suzuki, does that sound right to you?
> >
> > Anshuman, can you go re-write the commit message with that in mind?
>
> Sure, will something like the following be okay ?
>
> KVM: arm64: Explicitly handle BRBE register accesses as UNDEFINED
>
> Although ID_AA64DFR0_EL1.BRBE field is zero for the guest because there is
> no arm64_ftr_bits[] entry for the ID_AA64DFR0_EL1.BRBE field while getting
> processed for read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1), this masks BRBE
> feature here to be rather explicit. This will prevent unexpected exposure
> of BRBE feature to guest when arm64_ftr_bits[] changes for ID_AA64DFR0_EL1.
> This also makes all guest accesses into BRBE registers, and instructions
> as undefined access explicitly.

How about:

| KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
|
| The Branch Record Buffer Extension (BRBE) adds a number of system registers
| and instructions which we don't currently intend to expose to guests. Our
| existing logic handles this safely, but could be improved with some explicit
| handling of BRBE.
|
| The presence of BRBE is currently hidden from guests as the cpufeature code's
| ftr_id_aa64dfr0[] table doesn't have an entry for the BRBE field, and so this
| will be zero in the sanitised value of ID_AA64DFR0 exposed to guests via
| read_sanitised_id_aa64dfr0_el1(). As the ftr_id_aa64dfr0[] table may gain an
| entry for the BRBE field in future, for robustness we should explicitly mask
| out the BRBE field in read_sanitised_id_aa64dfr0_el1().
|
| The BRBE system registers and instructions are currently trapped by the
| existing configuration of the fine-grained traps. As the registers and
| instructions are not described in the sys_reg_descs[] table,
| emulate_sys_reg() will warn that these are unknown before injecting an
| UNDEFINED exception into the guest. Well-behaved guests shouldn't try to use
| the registers or instructions, but badly-behaved guests could use these,
| resulting in unnecessary warnings. To avoid those warnings, we should
| explicitly handle the BRBE registers and instructions as UNDEFINED.
|
| Address the above by having read_sanitised_id_aa64dfr0_el1() mask out the
| ID_AA64DFR0.BRBE field, and by adding sys_reg_desc entries for all of the
| BRBE system registers and instructions, treating these all as UNDEFINED.
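
The difference between the two paths can be shown with a toy lookup. This is a user-space sketch only; the descriptor layout, the encodings and the warning counter are invented for illustration and do not mirror sys_reg_descs[] or emulate_sys_reg().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sk_result { SK_HANDLED, SK_UNDEF };

struct sk_reg_desc {
	uint32_t encoding;	/* made-up register encoding */
	bool undef;		/* explicitly treated as UNDEFINED */
};

static int sk_warnings;	/* counts "unknown register" complaints */

static enum sk_result sk_emulate(const struct sk_reg_desc *tbl, size_t nr,
				 uint32_t enc)
{
	for (size_t i = 0; i < nr; i++) {
		if (tbl[i].encoding == enc)
			return tbl[i].undef ? SK_UNDEF : SK_HANDLED;
	}

	/* Not described at all: warn, then inject UNDEFINED anyway. */
	sk_warnings++;
	return SK_UNDEF;
}
```

The guest-visible result is UNDEFINED in both cases; only the described register avoids the warning.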

Mark.

2024-03-01 13:53:05

by Mark Rutland

Subject: Re: [PATCH V16 3/8] drivers: perf: arm_pmuv3: Enable branch stack sampling framework

On Fri, Mar 01, 2024 at 11:07:32AM +0530, Anshuman Khandual wrote:
> On 2/21/24 22:55, Mark Rutland wrote:
> > On Thu, Jan 25, 2024 at 03:11:14PM +0530, Anshuman Khandual wrote:

> >> @@ -333,6 +342,38 @@ armpmu_add(struct perf_event *event, int flags)
> >> struct hw_perf_event *hwc = &event->hw;
> >> int idx;
> >>
> >> + if (has_branch_stack(event)) {
> >> + /*
> >> + * Reset branch records buffer if a new CPU bound event
> >> + * gets scheduled on a PMU. Otherwise existing branch
> >> + * records present in the buffer might just leak into
> >> + * such events.
> >> + *
> >> + * Also reset current 'hw_events->brbe_context' because
> >> + * any previous task bound event now would have lost an
> >> + * opportunity for continuous branch records.
> >> + */
> >
> > Doesn't this mean some user silently loses events? Why is that ok?
>
> Previous task bound event that has been on the CPU will lose current branch
> records available in BRBE when this happens. Buffer needs reset for records
> integrity for the upcoming CPU bound event. Following options are available
> in such cases.
>
> - Let it lose some samples, anyways it's going to be rare (proposed here)
> - Call armv8pmu_branch_save() to save them off on the event, before reset
> - Tell the event that it has lost some samples - PERF_RECORD_LOST ?
>
> Please suggest which would be a better solution ? OR there might be some other
> approach for this scenario ?

TBH, I'm not immediately sure what the best option is here, and this is part of
the bigger problem of "how do multiple events with branch sampling interact?".

I'll need to go explore that problem space (and see what other architectures
do). For now, it would be good if you could get the patch restructuring
(i.e. splitting the PMUv3 and arm_pmu changes) sorted first, and then we can
consider the BRBE sharing / lifetime problems atop that.

So for now (i.e. for v17), leave this as-is.

[...]

> >
> >> + hw_events->brbe_users++;
> >> + hw_events->brbe_sample_type = event->attr.branch_sample_type;
> >
> > What exactly is brbe_sample_type, and why does it get overridden *every time* we
> > add a new event? What happens when events have different values for
> > brbe_sample_type? Or is that forbidden somehow?
>
> brbe_sample_type contains the final perf branch filter that gets into BRBE HW for
> the recording session. The proposed solution here goes with the last perf event's
> 'attr.branch_sample_type' when they get collected for the given PMU via pmu_add()
> callback.
>
> hw_events->brbe_sample_type = event->attr.branch_sample_type
>
> So in a scenario where multiple branch events are programmed with different filter
> requests, the captured branch records during PMU IRQ might not match the requests
> for many events that were scheduled together. Hence we only give the branch records
> to the matching events.
>
> static void read_branch_records()
> {
> ...
> /*
> * Overflowed event's branch_sample_type does not match the configured
> * branch filters in the BRBE HW. So the captured branch records here
> * cannot be co-related to the overflowed event. Report to the user as
> * if no branch records have been captured, and flush branch records.
> * The same scenario is applicable when the current task context does
> * not match with overflown event.
> */
> if ((cpuc->branch_sample_type != event->attr.branch_sample_type) ||
> (event->ctx->task && cpuc->branch_context != event->ctx))
> return;
> ...
> }

I see; it's good that we filter that in read_branch_records(), but this doesn't
feel right from a lifetime perspective. For example, if you install a pinned
per-cpu event with branch sample type A, then a task temporarily installs a
task-bound event with branch sample type B, the type in HW will be left as B
and the cpu-bound event will never get samples again.

So I think we'll have to change *something* here, but that's part of the bigger
question above, so please leave this as-is for now.
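
To spell out that scenario, here is a toy model of the "last add wins" bookkeeping. It is a sketch with hypothetical names, assuming (as in the V16 hunks quoted earlier) that the sample type is overwritten on every add and only cleared when the last user goes away.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_pmu_state {
	uint32_t hw_sample_type;	/* filter currently programmed in HW */
	int users;
};

/* Every add overwrites the programmed filter. */
static void sk_branch_add(struct sk_pmu_state *s, uint32_t type)
{
	s->users++;
	s->hw_sample_type = type;
}

/* The filter is only cleared when the last user is removed. */
static void sk_branch_del(struct sk_pmu_state *s)
{
	if (--s->users == 0)
		s->hw_sample_type = 0;
}

/* An event only receives records if its type matches the programmed one. */
static bool sk_gets_records(const struct sk_pmu_state *s, uint32_t type)
{
	return s->hw_sample_type == type;
}
```

Adding type A, then adding and removing type B, leaves B programmed with one user remaining, so the type A event stops matching until something re-adds it.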

> Please note that we don't prohibit the events from being grouped together on the PMU
> i.e pmu_add() does not fail when filters do not match. But there might be some other
> approaches that could be taken in such scenarios.
>
> A. Fail pmu_add() when branch_sample_type does not match
>
> B. OR together all event's event->attr.branch_sample_type on a given PMU
>
> - Then captured records need to be post processed to find applicable samples
> matching event's original filter request
>
> - But it might add some more latency to PMU IRQ handling ?
>
> But please do let me know if there are better solutions that can be taken up.

It looks like LBR always does SW filtering, and I don't think it's actually
that expensive, so B looks like a nicer option.

However, I think that's part of that bigger question above, so for now please
leave that as-is.

> >
> >> + }
> >> +
> >> /* An event following a process won't be stopped earlier */
> >> if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
> >> return -ENOENT;
> >
> > Unless this cpumask check has been made redundant, it means that the code above
> > it is obviously wrong, since that pokes the BRBE HW and increments brbe_users
> > *before* we decide whether the event can be installed on this CPU. That'll blow
> > up on big.LITTLE, e.g. we try and install a 'big' CPU event on a 'little' CPU,
> > poke the BRBE HW and increment brbe_users, then *after* that we abort
> > installing the event.
>
> Agreed, aborting to install the event on the cpu after incrementing brbe_users
> will be problematic.
>
> > Even ignoring big.LITTLE, we can fail immediately after this when we don't have
> > enough counters, since the following code is:
> >
> > | /* An event following a process won't be stopped earlier */
> > | if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
> > | return -ENOENT;
> > |
> > | /* If we don't have a space for the counter then finish early. */
> > | idx = armpmu->get_event_idx(hw_events, event);
> > | if (idx < 0)
> > | return idx;
> >
> > ... which'll go wrong if you try to open 1 more event than the CPU has
> > counters.
>
> Agreed, the event needs to clear that test as well before incrementing brbe_users.
>
> Should the branch stack context be installed only after the event has cleared
> get_event_idx() and the HW counter availability check etc, before proceeding
> to install on the CPU? IOW just move the block a bit down?

Yes; for now that should be sufficient.
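
To illustrate the ordering I mean, here is a rough standalone model. It is
not the real armpmu_add(): struct cpu_state, event_add(), NR_COUNTERS and the
-EAGAIN return from this get_event_idx() are all assumptions made for the
sketch. The point is only that every check which can fail happens before
brbe_users is touched.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_COUNTERS 2	/* hypothetical number of HW counters */

/* Hypothetical per-CPU state for the model. */
struct cpu_state {
	bool supported;			/* models the supported_cpus test */
	bool counter_used[NR_COUNTERS];
	int brbe_users;
};

/* Claim a free counter, or fail when all counters are in use. */
static int get_event_idx(struct cpu_state *cpu)
{
	for (int i = 0; i < NR_COUNTERS; i++) {
		if (!cpu->counter_used[i]) {
			cpu->counter_used[i] = true;
			return i;
		}
	}

	return -EAGAIN;
}

static int event_add(struct cpu_state *cpu, bool wants_branch_stack)
{
	int idx;

	/* An event following a process won't be stopped earlier */
	if (!cpu->supported)
		return -ENOENT;

	/* If we don't have a space for the counter then finish early. */
	idx = get_event_idx(cpu);
	if (idx < 0)
		return idx;

	/* Nothing below can fail, so it is now safe to account the user. */
	if (wants_branch_stack)
		cpu->brbe_users++;

	return idx;
}
```

With that ordering, neither a 'big' event landing on a 'little' CPU nor
opening one event more than there are counters leaves brbe_users elevated.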

[...]

> >> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> >> index 23fa6c5da82c..9e17764a0929 100644
> >> --- a/drivers/perf/arm_pmuv3.c
> >> +++ b/drivers/perf/arm_pmuv3.c
> >> @@ -26,6 +26,7 @@
> >> #include <linux/nmi.h>
> >>
> >> #include <asm/arm_pmuv3.h>
> >> +#include "arm_pmuv3_branch.h"
> >
> > As above, I do not think that the PMUv3 driver should change at all in this
> > patch. As of this patch it achieves nothing, and it makes it really hard to
> > understand what's going on because the important aspects are spread randomly
> > across this patch and the next patch which actually adds the BRBE management.
> >
> > Please factor the PMUv3 changes out into the patch adding the actual BRBE code.
>
> Sure, will keep the following changes in this patch.
>
> A. drivers/perf/arm_pmu.c
>
> - armpmu_add() --> armpmu->branch_stack_add()
> - armpmu_del() --> armpmu->branch_stack_del()
> - Allowing has_branch_stack() events in armpmu_event_init()
> - Adding callback arm_pmu->pmu->sched_task = armpmu_sched_task
>
> B. include/linux/perf/arm_pmu.h
>
> - Adding branch elements into pmu_hw_events
> - Adding branch callbacks into arm_pmu
> - Adding sched_task() into arm_pmu
> - Adding has_branch_stack into arm_pmu
>
> Move everything else into the next patch implementing BRBE.
>
> drivers/perf/arm_pmuv3.c
> drivers/perf/arm_pmuv3_branch.h

That looks good. Depending on what we do about BRBE sharing we might need an
armpmu::branch_stack_init() callback for event_init(), so if you end up needing
one now that's also fine.

Thanks,
Mark.