From: Anshuman Khandual
To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, will@kernel.org, catalin.marinas@arm.com, mark.rutland@arm.com
Cc: Anshuman Khandual, Mark Brown, James Clark, Rob Herring, Marc Zyngier, Suzuki Poulose, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, linux-perf-users@vger.kernel.org
Subject: [PATCH V18 0/9] arm64/perf: Enable branch stack sampling
Date: Thu, 13 Jun 2024 11:47:22 +0530
Message-Id: <20240613061731.3109448-1-anshuman.khandual@arm.com>

This series enables perf
branch stack sampling support on arm64 platforms via a new arch feature called
Branch Record Buffer Extension (BRBE). All the relevant register definitions
can be accessed here.

https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers

This series applies on 6.10-rc3.

This series is also hosted below for quick access, review and test.

https://git.gitlab.arm.com/linux-arm/linux-anshuman.git (brbe_v18)

- Anshuman

========== Perf Branch Stack Sampling Support (arm64 platforms) ===========

Currently arm64 platforms do not support perf branch stack sampling. Hence any
event requesting branch stack records, i.e. PERF_SAMPLE_BRANCH_STACK marked in
event->attr.sample_type, will be rejected in armpmu_event_init().

static int armpmu_event_init(struct perf_event *event)
{
        ........
        /* does not support taken branch sampling */
        if (has_branch_stack(event))
                return -EOPNOTSUPP;
        ........
}

$perf record -j any,u,k ls
Error:
cycles:P: PMU Hardware or event type doesn't support branch stack sampling.

-------------------- CONFIG_ARM64_BRBE and FEAT_BRBE ----------------------

After this series, the perf branch stack sampling feature is enabled on arm64
platforms where the FEAT_BRBE HW feature is supported and CONFIG_ARM64_BRBE is
also selected during build. Let's observe all possible scenarios here.

1. Feature not built (!CONFIG_ARM64_BRBE):

Falls back to the current behaviour, i.e. the event gets rejected.

2. Feature built but HW not supported (CONFIG_ARM64_BRBE && !FEAT_BRBE):

Falls back to the current behaviour, i.e. the event gets rejected.

3. Feature built and HW supported (CONFIG_ARM64_BRBE && FEAT_BRBE):

Platform supports branch stack sampling requests. Let's observe through a
simple example here.

$perf record -j any_call,u,k,save_type ls

[Please refer to the perf-record man pages for all possible branch filter options]

$perf report
-------------------------- Snip ----------------------
# Overhead  Command  Source Shared Object  Source Symbol                                 Target Symbol                                 Basic Block Cycles
# ........  .......  ....................  ............................................  ............................................  ..................
#
     3.52%  ls       [kernel.kallsyms]     [k] sched_clock_noinstr                       [k] arch_counter_get_cntpct                   16
     3.52%  ls       [kernel.kallsyms]     [k] sched_clock                               [k] sched_clock_noinstr                       9
     1.85%  ls       [kernel.kallsyms]     [k] sched_clock_cpu                           [k] sched_clock                               5
     1.80%  ls       [kernel.kallsyms]     [k] irqtime_account_irq                       [k] sched_clock_cpu                           20
     1.58%  ls       [kernel.kallsyms]     [k] gic_handle_irq                            [k] generic_handle_domain_irq                 19
     1.58%  ls       [kernel.kallsyms]     [k] call_on_irq_stack                         [k] gic_handle_irq                            9
     1.58%  ls       [kernel.kallsyms]     [k] do_interrupt_handler                      [k] call_on_irq_stack                         23
     1.58%  ls       [kernel.kallsyms]     [k] generic_handle_domain_irq                 [k] __irq_resolve_mapping                     6
     1.58%  ls       [kernel.kallsyms]     [k] __irq_resolve_mapping                     [k] __rcu_read_lock                           10
-------------------------- Snip ----------------------

$perf report -D | grep cycles
-------------------------- Snip ----------------------
..... 1: ffff800080dd3334 -> ffff800080dd759c 39 cycles P 0 IND_CALL
..... 2: ffff800080ffaea0 -> ffff800080ffb688 16 cycles P 0 IND_CALL
..... 3: ffff800080139918 -> ffff800080ffae64 9 cycles P 0 CALL
..... 4: ffff800080dd3324 -> ffff8000801398f8 7 cycles P 0 CALL
..... 5: ffff8000800f8548 -> ffff800080dd330c 21 cycles P 0 IND_CALL
..... 6: ffff8000800f864c -> ffff8000800f84ec 6 cycles P 0 CALL
..... 7: ffff8000800f86dc -> ffff8000800f8638 11 cycles P 0 CALL
..... 8: ffff8000800f86d4 -> ffff800081008630 16 cycles P 0 CALL
-------------------------- Snip ----------------------

perf script and other tooling can also be applied on the captured perf.data.

Similarly, branch stack sampling records can be collected via the direct system
call, i.e. the perf_event_open() method, after setting 'struct perf_event_attr'
as required.

event->attr.sample_type |= PERF_SAMPLE_BRANCH_STACK
event->attr.branch_sample_type |= PERF_SAMPLE_BRANCH_ |
                                  PERF_SAMPLE_BRANCH_ |
                                  PERF_SAMPLE_BRANCH_ |
                                  ...............................

But not all branch filters may be supported on the platform.

----------------------- BRBE Branch Filters Support -----------------------

- Following branch filters are supported on arm64.

        PERF_SAMPLE_BRANCH_USER         /* Branch privilege filters */
        PERF_SAMPLE_BRANCH_KERNEL
        PERF_SAMPLE_BRANCH_HV

        PERF_SAMPLE_BRANCH_ANY          /* Branch type filters */
        PERF_SAMPLE_BRANCH_ANY_CALL
        PERF_SAMPLE_BRANCH_ANY_RETURN
        PERF_SAMPLE_BRANCH_IND_CALL
        PERF_SAMPLE_BRANCH_COND
        PERF_SAMPLE_BRANCH_IND_JUMP
        PERF_SAMPLE_BRANCH_CALL

        PERF_SAMPLE_BRANCH_NO_FLAGS     /* Branch record flags */
        PERF_SAMPLE_BRANCH_NO_CYCLES
        PERF_SAMPLE_BRANCH_TYPE_SAVE
        PERF_SAMPLE_BRANCH_HW_INDEX
        PERF_SAMPLE_BRANCH_PRIV_SAVE

- Following branch filters are not supported on arm64.

        PERF_SAMPLE_BRANCH_ABORT_TX
        PERF_SAMPLE_BRANCH_IN_TX
        PERF_SAMPLE_BRANCH_NO_TX
        PERF_SAMPLE_BRANCH_CALL_STACK

Events requesting any of the above unsupported branch filters get rejected.
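
As a rough illustration of the perf_event_open() path described above, the
following userspace sketch fills 'struct perf_event_attr' for an event with
branch stack sampling and checks the requested filters against the
not-supported list before opening the event. The helper names
(init_branch_attr(), uses_unsupported_filter(), open_branch_event()), the
sample period and the choice of a cycles event are assumptions made for this
example and are not part of the series. The driver performs its own filter
validation; the check here only mirrors the not-supported list above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Filters listed above as not supported on arm64. */
#define UNSUPPORTED_BRANCH_FILTERS              \
        (PERF_SAMPLE_BRANCH_ABORT_TX |          \
         PERF_SAMPLE_BRANCH_IN_TX |             \
         PERF_SAMPLE_BRANCH_NO_TX |             \
         PERF_SAMPLE_BRANCH_CALL_STACK)

/*
 * Hypothetical userspace pre-check: returns non-zero when the request
 * contains any filter from the not-supported list.
 */
static inline int uses_unsupported_filter(uint64_t branch_sample_type)
{
        return (branch_sample_type & UNSUPPORTED_BRANCH_FILTERS) != 0;
}

/*
 * Hypothetical helper: set up a cycles sampling event which also asks
 * for branch stack records with the given filters.
 */
static inline void init_branch_attr(struct perf_event_attr *attr,
                                    uint64_t branch_sample_type)
{
        assert(attr != NULL);

        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = PERF_TYPE_HARDWARE;
        attr->config = PERF_COUNT_HW_CPU_CYCLES;
        attr->sample_period = 100000;   /* arbitrary value for the example */
        attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
        attr->branch_sample_type = branch_sample_type;
        attr->disabled = 1;             /* enable later via ioctl() */
}

/*
 * Thin wrapper around the raw system call, as glibc does not provide a
 * perf_event_open() function. Returns a file descriptor or -1 with
 * errno set.
 */
static inline int open_branch_event(struct perf_event_attr *attr,
                                    pid_t pid, int cpu)
{
        return (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);
}
```

A caller would typically pass PERF_SAMPLE_BRANCH_ANY_CALL |
PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_KERNEL |
PERF_SAMPLE_BRANCH_TYPE_SAVE to init_branch_attr(), which corresponds to
'-j any_call,u,k,save_type', and then call open_branch_event(&attr, 0, -1) to
measure the calling task on any CPU. Whether the open succeeds depends on the
scenarios listed earlier (CONFIG_ARM64_BRBE, FEAT_BRBE) and on the perf
permissions of the caller.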

--------------------------- Virtualisation support ------------------------

- No guest support

-------------------------------- Testing ---------------------------------

- Cross compiled for both arm64 and arm32 platforms
- Passes all branch tests with 'perf test branch' on arm64

Changes in V18:

- Changed BRBIDR0_EL1 register fields CC and FORMAT, updated the commit message
- Replaced BRBIDR0_EL1_FORMAT_0 with BRBIDR0_EL1_FORMAT_FORMAT_0 in BRBE driver
- Dropped ifdef CONFIG_ARM64_BRBE around __init_el2_brbe()
- Updated the in-code comment around __init_el2_brbe()
- Dropped the write up for EL2->EL1 transition, also moved up the EL3 write up
- Unconditionally capture branch record type and privilege information
- Scan valid branch stack events in armpmu_start() to create merged filter
- Dropped branch_sample_type override in armv8pmu_branch_stack_add()
- Dropped branch filter mismatch between PMU and event in read_branch_records()
- Added SW filtering framework in read_branch_records() during filter mismatch
- Added SW filtering for privilege modes
- Used host_data_ptr() to access host_debug_state.brbcr_el1 register
- Changed DEBUG_STATE_SAVE_BRBE to use BIT(7)
- Reverted iflags back to u8

https://lore.kernel.org/all/20240405024639.1179064-1-anshuman.khandual@arm.com/

Changes in V17:

- Added back Reviewed-by tags from Mark Brown
- Updated the commit message regarding the field BRBINFx_EL1_TYPE_IMPDEF_TRAP_EL3
- Added leading 0s for all values as BRBIDR0_EL1.NUMREC is an 8 bit field
- Added leading 0s for all values as BRBFCR_EL1.BANK is a 2 bit field
- Reordered BRBCR_EL1/BRBCR_EL12/BRBCR_EL2 registers as per sysreg encodings
- Renamed s/FIRST/BANK_0 and s/SECOND/BANK_1 in BRBFCR_EL1.BANK
- Renamed s/UNCOND_DIRECT/DIRECT_UNCOND in BRBINFx_EL1.TYPE
- Renamed s/COND_DIRECT/DIRECT_COND in BRBINFx_EL1.TYPE
- Dropped __SYS_BRBINF/__SYS_BRBSRC/__SYS_BRBTGT and their expansions
- Moved all existing BRBE registers from sysreg.h header to tools/sysreg format
- Updated the commit
  message to mention sys_insn_descs[]
- Changed KVM to use existing SYS_BRBSRC/TGT/INF_EL1(n) format
- Moved the BRBE instructions into sys_insn_descs[] array
- ARM PMUV3 changes have been moved into the BRBE driver patch instead
- Moved down branch_stack_add() in armpmu_add() after event's basic checks
- Added new callbacks in struct arm_pmu e.g branch_stack_[init|add|del]()
- Renamed struct arm_pmu callback branch_reset() as branch_stack_reset()
- Dropped the comment in armpmu_event_init()
- Renamed 'pmu_hw_events' elements from 'brbe_' to more generic 'branch_'
- Separated out from the BRBE driver implementation patch
- Dropped the comment in __init_el2_brbe()
- Updated __init_el2_brbe() with BRBCR_EL2.MPRED requirements
- Updated __init_el2_brbe() with __check_hvhe() constructs
- Updated booting.rst regarding MPRED, MDCR_EL3 and fine grained control
- Dropped Documentation/arch/arm64/brbe.rst
- Renamed armv8pmu_branch_reset() as armv8pmu_branch_stack_reset()
- Separated out booting.rst and EL2 boot requirements into a new patch
- Dropped process_branch_aborts() completely
- Added a warning if transaction states get detected unexpectedly
- Dropped enum brbe_bank_idx from the driver
- Defined armv8pmu_branch_stack_init/add/del() callbacks in the driver
- Changed BRBE driver to use existing SYS_BRBSRC/TGT/INF_EL1(n) format
- Dropped isb() call sites in __debug_[save|restore]_brbe()
- Changed to [read|write]_sysreg_el1() accessors in __debug_[save|restore]_brbe()

Changes in V16:

https://lore.kernel.org/all/20240125094119.2542332-1-anshuman.khandual@arm.com/

- Updated BRBINFx_EL1.TYPE = 0b110000 as field IMPDEF_TRAP_EL3
- Updated BRBCR_ELx[9] as field FZPSS
- Updated BRBINFINJ_EL1 to use sysreg field BRBINFx_EL1
- Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
- Renamed arm_brbe.h as arm_pmuv3_branch.h
- Updated perf_sample_save_brstack()'s new argument requirements with NULL
- Fixed typo (s/informations/information) in
  Documentation/arch/arm64/brbe.rst
- Added SPDX-License-Identifier in Documentation/arch/arm64/brbe.rst
- Added new PERF_SAMPLE_BRANCH_COUNTERS into BRBE_EXCLUDE_BRANCH_FILTERS
- Dropped BRBFCR_EL1 and BRBCR_EL1 from enum vcpu_sysreg
- Reverted the KVM NVHE patch - use host_debug_state based 'brbcr_el1'
  element and dropped the previous dependency on James's coresight series

Changes in V15:

https://lore.kernel.org/all/20231201053906.1261704-1-anshuman.khandual@arm.com/

- Added a comment for armv8pmu_branch_probe() regarding single cpu probe
- Added a text in brbe.rst regarding single cpu probe
- Dropped runtime BRBE enable for setting DEBUG_STATE_SAVE_BRBE
- Dropped zero_branch_stack based zero branch records mechanism
- Replaced BRBFCR_EL1_DEFAULT_CONFIG with BRBFCR_EL1_CONFIG_MASK
- Added BRBFCR_EL1_CONFIG_MASK masking in branch_type_to_brbfcr()
- Moved BRBE helpers from arm_brbe.h into arm_brbe.c
- Moved armv8_pmu_xxx() declaration inside arm_brbe.h for arm64 (CONFIG_ARM64_BRBE)
- Moved armv8_pmu_xxx() stub definitions inside arm_brbe.h for arm32 (!CONFIG_ARM64_BRBE)
- Included arm_brbe.h header both in arm_pmuv3.c and arm_brbe.c
- Dropped BRBE custom pr_fmt()
- Dropped CONFIG_PERF_EVENTS wrapping from header entries
- Flush branch records when a cpu bound event follows a task bound event
- Dropped BRBFCR_EL1 from __debug_save_brbe()/__debug_restore_brbe()
- Always save the live SYS_BRBCR_EL1 in host context and then check if BRBE
  was enabled before resetting SYS_BRBCR_EL1 for the host

Changes in V14:

https://lore.kernel.org/all/20231114051329.327572-1-anshuman.khandual@arm.com/

- This series has been reorganised as suggested during V13
- There are just eight patches now i.e 5 enablement and 3 perf branch tests
- Fixed brackets problem in __SYS_BRBINFO/BRBSRC/BRBTGT() macros
- Renamed the macro i.e s/__SYS_BRBINFO/__SYS_BRBINF/
- Renamed s/BRB_IALL/BRB_IALL_INSN and s/BRBE_INJ/BRB_INJ_INSN
- Moved BRB_IALL_INSN and SYS_BRB_INSN instructions to sysreg patch
- Changed E1BRE as ExBRE in sysreg fields inside BRBCR_ELx
- Used BRBCR_ELx for defining all BRBCR_EL1, BRBCR_EL2, and BRBCR_EL12 (new)
- Folded the following three patches into a single patch i.e [PATCH 3/8]

  drivers: perf: arm_pmu: Add new sched_task() callback
  arm64/perf: Add branch stack support in struct arm_pmu
  arm64/perf: Add branch stack support in struct pmu_hw_events
  arm64/perf: Add branch stack support in ARMV8 PMU
  arm64/perf: Add PERF_ATTACH_TASK_DATA to events with has_branch_stack()

- All armv8pmu_branch_xxxx() stub definitions have been moved inside
  include/linux/perf/arm_pmuv3.h for easy access from both arm32 and arm64
- Added brbe_users, brbe_context and brbe_sample_type in struct pmu_hw_events
- Added comments for all the above new elements in struct pmu_hw_events
- Added branch_reset() and sched_task() callbacks
- Changed and optimized branch records processing during a PMU IRQ
- NO branch records get captured for event with mismatched brbe_sample_type
- Branch record context is tracked from armpmu_del() & armpmu_add()
- Branch record hardware is driven from armv8pmu_start() & armv8pmu_stop()
- Dropped NULL check for 'pmu_ctx' inside armv8pmu_sched_task()
- Moved down PERF_ATTACH_TASK_DATA assignment with a preceding comment
- In conflicting branch sample type requests, first event takes precedence
- Folded the following five patches from V13 into a single patch i.e [PATCH 4/8]

  arm64/perf: Enable branch stack events via FEAT_BRBE
  arm64/perf: Add struct brbe_regset helper functions
  arm64/perf: Implement branch records save on task sched out
  arm64/perf: Implement branch records save on PMU IRQ

- Fixed the year in copyright statement
- Added Documentation/arch/arm64/brbe.rst
- Updated Documentation/arch/arm64/booting.rst (BRBCR_EL2.CC for EL1 entry)
- Added __init_el2_brbe() which enables branch record cycle count support
- Disabled EL2 traps in __init_el2_fgt() while accessing BRBE registers and
  executing instructions
- Changed CONFIG_ARM64_BRBE user
  visible description
- Fixed a typo in CONFIG_ARM64_BRBE config option description text
- Added BUILD_BUG_ON() correlating BRBE_BANK_MAX_ENTRIES and MAX_BRANCH_RECORDS
- Dropped arm64_create_brbe_task_ctx_kmem_cache()
- Moved down comment for PERF_SAMPLE_BRANCH_KERNEL in branch_type_to_brbcr()
- Renamed BRBCR_ELx_DEFAULT_CONFIG as BRBCR_ELx_CONFIG_MASK
- Replaced BRBCR_ELx_DEFAULT_TS with BRBCR_ELx_TS_MASK in BRBCR_ELx_CONFIG_MASK
- Replaced BRBCR_ELx_E1BRE instances with BRBCR_ELx_ExBRE
- Added BRBE specific branch stack sampling perf test patches into the series
- Added a patch to prevent guest accesses into BRBE registers and instructions
- Added a patch to save the BRBE host context in NVHE environment
- Updated most commit messages

Changes in V13:

https://lore.kernel.org/all/20230711082455.215983-1-anshuman.khandual@arm.com/
https://lore.kernel.org/all/20230622065351.1092893-1-anshuman.khandual@arm.com/

- Added branch callback stubs for aarch32 pmuv3 based platforms
- Updated the comments for capture_brbe_regset()
- Deleted the comments in __read_brbe_regset()
- Reversed the arguments order in capture_brbe_regset() and brbe_branch_save()
- Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in armv8pmu_branch_read()
- Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in capture_brbe_regset()

Changes in V12:

https://lore.kernel.org/all/20230615133239.442736-1-anshuman.khandual@arm.com/

- Replaced branch types with complete DIRECT/INDIRECT prefixes/suffixes
- Replaced branch types with complete INSN/ALIGN prefixes/suffixes
- Replaced return branch types as simple RET/ERET
- Replaced time field GST_PHYSICAL as GUEST_PHYSICAL
- Added 0 padding for BRBIDR0_EL1.NUMREC enum values
- Dropped helper arm_pmu_branch_stack_supported()
- Renamed armv8pmu_branch_valid() as armv8pmu_branch_attr_valid()
- Separated perf_task_ctx_cache setup from arm_pmu private allocation
- Collected changes to branch_records_alloc() in a single patch [5/10]
- Reworked and cleaned up
  branch_records_alloc()
- Reworked armv8pmu_branch_read() with new loop iterations in patch [6/10]
- Reworked capture_brbe_regset() with new loop iterations in patch [8/10]
- Updated the comment in branch_type_to_brbcr()
- Fixed the comment before stitch_stored_live_entries()
- Fixed BRBINFINJ_EL1 definition for VALID_FULL enum field
- Factored out helper __read_brbe_regset() from capture_brbe_regset()
- Dropped the helper copy_brbe_regset()
- Simplified stitch_stored_live_entries() with memcpy(), memmove()
- Reworked armv8pmu_probe_pmu() to bail out early with !probe.present
- Reworked brbe_attributes_probe() without 'struct brbe_hw_attr'
- Dropped 'struct brbe_hw_attr' argument from capture_brbe_regset()
- Dropped 'struct brbe_hw_attr' argument from brbe_branch_save()
- Dropped arm_pmu->private and added arm_pmu->reg_trbidr instead

Changes in V11:

https://lore.kernel.org/all/20230531040428.501523-1-anshuman.khandual@arm.com/

- Fixed the crash for per-cpu events without event->pmu_ctx->task_ctx_data

Changes in V10:

https://lore.kernel.org/all/20230517022410.722287-1-anshuman.khandual@arm.com/

- Rebased the series on v6.4-rc2
- Moved ARMV8 PMUV3 changes inside drivers/perf/arm_pmuv3.c
- Moved BRBE driver changes inside drivers/perf/arm_brbe.[c|h]
- Moved the WARN_ON() inside the if condition in armv8pmu_handle_irq()

Changes in V9:

https://lore.kernel.org/all/20230315051444.1683170-1-anshuman.khandual@arm.com/

- Fixed build problem with has_branch_stack() in arm64 header
- BRBINF_EL1 definition has been changed from 'Sysreg' to 'SysregFields'
- Renamed all BRBINF_EL1 call sites as BRBINFx_EL1
- Dropped static const char branch_filter_error_msg[]
- Implemented a positive list check for BRBE supported perf branch filters
- Added a comment in armv8pmu_handle_irq()
- Implemented per-cpu allocation for struct branch_record records
- Skipped looping through bank 1 if an invalid record is detected in bank 0
- Added comment in armv8pmu_branch_read() explaining prohibited
  region etc
- Added comment warning about erroneously marking transactions as aborted
- Replaced the first argument (perf_branch_entry) in capture_brbe_flags()
- Dropped the last argument (idx) in capture_brbe_flags()
- Dropped the brbcr argument from capture_brbe_flags()
- Used perf_sample_save_brstack() to capture branch records for perf_sample_data
- Added comment explaining rationale for setting BRBCR_EL1_FZP for user only traces
- Dropped BRBE prohibited state mechanism while in armv8pmu_branch_read()
- Implemented event task context based branch records save mechanism

Changes in V8:

https://lore.kernel.org/all/20230123125956.1350336-1-anshuman.khandual@arm.com/

- Replaced arm_pmu->features as arm_pmu->has_branch_stack, updated its helper
- Added a comment and line break before arm_pmu->private element
- Added WARN_ON_ONCE() in helpers i.e armv8pmu_branch_[read|valid|enable|disable]()
- Dropped comments in armv8pmu_enable_event() and armv8pmu_disable_event()
- Replaced open bank encoding in BRBFCR_EL1 with SYS_FIELD_PREP()
- Changed brbe_hw_attr->brbe_version from 'bool' to 'int'
- Updated pr_warn() as pr_warn_once() with values in brbe_get_perf_[type|priv]()
- Replaced all pr_warn_once() as pr_debug_once() in armv8pmu_branch_valid()
- Added a comment in branch_type_to_brbcr() for the BRBCR_EL1 privilege settings
- Modified the comment related to BRBINFx_EL1.LASTFAILED in capture_brbe_flags()
- Modified brbe_get_perf_entry_type() as brbe_set_perf_entry_type()
- Renamed brbe_valid() as brbe_record_is_complete()
- Renamed brbe_source() as brbe_record_is_source_only()
- Renamed brbe_target() as brbe_record_is_target_only()
- Inverted checks for !brbe_record_is_[target|source]_only() for info capture
- Replaced 'fetch' with 'get' in all helpers that extract field value
- Dropped 'static int brbe_current_bank' optimization in select_brbe_bank()
- Dropped select_brbe_bank_index() completely, added capture_branch_entry()
- Process captured branch entries in two
  separate loops, one for each BRBE bank
- Moved branch_records_alloc() inside armv8pmu_probe_pmu()
- Added a forward declaration for the helper has_branch_stack()
- Added new callbacks armv8pmu_private_alloc() and armv8pmu_private_free()
- Updated armv8pmu_probe_pmu() to allocate the private structure before SMP call

Changes in V7:

https://lore.kernel.org/all/20230105031039.207972-1-anshuman.khandual@arm.com/

- Folded [PATCH 7/7] into [PATCH 3/7] which enables branch stack sampling event
- Defined BRBFCR_EL1_BRANCH_FILTERS, BRBCR_EL1_DEFAULT_CONFIG in the header
- Defined BRBFCR_EL1_DEFAULT_CONFIG in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_FZP
- Defined BRBCR_EL1_DEFAULT_TS in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_DEFAULT_TS
- Moved BRBCR_EL1_DEFAULT_CONFIG check inside branch_type_to_brbcr()
- Moved down BRBCR_EL1_CC, BRBCR_EL1_MPRED later in branch_type_to_brbcr()
- Also set BRBE in paused state in armv8pmu_branch_disable()
- Dropped brbe_paused(), set_brbe_paused() helpers
- Extracted error string via branch_filter_error_msg[] for armv8pmu_branch_valid()
- Replaced brbe_v1p1 with brbe_version in struct brbe_hw_attr
- Added valid_brbe_[cc, format, version]() helpers
- Split a separate brbe_attributes_probe() from armv8pmu_branch_probe()
- Capture event->attr.branch_sample_type earlier in armv8pmu_branch_valid()
- Defined enum brbe_bank_idx with possible values for BRBE bank indices
- Changed armpmu->hw_attr into armpmu->private
- Added missing space in stub definition for armv8pmu_branch_valid()
- Replaced both kmalloc() with kzalloc()
- Added BRBE_BANK_MAX_ENTRIES
- Updated comment for capture_brbe_flags()
- Updated comment for struct brbe_hw_attr
- Dropped space after type cast in a couple of places
- Replaced inverse with negation for testing BRBCR_EL1_FZP in armv8pmu_branch_read()
- Captured cpuc->branches->branch_entries[idx] in a local variable
- Dropped saved_priv from armv8pmu_branch_read()
- Reorganized
  PERF_SAMPLE_BRANCH_NO_[CYCLES|NO_FLAGS] related configuration
- Replaced with FIELD_GET() and FIELD_PREP() wherever applicable
- Replaced BRBCR_EL1_TS_PHYSICAL with BRBCR_EL1_TS_VIRTUAL
- Moved valid_brbe_nr(), valid_brbe_cc(), valid_brbe_format(), valid_brbe_version(),
  select_brbe_bank(), select_brbe_bank_index() helpers inside the C implementation
- Reorganized brbe_valid_nr() and dropped the pr_warn() message
- Changed probe sequence in brbe_attributes_probe()
- Added 'brbcr' argument into capture_brbe_flags() to ascertain correct state
- Disable BRBE before disabling the PMU event counter
- Enable PERF_SAMPLE_BRANCH_HV filters when is_kernel_in_hyp_mode()
- Guard armv8pmu_reset() & armv8pmu_sched_task() with arm_pmu_branch_stack_supported()

Changes in V6:

https://lore.kernel.org/linux-arm-kernel/20221208084402.863310-1-anshuman.khandual@arm.com/

- Restore the exception level privilege after reading the branch records
- Unpause the buffer after reading the branch records
- Decouple BRBCR_EL1_EXCEPTION/ERTN from perf event privilege level
- Reworked BRBE implementation and branch stack sampling support on arm pmu
- BRBE implementation is now part of overall ARMV8 PMU implementation
- BRBE implementation moved from drivers/perf/ to inside arch/arm64/kernel/
- CONFIG_ARM_BRBE_PMU renamed as CONFIG_ARM64_BRBE in arch/arm64/Kconfig
- File moved - drivers/perf/arm_pmu_brbe.c -> arch/arm64/kernel/brbe.c
- File moved - drivers/perf/arm_pmu_brbe.h -> arch/arm64/kernel/brbe.h
- BRBE name has been dropped from struct arm_pmu and struct hw_pmu_events
- BRBE name has been abstracted out as 'branches' in arm_pmu and hw_pmu_events
- BRBE name has been abstracted out as 'branches' in ARMV8 PMU implementation
- Added sched_task() callback into struct arm_pmu
- Added 'hw_attr' into struct arm_pmu encapsulating possible PMU HW attributes
- Dropped explicit attributes brbe_(v1p1, nr, cc, format) from struct arm_pmu
- Dropped brbfcr, brbcr, registers scratch area from struct
  hw_pmu_events
- Dropped brbe_users, brbe_context tracking in struct hw_pmu_events
- Added 'features' tracking into struct arm_pmu with ARM_PMU_BRANCH_STACK flag
- armpmu->hw_attr maps into 'struct brbe_hw_attr' inside BRBE implementation
- Set ARM_PMU_BRANCH_STACK in 'arm_pmu->features' after successful BRBE probe
- Added armv8pmu_branch_reset() inside armv8pmu_branch_enable()
- Dropped brbe_supported() as events will be rejected via ARM_PMU_BRANCH_STACK
- Dropped set_brbe_disabled() as well
- Reformatted armv8pmu_branch_valid() warnings while rejecting unsupported events

Changes in V5:

https://lore.kernel.org/linux-arm-kernel/20221107062514.2851047-1-anshuman.khandual@arm.com/

- Changed BRBCR_EL1.VIRTUAL from 0b1 to 0b01
- Changed BRBFCR_EL1.EnL into BRBFCR_EL1.EnI
- Changed config ARM_BRBE_PMU from 'tristate' to 'bool'

Changes in V4:

https://lore.kernel.org/all/20221017055713.451092-1-anshuman.khandual@arm.com/

- Changed ../tools/sysreg declarations as suggested
- Set PERF_SAMPLE_BRANCH_STACK in data.sample_flags
- Dropped perfmon_capable() check in armpmu_event_init()
- s/pr_warn_once/pr_info in armpmu_event_init()
- Added brbe_format element into struct pmu_hw_events
- Changed v1p1 as brbe_v1p1 in struct pmu_hw_events
- Dropped pr_info() from arm64_pmu_brbe_probe(), solved LOCKDEP warning

Changes in V3:

https://lore.kernel.org/all/20220929075857.158358-1-anshuman.khandual@arm.com/

- Moved brbe_stack off the stack, it is now dynamically allocated
- Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv()
- Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg
- Created dummy BRBINF_EL1 field definitions in tools/sysreg
- Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable()
- Both exception and exception return branch records are now captured only
  if the event has PERF_SAMPLE_BRANCH_KERNEL, which would already have been
  checked in generic perf via perf_allow_kernel()

Changes in V2:

https://lore.kernel.org/all/20220908051046.465307-1-anshuman.khandual@arm.com/

- Dropped branch sample filter helpers consolidation patch from this series
- Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
- Use cached perfmon_capable() while configuring BRBE branch record filters

Changes in V1:

https://lore.kernel.org/linux-arm-kernel/20220613100119.684673-1-anshuman.khandual@arm.com/

- Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
- Process new perf branch types via PERF_BR_EXTEND_ABI

Changes in RFC V2:

https://lore.kernel.org/linux-arm-kernel/20220412115455.293119-1-anshuman.khandual@arm.com/

- Added branch_sample_priv() while consolidating other branch sample filter helpers
- Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
- Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
- Added documentation for struct arm_pmu changes, updated commit message
- Updated commit message for BRBE detection infrastructure patch
- PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
- Branch privilege state capture mechanism has now moved inside the driver

Changes in RFC V1:

https://lore.kernel.org/all/1642998653-21377-1-git-send-email-anshuman.khandual@arm.com/

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Mark Rutland
Cc: Mark Brown
Cc: James Clark
Cc: Rob Herring
Cc: Marc Zyngier
Cc: Suzuki Poulose
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (6):
  arm64/sysreg: Add BRBE registers and fields
  KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
  drivers: perf: arm_pmu: Add infrastructure for branch stack sampling
  arm64/boot: Enable EL2 requirements for BRBE
  drivers: perf: arm_pmuv3: Enable branch stack sampling via FEAT_BRBE
  KVM: arm64: nvhe: Disable branch generation in nVHE guests

James Clark (3):
  perf: test: Speed up running brstack test on an Arm model
  perf: test: Remove empty lines from branch filter test output
  perf: test: Extend branch stack sampling test for Arm64 BRBE

 Documentation/arch/arm64/booting.rst    |   23 +-
 arch/arm64/include/asm/el2_setup.h      |   87 +-
 arch/arm64/include/asm/kvm_host.h       |    3 +
 arch/arm64/include/asm/sysreg.h         |   17 +-
 arch/arm64/kvm/debug.c                  |    5 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c      |   31 +
 arch/arm64/kvm/sys_regs.c               |   56 ++
 arch/arm64/tools/sysreg                 |  131 +++
 drivers/perf/Kconfig                    |   11 +
 drivers/perf/Makefile                   |    1 +
 drivers/perf/arm_brbe.c                 | 1198 ++++++++++++++++++++++++
 drivers/perf/arm_pmu.c                  |   42 +-
 drivers/perf/arm_pmuv3.c                |  160 +++-
 drivers/perf/arm_pmuv3_branch.h         |   83 ++
 include/linux/perf/arm_pmu.h            |   37 +-
 tools/perf/tests/builtin-test.c         |    1 +
 tools/perf/tests/shell/test_brstack.sh  |   57 +-
 tools/perf/tests/tests.h                |    1 +
 tools/perf/tests/workloads/Build        |    2 +
 tools/perf/tests/workloads/traploop.c   |   39 +
 20 files changed, 1959 insertions(+), 26 deletions(-)
 create mode 100644 drivers/perf/arm_brbe.c
 create mode 100644 drivers/perf/arm_pmuv3_branch.h
 create mode 100644 tools/perf/tests/workloads/traploop.c

-- 
2.25.1