From: Song Liu
Subject: [PATCH v4 bpf-next 1/3] perf: enable branch record for software events
Date: Tue, 31 Aug 2021 17:35:15 -0700
Message-ID: <20210901003517.3953145-2-songliubraving@fb.com>
In-Reply-To: <20210901003517.3953145-1-songliubraving@fb.com>
References: <20210901003517.3953145-1-songliubraving@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The typical way to access branch records (e.g. Intel LBR) is via a
hardware perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, a PMI can
capture reliable LBR. On the other hand, LBR can also be useful in
non-PMI scenarios. For example, in a kretprobe or bpf fexit program, LBR
can provide a lot of information on what happened within the function.
Add an API to use branch records from software events.

Note that, when the software event triggers, it is necessary to stop the
branch record hardware as soon as possible. Therefore, static_call is
used to remove some branch instructions in this process.

Signed-off-by: Song Liu
---
 arch/x86/events/intel/core.c | 26 +++++++++++++++++++++++---
 arch/x86/events/intel/ds.c   |  8 --------
 arch/x86/events/perf_event.h | 10 ++++++++--
 include/linux/perf_event.h   | 26 ++++++++++++++++++++++++++
 kernel/events/core.c         |  2 ++
 5 files changed, 59 insertions(+), 13 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ac6fd2dabf6a2..fe9bec93eb53b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2143,7 +2143,7 @@ static __initconst const u64 knl_hw_cache_extra_regs
  * However, there are some cases which may change PEBS status, e.g. PMI
  * throttle. The PEBS_ENABLE should be updated where the status changes.
  */
-static void __intel_pmu_disable_all(void)
+static __always_inline void __intel_pmu_disable_all(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
@@ -2153,7 +2153,7 @@ static void __intel_pmu_disable_all(void)
 		intel_pmu_disable_bts();
 }
 
-static void intel_pmu_disable_all(void)
+static __always_inline void intel_pmu_disable_all(void)
 {
 	__intel_pmu_disable_all();
 	intel_pmu_pebs_disable_all();
@@ -2186,6 +2186,20 @@ static void intel_pmu_enable_all(int added)
 	__intel_pmu_enable_all(added, false);
 }
 
+static int
+intel_pmu_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	intel_pmu_disable_all();
+	intel_pmu_lbr_read();
+	memcpy(br_snapshot->entries, cpuc->lbr_entries,
+	       sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
+	br_snapshot->nr = x86_pmu.lbr_nr;
+	intel_pmu_enable_all(0);
+	return 0;
+}
+
 /*
  * Workaround for:
  * Intel Errata AAK100 (model 26)
@@ -6283,9 +6297,15 @@ __init int intel_pmu_init(void)
 		x86_pmu.lbr_nr = 0;
 	}
 
-	if (x86_pmu.lbr_nr)
+	if (x86_pmu.lbr_nr) {
 		pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);
 
+		/* only support branch_stack snapshot for perfmon >= v2 */
+		if (x86_pmu.disable_all == intel_pmu_disable_all)
+			static_call_update(perf_snapshot_branch_stack,
+					   intel_pmu_snapshot_branch_stack);
+	}
+
 	intel_pmu_check_extra_regs(x86_pmu.extra_regs);
 
 	/* Support full width counters using alternative MSR range */
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 8647713276a73..8a832986578a9 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1296,14 +1296,6 @@ void intel_pmu_pebs_enable_all(void)
 		wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
 }
 
-void intel_pmu_pebs_disable_all(void)
-{
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-
-	if (cpuc->pebs_enabled)
-		wrmsrl(MSR_IA32_PEBS_ENABLE, 0);
-}
-
 static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index e3ac05c97b5e5..171abbb359fe5 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1240,6 +1240,14 @@ static inline bool intel_pmu_has_bts(struct perf_event *event)
 	return intel_pmu_has_bts_period(event, hwc->sample_period);
 }
 
+static __always_inline void intel_pmu_pebs_disable_all(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (cpuc->pebs_enabled)
+		wrmsrl(MSR_IA32_PEBS_ENABLE, 0);
+}
+
 int intel_pmu_save_and_restart(struct perf_event *event);
 
 struct event_constraint *
@@ -1314,8 +1322,6 @@ void intel_pmu_pebs_disable(struct perf_event *event);
 
 void intel_pmu_pebs_enable_all(void);
 
-void intel_pmu_pebs_disable_all(void);
-
 void intel_pmu_pebs_sched_task(struct perf_event_context *ctx, bool sched_in);
 
 void intel_pmu_auto_reload_read(struct perf_event *event);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fe156a8170aa3..a368dfd754608 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -57,6 +57,7 @@ struct perf_guest_info_callbacks {
 #include
 #include
 #include
+#include
 #include
 
 struct perf_callchain_entry {
@@ -1612,4 +1613,29 @@ extern void __weak arch_perf_update_userpage(struct perf_event *event,
 extern __weak u64 arch_perf_get_page_size(struct mm_struct *mm, unsigned long addr);
 #endif
 
+/*
+ * Snapshot branch stack on software events.
+ *
+ * Branch stack can be very useful in understanding software events. For
+ * example, when a long function, e.g. sys_perf_event_open, returns an
+ * errno, it is not obvious why the function failed. Branch stack could
+ * provide very helpful information in this type of scenarios.
+ *
+ * On software event, it is necessary to stop the hardware branch recorder
+ * fast. Otherwise, the hardware register/buffer will be flushed with
+ * entries of the triggering event. Therefore, static call is used to
+ * stop the hardware recorder.
+ */
+enum {
+	PERF_MAX_BRANCH_SNAPSHOT = 32,
+};
+
+struct perf_branch_snapshot {
+	unsigned int nr;
+	struct perf_branch_entry entries[PERF_MAX_BRANCH_SNAPSHOT];
+};
+
+typedef int (perf_snapshot_branch_stack_t)(struct perf_branch_snapshot *);
+DECLARE_STATIC_CALL(perf_snapshot_branch_stack, perf_snapshot_branch_stack_t);
+
 #endif /* _LINUX_PERF_EVENT_H */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 011cc5069b7ba..d32a3cf37eb90 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -13437,3 +13437,5 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
 	.threaded	= true,
 };
 #endif /* CONFIG_CGROUP_PERF */
+
+DEFINE_STATIC_CALL_RET0(perf_snapshot_branch_stack, perf_snapshot_branch_stack_t);
-- 
2.30.2
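
As a rough userspace illustration of the dispatch pattern in this patch
(not kernel code), a plain function pointer can stand in for the static
call: it starts at a default that returns 0, as DEFINE_STATIC_CALL_RET0
arranges, and is later pointed at a PMU-specific snapshot routine, as
static_call_update() does in intel_pmu_init(). Every name below
(branch_entry, branch_snapshot, snapshot_ret0, fake_pmu_snapshot,
fake_lbr, snapshot_update) is a hypothetical stand-in, and an ordinary
indirect call lacks the patched direct-call property that motivates
static_call in the first place.

```c
#include <assert.h>
#include <string.h>

#define MAX_BRANCH_SNAPSHOT 32	/* mirrors PERF_MAX_BRANCH_SNAPSHOT */

/* Simplified, hypothetical stand-in for struct perf_branch_entry. */
struct branch_entry {
	unsigned long from;
	unsigned long to;
};

/* Same shape as struct perf_branch_snapshot in the patch. */
struct branch_snapshot {
	unsigned int nr;
	struct branch_entry entries[MAX_BRANCH_SNAPSHOT];
};

typedef int (snapshot_fn_t)(struct branch_snapshot *);

/* Default target: touches nothing and returns 0. */
static int snapshot_ret0(struct branch_snapshot *snap)
{
	(void)snap;
	return 0;
}

/* A function pointer stands in for the static call site (assumption). */
static snapshot_fn_t *snapshot_branch_stack = snapshot_ret0;

/* Made-up branch records that a PMU driver would read from hardware. */
static const struct branch_entry fake_lbr[] = {
	{ 0x1000, 0x2000 },
	{ 0x2010, 0x3000 },
	{ 0x3020, 0x1008 },
};

/* Hypothetical PMU-specific implementation: copy records, set the count. */
static int fake_pmu_snapshot(struct branch_snapshot *snap)
{
	unsigned int nr = sizeof(fake_lbr) / sizeof(fake_lbr[0]);

	/* Never copy more entries than the snapshot buffer can hold. */
	assert(nr <= MAX_BRANCH_SNAPSHOT);
	memcpy(snap->entries, fake_lbr, sizeof(fake_lbr));
	snap->nr = nr;
	return 0;
}

/* Retarget the call, loosely analogous to static_call_update(). */
static void snapshot_update(snapshot_fn_t *fn)
{
	snapshot_branch_stack = fn;
}
```

In this sketch a caller would zero-initialize a struct branch_snapshot,
call through snapshot_branch_stack, and treat nr == 0 as "no branch
record available"; whether the real callers use that convention is an
assumption here, not something this patch states.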