From: Alexey Budankov
Subject: [PATCH v5 0/4] perf/core: fix restoring of Intel LBR call stack on a context switch
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andi Kleen, Kan Liang, Stephane Eranian, Ian Rogers, Song Liu, linux-kernel
Organization: Intel Corp.
Date: Wed, 23 Oct 2019 10:05:40 +0300

Restore the Intel LBR call stack from a cloned inactive task perf context on a context switch. This change inherently addresses an inconsistency in the LBR call stack data provided on samples in record profiling mode:

  $ perf record -N -B -T -R --call-graph lbr \
         -e cpu/period=0xcdfe60,event=0x3c,name=\'CPU_CLK_UNHALTED.THREAD\'/Duk \
         --clockid=monotonic_raw -- ./miniFE.x nx 25 ny 25 nz 25

Let's assume threads A, B and C belong to the same process. B and C are siblings of A, and their perf contexts are treated as equivalent. At some point B blocks on a futex (a non-preemptive context switch). B's LBRs are preserved in B's perf context task_ctx_data, and B's events are removed from the PMU and disabled. B's perf context becomes inactive.

Later C gets on a CPU, runs, gets profiled and eventually switches to the awakened but not yet running B. The optimized context switch path is executed, swapping B's and C's task_ctx_data pointers at the perf event contexts. So C's task_ctx_data will refer to B's preserved LBRs on the following switch-in event.

However, since B's perf context is inactive, there are no enabled events in it, and B's task_ctx_data->lbr_callstack_users is equal to 0. When B gets on the CPU, reviving B's events is skipped on the optimized context switch path, and B's task_ctx_data->lbr_callstack_users remains 0. Thus B's LBRs are not restored by the PMU sched_task() code called at the end of the perf context switch-in callback for B.

In the report, this manifests as short fragments of B's call stack, still tracked by the LBR HW between adjacent samples, while the whole thread call tree doesn't aggregate.
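The failure described above can be sketched as a small user-space model. It is not kernel code: model_lbr_data, model_perf_ctx and every model_* function are hypothetical stand-ins for task_ctx_data, a perf event context and the sched_task() restore check, and the restore condition used here (LBRs saved and lbr_callstack_users > 0) is a simplification. The sketch also assumes that after the optimized switch B continues with the context C was using, whose task_ctx_data now points at B's saved data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task LBR save area (not a kernel type). */
struct model_lbr_data {
	int lbr_callstack_users; /* enabled LBR call-stack events counted here */
	bool lbr_saved;          /* LBR contents were saved on switch-out */
};

/* Hypothetical stand-in for a perf event context. */
struct model_perf_ctx {
	struct model_lbr_data *task_ctx_data;
};

/* Optimized switch between equivalent contexts: only the task_ctx_data
 * pointers are exchanged; each counter stays inside its object. */
static void model_swap_task_ctx_data(struct model_perf_ctx *prev,
				     struct model_perf_ctx *next)
{
	struct model_lbr_data *tmp = prev->task_ctx_data;

	prev->task_ctx_data = next->task_ctx_data;
	next->task_ctx_data = tmp;
}

/* Simplified switch-in check: restore saved LBRs only when some
 * enabled event uses the LBR call stack. */
static bool model_lbr_restored(const struct model_perf_ctx *ctx)
{
	const struct model_lbr_data *d = ctx->task_ctx_data;

	return d && d->lbr_saved && d->lbr_callstack_users > 0;
}

/* Replays the A/B/C scenario; returns whether B's LBRs get restored. */
static bool model_scenario(void)
{
	/* B blocked: LBRs saved, events disabled, so the counter is 0. */
	struct model_lbr_data b_data = { .lbr_callstack_users = 0, .lbr_saved = true };
	/* C is running with one enabled LBR call-stack event. */
	struct model_lbr_data c_data = { .lbr_callstack_users = 1, .lbr_saved = false };

	struct model_perf_ctx c_ctx = { .task_ctx_data = &c_data }; /* active */
	struct model_perf_ctx b_ctx = { .task_ctx_data = &b_data }; /* inactive */

	/* C switches to B through the optimized path. */
	model_swap_task_ctx_data(&c_ctx, &b_ctx);

	/* Assumed: B now runs with c_ctx, which points at B's saved data. */
	return model_lbr_restored(&c_ctx);
}
```

In this model model_scenario() returns false: the pointer swap hands B its saved LBRs, but the counter that travels with them is still 0, so the restore is skipped.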
The fix has been evaluated by profiling the miniFE [1] (C++, OpenMP) workload running 64 threads on Intel Skylake EP (64 cores, 2 sockets):

  $ perf report --call-graph callee,flat

  5.3.0-rc6+ (tip perf/core) - fixed

  -   92.66%    82.64%  miniFE.x  libiomp5.so  [.] _INTERNAL_25_______src_kmp_barrier_cpp_1d20fae8::__kmp_hyper_barrier_release
     - 69.14% _INTERNAL_25_______src_kmp_barrier_cpp_1d20fae8::__kmp_hyper_barrier_release
          __kmp_fork_barrier
          __kmp_launch_thread
          _INTERNAL_24_______src_z_Linux_util_c_3e0095e6::__kmp_launch_worker
          start_thread
          __clone
     - 21.89% _INTERNAL_25_______src_kmp_barrier_cpp_1d20fae8::__kmp_hyper_barrier_release
          __kmp_barrier
          __kmpc_reduce_nowait
          miniFE::cg_solve, miniFE::Vector, miniFE::matvec_std, miniFE::Vector, miniFE::Vector, miniFE::matvec_std, miniFE::Vector, miniFE::Vector, miniFE::matvec_std, miniFE::Vector, miniFE::Vector, miniFE::matvec_std, miniFE::Vector

[...] task_ctx_data pointers to the architecture-specific intel_pmu_lbr_swap_task_ctx() implementation;

Changes in v4:
- moved the check on simultaneous availability of task_ctx_data objects to the perf/core layer;
- marked sync_task_ctx() as optional in code comments;
- renamed the params of sync_task_ctx() to prev and next;

Changes in v3:
- replaced assignment with swap in intel_pmu_lbr_sync_task_ctx();

Changes in v2:
- implemented the sync_task_ctx() method for the perf, x86 and intel pmu types;
- employed the method on the optimized context switch path between equivalent perf event contexts;

-- 
2.20.1
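The changelog above describes moving the task_ctx_data pointer swap behind an optional PMU method that can also synchronize architecture-specific state. A minimal sketch of that shape, as a hypothetical user-space model rather than kernel code: model_pmu.swap_task_ctx stands in for the optional callback, model_lbr_swap_task_ctx() plays the role of intel_pmu_lbr_swap_task_ctx(), and the restore check is the same simplification of sched_task(). The synchronization step shown here (swapping lbr_callstack_users back after the pointers are exchanged, only when both objects exist) is an assumption about what the Intel hook does, not a copy of it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel types. */
struct model_lbr_data {
	int lbr_callstack_users; /* enabled LBR call-stack events */
	bool lbr_saved;          /* LBR contents saved on switch-out */
};

struct model_perf_ctx {
	struct model_lbr_data *task_ctx_data;
};

/* PMU descriptor with an optional hook; NULL means "not implemented". */
struct model_pmu {
	void (*swap_task_ctx)(struct model_perf_ctx *prev,
			      struct model_perf_ctx *next);
};

static void model_swap_ptrs(struct model_perf_ctx *prev,
			    struct model_perf_ctx *next)
{
	struct model_lbr_data *tmp = prev->task_ctx_data;

	prev->task_ctx_data = next->task_ctx_data;
	next->task_ctx_data = tmp;
}

/* Assumed behaviour of the architecture hook: exchange the data pointers,
 * then swap the counters back so each counter stays with the context
 * whose enabled events it counts. */
static void model_lbr_swap_task_ctx(struct model_perf_ctx *prev,
				    struct model_perf_ctx *next)
{
	struct model_lbr_data *prev_data, *next_data;
	int tmp;

	model_swap_ptrs(prev, next);

	/* After the exchange, each object sits in the other context. */
	prev_data = next->task_ctx_data;
	next_data = prev->task_ctx_data;

	/* Synchronize only when both objects are allocated. */
	if (!prev_data || !next_data)
		return;

	tmp = prev_data->lbr_callstack_users;
	prev_data->lbr_callstack_users = next_data->lbr_callstack_users;
	next_data->lbr_callstack_users = tmp;
}

/* Optimized switch between equivalent contexts: use the PMU hook if
 * present, otherwise fall back to a plain pointer swap. */
static void model_ctx_switch(const struct model_pmu *pmu,
			     struct model_perf_ctx *prev,
			     struct model_perf_ctx *next)
{
	if (pmu->swap_task_ctx)
		pmu->swap_task_ctx(prev, next);
	else
		model_swap_ptrs(prev, next);
}

/* Simplified switch-in check: restore only if LBRs were saved and some
 * enabled event uses the LBR call stack. */
static bool model_lbr_restored(const struct model_perf_ctx *ctx)
{
	const struct model_lbr_data *d = ctx->task_ctx_data;

	return d && d->lbr_saved && d->lbr_callstack_users > 0;
}

/* The A/B/C scenario from the description: C (active) switches to B
 * (inactive); B is assumed to continue with c_ctx afterwards. */
static bool model_scenario(const struct model_pmu *pmu)
{
	struct model_lbr_data b_data = { .lbr_callstack_users = 0, .lbr_saved = true };
	struct model_lbr_data c_data = { .lbr_callstack_users = 1, .lbr_saved = false };
	struct model_perf_ctx c_ctx = { .task_ctx_data = &c_data };
	struct model_perf_ctx b_ctx = { .task_ctx_data = &b_data };

	model_ctx_switch(pmu, &c_ctx, &b_ctx);
	return model_lbr_restored(&c_ctx);
}
```

With a model_pmu whose swap_task_ctx points at model_lbr_swap_task_ctx, model_scenario() returns true; with swap_task_ctx left NULL it falls back to the plain pointer swap and returns false. Keeping the hook optional lets PMUs without LBR state keep the generic path unchanged.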