From: "Liang, Kan"
To: Peter Zijlstra
CC: "mingo@redhat.com", "linux-kernel@vger.kernel.org", "acme@kernel.org", "jolsa@redhat.com", "tglx@linutronix.de", "eranian@google.com", "ak@linux.intel.com"
Subject: RE: [PATCH V5] perf: Add PERF_SAMPLE_PHYS_ADDR
Date: Tue, 22 Aug 2017 17:58:34 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F0775378A2C0@SHSMSX103.ccr.corp.intel.com>
References: <1502993843-6837-1-git-send-email-kan.liang@intel.com> <20170822165638.GH32112@worktop.programming.kicks-ass.net>
In-Reply-To: <20170822165638.GH32112@worktop.programming.kicks-ass.net>

>
> On Thu, Aug 17, 2017 at 02:17:23PM -0400, kan.liang@intel.com wrote:
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index a3b873f..6783c69 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -944,6 +944,8 @@ struct perf_sample_data {
> >
> >  	struct perf_regs		regs_intr;
> >  	u64				stack_user_size;
> > +
> > +	u64				phys_addr;
> >  } ____cacheline_aligned;
> >
> >  /* default value for data source */
> > @@ -964,6 +966,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
> >  	data->weight = 0;
> >  	data->data_src.val = PERF_MEM_NA;
> >  	data->txn = 0;
> > +	data->phys_addr = 0;
> >  }
>
> So this is very unfortunate...
>
> struct perf_sample_data {
>         u64                        addr;                 /*     0     8 */
>         struct perf_raw_record *   raw;                  /*     8     8 */
>         struct perf_branch_stack * br_stack;             /*    16     8 */
>         u64                        period;               /*    24     8 */
>         u64                        weight;               /*    32     8 */
>         u64                        txn;                  /*    40     8 */
>         union perf_mem_data_src    data_src;             /*    48     8 */
>         u64                        type;                 /*    56     8 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         u64                        ip;                   /*    64     8 */
>         struct {
>                 u32                pid;                  /*    72     4 */
>                 u32                tid;                  /*    76     4 */
>         } tid_entry;                                     /*    72     8 */
>         u64                        time;                 /*    80     8 */
>         u64                        id;                   /*    88     8 */
>         u64                        stream_id;            /*    96     8 */
>         struct {
>                 u32                cpu;                  /*   104     4 */
>                 u32                reserved;             /*   108     4 */
>         } cpu_entry;                                     /*   104     8 */
>         struct perf_callchain_entry * callchain;         /*   112     8 */
>         struct perf_regs           regs_user;            /*   120    16 */
>         /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
>         struct pt_regs             regs_user_copy;       /*   136   168 */
>         /* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
>         struct perf_regs           regs_intr;            /*   304    16 */
>         /* --- cacheline 5 boundary (320 bytes) --- */
>         u64                        stack_user_size;      /*   320     8 */
>
>         /* size: 384, cachelines: 6, members: 19 */
>         /* padding: 56 */
> };
>
>
> static inline void perf_sample_data_init(struct perf_sample_data *data,
>                                          u64 addr, u64 period)
> {
>         /* remaining struct members initialized in perf_prepare_sample() */
>         data->addr = addr;
>         data->raw  = NULL;
>         data->br_stack = NULL;
>         data->period = period;
>         data->weight = 0;
>         data->data_src.val = PERF_MEM_NA;
>         data->txn = 0;
> }
>
> You'll note that that only touches the first cacheline of the data structure,
> and you just wrecked that. Back when I did that this made a measurable
> difference.

It looks like there is still room for one more field in the first cacheline.
Could I use it?

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b14095b..bcd1007 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -915,6 +915,7 @@ struct perf_sample_data {
 	u64				weight;
 	u64				txn;
 	union  perf_mem_data_src	data_src;
+	u64				phys_addr;
 
 	/*
 	 * The other fields, optionally {set,used} by
@@ -964,6 +966,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->weight = 0;
 	data->data_src.val = PERF_MEM_NA;
 	data->txn = 0;
+	data->phys_addr = 0;
 }

Thanks,
Kan
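
For illustration only, the layout question above can be checked outside the kernel with a small userspace sketch. It is not kernel code: the `_sketch` types and function names are hypothetical stand-ins, the zero written to data_src stands in for PERF_MEM_NA, and it assumes an LP64 target with 64-byte cachelines. It mirrors the leading members of struct perf_sample_data with phys_addr placed after data_src, and asserts at compile time that every field written by the init function ends within the first 64 bytes.

```c
/* Userspace sketch only: stand-in types, not the kernel's definitions. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64	/* assumed cacheline size in bytes */

typedef uint64_t u64;

/* Stand-in for union perf_mem_data_src (8 bytes in the pahole output). */
union mem_data_src_sketch {
	u64 val;
};

/* Leading members of struct perf_sample_data, phys_addr after data_src. */
struct sample_data_sketch {
	_Alignas(CACHELINE) u64 addr;
	void *raw;		/* struct perf_raw_record * in the kernel */
	void *br_stack;		/* struct perf_branch_stack * in the kernel */
	u64 period;
	u64 weight;
	u64 txn;
	union mem_data_src_sketch data_src;
	u64 phys_addr;		/* proposed position */
	u64 type;		/* set later, in perf_prepare_sample() */
};

/* Offset one past the last byte written by the init function below. */
size_t init_fields_end(void)
{
	return offsetof(struct sample_data_sketch, phys_addr) + sizeof(u64);
}

/* Compile-time check: the init-time fields stay in the first cacheline. */
_Static_assert(offsetof(struct sample_data_sketch, phys_addr) + sizeof(u64)
	       <= CACHELINE,
	       "init-time fields spill past the first cacheline");

/* Mirrors perf_sample_data_init() with the extra phys_addr store. */
void sample_data_init_sketch(struct sample_data_sketch *data,
			     u64 addr, u64 period)
{
	data->addr = addr;
	data->raw = NULL;
	data->br_stack = NULL;
	data->period = period;
	data->weight = 0;
	data->data_src.val = 0;	/* the kernel stores PERF_MEM_NA here */
	data->txn = 0;
	data->phys_addr = 0;
}
```

With this ordering, type is the member that moves to the start of the second cacheline; it is not written by the init function, so the init path still touches only the first 64 bytes under the stated assumptions.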