Subject: Re: [PATCH v2 12/15] perf record: introduce thread local variable for trace streaming
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin, Adrian Hunter, Andi Kleen, Peter Zijlstra, Ingo Molnar, linux-kernel
References:
 <1ec29ed6-0047-d22f-630b-a7f5ccee96b4@linux.intel.com>
 <20201024154357.GD2589351@krava>
 <6eb97205-4d13-6487-8e15-a85f63d3f0cc@gmail.com>
 <20201026103426.GC2726983@krava>
 <78ca09c2-50da-3206-2dff-19523699d82b@gmail.com>
 <20201027120130.GD2900849@krava>
From: Alexey Budankov
Organization: Intel Corp.
Message-ID: <570923a3-b4bf-6129-f470-0717e434a498@linux.intel.com>
Date: Tue, 27 Oct 2020 18:58:41 +0300
In-Reply-To: <20201027120130.GD2900849@krava>

On 27.10.2020 15:01, Jiri Olsa wrote:
> On Mon, Oct 26, 2020 at 05:11:30PM +0300, Alexei Budankov wrote:
>>
>> On 26.10.2020 13:34, Jiri Olsa wrote:
>>> On Mon, Oct 26, 2020 at 11:21:28AM +0300, Alexei Budankov wrote:
>>>>
>>>> On 24.10.2020 18:43, Jiri Olsa wrote:
>>>>> On Wed, Oct 21, 2020 at 07:07:00PM +0300, Alexey Budankov wrote:
>>>>>>
>>>>>> Introduce thread local variable and use it for threaded trace streaming.
>>>>>>
>>>>>> Signed-off-by: Alexey Budankov
>>>>>> ---
>>>>>>  tools/perf/builtin-record.c | 71 ++++++++++++++++++++++++++++++++-----
>>>>>>  1 file changed, 62 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>>>>> index 89cb8e913fb3..3b7e9026f25b 100644
>>>>>> --- a/tools/perf/builtin-record.c
>>>>>> +++ b/tools/perf/builtin-record.c
>>>>>> @@ -101,6 +101,8 @@ struct thread_data {
>>>>>>  	u64 bytes_written;
>>>>>>  };
>>>>>>
>>>>>> +static __thread struct thread_data *thread;
>>>>>> +
>>>>>>  struct record {
>>>>>>  	struct perf_tool tool;
>>>>>>  	struct record_opts opts;
>>>>>> @@ -587,7 +589,11 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size)
>>>>>>  		}
>>>>>>  	}
>>>>>>
>>>>>> -	rec->samples++;
>>>>>> +	if (thread)
>>>>>> +		thread->samples++;
>>>>>> +	else
>>>>>> +		rec->samples++;
>>>>>
>>>>> this is really wrong, let's keep just single samples counter
>>>>> ditto for all the other places in this patch
>>>>
>>>> This does look like data parallelism [1] which is very true for
>>>> threaded trace streaming so your prototype design looks optimal.
>>>>
>>>> For this specific place incrementing global counter in memory is
>>>> less performant and faces scalability limitations as a number of
>>>> cores grow.
>>>>
>>>> Not sure why you have changed your mind.
>>>
>>> I'm not sure I follow.. what I'm complaining about is to have
>>> 'samples' stat variable in separate locations for --threads
>>> and --no-threads mode
>>
>> It is optimal to have samples variable as per thread one
>> and then sum up the total in the end of data collection.
>>
>> Single global variable design has scalability and performance
>> drawbacks.
>>
>> Why do you complain about per thread variable in this case?
>> It looks like ideally fits these specific needs.
>
> I think there's misunderstanding.. I think we should move
> samples to per thread 'thread' object and have just one
> copy of that..
> and do not increase separate variables for
> thread and non-thread cases

Ah, I see: the main thread would use the same __thread object in both
the serial and the threaded modes. That makes sense. I will try it in v3.

Alexei
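The approach agreed on above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the actual v3 code: `struct thread_data` is trimmed down to two counters, and `push_chunk()`, `stream()` and `record_sketch()` are hypothetical stand-ins for record__pushfn() and the streaming loop. The point it shows is that the main thread binds the same `__thread` pointer in serial mode, so the hot path increments `thread->samples` unconditionally, and the per-thread totals are summed once after the workers are joined.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for perf's struct thread_data. */
struct thread_data {
	uint64_t samples;
	uint64_t bytes_written;
	pthread_t tid;
};

/* Each streaming thread (the main thread included) points this at its own slot. */
static __thread struct thread_data *thread;

/* Hypothetical stand-in for record__pushfn(): a single update path,
 * with no "if (thread) ... else rec->samples++" branch. */
static void push_chunk(size_t size)
{
	assert(thread != NULL); /* every streaming thread must bind first */
	thread->samples++;
	thread->bytes_written += size;
}

/* Hypothetical streaming loop: bind the TLS pointer, then push
 * a fixed number of 64-byte chunks. */
static void *stream(void *arg)
{
	thread = arg;
	for (int i = 0; i < 1000; i++)
		push_chunk(64);
	return NULL;
}

/* nr_threads == 0 models --no-threads: the main thread streams into
 * data[0] through the very same code path. Otherwise worker i gets
 * data[i], and the per-thread counters are summed once, at the end. */
static uint64_t record_sketch(int nr_threads, struct thread_data *data)
{
	uint64_t total = 0;
	int started;

	if (nr_threads == 0) {
		stream(&data[0]);
		return data[0].samples;
	}

	for (started = 0; started < nr_threads; started++)
		if (pthread_create(&data[started].tid, NULL, stream,
				   &data[started]) != 0)
			break; /* only threads that actually started are joined */

	for (int i = 0; i < started; i++) {
		pthread_join(data[i].tid, NULL);
		total += data[i].samples;
	}
	return total;
}
```

Calling `record_sketch(0, data)` models --no-threads and `record_sketch(4, data)` models four streaming threads; both go through the identical `push_chunk()` path. No shared counter is written on every sample; a real implementation would probably also pad or allocate each thread_data separately so that neighbouring slots do not end up false-sharing a cache line.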