Subject: Re: [PATCH v5 00/20] Introduce threaded trace streaming for basic perf record operation
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Hunter, Alexander Antonov, Alexei Budankov
From: "Bayduraev, Alexey V"
Organization: Intel Corporation
Message-ID:
 <9f178dde-751f-9ac9-f5a0-fd1bfba3ca32@linux.intel.com>
Date: Thu, 6 May 2021 15:43:55 +0300
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 06.05.2021 9:20, Namhyung Kim wrote:
> Hello,
>
> On Tue, May 4, 2021 at 12:05 AM Alexey Bayduraev
> wrote:
>>
>>
>> Basic analysis of data directories is provided in perf report mode.
>> Raw dump and aggregated reports are available for data directories,
>> still with no memory consumption optimizations.
>
> Do you have an idea how to improve it?
>
> I have to say again that I don't like merely adding more threads to
> record. Yeah, parallelizing the perf record is good, but we have to
> think about the perf report (and others) too.

There is your idea about separating tracking records and processing them
first, but those changes could be much larger than my patch. I think they
look like an independent patch and could be introduced as an extension of
parallel data loading.

I have also thought about and experimented with intermediate flushing of
the ordered queue. This is simple for per-CPU data files (sorted by time),
but it is not clear how to do it for arbitrary CPU masks.

I think my patch can be the first step toward introducing parallel mode in
the perf tool. It just extends perf-record (already used in our VTune tool)
and allows loading parallel data in experimental mode. Later patches could
optimize and extend parallel data loading.

Regards,
Alexey

>
> Thanks,
> Namhyung
>
>
>>
>> Tested:
>>
>> tools/perf/perf record -o prof.data --threads -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads= -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=cpu -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=core -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=socket -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=numa -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data -C 2,5 --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data -C 3,4 --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data -C 0,4,2,6 --threads=core -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data -C 0,4,2,6 --threads=numa -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads -g --call-graph dwarf,4096 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads -g --call-graph dwarf,4096 --compression-level=3 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads -a
>> tools/perf/perf record -D -1 -e cpu-cycles -a --control fd:10,11 -- sleep 30
>> tools/perf/perf record --threads -D -1 -e cpu-cycles -a --control fd:10,11 -- sleep 30
>>
>> tools/perf/perf report -i prof.data
>> tools/perf/perf report -i prof.data --call-graph=callee
>> tools/perf/perf report -i prof.data --stdio --header
>> tools/perf/perf report -i prof.data -D --header
>>
>> [1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
>> [2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/
>>
>> ---
>>
>> Alexey Bayduraev (20):
>>   perf record: introduce thread affinity and mmap masks
>>   perf record: introduce thread specific data array
>>   perf record: introduce thread local variable
>>   perf record: stop threads in the end of trace streaming
>>   perf record: start threads in the beginning of trace streaming
>>   perf record: introduce data file at mmap buffer object
>>   perf record: introduce data transferred and compressed stats
>>   perf record: init data file at mmap buffer object
>>   tools lib: introduce bitmap_intersects() operation
>>   perf record: introduce --threads= command line option
>>   perf record: document parallel data streaming mode
>>   perf report: output data file name in raw trace dump
>>   perf session: move reader structure to the top
>>   perf session: introduce reader_state in reader object
>>   perf session: introduce reader objects in session object
>>   perf session: introduce decompressor into trace reader object
>>   perf session: move init into reader__init function
>>   perf session: move map/unmap into reader__mmap function
>>   perf session: load single file for analysis
>>   perf session: load data directory files for analysis
>>
>>  tools/include/linux/bitmap.h             |   11 +
>>  tools/lib/api/fd/array.c                 |   17 +
>>  tools/lib/api/fd/array.h                 |    1 +
>>  tools/lib/bitmap.c                       |   14 +
>>  tools/perf/Documentation/perf-record.txt |   30 +
>>  tools/perf/builtin-inject.c              |    3 +-
>>  tools/perf/builtin-record.c              | 1066 ++++++++++++++++++++--
>>  tools/perf/util/evlist.c                 |   16 +
>>  tools/perf/util/evlist.h                 |    1 +
>>  tools/perf/util/mmap.c                   |    6 +
>>  tools/perf/util/mmap.h                   |    6 +
>>  tools/perf/util/ordered-events.h         |    1 +
>>  tools/perf/util/record.h                 |    2 +
>>  tools/perf/util/session.c                |  491 +++++++---
>>  tools/perf/util/session.h                |    5 +
>>  tools/perf/util/tool.h                   |    3 +-
>>  16 files changed, 1474 insertions(+), 199 deletions(-)
>>
>> --
>> 2.19.0
>>