To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Hunter, Alexei Budankov, Alexander Antonov
From: "Bayduraev, Alexey V"
Subject: [PATCH v4 00/12] Introduce threaded trace streaming for basic perf record operation
Organization: Intel Corporation
Message-ID:
 <6c15adcb-6a9d-320e-70b5-957c4c8b6ff2@linux.intel.com>
Date: Tue, 6 Apr 2021 11:37:26 +0300
X-Mailing-List: linux-kernel@vger.kernel.org

Changes in v4:
- renamed 'comm' structure to 'pipes'
- moved thread fd/maps messages to verbose=2
- fixed leaks during allocation of thread_data structures
- fixed leaks during allocation of thread masks
- fixed possible failures when releasing thread masks

v3: https://lore.kernel.org/lkml/7d197a2d-56e2-896d-bf96-6de0a4db1fb8@linux.intel.com/

Changes in v3:
- skipped redundant patch 3/15
- applied "data file" and "data directory" terms all over the patch set
- captured Acked-by: tags by Namhyung Kim
- removed braces where they are not needed
- employed thread local variable for serial trace streaming
- added specs for --threads option - core, socket, numa and user defined
- added parallel loading of data directory files similar to the prototype [1]

v2: https://lore.kernel.org/lkml/1ec29ed6-0047-d22f-630b-a7f5ccee96b4@linux.intel.com/

Changes in v2:
- explicitly added credit tags to patches 6/15 and 15/15, in addition to cites [1], [2]
- updated description of 3/15 to explicitly mention the reason to open data directories in read access mode (e.g. for perf report)
- implemented fix for compilation error of 2/15
- explicitly elaborated on found issues to be resolved for threaded AUX trace capture

v1: https://lore.kernel.org/lkml/810f3a69-0004-9dff-a911-b7ff97220ae0@linux.intel.com/

The patch set provides a parallel threaded trace streaming mode for basic perf record operation. The mode mitigates profiling data losses and resolves scalability issues of the serial and asynchronous (--aio) trace streaming modes on multicore server systems. The design and implementation are based on the prototype [1], [2].

Parallel threaded mode executes trace streaming threads that read kernel data buffers and write captured data into several data files located in the data directory. The layout of trace streaming threads and their mapping to the data buffers they read can be configured using the value of the --threads command line option.

The specification value provides masks separated by colons. Each entry holds the mask of cpus to be monitored by one thread and that thread's affinity mask, separated by a slash. <cpus mask 1>/<affinity mask 1>:<cpus mask 2>/<affinity mask 2> specifies a parallel threads layout that consists of two threads with the corresponding assigned cpus to be monitored. The specification value can also be a string, e.g. "cpu", "core" or "socket", meaning creation of a data streaming thread for monitoring every cpu, whole core or socket. The option provided with no or empty value defaults to the "cpu" layout, creating a data streaming thread for every cpu being monitored. Specification masks are filtered by the mask provided via the -C option.

Parallel streaming mode is compatible with Zstd compression/decompression (--compression-level) and external control commands (--control). The mode is not enabled for pipe mode, nor for AUX area tracing and related or derived modes like --snapshot or --aux-sample. The --switch-output-* and --timestamp-filename options are not enabled for parallel streaming either. The initial attempt to enable AUX area tracing ran into the need to define an optimal way to store index data in the data directory, and the use cases for --switch-output-* and --timestamp-filename are not clear for data directories. Asynchronous (--aio) trace streaming and affinity (--affinity) modes are mutually exclusive with parallel streaming mode.

Basic analysis of data directories is provided in perf report mode. Raw dump and aggregated reports are available for data directories, still with no memory consumption optimizations.

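To make the user-defined layout format concrete, here is a minimal Python sketch of how a value such as 0-3/3:4-7/4 could be split into per-thread masks. It is an illustration only, not the code from this patch set (which is C in tools/perf/builtin-record.c). The function names parse_cpu_list and parse_thread_spec are hypothetical, and the sketch assumes that the -C filter applies only to the monitored-cpu masks; the cover letter does not say how affinity masks are treated.

```python
def parse_cpu_list(text):
    """Expand a cpu list such as "0-3,6" into a sorted list of cpu numbers."""
    cpus = set()
    for part in text.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def parse_thread_spec(spec, monitored=None):
    """Split "<cpus>/<affinity>:<cpus>/<affinity>..." into per-thread entries.

    monitored is an optional list of cpus (as given via -C). ASSUMPTION:
    only the monitored-cpu mask of each thread is filtered by it; the
    affinity mask is left untouched.
    """
    threads = []
    for entry in spec.split(":"):          # one entry per streaming thread
        cpus_text, affinity_text = entry.split("/", 1)
        cpus = parse_cpu_list(cpus_text)
        if monitored is not None:
            cpus = [cpu for cpu in cpus if cpu in monitored]
        threads.append({"cpus": cpus, "affinity": parse_cpu_list(affinity_text)})
    return threads


# Two threads: the first reads buffers of cpus 0-3 and runs on cpu 3,
# the second reads buffers of cpus 4-7 and runs on cpu 4.
layout = parse_thread_spec("0-3/3:4-7/4")

# The same spec combined with "-C 2,5", as in one of the tested commands.
filtered = parse_thread_spec("0-3/3:4-7/4", monitored=parse_cpu_list("2,5"))

print(layout)
print(filtered)
```

Under these assumptions, the filtered layout leaves the first thread with cpu 2 and the second with cpu 5 to monitor. Whether an entry whose monitored mask becomes empty is rejected or silently dropped is a policy decision the sketch leaves open.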
Tested:

tools/perf/perf record -o prof.data --threads -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data --threads= -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data --threads=cpu -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data --threads=core -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data --threads=socket -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data --threads=numa -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data -C 2,5 --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data -C 3,4 --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data -C 0,4,2,6 --threads=core -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data -C 0,4,2,6 --threads=numa -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data --threads -g --call-graph dwarf,4096 -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data --threads -g --call-graph dwarf,4096 --compression-level=3 -- matrix.gcc.g.O3
tools/perf/perf record -o prof.data --threads -a
tools/perf/perf record -D -1 -e cpu-cycles -a --control fd:10,11 -- sleep 30
tools/perf/perf record --threads -D -1 -e cpu-cycles -a --control fd:10,11 -- sleep 30

tools/perf/perf report -i prof.data
tools/perf/perf report -i prof.data --call-graph=callee
tools/perf/perf report -i prof.data --stdio --header
tools/perf/perf report -i prof.data -D --header

[1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
[2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/

---

Alexey Bayduraev (12):
  perf record: introduce thread affinity and mmap masks
  perf record: introduce thread specific data array
  perf record: introduce thread local variable
  perf record: stop threads in the end of trace streaming
  perf record: start threads in the beginning of trace streaming
  perf record: introduce data file at mmap buffer object
  perf record: init data file at mmap buffer object
  perf record: introduce --threads= command line option
  perf record: document parallel data streaming mode
  perf report: output data file name in raw trace dump
  perf session: load data directory files for analysis
  perf session: use reader functions to load perf data file

 tools/include/linux/bitmap.h             |   11 +
 tools/lib/api/fd/array.c                 |   17 +
 tools/lib/api/fd/array.h                 |    1 +
 tools/lib/bitmap.c                       |   14 +
 tools/perf/Documentation/perf-record.txt |   18 +
 tools/perf/builtin-inject.c              |    3 +-
 tools/perf/builtin-record.c              | 1027 ++++++++++++++++++++--
 tools/perf/util/evlist.c                 |   16 +
 tools/perf/util/evlist.h                 |    1 +
 tools/perf/util/mmap.c                   |    6 +
 tools/perf/util/mmap.h                   |    6 +
 tools/perf/util/ordered-events.h         |    1 +
 tools/perf/util/record.h                 |    2 +
 tools/perf/util/session.c                |  484 +++++++---
 tools/perf/util/session.h                |    5 +
 tools/perf/util/tool.h                   |    3 +-
 16 files changed, 1407 insertions(+), 208 deletions(-)

--
2.19.0