Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp1216352ybl; Tue, 3 Dec 2019 03:42:29 -0800 (PST) X-Google-Smtp-Source: APXvYqw/yhkmmznI7HoGlLR+6bswSlCQafAWFKNZ28JGEEts38S3FzzXqUMBRdtjDjkXDtDL/UDT X-Received: by 2002:aca:3846:: with SMTP id f67mr3076063oia.61.1575373349680; Tue, 03 Dec 2019 03:42:29 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1575373349; cv=none; d=google.com; s=arc-20160816; b=Db5Nv7PALF/sImPSLi5+9oaHw9x/Fh+p2gCuqGE2/XXgw7TbeXghxLeGdUAar2ZKB1 M0DhYtPpnzWbkYI/mF2d3LtsC+mFjtTfjaATL3KoBOa9waxjfY0VIBglRvgjq1Bb/K+q o8tILoooFWap9hz/4FbkxFE1lFDSGIhFAFcGkdquxoE26MMsLX5T3xI5RodhgWeUgz7v Abv8MXGYvp2y95/FrY6LhmzX6OnXL2zLHZmYAmQ8q1AO8/qj1X/fTLxh4qxDed5N0Rhw qmaF/1LsugULTtXcdGxGZOJ5IiU9KsxKReJOoRzI6FWazQy1kP28dRJUz0gfSX9ijsOc xh5w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:mime-version:user-agent:date:message-id :organization:subject:from:cc:to; bh=dSUaGt1CrXPdxrEsopgY0gK+kJ7rUmpkbbrJt0irWiw=; b=EwghLKWB3jNF/wfHRHXK4/6DVOHmsw21r1ASCFANcBNefEVU2njcx/1sSBYRx2AunT L2mY0Pmp5zce+6N32qF1VEHgDspeNtBjy5nB9X+m0R1cgosA4TUYTSO+WkZh+cYjvRcS ZpwVflC5+RM2Qqnr3kIqdPRIKE8qq9n2NuM97B9JLr6seB/B/JK2Z999LvJYoIS0C8OL EmkUFWOuHJc8BRtaFj+CBGBdAakTDLcKXT4YxccevzFe8ngDshAr/ac6bVCTjMeJmYXz zIRPmVVbotjZNoB5hOI9nCDi5T6qzyQn5lDJMUM2fverGKTGz7sSjxnGZI8/TaW6xkma 6LPA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f16si1132021otp.198.2019.12.03.03.42.17; Tue, 03 Dec 2019 03:42:29 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726086AbfLCLlf (ORCPT + 99 others); Tue, 3 Dec 2019 06:41:35 -0500 Received: from mga06.intel.com ([134.134.136.31]:22758 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725773AbfLCLle (ORCPT ); Tue, 3 Dec 2019 06:41:34 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Dec 2019 03:41:33 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,273,1571727600"; d="scan'208";a="208456419" Received: from linux.intel.com ([10.54.29.200]) by fmsmga007.fm.intel.com with ESMTP; 03 Dec 2019 03:41:32 -0800 Received: from [10.125.253.16] (abudanko-mobl.ccr.corp.intel.com [10.125.253.16]) by linux.intel.com (Postfix) with ESMTP id 477C658049B; Tue, 3 Dec 2019 03:41:30 -0800 (PST) To: Arnaldo Carvalho de Melo Cc: Jiri Olsa , Namhyung Kim , Alexander Shishkin , Peter Zijlstra , Ingo Molnar , Andi Kleen , linux-kernel From: Alexey Budankov Subject: [PATCH v5 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K Organization: Intel Corp. Message-ID: Date: Tue, 3 Dec 2019 14:41:29 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Current implementation of cpu_set_t type by glibc has internal cpu mask size limitation of no more than 1024 CPUs. This limitation confines NUMA awareness of Perf tool in record mode, thru --affinity option, to the first 1024 CPUs on machines with larger amount of CPUs. This patch set enables Perf tool to overcome 1024 CPUs limitation by using a dedicated struct mmap_cpu_mask type and applying tool's bitmap API operations to manipulate affinity masks of the tool's thread and the mmaped data buffers. tools bitmap API has been extended with bitmap_free() function and bitmap_equal() operation whose implementation is derived from the kernel one. --- Changes in v5: - avoided allocation of mmap affinity masks in case of rec->opts.affinity == PERF_AFFINITY_SYS Changes in v4: - renamed perf_mmap__print_cpu_mask() to mmap_cpu_mask__scnprintf() - avoided checking mask bits for NULL prior calling bitmask_free() - avoided thread affinity mask allocation for case of rec->opts.affinity == PERF_AFFINITY_SYS Changes in v3: - implemented perf_mmap__print_cpu_mask() function - use perf_mmap__print_cpu_mask() to log thread and mmap cpus masks when verbose level is equal to 2 Changes in v2: - implemented bitmap_free() for symmetry with bitmap_alloc() - capitalized MMAP_CPU_MASK_BYTES() macro - returned -1 from perf_mmap__setup_affinity_mask() - implemented releasing of masks using bitmap_free() - moved debug printing under -vv option --- Alexey Budankov (3): tools bitmap: implement bitmap_equal() operation at bitmap API perf mmap: declare type for cpu mask of arbitrary length perf record: adapt affinity to machines with #CPUs > 1K tools/include/linux/bitmap.h | 30 +++++++++++++++++++++++++++ tools/lib/bitmap.c | 15 ++++++++++++++ tools/perf/builtin-record.c | 28 +++++++++++++++++++------ tools/perf/util/mmap.c | 40 ++++++++++++++++++++++++++++++------ tools/perf/util/mmap.h | 13 +++++++++++- 5 files changed, 113 insertions(+), 13 deletions(-) --- Validation: # tools/perf/perf record -vv -- ls Using CPUID GenuineIntel-6-5E-3 intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch nr_cblocks: 0 affinity: SYS mmap flush: 1 comp level: 0 ------------------------------------------------------------ perf_event_attr: size 120 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ sys_perf_event_open: pid 23718 cpu 0 group_fd -1 flags 0x8 = 4 sys_perf_event_open: pid 23718 cpu 1 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid 23718 cpu 2 group_fd -1 flags 0x8 = 6 sys_perf_event_open: pid 23718 cpu 3 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid 23718 cpu 4 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid 23718 cpu 5 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid 23718 cpu 6 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid 23718 cpu 7 group_fd -1 flags 0x8 = 13 mmap size 528384B 0x7f3e06e060b8: mmap mask[8]: 0x7f3e06e16180: mmap mask[8]: 0x7f3e06e26248: mmap mask[8]: 0x7f3e06e36310: mmap mask[8]: 0x7f3e06e463d8: mmap mask[8]: 0x7f3e06e564a0: mmap mask[8]: 0x7f3e06e66568: mmap mask[8]: 0x7f3e06e76630: mmap mask[8]: ------------------------------------------------------------ perf_event_attr: type 1 size 120 config 0x9 watermark 1 sample_id_all 1 bpf_event 1 { wakeup_events, wakeup_watermark } 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 19 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 21 mmap size 528384B 0x7f3e0697d0b8: mmap mask[8]: 0x7f3e0698d180: mmap mask[8]: 0x7f3e0699d248: mmap mask[8]: 0x7f3e069ad310: mmap mask[8]: 0x7f3e069bd3d8: mmap mask[8]: 0x7f3e069cd4a0: mmap mask[8]: 0x7f3e069dd568: mmap mask[8]: 0x7f3e069ed630: mmap mask[8]: Synthesizing TSC conversion information arch copy Documentation init kernel MAINTAINERS modules.builtin.modinfo perf.data scripts System.map vmlinux block COPYING drivers ipc lbuild Makefile modules.order perf.data.old security tools vmlinux.o certs CREDITS fs Kbuild lib mm Module.symvers README sound usr config-5.2.7-100.fc29.x86_64 crypto include Kconfig LICENSES modules.builtin net samples stdio virt [ perf record: Woken up 1 times to write data ] Looking at the vmlinux_path (8 entries long) Using vmlinux for symbols [ perf record: Captured and wrote 0.013 MB perf.data (8 samples) ] tools/perf/perf record -vv --affinity=cpu -- ls thread mask[8]: empty Using CPUID GenuineIntel-6-5E-3 intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch nr_cblocks: 0 affinity: CPU mmap flush: 1 comp level: 0 ------------------------------------------------------------ perf_event_attr: size 120 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ sys_perf_event_open: pid 23713 cpu 0 group_fd -1 flags 0x8 = 4 sys_perf_event_open: pid 23713 cpu 1 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid 23713 cpu 2 group_fd -1 flags 0x8 = 6 sys_perf_event_open: pid 23713 cpu 3 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid 23713 cpu 4 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid 23713 cpu 5 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid 23713 cpu 6 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid 23713 cpu 7 group_fd -1 flags 0x8 = 13 mmap size 528384B 0x7f3e005bc0b8: mmap mask[8]: 0 0x7f3e005cc180: mmap mask[8]: 1 0x7f3e005dc248: mmap mask[8]: 2 0x7f3e005ec310: mmap mask[8]: 3 0x7f3e005fc3d8: mmap mask[8]: 4 0x7f3e0060c4a0: mmap mask[8]: 5 0x7f3e0061c568: mmap mask[8]: 6 0x7f3e0062c630: mmap mask[8]: 7 ------------------------------------------------------------ perf_event_attr: type 1 size 120 config 0x9 watermark 1 sample_id_all 1 bpf_event 1 { wakeup_events, wakeup_watermark } 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 19 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 21 mmap size 528384B 0x7f3e001330b8: mmap mask[8]: 0x7f3e00143180: mmap mask[8]: 0x7f3e00153248: mmap mask[8]: 0x7f3e00163310: mmap mask[8]: 0x7f3e001733d8: mmap mask[8]: 0x7f3e001834a0: mmap mask[8]: 0x7f3e00193568: mmap mask[8]: 0x7f3e001a3630: mmap mask[8]: Synthesizing TSC conversion information 0x9c9ff0: thread mask[8]: 0 0x9c9ff0: thread mask[8]: 1 0x9c9ff0: thread mask[8]: 2 0x9c9ff0: thread mask[8]: 3 0x9c9ff0: thread mask[8]: 4 arch copy Documentation init kernel MAINTAINERS modules.builtin.modinfo perf.data scripts System.map vmlinux block COPYING drivers ipc lbuild Makefile modules.order perf.data.old security tools vmlinux.o certs CREDITS fs Kbuild lib mm Module.symvers README sound usr config-5.2.7-100.fc29.x86_64 crypto include Kconfig LICENSES modules.builtin net samples stdio virt 0x9c9ff0: thread mask[8]: 5 0x9c9ff0: thread mask[8]: 6 0x9c9ff0: thread mask[8]: 7 0x9c9ff0: thread mask[8]: 0 0x9c9ff0: thread mask[8]: 1 0x9c9ff0: thread mask[8]: 2 0x9c9ff0: thread mask[8]: 3 0x9c9ff0: thread mask[8]: 4 0x9c9ff0: thread mask[8]: 5 0x9c9ff0: thread mask[8]: 6 0x9c9ff0: thread mask[8]: 7 [ perf record: Woken up 0 times to write data ] 0x9c9ff0: thread mask[8]: 0 0x9c9ff0: thread mask[8]: 1 0x9c9ff0: thread mask[8]: 2 0x9c9ff0: thread mask[8]: 3 0x9c9ff0: thread mask[8]: 4 0x9c9ff0: thread mask[8]: 5 0x9c9ff0: thread mask[8]: 6 0x9c9ff0: thread mask[8]: 7 Looking at the vmlinux_path (8 entries long) Using vmlinux for symbols ... [ perf record: Captured and wrote 0.013 MB perf.data (10 samples) ]