From: kan.liang@intel.com
To: acme@kernel.org, peterz@infradead.org, mingo@redhat.com, linux-kernel@vger.kernel.org
Cc: jolsa@kernel.org, namhyung@kernel.org, adrian.hunter@intel.com, lukasz.odzioba@intel.com, ak@linux.intel.com, Kan Liang
Subject: [PATCH RFC V2 00/10] perf top optimization
Date: Sun, 10 Sep 2017 19:23:13 -0700
Message-Id: <1505096603-215017-1-git-send-email-kan.liang@intel.com>

From: Kan Liang

This patch series intends to fix a severe performance issue on Knights
Landing/Mill when monitoring a heavily loaded system.
perf top takes a few minutes to show its first result, which is
unacceptable.
With the patch series applied, the latency is reduced to several
seconds.

machine__synthesize_threads and perf_top__mmap_read account for most of
the perf top time (> 99%).
Patches 1-9 optimize machine__synthesize_threads.
Patch 10 optimizes perf_top__mmap_read.

Optimization for machine__synthesize_threads
- Multithread the whole process.
- The number of threads defaults to the maximum number of online CPUs.
  Users can change the number of threads through the new option.
- Introduce a hashtable for machine threads to reduce lock contention.
- The optimization can also benefit other platforms and other perf
  tools, such as perf record. This patch series does not optimize the
  other tools; that can be done separately later.
- With this optimization applied, there is a 1.56x speedup on Knights
  Mill under heavy workload.

Optimization for perf_top__mmap_read
- Switch back to overwrite mode.
  In non-overwrite mode, perf top tries to read everything in the ring
  buffer and does not check whether the data is messed up. When many
  samples are delivered within a short time, the processing time can
  be very long. Given the real-time requirement of perf top, it should
  switch back to overwrite mode.
- With this optimization applied, there is a 53.8x speedup on Knights
  Mill under heavy workload.
- With this optimization applied, the latency of perf_top__mmap_read is
  less than the default perf top refresh time (2s) on Knights Mill
  under heavy workload.

Here are the perf top latency test results on Knights Mill and a
Skylake server.
The heavy workload is a Linux kernel compile, as below:
  "sudo nice make -j$(grep -c '^processor' /proc/cpuinfo)"
Then:
  "sudo perf top"
The latency is the time between launching perf top and the first
profiling result being shown.

- Latency on Knights Mill (272 CPUs)

  Original(s)   With patch(s)   Speedup
  272.68        16.48           16.54x

- Latency on Skylake server (192 CPUs)

  Original(s)   With patch(s)   Speedup
  12.28         2.96            4.15x

Changes since V1:
- Patch 1: machine threads and hashtable related renaming (Arnaldo)
- Patch 6: use a smaller locked section for comm_str__put
           add a locked wrapper for comm_str__findnew (Arnaldo)

Kan Liang (10):
  perf tools: hashtable for machine threads
  perf tools: using scandir to replace readdir
  perf tools: using comm_str to replace comm in hist_entry
  perf tools: introduce a new function to set namespaces id
  perf tools: lock to protect thread list
  perf tools: lock to protect comm_str rb tree
  perf tools: change machine comm_exec type to atomic
  perf top: implement multithreading for perf_event__synthesize_threads
  perf top: add option to set the number of thread for event synthesize
  perf top: switch back to overwrite mode

 tools/perf/builtin-kvm.c              |   3 +-
 tools/perf/builtin-record.c           |   2 +-
 tools/perf/builtin-top.c              |   9 +-
 tools/perf/builtin-trace.c            |  21 +++--
 tools/perf/tests/mmap-thread-lookup.c |   2 +-
 tools/perf/ui/browsers/hists.c        |   2 +-
 tools/perf/util/comm.c                |  18 +++-
 tools/perf/util/event.c               | 149 +++++++++++++++++++++++++-------
 tools/perf/util/event.h               |  14 ++-
 tools/perf/util/evlist.c              |   5 +-
 tools/perf/util/hist.c                |  11 +--
 tools/perf/util/machine.c             | 158 +++++++++++++++++++++-------------
 tools/perf/util/machine.h             |  34 ++++++--
 tools/perf/util/rb_resort.h           |   5 +-
 tools/perf/util/sort.c                |   8 +-
 tools/perf/util/sort.h                |   2 +-
 tools/perf/util/thread.c              |  68 ++++++++++++---
 tools/perf/util/thread.h              |   6 +-
 tools/perf/util/top.h                 |   1 +
 19 files changed, 376 insertions(+), 142 deletions(-)

--
2.5.5