Subject: Re: [RFC][PATCH] perf: Rewrite core context handling
To: Stephane Eranian
Cc: Peter Zijlstra, Ingo Molnar, LKML, Arnaldo Carvalho de Melo,
    Alexander Shishkin, Jiri Olsa, songliubraving@fb.com,
    Thomas Gleixner, Mark Rutland, megha.dey@intel.com, frederic@kernel.org
References: <20181010104559.GO5728@hirez.programming.kicks-ass.net>
    <3a738a08-2295-a4e9-dce7-a3e2b2ad794e@linux.intel.com>
    <20181015083448.GN9867@hirez.programming.kicks-ass.net>
From: Alexey Budankov
Organization: Intel Corp.
Date: Tue, 16 Oct 2018 09:39:04 +0300

Hi,

On 15.10.2018 21:31, Stephane Eranian wrote:
> Hi,
>
> On Mon, Oct 15, 2018 at 10:29 AM Alexey Budankov wrote:
>>
>> Hi,
>>
>> On 15.10.2018 11:34, Peter Zijlstra wrote:
>>> On Mon, Oct 15, 2018 at 10:26:06AM +0300, Alexey Budankov wrote:
>>>> Hi,
>>>>
>>>> On 10.10.2018 13:45, Peter Zijlstra wrote:
>>>>> Hi all,
>>>>>
>>>>> There have been various issues and limitations with the way perf uses
>>>>> (task) contexts to track events. Most notable is the single hardware
>>>>> PMU task context, which has resulted in a number of yucky things (both
>>>>> proposed and merged).
>>>>>
>>>>> Notably:
>>>>>
>>>>>  - HW breakpoint PMU
>>>>>  - ARM big.little PMU
>>>>>  - Intel Branch Monitoring PMU
>>>>>
>>>>> Since we now track the events in RB trees, we can 'simply' add a pmu
>>>>> order to them and have them grouped that way, reducing to a single
>>>>> context. Of course, reality never quite works out that simple, and
>>>>> below ends up adding an intermediate data structure to bridge the
>>>>> context -> pmu mapping.
>>>>>
>>>>> Something a little like:
>>>>>
>>>>>          ,------------------------[1:n]---------------------.
>>>>>          V                                                   V
>>>>>    perf_event_context <-[1:n]-> perf_event_pmu_context <--- perf_event
>>>>>          ^                      ^         |                     |
>>>>>          `--------[1:n]---------'         `-[n:1]-> pmu <-[1:n]-'
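For illustration, the links in the diagram above can be written out as C
structures roughly like the sketch below. The field names are assumptions
chosen for readability, not the declarations from the actual patch.

#include <linux/list.h>

struct pmu;
struct perf_event_context;

/* Intermediate structure bridging one context to the events of one pmu. */
struct perf_event_pmu_context {
	struct pmu			*pmu;		/* [n:1] many pmu contexts per pmu */
	struct perf_event_context	*ctx;		/* back-link to the owning context */
	struct list_head		pmu_ctx_entry;	/* entry in ctx->pmu_ctx_list */
	int				nr_events;	/* cached counts for the rotation check */
	int				nr_active;
};

struct perf_event_context {
	struct list_head		pmu_ctx_list;	/* [1:n] pmu contexts per context */
	/* locking, time keeping, RB tree of events, ... */
};

struct perf_event {
	struct perf_event_pmu_context	*pmu_ctx;	/* [n:1] many events per pmu context */
	struct pmu			*pmu;		/* [n:1] many events per pmu */
	/* ... */
};

The point of the intermediate perf_event_pmu_context is that a single
perf_event_context can now serve several PMUs, while the per-pmu state
(counts, active lists) still lives in one place.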
>>>>>
>>>>> This patch builds (provided you disable CGROUP_PERF), boots and
>>>>> survives perf-top without the machine catching fire.
>>>>>
>>>>> There are still a fair number of loose ends (look for XXX), but I
>>>>> think this is the direction we should be going.
>>>>>
>>>>> Comments?
>>>>>
>>>>> Not-Quite-Signed-off-by: Peter Zijlstra (Intel)
>>>>> ---
>>>>>  arch/powerpc/perf/core-book3s.c |    4
>>>>>  arch/x86/events/core.c          |    4
>>>>>  arch/x86/events/intel/core.c    |    6
>>>>>  arch/x86/events/intel/ds.c      |    6
>>>>>  arch/x86/events/intel/lbr.c     |   16
>>>>>  arch/x86/events/perf_event.h    |    6
>>>>>  include/linux/perf_event.h      |   80 +-
>>>>>  include/linux/sched.h           |    2
>>>>>  kernel/events/core.c            | 1412 ++++++++++++++++++++--------------------
>>>>>  9 files changed, 815 insertions(+), 721 deletions(-)
>>>>
>>>> The rewrite is impressive; however, as it stands it doesn't result in a
>>>> code base reduction.
>>>
>>> Yeah.. that seems to be the nature of these things ..
>>>
>>>> Nonetheless, there is a clear demand for per-pmu event group tracking
>>>> and rotation in a single cpu context (HW breakpoints, ARM big.little,
>>>> Intel LBRs), and there is a supply through group ordering on the RB-tree.
>>>>
>>>> This might be driven into the kernel by some new perf features building
>>>> on that RB-tree group ordering, or by refactoring the existing code in a
>>>> way that shrinks the overall code base, thus lowering the support cost.
>>>
>>> Do you have a concrete suggestion on how to reduce complexity? I tried,
>>> but couldn't find any (without breaking something).
>>
>> Could some of those PMUs (HW breakpoints, ARM big.little, Intel LBRs)
>> or other perf-related code be adjusted now so that the overall subsystem
>> code base would shrink?
>>
> I have always had a hard time understanding the role of all these structs
> in the generic code. This is still very confusing and very hard to follow.
>
> In my mind, you have per-task and per-cpu perf_events contexts.
> And for each you can have multiple PMUs, some hw, some sw.
> Each PMU has its own list of events maintained in an RB tree. There are
> never any interactions between PMUs.
>
> Maybe this is how it is done or proposed by your patches, but it
> certainly is not obvious.
>
> Also, the Intel LBR is not a PMU on its own. Maybe you are talking about
> the BTS in arch/x86/events/intel/bts.c.

I am referring to the Intel Branch Monitoring PMU mentioned in the
description. Thanks for the correction.

- Alexey

>
>
>>>
>>> The active lists and pmu_ctx_list could arguably be replaced with
>>> (slower) iterations over the RB tree, but you'll still need the per-pmu
>>> nr_events/nr_active counts to determine whether rotation is required
>>> at all.
>>>
>>> And as you know, performance is quite important here too. I'd love to
>>> reduce complexity while maintaining or improving performance, but that
>>> rarely if ever happens :/
>>>
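For illustration only (not the actual kernel code): given the cached
per-pmu counts from the perf_event_pmu_context sketched earlier, the
"is rotation required at all?" test is a constant-time comparison, which
is exactly what would be lost if it had to be recomputed by walking the
RB tree.

static inline bool perf_rotate_needed(const struct perf_event_pmu_context *epc)
{
	/* true when some of this pmu's events could not be scheduled in */
	return epc->nr_events != epc->nr_active;
}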