From: "Liang, Kan" <kan.liang@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
CC: "acme@kernel.org" <acme@kernel.org>,
        "eranian@google.com" <eranian@google.com>,
        "andi@firstfloor.org" <andi@firstfloor.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH V2 1/6] perf,core: allow invalid context events to be
 part of sw/hw groups
Thread-Topic: [PATCH V2 1/6] perf,core: allow invalid context events to be
 part of sw/hw groups
Thread-Index: AQHQd44JMA1QjYHgG0GA8QKAdb90yJ1NzhQAgADLGRA=
Date: Thu, 16 Apr 2015 14:53:41 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F077017C9293@SHSMSX103.ccr.corp.intel.com>
References: <1429084576-1078-1-git-send-email-kan.liang@intel.com>
 <20150415172856.GA5029@twins.programming.kicks-ass.net>
In-Reply-To: <20150415172856.GA5029@twins.programming.kicks-ass.net>
Accept-Language: zh-CN, en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4286
Lines: 110


> 
> On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
> > The event count only be read when the event is already sched_in.
> 
> Yeah, so no. This breaks what groups are. Group events _must_ be co-
> scheduled. You cannot guarantee you can schedule events from another
> PMU.

Why? I think it's similar as sw and hw mixed event group.
group_sched_in will co-schedule all events in sibling_list. 
Invalid context events is already added in sibling_list in
perf_install_in_context.
So all group events will be scheduled together. 
event->pmu guarantee proper add is called.
For invalid context events, everything is global. It never fails to schedule.

> 
> Also, I cannot see how this can possibly work, you cannot put these things
> on the same event_context.

Why not? For a sw and hw mixed event group, they use same
hw event_context. 

Actually, the invalid context events don't have any state to switch 
on context switch. They can attach to any event_context.

> 
> Also, how does this work wrt cpumasks, regular events are per cpu, uncore
> events are per node.

Currently, uncore events is only available on the first cpu of the node.
So if it's a hw and uncore mixed group, only cpu 0 do the group.
For other CPUs, there will be only hw event. 
Perf tool will guarantee it. Here are some examples.

[perf]$ sudo perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S'
-C 0-4 -vvv -- sleep 1
------------------------------------------------------------
perf_event_attr:
  size                             112
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|READ|ID|CPU|PERIOD
  read_format                      ID|GROUP
  disabled                         1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             15
  size                             112
  config                           304
  sample_type                      IP|TID|TIME|READ|ID|CPU|PERIOD
  read_format                      ID|GROUP
  freq                             1
  sample_id_all                    1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd 3  flags 0x8
mmap size 528384B
perf event ring buffer mmapped per cpu


[perf]$ sudo perf stat -e '{cycles,uncore_imc_0/cas_count_read/}' -C 0-4 
--per-core -- sleep 1

 Performance counter stats for 'CPU(s) 0-4':

S0-C0           1            1367712      cycles                                                        (100.00%)
S0-C0           1               0.17 MiB  uncore_imc_0/cas_count_read/
S0-C1           1             982923      cycles
S0-C1           0      <not counted> MiB  uncore_imc_0/cas_count_read/
S0-C2           1             958585      cycles
S0-C2           0      <not counted> MiB  uncore_imc_0/cas_count_read/
S0-C3           1             978161      cycles
S0-C3           0      <not counted> MiB  uncore_imc_0/cas_count_read/
S0-C4           1             976012      cycles
S0-C4           0      <not counted> MiB  uncore_imc_0/cas_count_read/

       1.001151765 seconds time elapsed


> 
> There is so much broken stuff here without explanation its not funny.
> 
> Please explain how this does not completely wreck everything?

OK. I will refine the description when questions as above are addressed.

Thanks,
Kan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/