Date: Mon, 29 Jun 2009 21:29:13 +0200
From: Ingo Molnar <mingo@elte.hu>
To: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, paulus@samba.org,
       LKML <linux-kernel@vger.kernel.org>
Subject: Re: [perf] howto switch from pfmon
Message-ID: <20090629192913.GA29295@elte.hu>
References: <4A3FEF75.2020804@inria.fr> <20090623131450.GA31519@elte.hu> <20090623134749.GA6897@elte.hu> <4A40DFF5.7010207@inria.fr> <20090623143601.GA13415@elte.hu> <4A40F31F.4030609@inria.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4A40F31F.4030609@inria.fr>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2028
Lines: 46


* Brice Goglin <Brice.Goglin@inria.fr> wrote:

> > How many threads does your workload typically run, and how do 
> > you get their stats displayed?
> 
> In the aforementioned OpenMP work, we use pfmon to get the
> local/remote NUMA memory access ratio of each thread. In this
> specific case, we bind one thread per core (even with an O(1)
> scheduler, people tend to avoid launching hundreds of threads on
> current machines). pfmon gives us something similar to the output
> of 'perf stat', in a file whose name contains the process and
> thread IDs. We apply our own custom script to convert these many
> pfmon output files into a single summary that lists, for each
> thread, its thread ID, its core binding, its per-node NUMA access
> counts and percentages, and whether they were local or remote
> (with the Barcelona counters we were talking about, you need to
> check where you were running before you know whether accesses to
> node X are actually local or remote).

Update: based on your feedback, the latest perfcounters tree includes
the following new perf record features:

    -s, --stat            per thread counts
    -n, --no-samples      don't sample

--stat instructs the kernel to gather precise per task/thread stats;
those counts are then emitted to the data file. Via --no-samples one
can do non-profiling runs, i.e. statistics collection only.

The 'perf stat' pretty printing side is not fully implemented yet - 
right now you can only see these stats if you look for 
PERF_EVENT_READ counts in the raw event log:

   perf report -D | grep PERF_EVENT_READ

But the biggest piece, the kernel and perf record side, is there
already. What kind of output would you prefer? Maybe you'd like to
take a stab at implementing the perf report side?
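
Until that exists, a small filter over the raw dump can stand in for
the summary script. What follows is only a rough sketch, not the perf
tool's own code. It assumes each read record is printed on one line
shaped like "PERF_EVENT_READ: <pid> <tid> <event name> <value>"; that
shape is a guess, so check it against real 'perf report -D' output
first. The names summarize() and format_summary() are made up for
illustration.

```python
import re
import sys
from collections import defaultdict

# ASSUMPTION: a read record looks like
#   <offset> [<size>]: PERF_EVENT_READ: <pid> <tid> <event name> <value>
# This shape is a guess, not a documented format; adjust the pattern
# to whatever 'perf report -D' really prints on your tree.
READ_RE = re.compile(r"PERF_EVENT_READ:\s+(\d+)\s+(\d+)\s+(.+?)\s+(\d+)\s*$")


def summarize(lines):
    """Sum counter values per (pid, tid, event name)."""
    totals = defaultdict(int)
    for line in lines:
        match = READ_RE.search(line)
        if match is None:
            continue  # not a read record, or a shape this sketch does not know
        pid, tid, event, value = match.groups()
        # Summing is a choice made here, in case a thread shows up
        # with more than one record for the same event.
        totals[(int(pid), int(tid), event)] += int(value)
    return dict(totals)


def format_summary(totals):
    """One row per thread and event, sorted by pid, tid, event name."""
    rows = []
    for (pid, tid, event), value in sorted(totals.items()):
        rows.append(f"{pid:>7} {tid:>7} {event:<24} {value:>16}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Reads the raw dump on stdin and prints the per-thread table.
    print(format_summary(summarize(sys.stdin)))
```

Saved under a hypothetical name such as summarize_reads.py, it would
be fed from the dump directly:

   perf report -D | python3 summarize_reads.py

Mapping each thread to its core binding, and from there to local vs.
remote node accesses, would still have to come from elsewhere.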

	Ingo