Date: Wed, 1 Jul 2009 13:20:07 +0200
From: Ingo Molnar
To: Jaswinder Singh Rajput, Arjan van de Ven, Paul Mackerras, Benjamin Herrenschmidt, Anton Blanchard
Cc: Thomas Gleixner, Peter Zijlstra, x86 maintainers, LKML, Alan Cox
Subject: Re: [PATCH 3/6 -tip] perf_counter: Add Generalized Hardware vectored co-processor support for AMD
Message-ID: <20090701112007.GD15958@elte.hu>
References: <1246440815.3403.3.camel@hpdv5.satnam> <1246440909.3403.5.camel@hpdv5.satnam> <1246440977.3403.7.camel@hpdv5.satnam> <1246441043.3403.9.camel@hpdv5.satnam>
In-Reply-To: <1246441043.3403.9.camel@hpdv5.satnam>

* Jaswinder Singh Rajput wrote:

> $ ./perf stat -e add -e multiply -e divide -e vec-idle-cycles -e vec-stall-cycles -e vec-ops -- /usr/bin/vlc ~jaswinder/Videos/Linus_Torvalds_interview_with_Charlie_Rose_Part_1.flv
>
>  Performance counter stats for '/usr/bin/vlc /home/jaswinder/Videos/Linus_Torvalds_interview_with_Charlie_Rose_Part_1.flv':
>
>     20177177044  vec-adds            (scaled from 66.63%)
>     34101687027  vec-muls            (scaled from 66.64%)
>      3984060862  vec-divs            (scaled from 66.71%)
>     26349684710  vec-idle-cycles     (scaled from 66.65%)
>      9052001905  vec-stall-cycles    (scaled from 66.66%)
>     76440734242  vec-ops             (scaled from 66.71%)
>
>   272.523058097  seconds time elapsed

Ok, this looks very nice now - a highly generic and still very
useful looking categorization of FPU/MMX/SSE related co-processor
hw events.

I'm still waiting for feedback from Paulus, BenH and Anton, whether
this kind of generic enumeration fits PowerPC well enough.

I think from a pure logic/math/physics POV this categorization is
pretty complete: a modern co-processor has three fundamental states
we are interested in: idle, busy and busy-stalled. It has an 'ops'
metric that counts instructions, plus the main operations are add,
mul and div.

Cell is, I guess, a complication to be solved, as there the various
vector units have separate decoders and separate thread state. The
above abstraction only covers the portion of CPU designs where
there are vector operations in the main ALU decoder stream of
instructions.

One thing that might be worth exposing is vectored loads/stores in
general. But we don't have those in the generic ALU enumeration yet,
and if we add them, both should be done together.

Also, the Nehalem bits need to be tested, I'll try to find time for
that.

Good stuff.

	Ingo
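
As an illustration of how the categorization discussed above (an 'ops' total split into add, mul and div, plus stall cycles) might be read, here is a minimal Python sketch that turns the quoted counts into an operation mix and a stalls-per-op figure. It is not part of the patch or of perf itself. The names `counts`, `share` and `summarize` are hypothetical, and the sketch assumes that vec-ops is a superset of vec-adds, vec-muls and vec-divs, which the thread does not confirm. The inputs are perf's scaled estimates (each counter was live for only about two thirds of the run), so the ratios are approximate.

```python
# Counts copied from the quoted `perf stat` output (already scaled by perf).
counts = {
    "vec-adds": 20177177044,
    "vec-muls": 34101687027,
    "vec-divs": 3984060862,
    "vec-idle-cycles": 26349684710,
    "vec-stall-cycles": 9052001905,
    "vec-ops": 76440734242,
}


def share(part, whole):
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def summarize(c):
    """Derive an operation mix and stall cost from the generic vec-* counts.

    Assumption (not stated in the thread): vec-ops counts every vector
    instruction, so whatever is not add/mul/div is reported as 'other'.
    """
    ops = c["vec-ops"]
    classified = c["vec-adds"] + c["vec-muls"] + c["vec-divs"]
    return {
        "add_pct": share(c["vec-adds"], ops),
        "mul_pct": share(c["vec-muls"], ops),
        "div_pct": share(c["vec-divs"], ops),
        "other_pct": share(ops - classified, ops),
        "stall_cycles_per_op": c["vec-stall-cycles"] / ops if ops else 0.0,
    }


if __name__ == "__main__":
    for name, value in summarize(counts).items():
        print(f"{name:>20}: {value:.3f}")
```

A busy fraction cannot be derived from these events alone, since the quoted run does not include a total cycle count to compare vec-idle-cycles against.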