Subject: Re: [PATCH 1/2 -tip] perf_counter: Add generalized hardware vectored co-processor support for AMD and Intel Corei7/Nehalem
From: Jaswinder Singh Rajput
To: Ingo Molnar
Cc: Arjan van de Ven, Paul Mackerras, Benjamin Herrenschmidt, Anton Blanchard, Thomas Gleixner, Peter Zijlstra, x86 maintainers, LKML, Alan Cox
Date: Fri, 03 Jul 2009 18:19:37 +0530

On Fri, 2009-07-03 at 17:25 +0530, Jaswinder Singh Rajput wrote:
> On Fri, 2009-07-03 at 12:29 +0200, Ingo Molnar wrote:
> > * Jaswinder Singh Rajput wrote:
> >
> > > Performance counter stats for '/usr/bin/rhythmbox /home/jaswinder/Music/singhiskinng.mp3':
> > >
> > >        17552264  vec-adds          (scaled from 66.28%)
> > >        19715258  vec-muls          (scaled from 66.63%)
> > >        15862733  vec-divs          (scaled from 66.82%)
> > >     23735187095  vec-idle-cycles   (scaled from 66.89%)
> > >        11353159  vec-stall-cycles  (scaled from 66.90%)
> > >        36628571  vec-ops           (scaled from 66.48%)
> >
> > Is stall-cycles equivalent to busy-cycles?
> >
> hmm, normally we can use these terms interchangeably. But they can be
> different some times.
>
> busy means it is already executing some instructions so it will not take
> another instruction.
>
> stall can be busy(executing) or non-executing may be it is waiting for
> some operands due to cache miss.
>
> > I.e. do we have this
> > general relationship to the cycle event:
> >
> >   cycles = vec-stall-cycles + vec-idle-cycles
> >
> > ?

On AMD, for example:

    13390918485  vec-adds          (scaled from 57.07%)
    22465091289  vec-muls          (scaled from 57.22%)
     2643789384  vec-divs          (scaled from 57.21%)
    17922784596  vec-idle-cycles   (scaled from 57.23%)
     6402888606  vec-stall-cycles  (scaled from 57.17%)
    55823491597  cycles            (scaled from 57.05%)
    51035264218  vec-ops           (scaled from 57.05%)

  187.494664172  seconds time elapsed

vec-idle-cycles + vec-stall-cycles = 24325673202

so cycles = 2.29 * (vec-idle-cycles + vec-stall-cycles), i.e. the
relationship above does not hold on this machine.

On AMD I used:

  EventSelect 0D7h Dispatch Stall for FPU Full

  The number of processor cycles the decoder is stalled because the
  scheduler for the Floating Point Unit is full. This condition can be
  caused by a lack of parallelism in FP-intensive code, or by cache
  misses on FP operand loads (which could also show up as EventSelect
  0D8h instead, depending on the nature of the instruction sequences).
  May occur simultaneously with certain other stall conditions; see
  EventSelect 0D1h.

So the stall is due to lack of parallelism and cache misses. If we keep
increasing the size of the FP units and the cache, at some point we may
get vec-stall-cycles = 0.
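The arithmetic above can be re-checked with a short script. This is only
a sketch: the variable names are mine, and the values are the scaled
counts copied from the AMD run, so the result is an estimate rather than
an exact accounting.

```python
# Scaled counter values copied from the AMD run quoted in this mail.
# Variable names are illustrative, not perf_counter identifiers.
cycles = 55823491597
vec_idle_cycles = 17922784596
vec_stall_cycles = 6402888606

# Sum of the two vector-unit cycle events.
idle_plus_stall = vec_idle_cycles + vec_stall_cycles

# How many times larger the total cycle count is than that sum.
ratio = cycles / idle_plus_stall

print(f"vec-idle-cycles + vec-stall-cycles = {idle_plus_stall}")
print(f"cycles / (idle + stall) = {ratio:.2f}")
```

If cycles were equal to vec-stall-cycles + vec-idle-cycles, the ratio
would come out close to 1.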
Thanks,
--
JSR
http://userweb.kernel.org/~jaswinder/