Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1946857AbWKKANI (ORCPT ); Fri, 10 Nov 2006 19:13:08 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1946813AbWKKANH (ORCPT ); Fri, 10 Nov 2006 19:13:07 -0500 Received: from mailout1.vmware.com ([65.113.40.130]:54491 "EHLO mailout1.vmware.com") by vger.kernel.org with ESMTP id S1946857AbWKKANE convert rfc822-to-8bit (ORCPT ); Fri, 10 Nov 2006 19:13:04 -0500 X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7BIT Subject: RE: touch_cache() only touches two thirds Date: Fri, 10 Nov 2006 16:12:19 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: touch_cache() only touches two thirds Thread-Index: AccEoeggPGuYib7PR1ObQZqbfZqbogAhCT8/ References: From: "Bela Lubkin" To: "Andi Kleen" Cc: , Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1635 Lines: 40 Andi Kleen wrote: > "Bela Lubkin" writes: > > > > /* > > * Dirty a big buffer in a hard-to-predict (for the L2 cache) way. This > > * is the operation that is timed, so we try to generate unpredictable > > * cachemisses that still end up filling the L2 cache: > > */ > > The comment is misleading anyways. AFAIK several of the modern > CPUs (at least K8, later P4s, Core2, POWER4+, PPC970) have prefetch > predictors advanced enough to follow several streams forward and backwards > in parallel. > > I hit this while doing NUMA benchmarking for example. > > Most likely to be really unpredictable you need to use a > true RND and somehow make sure still the full cache range > is covered. The corrected code in covers the full cache range. Granted that modern CPUs may be able to track multiple simultaneous cache access streams: how many such streams are they likely to be able to follow at once? It seems like going from 1 to 2 would be a big win, 2 to 3 a small win, beyond that it wouldn't likely make much incremental difference. So what do the actual implementations in the field support? The code (original and corrected) uses 6 simultaneous streams. I have a modified version that takes a `ways' parameter to use an arbitrary number of streams. I'll post that onto bugzilla.kernel.org. >Bela< - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/