Date: Fri, 18 Oct 2013 08:43:21 +0200
From: Ingo Molnar
To: "H. Peter Anvin"
Cc: Neil Horman, Eric Dumazet, linux-kernel@vger.kernel.org,
	sebastien.dugue@bull.net, Thomas Gleixner, Ingo Molnar,
	x86@kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
Message-ID: <20131018064321.GG14264@gmail.com>
References: <1381510298-20572-1-git-send-email-nhorman@tuxdriver.com>
	<20131012172124.GA18241@gmail.com>
	<20131014202854.GH26880@hmsreliant.think-freely.org>
	<1381785560.2045.11.camel@edumazet-glaptop.roam.corp.google.com>
	<1381789127.2045.22.camel@edumazet-glaptop.roam.corp.google.com>
	<20131017003421.GA31470@hmsreliant.think-freely.org>
	<20131017084121.GC22705@gmail.com>
	<52602A29.506@zytor.com>
In-Reply-To: <52602A29.506@zytor.com>
User-Agent: Mutt/1.5.21 (2010-09-15)

* H. Peter Anvin wrote:

> On 10/17/2013 01:41 AM, Ingo Molnar wrote:
> >
> > To correctly simulate the workload you'd have to:
> >
> >  - allocate a buffer larger than your L2 cache.
> >
> >  - to measure the effects of the prefetches you'd also have to
> >    randomize the individual buffer positions. See how 'perf bench numa'
> >    implements a random walk via --data_rand_walk, in
> >    tools/perf/bench/numa.c. Otherwise the CPU might learn your
> >    simplistic stream direction and the L2 cache might hw-prefetch your
> >    data, interfering with any explicit prefetches the code does. In
> >    many real-life use cases packet buffers are scattered.
> >
> > Also, it would be nice to see standard deviation noise numbers when
> > two averages are close to each other, to be able to tell whether
> > differences are statistically significant or not.
>
> Seriously, though, how much does it matter? All the above seems likely
> to do is to drown the signal by adding noise.

I think it matters a lot, and I don't think it 'adds' noise - it measures
something else (cache-cold behavior, which is the common case for
first-time csum_partial() use on network packets), which was not measured
before, and which by its nature has different noise patterns.

I've done many cache-cold measurements myself and had no trouble
achieving statistically significant results and high precision.

> If the parallel (threaded) checksumming is faster, which theory says it
> should and microbenchmarking confirms, how important are the
> macrobenchmarks?

Microbenchmarks can be totally blind to things like the ideal prefetch
window size (or to whether a prefetch should be done at all: some CPUs
will throw away prefetches if enough regular fetches arrive).

Also, 'naive' single-threaded algorithms can occasionally be better in
the cache-cold case, because a linear, predictable stream of memory
accesses might saturate the memory bus better than a somewhat
random-looking, interleaved web of accesses that might not harmonize
with buffer depths.
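For the record, something like the (untested, user-space) sketch below is
roughly what I mean by a cache-cold measurement: a working set well beyond
L2, buffers visited in randomized order, and mean +/- stddev reported. The
csum() stand-in and the PKT_SIZE/NR_PKTS/NR_RUNS values are placeholders
for illustration, not a proposed benchmark:

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <string.h>
	#include <time.h>
	#include <math.h>

	#define PKT_SIZE	1500		/* MTU-sized packet, assumed */
	#define NR_PKTS		(64 * 1024)	/* ~94 MB working set, >> any L2 */
	#define NR_RUNS		100

	/* Trivial 16-bit ones'-complement sum - a stand-in for csum_partial() */
	static uint16_t csum(const uint8_t *buf, size_t len)
	{
		uint64_t sum = 0;
		size_t i;

		for (i = 0; i + 1 < len; i += 2)
			sum += (uint16_t)(buf[i] | (buf[i + 1] << 8));
		if (len & 1)
			sum += buf[len - 1];
		while (sum >> 16)
			sum = (sum & 0xffff) + (sum >> 16);
		return (uint16_t)~sum;
	}

	static double now_ns(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec * 1e9 + ts.tv_nsec;
	}

	int main(void)
	{
		size_t pool_size = (size_t)NR_PKTS * PKT_SIZE;
		uint8_t *pool = malloc(pool_size);
		unsigned int *order = malloc(NR_PKTS * sizeof(*order));
		double times[NR_RUNS], mean = 0.0, var = 0.0;
		volatile uint16_t sink = 0;
		unsigned int i, r;

		if (!pool || !order)
			return 1;
		memset(pool, 0x5a, pool_size);

		/* Shuffle the visiting order so the hw stream prefetcher
		 * cannot learn a simple linear access pattern. */
		for (i = 0; i < NR_PKTS; i++)
			order[i] = i;
		srand(1);
		for (i = NR_PKTS - 1; i > 0; i--) {
			unsigned int j = rand() % (i + 1), tmp = order[i];

			order[i] = order[j];
			order[j] = tmp;
		}

		for (r = 0; r < NR_RUNS; r++) {
			double t0 = now_ns();

			for (i = 0; i < NR_PKTS; i++)
				sink ^= csum(pool + (size_t)order[i] * PKT_SIZE,
					     PKT_SIZE);
			times[r] = (now_ns() - t0) / NR_PKTS;
		}

		/* Mean and standard deviation across runs, in ns per packet */
		for (r = 0; r < NR_RUNS; r++)
			mean += times[r];
		mean /= NR_RUNS;
		for (r = 0; r < NR_RUNS; r++)
			var += (times[r] - mean) * (times[r] - mean);

		printf("%.1f ns/packet +- %.1f stddev (sink=%u)\n",
		       mean, sqrt(var / NR_RUNS), (unsigned int)sink);

		free(order);
		free(pool);
		return 0;
	}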
I _think_ that, if correctly tuned, the parallel algorithm should be
better in the cache-cold case as well - I just don't know with what
parameters (and the algorithm has at least one free parameter: the
prefetch window size), and I don't know how significant the effect is.

Also, more fundamentally, I absolutely detest doing no measurements, or
measuring the wrong thing - IMHO there are too many 'blind' optimization
commits in the kernel with little to no observational data attached.
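To make that free parameter concrete, the loop shape under discussion is
roughly the one below: two independent accumulator chains so both ALUs
have work in every iteration, plus a software prefetch whose distance is
the knob that needs tuning. This is an illustration under my own
assumptions (plain 64-bit adds instead of the adc-based ones'-complement
sum, a two-way split, a guessed PREFETCH_AHEAD), not Neil's actual patch:

	#include <stddef.h>
	#include <stdint.h>

	#define PREFETCH_AHEAD	(5 * 64)	/* the tunable: bytes ahead */

	uint64_t parallel_sum(const uint64_t *buf, size_t words)
	{
		uint64_t sum0 = 0, sum1 = 0;
		size_t i;

		for (i = 0; i + 1 < words; i += 2) {
			/* Hint only: some CPUs drop it if real loads keep
			 * the memory pipeline busy. */
			__builtin_prefetch((const char *)(buf + i) +
					   PREFETCH_AHEAD);

			sum0 += buf[i];		/* chain A */
			sum1 += buf[i + 1];	/* chain B, independent of A */
		}
		if (i < words)
			sum0 += buf[i];

		return sum0 + sum1;
	}

Thanks,

	Ingo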