Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
From: Linus Torvalds
To: Dave Chinner
Cc: Al Viro, Linux Kernel Mailing List, linux-fsdevel, Christoph Hellwig, Neil Brown
Date: Fri, 15 May 2015 18:23:30 -0700

On Fri, May 15, 2015 at 4:38 PM, Dave Chinner wrote:
>
> Right, because it's cold cache performance that everyone complains
> about.

People really do complain about the hot-cache one too. Did you read
the description of the sample benchmark that Jeremy described Windows
sales people as using? That kind of thing is actually not that
unusual, and those benchmarks can be big sales tools.

We went through similar things with "mindcraft", then netbench/dbench.
People will run those benchmarks with enough memory (and will often
tune things like dirty thresholds etc.) explicitly to get rid of the
IO component, for benchmarking reasons. And often they are just nasty
marketing benchmarks and not very meaningful. The "geekbench of
filesystem testing", if you will.

Fair enough. But those kinds of things have also been very useful in
making performance better, because the "real" filesystem benchmarks
are usually too nasty to actually run on reasonable machines. So the
fake/bad ones are often good at showing things that don't scale well
(despite being 100% CPU-bound), because they show some bottleneck. And
sometimes fixing that bottleneck for the non-IO case ends up helping
the IO case too.

So the one samba profile I remember seeing was probably from early
dbench; I'm pretty sure it was Tridge that showed it as a stress-case
for samba on Linux. We're talking a decade ago, so I really can't
claim I remember the details, but I do remember it being readdir()
that was 100% CPU-bound.

Or rather, it *would* have been 100% CPU-bound, but due to the inode
semaphore (back then it was i_sem, I think; now it's i_mutex) it was
actually spending most of its time sleeping/scheduling due to inode
semaphore contention. So rather than scaling perfectly with CPUs, it
basically just used one CPU.

Now, samba has probably changed enormously, and maybe it's not a big
deal. But I don't think our filesystem locking has changed at all,
because quite frankly, nobody else seems to see it. It tends to be a
fileserving thing (the Lustre comment kind of feeds into that).

So it might be interesting to have a simple benchmark that people can
run, WITHOUT the IO load. Because really, IO isn't that interesting to
most of us, especially when we don't even have IO subsystems that do
much parallelism anyway...

I wrote my own (really really stupid) concurrent stat() test just to
get good profiles of where the real problems are. It's nasty - it's
literally just MAX_THREADS pthreads that loop doing stat() on a list
of files for ten seconds, and then report the total number of loops.
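A minimal sketch of what such a test might look like (this is a
reconstruction from the description above, not the actual code; the
MAX_THREADS value, the ten-second timing scheme, and the loop
reporting are all assumptions):

/*
 * Hypothetical reconstruction of the "really really stupid"
 * concurrent stat() test described above: MAX_THREADS threads
 * loop doing stat() over a list of files for ten seconds, then
 * the total loop count is reported.  Names and details are
 * guesses, not the original code.
 *
 * Build: gcc -O2 -pthread stattest.c -o stattest
 * Usage: ./stattest file1 file2 ...
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

#define MAX_THREADS 16          /* assumption: pick your CPU count */
#define TEST_SECONDS 10

static int nfiles;
static char **files;

static void *stat_loop(void *arg)
{
	unsigned long loops = 0;
	struct stat st;
	time_t end = time(NULL) + TEST_SECONDS;

	/* Hot-cache stat() loop: no IO after the first pass. */
	while (time(NULL) < end) {
		for (int i = 0; i < nfiles; i++)
			stat(files[i], &st);
		loops++;
	}
	return (void *)loops;
}

int main(int argc, char **argv)
{
	pthread_t threads[MAX_THREADS];
	unsigned long total = 0;

	if (argc < 2) {
		fprintf(stderr, "usage: %s file...\n", argv[0]);
		return 1;
	}
	nfiles = argc - 1;
	files = argv + 1;

	for (int i = 0; i < MAX_THREADS; i++)
		pthread_create(&threads[i], NULL, stat_loop, NULL);

	for (int i = 0; i < MAX_THREADS; i++) {
		void *ret;
		pthread_join(threads[i], &ret);
		total += (unsigned long)ret;
	}

	/* Higher is better: pure hot-cache path lookups, no IO. */
	printf("%lu total loops in %d seconds\n", total, TEST_SECONDS);
	return 0;
}

Presumably pointing it at paths that include a symlink component would
be one way to see the fall-out-of-RCU-mode cost mentioned below, though
that usage is my inference rather than something stated here.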
But that stupid thing was actually ridiculously useful, not because
the load is meaningful, but because it ended up showing that we had
horribly fragile behavior when we had contention on the dentry lock.
(That got fixed, although it still ends up sucking when we fall out of
RCU mode - but with Al's upcoming patches that should hopefully be
really really unusual rather than "every time we see a symlink" etc.)

               Linus