Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
From: Linus Torvalds
To: Dave Chinner
Cc: Al Viro, Linux Kernel Mailing List, linux-fsdevel, Christoph Hellwig, Neil Brown
Date: Fri, 15 May 2015 18:23:30 -0700

On Fri, May 15, 2015 at 4:38 PM, Dave Chinner wrote:
>
> Right, because it's cold cache performance that everyone complains
> about.

People really do complain about the hot-cache one too. Did you read
the description of the sample benchmark that Jeremy described Windows
sales people as using? That kind of thing is actually not that
unusual, and those benchmarks can be big sales tools.

We went through similar things with "mindcraft", then netbench/dbench.
People will run those benchmarks with enough memory (and will often
tune things like dirty thresholds etc.) explicitly to get rid of the
IO component, for benchmarking reasons. And often they are just nasty
marketing benchmarks and not very meaningful. The "geekbench of
filesystem testing", if you will.

Fair enough. But those kinds of things have also been very useful in
making performance better, because the "real" filesystem benchmarks
are usually too nasty to actually run on reasonable machines. So the
fake/bad ones are often good at showing things that don't scale well
(despite being 100% CPU-bound), because they show some bottleneck. And
sometimes fixing that bottleneck for the non-IO case ends up helping
the IO case too.

So the one samba profile I remember seeing was probably from early
dbench; I'm pretty sure it was Tridge that showed it as a stress-case
for samba on Linux. We're talking a decade ago, so I really can't
claim I remember the details, but I do remember it being readdir()
that was 100% CPU-bound.

Or rather, it *would* have been 100% CPU-bound, but due to the inode
semaphore (back then it was i_sem, I think; now it's i_mutex) it was
actually spending most of its time sleeping/scheduling due to inode
semaphore contention. So rather than scaling perfectly with CPUs, it
basically just used one CPU.

Now, samba has probably changed enormously, and maybe it's not a big
deal. But I don't think our filesystem locking has changed at all,
because quite frankly, nobody else seems to see it. It tends to be a
fileserving thing (the Lustre comment kind of feeds into that).

So it might be interesting to have a simple benchmark that people can
run, WITHOUT the IO load. Because really, IO isn't that interesting to
most of us, especially when we don't even have IO subsystems that do
much parallelism anyway...

I wrote my own (really really stupid) concurrent stat() test just to
get good profiles of where the real problems are. It's nasty - it's
literally just MAX_THREADS pthreads that loop doing stat() on a list
of files for ten seconds, and then report the total number of loops.
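A minimal sketch of what such a test might look like (this is a
reconstruction from the description above, not the actual code; the
MAX_THREADS value, the ten-second timing scheme, and the loop
reporting are all assumptions):

/*
 * Hypothetical reconstruction of the "really really stupid"
 * concurrent stat() test described above: MAX_THREADS threads
 * loop doing stat() over a list of files for ten seconds, then
 * the total loop count is reported.  Names and details are
 * guesses, not the original code.
 *
 * Build: gcc -O2 -pthread stattest.c -o stattest
 * Usage: ./stattest file1 file2 ...
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

#define MAX_THREADS 16          /* assumption: pick your CPU count */
#define TEST_SECONDS 10

static int nfiles;
static char **files;

static void *stat_loop(void *arg)
{
	unsigned long loops = 0;
	struct stat st;
	time_t end = time(NULL) + TEST_SECONDS;

	/* Hot-cache stat() loop: no IO after the first pass. */
	while (time(NULL) < end) {
		for (int i = 0; i < nfiles; i++)
			stat(files[i], &st);
		loops++;
	}
	return (void *)loops;
}

int main(int argc, char **argv)
{
	pthread_t threads[MAX_THREADS];
	unsigned long total = 0;

	if (argc < 2) {
		fprintf(stderr, "usage: %s file...\n", argv[0]);
		return 1;
	}
	nfiles = argc - 1;
	files = argv + 1;

	for (int i = 0; i < MAX_THREADS; i++)
		pthread_create(&threads[i], NULL, stat_loop, NULL);

	for (int i = 0; i < MAX_THREADS; i++) {
		void *ret;
		pthread_join(threads[i], &ret);
		total += (unsigned long)ret;
	}

	/* Higher is better: pure hot-cache path lookups, no IO. */
	printf("%lu total loops in %d seconds\n", total, TEST_SECONDS);
	return 0;
}

Presumably pointing it at paths that include a symlink component would
be one way to see the fall-out-of-RCU-mode cost mentioned below, though
that usage is my inference rather than something stated here.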
But that stupid thing was actually ridiculously useful, not because
the load is meaningful, but because it ended up showing that we had
horribly fragile behavior when we had contention on the dentry lock.
(That got fixed, although it still ends up sucking when we fall out of
RCU mode - but with Al's upcoming patches that should hopefully be
really really unusual rather than "every time we see a symlink" etc.)

               Linus