2013-06-01 03:10:34

by Jeff Layton

Subject: [PATCH v1 00/11] locks: scalability improvements for file locking

Executive summary (tl;dr version): This patchset represents an overhaul
of the file locking code with an aim toward improving its scalability
and making the code a bit easier to understand.

Longer version:

When the BKL was finally ripped out of the kernel in 2010, the strategy
taken for the file locking code was to simply turn it into a new
file_lock_lock spinlock. That was an expedient way to deal with the file
locking code at the time, but having a single giant spinlock around all
of this code is clearly not great for scalability. Red Hat has bug
reports going back to the 2.6.18 era that point to BKL scalability
problems in the file locking code, and the file_lock_lock suffers from
the same issues.

This patchset is my first attempt to make this code less dependent on
global locking. The main change is to switch most of the file locking
code to be protected by the inode->i_lock instead of the file_lock_lock.

While that works for most things, there are a couple of global data
structures (lists in the current code) that need a global lock to
protect them. So we still need a global lock in order to deal with
those. The remaining patches are intended to make that global locking
less painful. The big gain is made by turning the blocked_list into a
hashtable, which greatly speeds up the deadlock detection code.

I rolled a couple of small programs in order to test this code. The
first one just forks off 128 children and has them lock and unlock the
same file 10k times. Running this under "time" against a file on tmpfs
gives typical values like this:

Unpatched (3.10-rc3-ish):
real 0m5.283s
user 0m0.380s
sys 0m20.469s

Patched (same base kernel):
real 0m5.099s
user 0m0.478s
sys 0m19.662s

...so there seems to be a modest performance gain in this test. I think
that's almost entirely due to the hashtable conversion and to the
optimization that avoids repeatedly removing and re-adding blocked locks
to the global lists. Note that with this code we have to take two
spinlocks instead of just one, and that has some performance impact of
its own, so it eats up some of the real performance gain from the
hashtable conversion.
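
For reference, here is a minimal sketch of the sort of contended-lock
test described above. It is not the actual program used to generate the
numbers; the lock type (whole-file POSIX write locks via fcntl), the
child count, the file location, and the minimal error handling are all
assumptions:

#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC	128
#define NITER	10000

int main(void)
{
	int i, j, fd;
	struct flock fl = {
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 0,		/* lock the whole file */
	};

	/* all children inherit this fd, but each is a separate lock owner */
	fd = open("/tmp/lockfile", O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (i = 0; i < NPROC; i++) {
		if (fork() == 0) {
			for (j = 0; j < NITER; j++) {
				fl.l_type = F_WRLCK;
				if (fcntl(fd, F_SETLKW, &fl) < 0)
					_exit(1);
				fl.l_type = F_UNLCK;
				if (fcntl(fd, F_SETLK, &fl) < 0)
					_exit(1);
			}
			_exit(0);
		}
	}

	for (i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}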

The next test just forks off a bunch of children, each of which creates
its own file and then locks and unlocks it 20k times. Obviously, the
locks in this case are uncontended. Running that under "time" typically
gives these rough numbers:

Unpatched (3.10-rc3-ish):
real 0m8.836s
user 0m1.018s
sys 0m34.094s

Patched (same base kernel):
real 0m4.965s
user 0m1.043s
sys 0m18.651s

This test shows the real benefit of moving most of this code to the
i_lock: the run time is almost cut in half. With these changes, locking
different inodes requires very little serialization.
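
A correspondingly minimal sketch of this uncontended variant might look
like the following (again hypothetical; the child count, file naming,
and lock type are assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC	128
#define NITER	20000

int main(void)
{
	int i, j;

	for (i = 0; i < NPROC; i++) {
		if (fork() == 0) {
			char name[64];
			struct flock fl = { .l_whence = SEEK_SET };
			int fd;

			/* each child locks a file that nobody else touches */
			snprintf(name, sizeof(name), "/tmp/lockfile.%d", getpid());
			fd = open(name, O_RDWR | O_CREAT, 0644);
			if (fd < 0)
				_exit(1);
			for (j = 0; j < NITER; j++) {
				fl.l_type = F_WRLCK;
				if (fcntl(fd, F_SETLKW, &fl) < 0)
					_exit(1);
				fl.l_type = F_UNLCK;
				if (fcntl(fd, F_SETLK, &fl) < 0)
					_exit(1);
			}
			_exit(0);
		}
	}

	for (i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}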

If people know of other file locking performance tests, I'd be happy to
try them out too. It's possible that these changes make some workloads
slower; if so, it would be helpful to know what those workloads are so
they can be addressed.

This is not the first attempt at doing this. The conversion to the
i_lock was originally attempted by Bruce Fields a few years ago. His
approach was NAK'ed since it involved ripping out the deadlock
detection. People also really seem to like /proc/locks for debugging, so
keeping that in is probably worthwhile.

There's more work to be done in this area; this patchset is just a
start. There's a horrible thundering herd problem when a blocking lock
is released, for instance. There was also interest at this year's LSF in
solving the goofy "unlock on any close" POSIX lock semantics. I think
this patchset will help lay the groundwork for those changes as well.

Comments and suggestions welcome.

Jeff Layton (11):
cifs: use posix_unblock_lock instead of locks_delete_block
locks: make generic_add_lease and generic_delete_lease static
locks: comment cleanups and clarifications
locks: make "added" in __posix_lock_file a bool
locks: encapsulate the fl_link list handling
locks: convert to i_lock to protect i_flock list
locks: only pull entries off of blocked_list when they are really unblocked
locks: convert fl_link to a hlist_node
locks: turn the blocked_list into a hashtable
locks: add a new "lm_owner_key" lock operation
locks: give the blocked_hash its own spinlock

Documentation/filesystems/Locking | 27 +++-
fs/afs/flock.c | 5 +-
fs/ceph/locks.c | 2 +-
fs/ceph/mds_client.c | 8 +-
fs/cifs/cifsfs.c | 2 +-
fs/cifs/file.c | 15 +-
fs/gfs2/file.c | 2 +-
fs/lockd/svclock.c | 12 ++
fs/lockd/svcsubs.c | 12 +-
fs/locks.c | 254 +++++++++++++++++++++++++------------
fs/nfs/delegation.c | 11 +-
fs/nfs/nfs4state.c | 8 +-
fs/nfsd/nfs4state.c | 8 +-
include/linux/fs.h | 25 +---
14 files changed, 249 insertions(+), 142 deletions(-)


2013-06-01 03:08:21

by Jeff Layton

Subject: [PATCH v1 03/11] locks: comment cleanups and clarifications

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 24 +++++++++++++++++++-----
include/linux/fs.h | 6 ++++++
2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index e3140b8..a7d2253 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -150,6 +150,16 @@ static int target_leasetype(struct file_lock *fl)
int leases_enable = 1;
int lease_break_time = 45;

+/*
+ * The i_flock list is ordered by:
+ *
+ * 1) lock type -- FL_LEASEs first, then FL_FLOCK, and finally FL_POSIX
+ * 2) lock owner
+ * 3) lock range start
+ * 4) lock range end
+ *
+ * Obviously, the last two criteria only matter for POSIX locks.
+ */
#define for_each_lock(inode, lockp) \
for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)

@@ -806,6 +816,11 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
}

lock_flocks();
+ /*
+ * New lock request. Walk all POSIX locks and look for conflicts. If
+ * there are any, either return -EAGAIN or put the request on the
+ * blocker's list of waiters.
+ */
if (request->fl_type != F_UNLCK) {
for_each_lock(inode, before) {
fl = *before;
@@ -844,7 +859,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
before = &fl->fl_next;
}

- /* Process locks with this owner. */
+ /* Process locks with this owner. */
while ((fl = *before) && posix_same_owner(request, fl)) {
/* Detect adjacent or overlapping regions (if same lock type)
*/
@@ -930,10 +945,9 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
}

/*
- * The above code only modifies existing locks in case of
- * merging or replacing. If new lock(s) need to be inserted
- * all modifications are done bellow this, so it's safe yet to
- * bail out.
+ * The above code only modifies existing locks in case of merging or
+ * replacing. If new lock(s) need to be inserted all modifications are
+ * done below this, so it's safe yet to bail out.
*/
error = -ENOLCK; /* "no luck" */
if (right && left == right && !new_fl2)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b9d7816..ae377e9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -926,6 +926,12 @@ int locks_in_grace(struct net *);
/* that will die - we need it for nfs_lock_info */
#include <linux/nfs_fs_i.h>

+/*
+ * struct file_lock represents a generic "file lock". It's used to represent
+ * POSIX byte range locks, BSD (flock) locks, and leases. It's important to
+ * note that the same struct is used to represent both a request for a lock and
+ * the lock itself, but the same object is never used for both.
+ */
struct file_lock {
struct file_lock *fl_next; /* singly linked list for this inode */
struct list_head fl_link; /* doubly linked list of all locks */
--
1.7.1

2013-06-01 03:09:03

by Jeff Layton

Subject: [PATCH v1 01/11] cifs: use posix_unblock_lock instead of locks_delete_block

commit 66189be74 (CIFS: Fix VFS lock usage for oplocked files) exported
the locks_delete_block symbol. There's already an exported helper
function that provides this capability however, so make cifs use that
instead and turn locks_delete_block back into a static function.

Note that if fl->fl_next == NULL then this lock has already been through
locks_delete_block(), so we should be OK to ignore an ENOENT error here
and simply not retry the lock.

Cc: Pavel Shilovsky <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/cifs/file.c | 2 +-
fs/locks.c | 3 +--
include/linux/fs.h | 5 -----
3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 48b29d2..44a4f18 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -999,7 +999,7 @@ try_again:
rc = wait_event_interruptible(flock->fl_wait, !flock->fl_next);
if (!rc)
goto try_again;
- locks_delete_block(flock);
+ posix_unblock_lock(file, flock);
}
return rc;
}
diff --git a/fs/locks.c b/fs/locks.c
index cb424a4..7a02064 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -496,13 +496,12 @@ static void __locks_delete_block(struct file_lock *waiter)

/*
*/
-void locks_delete_block(struct file_lock *waiter)
+static void locks_delete_block(struct file_lock *waiter)
{
lock_flocks();
__locks_delete_block(waiter);
unlock_flocks();
}
-EXPORT_SYMBOL(locks_delete_block);

/* Insert waiter into blocker's block list.
* We use a circular list so that processes can be easily woken up in
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 43db02e..b9d7816 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1006,7 +1006,6 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
extern int lease_modify(struct file_lock **, int);
extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
-extern void locks_delete_block(struct file_lock *waiter);
extern void lock_flocks(void);
extern void unlock_flocks(void);
#else /* !CONFIG_FILE_LOCKING */
@@ -1151,10 +1150,6 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
return 1;
}

-static inline void locks_delete_block(struct file_lock *waiter)
-{
-}
-
static inline void lock_flocks(void)
{
}
--
1.7.1

2013-06-01 03:09:33

by Jeff Layton

Subject: [PATCH v1 04/11] locks: make "added" in __posix_lock_file a bool

...save 3 bytes of stack space.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index a7d2253..cef0e04 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -800,7 +800,8 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
struct file_lock *left = NULL;
struct file_lock *right = NULL;
struct file_lock **before;
- int error, added = 0;
+ int error;
+ bool added = false;

/*
* We may need two file_lock structures for this operation,
@@ -894,7 +895,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
continue;
}
request = fl;
- added = 1;
+ added = true;
}
else {
/* Processing for different lock types is a bit
@@ -905,7 +906,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
if (fl->fl_start > request->fl_end)
break;
if (request->fl_type == F_UNLCK)
- added = 1;
+ added = true;
if (fl->fl_start < request->fl_start)
left = fl;
/* If the next lock in the list has a higher end
@@ -935,7 +936,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
locks_release_private(fl);
locks_copy_private(fl, request);
request = fl;
- added = 1;
+ added = true;
}
}
/* Go on to next lock.
--
1.7.1

2013-06-01 03:09:21

by Jeff Layton

Subject: [PATCH v1 05/11] locks: encapsulate the fl_link list handling

Move the fl_link list handling routines into a separate set of helpers.
Also move the global list handling out of locks_insert_block and into
the caller that ends up triggering it, as that allows us to eliminate
the IS_POSIX check there.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 34 +++++++++++++++++++++++++++++-----
1 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index cef0e04..caca466 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -494,13 +494,38 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
return fl1->fl_owner == fl2->fl_owner;
}

+/* Remove a blocker or lock from one of the global lists */
+static inline void
+locks_insert_global_blocked(struct file_lock *waiter)
+{
+ list_add(&waiter->fl_link, &blocked_list);
+}
+
+static inline void
+locks_delete_global_blocked(struct file_lock *waiter)
+{
+ list_del_init(&waiter->fl_link);
+}
+
+static inline void
+locks_insert_global_locks(struct file_lock *waiter)
+{
+ list_add_tail(&waiter->fl_link, &file_lock_list);
+}
+
+static inline void
+locks_delete_global_locks(struct file_lock *waiter)
+{
+ list_del_init(&waiter->fl_link);
+}
+
/* Remove waiter from blocker's block list.
* When blocker ends up pointing to itself then the list is empty.
*/
static void __locks_delete_block(struct file_lock *waiter)
{
list_del_init(&waiter->fl_block);
- list_del_init(&waiter->fl_link);
+ locks_delete_global_blocked(waiter);
waiter->fl_next = NULL;
}

@@ -524,8 +549,6 @@ static void locks_insert_block(struct file_lock *blocker,
BUG_ON(!list_empty(&waiter->fl_block));
list_add_tail(&waiter->fl_block, &blocker->fl_block);
waiter->fl_next = blocker;
- if (IS_POSIX(blocker))
- list_add(&waiter->fl_link, &blocked_list);
}

/* Wake up processes blocked waiting for blocker.
@@ -552,7 +575,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
*/
static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
{
- list_add(&fl->fl_link, &file_lock_list);
+ locks_insert_global_locks(fl);

fl->fl_nspid = get_pid(task_tgid(current));

@@ -573,7 +596,7 @@ static void locks_delete_lock(struct file_lock **thisfl_p)

*thisfl_p = fl->fl_next;
fl->fl_next = NULL;
- list_del_init(&fl->fl_link);
+ locks_delete_global_locks(fl);

if (fl->fl_nspid) {
put_pid(fl->fl_nspid);
@@ -839,6 +862,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
goto out;
error = FILE_LOCK_DEFERRED;
locks_insert_block(fl, request);
+ locks_insert_global_blocked(request);
goto out;
}
}
--
1.7.1

2013-06-01 03:09:45

by Jeff Layton

Subject: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

There's no reason we have to protect the blocked_hash and file_lock_list
with the same spinlock. With the tests I have, breaking it in two gives
a barely measurable performance benefit, but it seems reasonable to make
this locking as granular as possible.

Signed-off-by: Jeff Layton <[email protected]>
---
Documentation/filesystems/Locking | 16 ++++++++--------
fs/locks.c | 17 ++++++++++-------
2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index ee351ac..8d8d040 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -359,20 +359,20 @@ prototypes:

locking rules:

- inode->i_lock file_lock_lock may block
-lm_compare_owner: yes maybe no
-lm_owner_key yes yes no
-lm_notify: yes no no
-lm_grant: no no no
-lm_break: yes no no
-lm_change yes no no
+ inode->i_lock blocked_hash_lock may block
+lm_compare_owner: yes maybe no
+lm_owner_key yes yes no
+lm_notify: yes no no
+lm_grant: no no no
+lm_break: yes no no
+lm_change yes no no

->lm_compare_owner and ->lm_owner_key are generally called with
*an* inode->i_lock held. It may not be the i_lock of the inode
associated with either file_lock argument! This is the case with deadlock
detection, since the code has to chase down the owners of locks that may
be entirely unrelated to the one on which the lock is being acquired.
-For deadlock detection however, the file_lock_lock is also held. The
+For deadlock detection however, the blocked_hash_lock is also held. The
fact that these locks are held ensures that the file_locks do not
disappear out from under you while doing the comparison or generating an
owner key.
diff --git a/fs/locks.c b/fs/locks.c
index 8219187..520f32b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -172,12 +172,13 @@ int lease_break_time = 45;
*/
#define BLOCKED_HASH_BITS 7

+static DEFINE_SPINLOCK(blocked_hash_lock);
static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);

+static DEFINE_SPINLOCK(file_lock_lock);
static HLIST_HEAD(file_lock_list);

/* Protects the file_lock_list and the blocked_hash */
-static DEFINE_SPINLOCK(file_lock_lock);

static struct kmem_cache *filelock_cache __read_mostly;

@@ -503,17 +504,17 @@ posix_owner_key(struct file_lock *fl)
static inline void
locks_insert_global_blocked(struct file_lock *waiter)
{
- spin_lock(&file_lock_lock);
+ spin_lock(&blocked_hash_lock);
hash_add(blocked_hash, &waiter->fl_link, posix_owner_key(waiter));
- spin_unlock(&file_lock_lock);
+ spin_unlock(&blocked_hash_lock);
}

static inline void
locks_delete_global_blocked(struct file_lock *waiter)
{
- spin_lock(&file_lock_lock);
+ spin_lock(&blocked_hash_lock);
hash_del(&waiter->fl_link);
- spin_unlock(&file_lock_lock);
+ spin_unlock(&blocked_hash_lock);
}

static inline void
@@ -739,7 +740,7 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
int i = 0;
int ret = 0;

- spin_lock(&file_lock_lock);
+ spin_lock(&blocked_hash_lock);
while ((block_fl = what_owner_is_waiting_for(block_fl))) {
if (i++ > MAX_DEADLK_ITERATIONS)
break;
@@ -748,7 +749,7 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
break;
}
}
- spin_unlock(&file_lock_lock);
+ spin_unlock(&blocked_hash_lock);
return ret;
}

@@ -2300,10 +2301,12 @@ static int locks_show(struct seq_file *f, void *v)

lock_get_status(f, fl, *((loff_t *)f->private), "");

+ spin_lock(&blocked_hash_lock);
hash_for_each(blocked_hash, bkt, bfl, fl_link) {
if (bfl->fl_next == fl)
lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
}
+ spin_unlock(&blocked_hash_lock);

return 0;
}
--
1.7.1

2013-06-01 03:09:56

by Jeff Layton

Subject: [PATCH v1 06/11] locks: convert to i_lock to protect i_flock list

Having a global lock that protects all of this code is a clear
scalability problem. Instead, move most of this code to be protected by
the inode->i_lock.

The exceptions are the global lists that file_lock->fl_link sits on.
Those still need a global lock of some sort, so wrap just those accesses
in the file_lock_lock spinlock.

Note that this requires a small change to the /proc/locks code. Instead
of walking the fl_block list to look for locks blocked on the current
lock, we must instead walk the global blocker list and skip any that
aren't blocked on the current lock. Otherwise, we'd need to take the
i_lock on each inode as we go and that would create a lock inversion
problem.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
Documentation/filesystems/Locking | 23 +++++--
fs/afs/flock.c | 5 +-
fs/ceph/locks.c | 2 +-
fs/ceph/mds_client.c | 8 +-
fs/cifs/cifsfs.c | 2 +-
fs/cifs/file.c | 13 ++--
fs/gfs2/file.c | 2 +-
fs/lockd/svcsubs.c | 12 ++--
fs/locks.c | 121 ++++++++++++++++++++-----------------
fs/nfs/delegation.c | 11 ++--
fs/nfs/nfs4state.c | 8 +-
fs/nfsd/nfs4state.c | 8 +-
include/linux/fs.h | 11 ----
13 files changed, 119 insertions(+), 107 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0706d32..13f91ab 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -344,7 +344,7 @@ prototypes:


locking rules:
- file_lock_lock may block
+ inode->i_lock may block
fl_copy_lock: yes no
fl_release_private: maybe no

@@ -357,12 +357,21 @@ prototypes:
int (*lm_change)(struct file_lock **, int);

locking rules:
- file_lock_lock may block
-lm_compare_owner: yes no
-lm_notify: yes no
-lm_grant: no no
-lm_break: yes no
-lm_change yes no
+
+ inode->i_lock file_lock_lock may block
+lm_compare_owner: yes maybe no
+lm_notify: yes no no
+lm_grant: no no no
+lm_break: yes no no
+lm_change yes no no
+
+ ->lm_compare_owner is generally called with *an* inode->i_lock
+held. It may not be the i_lock of the inode for either file_lock being
+compared! This is the case with deadlock detection, since the code has
+to chase down the owners of locks that may be entirely unrelated to the
+one on which the lock is being acquired. For deadlock detection however,
+the file_lock_lock is also held. The locks primarily ensure that neither
+file_lock disappear out from under you while doing the comparison.

--------------------------- buffer_head -----------------------------------
prototypes:
diff --git a/fs/afs/flock.c b/fs/afs/flock.c
index 2497bf3..03fc0d1 100644
--- a/fs/afs/flock.c
+++ b/fs/afs/flock.c
@@ -252,6 +252,7 @@ static void afs_defer_unlock(struct afs_vnode *vnode, struct key *key)
*/
static int afs_do_setlk(struct file *file, struct file_lock *fl)
{
+ struct inode *inode = file_inode(file);
struct afs_vnode *vnode = AFS_FS_I(file->f_mapping->host);
afs_lock_type_t type;
struct key *key = file->private_data;
@@ -273,7 +274,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)

type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;

- lock_flocks();
+ spin_lock(&inode->i_lock);

/* make sure we've got a callback on this file and that our view of the
* data version is up to date */
@@ -420,7 +421,7 @@ given_lock:
afs_vnode_fetch_status(vnode, NULL, key);

error:
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
_leave(" = %d", ret);
return ret;

diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 202dd3d..cd0a664 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -194,7 +194,7 @@ void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count)
* Encode the flock and fcntl locks for the given inode into the pagelist.
* Format is: #fcntl locks, sequential fcntl locks, #flock locks,
* sequential flock locks.
- * Must be called with lock_flocks() already held.
+ * Must be called with inode->i_lock already held.
* If we encounter more of a specific lock type than expected,
* we return the value 1.
*/
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 4f22671..ae621b5 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2482,13 +2482,13 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,

ceph_pagelist_set_cursor(pagelist, &trunc_point);
do {
- lock_flocks();
+ spin_lock(&inode->i_lock);
ceph_count_locks(inode, &num_fcntl_locks,
&num_flock_locks);
rec.v2.flock_len = (2*sizeof(u32) +
(num_fcntl_locks+num_flock_locks) *
sizeof(struct ceph_filelock));
- unlock_flocks();
+ spin_unlock(&inode->i_lock);

/* pre-alloc pagelist */
ceph_pagelist_truncate(pagelist, &trunc_point);
@@ -2499,12 +2499,12 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,

/* encode locks */
if (!err) {
- lock_flocks();
+ spin_lock(&inode->i_lock);
err = ceph_encode_locks(inode,
pagelist,
num_fcntl_locks,
num_flock_locks);
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
}
} while (err == -ENOSPC);
} else {
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 72e4efe..29952d5 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -768,7 +768,7 @@ static loff_t cifs_llseek(struct file *file, loff_t offset, int whence)

static int cifs_setlease(struct file *file, long arg, struct file_lock **lease)
{
- /* note that this is called by vfs setlease with lock_flocks held
+ /* note that this is called by vfs setlease with i_lock held
to protect *lease from going away */
struct inode *inode = file_inode(file);
struct cifsFileInfo *cfile = file->private_data;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 44a4f18..0dd10cd 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1092,6 +1092,7 @@ struct lock_to_push {
static int
cifs_push_posix_locks(struct cifsFileInfo *cfile)
{
+ struct inode *inode = cfile->dentry->d_inode;
struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
struct file_lock *flock, **before;
unsigned int count = 0, i = 0;
@@ -1102,12 +1103,12 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)

xid = get_xid();

- lock_flocks();
- cifs_for_each_lock(cfile->dentry->d_inode, before) {
+ spin_lock(&inode->i_lock);
+ cifs_for_each_lock(inode, before) {
if ((*before)->fl_flags & FL_POSIX)
count++;
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);

INIT_LIST_HEAD(&locks_to_send);

@@ -1126,8 +1127,8 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
}

el = locks_to_send.next;
- lock_flocks();
- cifs_for_each_lock(cfile->dentry->d_inode, before) {
+ spin_lock(&inode->i_lock);
+ cifs_for_each_lock(inode, before) {
flock = *before;
if ((flock->fl_flags & FL_POSIX) == 0)
continue;
@@ -1152,7 +1153,7 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
lck->offset = flock->fl_start;
el = el->next;
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);

list_for_each_entry_safe(lck, tmp, &locks_to_send, llist) {
int stored_rc;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index acd1676..9e634e0 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -889,7 +889,7 @@ out_uninit:
* cluster; until we do, disable leases (by just returning -EINVAL),
* unless the administrator has requested purely local locking.
*
- * Locking: called under lock_flocks
+ * Locking: called under i_lock
*
* Returns: errno
*/
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index 97e8741..dc5c759 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -169,7 +169,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,

again:
file->f_locks = 0;
- lock_flocks(); /* protects i_flock list */
+ spin_lock(&inode->i_lock);
for (fl = inode->i_flock; fl; fl = fl->fl_next) {
if (fl->fl_lmops != &nlmsvc_lock_operations)
continue;
@@ -181,7 +181,7 @@ again:
if (match(lockhost, host)) {
struct file_lock lock = *fl;

- unlock_flocks();
+ spin_unlock(&inode->i_lock);
lock.fl_type = F_UNLCK;
lock.fl_start = 0;
lock.fl_end = OFFSET_MAX;
@@ -193,7 +193,7 @@ again:
goto again;
}
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);

return 0;
}
@@ -228,14 +228,14 @@ nlm_file_inuse(struct nlm_file *file)
if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares)
return 1;

- lock_flocks();
+ spin_lock(&inode->i_lock);
for (fl = inode->i_flock; fl; fl = fl->fl_next) {
if (fl->fl_lmops == &nlmsvc_lock_operations) {
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
return 1;
}
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
file->f_locks = 0;
return 0;
}
diff --git a/fs/locks.c b/fs/locks.c
index caca466..055c06c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -165,22 +165,9 @@ int lease_break_time = 45;

static LIST_HEAD(file_lock_list);
static LIST_HEAD(blocked_list);
-static DEFINE_SPINLOCK(file_lock_lock);
-
-/*
- * Protects the two list heads above, plus the inode->i_flock list
- */
-void lock_flocks(void)
-{
- spin_lock(&file_lock_lock);
-}
-EXPORT_SYMBOL_GPL(lock_flocks);

-void unlock_flocks(void)
-{
- spin_unlock(&file_lock_lock);
-}
-EXPORT_SYMBOL_GPL(unlock_flocks);
+/* Protects the two list heads above */
+static DEFINE_SPINLOCK(file_lock_lock);

static struct kmem_cache *filelock_cache __read_mostly;

@@ -498,25 +485,33 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
static inline void
locks_insert_global_blocked(struct file_lock *waiter)
{
+ spin_lock(&file_lock_lock);
list_add(&waiter->fl_link, &blocked_list);
+ spin_unlock(&file_lock_lock);
}

static inline void
locks_delete_global_blocked(struct file_lock *waiter)
{
+ spin_lock(&file_lock_lock);
list_del_init(&waiter->fl_link);
+ spin_unlock(&file_lock_lock);
}

static inline void
locks_insert_global_locks(struct file_lock *waiter)
{
+ spin_lock(&file_lock_lock);
list_add_tail(&waiter->fl_link, &file_lock_list);
+ spin_unlock(&file_lock_lock);
}

static inline void
locks_delete_global_locks(struct file_lock *waiter)
{
+ spin_lock(&file_lock_lock);
list_del_init(&waiter->fl_link);
+ spin_unlock(&file_lock_lock);
}

/* Remove waiter from blocker's block list.
@@ -533,9 +528,11 @@ static void __locks_delete_block(struct file_lock *waiter)
*/
static void locks_delete_block(struct file_lock *waiter)
{
- lock_flocks();
+ struct inode *inode = file_inode(waiter->fl_file);
+
+ spin_lock(&inode->i_lock);
__locks_delete_block(waiter);
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
}

/* Insert waiter into blocker's block list.
@@ -657,8 +654,9 @@ void
posix_test_lock(struct file *filp, struct file_lock *fl)
{
struct file_lock *cfl;
+ struct inode *inode = file_inode(filp);

- lock_flocks();
+ spin_lock(&inode->i_lock);
for (cfl = file_inode(filp)->i_flock; cfl; cfl = cfl->fl_next) {
if (!IS_POSIX(cfl))
continue;
@@ -671,7 +669,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
fl->fl_pid = pid_vnr(cfl->fl_nspid);
} else
fl->fl_type = F_UNLCK;
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
return;
}
EXPORT_SYMBOL(posix_test_lock);
@@ -719,14 +717,19 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
struct file_lock *block_fl)
{
int i = 0;
+ int ret = 0;

+ spin_lock(&file_lock_lock);
while ((block_fl = what_owner_is_waiting_for(block_fl))) {
if (i++ > MAX_DEADLK_ITERATIONS)
- return 0;
- if (posix_same_owner(caller_fl, block_fl))
- return 1;
+ break;
+ if (posix_same_owner(caller_fl, block_fl)) {
+ ++ret;
+ break;
+ }
}
- return 0;
+ spin_unlock(&file_lock_lock);
+ return ret;
}

/* Try to create a FLOCK lock on filp. We always insert new FLOCK locks
@@ -750,7 +753,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
return -ENOMEM;
}

- lock_flocks();
+ spin_lock(&inode->i_lock);
if (request->fl_flags & FL_ACCESS)
goto find_conflict;

@@ -780,9 +783,9 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
* give it the opportunity to lock the file.
*/
if (found) {
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
cond_resched();
- lock_flocks();
+ spin_lock(&inode->i_lock);
}

find_conflict:
@@ -809,7 +812,7 @@ find_conflict:
error = 0;

out:
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
if (new_fl)
locks_free_lock(new_fl);
return error;
@@ -839,7 +842,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
new_fl2 = locks_alloc_lock();
}

- lock_flocks();
+ spin_lock(&inode->i_lock);
/*
* New lock request. Walk all POSIX locks and look for conflicts. If
* there are any, either return -EAGAIN or put the request on the
@@ -1012,7 +1015,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
locks_wake_up_blocks(left);
}
out:
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
/*
* Free any unused locks.
*/
@@ -1087,14 +1090,14 @@ int locks_mandatory_locked(struct inode *inode)
/*
* Search the lock list for this inode for any POSIX locks.
*/
- lock_flocks();
+ spin_lock(&inode->i_lock);
for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
if (!IS_POSIX(fl))
continue;
if (fl->fl_owner != owner)
break;
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
return fl ? -EAGAIN : 0;
}

@@ -1237,7 +1240,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
if (IS_ERR(new_fl))
return PTR_ERR(new_fl);

- lock_flocks();
+ spin_lock(&inode->i_lock);

time_out_leases(inode);

@@ -1287,10 +1290,10 @@ restart:
break_time++;
}
locks_insert_block(flock, new_fl);
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
error = wait_event_interruptible_timeout(new_fl->fl_wait,
!new_fl->fl_next, break_time);
- lock_flocks();
+ spin_lock(&inode->i_lock);
__locks_delete_block(new_fl);
if (error >= 0) {
if (error == 0)
@@ -1308,7 +1311,7 @@ restart:
}

out:
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
locks_free_lock(new_fl);
return error;
}
@@ -1361,9 +1364,10 @@ EXPORT_SYMBOL(lease_get_mtime);
int fcntl_getlease(struct file *filp)
{
struct file_lock *fl;
+ struct inode *inode = file_inode(filp);
int type = F_UNLCK;

- lock_flocks();
+ spin_lock(&inode->i_lock);
time_out_leases(file_inode(filp));
for (fl = file_inode(filp)->i_flock; fl && IS_LEASE(fl);
fl = fl->fl_next) {
@@ -1372,7 +1376,7 @@ int fcntl_getlease(struct file *filp)
break;
}
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
return type;
}

@@ -1466,7 +1470,7 @@ static int generic_delete_lease(struct file *filp, struct file_lock **flp)
* The (input) flp->fl_lmops->lm_break function is required
* by break_lease().
*
- * Called with file_lock_lock held.
+ * Called with inode->i_lock held.
*/
int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
{
@@ -1535,11 +1539,12 @@ static int __vfs_setlease(struct file *filp, long arg, struct file_lock **lease)

int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
{
+ struct inode *inode = file_inode(filp);
int error;

- lock_flocks();
+ spin_lock(&inode->i_lock);
error = __vfs_setlease(filp, arg, lease);
- unlock_flocks();
+ spin_unlock(&inode->i_lock);

return error;
}
@@ -1557,6 +1562,7 @@ static int do_fcntl_delete_lease(struct file *filp)
static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
{
struct file_lock *fl, *ret;
+ struct inode *inode = file_inode(filp);
struct fasync_struct *new;
int error;

@@ -1570,10 +1576,10 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
return -ENOMEM;
}
ret = fl;
- lock_flocks();
+ spin_lock(&inode->i_lock);
error = __vfs_setlease(filp, arg, &ret);
if (error) {
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
locks_free_lock(fl);
goto out_free_fasync;
}
@@ -1590,7 +1596,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
new = NULL;

error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
- unlock_flocks();
+ spin_unlock(&inode->i_lock);

out_free_fasync:
if (new)
@@ -2114,7 +2120,7 @@ void locks_remove_flock(struct file *filp)
fl.fl_ops->fl_release_private(&fl);
}

- lock_flocks();
+ spin_lock(&inode->i_lock);
before = &inode->i_flock;

while ((fl = *before) != NULL) {
@@ -2132,7 +2138,7 @@ void locks_remove_flock(struct file *filp)
}
before = &fl->fl_next;
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
}

/**
@@ -2145,14 +2151,15 @@ void locks_remove_flock(struct file *filp)
int
posix_unblock_lock(struct file *filp, struct file_lock *waiter)
{
+ struct inode *inode = file_inode(filp);
int status = 0;

- lock_flocks();
+ spin_lock(&inode->i_lock);
if (waiter->fl_next)
__locks_delete_block(waiter);
else
status = -ENOENT;
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
return status;
}

@@ -2257,8 +2264,10 @@ static int locks_show(struct seq_file *f, void *v)

lock_get_status(f, fl, *((loff_t *)f->private), "");

- list_for_each_entry(bfl, &fl->fl_block, fl_block)
- lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
+ list_for_each_entry(bfl, &blocked_list, fl_link) {
+ if (bfl->fl_next == fl)
+ lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
+ }

return 0;
}
@@ -2267,7 +2276,7 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
{
loff_t *p = f->private;

- lock_flocks();
+ spin_lock(&file_lock_lock);
*p = (*pos + 1);
return seq_list_start(&file_lock_list, *pos);
}
@@ -2281,7 +2290,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)

static void locks_stop(struct seq_file *f, void *v)
{
- unlock_flocks();
+ spin_unlock(&file_lock_lock);
}

static const struct seq_operations locks_seq_operations = {
@@ -2328,7 +2337,8 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
{
struct file_lock *fl;
int result = 1;
- lock_flocks();
+
+ spin_lock(&inode->i_lock);
for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
if (IS_POSIX(fl)) {
if (fl->fl_type == F_RDLCK)
@@ -2345,7 +2355,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
result = 0;
break;
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
return result;
}

@@ -2368,7 +2378,8 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
{
struct file_lock *fl;
int result = 1;
- lock_flocks();
+
+ spin_lock(&inode->i_lock);
for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
if (IS_POSIX(fl)) {
if ((fl->fl_end < start) || (fl->fl_start > (start + len)))
@@ -2383,7 +2394,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
result = 0;
break;
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
return result;
}

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 57db324..43ee7f9 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -73,20 +73,21 @@ static int nfs_delegation_claim_locks(struct nfs_open_context *ctx, struct nfs4_
if (inode->i_flock == NULL)
goto out;

- /* Protect inode->i_flock using the file locks lock */
- lock_flocks();
+ /* Protect inode->i_flock using the i_lock */
+ spin_lock(&inode->i_lock);
for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
continue;
if (nfs_file_open_context(fl->fl_file) != ctx)
continue;
- unlock_flocks();
+ /* FIXME: safe to drop lock here while walking list? */
+ spin_unlock(&inode->i_lock);
status = nfs4_lock_delegation_recall(fl, state, stateid);
if (status < 0)
goto out;
- lock_flocks();
+ spin_lock(&inode->i_lock);
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
out:
return status;
}
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 1fab140..ff10b4a 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1373,13 +1373,13 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
/* Guard against delegation returns and new lock/unlock calls */
down_write(&nfsi->rwsem);
/* Protect inode->i_flock using the BKL */
- lock_flocks();
+ spin_lock(&inode->i_lock);
for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
continue;
if (nfs_file_open_context(fl->fl_file)->state != state)
continue;
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
status = ops->recover_lock(state, fl);
switch (status) {
case 0:
@@ -1406,9 +1406,9 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
/* kill_proc(fl->fl_pid, SIGLOST, 1); */
status = 0;
}
- lock_flocks();
+ spin_lock(&inode->i_lock);
}
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
out:
up_write(&nfsi->rwsem);
return status;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 316ec84..f170518 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2645,13 +2645,13 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)

list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);

- /* only place dl_time is set. protected by lock_flocks*/
+ /* Only place dl_time is set; protected by i_lock: */
dp->dl_time = get_seconds();

nfsd4_cb_recall(dp);
}

-/* Called from break_lease() with lock_flocks() held. */
+/* Called from break_lease() with i_lock held. */
static void nfsd_break_deleg_cb(struct file_lock *fl)
{
struct nfs4_file *fp = (struct nfs4_file *)fl->fl_owner;
@@ -4520,7 +4520,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
struct inode *inode = filp->fi_inode;
int status = 0;

- lock_flocks();
+ spin_lock(&inode->i_lock);
for (flpp = &inode->i_flock; *flpp != NULL; flpp = &(*flpp)->fl_next) {
if ((*flpp)->fl_owner == (fl_owner_t)lowner) {
status = 1;
@@ -4528,7 +4528,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
}
}
out:
- unlock_flocks();
+ spin_unlock(&inode->i_lock);
return status;
}

diff --git a/include/linux/fs.h b/include/linux/fs.h
index ae377e9..ccb44ea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1012,8 +1012,6 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
extern int lease_modify(struct file_lock **, int);
extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
-extern void lock_flocks(void);
-extern void unlock_flocks(void);
#else /* !CONFIG_FILE_LOCKING */
static inline int fcntl_getlk(struct file *file, struct flock __user *user)
{
@@ -1155,15 +1153,6 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
{
return 1;
}
-
-static inline void lock_flocks(void)
-{
-}
-
-static inline void unlock_flocks(void)
-{
-}
-
#endif /* !CONFIG_FILE_LOCKING */


--
1.7.1

2013-06-01 03:10:25

by Jeff Layton

Subject: [PATCH v1 09/11] locks: turn the blocked_list into a hashtable

Break up the blocked_list into a hashtable, using the fl_owner as the
key. Hashing keeps the per-bucket chains short, which greatly speeds up
the searches done by the deadlock detection code.

Note that the initial implementation assumes that hashing on fl_owner is
sufficient. In most cases it should be, with the notable exception being
server-side lockd, which compares ownership using a tuple of the
nlm_host and the pid sent in the lock request. So, this may degrade to a
single hash bucket when you only have a single NFS client. That will be
addressed in a later patch.

The careful observer may note that this patch leaves the file_lock_list
alone. There's much less of a case for turning the file_lock_list into a
hashtable. The only user of that list is the code that generates
/proc/locks, and it always walks the entire list.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 24 ++++++++++++++++++------
1 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 5ed056b..0d030ce 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -126,6 +126,7 @@
#include <linux/time.h>
#include <linux/rcupdate.h>
#include <linux/pid_namespace.h>
+#include <linux/hashtable.h>

#include <asm/uaccess.h>

@@ -163,10 +164,19 @@ int lease_break_time = 45;
#define for_each_lock(inode, lockp) \
for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)

+/*
+ * By breaking up the blocked locks list into a hashtable, we speed up the
+ * deadlock detection.
+ *
+ * FIXME: make this value scale via some heuristic?
+ */
+#define BLOCKED_HASH_BITS 7
+
+static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
+
static HLIST_HEAD(file_lock_list);
-static HLIST_HEAD(blocked_list);

-/* Protects the two list heads above */
+/* Protects the file_lock_list and the blocked_hash */
static DEFINE_SPINLOCK(file_lock_lock);

static struct kmem_cache *filelock_cache __read_mostly;
@@ -486,7 +496,8 @@ static inline void
locks_insert_global_blocked(struct file_lock *waiter)
{
spin_lock(&file_lock_lock);
- hlist_add_head(&waiter->fl_link, &blocked_list);
+ hash_add(blocked_hash, &waiter->fl_link,
+ (unsigned long)waiter->fl_owner);
spin_unlock(&file_lock_lock);
}

@@ -494,7 +505,7 @@ static inline void
locks_delete_global_blocked(struct file_lock *waiter)
{
spin_lock(&file_lock_lock);
- hlist_del_init(&waiter->fl_link);
+ hash_del(&waiter->fl_link);
spin_unlock(&file_lock_lock);
}

@@ -705,7 +716,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
{
struct file_lock *fl, *ret = NULL;

- hlist_for_each_entry(fl, &blocked_list, fl_link) {
+ hash_for_each_possible(blocked_hash, fl, fl_link, (unsigned long)block_fl->fl_owner) {
if (posix_same_owner(fl, block_fl)) {
ret = fl->fl_next;
if (likely(ret))
@@ -2275,13 +2286,14 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,

static int locks_show(struct seq_file *f, void *v)
{
+ int bkt;
struct file_lock *fl, *bfl;

fl = hlist_entry(v, struct file_lock, fl_link);

lock_get_status(f, fl, *((loff_t *)f->private), "");

- hlist_for_each_entry(bfl, &blocked_list, fl_link) {
+ hash_for_each(blocked_hash, bkt, bfl, fl_link) {
if (bfl->fl_next == fl)
lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
}
--
1.7.1

2013-06-01 03:10:13

by Jeff Layton

Subject: [PATCH v1 07/11] locks: only pull entries off of blocked_list when they are really unblocked

Currently, when there is a lot of lock contention the kernel spends an
inordinate amount of time taking blocked locks off of the global
blocked_list and then putting them right back on again. When all of this
code was protected by a single lock, that didn't matter much, but now it
means a lot of file_lock_lock thrashing.

Optimize this a bit by deferring the removal from the blocked_list until
we're either applying or cancelling the lock. By doing this, and using a
lockless list_empty check, we can avoid taking the file_lock_lock in
many cases.

Because the fl_link check is lockless, we must ensure that only the task
that "owns" the request manipulates the fl_link. Also, with this change,
it's possible that we'll see an entry on the blocked_list that has a
NULL fl_next pointer. In that event, just ignore it and continue walking
the list.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 29 +++++++++++++++++++++++------
1 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 055c06c..fc35b9e 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -520,7 +520,6 @@ locks_delete_global_locks(struct file_lock *waiter)
static void __locks_delete_block(struct file_lock *waiter)
{
list_del_init(&waiter->fl_block);
- locks_delete_global_blocked(waiter);
waiter->fl_next = NULL;
}

@@ -704,13 +703,16 @@ EXPORT_SYMBOL(posix_test_lock);
/* Find a lock that the owner of the given block_fl is blocking on. */
static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
{
- struct file_lock *fl;
+ struct file_lock *fl, *ret = NULL;

list_for_each_entry(fl, &blocked_list, fl_link) {
- if (posix_same_owner(fl, block_fl))
- return fl->fl_next;
+ if (posix_same_owner(fl, block_fl)) {
+ ret = fl->fl_next;
+ if (likely(ret))
+ break;
+ }
}
- return NULL;
+ return ret;
}

static int posix_locks_deadlock(struct file_lock *caller_fl,
@@ -865,7 +867,8 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
goto out;
error = FILE_LOCK_DEFERRED;
locks_insert_block(fl, request);
- locks_insert_global_blocked(request);
+ if (list_empty(&request->fl_link))
+ locks_insert_global_blocked(request);
goto out;
}
}
@@ -876,6 +879,16 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
goto out;

/*
+ * Now that we know the request is no longer blocked, we can take it
+ * off the global list. Some callers send down partially initialized
+ * requests, so we only do this if FL_SLEEP is set. Also, avoid taking
+ * the lock if the list is empty, as that indicates a request that
+ * never blocked.
+ */
+ if ((request->fl_flags & FL_SLEEP) && !list_empty(&request->fl_link))
+ locks_delete_global_blocked(request);
+
+ /*
* Find the first old lock with the same owner as the new lock.
*/

@@ -1069,6 +1082,7 @@ int posix_lock_file_wait(struct file *filp, struct file_lock *fl)
continue;

locks_delete_block(fl);
+ locks_delete_global_blocked(fl);
break;
}
return error;
@@ -1147,6 +1161,7 @@ int locks_mandatory_area(int read_write, struct inode *inode,
}

locks_delete_block(&fl);
+ locks_delete_global_blocked(&fl);
break;
}

@@ -1859,6 +1874,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
continue;

locks_delete_block(fl);
+ locks_delete_global_blocked(fl);
break;
}

@@ -2160,6 +2176,7 @@ posix_unblock_lock(struct file *filp, struct file_lock *waiter)
else
status = -ENOENT;
spin_unlock(&inode->i_lock);
+ locks_delete_global_blocked(waiter);
return status;
}

--
1.7.1

2013-06-01 03:10:19

by Jeff Layton

Subject: [PATCH v1 08/11] locks: convert fl_link to a hlist_node

Testing has shown that iterating over the blocked_list for deadlock
detection turns out to be a bottleneck. In order to alleviate that,
begin the process of turning it into a hashtable. We start by turning
the fl_link into a hlist_node and the global lists into hlists. A later
patch will do the conversion of the blocked_list to a hashtable.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 32 ++++++++++++++++----------------
include/linux/fs.h | 2 +-
2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index fc35b9e..5ed056b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -163,8 +163,8 @@ int lease_break_time = 45;
#define for_each_lock(inode, lockp) \
for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)

-static LIST_HEAD(file_lock_list);
-static LIST_HEAD(blocked_list);
+static HLIST_HEAD(file_lock_list);
+static HLIST_HEAD(blocked_list);

/* Protects the two list heads above */
static DEFINE_SPINLOCK(file_lock_lock);
@@ -173,7 +173,7 @@ static struct kmem_cache *filelock_cache __read_mostly;

static void locks_init_lock_heads(struct file_lock *fl)
{
- INIT_LIST_HEAD(&fl->fl_link);
+ INIT_HLIST_NODE(&fl->fl_link);
INIT_LIST_HEAD(&fl->fl_block);
init_waitqueue_head(&fl->fl_wait);
}
@@ -207,7 +207,7 @@ void locks_free_lock(struct file_lock *fl)
{
BUG_ON(waitqueue_active(&fl->fl_wait));
BUG_ON(!list_empty(&fl->fl_block));
- BUG_ON(!list_empty(&fl->fl_link));
+ BUG_ON(!hlist_unhashed(&fl->fl_link));

locks_release_private(fl);
kmem_cache_free(filelock_cache, fl);
@@ -486,7 +486,7 @@ static inline void
locks_insert_global_blocked(struct file_lock *waiter)
{
spin_lock(&file_lock_lock);
- list_add(&waiter->fl_link, &blocked_list);
+ hlist_add_head(&waiter->fl_link, &blocked_list);
spin_unlock(&file_lock_lock);
}

@@ -494,7 +494,7 @@ static inline void
locks_delete_global_blocked(struct file_lock *waiter)
{
spin_lock(&file_lock_lock);
- list_del_init(&waiter->fl_link);
+ hlist_del_init(&waiter->fl_link);
spin_unlock(&file_lock_lock);
}

@@ -502,7 +502,7 @@ static inline void
locks_insert_global_locks(struct file_lock *waiter)
{
spin_lock(&file_lock_lock);
- list_add_tail(&waiter->fl_link, &file_lock_list);
+ hlist_add_head(&waiter->fl_link, &file_lock_list);
spin_unlock(&file_lock_lock);
}

@@ -510,7 +510,7 @@ static inline void
locks_delete_global_locks(struct file_lock *waiter)
{
spin_lock(&file_lock_lock);
- list_del_init(&waiter->fl_link);
+ hlist_del_init(&waiter->fl_link);
spin_unlock(&file_lock_lock);
}

@@ -705,7 +705,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
{
struct file_lock *fl, *ret = NULL;

- list_for_each_entry(fl, &blocked_list, fl_link) {
+ hlist_for_each_entry(fl, &blocked_list, fl_link) {
if (posix_same_owner(fl, block_fl)) {
ret = fl->fl_next;
if (likely(ret))
@@ -867,7 +867,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
goto out;
error = FILE_LOCK_DEFERRED;
locks_insert_block(fl, request);
- if (list_empty(&request->fl_link))
+ if (hlist_unhashed(&request->fl_link))
locks_insert_global_blocked(request);
goto out;
}
@@ -882,10 +882,10 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
* Now that we know the request is no longer blocked, we can take it
* off the global list. Some callers send down partially initialized
* requests, so we only do this if FL_SLEEP is set. Also, avoid taking
- * the lock if the list is empty, as that indicates a request that
+ * the lock if the hlist is unhashed, as that indicates a request that
* never blocked.
*/
- if ((request->fl_flags & FL_SLEEP) && !list_empty(&request->fl_link))
+ if ((request->fl_flags & FL_SLEEP) && !hlist_unhashed(&request->fl_link))
locks_delete_global_blocked(request);

/*
@@ -2277,11 +2277,11 @@ static int locks_show(struct seq_file *f, void *v)
{
struct file_lock *fl, *bfl;

- fl = list_entry(v, struct file_lock, fl_link);
+ fl = hlist_entry(v, struct file_lock, fl_link);

lock_get_status(f, fl, *((loff_t *)f->private), "");

- list_for_each_entry(bfl, &blocked_list, fl_link) {
+ hlist_for_each_entry(bfl, &blocked_list, fl_link) {
if (bfl->fl_next == fl)
lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
}
@@ -2295,14 +2295,14 @@ static void *locks_start(struct seq_file *f, loff_t *pos)

spin_lock(&file_lock_lock);
*p = (*pos + 1);
- return seq_list_start(&file_lock_list, *pos);
+ return seq_hlist_start(&file_lock_list, *pos);
}

static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
{
loff_t *p = f->private;
++*p;
- return seq_list_next(v, &file_lock_list, pos);
+ return seq_hlist_next(v, &file_lock_list, pos);
}

static void locks_stop(struct seq_file *f, void *v)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ccb44ea..07a009e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -934,7 +934,7 @@ int locks_in_grace(struct net *);
*/
struct file_lock {
struct file_lock *fl_next; /* singly linked list for this inode */
- struct list_head fl_link; /* doubly linked list of all locks */
+ struct hlist_node fl_link; /* node in global lists */
struct list_head fl_block; /* circular list of blocked processes */
fl_owner_t fl_owner;
unsigned int fl_flags;
--
1.7.1

2013-06-01 03:10:07

by Jeff Layton

Subject: [PATCH v1 10/11] locks: add a new "lm_owner_key" lock operation

Currently, the hash key that the locking code uses to add blocked locks
to the blocked_hash is calculated from the fl_owner field alone. That's
valid in most cases, the notable exception being server-side lockd,
which validates the owner of a lock based on both fl_owner and fl_pid.

In the case where you have a small number of NFS clients doing a lot
of locking between different processes, you could end up with all
the blocked requests sitting in a very small number of hash buckets.

Add a new lm_owner_key operation to the lock_manager_operations that
will generate an unsigned long to use as the key in the hashtable.
That function is only implemented for server-side lockd, and simply
XORs the fl_owner and fl_pid.

Signed-off-by: Jeff Layton <[email protected]>
---
Documentation/filesystems/Locking | 18 +++++++++++-------
fs/lockd/svclock.c | 12 ++++++++++++
fs/locks.c | 13 ++++++++++---
include/linux/fs.h | 1 +
4 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 13f91ab..ee351ac 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -351,6 +351,7 @@ fl_release_private: maybe no
----------------------- lock_manager_operations ---------------------------
prototypes:
int (*lm_compare_owner)(struct file_lock *, struct file_lock *);
+ unsigned long (*lm_owner_key)(struct file_lock *);
void (*lm_notify)(struct file_lock *); /* unblock callback */
int (*lm_grant)(struct file_lock *, struct file_lock *, int);
void (*lm_break)(struct file_lock *); /* break_lease callback */
@@ -360,18 +361,21 @@ locking rules:

inode->i_lock file_lock_lock may block
lm_compare_owner: yes maybe no
+lm_owner_key yes yes no
lm_notify: yes no no
lm_grant: no no no
lm_break: yes no no
lm_change yes no no

- ->lm_compare_owner is generally called with *an* inode->i_lock
-held. It may not be the i_lock of the inode for either file_lock being
-compared! This is the case with deadlock detection, since the code has
-to chase down the owners of locks that may be entirely unrelated to the
-one on which the lock is being acquired. For deadlock detection however,
-the file_lock_lock is also held. The locks primarily ensure that neither
-file_lock disappear out from under you while doing the comparison.
+ ->lm_compare_owner and ->lm_owner_key are generally called with
+*an* inode->i_lock held. It may not be the i_lock of the inode
+associated with either file_lock argument! This is the case with deadlock
+detection, since the code has to chase down the owners of locks that may
+be entirely unrelated to the one on which the lock is being acquired.
+For deadlock detection however, the file_lock_lock is also held. The
+fact that these locks are held ensures that the file_locks do not
+disappear out from under you while doing the comparison or generating an
+owner key.

--------------------------- buffer_head -----------------------------------
prototypes:
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index e703318..ce2cdab 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -744,8 +744,20 @@ static int nlmsvc_same_owner(struct file_lock *fl1, struct file_lock *fl2)
return fl1->fl_owner == fl2->fl_owner && fl1->fl_pid == fl2->fl_pid;
}

+/*
+ * Since NLM uses two "keys" for tracking locks, we need to hash them down
+ * to one for the blocked_hash. Here, we're just xor'ing the host address
+ * with the pid in order to create a key value for picking a hash bucket.
+ */
+static unsigned long
+nlmsvc_owner_key(struct file_lock *fl)
+{
+ return (unsigned long)fl->fl_owner ^ (unsigned long)fl->fl_pid;
+}
+
const struct lock_manager_operations nlmsvc_lock_operations = {
.lm_compare_owner = nlmsvc_same_owner,
+ .lm_owner_key = nlmsvc_owner_key,
.lm_notify = nlmsvc_notify_blocked,
.lm_grant = nlmsvc_grant_deferred,
};
diff --git a/fs/locks.c b/fs/locks.c
index 0d030ce..8219187 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -491,13 +491,20 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
return fl1->fl_owner == fl2->fl_owner;
}

+static unsigned long
+posix_owner_key(struct file_lock *fl)
+{
+ if (fl->fl_lmops && fl->fl_lmops->lm_owner_key)
+ return fl->fl_lmops->lm_owner_key(fl);
+ return (unsigned long)fl->fl_owner;
+}
+
/* Remove a blocker or lock from one of the global lists */
static inline void
locks_insert_global_blocked(struct file_lock *waiter)
{
spin_lock(&file_lock_lock);
- hash_add(blocked_hash, &waiter->fl_link,
- (unsigned long)waiter->fl_owner);
+ hash_add(blocked_hash, &waiter->fl_link, posix_owner_key(waiter));
spin_unlock(&file_lock_lock);
}

@@ -716,7 +723,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
{
struct file_lock *fl, *ret = NULL;

- hash_for_each_possible(blocked_hash, fl, fl_link, (unsigned long)block_fl->fl_owner) {
+ hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
if (posix_same_owner(fl, block_fl)) {
ret = fl->fl_next;
if (likely(ret))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 07a009e..4906cf5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -908,6 +908,7 @@ struct file_lock_operations {

struct lock_manager_operations {
int (*lm_compare_owner)(struct file_lock *, struct file_lock *);
+ unsigned long (*lm_owner_key)(struct file_lock *);
void (*lm_notify)(struct file_lock *); /* unblock callback */
int (*lm_grant)(struct file_lock *, struct file_lock *, int);
void (*lm_break)(struct file_lock *);
--
1.7.1

2013-06-01 03:08:51

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v1 02/11] locks: make generic_add_lease and generic_delete_lease static

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 7a02064..e3140b8 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1337,7 +1337,7 @@ int fcntl_getlease(struct file *filp)
return type;
}

-int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
+static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
{
struct file_lock *fl, **before, **my_before = NULL, *lease;
struct dentry *dentry = filp->f_path.dentry;
@@ -1402,7 +1402,7 @@ out:
return error;
}

-int generic_delete_lease(struct file *filp, struct file_lock **flp)
+static int generic_delete_lease(struct file *filp, struct file_lock **flp)
{
struct file_lock *fl, **before;
struct dentry *dentry = filp->f_path.dentry;
--
1.7.1

2013-06-03 19:05:00

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH v1 00/11] locks: scalability improvements for file locking

On Fri, 2013-05-31 at 23:07 -0400, Jeff Layton wrote:
> This is not the first attempt at doing this. The conversion to the
> i_lock was originally attempted by Bruce Fields a few years ago. His
> approach was NAK'ed since it involved ripping out the deadlock
> detection. People also really seem to like /proc/locks for debugging, so
> keeping that in is probably worthwhile.

Yep, we need to keep this. FWIW, lslocks(8) relies on /proc/locks.

Thanks,
Davidlohr

2013-06-03 21:31:59

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 00/11] locks: scalability improvements for file locking

On Fri, May 31, 2013 at 11:07:23PM -0400, Jeff Layton wrote:
> Executive summary (tl;dr version): This patchset represents an overhaul
> of the file locking code with an aim toward improving its scalability
> and making the code a bit easier to understand.

Thanks for working on this, that code could use some love!

> Longer version:
>
> When the BKL was finally ripped out of the kernel in 2010, the strategy
> taken for the file locking code was to simply turn it into a new
> file_lock_locks spinlock. It was an expedient way to deal with the file
> locking code at the time, but having a giant spinlock around all of this
> code is clearly not great for scalability. Red Hat has bug reports that
> go back into the 2.6.18 era that point to BKL scalability problems in
> the file locking code and the file_lock_lock suffers from the same
> issues.
>
> This patchset is my first attempt to make this code less dependent on
> global locking. The main change is to switch most of the file locking
> code to be protected by the inode->i_lock instead of the file_lock_lock.
>
> While that works for most things, there are a couple of global data
> structures (lists in the current code) that need a global lock to
> protect them. So we still need a global lock in order to deal with
> those. The remaining patches are intended to make that global locking
> less painful. The big gain is made by turning the blocked_list into a
> hashtable, which greatly speeds up the deadlock detection code.
>
> I rolled a couple of small programs in order to test this code. The
> first one just forks off 128 children and has them lock and unlock the
> same file 10k times. Running this under "time" against a file on tmpfs
> gives typical values like this:

What kind of hardware was this?

>
> Unpatched (3.10-rc3-ish):
> real 0m5.283s
> user 0m0.380s
> sys 0m20.469s
>
> Patched (same base kernel):
> real 0m5.099s
> user 0m0.478s
> sys 0m19.662s
>
> ...so there seems to be some modest performance gain in this test. I
> think that's almost entirely due to the change to a hashtable and to
> optimize removing and readding blocked locks to the global lists. Note
> that with this code we have to take two spinlocks instead of just one,
> and that has some performance impact too. So the real peformance gain
> from that hashtable conversion is eaten up to some degree by this.

Might be nice to look at some profiles to confirm all of that. I'd also
be curious how much variation there was in the results above, as they're
pretty close.

> The next test just forks off a bunch of children that each create their
> own file and then lock and unlock it 20k times. Obviously, the locks in
> this case are uncontended. Running that under "time" typically gives
> these rough numbers.
>
> Unpatched (3.10-rc3-ish):
> real 0m8.836s
> user 0m1.018s
> sys 0m34.094s
>
> Patched (same base kernel):
> real 0m4.965s
> user 0m1.043s
> sys 0m18.651s
>
> In this test, we see the real benefit of moving to the i_lock for most
> of this code. The run time is almost cut in half in this test. With
> these changes locking different inodes needs very little serialization.
>
> If people know of other file locking performance tests, then I'd be
> happy to try them out too. It's possible that this might make some
> workloads slower, and it would be helpful to know what they are (and
> address them) if so.
>
> This is not the first attempt at doing this. The conversion to the
> i_lock was originally attempted by Bruce Fields a few years ago. His
> approach was NAK'ed since it involved ripping out the deadlock
> detection. People also really seem to like /proc/locks for debugging, so
> keeping that in is probably worthwhile.

Yes, there's already code that depends on it.

The deadlock detection, though--I still wonder if we could get away with
ripping it out. Might be worth at least giving an option to configure
it out as a first step.

--b.

> There's more work to be done in this area and this patchset is just a
> start. There's a horrible thundering herd problem when a blocking lock
> is released, for instance. There was also interest in solving the goofy
> "unlock on any close" POSIX lock semantics at this year's LSF. I think
> this patchset will help lay the groundwork for those changes as well.
>
> Comments and suggestions welcome.
>
> Jeff Layton (11):
> cifs: use posix_unblock_lock instead of locks_delete_block
> locks: make generic_add_lease and generic_delete_lease static
> locks: comment cleanups and clarifications
> locks: make "added" in __posix_lock_file a bool
> locks: encapsulate the fl_link list handling
> locks: convert to i_lock to protect i_flock list
> locks: only pull entries off of blocked_list when they are really
> unblocked
> locks: convert fl_link to a hlist_node
> locks: turn the blocked_list into a hashtable
> locks: add a new "lm_owner_key" lock operation
> locks: give the blocked_hash its own spinlock
>
> Documentation/filesystems/Locking | 27 +++-
> fs/afs/flock.c | 5 +-
> fs/ceph/locks.c | 2 +-
> fs/ceph/mds_client.c | 8 +-
> fs/cifs/cifsfs.c | 2 +-
> fs/cifs/file.c | 15 +-
> fs/gfs2/file.c | 2 +-
> fs/lockd/svclock.c | 12 ++
> fs/lockd/svcsubs.c | 12 +-
> fs/locks.c | 254 +++++++++++++++++++++++++------------
> fs/nfs/delegation.c | 11 +-
> fs/nfs/nfs4state.c | 8 +-
> fs/nfsd/nfs4state.c | 8 +-
> include/linux/fs.h | 25 +---
> 14 files changed, 249 insertions(+), 142 deletions(-)
>

2013-06-03 21:54:23

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] cifs: use posix_unblock_lock instead of locks_delete_block

On Fri, May 31, 2013 at 11:07:24PM -0400, Jeff Layton wrote:
> commit 66189be74 (CIFS: Fix VFS lock usage for oplocked files) exported
> the locks_delete_block symbol. There's already an exported helper
> function that provides this capability however, so make cifs use that
> instead and turn locks_delete_block back into a static function.
>
> Note that if fl->fl_next == NULL then this lock has already been through
> locks_delete_block(), so we should be OK to ignore an ENOENT error here
> and simply not retry the lock.

ACK.

--b.

>
> Cc: Pavel Shilovsky <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/cifs/file.c | 2 +-
> fs/locks.c | 3 +--
> include/linux/fs.h | 5 -----
> 3 files changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 48b29d2..44a4f18 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -999,7 +999,7 @@ try_again:
> rc = wait_event_interruptible(flock->fl_wait, !flock->fl_next);
> if (!rc)
> goto try_again;
> - locks_delete_block(flock);
> + posix_unblock_lock(file, flock);
> }
> return rc;
> }
> diff --git a/fs/locks.c b/fs/locks.c
> index cb424a4..7a02064 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -496,13 +496,12 @@ static void __locks_delete_block(struct file_lock *waiter)
>
> /*
> */
> -void locks_delete_block(struct file_lock *waiter)
> +static void locks_delete_block(struct file_lock *waiter)
> {
> lock_flocks();
> __locks_delete_block(waiter);
> unlock_flocks();
> }
> -EXPORT_SYMBOL(locks_delete_block);
>
> /* Insert waiter into blocker's block list.
> * We use a circular list so that processes can be easily woken up in
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 43db02e..b9d7816 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1006,7 +1006,6 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
> extern int lease_modify(struct file_lock **, int);
> extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
> extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
> -extern void locks_delete_block(struct file_lock *waiter);
> extern void lock_flocks(void);
> extern void unlock_flocks(void);
> #else /* !CONFIG_FILE_LOCKING */
> @@ -1151,10 +1150,6 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
> return 1;
> }
>
> -static inline void locks_delete_block(struct file_lock *waiter)
> -{
> -}
> -
> static inline void lock_flocks(void)
> {
> }
> --
> 1.7.1
>

2013-06-03 21:54:38

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 02/11] locks: make generic_add_lease and generic_delete_lease static

On Fri, May 31, 2013 at 11:07:25PM -0400, Jeff Layton wrote:
> Signed-off-by: Jeff Layton <[email protected]>

ACK.

--b.

> ---
> fs/locks.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 7a02064..e3140b8 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1337,7 +1337,7 @@ int fcntl_getlease(struct file *filp)
> return type;
> }
>
> -int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
> +static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
> {
> struct file_lock *fl, **before, **my_before = NULL, *lease;
> struct dentry *dentry = filp->f_path.dentry;
> @@ -1402,7 +1402,7 @@ out:
> return error;
> }
>
> -int generic_delete_lease(struct file *filp, struct file_lock **flp)
> +static int generic_delete_lease(struct file *filp, struct file_lock **flp)
> {
> struct file_lock *fl, **before;
> struct dentry *dentry = filp->f_path.dentry;
> --
> 1.7.1
>

2013-06-03 22:00:58

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] locks: comment cleanups and clarifications

On Fri, May 31, 2013 at 11:07:26PM -0400, Jeff Layton wrote:
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/locks.c | 24 +++++++++++++++++++-----
> include/linux/fs.h | 6 ++++++
> 2 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index e3140b8..a7d2253 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -150,6 +150,16 @@ static int target_leasetype(struct file_lock *fl)
> int leases_enable = 1;
> int lease_break_time = 45;
>
> +/*
> + * The i_flock list is ordered by:
> + *
> + * 1) lock type -- FL_LEASEs first, then FL_FLOCK, and finally FL_POSIX
> + * 2) lock owner
> + * 3) lock range start
> + * 4) lock range end
> + *
> + * Obviously, the last two criteria only matter for POSIX locks.
> + */

Thanks, yes, that needs documenting! Though I wonder if this is the
place people will look for it.

> #define for_each_lock(inode, lockp) \
> for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
>
> @@ -806,6 +816,11 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> }
>
> lock_flocks();
> + /*
> + * New lock request. Walk all POSIX locks and look for conflicts. If
> + * there are any, either return -EAGAIN or put the request on the
> + * blocker's list of waiters.
> + */

This though, seems a) not 100% accurate (it could also return EDEADLCK,
for example), b) mostly redundant with respect to the following code.

> if (request->fl_type != F_UNLCK) {
> for_each_lock(inode, before) {
> fl = *before;
> @@ -844,7 +859,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> before = &fl->fl_next;
> }
>
> - /* Process locks with this owner. */
> + /* Process locks with this owner. */
> while ((fl = *before) && posix_same_owner(request, fl)) {
> /* Detect adjacent or overlapping regions (if same lock type)
> */
> @@ -930,10 +945,9 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> }
>
> /*
> - * The above code only modifies existing locks in case of
> - * merging or replacing. If new lock(s) need to be inserted
> - * all modifications are done bellow this, so it's safe yet to
> - * bail out.
> + * The above code only modifies existing locks in case of merging or
> + * replacing. If new lock(s) need to be inserted all modifications are
> + * done below this, so it's safe yet to bail out.
> */
> error = -ENOLCK; /* "no luck" */
> if (right && left == right && !new_fl2)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b9d7816..ae377e9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -926,6 +926,12 @@ int locks_in_grace(struct net *);
> /* that will die - we need it for nfs_lock_info */
> #include <linux/nfs_fs_i.h>
>
> +/*
> + * struct file_lock represents a generic "file lock". It's used to represent
> + * POSIX byte range locks, BSD (flock) locks, and leases. It's important to
> + * note that the same struct is used to represent both a request for a lock and
> + * the lock itself, but the same object is never used for both.

Yes, and I do find that confusing. I wonder if there's a sensible way
to use separate structs for the different uses.

--b.

> + */
> struct file_lock {
> struct file_lock *fl_next; /* singly linked list for this inode */
> struct list_head fl_link; /* doubly linked list of all locks */
> --
> 1.7.1
>

2013-06-04 10:54:59

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 00/11] locks: scalability improvements for file locking

On Mon, 3 Jun 2013 17:31:01 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Fri, May 31, 2013 at 11:07:23PM -0400, Jeff Layton wrote:
> > Executive summary (tl;dr version): This patchset represents an overhaul
> > of the file locking code with an aim toward improving its scalability
> > and making the code a bit easier to understand.
>
> Thanks for working on this, that code could use some love!
>
> > Longer version:
> >
> > When the BKL was finally ripped out of the kernel in 2010, the strategy
> > taken for the file locking code was to simply turn it into a new
> > file_lock_locks spinlock. It was an expedient way to deal with the file
> > locking code at the time, but having a giant spinlock around all of this
> > code is clearly not great for scalability. Red Hat has bug reports that
> > go back into the 2.6.18 era that point to BKL scalability problems in
> > the file locking code and the file_lock_lock suffers from the same
> > issues.
> >
> > This patchset is my first attempt to make this code less dependent on
> > global locking. The main change is to switch most of the file locking
> > code to be protected by the inode->i_lock instead of the file_lock_lock.
> >
> > While that works for most things, there are a couple of global data
> > structures (lists in the current code) that need a global lock to
> > protect them. So we still need a global lock in order to deal with
> > those. The remaining patches are intended to make that global locking
> > less painful. The big gain is made by turning the blocked_list into a
> > hashtable, which greatly speeds up the deadlock detection code.
> >
> > I rolled a couple of small programs in order to test this code. The
> > first one just forks off 128 children and has them lock and unlock the
> > same file 10k times. Running this under "time" against a file on tmpfs
> > gives typical values like this:
>
> What kind of hardware was this?
>

Mostly a KVM guest on Intel hardware. I've done some testing on bare
metal too, and the results are pretty similar.

> >
> > Unpatched (3.10-rc3-ish):
> > real 0m5.283s
> > user 0m0.380s
> > sys 0m20.469s
> >
> > Patched (same base kernel):
> > real 0m5.099s
> > user 0m0.478s
> > sys 0m19.662s
> >
> > ...so there seems to be some modest performance gain in this test. I
> > think that's almost entirely due to the change to a hashtable and to
> > optimize removing and readding blocked locks to the global lists. Note
> > that with this code we have to take two spinlocks instead of just one,
> > and that has some performance impact too. So the real peformance gain
> > from that hashtable conversion is eaten up to some degree by this.
>
> Might be nice to look at some profiles to confirm all of that. I'd also
> be curious how much variation there was in the results above, as they're
> pretty close.
>

The above is just a random representative sample. The results are
pretty close when running this test, but I can average up several runs
and present the numbers. I plan to get a bare-metal test box on which
to run some more detailed testing and maybe some profiling this week.

> > The next test just forks off a bunch of children that each create their
> > own file and then lock and unlock it 20k times. Obviously, the locks in
> > this case are uncontended. Running that under "time" typically gives
> > these rough numbers.
> >
> > Unpatched (3.10-rc3-ish):
> > real 0m8.836s
> > user 0m1.018s
> > sys 0m34.094s
> >
> > Patched (same base kernel):
> > real 0m4.965s
> > user 0m1.043s
> > sys 0m18.651s
> >
> > In this test, we see the real benefit of moving to the i_lock for most
> > of this code. The run time is almost cut in half in this test. With
> > these changes locking different inodes needs very little serialization.
> >
> > If people know of other file locking performance tests, then I'd be
> > happy to try them out too. It's possible that this might make some
> > workloads slower, and it would be helpful to know what they are (and
> > address them) if so.
> >
> > This is not the first attempt at doing this. The conversion to the
> > i_lock was originally attempted by Bruce Fields a few years ago. His
> > approach was NAK'ed since it involved ripping out the deadlock
> > detection. People also really seem to like /proc/locks for debugging, so
> > keeping that in is probably worthwhile.
>
> Yes, there's already code that depends on it.
>
> The deadlock detection, though--I still wonder if we could get away with
> ripping it out. Might be worth at least giving an option to configure
> it out as a first step.
>
> --b.
>


I considered that, and have patches that add such a config option. Some
of the later patches in this set optimize the code that is necessary to
support deadlock detection. In particular, turning the blocked_list
into a hashtable really speeds up the list walking. So much so that I
think the case for compiling it out is less obvious.

Should we add an option for people who really want their locks to
scream? Maybe, but I think it would be easy to add that later,
especially now that the code to handle the blocked_hash is fairly well
encapsulated with this patch.

Thanks for taking a look!

-- Jeff

> > There's more work to be done in this area and this patchset is just a
> > start. There's a horrible thundering herd problem when a blocking lock
> > is released, for instance. There was also interest in solving the goofy
> > "unlock on any close" POSIX lock semantics at this year's LSF. I think
> > this patchset will help lay the groundwork for those changes as well.
> >
> > Comments and suggestions welcome.
> >
> > Jeff Layton (11):
> > cifs: use posix_unblock_lock instead of locks_delete_block
> > locks: make generic_add_lease and generic_delete_lease static
> > locks: comment cleanups and clarifications
> > locks: make "added" in __posix_lock_file a bool
> > locks: encapsulate the fl_link list handling
> > locks: convert to i_lock to protect i_flock list
> > locks: only pull entries off of blocked_list when they are really
> > unblocked
> > locks: convert fl_link to a hlist_node
> > locks: turn the blocked_list into a hashtable
> > locks: add a new "lm_owner_key" lock operation
> > locks: give the blocked_hash its own spinlock
> >
> > Documentation/filesystems/Locking | 27 +++-
> > fs/afs/flock.c | 5 +-
> > fs/ceph/locks.c | 2 +-
> > fs/ceph/mds_client.c | 8 +-
> > fs/cifs/cifsfs.c | 2 +-
> > fs/cifs/file.c | 15 +-
> > fs/gfs2/file.c | 2 +-
> > fs/lockd/svclock.c | 12 ++
> > fs/lockd/svcsubs.c | 12 +-
> > fs/locks.c | 254 +++++++++++++++++++++++++------------
> > fs/nfs/delegation.c | 11 +-
> > fs/nfs/nfs4state.c | 8 +-
> > fs/nfsd/nfs4state.c | 8 +-
> > include/linux/fs.h | 25 +---
> > 14 files changed, 249 insertions(+), 142 deletions(-)
> >

2013-06-04 11:09:46

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] locks: comment cleanups and clarifications

On Mon, 3 Jun 2013 18:00:24 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Fri, May 31, 2013 at 11:07:26PM -0400, Jeff Layton wrote:
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/locks.c | 24 +++++++++++++++++++-----
> > include/linux/fs.h | 6 ++++++
> > 2 files changed, 25 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/locks.c b/fs/locks.c
> > index e3140b8..a7d2253 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -150,6 +150,16 @@ static int target_leasetype(struct file_lock *fl)
> > int leases_enable = 1;
> > int lease_break_time = 45;
> >
> > +/*
> > + * The i_flock list is ordered by:
> > + *
> > + * 1) lock type -- FL_LEASEs first, then FL_FLOCK, and finally FL_POSIX
> > + * 2) lock owner
> > + * 3) lock range start
> > + * 4) lock range end
> > + *
> > + * Obviously, the last two criteria only matter for POSIX locks.
> > + */
>
> Thanks, yes, that needs documenting! Though I wonder if this is the
> place people will look for it.
>

Agreed. If you can think of a better spot to put this, I'd be happy to
move it in the next iteration of this series.

> > #define for_each_lock(inode, lockp) \
> > for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
> >
> > @@ -806,6 +816,11 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> > }
> >
> > lock_flocks();
> > + /*
> > + * New lock request. Walk all POSIX locks and look for conflicts. If
> > + * there are any, either return -EAGAIN or put the request on the
> > + * blocker's list of waiters.
> > + */
>
> This though, seems a) not 100% accurate (it could also return EDEADLCK,
> for example), b) mostly redundant with respect to the following code.
>

Good point -- I'll fix that up. That one was mostly for my own benefit
while trying to figure out how this code works. It's still probably worth
some more verbose comments here since this certainly wasn't 100%
obvious to me when I first dove into this code.

> > if (request->fl_type != F_UNLCK) {
> > for_each_lock(inode, before) {
> > fl = *before;
> > @@ -844,7 +859,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> > before = &fl->fl_next;
> > }
> >
> > - /* Process locks with this owner. */
> > + /* Process locks with this owner. */
> > while ((fl = *before) && posix_same_owner(request, fl)) {
> > /* Detect adjacent or overlapping regions (if same lock type)
> > */
> > @@ -930,10 +945,9 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> > }
> >
> > /*
> > - * The above code only modifies existing locks in case of
> > - * merging or replacing. If new lock(s) need to be inserted
> > - * all modifications are done bellow this, so it's safe yet to
> > - * bail out.
> > + * The above code only modifies existing locks in case of merging or
> > + * replacing. If new lock(s) need to be inserted all modifications are
> > + * done below this, so it's safe yet to bail out.
> > */
> > error = -ENOLCK; /* "no luck" */
> > if (right && left == right && !new_fl2)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index b9d7816..ae377e9 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -926,6 +926,12 @@ int locks_in_grace(struct net *);
> > /* that will die - we need it for nfs_lock_info */
> > #include <linux/nfs_fs_i.h>
> >
> > +/*
> > + * struct file_lock represents a generic "file lock". It's used to represent
> > + * POSIX byte range locks, BSD (flock) locks, and leases. It's important to
> > + * note that the same struct is used to represent both a request for a lock and
> > + * the lock itself, but the same object is never used for both.
>
> Yes, and I do find that confusing. I wonder if there's a sensible way
> to use separate structs for the different uses.
>
> --b.
>

Yeah. I think that latter point accounts for a lot of the difficulty
the uninitiated have in understanding this code.

I considered creating a "struct lock_request" too. If I were designing
this code from scratch I certainly would do so, but it's harder to
justify such a change to the existing code.

It would mean a lot of churn in fs-specific lock routines, and we'd
have to completely revamp things like locks_copy_lock. I just don't see
that giving us a lot of benefit in the near term and I really want this
set to focus on concrete benefits.

We might consider it in the future though. I'll add a FIXME comment
here so we can record the idea for posterity.
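For the record, the rough shape I had in mind was something like the
struct below -- purely hypothetical, none of this exists, and the field
set is only illustrative. A request would carry just the fields that
describe the lock, with none of the list linkage or wait-queue
machinery that a granted lock needs:

/* hypothetical sketch -- not in any tree */
struct file_lock_request {
        struct file *fl_file;
        fl_owner_t fl_owner;
        unsigned int fl_pid;
        unsigned int fl_flags;
        unsigned char fl_type;
        loff_t fl_start;
        loff_t fl_end;
        const struct lock_manager_operations *fl_lmops;
};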

Thanks for the comments!

> > + */
> > struct file_lock {
> > struct file_lock *fl_next; /* singly linked list for this inode */
> > struct list_head fl_link; /* doubly linked list of all locks */
> > --
> > 1.7.1
> >


--
Jeff Layton <[email protected]>

2013-06-04 11:57:35

by Jim Rees

[permalink] [raw]
Subject: Re: [PATCH v1 00/11] locks: scalability improvements for file locking

Jeff Layton wrote:

> Might be nice to look at some profiles to confirm all of that. I'd also
> be curious how much variation there was in the results above, as they're
> pretty close.
>

The above is just a random representative sample. The results are
pretty close when running this test, but I can average up several runs
and present the numbers. I plan to get a bare-metal test box on which
to run some more detailed testing and maybe some profiling this week.

Just contributing more runs into the mean doesn't tell us anything about the
variance. With numbers that close you need the variance to tell whether it's
a significant change.

2013-06-04 12:15:58

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 00/11] locks: scalability improvements for file locking

On Tue, 4 Jun 2013 07:56:44 -0400
Jim Rees <[email protected]> wrote:

> Jeff Layton wrote:
>
> > Might be nice to look at some profiles to confirm all of that. I'd also
> > be curious how much variation there was in the results above, as they're
> > pretty close.
> >
>
> The above is just a random representative sample. The results are
> pretty close when running this test, but I can average up several runs
> and present the numbers. I plan to get a bare-metal test box on which
> to run some more detailed testing and maybe some profiling this week.
>
> Just contributing more runs into the mean doesn't tell us anything about the
> variance. With numbers that close you need the variance to tell whether it's
> a significant change.

Thanks. I'll see if I can get some standard deviation numbers here, and
I'll do it on some bare metal to ensure that virtualization doesn't
skew any results.

FWIW, they were all consistently very close to one another when I ran
these tests, and the times were all consistently shorter than the
unpatched kernel.

That said, this test is pretty rough. Doing this with "time" measures
other things that aren't related to locking. So I'll also see if I
can come up with a way to measure the actual locking performance more
accurately too.
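Something like this, maybe -- a rough userspace sketch (the path and
iteration count are arbitrary) that times just the fcntl() calls with
CLOCK_MONOTONIC instead of wrapping the whole run in "time":

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 20000

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : "/tmp/lockfile";
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        struct timespec start, end;
        double elapsed;
        int fd, i;

        fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (i = 0; i < ITERATIONS; i++) {
                /* take and then drop a whole-file write lock */
                fl.l_type = F_WRLCK;
                if (fcntl(fd, F_SETLKW, &fl) < 0) {
                        perror("F_SETLKW");
                        return 1;
                }
                fl.l_type = F_UNLCK;
                if (fcntl(fd, F_SETLK, &fl) < 0) {
                        perror("F_UNLCK");
                        return 1;
                }
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        elapsed = (end.tv_sec - start.tv_sec) +
                  (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%d lock/unlock cycles in %.6f seconds\n", ITERATIONS, elapsed);
        close(fd);
        return 0;
}

(May need -lrt for clock_gettime on older glibc.)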

--
Jeff Layton <[email protected]>

2013-06-04 14:21:27

by Stefan Metzmacher

[permalink] [raw]
Subject: Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

Hi Jeff,

> There's no reason we have to protect the blocked_hash and file_lock_list
> with the same spinlock. With the tests I have, breaking it in two gives
> a barely measurable performance benefit, but it seems reasonable to make
> this locking as granular as possible.

as file_lock_{list,lock} is only used for debugging (/proc/locks) after this
change, I guess it would be possible to use RCU instead of a spinlock.

@others: this was the related discussion on IRC
(http://irclog.samba.org/) about this:

16:02 < metze> jlayton: do you have time to discuss your file_lock_lock
changes?
16:02 < jlayton> metze: sure, what's up?
16:03 < jlayton> metze: note that it won't help vl's thundering herd
problems...
16:03 < metze> is it correct that after your last patch file_lock_lock
is only used for /proc/locks?
16:03 < jlayton> well, it's only used to protect the list that is used
for /proc/locks
16:04 < jlayton> it still gets taken whenever a lock is acquired or
released in order to manipulate that list
16:04 < metze> would it be a good idea to use rcu instead of a spin lock?
16:04 < jlayton> I tried using RCU, but it turned out to slow everything
down
16:04 < jlayton> this is not a read-mostly workload unfortunately
16:04 < jlayton> so doing it with mutual exclusion turns out to be faster
16:04 < metze> ok
16:05 < jlayton> I might play around with it again sometime, but I don't
think it really helps. What we need to ensure is
that we optimize the code that manipulates that list,
and RCU list manipulations have larger overhead
16:06 < jlayton> metze: that's a good question though so if you want to
ask it on the list, please do
16:06 < jlayton> others will probably be wondering the same thing
16:08 < metze> maybe it's worth a comment in commit message and the code
16:08 < metze> btw, why don't you remove the ' /* Protects the
file_lock_list and the blocked_hash */' comment?

metze



2013-06-04 14:40:19

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

On Tue, 04 Jun 2013 16:19:53 +0200
"Stefan (metze) Metzmacher" <[email protected]> wrote:

> Hi Jeff,
>
> > There's no reason we have to protect the blocked_hash and file_lock_list
> > with the same spinlock. With the tests I have, breaking it in two gives
> > a barely measurable performance benefit, but it seems reasonable to make
> > this locking as granular as possible.
>
> as file_lock_{list,lock} is only used for debugging (/proc/locks) after this
> change, I guess it would be possible to use RCU instead of a spinlock.
>
> @others: this was the related discussion on IRC
> (http://irclog.samba.org/) about this:
>
> 16:02 < metze> jlayton: do you have time to discuss your file_lock_lock
> changes?
> 16:02 < jlayton> metze: sure, what's up?
> 16:03 < jlayton> metze: note that it won't help vl's thundering herd
> problems...
> 16:03 < metze> is it correct that after your last patch file_lock_lock
> is only used for /proc/locks?
> 16:03 < jlayton> well, it's only used to protect the list that is used
> for /proc/locks
> 16:04 < jlayton> it still gets taken whenever a lock is acquired or
> released in order to manipulate that list
> 16:04 < metze> would it be a good idea to use rcu instead of a spin lock?
> 16:04 < jlayton> I tried using RCU, but it turned out to slow everything
> down
> 16:04 < jlayton> this is not a read-mostly workload unfortunately
> 16:04 < jlayton> so doing it with mutual exclusion turns out to be faster
> 16:04 < metze> ok
> 16:05 < jlayton> I might play around with it again sometime, but I don't
> think it really helps. What we need to ensure is
> that we optimize the code that manipulates that list,
> and RCU list manipulations have larger overhead
> 16:06 < jlayton> metze: that's a good question though so if you want to
> ask it on the list, please do
> 16:06 < jlayton> others will probably be wondering the same thing
> 16:08 < metze> maybe it's worth a comment in commit message and the code


I'm not sure it's worth commenting about RCU in the code. In the future
it may be possible to do this -- who knows. It does seem to have been
consistently slower in my testing here though.

> 16:08 < metze> btw, why don't you remove the ' /* Protects the
> file_lock_list and the blocked_hash */' comment?
>

Removed in my git tree -- thanks for pointing that out.

--
Jeff Layton <[email protected]>



2013-06-04 14:46:59

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

Having RCU for modification-mostly workloads is never a good idea, so
I don't think it makes sense to mention it here.

If you care about the overhead it's worth trying to use per-cpu lists,
though.

2013-06-04 14:54:17

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

On Tue, Jun 04, 2013 at 07:46:40AM -0700, Christoph Hellwig wrote:
> Having RCU for modification mostly workloads never is a good idea, so
> I don't think it makes sense to mention it here.
>
> If you care about the overhead it's worth trying to use per-cpu lists,
> though.

Yes. The lock and unlock could happen on different CPUs--but I think
you can make the rule that the lock stays associated with the list it
was first put on, and then it's correct in general and hopefully quick
in the common case.

--b.

2013-06-04 14:58:01

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

On Tue, 4 Jun 2013 07:46:40 -0700
Christoph Hellwig <[email protected]> wrote:

> Having RCU for modification mostly workloads never is a good idea, so
> I don't think it makes sense to mention it here.
>
> If you care about the overhead it's worth trying to use per-cpu lists,
> though.
>

Yeah, I looked at those too in an earlier set and it did help some.
Moving to a hashtable for the blocked_list really seemed to help the
most there, but percpu lists with lglocks or something might help a lot
on the file_lock_list.

--
Jeff Layton <[email protected]>

2013-06-04 15:16:04

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

On Tue, 4 Jun 2013 10:53:22 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Tue, Jun 04, 2013 at 07:46:40AM -0700, Christoph Hellwig wrote:
> > Having RCU for modification mostly workloads never is a good idea, so
> > I don't think it makes sense to mention it here.
> >
> > If you care about the overhead it's worth trying to use per-cpu lists,
> > though.
>
> Yes. The lock and unlock could happen on different CPU's--but I think
> you can make the rule that the lock stays associated with the list it
> was first put on, and then it's correct in general and hopefully quick
> in the common case.
>

It's important to distinguish between the blocked_list/hash and the
file_lock_list. Yes, they use the same embedded list_head or hlist_node
in the file_lock, but they have very different characteristics and use
cases.

In the testing I did, having a hashtable for the blocked locks helped a
lot more than a percpu list. The trick with deadlock detection is to
ensure that you don't spend a lot of time walking the lists. Since we
do deadlock detection frequently, we need to optimize for that case.

For the file_lock_list, it might make sense to have percpu hlists with
an lglock however. The thing to note here is that the file_lock_list is
almost never read. Only /proc/locks uses it, so anything we can do to
optimize the list manipulation is probably worth it.
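Roughly like the sketch below, I'd think -- untested, and fl_link_cpu
would be a new field in struct file_lock recording which CPU's list the
lock first went on, along the lines of Bruce's suggestion:

#include <linux/fs.h>
#include <linux/lglock.h>
#include <linux/percpu.h>
#include <linux/smp.h>

static DEFINE_PER_CPU(struct hlist_head, file_lock_list);
DEFINE_STATIC_LGLOCK(file_lock_lglock);

static void locks_insert_global_locks(struct file_lock *fl)
{
        lg_local_lock(&file_lock_lglock);
        /* remember which CPU's list this lock went on */
        fl->fl_link_cpu = smp_processor_id();
        hlist_add_head(&fl->fl_link, this_cpu_ptr(&file_lock_list));
        lg_local_unlock(&file_lock_lglock);
}

static void locks_delete_global_locks(struct file_lock *fl)
{
        /* always come back to the list the lock was first put on */
        lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
        hlist_del_init(&fl->fl_link);
        lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
}

/proc/locks would then take lg_global_lock() to walk all of the per-cpu
lists, which should be fine given how rarely it gets read.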

All that said, I'm leery about changing too much of this code too fast.
It's pretty old and poorly understood, so I think we need to be
cautious and take an incremental approach to changing it.

--
Jeff Layton <[email protected]>

2013-06-04 20:18:22

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 04/11] locks: make "added" in __posix_lock_file a bool

On Fri, May 31, 2013 at 11:07:27PM -0400, Jeff Layton wrote:
> ...save 3 bytes of stack space.
>
> Signed-off-by: Jeff Layton <[email protected]>

ACK.

--b.

> ---
> fs/locks.c | 9 +++++----
> 1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index a7d2253..cef0e04 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -800,7 +800,8 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> struct file_lock *left = NULL;
> struct file_lock *right = NULL;
> struct file_lock **before;
> - int error, added = 0;
> + int error;
> + bool added = false;
>
> /*
> * We may need two file_lock structures for this operation,
> @@ -894,7 +895,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> continue;
> }
> request = fl;
> - added = 1;
> + added = true;
> }
> else {
> /* Processing for different lock types is a bit
> @@ -905,7 +906,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> if (fl->fl_start > request->fl_end)
> break;
> if (request->fl_type == F_UNLCK)
> - added = 1;
> + added = true;
> if (fl->fl_start < request->fl_start)
> left = fl;
> /* If the next lock in the list has a higher end
> @@ -935,7 +936,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> locks_release_private(fl);
> locks_copy_private(fl, request);
> request = fl;
> - added = 1;
> + added = true;
> }
> }
> /* Go on to next lock.
> --
> 1.7.1
>

2013-06-04 20:18:32

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 05/11] locks: encapsulate the fl_link list handling

On Fri, May 31, 2013 at 11:07:28PM -0400, Jeff Layton wrote:
> Move the fl_link list handling routines into a separate set of helpers.
> Also move the global list handling out of locks_insert_block, and into
> the caller that ends up triggering it as that allows us to eliminate the
> IS_POSIX check there.

ACK.

--b.

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/locks.c | 34 +++++++++++++++++++++++++++++-----
> 1 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index cef0e04..caca466 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -494,13 +494,38 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
> return fl1->fl_owner == fl2->fl_owner;
> }
>
> +/* Remove a blocker or lock from one of the global lists */
> +static inline void
> +locks_insert_global_blocked(struct file_lock *waiter)
> +{
> + list_add(&waiter->fl_link, &blocked_list);
> +}
> +
> +static inline void
> +locks_delete_global_blocked(struct file_lock *waiter)
> +{
> + list_del_init(&waiter->fl_link);
> +}
> +
> +static inline void
> +locks_insert_global_locks(struct file_lock *waiter)
> +{
> + list_add_tail(&waiter->fl_link, &file_lock_list);
> +}
> +
> +static inline void
> +locks_delete_global_locks(struct file_lock *waiter)
> +{
> + list_del_init(&waiter->fl_link);
> +}
> +
> /* Remove waiter from blocker's block list.
> * When blocker ends up pointing to itself then the list is empty.
> */
> static void __locks_delete_block(struct file_lock *waiter)
> {
> list_del_init(&waiter->fl_block);
> - list_del_init(&waiter->fl_link);
> + locks_delete_global_blocked(waiter);
> waiter->fl_next = NULL;
> }
>
> @@ -524,8 +549,6 @@ static void locks_insert_block(struct file_lock *blocker,
> BUG_ON(!list_empty(&waiter->fl_block));
> list_add_tail(&waiter->fl_block, &blocker->fl_block);
> waiter->fl_next = blocker;
> - if (IS_POSIX(blocker))
> - list_add(&waiter->fl_link, &blocked_list);
> }
>
> /* Wake up processes blocked waiting for blocker.
> @@ -552,7 +575,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
> */
> static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
> {
> - list_add(&fl->fl_link, &file_lock_list);
> + locks_insert_global_locks(fl);
>
> fl->fl_nspid = get_pid(task_tgid(current));
>
> @@ -573,7 +596,7 @@ static void locks_delete_lock(struct file_lock **thisfl_p)
>
> *thisfl_p = fl->fl_next;
> fl->fl_next = NULL;
> - list_del_init(&fl->fl_link);
> + locks_delete_global_locks(fl);
>
> if (fl->fl_nspid) {
> put_pid(fl->fl_nspid);
> @@ -839,6 +862,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> goto out;
> error = FILE_LOCK_DEFERRED;
> locks_insert_block(fl, request);
> + locks_insert_global_blocked(request);
> goto out;
> }
> }
> --
> 1.7.1
>

2013-06-04 21:22:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 06/11] locks: convert to i_lock to protect i_flock list

On Fri, May 31, 2013 at 11:07:29PM -0400, Jeff Layton wrote:
> Having a global lock that protects all of this code is a clear
> scalability problem. Instead of doing that, move most of the code to be
> protected by the i_lock instead.
>
> The exceptions are the global lists that file_lock->fl_link sits on.
> Those still need a global lock of some sort, so wrap just those accesses
> in the file_lock_lock().
>
> Note that this requires a small change to the /proc/locks code. Instead
> of walking the fl_block list to look for locks blocked on the current
> lock, we must instead walk the global blocker list and skip any that
> aren't blocked on the current lock. Otherwise, we'd need to take the
> i_lock on each inode as we go and that would create a lock inversion
> problem.

locks_insert_lock sets fl_nspid after adding to the global list, so in
theory I think that could leave /proc/locks seeing a garbage fl_nspid
or fl_pid field.

I'm similarly worried about the deadlock detection code using the
fl_next value before it's set but I'm not sure what the consequences
are.

But I think both of those are fixable, and this looks OK otherwise?
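(For the first one, I'd guess it's just a matter of fully initializing
the lock before publishing it on the global list -- something like the
following, though I haven't tried it:)

static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
{
        fl->fl_nspid = get_pid(task_tgid(current));

        /* insert into the inode's i_flock list first... */
        fl->fl_next = *pos;
        *pos = fl;

        /* ...and only publish on the global list once it's fully set up */
        locks_insert_global_locks(fl);
}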

Patch-sequencing nit: it'd be nice to have the big-but-boring
lock_flocks->spin_lock(&i_lock) search-and-replace in a separate patch
from the (briefer, but more subtle) deadlock and /proc/locks changes.
Off the top of my head all I can think of is bulk-replace of
lock_flocks() by lock_flocks(inode) (which ignores inode argument), then
change definition of the lock_flocks(inode) in the patch which does the
other stuff, then bulk replace of lock_flocks(inode) by
spin_lock(&inode->i_lock), but maybe there's something simpler.

--b.


>
> Signed-off-by: Jeff Layton <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> Documentation/filesystems/Locking | 23 +++++--
> fs/afs/flock.c | 5 +-
> fs/ceph/locks.c | 2 +-
> fs/ceph/mds_client.c | 8 +-
> fs/cifs/cifsfs.c | 2 +-
> fs/cifs/file.c | 13 ++--
> fs/gfs2/file.c | 2 +-
> fs/lockd/svcsubs.c | 12 ++--
> fs/locks.c | 121 ++++++++++++++++++++-----------------
> fs/nfs/delegation.c | 11 ++--
> fs/nfs/nfs4state.c | 8 +-
> fs/nfsd/nfs4state.c | 8 +-
> include/linux/fs.h | 11 ----
> 13 files changed, 119 insertions(+), 107 deletions(-)
>
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index 0706d32..13f91ab 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -344,7 +344,7 @@ prototypes:
>
>
> locking rules:
> - file_lock_lock may block
> + inode->i_lock may block
> fl_copy_lock: yes no
> fl_release_private: maybe no
>
> @@ -357,12 +357,21 @@ prototypes:
> int (*lm_change)(struct file_lock **, int);
>
> locking rules:
> - file_lock_lock may block
> -lm_compare_owner: yes no
> -lm_notify: yes no
> -lm_grant: no no
> -lm_break: yes no
> -lm_change yes no
> +
> + inode->i_lock file_lock_lock may block
> +lm_compare_owner: yes maybe no
> +lm_notify: yes no no
> +lm_grant: no no no
> +lm_break: yes no no
> +lm_change yes no no
> +
> + ->lm_compare_owner is generally called with *an* inode->i_lock
> +held. It may not be the i_lock of the inode for either file_lock being
> +compared! This is the case with deadlock detection, since the code has
> +to chase down the owners of locks that may be entirely unrelated to the
> +one on which the lock is being acquired. For deadlock detection however,
> +the file_lock_lock is also held. The locks primarily ensure that neither
> +file_lock disappear out from under you while doing the comparison.
>
> --------------------------- buffer_head -----------------------------------
> prototypes:
> diff --git a/fs/afs/flock.c b/fs/afs/flock.c
> index 2497bf3..03fc0d1 100644
> --- a/fs/afs/flock.c
> +++ b/fs/afs/flock.c
> @@ -252,6 +252,7 @@ static void afs_defer_unlock(struct afs_vnode *vnode, struct key *key)
> */
> static int afs_do_setlk(struct file *file, struct file_lock *fl)
> {
> + struct inode *inode = file_inode(file);
> struct afs_vnode *vnode = AFS_FS_I(file->f_mapping->host);
> afs_lock_type_t type;
> struct key *key = file->private_data;
> @@ -273,7 +274,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
>
> type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
>
> /* make sure we've got a callback on this file and that our view of the
> * data version is up to date */
> @@ -420,7 +421,7 @@ given_lock:
> afs_vnode_fetch_status(vnode, NULL, key);
>
> error:
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> _leave(" = %d", ret);
> return ret;
>
> diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
> index 202dd3d..cd0a664 100644
> --- a/fs/ceph/locks.c
> +++ b/fs/ceph/locks.c
> @@ -194,7 +194,7 @@ void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count)
> * Encode the flock and fcntl locks for the given inode into the pagelist.
> * Format is: #fcntl locks, sequential fcntl locks, #flock locks,
> * sequential flock locks.
> - * Must be called with lock_flocks() already held.
> + * Must be called with inode->i_lock already held.
> * If we encounter more of a specific lock type than expected,
> * we return the value 1.
> */
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 4f22671..ae621b5 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2482,13 +2482,13 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
>
> ceph_pagelist_set_cursor(pagelist, &trunc_point);
> do {
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> ceph_count_locks(inode, &num_fcntl_locks,
> &num_flock_locks);
> rec.v2.flock_len = (2*sizeof(u32) +
> (num_fcntl_locks+num_flock_locks) *
> sizeof(struct ceph_filelock));
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
>
> /* pre-alloc pagelist */
> ceph_pagelist_truncate(pagelist, &trunc_point);
> @@ -2499,12 +2499,12 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
>
> /* encode locks */
> if (!err) {
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> err = ceph_encode_locks(inode,
> pagelist,
> num_fcntl_locks,
> num_flock_locks);
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> }
> } while (err == -ENOSPC);
> } else {
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 72e4efe..29952d5 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -768,7 +768,7 @@ static loff_t cifs_llseek(struct file *file, loff_t offset, int whence)
>
> static int cifs_setlease(struct file *file, long arg, struct file_lock **lease)
> {
> - /* note that this is called by vfs setlease with lock_flocks held
> + /* note that this is called by vfs setlease with i_lock held
> to protect *lease from going away */
> struct inode *inode = file_inode(file);
> struct cifsFileInfo *cfile = file->private_data;
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 44a4f18..0dd10cd 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1092,6 +1092,7 @@ struct lock_to_push {
> static int
> cifs_push_posix_locks(struct cifsFileInfo *cfile)
> {
> + struct inode *inode = cfile->dentry->d_inode;
> struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
> struct file_lock *flock, **before;
> unsigned int count = 0, i = 0;
> @@ -1102,12 +1103,12 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
>
> xid = get_xid();
>
> - lock_flocks();
> - cifs_for_each_lock(cfile->dentry->d_inode, before) {
> + spin_lock(&inode->i_lock);
> + cifs_for_each_lock(inode, before) {
> if ((*before)->fl_flags & FL_POSIX)
> count++;
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
>
> INIT_LIST_HEAD(&locks_to_send);
>
> @@ -1126,8 +1127,8 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
> }
>
> el = locks_to_send.next;
> - lock_flocks();
> - cifs_for_each_lock(cfile->dentry->d_inode, before) {
> + spin_lock(&inode->i_lock);
> + cifs_for_each_lock(inode, before) {
> flock = *before;
> if ((flock->fl_flags & FL_POSIX) == 0)
> continue;
> @@ -1152,7 +1153,7 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
> lck->offset = flock->fl_start;
> el = el->next;
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
>
> list_for_each_entry_safe(lck, tmp, &locks_to_send, llist) {
> int stored_rc;
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index acd1676..9e634e0 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -889,7 +889,7 @@ out_uninit:
> * cluster; until we do, disable leases (by just returning -EINVAL),
> * unless the administrator has requested purely local locking.
> *
> - * Locking: called under lock_flocks
> + * Locking: called under i_lock
> *
> * Returns: errno
> */
> diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
> index 97e8741..dc5c759 100644
> --- a/fs/lockd/svcsubs.c
> +++ b/fs/lockd/svcsubs.c
> @@ -169,7 +169,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,
>
> again:
> file->f_locks = 0;
> - lock_flocks(); /* protects i_flock list */
> + spin_lock(&inode->i_lock);
> for (fl = inode->i_flock; fl; fl = fl->fl_next) {
> if (fl->fl_lmops != &nlmsvc_lock_operations)
> continue;
> @@ -181,7 +181,7 @@ again:
> if (match(lockhost, host)) {
> struct file_lock lock = *fl;
>
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> lock.fl_type = F_UNLCK;
> lock.fl_start = 0;
> lock.fl_end = OFFSET_MAX;
> @@ -193,7 +193,7 @@ again:
> goto again;
> }
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
>
> return 0;
> }
> @@ -228,14 +228,14 @@ nlm_file_inuse(struct nlm_file *file)
> if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares)
> return 1;
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> for (fl = inode->i_flock; fl; fl = fl->fl_next) {
> if (fl->fl_lmops == &nlmsvc_lock_operations) {
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> return 1;
> }
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> file->f_locks = 0;
> return 0;
> }
> diff --git a/fs/locks.c b/fs/locks.c
> index caca466..055c06c 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -165,22 +165,9 @@ int lease_break_time = 45;
>
> static LIST_HEAD(file_lock_list);
> static LIST_HEAD(blocked_list);
> -static DEFINE_SPINLOCK(file_lock_lock);
> -
> -/*
> - * Protects the two list heads above, plus the inode->i_flock list
> - */
> -void lock_flocks(void)
> -{
> - spin_lock(&file_lock_lock);
> -}
> -EXPORT_SYMBOL_GPL(lock_flocks);
>
> -void unlock_flocks(void)
> -{
> - spin_unlock(&file_lock_lock);
> -}
> -EXPORT_SYMBOL_GPL(unlock_flocks);
> +/* Protects the two list heads above */
> +static DEFINE_SPINLOCK(file_lock_lock);
>
> static struct kmem_cache *filelock_cache __read_mostly;
>
> @@ -498,25 +485,33 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
> static inline void
> locks_insert_global_blocked(struct file_lock *waiter)
> {
> + spin_lock(&file_lock_lock);
> list_add(&waiter->fl_link, &blocked_list);
> + spin_unlock(&file_lock_lock);
> }
>
> static inline void
> locks_delete_global_blocked(struct file_lock *waiter)
> {
> + spin_lock(&file_lock_lock);
> list_del_init(&waiter->fl_link);
> + spin_unlock(&file_lock_lock);
> }
>
> static inline void
> locks_insert_global_locks(struct file_lock *waiter)
> {
> + spin_lock(&file_lock_lock);
> list_add_tail(&waiter->fl_link, &file_lock_list);
> + spin_unlock(&file_lock_lock);
> }
>
> static inline void
> locks_delete_global_locks(struct file_lock *waiter)
> {
> + spin_lock(&file_lock_lock);
> list_del_init(&waiter->fl_link);
> + spin_unlock(&file_lock_lock);
> }
>
> /* Remove waiter from blocker's block list.
> @@ -533,9 +528,11 @@ static void __locks_delete_block(struct file_lock *waiter)
> */
> static void locks_delete_block(struct file_lock *waiter)
> {
> - lock_flocks();
> + struct inode *inode = file_inode(waiter->fl_file);
> +
> + spin_lock(&inode->i_lock);
> __locks_delete_block(waiter);
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> }
>
> /* Insert waiter into blocker's block list.
> @@ -657,8 +654,9 @@ void
> posix_test_lock(struct file *filp, struct file_lock *fl)
> {
> struct file_lock *cfl;
> + struct inode *inode = file_inode(filp);
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> for (cfl = file_inode(filp)->i_flock; cfl; cfl = cfl->fl_next) {
> if (!IS_POSIX(cfl))
> continue;
> @@ -671,7 +669,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> fl->fl_pid = pid_vnr(cfl->fl_nspid);
> } else
> fl->fl_type = F_UNLCK;
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> return;
> }
> EXPORT_SYMBOL(posix_test_lock);
> @@ -719,14 +717,19 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
> struct file_lock *block_fl)
> {
> int i = 0;
> + int ret = 0;
>
> + spin_lock(&file_lock_lock);
> while ((block_fl = what_owner_is_waiting_for(block_fl))) {
> if (i++ > MAX_DEADLK_ITERATIONS)
> - return 0;
> - if (posix_same_owner(caller_fl, block_fl))
> - return 1;
> + break;
> + if (posix_same_owner(caller_fl, block_fl)) {
> + ++ret;
> + break;
> + }
> }
> - return 0;
> + spin_unlock(&file_lock_lock);
> + return ret;
> }
>
> /* Try to create a FLOCK lock on filp. We always insert new FLOCK locks
> @@ -750,7 +753,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
> return -ENOMEM;
> }
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> if (request->fl_flags & FL_ACCESS)
> goto find_conflict;
>
> @@ -780,9 +783,9 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
> * give it the opportunity to lock the file.
> */
> if (found) {
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> cond_resched();
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> }
>
> find_conflict:
> @@ -809,7 +812,7 @@ find_conflict:
> error = 0;
>
> out:
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> if (new_fl)
> locks_free_lock(new_fl);
> return error;
> @@ -839,7 +842,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> new_fl2 = locks_alloc_lock();
> }
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> /*
> * New lock request. Walk all POSIX locks and look for conflicts. If
> * there are any, either return -EAGAIN or put the request on the
> @@ -1012,7 +1015,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> locks_wake_up_blocks(left);
> }
> out:
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> /*
> * Free any unused locks.
> */
> @@ -1087,14 +1090,14 @@ int locks_mandatory_locked(struct inode *inode)
> /*
> * Search the lock list for this inode for any POSIX locks.
> */
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> if (!IS_POSIX(fl))
> continue;
> if (fl->fl_owner != owner)
> break;
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> return fl ? -EAGAIN : 0;
> }
>
> @@ -1237,7 +1240,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
> if (IS_ERR(new_fl))
> return PTR_ERR(new_fl);
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
>
> time_out_leases(inode);
>
> @@ -1287,10 +1290,10 @@ restart:
> break_time++;
> }
> locks_insert_block(flock, new_fl);
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> error = wait_event_interruptible_timeout(new_fl->fl_wait,
> !new_fl->fl_next, break_time);
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> __locks_delete_block(new_fl);
> if (error >= 0) {
> if (error == 0)
> @@ -1308,7 +1311,7 @@ restart:
> }
>
> out:
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> locks_free_lock(new_fl);
> return error;
> }
> @@ -1361,9 +1364,10 @@ EXPORT_SYMBOL(lease_get_mtime);
> int fcntl_getlease(struct file *filp)
> {
> struct file_lock *fl;
> + struct inode *inode = file_inode(filp);
> int type = F_UNLCK;
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> time_out_leases(file_inode(filp));
> for (fl = file_inode(filp)->i_flock; fl && IS_LEASE(fl);
> fl = fl->fl_next) {
> @@ -1372,7 +1376,7 @@ int fcntl_getlease(struct file *filp)
> break;
> }
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> return type;
> }
>
> @@ -1466,7 +1470,7 @@ static int generic_delete_lease(struct file *filp, struct file_lock **flp)
> * The (input) flp->fl_lmops->lm_break function is required
> * by break_lease().
> *
> - * Called with file_lock_lock held.
> + * Called with inode->i_lock held.
> */
> int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
> {
> @@ -1535,11 +1539,12 @@ static int __vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
>
> int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
> {
> + struct inode *inode = file_inode(filp);
> int error;
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> error = __vfs_setlease(filp, arg, lease);
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
>
> return error;
> }
> @@ -1557,6 +1562,7 @@ static int do_fcntl_delete_lease(struct file *filp)
> static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
> {
> struct file_lock *fl, *ret;
> + struct inode *inode = file_inode(filp);
> struct fasync_struct *new;
> int error;
>
> @@ -1570,10 +1576,10 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
> return -ENOMEM;
> }
> ret = fl;
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> error = __vfs_setlease(filp, arg, &ret);
> if (error) {
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> locks_free_lock(fl);
> goto out_free_fasync;
> }
> @@ -1590,7 +1596,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
> new = NULL;
>
> error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
>
> out_free_fasync:
> if (new)
> @@ -2114,7 +2120,7 @@ void locks_remove_flock(struct file *filp)
> fl.fl_ops->fl_release_private(&fl);
> }
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> before = &inode->i_flock;
>
> while ((fl = *before) != NULL) {
> @@ -2132,7 +2138,7 @@ void locks_remove_flock(struct file *filp)
> }
> before = &fl->fl_next;
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> }
>
> /**
> @@ -2145,14 +2151,15 @@ void locks_remove_flock(struct file *filp)
> int
> posix_unblock_lock(struct file *filp, struct file_lock *waiter)
> {
> + struct inode *inode = file_inode(filp);
> int status = 0;
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> if (waiter->fl_next)
> __locks_delete_block(waiter);
> else
> status = -ENOENT;
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> return status;
> }
>
> @@ -2257,8 +2264,10 @@ static int locks_show(struct seq_file *f, void *v)
>
> lock_get_status(f, fl, *((loff_t *)f->private), "");
>
> - list_for_each_entry(bfl, &fl->fl_block, fl_block)
> - lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
> + list_for_each_entry(bfl, &blocked_list, fl_link) {
> + if (bfl->fl_next == fl)
> + lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
> + }
>
> return 0;
> }
> @@ -2267,7 +2276,7 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
> {
> loff_t *p = f->private;
>
> - lock_flocks();
> + spin_lock(&file_lock_lock);
> *p = (*pos + 1);
> return seq_list_start(&file_lock_list, *pos);
> }
> @@ -2281,7 +2290,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
>
> static void locks_stop(struct seq_file *f, void *v)
> {
> - unlock_flocks();
> + spin_unlock(&file_lock_lock);
> }
>
> static const struct seq_operations locks_seq_operations = {
> @@ -2328,7 +2337,8 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
> {
> struct file_lock *fl;
> int result = 1;
> - lock_flocks();
> +
> + spin_lock(&inode->i_lock);
> for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> if (IS_POSIX(fl)) {
> if (fl->fl_type == F_RDLCK)
> @@ -2345,7 +2355,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
> result = 0;
> break;
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> return result;
> }
>
> @@ -2368,7 +2378,8 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
> {
> struct file_lock *fl;
> int result = 1;
> - lock_flocks();
> +
> + spin_lock(&inode->i_lock);
> for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> if (IS_POSIX(fl)) {
> if ((fl->fl_end < start) || (fl->fl_start > (start + len)))
> @@ -2383,7 +2394,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
> result = 0;
> break;
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> return result;
> }
>
> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
> index 57db324..43ee7f9 100644
> --- a/fs/nfs/delegation.c
> +++ b/fs/nfs/delegation.c
> @@ -73,20 +73,21 @@ static int nfs_delegation_claim_locks(struct nfs_open_context *ctx, struct nfs4_
> if (inode->i_flock == NULL)
> goto out;
>
> - /* Protect inode->i_flock using the file locks lock */
> - lock_flocks();
> + /* Protect inode->i_flock using the i_lock */
> + spin_lock(&inode->i_lock);
> for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
> continue;
> if (nfs_file_open_context(fl->fl_file) != ctx)
> continue;
> - unlock_flocks();
> + /* FIXME: safe to drop lock here while walking list? */
> + spin_unlock(&inode->i_lock);
> status = nfs4_lock_delegation_recall(fl, state, stateid);
> if (status < 0)
> goto out;
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> out:
> return status;
> }
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 1fab140..ff10b4a 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1373,13 +1373,13 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
> /* Guard against delegation returns and new lock/unlock calls */
> down_write(&nfsi->rwsem);
> /* Protect inode->i_flock using the BKL */
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
> continue;
> if (nfs_file_open_context(fl->fl_file)->state != state)
> continue;
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> status = ops->recover_lock(state, fl);
> switch (status) {
> case 0:
> @@ -1406,9 +1406,9 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
> /* kill_proc(fl->fl_pid, SIGLOST, 1); */
> status = 0;
> }
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> }
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> out:
> up_write(&nfsi->rwsem);
> return status;
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 316ec84..f170518 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2645,13 +2645,13 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
>
> list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
>
> - /* only place dl_time is set. protected by lock_flocks*/
> + /* Only place dl_time is set; protected by i_lock: */
> dp->dl_time = get_seconds();
>
> nfsd4_cb_recall(dp);
> }
>
> -/* Called from break_lease() with lock_flocks() held. */
> +/* Called from break_lease() with i_lock held. */
> static void nfsd_break_deleg_cb(struct file_lock *fl)
> {
> struct nfs4_file *fp = (struct nfs4_file *)fl->fl_owner;
> @@ -4520,7 +4520,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
> struct inode *inode = filp->fi_inode;
> int status = 0;
>
> - lock_flocks();
> + spin_lock(&inode->i_lock);
> for (flpp = &inode->i_flock; *flpp != NULL; flpp = &(*flpp)->fl_next) {
> if ((*flpp)->fl_owner == (fl_owner_t)lowner) {
> status = 1;
> @@ -4528,7 +4528,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
> }
> }
> out:
> - unlock_flocks();
> + spin_unlock(&inode->i_lock);
> return status;
> }
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ae377e9..ccb44ea 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1012,8 +1012,6 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
> extern int lease_modify(struct file_lock **, int);
> extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
> extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
> -extern void lock_flocks(void);
> -extern void unlock_flocks(void);
> #else /* !CONFIG_FILE_LOCKING */
> static inline int fcntl_getlk(struct file *file, struct flock __user *user)
> {
> @@ -1155,15 +1153,6 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
> {
> return 1;
> }
> -
> -static inline void lock_flocks(void)
> -{
> -}
> -
> -static inline void unlock_flocks(void)
> -{
> -}
> -
> #endif /* !CONFIG_FILE_LOCKING */
>
>
> --
> 1.7.1
>

2013-06-04 21:59:18

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 07/11] locks: only pull entries off of blocked_list when they are really unblocked

On Fri, May 31, 2013 at 11:07:30PM -0400, Jeff Layton wrote:
> Currently, when there is a lot of lock contention the kernel spends an
> inordinate amount of time taking blocked locks off of the global
> blocked_list and then putting them right back on again. When all of this
> code was protected by a single lock, then it didn't matter much, but now
> it means a lot of file_lock_lock thrashing.
>
> Optimize this a bit by deferring the removal from the blocked_list until
> we're either applying or cancelling the lock. By doing this, and using a
> lockless list_empty check, we can avoid taking the file_lock_lock in
> many cases.
>
> Because the fl_link check is lockless, we must ensure that only the task
> that "owns" the request manipulates the fl_link. Also, with this change,
> it's possible that we'll see an entry on the blocked_list that has a
> NULL fl_next pointer. In that event, just ignore it and continue walking
> the list.

OK, that sounds safe as in it shouldn't crash, but does the deadlock
detection still work, or can it miss loops?

Those locks whose fl_next is temporarily NULL would previously not have been on
the list at all, OK, but... I'm having trouble reasoning about how this
works now.

Previously a single lock was held uninterrupted across
posix_locks_deadlock and locks_insert_block(), which guaranteed we
shouldn't be adding a loop. Is that still true?

--b.

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/locks.c | 29 +++++++++++++++++++++++------
> 1 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 055c06c..fc35b9e 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -520,7 +520,6 @@ locks_delete_global_locks(struct file_lock *waiter)
> static void __locks_delete_block(struct file_lock *waiter)
> {
> list_del_init(&waiter->fl_block);
> - locks_delete_global_blocked(waiter);
> waiter->fl_next = NULL;
> }
>
> @@ -704,13 +703,16 @@ EXPORT_SYMBOL(posix_test_lock);
> /* Find a lock that the owner of the given block_fl is blocking on. */
> static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
> {
> - struct file_lock *fl;
> + struct file_lock *fl, *ret = NULL;
>
> list_for_each_entry(fl, &blocked_list, fl_link) {
> - if (posix_same_owner(fl, block_fl))
> - return fl->fl_next;
> + if (posix_same_owner(fl, block_fl)) {
> + ret = fl->fl_next;
> + if (likely(ret))
> + break;
> + }
> }
> - return NULL;
> + return ret;
> }
>
> static int posix_locks_deadlock(struct file_lock *caller_fl,
> @@ -865,7 +867,8 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> goto out;
> error = FILE_LOCK_DEFERRED;
> locks_insert_block(fl, request);
> - locks_insert_global_blocked(request);
> + if (list_empty(&request->fl_link))
> + locks_insert_global_blocked(request);
> goto out;
> }
> }
> @@ -876,6 +879,16 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> goto out;
>
> /*
> + * Now that we know the request is no longer blocked, we can take it
> + * off the global list. Some callers send down partially initialized
> + * requests, so we only do this if FL_SLEEP is set. Also, avoid taking
> + * the lock if the list is empty, as that indicates a request that
> + * never blocked.
> + */
> + if ((request->fl_flags & FL_SLEEP) && !list_empty(&request->fl_link))
> + locks_delete_global_blocked(request);
> +
> + /*
> * Find the first old lock with the same owner as the new lock.
> */
>
> @@ -1069,6 +1082,7 @@ int posix_lock_file_wait(struct file *filp, struct file_lock *fl)
> continue;
>
> locks_delete_block(fl);
> + locks_delete_global_blocked(fl);
> break;
> }
> return error;
> @@ -1147,6 +1161,7 @@ int locks_mandatory_area(int read_write, struct inode *inode,
> }
>
> locks_delete_block(&fl);
> + locks_delete_global_blocked(&fl);
> break;
> }
>
> @@ -1859,6 +1874,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
> continue;
>
> locks_delete_block(fl);
> + locks_delete_global_blocked(fl);
> break;
> }
>
> @@ -2160,6 +2176,7 @@ posix_unblock_lock(struct file *filp, struct file_lock *waiter)
> else
> status = -ENOENT;
> spin_unlock(&inode->i_lock);
> + locks_delete_global_blocked(waiter);
> return status;
> }
>
> --
> 1.7.1
>

2013-06-04 22:00:21

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 08/11] locks: convert fl_link to a hlist_node

On Fri, May 31, 2013 at 11:07:31PM -0400, Jeff Layton wrote:
> Testing has shown that iterating over the blocked_list for deadlock
> detection turns out to be a bottleneck. In order to alleviate that,
> begin the process of turning it into a hashtable. We start by turning
> the fl_link into a hlist_node and the global lists into hlists. A later
> patch will do the conversion of the blocked_list to a hashtable.

Even simpler would be if we could add a pointer to the (well, a) lock
that a lockowner is blocking on, and then we'd just have to follow a
pointer. I haven't thought that through, though; perhaps that's hard to
make work....

--b.

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/locks.c | 32 ++++++++++++++++----------------
> include/linux/fs.h | 2 +-
> 2 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index fc35b9e..5ed056b 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -163,8 +163,8 @@ int lease_break_time = 45;
> #define for_each_lock(inode, lockp) \
> for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
>
> -static LIST_HEAD(file_lock_list);
> -static LIST_HEAD(blocked_list);
> +static HLIST_HEAD(file_lock_list);
> +static HLIST_HEAD(blocked_list);
>
> /* Protects the two list heads above */
> static DEFINE_SPINLOCK(file_lock_lock);
> @@ -173,7 +173,7 @@ static struct kmem_cache *filelock_cache __read_mostly;
>
> static void locks_init_lock_heads(struct file_lock *fl)
> {
> - INIT_LIST_HEAD(&fl->fl_link);
> + INIT_HLIST_NODE(&fl->fl_link);
> INIT_LIST_HEAD(&fl->fl_block);
> init_waitqueue_head(&fl->fl_wait);
> }
> @@ -207,7 +207,7 @@ void locks_free_lock(struct file_lock *fl)
> {
> BUG_ON(waitqueue_active(&fl->fl_wait));
> BUG_ON(!list_empty(&fl->fl_block));
> - BUG_ON(!list_empty(&fl->fl_link));
> + BUG_ON(!hlist_unhashed(&fl->fl_link));
>
> locks_release_private(fl);
> kmem_cache_free(filelock_cache, fl);
> @@ -486,7 +486,7 @@ static inline void
> locks_insert_global_blocked(struct file_lock *waiter)
> {
> spin_lock(&file_lock_lock);
> - list_add(&waiter->fl_link, &blocked_list);
> + hlist_add_head(&waiter->fl_link, &blocked_list);
> spin_unlock(&file_lock_lock);
> }
>
> @@ -494,7 +494,7 @@ static inline void
> locks_delete_global_blocked(struct file_lock *waiter)
> {
> spin_lock(&file_lock_lock);
> - list_del_init(&waiter->fl_link);
> + hlist_del_init(&waiter->fl_link);
> spin_unlock(&file_lock_lock);
> }
>
> @@ -502,7 +502,7 @@ static inline void
> locks_insert_global_locks(struct file_lock *waiter)
> {
> spin_lock(&file_lock_lock);
> - list_add_tail(&waiter->fl_link, &file_lock_list);
> + hlist_add_head(&waiter->fl_link, &file_lock_list);
> spin_unlock(&file_lock_lock);
> }
>
> @@ -510,7 +510,7 @@ static inline void
> locks_delete_global_locks(struct file_lock *waiter)
> {
> spin_lock(&file_lock_lock);
> - list_del_init(&waiter->fl_link);
> + hlist_del_init(&waiter->fl_link);
> spin_unlock(&file_lock_lock);
> }
>
> @@ -705,7 +705,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
> {
> struct file_lock *fl, *ret = NULL;
>
> - list_for_each_entry(fl, &blocked_list, fl_link) {
> + hlist_for_each_entry(fl, &blocked_list, fl_link) {
> if (posix_same_owner(fl, block_fl)) {
> ret = fl->fl_next;
> if (likely(ret))
> @@ -867,7 +867,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> goto out;
> error = FILE_LOCK_DEFERRED;
> locks_insert_block(fl, request);
> - if (list_empty(&request->fl_link))
> + if (hlist_unhashed(&request->fl_link))
> locks_insert_global_blocked(request);
> goto out;
> }
> @@ -882,10 +882,10 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> * Now that we know the request is no longer blocked, we can take it
> * off the global list. Some callers send down partially initialized
> * requests, so we only do this if FL_SLEEP is set. Also, avoid taking
> - * the lock if the list is empty, as that indicates a request that
> + * the lock if the hlist is unhashed, as that indicates a request that
> * never blocked.
> */
> - if ((request->fl_flags & FL_SLEEP) && !list_empty(&request->fl_link))
> + if ((request->fl_flags & FL_SLEEP) && !hlist_unhashed(&request->fl_link))
> locks_delete_global_blocked(request);
>
> /*
> @@ -2277,11 +2277,11 @@ static int locks_show(struct seq_file *f, void *v)
> {
> struct file_lock *fl, *bfl;
>
> - fl = list_entry(v, struct file_lock, fl_link);
> + fl = hlist_entry(v, struct file_lock, fl_link);
>
> lock_get_status(f, fl, *((loff_t *)f->private), "");
>
> - list_for_each_entry(bfl, &blocked_list, fl_link) {
> + hlist_for_each_entry(bfl, &blocked_list, fl_link) {
> if (bfl->fl_next == fl)
> lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
> }
> @@ -2295,14 +2295,14 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
>
> spin_lock(&file_lock_lock);
> *p = (*pos + 1);
> - return seq_list_start(&file_lock_list, *pos);
> + return seq_hlist_start(&file_lock_list, *pos);
> }
>
> static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
> {
> loff_t *p = f->private;
> ++*p;
> - return seq_list_next(v, &file_lock_list, pos);
> + return seq_hlist_next(v, &file_lock_list, pos);
> }
>
> static void locks_stop(struct seq_file *f, void *v)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ccb44ea..07a009e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -934,7 +934,7 @@ int locks_in_grace(struct net *);
> */
> struct file_lock {
> struct file_lock *fl_next; /* singly linked list for this inode */
> - struct list_head fl_link; /* doubly linked list of all locks */
> + struct hlist_node fl_link; /* node in global lists */
> struct list_head fl_block; /* circular list of blocked processes */
> fl_owner_t fl_owner;
> unsigned int fl_flags;
> --
> 1.7.1
>

2013-06-05 00:46:50

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 06/11] locks: convert to i_lock to protect i_flock list

On Tue, 4 Jun 2013 17:22:08 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Fri, May 31, 2013 at 11:07:29PM -0400, Jeff Layton wrote:
> > Having a global lock that protects all of this code is a clear
> > scalability problem. Instead of doing that, move most of the code to be
> > protected by the i_lock instead.
> >
> > The exceptions are the global lists that file_lock->fl_link sits on.
> > Those still need a global lock of some sort, so wrap just those accesses
> > in the file_lock_lock().
> >
> > Note that this requires a small change to the /proc/locks code. Instead
> > of walking the fl_block list to look for locks blocked on the current
> > lock, we must instead walk the global blocker list and skip any that
> > aren't blocked on the current lock. Otherwise, we'd need to take the
> > i_lock on each inode as we go and that would create a lock inversion
> > problem.
>
> locks_insert_lock sets fl_nspid after adding to the global list, so in
> theory I think that could leave /proc/locks seeing a garbage fl_nspid
> or fl_pid field.
>

Well spotted. I think we can simply fix that by ensuring that we do the
insert onto the global list as the last thing in locks_insert_lock and
remove it from the global list first thing in locks_delete_lock. I'll
do that and test it out before I send the next iteration.
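
Roughly, the idea would be something like this (illustrative sketch only,
not the actual fs/locks.c bodies; the real locks_insert_lock and
locks_delete_lock do more than what's shown here):

static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
{
	fl->fl_nspid = get_pid(task_tgid(current));

	/* link into the per-inode i_flock list first */
	fl->fl_next = *pos;
	*pos = fl;

	/* only now make it visible on the global list, fully initialized */
	locks_insert_global_locks(fl);
}

static void locks_delete_lock(struct file_lock **thisfl_p)
{
	struct file_lock *fl = *thisfl_p;

	/* hide it from /proc/locks before any other teardown */
	locks_delete_global_locks(fl);

	/* unlink from the per-inode list; the remaining teardown is omitted */
	*thisfl_p = fl->fl_next;
	fl->fl_next = NULL;
}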

> I'm similarly worried about the deadlock detection code using the
> fl_next value before it's set but I'm not sure what the consequences
> are.
>

This I'm less clear on. We already do the insert onto the global
blocked list after locks_insert_block. Am I missing something here?

> But I think both of those are fixable, and this looks OK otherwise?
>
> Patch-sequencing nit: it'd be nice to have the big-but-boring
> lock_flocks->spin_lock(&i_lock) search-and-replace in a separate patch
> from the (briefer, but more subtle) deadlock and /proc/locks changes.
> Off the top of my head all I can think of is a bulk replace of
> lock_flocks() by lock_flocks(inode) (which ignores the inode argument), then
> change the definition of lock_flocks(inode) in the patch which does the
> other stuff, then a bulk replace of lock_flocks(inode) by
> spin_lock(&inode->i_lock), but maybe there's something simpler.
>

Ok, maybe I can do some of these other changes before we switch to the
i_lock, and then do the big boring change after that. I'll see if I can
cook up a reasonable patch for that for the next iteration.

> --b.
>
>
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > Signed-off-by: J. Bruce Fields <[email protected]>
> > ---
> > Documentation/filesystems/Locking | 23 +++++--
> > fs/afs/flock.c | 5 +-
> > fs/ceph/locks.c | 2 +-
> > fs/ceph/mds_client.c | 8 +-
> > fs/cifs/cifsfs.c | 2 +-
> > fs/cifs/file.c | 13 ++--
> > fs/gfs2/file.c | 2 +-
> > fs/lockd/svcsubs.c | 12 ++--
> > fs/locks.c | 121 ++++++++++++++++++++-----------------
> > fs/nfs/delegation.c | 11 ++--
> > fs/nfs/nfs4state.c | 8 +-
> > fs/nfsd/nfs4state.c | 8 +-
> > include/linux/fs.h | 11 ----
> > 13 files changed, 119 insertions(+), 107 deletions(-)
> >
> > diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> > index 0706d32..13f91ab 100644
> > --- a/Documentation/filesystems/Locking
> > +++ b/Documentation/filesystems/Locking
> > @@ -344,7 +344,7 @@ prototypes:
> >
> >
> > locking rules:
> > - file_lock_lock may block
> > + inode->i_lock may block
> > fl_copy_lock: yes no
> > fl_release_private: maybe no
> >
> > @@ -357,12 +357,21 @@ prototypes:
> > int (*lm_change)(struct file_lock **, int);
> >
> > locking rules:
> > - file_lock_lock may block
> > -lm_compare_owner: yes no
> > -lm_notify: yes no
> > -lm_grant: no no
> > -lm_break: yes no
> > -lm_change yes no
> > +
> > + inode->i_lock file_lock_lock may block
> > +lm_compare_owner: yes maybe no
> > +lm_notify: yes no no
> > +lm_grant: no no no
> > +lm_break: yes no no
> > +lm_change yes no no
> > +
> > + ->lm_compare_owner is generally called with *an* inode->i_lock
> > +held. It may not be the i_lock of the inode for either file_lock being
> > +compared! This is the case with deadlock detection, since the code has
> > +to chase down the owners of locks that may be entirely unrelated to the
> > +one on which the lock is being acquired. For deadlock detection however,
> > +the file_lock_lock is also held. The locks primarily ensure that neither
> > +file_lock disappear out from under you while doing the comparison.
> >
> > --------------------------- buffer_head -----------------------------------
> > prototypes:
> > diff --git a/fs/afs/flock.c b/fs/afs/flock.c
> > index 2497bf3..03fc0d1 100644
> > --- a/fs/afs/flock.c
> > +++ b/fs/afs/flock.c
> > @@ -252,6 +252,7 @@ static void afs_defer_unlock(struct afs_vnode *vnode, struct key *key)
> > */
> > static int afs_do_setlk(struct file *file, struct file_lock *fl)
> > {
> > + struct inode *inode = file_inode(file);
> > struct afs_vnode *vnode = AFS_FS_I(file->f_mapping->host);
> > afs_lock_type_t type;
> > struct key *key = file->private_data;
> > @@ -273,7 +274,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
> >
> > type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> >
> > /* make sure we've got a callback on this file and that our view of the
> > * data version is up to date */
> > @@ -420,7 +421,7 @@ given_lock:
> > afs_vnode_fetch_status(vnode, NULL, key);
> >
> > error:
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > _leave(" = %d", ret);
> > return ret;
> >
> > diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
> > index 202dd3d..cd0a664 100644
> > --- a/fs/ceph/locks.c
> > +++ b/fs/ceph/locks.c
> > @@ -194,7 +194,7 @@ void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count)
> > * Encode the flock and fcntl locks for the given inode into the pagelist.
> > * Format is: #fcntl locks, sequential fcntl locks, #flock locks,
> > * sequential flock locks.
> > - * Must be called with lock_flocks() already held.
> > + * Must be called with inode->i_lock already held.
> > * If we encounter more of a specific lock type than expected,
> > * we return the value 1.
> > */
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 4f22671..ae621b5 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -2482,13 +2482,13 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
> >
> > ceph_pagelist_set_cursor(pagelist, &trunc_point);
> > do {
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > ceph_count_locks(inode, &num_fcntl_locks,
> > &num_flock_locks);
> > rec.v2.flock_len = (2*sizeof(u32) +
> > (num_fcntl_locks+num_flock_locks) *
> > sizeof(struct ceph_filelock));
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> >
> > /* pre-alloc pagelist */
> > ceph_pagelist_truncate(pagelist, &trunc_point);
> > @@ -2499,12 +2499,12 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
> >
> > /* encode locks */
> > if (!err) {
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > err = ceph_encode_locks(inode,
> > pagelist,
> > num_fcntl_locks,
> > num_flock_locks);
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > }
> > } while (err == -ENOSPC);
> > } else {
> > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> > index 72e4efe..29952d5 100644
> > --- a/fs/cifs/cifsfs.c
> > +++ b/fs/cifs/cifsfs.c
> > @@ -768,7 +768,7 @@ static loff_t cifs_llseek(struct file *file, loff_t offset, int whence)
> >
> > static int cifs_setlease(struct file *file, long arg, struct file_lock **lease)
> > {
> > - /* note that this is called by vfs setlease with lock_flocks held
> > + /* note that this is called by vfs setlease with i_lock held
> > to protect *lease from going away */
> > struct inode *inode = file_inode(file);
> > struct cifsFileInfo *cfile = file->private_data;
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index 44a4f18..0dd10cd 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -1092,6 +1092,7 @@ struct lock_to_push {
> > static int
> > cifs_push_posix_locks(struct cifsFileInfo *cfile)
> > {
> > + struct inode *inode = cfile->dentry->d_inode;
> > struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
> > struct file_lock *flock, **before;
> > unsigned int count = 0, i = 0;
> > @@ -1102,12 +1103,12 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
> >
> > xid = get_xid();
> >
> > - lock_flocks();
> > - cifs_for_each_lock(cfile->dentry->d_inode, before) {
> > + spin_lock(&inode->i_lock);
> > + cifs_for_each_lock(inode, before) {
> > if ((*before)->fl_flags & FL_POSIX)
> > count++;
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> >
> > INIT_LIST_HEAD(&locks_to_send);
> >
> > @@ -1126,8 +1127,8 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
> > }
> >
> > el = locks_to_send.next;
> > - lock_flocks();
> > - cifs_for_each_lock(cfile->dentry->d_inode, before) {
> > + spin_lock(&inode->i_lock);
> > + cifs_for_each_lock(inode, before) {
> > flock = *before;
> > if ((flock->fl_flags & FL_POSIX) == 0)
> > continue;
> > @@ -1152,7 +1153,7 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
> > lck->offset = flock->fl_start;
> > el = el->next;
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> >
> > list_for_each_entry_safe(lck, tmp, &locks_to_send, llist) {
> > int stored_rc;
> > diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> > index acd1676..9e634e0 100644
> > --- a/fs/gfs2/file.c
> > +++ b/fs/gfs2/file.c
> > @@ -889,7 +889,7 @@ out_uninit:
> > * cluster; until we do, disable leases (by just returning -EINVAL),
> > * unless the administrator has requested purely local locking.
> > *
> > - * Locking: called under lock_flocks
> > + * Locking: called under i_lock
> > *
> > * Returns: errno
> > */
> > diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
> > index 97e8741..dc5c759 100644
> > --- a/fs/lockd/svcsubs.c
> > +++ b/fs/lockd/svcsubs.c
> > @@ -169,7 +169,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,
> >
> > again:
> > file->f_locks = 0;
> > - lock_flocks(); /* protects i_flock list */
> > + spin_lock(&inode->i_lock);
> > for (fl = inode->i_flock; fl; fl = fl->fl_next) {
> > if (fl->fl_lmops != &nlmsvc_lock_operations)
> > continue;
> > @@ -181,7 +181,7 @@ again:
> > if (match(lockhost, host)) {
> > struct file_lock lock = *fl;
> >
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > lock.fl_type = F_UNLCK;
> > lock.fl_start = 0;
> > lock.fl_end = OFFSET_MAX;
> > @@ -193,7 +193,7 @@ again:
> > goto again;
> > }
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> >
> > return 0;
> > }
> > @@ -228,14 +228,14 @@ nlm_file_inuse(struct nlm_file *file)
> > if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares)
> > return 1;
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > for (fl = inode->i_flock; fl; fl = fl->fl_next) {
> > if (fl->fl_lmops == &nlmsvc_lock_operations) {
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > return 1;
> > }
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > file->f_locks = 0;
> > return 0;
> > }
> > diff --git a/fs/locks.c b/fs/locks.c
> > index caca466..055c06c 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -165,22 +165,9 @@ int lease_break_time = 45;
> >
> > static LIST_HEAD(file_lock_list);
> > static LIST_HEAD(blocked_list);
> > -static DEFINE_SPINLOCK(file_lock_lock);
> > -
> > -/*
> > - * Protects the two list heads above, plus the inode->i_flock list
> > - */
> > -void lock_flocks(void)
> > -{
> > - spin_lock(&file_lock_lock);
> > -}
> > -EXPORT_SYMBOL_GPL(lock_flocks);
> >
> > -void unlock_flocks(void)
> > -{
> > - spin_unlock(&file_lock_lock);
> > -}
> > -EXPORT_SYMBOL_GPL(unlock_flocks);
> > +/* Protects the two list heads above */
> > +static DEFINE_SPINLOCK(file_lock_lock);
> >
> > static struct kmem_cache *filelock_cache __read_mostly;
> >
> > @@ -498,25 +485,33 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
> > static inline void
> > locks_insert_global_blocked(struct file_lock *waiter)
> > {
> > + spin_lock(&file_lock_lock);
> > list_add(&waiter->fl_link, &blocked_list);
> > + spin_unlock(&file_lock_lock);
> > }
> >
> > static inline void
> > locks_delete_global_blocked(struct file_lock *waiter)
> > {
> > + spin_lock(&file_lock_lock);
> > list_del_init(&waiter->fl_link);
> > + spin_unlock(&file_lock_lock);
> > }
> >
> > static inline void
> > locks_insert_global_locks(struct file_lock *waiter)
> > {
> > + spin_lock(&file_lock_lock);
> > list_add_tail(&waiter->fl_link, &file_lock_list);
> > + spin_unlock(&file_lock_lock);
> > }
> >
> > static inline void
> > locks_delete_global_locks(struct file_lock *waiter)
> > {
> > + spin_lock(&file_lock_lock);
> > list_del_init(&waiter->fl_link);
> > + spin_unlock(&file_lock_lock);
> > }
> >
> > /* Remove waiter from blocker's block list.
> > @@ -533,9 +528,11 @@ static void __locks_delete_block(struct file_lock *waiter)
> > */
> > static void locks_delete_block(struct file_lock *waiter)
> > {
> > - lock_flocks();
> > + struct inode *inode = file_inode(waiter->fl_file);
> > +
> > + spin_lock(&inode->i_lock);
> > __locks_delete_block(waiter);
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > }
> >
> > /* Insert waiter into blocker's block list.
> > @@ -657,8 +654,9 @@ void
> > posix_test_lock(struct file *filp, struct file_lock *fl)
> > {
> > struct file_lock *cfl;
> > + struct inode *inode = file_inode(filp);
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > for (cfl = file_inode(filp)->i_flock; cfl; cfl = cfl->fl_next) {
> > if (!IS_POSIX(cfl))
> > continue;
> > @@ -671,7 +669,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> > fl->fl_pid = pid_vnr(cfl->fl_nspid);
> > } else
> > fl->fl_type = F_UNLCK;
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > return;
> > }
> > EXPORT_SYMBOL(posix_test_lock);
> > @@ -719,14 +717,19 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
> > struct file_lock *block_fl)
> > {
> > int i = 0;
> > + int ret = 0;
> >
> > + spin_lock(&file_lock_lock);
> > while ((block_fl = what_owner_is_waiting_for(block_fl))) {
> > if (i++ > MAX_DEADLK_ITERATIONS)
> > - return 0;
> > - if (posix_same_owner(caller_fl, block_fl))
> > - return 1;
> > + break;
> > + if (posix_same_owner(caller_fl, block_fl)) {
> > + ++ret;
> > + break;
> > + }
> > }
> > - return 0;
> > + spin_unlock(&file_lock_lock);
> > + return ret;
> > }
> >
> > /* Try to create a FLOCK lock on filp. We always insert new FLOCK locks
> > @@ -750,7 +753,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
> > return -ENOMEM;
> > }
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > if (request->fl_flags & FL_ACCESS)
> > goto find_conflict;
> >
> > @@ -780,9 +783,9 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
> > * give it the opportunity to lock the file.
> > */
> > if (found) {
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > cond_resched();
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > }
> >
> > find_conflict:
> > @@ -809,7 +812,7 @@ find_conflict:
> > error = 0;
> >
> > out:
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > if (new_fl)
> > locks_free_lock(new_fl);
> > return error;
> > @@ -839,7 +842,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> > new_fl2 = locks_alloc_lock();
> > }
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > /*
> > * New lock request. Walk all POSIX locks and look for conflicts. If
> > * there are any, either return -EAGAIN or put the request on the
> > @@ -1012,7 +1015,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> > locks_wake_up_blocks(left);
> > }
> > out:
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > /*
> > * Free any unused locks.
> > */
> > @@ -1087,14 +1090,14 @@ int locks_mandatory_locked(struct inode *inode)
> > /*
> > * Search the lock list for this inode for any POSIX locks.
> > */
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> > if (!IS_POSIX(fl))
> > continue;
> > if (fl->fl_owner != owner)
> > break;
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > return fl ? -EAGAIN : 0;
> > }
> >
> > @@ -1237,7 +1240,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
> > if (IS_ERR(new_fl))
> > return PTR_ERR(new_fl);
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> >
> > time_out_leases(inode);
> >
> > @@ -1287,10 +1290,10 @@ restart:
> > break_time++;
> > }
> > locks_insert_block(flock, new_fl);
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > error = wait_event_interruptible_timeout(new_fl->fl_wait,
> > !new_fl->fl_next, break_time);
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > __locks_delete_block(new_fl);
> > if (error >= 0) {
> > if (error == 0)
> > @@ -1308,7 +1311,7 @@ restart:
> > }
> >
> > out:
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > locks_free_lock(new_fl);
> > return error;
> > }
> > @@ -1361,9 +1364,10 @@ EXPORT_SYMBOL(lease_get_mtime);
> > int fcntl_getlease(struct file *filp)
> > {
> > struct file_lock *fl;
> > + struct inode *inode = file_inode(filp);
> > int type = F_UNLCK;
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > time_out_leases(file_inode(filp));
> > for (fl = file_inode(filp)->i_flock; fl && IS_LEASE(fl);
> > fl = fl->fl_next) {
> > @@ -1372,7 +1376,7 @@ int fcntl_getlease(struct file *filp)
> > break;
> > }
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > return type;
> > }
> >
> > @@ -1466,7 +1470,7 @@ static int generic_delete_lease(struct file *filp, struct file_lock **flp)
> > * The (input) flp->fl_lmops->lm_break function is required
> > * by break_lease().
> > *
> > - * Called with file_lock_lock held.
> > + * Called with inode->i_lock held.
> > */
> > int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
> > {
> > @@ -1535,11 +1539,12 @@ static int __vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
> >
> > int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
> > {
> > + struct inode *inode = file_inode(filp);
> > int error;
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > error = __vfs_setlease(filp, arg, lease);
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> >
> > return error;
> > }
> > @@ -1557,6 +1562,7 @@ static int do_fcntl_delete_lease(struct file *filp)
> > static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
> > {
> > struct file_lock *fl, *ret;
> > + struct inode *inode = file_inode(filp);
> > struct fasync_struct *new;
> > int error;
> >
> > @@ -1570,10 +1576,10 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
> > return -ENOMEM;
> > }
> > ret = fl;
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > error = __vfs_setlease(filp, arg, &ret);
> > if (error) {
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > locks_free_lock(fl);
> > goto out_free_fasync;
> > }
> > @@ -1590,7 +1596,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
> > new = NULL;
> >
> > error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> >
> > out_free_fasync:
> > if (new)
> > @@ -2114,7 +2120,7 @@ void locks_remove_flock(struct file *filp)
> > fl.fl_ops->fl_release_private(&fl);
> > }
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > before = &inode->i_flock;
> >
> > while ((fl = *before) != NULL) {
> > @@ -2132,7 +2138,7 @@ void locks_remove_flock(struct file *filp)
> > }
> > before = &fl->fl_next;
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > }
> >
> > /**
> > @@ -2145,14 +2151,15 @@ void locks_remove_flock(struct file *filp)
> > int
> > posix_unblock_lock(struct file *filp, struct file_lock *waiter)
> > {
> > + struct inode *inode = file_inode(filp);
> > int status = 0;
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > if (waiter->fl_next)
> > __locks_delete_block(waiter);
> > else
> > status = -ENOENT;
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > return status;
> > }
> >
> > @@ -2257,8 +2264,10 @@ static int locks_show(struct seq_file *f, void *v)
> >
> > lock_get_status(f, fl, *((loff_t *)f->private), "");
> >
> > - list_for_each_entry(bfl, &fl->fl_block, fl_block)
> > - lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
> > + list_for_each_entry(bfl, &blocked_list, fl_link) {
> > + if (bfl->fl_next == fl)
> > + lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
> > + }
> >
> > return 0;
> > }
> > @@ -2267,7 +2276,7 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
> > {
> > loff_t *p = f->private;
> >
> > - lock_flocks();
> > + spin_lock(&file_lock_lock);
> > *p = (*pos + 1);
> > return seq_list_start(&file_lock_list, *pos);
> > }
> > @@ -2281,7 +2290,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
> >
> > static void locks_stop(struct seq_file *f, void *v)
> > {
> > - unlock_flocks();
> > + spin_unlock(&file_lock_lock);
> > }
> >
> > static const struct seq_operations locks_seq_operations = {
> > @@ -2328,7 +2337,8 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
> > {
> > struct file_lock *fl;
> > int result = 1;
> > - lock_flocks();
> > +
> > + spin_lock(&inode->i_lock);
> > for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> > if (IS_POSIX(fl)) {
> > if (fl->fl_type == F_RDLCK)
> > @@ -2345,7 +2355,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
> > result = 0;
> > break;
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > return result;
> > }
> >
> > @@ -2368,7 +2378,8 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
> > {
> > struct file_lock *fl;
> > int result = 1;
> > - lock_flocks();
> > +
> > + spin_lock(&inode->i_lock);
> > for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> > if (IS_POSIX(fl)) {
> > if ((fl->fl_end < start) || (fl->fl_start > (start + len)))
> > @@ -2383,7 +2394,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
> > result = 0;
> > break;
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > return result;
> > }
> >
> > diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
> > index 57db324..43ee7f9 100644
> > --- a/fs/nfs/delegation.c
> > +++ b/fs/nfs/delegation.c
> > @@ -73,20 +73,21 @@ static int nfs_delegation_claim_locks(struct nfs_open_context *ctx, struct nfs4_
> > if (inode->i_flock == NULL)
> > goto out;
> >
> > - /* Protect inode->i_flock using the file locks lock */
> > - lock_flocks();
> > + /* Protect inode->i_flock using the i_lock */
> > + spin_lock(&inode->i_lock);
> > for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> > if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
> > continue;
> > if (nfs_file_open_context(fl->fl_file) != ctx)
> > continue;
> > - unlock_flocks();
> > + /* FIXME: safe to drop lock here while walking list? */
> > + spin_unlock(&inode->i_lock);
> > status = nfs4_lock_delegation_recall(fl, state, stateid);
> > if (status < 0)
> > goto out;
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > out:
> > return status;
> > }
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 1fab140..ff10b4a 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -1373,13 +1373,13 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
> > /* Guard against delegation returns and new lock/unlock calls */
> > down_write(&nfsi->rwsem);
> > /* Protect inode->i_flock using the BKL */
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
> > if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
> > continue;
> > if (nfs_file_open_context(fl->fl_file)->state != state)
> > continue;
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > status = ops->recover_lock(state, fl);
> > switch (status) {
> > case 0:
> > @@ -1406,9 +1406,9 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
> > /* kill_proc(fl->fl_pid, SIGLOST, 1); */
> > status = 0;
> > }
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > }
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > out:
> > up_write(&nfsi->rwsem);
> > return status;
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 316ec84..f170518 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -2645,13 +2645,13 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
> >
> > list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
> >
> > - /* only place dl_time is set. protected by lock_flocks*/
> > + /* Only place dl_time is set; protected by i_lock: */
> > dp->dl_time = get_seconds();
> >
> > nfsd4_cb_recall(dp);
> > }
> >
> > -/* Called from break_lease() with lock_flocks() held. */
> > +/* Called from break_lease() with i_lock held. */
> > static void nfsd_break_deleg_cb(struct file_lock *fl)
> > {
> > struct nfs4_file *fp = (struct nfs4_file *)fl->fl_owner;
> > @@ -4520,7 +4520,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
> > struct inode *inode = filp->fi_inode;
> > int status = 0;
> >
> > - lock_flocks();
> > + spin_lock(&inode->i_lock);
> > for (flpp = &inode->i_flock; *flpp != NULL; flpp = &(*flpp)->fl_next) {
> > if ((*flpp)->fl_owner == (fl_owner_t)lowner) {
> > status = 1;
> > @@ -4528,7 +4528,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
> > }
> > }
> > out:
> > - unlock_flocks();
> > + spin_unlock(&inode->i_lock);
> > return status;
> > }
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index ae377e9..ccb44ea 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1012,8 +1012,6 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
> > extern int lease_modify(struct file_lock **, int);
> > extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
> > extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
> > -extern void lock_flocks(void);
> > -extern void unlock_flocks(void);
> > #else /* !CONFIG_FILE_LOCKING */
> > static inline int fcntl_getlk(struct file *file, struct flock __user *user)
> > {
> > @@ -1155,15 +1153,6 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
> > {
> > return 1;
> > }
> > -
> > -static inline void lock_flocks(void)
> > -{
> > -}
> > -
> > -static inline void unlock_flocks(void)
> > -{
> > -}
> > -
> > #endif /* !CONFIG_FILE_LOCKING */
> >
> >
> > --
> > 1.7.1
> >


--
Jeff Layton <[email protected]>

2013-06-05 11:39:01

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 07/11] locks: only pull entries off of blocked_list when they are really unblocked

On Tue, 4 Jun 2013 17:58:39 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Fri, May 31, 2013 at 11:07:30PM -0400, Jeff Layton wrote:
> > Currently, when there is a lot of lock contention the kernel spends an
> > inordinate amount of time taking blocked locks off of the global
> > blocked_list and then putting them right back on again. When all of this
> > code was protected by a single lock, then it didn't matter much, but now
> > it means a lot of file_lock_lock thrashing.
> >
> > Optimize this a bit by deferring the removal from the blocked_list until
> > we're either applying or cancelling the lock. By doing this, and using a
> > lockless list_empty check, we can avoid taking the file_lock_lock in
> > many cases.
> >
> > Because the fl_link check is lockless, we must ensure that only the task
> > that "owns" the request manipulates the fl_link. Also, with this change,
> > it's possible that we'll see an entry on the blocked_list that has a
> > NULL fl_next pointer. In that event, just ignore it and continue walking
> > the list.
>
> OK, that sounds safe as in it shouldn't crash, but does the deadlock
> detection still work, or can it miss loops?
>
> Those locks whose fl_next is temporarily NULL would previously not have been on
> the list at all, OK, but... I'm having trouble reasoning about how this
> works now.
>
> Previously a single lock was held uninterrupted across
> posix_locks_deadlock and locks_insert_block(), which guaranteed we
> shouldn't be adding a loop. Is that still true?
>
> --b.
>

I had thought it was when I originally looked at this, but now that I
consider it again I think you may be correct and that there are possible
races here. Since we might end up reblocking behind a different lock
without taking the global spinlock, a loop could be created if you had a
complex (>2) chain of locks.

I think I'm going to have to drop this approach and instead make it so
that the deadlock detection and insertion into the global blocker
list/hash are atomic. Ditto for locks_wake_up_blocks on posix locks and
taking the entries off the list/hash.

Now that I look, I think that approach may actually improve performance
too since we'll be taking the global spinlock less than we would have,
but I'll need to test it out to know for sure.
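
Something along these lines is what I'm picturing (just a sketch; the
helper name is made up, and it assumes posix_locks_deadlock() is reworked
so that the caller holds file_lock_lock rather than taking it itself):

static int locks_block_on(struct file_lock *request, struct file_lock *blocker)
{
	int error = 0;

	spin_lock(&file_lock_lock);
	/* assumes posix_locks_deadlock() no longer takes file_lock_lock itself */
	if (posix_locks_deadlock(request, blocker)) {
		error = -EDEADLK;
	} else {
		/* insert under the same lock the deadlock check ran under */
		list_add(&request->fl_link, &blocked_list);
	}
	spin_unlock(&file_lock_lock);

	return error;
}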

> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/locks.c | 29 +++++++++++++++++++++++------
> > 1 files changed, 23 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/locks.c b/fs/locks.c
> > index 055c06c..fc35b9e 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -520,7 +520,6 @@ locks_delete_global_locks(struct file_lock *waiter)
> > static void __locks_delete_block(struct file_lock *waiter)
> > {
> > list_del_init(&waiter->fl_block);
> > - locks_delete_global_blocked(waiter);
> > waiter->fl_next = NULL;
> > }
> >
> > @@ -704,13 +703,16 @@ EXPORT_SYMBOL(posix_test_lock);
> > /* Find a lock that the owner of the given block_fl is blocking on. */
> > static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
> > {
> > - struct file_lock *fl;
> > + struct file_lock *fl, *ret = NULL;
> >
> > list_for_each_entry(fl, &blocked_list, fl_link) {
> > - if (posix_same_owner(fl, block_fl))
> > - return fl->fl_next;
> > + if (posix_same_owner(fl, block_fl)) {
> > + ret = fl->fl_next;
> > + if (likely(ret))
> > + break;
> > + }
> > }
> > - return NULL;
> > + return ret;
> > }
> >
> > static int posix_locks_deadlock(struct file_lock *caller_fl,
> > @@ -865,7 +867,8 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> > goto out;
> > error = FILE_LOCK_DEFERRED;
> > locks_insert_block(fl, request);
> > - locks_insert_global_blocked(request);
> > + if (list_empty(&request->fl_link))
> > + locks_insert_global_blocked(request);
> > goto out;
> > }
> > }
> > @@ -876,6 +879,16 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> > goto out;
> >
> > /*
> > + * Now that we know the request is no longer blocked, we can take it
> > + * off the global list. Some callers send down partially initialized
> > + * requests, so we only do this if FL_SLEEP is set. Also, avoid taking
> > + * the lock if the list is empty, as that indicates a request that
> > + * never blocked.
> > + */
> > + if ((request->fl_flags & FL_SLEEP) && !list_empty(&request->fl_link))
> > + locks_delete_global_blocked(request);
> > +
> > + /*
> > * Find the first old lock with the same owner as the new lock.
> > */
> >
> > @@ -1069,6 +1082,7 @@ int posix_lock_file_wait(struct file *filp, struct file_lock *fl)
> > continue;
> >
> > locks_delete_block(fl);
> > + locks_delete_global_blocked(fl);
> > break;
> > }
> > return error;
> > @@ -1147,6 +1161,7 @@ int locks_mandatory_area(int read_write, struct inode *inode,
> > }
> >
> > locks_delete_block(&fl);
> > + locks_delete_global_blocked(&fl);
> > break;
> > }
> >
> > @@ -1859,6 +1874,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
> > continue;
> >
> > locks_delete_block(fl);
> > + locks_delete_global_blocked(fl);
> > break;
> > }
> >
> > @@ -2160,6 +2176,7 @@ posix_unblock_lock(struct file *filp, struct file_lock *waiter)
> > else
> > status = -ENOENT;
> > spin_unlock(&inode->i_lock);
> > + locks_delete_global_blocked(waiter);
> > return status;
> > }
> >
> > --
> > 1.7.1
> >


--
Jeff Layton <[email protected]>

2013-06-05 11:43:55

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 08/11] locks: convert fl_link to a hlist_node

On Tue, 4 Jun 2013 17:59:50 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Fri, May 31, 2013 at 11:07:31PM -0400, Jeff Layton wrote:
> > Testing has shown that iterating over the blocked_list for deadlock
> > detection turns out to be a bottleneck. In order to alleviate that,
> > begin the process of turning it into a hashtable. We start by turning
> > the fl_link into a hlist_node and the global lists into hlists. A later
> > patch will do the conversion of the blocked_list to a hashtable.
>
> Even simpler would be if we could add a pointer to the (well, a) lock
> that a lockowner is blocking on, and then we'd just have to follow a
> pointer. I haven't thought that through, though; perhaps that's hard to
> make work....
>
> --b.
>

I considered that as well and it makes sense for the simple local
filesystem case where you just track ownership based on fl_owner_t.

But...what about lockd? It considers ownership to be a tuple of the
nlm_host and the pid sent in a lock request. I can't seem to wrap my
brain around how to make such an approach work there. I'll confess,
though, that I haven't tried *too* hard yet since I had bigger problems
to sort through at the time. Maybe we can consider that for a later set?
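
For the simple fl_owner_t case, a rough sketch of the shape your
suggestion might take (everything below is hypothetical -- neither the
structure nor the field exists in the posted code):

/*
 * Hypothetical only: a per-owner record carrying the request that owner
 * is currently blocked on, so deadlock detection can follow a pointer
 * instead of scanning blocked_list.
 */
struct lock_owner_info {
	struct file_lock *blocked_request;	/* request this owner waits on */
};

static struct file_lock *owner_blocked_on(struct lock_owner_info *owner)
{
	struct file_lock *waiter = owner->blocked_request;

	/* fl_next is the blocker that waiter is queued behind */
	return waiter ? waiter->fl_next : NULL;
}

Mapping lockd's (nlm_host, pid) tuple onto one of these records is the
part I can't see how to do cleanly.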

> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/locks.c | 32 ++++++++++++++++----------------
> > include/linux/fs.h | 2 +-
> > 2 files changed, 17 insertions(+), 17 deletions(-)
> >
> > diff --git a/fs/locks.c b/fs/locks.c
> > index fc35b9e..5ed056b 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -163,8 +163,8 @@ int lease_break_time = 45;
> > #define for_each_lock(inode, lockp) \
> > for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
> >
> > -static LIST_HEAD(file_lock_list);
> > -static LIST_HEAD(blocked_list);
> > +static HLIST_HEAD(file_lock_list);
> > +static HLIST_HEAD(blocked_list);
> >
> > /* Protects the two list heads above */
> > static DEFINE_SPINLOCK(file_lock_lock);
> > @@ -173,7 +173,7 @@ static struct kmem_cache *filelock_cache __read_mostly;
> >
> > static void locks_init_lock_heads(struct file_lock *fl)
> > {
> > - INIT_LIST_HEAD(&fl->fl_link);
> > + INIT_HLIST_NODE(&fl->fl_link);
> > INIT_LIST_HEAD(&fl->fl_block);
> > init_waitqueue_head(&fl->fl_wait);
> > }
> > @@ -207,7 +207,7 @@ void locks_free_lock(struct file_lock *fl)
> > {
> > BUG_ON(waitqueue_active(&fl->fl_wait));
> > BUG_ON(!list_empty(&fl->fl_block));
> > - BUG_ON(!list_empty(&fl->fl_link));
> > + BUG_ON(!hlist_unhashed(&fl->fl_link));
> >
> > locks_release_private(fl);
> > kmem_cache_free(filelock_cache, fl);
> > @@ -486,7 +486,7 @@ static inline void
> > locks_insert_global_blocked(struct file_lock *waiter)
> > {
> > spin_lock(&file_lock_lock);
> > - list_add(&waiter->fl_link, &blocked_list);
> > + hlist_add_head(&waiter->fl_link, &blocked_list);
> > spin_unlock(&file_lock_lock);
> > }
> >
> > @@ -494,7 +494,7 @@ static inline void
> > locks_delete_global_blocked(struct file_lock *waiter)
> > {
> > spin_lock(&file_lock_lock);
> > - list_del_init(&waiter->fl_link);
> > + hlist_del_init(&waiter->fl_link);
> > spin_unlock(&file_lock_lock);
> > }
> >
> > @@ -502,7 +502,7 @@ static inline void
> > locks_insert_global_locks(struct file_lock *waiter)
> > {
> > spin_lock(&file_lock_lock);
> > - list_add_tail(&waiter->fl_link, &file_lock_list);
> > + hlist_add_head(&waiter->fl_link, &file_lock_list);
> > spin_unlock(&file_lock_lock);
> > }
> >
> > @@ -510,7 +510,7 @@ static inline void
> > locks_delete_global_locks(struct file_lock *waiter)
> > {
> > spin_lock(&file_lock_lock);
> > - list_del_init(&waiter->fl_link);
> > + hlist_del_init(&waiter->fl_link);
> > spin_unlock(&file_lock_lock);
> > }
> >
> > @@ -705,7 +705,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
> > {
> > struct file_lock *fl, *ret = NULL;
> >
> > - list_for_each_entry(fl, &blocked_list, fl_link) {
> > + hlist_for_each_entry(fl, &blocked_list, fl_link) {
> > if (posix_same_owner(fl, block_fl)) {
> > ret = fl->fl_next;
> > if (likely(ret))
> > @@ -867,7 +867,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> > goto out;
> > error = FILE_LOCK_DEFERRED;
> > locks_insert_block(fl, request);
> > - if (list_empty(&request->fl_link))
> > + if (hlist_unhashed(&request->fl_link))
> > locks_insert_global_blocked(request);
> > goto out;
> > }
> > @@ -882,10 +882,10 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
> > * Now that we know the request is no longer blocked, we can take it
> > * off the global list. Some callers send down partially initialized
> > * requests, so we only do this if FL_SLEEP is set. Also, avoid taking
> > - * the lock if the list is empty, as that indicates a request that
> > + * the lock if the hlist is unhashed, as that indicates a request that
> > * never blocked.
> > */
> > - if ((request->fl_flags & FL_SLEEP) && !list_empty(&request->fl_link))
> > + if ((request->fl_flags & FL_SLEEP) && !hlist_unhashed(&request->fl_link))
> > locks_delete_global_blocked(request);
> >
> > /*
> > @@ -2277,11 +2277,11 @@ static int locks_show(struct seq_file *f, void *v)
> > {
> > struct file_lock *fl, *bfl;
> >
> > - fl = list_entry(v, struct file_lock, fl_link);
> > + fl = hlist_entry(v, struct file_lock, fl_link);
> >
> > lock_get_status(f, fl, *((loff_t *)f->private), "");
> >
> > - list_for_each_entry(bfl, &blocked_list, fl_link) {
> > + hlist_for_each_entry(bfl, &blocked_list, fl_link) {
> > if (bfl->fl_next == fl)
> > lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
> > }
> > @@ -2295,14 +2295,14 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
> >
> > spin_lock(&file_lock_lock);
> > *p = (*pos + 1);
> > - return seq_list_start(&file_lock_list, *pos);
> > + return seq_hlist_start(&file_lock_list, *pos);
> > }
> >
> > static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
> > {
> > loff_t *p = f->private;
> > ++*p;
> > - return seq_list_next(v, &file_lock_list, pos);
> > + return seq_hlist_next(v, &file_lock_list, pos);
> > }
> >
> > static void locks_stop(struct seq_file *f, void *v)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index ccb44ea..07a009e 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -934,7 +934,7 @@ int locks_in_grace(struct net *);
> > */
> > struct file_lock {
> > struct file_lock *fl_next; /* singly linked list for this inode */
> > - struct list_head fl_link; /* doubly linked list of all locks */
> > + struct hlist_node fl_link; /* node in global lists */
> > struct list_head fl_block; /* circular list of blocked processes */
> > fl_owner_t fl_owner;
> > unsigned int fl_flags;
> > --
> > 1.7.1
> >


--
Jeff Layton <[email protected]>

2013-06-05 12:25:19

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 07/11] locks: only pull entries off of blocked_list when they are really unblocked

On Wed, Jun 05, 2013 at 07:38:22AM -0400, Jeff Layton wrote:
> On Tue, 4 Jun 2013 17:58:39 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Fri, May 31, 2013 at 11:07:30PM -0400, Jeff Layton wrote:
> > > Currently, when there is a lot of lock contention the kernel spends an
> > > inordinate amount of time taking blocked locks off of the global
> > > blocked_list and then putting them right back on again. When all of this
> > > code was protected by a single lock, then it didn't matter much, but now
> > > it means a lot of file_lock_lock thrashing.
> > >
> > > Optimize this a bit by deferring the removal from the blocked_list until
> > > we're either applying or cancelling the lock. By doing this, and using a
> > > lockless list_empty check, we can avoid taking the file_lock_lock in
> > > many cases.
> > >
> > > Because the fl_link check is lockless, we must ensure that only the task
> > > that "owns" the request manipulates the fl_link. Also, with this change,
> > > it's possible that we'll see an entry on the blocked_list that has a
> > > NULL fl_next pointer. In that event, just ignore it and continue walking
> > > the list.
> >
> > OK, that sounds safe as in it shouldn't crash, but does the deadlock
> > detection still work, or can it miss loops?
> >
> > Those locks that are temporarily NULL would previously not have been on
> > the list at all, OK, but... I'm having trouble reasoning about how this
> > works now.
> >
> > Previously a single lock was held uninterrupted across
> > posix_locks_deadlock and locks_insert_block() which guaranteed we
> > shouldn't be adding a loop, is that still true?
> >
> > --b.
> >
>
> I had thought it was when I originally looked at this, but now that I
> consider it again I think you may be correct and that there are possible
> races here. Since we might end up reblocking behind a different lock
> without taking the global spinlock, a loop could be created if you had
> a complex (>2) chain of locks.
>
> I think I'm going to have to drop this approach and instead make it so
> that the deadlock detection and insertion into the global blocker
> list/hash are atomic.

Right. Once you drop the lock you can no longer be sure that what you
learned about the file-lock graph stays true.

> Ditto for locks_wake_up_blocks on posix locks and
> taking the entries off the list/hash.

Here I'm not sure what you mean.

--b.

2013-06-05 12:39:38

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1 07/11] locks: only pull entries off of blocked_list when they are really unblocked

On Wed, 5 Jun 2013 08:24:32 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Wed, Jun 05, 2013 at 07:38:22AM -0400, Jeff Layton wrote:
> > On Tue, 4 Jun 2013 17:58:39 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> >
> > > On Fri, May 31, 2013 at 11:07:30PM -0400, Jeff Layton wrote:
> > > > Currently, when there is a lot of lock contention the kernel spends an
> > > > inordinate amount of time taking blocked locks off of the global
> > > > blocked_list and then putting them right back on again. When all of this
> > > > code was protected by a single lock, then it didn't matter much, but now
> > > > it means a lot of file_lock_lock thrashing.
> > > >
> > > > Optimize this a bit by deferring the removal from the blocked_list until
> > > > we're either applying or cancelling the lock. By doing this, and using a
> > > > lockless list_empty check, we can avoid taking the file_lock_lock in
> > > > many cases.
> > > >
> > > > Because the fl_link check is lockless, we must ensure that only the task
> > > > that "owns" the request manipulates the fl_link. Also, with this change,
> > > > it's possible that we'll see an entry on the blocked_list that has a
> > > > NULL fl_next pointer. In that event, just ignore it and continue walking
> > > > the list.
> > >
> > > OK, that sounds safe as in it shouldn't crash, but does the deadlock
> > > detection still work, or can it miss loops?
> > >
> > > Those locks that are temporarily NULL would previously not have been on
> > > the list at all, OK, but... I'm having trouble reasoning about how this
> > > works now.
> > >
> > > Previously a single lock was held uninterrupted across
> > > posix_locks_deadlock and locks_insert_block() which guaranteed we
> > > shouldn't be adding a loop, is that still true?
> > >
> > > --b.
> > >
> >
> > I had thought it was when I originally looked at this, but now that I
> > consider it again I think you may be correct and that there are possible
> > races here. Since we might end up reblocking behind a different lock
> > without taking the global spinlock, a loop could be created if you had
> > a complex (>2) chain of locks.
> >
> > I think I'm going to have to drop this approach and instead make it so
> > that the deadlock detection and insertion into the global blocker
> > list/hash are atomic.
>
> Right. Once you drop the lock you can no longer be sure that what you
> learned about the file-lock graph stays true.
>
> > Ditto for locks_wake_up_blocks on posix locks and
> > taking the entries off the list/hash.
>
> Here I'm not sure what you mean.
>

Basically, I mean that rather than setting the fl_next pointer to NULL
while holding only the inode lock and then ignoring those locks in the
deadlock detection code, we should also take the global lock in
locks_wake_up_blocks and take the blocked locks off the global list
and the i_flock list at the same time.

That actually might not be completely necessary, but it'll make the
logic clearer and easier to understand and probably won't hurt
performance too much. Again, I'll need to do some perf testing to be
sure.
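
Something like this is what I have in mind -- a rough, untested sketch
against this series rather than a real patch. It assumes the caller
still holds the inode->i_lock, and that fl_link is the hlist_node from
patch 8:

static void locks_wake_up_blocks(struct file_lock *blocker)
{
        /*
         * Take the global lock so that waiters come off the global
         * blocked_list at the same time they're unlinked from the
         * blocker and have their fl_next cleared.
         */
        spin_lock(&file_lock_lock);
        while (!list_empty(&blocker->fl_block)) {
                struct file_lock *waiter;

                waiter = list_first_entry(&blocker->fl_block,
                                          struct file_lock, fl_block);
                hlist_del_init(&waiter->fl_link);  /* off the global blocked_list */
                __locks_delete_block(waiter);      /* off fl_block, fl_next = NULL */
                if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
                        waiter->fl_lmops->lm_notify(waiter);
                else
                        wake_up(&waiter->fl_wait);
        }
        spin_unlock(&file_lock_lock);
}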

--
Jeff Layton <[email protected]>

2013-06-05 12:46:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 08/11] locks: convert fl_link to a hlist_node

On Wed, Jun 05, 2013 at 07:43:09AM -0400, Jeff Layton wrote:
> On Tue, 4 Jun 2013 17:59:50 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Fri, May 31, 2013 at 11:07:31PM -0400, Jeff Layton wrote:
> > > Testing has shown that iterating over the blocked_list for deadlock
> > > detection turns out to be a bottleneck. In order to alleviate that,
> > > begin the process of turning it into a hashtable. We start by turning
> > > the fl_link into a hlist_node and the global lists into hlists. A later
> > > patch will do the conversion of the blocked_list to a hashtable.
> >
> > Even simpler would be if we could add a pointer to the (well, a) lock
> > that a lockowner is blocking on, and then we'd just have to follow a
> > pointer. I haven't thought that through, though, perhaps that's hard to
> > make work....
> >
> > --b.
> >
>
> I considered that as well and it makes sense for the simple local
> filesystem case where you just track ownership based on fl_owner_t.
>
> But...what about lockd? It considers ownership to be a tuple of the
> nlm_host and the pid sent in a lock request. I can't seem to wrap my
> brain around how to make such an approach work there.

I wonder if we could do something vaguely like

struct lock_owner_common {
        struct file_lock *blocker;      /* the lock this owner is currently blocked on */
};

struct nlmsvc_lock_owner {
        struct lock_owner_common owner;
        unsigned int client_pid;        /* the pid sent in the NLM lock request */
};

and make fl_owner a (struct lock_owner_common *) and have lockd create
nlmsvc_lock_owners as necessary on the fly. The lm_compare_owner
callback could then be replaced by a pointer comparison. I'm not sure
what kind of locking or refcounting might be needed. But...

> I'll confess, though, that I haven't tried *too* hard yet

... me neither, so...

> since I had bigger problems to sort through at the time. Maybe
> we can consider that for a later set?

sounds fine.
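
FWIW, the "just follow a pointer" deadlock check that a blocker field
like the above would enable might look roughly like this -- a
hypothetical sketch only; none of these names exist in the tree, and it
assumes fl_owner has become a struct lock_owner_common pointer:

#define MAX_DEADLOCK_DEPTH      10      /* arbitrary bound, just for the sketch */

static int posix_locks_deadlock_sketch(struct file_lock *caller_fl,
                                       struct file_lock *block_fl)
{
        struct lock_owner_common *caller = caller_fl->fl_owner;
        struct file_lock *fl = block_fl;
        int i = 0;

        /* Follow the chain of blockers; a cycle back to us is a deadlock. */
        while (fl != NULL && i++ < MAX_DEADLOCK_DEPTH) {
                struct lock_owner_common *owner = fl->fl_owner;

                if (owner == caller)
                        return 1;       /* granting this would make us wait on ourselves */
                fl = owner->blocker;
        }
        return 0;
}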

--b.

2013-06-05 12:59:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v1 07/11] locks: only pull entries off of blocked_list when they are really unblocked

On Wed, Jun 05, 2013 at 08:38:59AM -0400, Jeff Layton wrote:
> On Wed, 5 Jun 2013 08:24:32 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Wed, Jun 05, 2013 at 07:38:22AM -0400, Jeff Layton wrote:
> > > On Tue, 4 Jun 2013 17:58:39 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > > On Fri, May 31, 2013 at 11:07:30PM -0400, Jeff Layton wrote:
> > > > > Currently, when there is a lot of lock contention the kernel spends an
> > > > > inordinate amount of time taking blocked locks off of the global
> > > > > blocked_list and then putting them right back on again. When all of this
> > > > > code was protected by a single lock, then it didn't matter much, but now
> > > > > it means a lot of file_lock_lock thrashing.
> > > > >
> > > > > Optimize this a bit by deferring the removal from the blocked_list until
> > > > > we're either applying or cancelling the lock. By doing this, and using a
> > > > > lockless list_empty check, we can avoid taking the file_lock_lock in
> > > > > many cases.
> > > > >
> > > > > Because the fl_link check is lockless, we must ensure that only the task
> > > > > that "owns" the request manipulates the fl_link. Also, with this change,
> > > > > it's possible that we'll see an entry on the blocked_list that has a
> > > > > NULL fl_next pointer. In that event, just ignore it and continue walking
> > > > > the list.
> > > >
> > > > OK, that sounds safe as in it shouldn't crash, but does the deadlock
> > > > detection still work, or can it miss loops?
> > > >
> > > > Those locks that are temporarily NULL would previously not have been on
> > > > the list at all, OK, but... I'm having trouble reasoning about how this
> > > > works now.
> > > >
> > > > Previously a single lock was held uninterrupted across
> > > > posix_locks_deadlock and locks_insert_block() which guaranteed we
> > > > shouldn't be adding a loop, is that still true?
> > > >
> > > > --b.
> > > >
> > >
> > > I had thought it was when I originally looked at this, but now that I
> > > consider it again I think you may be correct and that there are possible
> > > races here. Since we might end up reblocking behind a different lock
> > > without taking the global spinlock, a loop could be created if you had
> > > a complex (>2) chain of locks.
> > >
> > > I think I'm going to have to drop this approach and instead make it so
> > > that the deadlock detection and insertion into the global blocker
> > > list/hash are atomic.
> >
> > Right. Once you drop the lock you can no longer be sure that what you
> > learned about the file-lock graph stays true.
> >
> > > Ditto for locks_wake_up_blocks on posix locks and
> > > taking the entries off the list/hash.
> >
> > Here I'm not sure what you mean.
> >
>
> Basically, I mean that rather than setting the fl_next pointer to NULL
> while holding only the inode lock and then ignoring those locks in the
> deadlock detection code, we should also take the global lock in
> locks_wake_up_blocks and take the blocked locks off the global list
> and the i_flock list at the same time.

OK, thanks, got it. I have a hard time thinking about that.... But yes,
it bothers me that the deadlock detection code could see an out-of-date
value of fl_next, and I can't convince myself that this wouldn't result
in false positives or false negatives.

> That actually might not be completely necessary, but it'll make the
> logic clearer and easier to understand and probably won't hurt
> performance too much. Again, I'll need to do some perf testing to be
> sure.

OK!

--b.