2002-10-02 02:33:40

by John Levon

Subject: flock(fd, LOCK_UN) taking 500ms+ ?


In 2.5.40, our application is getting seemingly very large
flock(LOCK_UN) latencies :

Unlock, took 541386 usecs
Old was 567999237 secs, 131042 usecs
now is 567999237 secs, 672428 usecs
Unlock, took 125083 usecs
Old was 567999237 secs, 699310 usecs
now is 567999237 secs, 824393 usecs
Unlock, took 151245 usecs
Old was 567999237 secs, 825119 usecs
now is 567999237 secs, 976364 usecs

(using gettimeofday ...)

No similar times are observed during the corresponding LOCK_EX call.

Is this just a silly app bug somewhere, or something funny in locks.c ?

More details, testing etc. on request ...

regards
john

--
"I never understood what's so hard about picking a unique
first and last name - and not going beyond the 6 character limit."
- Toon Moene


2002-10-02 03:18:05

by John Levon

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

On Wed, Oct 02, 2002 at 03:39:01AM +0100, John Levon wrote:

> Unlock, took 541386 usecs

Hmm. This :

--- linux-linus/fs/locks.c Sat Sep 28 15:56:28 2002
+++ linux/fs/locks.c Wed Oct 2 04:15:54 2002
@@ -727,11 +727,11 @@
}
unlock_kernel();

- if (found)
- yield();
-
if (new_fl->fl_type == F_UNLCK)
return 0;
+
+ if (found)
+ yield();

lock_kernel();
for_each_lock(inode, before) {

"fixes" the problem (a simultaneous kernel compile is going on; it's a
2-way SMP machine). What is the yield for ? What's the right fix (if
any) ? This turns our app's main loop from a second or two into a
90-second behemoth.

There are no other apps taking locks on the relevant files, btw

regards
john

--
"I never understood what's so hard about picking a unique
first and last name - and not going beyond the 6 character limit."
- Toon Moene

2002-10-02 13:09:09

by Matthew Wilcox

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

On Wed, Oct 02, 2002 at 04:23:27AM +0100, John Levon wrote:
> --- linux-linus/fs/locks.c Sat Sep 28 15:56:28 2002
> +++ linux/fs/locks.c Wed Oct 2 04:15:54 2002
> @@ -727,11 +727,11 @@
> }
> unlock_kernel();
>
> - if (found)
> - yield();
> -
> if (new_fl->fl_type == F_UNLCK)
> return 0;
> +
> + if (found)
> + yield();
>
> lock_kernel();
> for_each_lock(inode, before) {
>
> "fixes" the problem (a simultaneous kernel compile is going on; it's a
> 2-way SMP machine). What is the yield for ? What's the right fix (if
> any) ? This turns our app's main loop from a second or two into a
> 90-second behemoth.

I'm pretty sure this is correct. There's no particular reason to yield()
if we're unlocking.

I wonder if yield() is the correct call to make anyway. We certainly need
to schedule() to allow any higher-priority task the opportunity to run.
But if we're the highest-priority task downgrading our write-lock to
a read-lock, which permits other tasks the opportunity to run, I see no
reason we should _yield_ to them.

Scheduling is a bit of a black art as far as I'm concerned. Someone with
a bit more experience care to comment?

--
Revolutions do not require corporate support.

2002-10-02 16:59:31

by Andrew Morton

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

Matthew Wilcox wrote:
>
> On Wed, Oct 02, 2002 at 04:23:27AM +0100, John Levon wrote:
> > --- linux-linus/fs/locks.c Sat Sep 28 15:56:28 2002
> > +++ linux/fs/locks.c Wed Oct 2 04:15:54 2002
> > @@ -727,11 +727,11 @@
> > }
> > unlock_kernel();
> >
> > - if (found)
> > - yield();
> > -
> > if (new_fl->fl_type == F_UNLCK)
> > return 0;
> > +
> > + if (found)
> > + yield();
> >
> > lock_kernel();
> > for_each_lock(inode, before) {
> >
> > "fixes" the problem (a simultaneous kernel compile is going on; it's a
> > 2-way SMP machine). What is the yield for ? What's the right fix (if
> > any) ? This turns our app's main loop from a second or two into a
> > 90-second behemoth.
>
> I'm pretty sure this is correct. There's no particular reason to yield()
> if we're unlocking.
>
> I wonder if yield() is the correct call to make anyway. We certainly need
> to schedule() to allow any higher-priority task the opportunity to run.
> > But if we're the highest-priority task downgrading our write-lock to
> > a read-lock, which permits other tasks the opportunity to run, I see no
> > reason we should _yield_ to them.
>

sched_yield() semantics changed a lot. It used to mean "take a quick
nap", but it now means "go to the back of the runqueue and stay there
for absolutely ages". The latter is a closer interpretation of the spec,
but it has broken stuff which was tuned to the old behaviour.

It's not really clear why that yield is in there at all? Unless that
code is really, really slow (milliseconds) then probably it should just
be deleted.

If there really is a solid need to hand the CPU over to some now-runnable
higher-priority process then a cond_resched() will suffice.
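Concretely, that would presumably turn the hunk John posted into something
like this (an untested sketch of the idea, not a submitted patch):

```c
	unlock_kernel();

	if (new_fl->fl_type == F_UNLCK)
		return 0;

	if (found)
		cond_resched();	/* reschedule only if someone now needs the CPU */

	lock_kernel();
	for_each_lock(inode, before) {
```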

2002-10-02 18:25:26

by Matthew Wilcox

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

On Wed, Oct 02, 2002 at 10:04:52AM -0700, Andrew Morton wrote:
> sched_yield() semantics changed a lot. It used to mean "take a quick
> nap", but it now means "go to the back of the runqueue and stay there
> for absolutely ages". The latter is a closer interpretation of the spec,
> but it has broken stuff which was tuned to the old behaviour.

*nod*. This code has been around for many, many years ;-)

> It's not really clear why that yield is in there at all? Unless that
> code is really, really slow (milliseconds) then probably it should just
> be deleted.

Heh, you're so focused on perf tuning, Andrew! It's not a matter of
locking, it's a matter of semantics. Here's the comment:

* FL_FLOCK locks never deadlock, an existing lock is always removed before
* upgrading from shared to exclusive (or vice versa). When this happens
* any processes blocked by the current lock are woken up and allowed to
* run before the new lock is applied.
* Andy Walker ([email protected]), June 09, 1995

> If there really is a solid need to hand the CPU over to some now-runnable
> higher-priority process then a cond_resched() will suffice.

I think that's the right thing to do. If I understand right, we'll
check need_resched at syscall exit, so we don't need to do it for
unlocks, right?

--
Revolutions do not require corporate support.

2002-10-02 18:52:53

by John Levon

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

On Wed, Oct 02, 2002 at 07:30:52PM +0100, Matthew Wilcox wrote:

> * FL_FLOCK locks never deadlock, an existing lock is always removed before
> * upgrading from shared to exclusive (or vice versa). When this happens
> * any processes blocked by the current lock are woken up and allowed to
> * run before the new lock is applied.
> * Andy Walker ([email protected]), June 09, 1995
>
> > If there really is a solid need to hand the CPU over to some now-runnable
> > higher-priority process then a cond_resched() will suffice.

How will cond_resched() work ? Surely that will only give a chance if
the current process has reached the end of its timeslice (need_resched)
? Isn't "schedule()" the right thing here ?

> check need_resched at syscall exit, so we don't need to do it for
> unlocks, right?

right ...

regards
john

--
"Me and my friends are so smart, we invented this new kind of art:
Post-modernist throwing darts"
- the Moldy Peaches

2002-10-02 19:18:45

by Robert Love

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

On Wed, 2002-10-02 at 14:58, John Levon wrote:

> How will cond_resched() work ? Surely that will only give a chance if
> the current process has reached the end of its timeslice (need_resched)
> ? Isn't "schedule()" the right thing here ?

need_resched is set whenever there is a higher-priority process that is
now runnable (it is set on wake_up()). Otherwise cond_resched() would
be fairly worthless... as would kernel preemption, etc.

Robert Love

2002-10-02 19:16:41

by Robert Love

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

On Wed, 2002-10-02 at 14:30, Matthew Wilcox wrote:

> Heh, you're so focused on perf tuning, Andrew! It's not a matter of
> locking, it's a matter of semantics. Here's the comment:
>
> * FL_FLOCK locks never deadlock, an existing lock is always removed before
> * upgrading from shared to exclusive (or vice versa). When this happens
> * any processes blocked by the current lock are woken up and allowed to
> * run before the new lock is applied.
> * Andy Walker ([email protected]), June 09, 1995

Oh, I understand now. In this case you actually do need to call
yield(), gross as it is.

You have four options:

- call schedule(). This will result in the highest-priority task
running again, which may be you. You will certainly end up running again
in the near future, and there may be processes blocked by the current
lock which will not run before you continue.

- call cond_resched(). Same as above but avoid the superfluous
reschedule back to yourself, since if need_resched is unset we won't
call schedule().

- call yield(). You are put at the end of the runqueue and thus all
runnable tasks should run before you get to run again.

- do nothing. Always my choice :)

So yield() is the only way to guarantee that all lock holders run before
you do (it does not even do that, however: it is possible for you to get
reinserted into the active array and thus it only guarantees that all
lock holders at or above your priority get to run before you).
cond_resched() will guarantee all higher priority tasks run before you.

If you REALLY want to ensure you do not run at all until all the other
lock holders have run, you would need to down() a semaphore and not wake
up until all of them have run again (I have no idea what the flock code
looks like, or whether this is even feasible...).

> > If there really is a solid need to hand the CPU over to some now-runnable
> > higher-priority process then a cond_resched() will suffice.
>
> I think that's the right thing to do. If I understand right, we'll
> check need_resched at syscall exit, so we don't need to do it for
> unlocks, right?

Right. On return to user-space need_resched will be checked and
schedule() called if it is set.

However, it is only set if the newly woken up tasks have a higher
priority than you. Otherwise, schedule() would just select you again.

Robert Love

2002-10-02 19:21:09

by Andrew Morton

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

Matthew Wilcox wrote:
>
> ...
> * FL_FLOCK locks never deadlock, an existing lock is always removed before
> * upgrading from shared to exclusive (or vice versa). When this happens
> * any processes blocked by the current lock are woken up and allowed to
> * run before the new lock is applied.
> * Andy Walker ([email protected]), June 09, 1995

hm. This is a tricky thing to guarantee. If this process is
high-priority or SCHED_RR or whatever, we want to ensure that
any current holder of the lock gets a CPU slice?

Seems a strange thing to want to do, and if we really want to
implement these semantics then there's quite a bit of stuff
to do - making *all* blocked processes get some CPU will involve
scheduler work, or funny games with semaphores.

Now if we interpret "allowed to run" as meaning "made runnable" then
no probs. Just wake them up.

> > If there really is a solid need to hand the CPU over to some now-runnable
> > higher-priority process then a cond_resched() will suffice.
>
> I think that's the right thing to do. If I understand right, we'll
> check need_resched at syscall exit, so we don't need to do it for
> unlocks, right?

Sure. Sounds to me like we just want to delete the code ;)

2002-10-02 19:59:40

by Matthew Wilcox

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

On Wed, Oct 02, 2002 at 12:23:03PM -0700, Andrew Morton wrote:
> hm. This is a tricky thing to guarantee. If this process is
> high-priority or SCHED_RR or whatever, we want to ensure that
> any current holder of the lock gets a CPU slice?
>
> Seems a strange thing to want to do, and if we really want to
> implement these semantics then there's quite a bit of stuff
> to do - making *all* blocked processes get some CPU will involve
> scheduler work, or funny games with semaphores.
>
> Now if we interpret "allowed to run" as meaning "made runnable" then
> no probs. Just wake them up.

Yeah, I think the original author was a little imprecise in his description
of the semantics. The FreeBSD flock(2) manpage says:

A shared lock may be upgraded to an exclusive lock, and vice versa, simply
by specifying the appropriate lock type; this results in the previous
lock being released and the new lock applied (possibly after other
processes have gained and released the lock).

So I think what they're trying to say is that changing the lock type is
exactly equivalent to removing the existing lock and then applying the
new lock; it just happens to be one syscall. Using cond_resched() looks
like the right approach.

--
Revolutions do not require corporate support.

2002-10-02 21:31:24

by Manfred Spraul

Subject: Re: flock(fd, LOCK_UN) taking 500ms+ ?

akpm wrote:
> It's not really clear why that yield is in there at all? Unless that
> code is really, really slow (milliseconds) then probably it should just
> be deleted.
>
It looks like a very simple starvation control:

thread 1:
	for (;;) {
		F_LOCK
		<do a few seconds work>
		F_UNLCK
	}

thread 2:
	F_LOCK

If there is no yield after unlock, thread 2 might never receive the lock.
Is it possible to figure out if someone is waiting on the lock? F_UNLCK
should schedule if there is a waiter with higher priority.

--
Manfred