2003-02-17 19:01:28

by Chris Friesen

[permalink] [raw]
Subject: fcntl and flock wakeups not FIFO?


I've been doing some experimenting with locking on 2.4.18 and have
noticed that if I have a number of writers waiting on a lock, they are
not woken up in the order in which they requested the lock.

Is this expected? If so, what was the reasoning for this and are there
any patches to give FIFO wakeups?

Thanks,

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]



2003-02-18 00:50:58

by Matthew Wilcox

[permalink] [raw]
Subject: Re: fcntl and flock wakeups not FIFO?


[cc'ing the person or list mentioned in MAINTAINERS would get you a better
response :-P]

> I've been doing some experimenting with locking on 2.4.18 and have
> noticed that if I have a number of writers waiting on a lock, they are
> not woken up in the order in which they requested the lock.
>
> Is this expected? If so, what was the reasoning for this and are there
> any patches to give FIFO wakeups?

That certainly isn't what's supposed to happen. They should get woken
up in-order. The code in 2.4.18 seems to be doing that. Are you doing
anything clever with scheduling?

--
"It's not Hollywood. War is real, war is primarily not about defeat or
victory, it is about death. I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk

2003-02-18 04:41:02

by Chris Friesen

[permalink] [raw]
Subject: Re: fcntl and flock wakeups not FIFO?

Matthew Wilcox wrote:
> [cc'ing the person or list mentioned in MAINTAINERS would get you
> a better response :-P]

Hmm...that might be a good idea. :)

>>I've been doing some experimenting with locking on 2.4.18 and have
>>noticed that if I have a number of writers waiting on a lock, they are
>>not woken up in the order in which they requested the lock.
>>
>>Is this expected? If so, what was the reasoning for this and are there
>>any patches to give FIFO wakeups?
>
>
> That certainly isn't what's supposed to happen. They should get woken
> up in-order. The code in 2.4.18 seems to be doing that. Are you
> doing anything clever with scheduling?

Well maybe a little bit on the production box, but I don't think its the
cause since the same thing happens on my home machine with a stock
Mandrake 9 kernel (2.4.19-16mdk).

Here's the test app:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/file.h>

int main(int argc, char **argv)
{
int fd = open("/dev/null", O_RDWR);
if (fd < 0)
{
perror("open");
exit(-1);
}

printf("aquiring exclusive lock\n");
int rc = flock(fd, LOCK_EX);
if (rc < 0)
{
perror("flock");
exit(-1);
}

printf("got lock\n");

while(1)
pause();

return 0;
}

I start up four different instances of it in different windows, then
kill them (ctrl-c) in the order that I started them.

It doesn't happen every time, but they don't always get the lock in the
same order that I started them.

Chris



2003-02-18 14:35:01

by Chris Friesen

[permalink] [raw]
Subject: Re: fcntl and flock wakeups not FIFO?

Matthew Wilcox wrote:
>>I've been doing some experimenting with locking on 2.4.18 and have
>>noticed that if I have a number of writers waiting on a lock, they are
>>not woken up in the order in which they requested the lock.
>>
>>Is this expected? If so, what was the reasoning for this and are there
>>any patches to give FIFO wakeups?
>
> That certainly isn't what's supposed to happen. They should get woken
> up in-order. The code in 2.4.18 seems to be doing that. Are you
> doing anything clever with scheduling?


I have a potential cause here, but I'm not sure if it makes sense. The
following code (slightly reformatted) is taken from locks.c in the
Mandrake 2.4.19-16mdk kernel.


static void locks_wake_up_blocks(struct file_lock *blocker,
unsigned int wait)
{
while (!list_empty(&blocker->fl_block)) {
struct file_lock *waiter = list_entry(blocker->fl_block.next,
struct file_lock, fl_block);
if (wait) {
locks_notify_blocked(waiter);

/* Let the blocked process remove waiter from the
* block list when it gets scheduled.
*/
current->policy |= SCHED_YIELD;
schedule();
} else {
/* Remove waiter from the block list, because by the
* time it wakes up blocker won't exist any more.
*/
locks_delete_block(waiter);
locks_notify_blocked(waiter);
}
}
}


It appears that if this function is called with a wait value of zero,
all of the waiting processes will be woken up before the scheduler gets
called. This means that the scheduler ends up picking which process
runs rather than the locking code.

Looking through the file, there is no call chain on an unlock or on
closing the last locked fd which can give a nonzero wait value, meaning
that we will always end up with the scheduler making the decision in
these cases.

Am I missing something?

Chris




2003-02-18 14:52:04

by Matthew Wilcox

[permalink] [raw]
Subject: Re: fcntl and flock wakeups not FIFO?

On Tue, Feb 18, 2003 at 09:44:19AM -0500, Chris Friesen wrote:
> > That certainly isn't what's supposed to happen. They should get woken
> > up in-order. The code in 2.4.18 seems to be doing that. Are you
> > doing anything clever with scheduling?

> static void locks_wake_up_blocks(struct file_lock *blocker,
> unsigned int wait)
> {
> while (!list_empty(&blocker->fl_block)) {
> struct file_lock *waiter = list_entry(blocker->fl_block.next,
> struct file_lock, fl_block);
> if (wait) {
> locks_notify_blocked(waiter);
>
> /* Let the blocked process remove waiter from the
> * block list when it gets scheduled.
> */
> current->policy |= SCHED_YIELD;
> schedule();
> } else {
> /* Remove waiter from the block list, because by the
> * time it wakes up blocker won't exist any more.
> */
> locks_delete_block(waiter);
> locks_notify_blocked(waiter);
> }
> }
> }
>
> It appears that if this function is called with a wait value of zero,
> all of the waiting processes will be woken up before the scheduler gets
> called. This means that the scheduler ends up picking which process
> runs rather than the locking code.

Right. That's why I asked whether you were doing something clever with
scheduling ;-)

> Looking through the file, there is no call chain on an unlock or on
> closing the last locked fd which can give a nonzero wait value, meaning
> that we will always end up with the scheduler making the decision in
> these cases.

I'm impressed that you chased it through ;-) This logic is mostly gone
from 2.5 because I found it too hard to keep in my mind while working
on this file.

> Am I missing something?

Nope, it's true. But the tasks get marked as runnable in the right order,
so the scheduler should be doing the right thing -- if any tasks really
have a better reason to run first (whether it's through RT scheduling
or through standard Unix priority scheduling) then they'll get the lock
first. Otherwise, I'd've thought it should be first-runnable, first-run.

--
"It's not Hollywood. War is real, war is primarily not about defeat or
victory, it is about death. I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk

2003-02-18 18:51:42

by Chris Friesen

[permalink] [raw]
Subject: Re: fcntl and flock wakeups not FIFO?

Matthew Wilcox wrote:
> On Tue, Feb 18, 2003 at 09:44:19AM -0500, Chris Friesen wrote:


>>It appears that if this function is called with a wait value of zero,
>>all of the waiting processes will be woken up before the scheduler gets
>>called. This means that the scheduler ends up picking which process
>>runs rather than the locking code.

> Right. That's why I asked whether you were doing something clever with
> scheduling ;-)

Ah, okay.

>>Looking through the file, there is no call chain on an unlock or on
>>closing the last locked fd which can give a nonzero wait value, meaning
>>that we will always end up with the scheduler making the decision in
>>these cases.

> I'm impressed that you chased it through ;-)

I was bored and it was bothering me.... :)



>>Am I missing something?

> Nope, it's true. But the tasks get marked as runnable in the right order,
> so the scheduler should be doing the right thing -- if any tasks really
> have a better reason to run first (whether it's through RT scheduling
> or through standard Unix priority scheduling) then they'll get the lock
> first. Otherwise, I'd've thought it should be first-runnable, first-run.

Apparently not always. I guess it's probably good enough for my
purposes the way it is, it just surprised me a bit.

Is 2.5 the same way? (Haven't looked at it yet.)

Chris



--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]