LinuxLists.cc - [RFC][PATCH] Fix hang in posix_locks

[permalink] [raw]

Subject: Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()

On Thu, Oct 18, 2007 at 02:57:59PM -0400, George G. Davis wrote:
> On Wed, Oct 17, 2007 at 02:51:57PM -0400, George G. Davis wrote:
> > ---
> > Not sure if this is the correct fix but it does resolve the hangs we're
> > observing in posix_locks_deadlock().
>
> Please disregard the previous patch, it's not quite right - causes occasional
> segfaults and clearly did not retain the posix_same_owner() checks implemented
> in the original code. Here's a new version which I believe retains the
> intent of the original code:
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 7f9a3ea..e012b27 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -702,14 +702,12 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
> {
> struct file_lock *fl;
>
> -next_task:
> if (posix_same_owner(caller_fl, block_fl))
> return 1;
> list_for_each_entry(fl, &blocked_list, fl_link) {
> if (posix_same_owner(fl, block_fl)) {
> - fl = fl->fl_next;
> - block_fl = fl;
> - goto next_task;
> + if (posix_same_owner(caller_fl, fl))
> + return 1;
> }
> }
> return 0;

It may take multiple steps to identify a deadlock. With the above
you'll miss deadlocks like

process 1 is requesting a lock held by process 2
process 2 is blocking on a lock held by process 3
process 3 is blocking on a lock held by process 1.

Could you give more details about how you're causing
posix_locks_deadlock to hang? Is there a simple test-case you can post?

--b.

>
>
> I'm not sure about those "fl = fl->fl_next; block_fl = fl;" statements,
> first, the order of those statements seems reversed to me. Otherwise,
> I think the intent was to advance the "fl" for loop variable to the next
> entry in the list but it doesn't work out that way at all - the for
> loop restarts from the beginning - this is where we get into an
> infinite loop condition. Whether the test case I posted before is
> valid or not, I reckon it shouldn't be possible for non-root Joe user
> to contrive a test case which can hang the system as we're observing
> with that test case. The above patch fixes the hang.
>
> Comments greatly appreciated...
>
> --
> Regards,
> George
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2007-10-26 22:47:18

[permalink] [raw]

Subject: Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()

On Fri, Oct 26, 2007 at 01:07:50PM -0400, bfields wrote:
> On Thu, Oct 18, 2007 at 02:57:59PM -0400, George G. Davis wrote:
> > On Wed, Oct 17, 2007 at 02:51:57PM -0400, George G. Davis wrote:
> > > ---
> > > Not sure if this is the correct fix but it does resolve the hangs we're
> > > observing in posix_locks_deadlock().
> >
> > Please disregard the previous patch, it's not quite right - causes occasional
> > segfaults and clearly did not retain the posix_same_owner() checks implemented
> > in the original code. Here's a new version which I believe retains the
> > intent of the original code:
> >
> > diff --git a/fs/locks.c b/fs/locks.c
> > index 7f9a3ea..e012b27 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -702,14 +702,12 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
> > {
> > struct file_lock *fl;
> >
> > -next_task:
> > if (posix_same_owner(caller_fl, block_fl))
> > return 1;
> > list_for_each_entry(fl, &blocked_list, fl_link) {
> > if (posix_same_owner(fl, block_fl)) {
> > - fl = fl->fl_next;
> > - block_fl = fl;
> > - goto next_task;
> > + if (posix_same_owner(caller_fl, fl))
> > + return 1;
> > }
> > }
> > return 0;
>
> It may take multiple steps to identify a deadlock. With the above
> you'll miss deadlocks like
>
> process 1 is requesting a lock held by process 2
> process 2 is blocking on a lock held by process 3
> process 3 is blocking on a lock held by process 1.
>
> Could you give more details about how you're causing
> posix_locks_deadlock to hang? Is there a simple test-case you can post?

Hm. After another look: assume we have four tasks, t1, t2, t3, and t4.
Assume t1 and t2 share the same current->files (so they're the same
"owner" for the purpose of posix_same_owner()). Assume:

t1 is waiting on a conflicting lock held by t3.
t2 is waiting on a conflicting lock held by t4.

Now suppose t4 requests a lock that conflicts with a lock held by t1 and
t2. The list_for_each_entry() above will search for a task with t1 or
t2 as owner, which is waiting on a lock. If it finds t1 first, the loop
won't be noticed, so t4 will be put to sleep. Now we have a loop; t3
can release its lock (it no longer matters), and we'll have

t2 waiting on a conflicting lock held by t4, and
t4 waiting on a conflicting lock held by t2.

If a new task t5 then requests a lock conflicting with the one held by
t2, then the above function will go into an infinite loop. I think.

Consider the directed graph with each vertex representing the set of all
tasks sharing the same file table, and each edge representing the
relationship "a task at this vertex is waiting on a lock held by a task
on another vertex". The existance of multiple tasks with the same file
table means that we can no longer assume that each vertex has outdegree
at most one, so we have to switch to an algorithm that works on an
arbitrary directed graph. That sounds painful.

Am I right about that, and about the example above? It'd be interesting
to code it up just to make sure.

If so, one can imagine various bandaids, but maybe we should just rip
out the deadlock detection completely.... It's hard to imagine it being
really useful anyway.

--b.

2007-10-28 17:32:18

[permalink] [raw]

Subject: [PATCH] locks: fix possible infinite loop in posix deadlock detection

From: J. Bruce Fields <[email protected]>

I think the real solution is to remove deadlock detection completely;
it's hard to imaagine applications really depend on it anyway.

For now, though, just bail out after a few iterations.

Thanks to George Davis for reporting the problem.

Cc: "George G. Davis" <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/locks.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 0127a28..131aa88 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -696,17 +696,29 @@ EXPORT_SYMBOL(posix_test_lock);
* Note: the above assumption may not be true when handling lock requests
* from a broken NFS client. But broken NFS clients have a lot more to
* worry about than proper deadlock detection anyway... --okir
+ *
+ * However, the failure of this assumption (also possible in the case of
+ * multiple tasks sharing the same open file table) also means there's no
+ * guarantee that the loop below will terminate. As a hack, we give up
+ * after a few iterations. We don't bother returning EDEADLK in that case;
+ * the deadlock has probably already happened anyway.
*/
+
+#define MAX_DEADLK_ITERATIONS 10
+
static int posix_locks_deadlock(struct file_lock *caller_fl,
struct file_lock *block_fl)
{
struct file_lock *fl;
+ int i = 0;

next_task:
if (posix_same_owner(caller_fl, block_fl))
return 1;
list_for_each_entry(fl, &blocked_list, fl_link) {
if (posix_same_owner(fl, block_fl)) {
+ if (i++ > MAX_DEADLK_ITERATIONS)
+ return 0;
fl = fl->fl_next;
block_fl = fl;
goto next_task;
--
1.5.3.4.208.gc990

2007-10-28 17:43:36

[permalink] [raw]

Subject: [RFC, PATCH] locks: remove posix deadlock detection

From: J. Bruce Fields <[email protected]>

We currently attempt to return -EDEALK to blocking fcntl() file locking
requests that would create a cycle in the graph of tasks waiting on
locks.

This is inefficient: in the general case it requires us determining
whether we're adding a cycle to an arbitrary directed acyclic graph.
And this calculation has to be performed while holding a lock (currently
the BKL) that prevents that graph from changing.

It has historically been a source of bugs; most recently it was noticed
that it could loop indefinitely while holding the BKL.

It seems unlikely to be useful to applications:
- The difficulty of implementation has kept standards from
requiring it. (E.g. SUSv3 : "Since implementation of full
deadlock detection is not always feasible, the [EDEADLK] error
was made optional.") So portable applications may not be able to
depend on it.
- It only detects deadlocks that involve nothing but local posix
file locks; deadlocks involving network filesystems or other kinds
of locks or resources are missed.

It therefore seems best to remove deadlock detection.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/locks.c | 48 ------------------------------------------------
1 files changed, 0 insertions(+), 48 deletions(-)

This also solves the problem addressed by the previous patch. But this
patch would require more discussion, and the problem needs to be fixed
now.

Of course, we shouldn't apply this if there's a chance that real
applications may depend on the existing (imperfect) deadlock detection.

diff --git a/fs/locks.c b/fs/locks.c
index 131aa88..564b85d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -683,50 +683,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)

EXPORT_SYMBOL(posix_test_lock);

-/* This function tests for deadlock condition before putting a process to
- * sleep. The detection scheme is no longer recursive. Recursive was neat,
- * but dangerous - we risked stack corruption if the lock data was bad, or
- * if the recursion was too deep for any other reason.
- *
- * We rely on the fact that a task can only be on one lock's wait queue
- * at a time. When we find blocked_task on a wait queue we can re-search
- * with blocked_task equal to that queue's owner, until either blocked_task
- * isn't found, or blocked_task is found on a queue owned by my_task.
- *
- * Note: the above assumption may not be true when handling lock requests
- * from a broken NFS client. But broken NFS clients have a lot more to
- * worry about than proper deadlock detection anyway... --okir
- *
- * However, the failure of this assumption (also possible in the case of
- * multiple tasks sharing the same open file table) also means there's no
- * guarantee that the loop below will terminate. As a hack, we give up
- * after a few iterations. We don't bother returning EDEADLK in that case;
- * the deadlock has probably already happened anyway.
- */
-
-#define MAX_DEADLK_ITERATIONS 10
-
-static int posix_locks_deadlock(struct file_lock *caller_fl,
- struct file_lock *block_fl)
-{
- struct file_lock *fl;
- int i = 0;
-
-next_task:
- if (posix_same_owner(caller_fl, block_fl))
- return 1;
- list_for_each_entry(fl, &blocked_list, fl_link) {
- if (posix_same_owner(fl, block_fl)) {
- if (i++ > MAX_DEADLK_ITERATIONS)
- return 0;
- fl = fl->fl_next;
- block_fl = fl;
- goto next_task;
- }
- }
- return 0;
-}
-
/* Try to create a FLOCK lock on filp. We always insert new FLOCK locks
* after any leases, but before any posix locks.
*
@@ -846,10 +802,6 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
error = -EAGAIN;
if (!(request->fl_flags & FL_SLEEP))
goto out;
- error = -EDEADLK;
- if (posix_locks_deadlock(request, fl))
- goto out;
- error = -EAGAIN;
locks_insert_block(fl, request);
goto out;
}
--
1.5.3.4.208.gc990

2007-10-28 17:47:21

[permalink] [raw]

Subject: Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()

On Fri, Oct 26, 2007 at 06:47:08PM -0400, J. Bruce Fields wrote:
> Hm. After another look: assume we have four tasks, t1, t2, t3, and t4.
> Assume t1 and t2 share the same current->files (so they're the same
> "owner" for the purpose of posix_same_owner()). Assume:
>
> t1 is waiting on a conflicting lock held by t3.
> t2 is waiting on a conflicting lock held by t4.
>
> Now suppose t4 requests a lock that conflicts with a lock held by t1 and
> t2. The list_for_each_entry() above will search for a task with t1 or
> t2 as owner, which is waiting on a lock. If it finds t1 first, the loop
> won't be noticed, so t4 will be put to sleep. Now we have a loop; t3
> can release its lock (it no longer matters), and we'll have
>
> t2 waiting on a conflicting lock held by t4, and
> t4 waiting on a conflicting lock held by t2.
>
> If a new task t5 then requests a lock conflicting with the one held by
> t2, then the above function will go into an infinite loop. I think.
>
> Consider the directed graph with each vertex representing the set of all
> tasks sharing the same file table, and each edge representing the
> relationship "a task at this vertex is waiting on a lock held by a task
> on another vertex". The existance of multiple tasks with the same file
> table means that we can no longer assume that each vertex has outdegree
> at most one, so we have to switch to an algorithm that works on an
> arbitrary directed graph. That sounds painful.
>
> Am I right about that, and about the example above? It'd be interesting
> to code it up just to make sure.
>
> If so, one can imagine various bandaids, but maybe we should just rip
> out the deadlock detection completely.... It's hard to imagine it being
> really useful anyway.

OK, well I cooked up a similar example, which was kind of fun, and
verified that I can indeed lock up the kernel this way.

The only way this can happen, though, is if you already have deadlocked
threads--that is to say, two threads that are each waiting on posix file
locks held by the other. (Or a similar cycle of length more than 2.) So
hopefully your application is doing some other kind of deadlock
detection (e.g. by killing threads that block for too long); otherwise
it already has a bug.

--b.

2007-10-28 18:27:58

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, Oct 28, 2007 at 01:43:21PM -0400, J. Bruce Fields wrote:
> We currently attempt to return -EDEALK to blocking fcntl() file locking
> requests that would create a cycle in the graph of tasks waiting on
> locks.
>
> This is inefficient: in the general case it requires us determining
> whether we're adding a cycle to an arbitrary directed acyclic graph.
> And this calculation has to be performed while holding a lock (currently
> the BKL) that prevents that graph from changing.
>
> It has historically been a source of bugs; most recently it was noticed
> that it could loop indefinitely while holding the BKL.

It can also return -EDEADLK spuriously. So yeah, just kill it.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-10-28 18:39:20

by Alan

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, 28 Oct 2007 12:27:32 -0600
Matthew Wilcox <[email protected]> wrote:

> On Sun, Oct 28, 2007 at 01:43:21PM -0400, J. Bruce Fields wrote:
> > We currently attempt to return -EDEALK to blocking fcntl() file locking
> > requests that would create a cycle in the graph of tasks waiting on
> > locks.
> >
> > This is inefficient: in the general case it requires us determining
> > whether we're adding a cycle to an arbitrary directed acyclic graph.
> > And this calculation has to be performed while holding a lock (currently
> > the BKL) that prevents that graph from changing.
> >
> > It has historically been a source of bugs; most recently it was noticed
> > that it could loop indefinitely while holding the BKL.
>
> It can also return -EDEADLK spuriously. So yeah, just kill it.

NAK. This is an ABI change. It was also comprehensively rejected before
because

- EDEADLK behaviour is ABI
- EDEADLK behaviour is required by SuSv3
- We have no idea what applications may rely on this behaviour.

and also SuSv3 is required by LSB

See the thread
http://osdir.com/ml/file-systems/2004-06/msg00017.html

so we need to fix the bugs - the lock usage and the looping. At that
point it merely becomes a performance concern to those who use it, which
is the proper behaviour. If you want a faster non-checking one use
flock(), or add another flag that is a Linux "don't check for deadlock"

Alan

2007-10-28 20:11:20

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, Oct 28, 2007 at 06:40:52PM +0000, Alan Cox wrote:
> NAK. This is an ABI change. It was also comprehensively rejected before
> because
>
> - EDEADLK behaviour is ABI

Not in any meaningful way.

> - EDEADLK behaviour is required by SuSv3

What SuSv3 actually says is:

If the system detects that sleeping until a locked region is
unlocked would cause a deadlock, fcntl() shall fail with an
[EDEADLK] error.

It doesn't require the system to detect it, only mandate what to return
if it does detect it.

> - We have no idea what applications may rely on this behaviour.

I've never heard of one that does.

> so we need to fix the bugs - the lock usage and the looping. At that
> point it merely becomes a performance concern to those who use it, which
> is the proper behaviour. If you want a faster non-checking one use
> flock(), or add another flag that is a Linux "don't check for deadlock"

You can't fix the false EDEADLK detection without solving the halting
problem. Best of luck with that.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-10-28 21:37:20

by Alan

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

> > - EDEADLK behaviour is ABI
>
> Not in any meaningful way.

I've seen SYS5 software that relies on it so we should be careful. Again
see the 2004 discussion where the conclusion was that EDEADLK should stay
>
> > - EDEADLK behaviour is required by SuSv3
>
> What SuSv3 actually says is:
>
> If the system detects that sleeping until a locked region is
> unlocked would cause a deadlock, fcntl() shall fail with an
> [EDEADLK] error.
>
> It doesn't require the system to detect it, only mandate what to return
> if it does detect it.

We should be detecting at least the obvious case.

> > - We have no idea what applications may rely on this behaviour.
>
> I've never heard of one that does.

Very scientific. I have on SYS5 though not afaik Linux

> > so we need to fix the bugs - the lock usage and the looping. At that
> > point it merely becomes a performance concern to those who use it, which
> > is the proper behaviour. If you want a faster non-checking one use
> > flock(), or add another flag that is a Linux "don't check for deadlock"
>
> You can't fix the false EDEADLK detection without solving the halting
> problem. Best of luck with that.

A good question to ask here would be what subset of deadlock loops on
flock does SYS5 Unix error. I also don't see why you need to solve the
halting problem

If SYSV only spots simple AB - BA deadlocks or taking the same lock twice
yourself then that ought to be sufficient for us too.

2007-10-28 21:45:58

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, 28 Oct 2007, Matthew Wilcox wrote:

> You can't fix the false EDEADLK detection without solving the halting
> problem. Best of luck with that.

Could you please elaborate a little bit more on this? I don't see how
detecting loops in graph relates to solving halting problem.

Of course the halting problem can be transformed to deadlock-detection
problem, but this relates to static code analysis, right? Not anything we
are interested in, i.e. tracking things in runtime and detecting loops in
simple dependency graphs.

--
Jiri Kosina

2007-10-28 21:48:51

by Trond Myklebust

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, 2007-10-28 at 14:11 -0600, Matthew Wilcox wrote:
> On Sun, Oct 28, 2007 at 06:40:52PM +0000, Alan Cox wrote:
> > so we need to fix the bugs - the lock usage and the looping. At that
> > point it merely becomes a performance concern to those who use it, which
> > is the proper behaviour. If you want a faster non-checking one use
> > flock(), or add another flag that is a Linux "don't check for deadlock"
>
> You can't fix the false EDEADLK detection without solving the halting
> problem. Best of luck with that.

I can see that it would be difficult to do efficiently, but basically,
this boils down to finding a circular path in a graph. That is hardly an
unsolvable issue...

Trond

2007-10-28 22:42:17

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, Oct 28, 2007 at 05:50:30PM -0400, Trond Myklebust wrote:
> > You can't fix the false EDEADLK detection without solving the halting
> > problem. Best of luck with that.
>
> I can see that it would be difficult to do efficiently, but basically,
> this boils down to finding a circular path in a graph. That is hardly an
> unsolvable issue...

Bzzt. You get a false deadlock with multiple threads like so:

Thread A of task B takes lock 1
Thread C of task D takes lock 2
Thread C of task D blocks on lock 1
Thread E of task B blocks on lock 2

We currently declare deadlock at this point (unless the deadlock detection
code has changed since I last looked at it), despite thread A being about
to release lock 1. Oh, and by the way, thread E is capable of releasing
lock 1, so you can't just say "well, detect by thread instead of by task".

So the only way I can see to accurately detect deadlock is to simulate
the future execution of all threads in task B to see if any of them
will release lock 1 without first gaining lock 2. Which, I believe,
is halting-equivalent.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-10-28 22:49:11

by Alan

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

> Bzzt. You get a false deadlock with multiple threads like so:
>
> Thread A of task B takes lock 1
> Thread C of task D takes lock 2
> Thread C of task D blocks on lock 1
> Thread E of task B blocks on lock 2

The spec and SYSV certainly ignore threading in this situation and you
know that perfectly well (or did in 2004)

2007-10-28 22:56:18

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, Oct 28, 2007 at 10:48:33PM +0000, Alan Cox wrote:
> > Bzzt. You get a false deadlock with multiple threads like so:
> >
> > Thread A of task B takes lock 1
> > Thread C of task D takes lock 2
> > Thread C of task D blocks on lock 1
> > Thread E of task B blocks on lock 2
>
> The spec and SYSV certainly ignore threading in this situation and you
> know that perfectly well (or did in 2004)

The discussion petered out (or that mailing list archive lost articles
from the thread) without any kind of resolution, or indeed interest.
What is your suggestion for handling this problem? As it is now, the
kernel 'detects' deadlock where there is none, which doesn't seem
allowed by SuSv3 either.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-10-28 22:56:48

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, 28 Oct 2007, Matthew Wilcox wrote:

> Bzzt. You get a false deadlock with multiple threads like so:
> Thread A of task B takes lock 1
> Thread C of task D takes lock 2
> Thread C of task D blocks on lock 1
> Thread E of task B blocks on lock 2

A potential for deadlock occurs if a process controlling a locked
region is put to sleep by attempting to lock another process'
locked region. If the system detects that sleeping until a locked
region is unlocked would cause a deadlock, fcntl() shall fail with
an [EDEADLK] error.

This is what POSIX says [1], even after being modified with respect to
POSIX Threads Extension, right?

So it doesn't deal with threads at all, just processess are taken into
account. Probably for a reason :)

[1] http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html

--
Jiri Kosina

2007-10-28 23:31:32

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, Oct 28, 2007 at 11:55:52PM +0100, Jiri Kosina wrote:
> On Sun, 28 Oct 2007, Matthew Wilcox wrote:
>
> > Bzzt. You get a false deadlock with multiple threads like so:
> > Thread A of task B takes lock 1
> > Thread C of task D takes lock 2
> > Thread C of task D blocks on lock 1
> > Thread E of task B blocks on lock 2
>
> A potential for deadlock occurs if a process controlling a locked
> region is put to sleep by attempting to lock another process'
> locked region. If the system detects that sleeping until a locked
> region is unlocked would cause a deadlock, fcntl() shall fail with
> an [EDEADLK] error.
>
> This is what POSIX says [1], even after being modified with respect to
> POSIX Threads Extension, right?
>
> So it doesn't deal with threads at all, just processess are taken into
> account. Probably for a reason :)

Did you have a concrete suggestion, or are you just quoting the spec?

The problem is that it's nonsense -- processes don't sleep, threads do.
I think the key is "would deadlock", not "might deadlock". Our current
behaviour is clearly in violation of SuSv3.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-10-28 23:36:50

by Alan

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

> > The spec and SYSV certainly ignore threading in this situation and you
> > know that perfectly well (or did in 2004)
>
> The discussion petered out (or that mailing list archive lost articles
> from the thread) without any kind of resolution, or indeed interest.

I think the resolution was that the EDEADLK stayed.

> What is your suggestion for handling this problem? As it is now, the
> kernel 'detects' deadlock where there is none, which doesn't seem
> allowed by SuSv3 either

Re-read the spec. The EDEADLK doesn't account for threads, only processes.

Alan

2007-10-28 23:38:25

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, Oct 28, 2007 at 09:38:55PM +0000, Alan Cox wrote:
> > It doesn't require the system to detect it, only mandate what to return
> > if it does detect it.
>
> We should be detecting at least the obvious case.

What is the obvious case? A task that has never called clone()?

> If SYSV only spots simple AB - BA deadlocks or taking the same lock twice
> yourself then that ought to be sufficient for us too.

You can't deadlock against yourself -- either with POSIX locks or BSD
locks (POSIX simple upgrades/downgrades the lock; each byte in a file
can only be in either read-locked state or write-locked state for a
given process. BSD locks release the lock and wake all waiters before
attempting to acquire the lock in its new state). In my other post, I
showed a simple AB-BA deadlock that can't be easily detected.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-10-28 23:43:19

by Alan

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, 28 Oct 2007 17:38:14 -0600
Matthew Wilcox <[email protected]> wrote:

> On Sun, Oct 28, 2007 at 09:38:55PM +0000, Alan Cox wrote:
> > > It doesn't require the system to detect it, only mandate what to return
> > > if it does detect it.
> >
> > We should be detecting at least the obvious case.
>
> What is the obvious case? A task that has never called clone()?

Simple AB BA I would have thought obvious. Clone as has been said several
times now is irrelevant as the standard is about *processes* [in the SuS
sense]

2007-10-29 01:13:42

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, Oct 28, 2007 at 06:40:52PM +0000, Alan Cox wrote:
> On Sun, 28 Oct 2007 12:27:32 -0600
> Matthew Wilcox <[email protected]> wrote:
>
> > On Sun, Oct 28, 2007 at 01:43:21PM -0400, J. Bruce Fields wrote:
> > > We currently attempt to return -EDEALK to blocking fcntl() file locking
> > > requests that would create a cycle in the graph of tasks waiting on
> > > locks.
> > >
> > > This is inefficient: in the general case it requires us determining
> > > whether we're adding a cycle to an arbitrary directed acyclic graph.
> > > And this calculation has to be performed while holding a lock (currently
> > > the BKL) that prevents that graph from changing.
> > >
> > > It has historically been a source of bugs; most recently it was noticed
> > > that it could loop indefinitely while holding the BKL.
> >
> > It can also return -EDEADLK spuriously. So yeah, just kill it.
>
> NAK. This is an ABI change. It was also comprehensively rejected before
> because
>
> - EDEADLK behaviour is ABI
> - EDEADLK behaviour is required by SuSv3
> - We have no idea what applications may rely on this behaviour.

On the second point I think you're mistaken; see

http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html

"Since implementation of full deadlock detection is not
always feasible, the [EDEADLK] error was made optional."

The third objection is the one that concerns me most, and is the one I'd
like to understand better. So I'd be interested in any evidence of
applications that do actually depend on the current behavior. (Even
hypothetical use cases might be interesting.) I'm also curious what
other OS's do.

> See the thread
> http://osdir.com/ml/file-systems/2004-06/msg00017.html
>
> so we need to fix the bugs - the lock usage and the looping. At that
> point it merely becomes a performance concern to those who use it, which
> is the proper behaviour. If you want a faster non-checking one use
> flock(), or add another flag that is a Linux "don't check for deadlock"

That's an interesting idea. The flag might not help as much, since
looking for a cycle in the graph of blocked requests may be more
difficult in the case where we don't know whether the graph is acyclic
to start out with. (But I don't know--I haven't thought about it much.)

--b.

2007-10-29 02:11:19

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, Oct 28, 2007 at 04:41:57PM -0600, Matthew Wilcox wrote:
> On Sun, Oct 28, 2007 at 05:50:30PM -0400, Trond Myklebust wrote:
> > > You can't fix the false EDEADLK detection without solving the halting
> > > problem. Best of luck with that.
> >
> > I can see that it would be difficult to do efficiently, but basically,
> > this boils down to finding a circular path in a graph. That is hardly an
> > unsolvable issue...
>
> Bzzt. You get a false deadlock with multiple threads like so:
>
> Thread A of task B takes lock 1
> Thread C of task D takes lock 2
> Thread C of task D blocks on lock 1
> Thread E of task B blocks on lock 2

Oh neat, I missed that case, thanks for pointing it out.

> We currently declare deadlock at this point (unless the deadlock detection
> code has changed since I last looked at it), despite thread A being about
> to release lock 1. Oh, and by the way, thread E is capable of releasing
> lock 1, so you can't just say "well, detect by thread instead of by task".
>
> So the only way I can see to accurately detect deadlock is to simulate
> the future execution of all threads in task B to see if any of them
> will release lock 1 without first gaining lock 2.

Hm. It's annoying, but I'm not convinced it's *that* annoying. We're
not trying to predict whether a deadlock could arise as the result of
future behavior. We're just trying to determine whether granting the
current lock request results in an immediate deadlock consisting purely
of posix file locks.

But yes, I'm assume it's possible, for example, that a thread-exit could
race with a lock request, with the result that we see no deadlock at the
time we handle the lock request, even though at that point the last task
with the ability to solve the problem is already exiting.

Supposing that we're willing to permit the request in such cases and
return EDEADLK only in cases where we're positive there's a deadlock, is
there still some useful subset of cases where we could return EDEADLK?
For example, could we take note of tasks that, when they block on a
lock, have a current->files with reference count one, and only follow
cycles consisting of such blocks?

I'm still not convinced it's worth the trouble, but I suspect you're
overstating the difficulty.

--b.

2007-10-29 02:30:07

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, Oct 28, 2007 at 11:38:26PM +0000, Alan Cox wrote:
> > > The spec and SYSV certainly ignore threading in this situation and you
> > > know that perfectly well (or did in 2004)
> >
> > The discussion petered out (or that mailing list archive lost articles
> > from the thread) without any kind of resolution, or indeed interest.
>
> I think the resolution was that the EDEADLK stayed.
>
> > What is your suggestion for handling this problem? As it is now, the
> > kernel 'detects' deadlock where there is none, which doesn't seem
> > allowed by SuSv3 either
>
> Re-read the spec. The EDEADLK doesn't account for threads, only processes.

Do you have in mind a way to take advantage of that distinction?

The practical requirement I see here is that our deadlock detection
should never give false positives. If we return EDEADLK to applications
with locking schemes that don't actually deadlock, then we're telling
application designers that avoiding deadlock in itself isn't
sufficient--they also have to know our particular deadlock detection
algorithm, so as to avoid giving even the appearance of deadlock.

And if posix file locks are to be useful to threaded applications, then
we have to preserve the same no-false-positives requirement for them as
well.

But, OK, if we can identify unshared current->files at the time we put a
task to sleep, then a slight modification of our current algorithm might
be sufficient to detect any deadlock that involves purely posix file
locks and processes. And we can tell people that avoiding deadlock is
their problem as soon as any task with a shared current->files is
involved. (Ditto, I assume, if nfsd/lockd acquires a lock.)

--b.

2007-10-29 03:24:22

by Trond Myklebust

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, 2007-10-28 at 16:41 -0600, Matthew Wilcox wrote:
> On Sun, Oct 28, 2007 at 05:50:30PM -0400, Trond Myklebust wrote:
> > > You can't fix the false EDEADLK detection without solving the halting
> > > problem. Best of luck with that.
> >
> > I can see that it would be difficult to do efficiently, but basically,
> > this boils down to finding a circular path in a graph. That is hardly an
> > unsolvable issue...
>
> Bzzt. You get a false deadlock with multiple threads like so:
>
> Thread A of task B takes lock 1
> Thread C of task D takes lock 2
> Thread C of task D blocks on lock 1
> Thread E of task B blocks on lock 2
>
> We currently declare deadlock at this point (unless the deadlock detection
> code has changed since I last looked at it), despite thread A being about
> to release lock 1. Oh, and by the way, thread E is capable of releasing
> lock 1, so you can't just say "well, detect by thread instead of by task".
>
> So the only way I can see to accurately detect deadlock is to simulate
> the future execution of all threads in task B to see if any of them
> will release lock 1 without first gaining lock 2. Which, I believe,
> is halting-equivalent.

As several people have told you, the SUSv3 section on fcntl and
deadlocks reads as follows:

"A potential for deadlock occurs if a process controlling a
locked region is put to sleep by attempting to lock another
process' locked region. If the system detects that sleeping
until a locked region is unlocked would cause a deadlock,
fcntl() shall fail with an [EDEADLK] error."

There is no mention there or anywhere else of a need to make exceptions
when dealing with threads. The posix locking model is _process_ based,
and so our deadlock detection only needs to take that into account.
If programmers choose to play tricksy little games with threads, then it
is their responsibility to ensure that the application doesn't get into
a situation where the posix deadlock detection model breaks down.

Trond

2007-10-29 08:04:41

by Alan

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, 28 Oct 2007 13:43:21 -0400
"J. Bruce Fields" <[email protected]> wrote:

> From: J. Bruce Fields <[email protected]>
>
> We currently attempt to return -EDEALK to blocking fcntl() file locking
> requests that would create a cycle in the graph of tasks waiting on
> locks.
>
> This is inefficient: in the general case it requires us determining
> whether we're adding a cycle to an arbitrary directed acyclic graph.
> And this calculation has to be performed while holding a lock (currently
> the BKL) that prevents that graph from changing.
>
> It has historically been a source of bugs; most recently it was noticed
> that it could loop indefinitely while holding the BKL.
>
> It seems unlikely to be useful to applications:
> - The difficulty of implementation has kept standards from
> requiring it. (E.g. SUSv3 : "Since implementation of full
> deadlock detection is not always feasible, the [EDEADLK] error
> was made optional.") So portable applications may not be able to
> depend on it.
> - It only detects deadlocks that involve nothing but local posix
> file locks; deadlocks involving network filesystems or other kinds
> of locks or resources are missed.
>
> It therefore seems best to remove deadlock detection.
>
> Signed-off-by: J. Bruce Fields <[email protected]>

NAK. This is an ABI change and one that was rejected before when this was
last discussed in detail. Moving it out of BKL makes a ton of sense, even
adding a "don't check" flag makes a lot of sense. Removing the checking
does not.

I'd much rather see

if (flags & FL_NODLCHECK)
posix_deadlock_detect(....)

The failure case for removing this feature is obscure and hard to debug
application hangs for the afflicted programs - not nice for users at all.

Alan

2007-10-29 08:06:56

by Alan

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

> And if posix file locks are to be useful to threaded applications, then
> we have to preserve the same no-false-positives requirement for them as
> well.

It isn't useful to threaded applications. The specification requires
this. Which is another reason for having an additional Linux (for now)
flag to say "don't bother"

Alan

2007-10-29 09:12:21

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, 28 Oct 2007, Matthew Wilcox wrote:

> > A potential for deadlock occurs if a process controlling a locked
> > region is put to sleep by attempting to lock another process'
> > locked region. If the system detects that sleeping until a locked
> > region is unlocked would cause a deadlock, fcntl() shall fail with
> > an [EDEADLK] error.
> > This is what POSIX says [1], even after being modified with respect to
> > POSIX Threads Extension, right? So it doesn't deal with threads at
> > all, just processess are taken into account. Probably for a reason :)
> Did you have a concrete suggestion, or are you just quoting the spec?

I was quoting the spec and I thought that the suggestion is implicit --
the specification defines what happens only when processess are in place.
If the application uses POSIX threads in combination with locks, it is
going to receive undefined behavior.

> The problem is that it's nonsense -- processes don't sleep, threads do.
> I think the key is "would deadlock", not "might deadlock". Our current
> behaviour is clearly in violation of SuSv3.

- either we can add a special fcntl() Linux-specific flag, that will ask
the kernel not to perform deadlock detection. Any application that is
combining threads and posix locks (and thus IMHO asking for undefined
behavior) could use this flag and not receive EDEADLK any more

- or we can add some heuristics here-and-there to track which
current->files are shared and which are not, and do not return EDEADLK
in the case of shared ->files (could be a little bit tricky)

--
Jiri Kosina

2007-10-29 09:16:55

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Sun, 28 Oct 2007, J. Bruce Fields wrote:

> But, OK, if we can identify unshared current->files at the time we put a
> task to sleep, then a slight modification of our current algorithm might
> be sufficient to detect any deadlock that involves purely posix file
> locks and processes. And we can tell people that avoiding deadlock is
> their problem as soon as any task with a shared current->files is
> involved. (Ditto, I assume, if nfsd/lockd acquires a lock.)

Don't forget that comparing file_lock->fl_owner (i.e. current->files) is
not the only way how lock ownership could be computed (there could be
specific file_lock->fl_lmops->fl_compare_owner() and all of them should
be teached this new semantics, right?).

--
Jiri Kosina

2007-10-30 15:20:29

[permalink] [raw]

Subject: [PATCH, RESEND] locks: fix possible infinite loop in posix deadlock detection

From: J. Bruce Fields <[email protected]>

It's currently possible to send posix_locks_deadlock() into an infinite
loop (under the BKL).

For now, fix this just by bailing out after a few iterations. We may
want to fix this in a way that better clarifies the semantics of
deadlock detection. But that will take more time, and this minimal fix
is probably adequate for any realistic scenario, and is simple enough to
be appropriate for applying to stable kernels now.

Thanks to George Davis for reporting the problem.

Cc: "George G. Davis" <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/locks.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

I didn't see objections to this quick fix (just to the followup that
attempts to rip out posix deadlock detection entirely), so I'm
resending with just comment modifications.

I haven't given up on a more comprehensive solution, but I think we
really need to apply some fix now.

--b.

diff --git a/fs/locks.c b/fs/locks.c
index 0127a28..8b8388e 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -696,17 +696,28 @@ EXPORT_SYMBOL(posix_test_lock);
* Note: the above assumption may not be true when handling lock requests
* from a broken NFS client. But broken NFS clients have a lot more to
* worry about than proper deadlock detection anyway... --okir
+ *
+ * However, the failure of this assumption (also possible in the case of
+ * multiple tasks sharing the same open file table) also means there's no
+ * guarantee that the loop below will terminate. As a hack, we give up
+ * after a few iterations.
*/
+
+#define MAX_DEADLK_ITERATIONS 10
+
static int posix_locks_deadlock(struct file_lock *caller_fl,
struct file_lock *block_fl)
{
struct file_lock *fl;
+ int i = 0;

next_task:
if (posix_same_owner(caller_fl, block_fl))
return 1;
list_for_each_entry(fl, &blocked_list, fl_link) {
if (posix_same_owner(fl, block_fl)) {
+ if (i++ > MAX_DEADLK_ITERATIONS)
+ return 0;
fl = fl->fl_next;
block_fl = fl;
goto next_task;
--
1.5.3.4.208.gc990

2007-10-30 15:35:01

by Alan

[permalink] [raw]

Subject: Re: [PATCH, RESEND] locks: fix possible infinite loop in posix deadlock detection

On Tue, 30 Oct 2007 11:20:02 -0400
"J. Bruce Fields" <[email protected]> wrote:

> From: J. Bruce Fields <[email protected]>
>
> It's currently possible to send posix_locks_deadlock() into an infinite
> loop (under the BKL).
>
> For now, fix this just by bailing out after a few iterations. We may
> want to fix this in a way that better clarifies the semantics of
> deadlock detection. But that will take more time, and this minimal fix
> is probably adequate for any realistic scenario, and is simple enough to
> be appropriate for applying to stable kernels now.
>
> Thanks to George Davis for reporting the problem.
>
> Cc: "George G. Davis" <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>

Acked-by: Alan Cox <[email protected]>

Its a good fix for now and I doubt any real world user has that complex a
locking pattern to break.

2007-10-30 15:35:49

[permalink] [raw]

Subject: Re: [RFC, PATCH] locks: remove posix deadlock detection

On Mon, Oct 29, 2007 at 10:15:19AM +0100, Jiri Kosina wrote:
> On Sun, 28 Oct 2007, J. Bruce Fields wrote:
>
> > But, OK, if we can identify unshared current->files at the time we put a
> > task to sleep, then a slight modification of our current algorithm might
> > be sufficient to detect any deadlock that involves purely posix file
> > locks and processes. And we can tell people that avoiding deadlock is
> > their problem as soon as any task with a shared current->files is
> > involved. (Ditto, I assume, if nfsd/lockd acquires a lock.)
>
> Don't forget that comparing file_lock->fl_owner (i.e. current->files) is
> not the only way how lock ownership could be computed (there could be
> specific file_lock->fl_lmops->fl_compare_owner() and all of them should
> be teached this new semantics, right?).

Right. So, for example, this would handle both cases as described
above.

We're turning off deadlock detection in the problematic cases just by
neglecting to add such waiting lockers to the blocked_list (which is
what we search to look for lock cycles).

(It'd be nice if we didn't have to search that list at all. There
should be some way to set up the data structures so we can follow the
lock dependencies without having to scan through all the blocked locks
at each step. But I haven't quite figured out how to do that yet. And
perhaps it's not important to optimize cases where there are lots of
blocks.)

--b.

diff --git a/fs/locks.c b/fs/locks.c
index 8b8388e..5446305 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -512,6 +512,8 @@ static void locks_delete_block(struct file_lock *waiter)
unlock_kernel();
}

+static int posix_owner_shared(struct file_lock *caller_fl);
+
/* Insert waiter into blocker's block list.
* We use a circular list so that processes can be easily woken up in
* the order they blocked. The documentation doesn't require this but
@@ -523,7 +525,7 @@ static void locks_insert_block(struct file_lock *blocker,
BUG_ON(!list_empty(&waiter->fl_block));
list_add_tail(&waiter->fl_block, &blocker->fl_block);
waiter->fl_next = blocker;
- if (IS_POSIX(blocker))
+ if (IS_POSIX(blocker) && !posix_owner_shared(waiter))
list_add(&waiter->fl_link, &blocked_list);
}

@@ -683,46 +685,79 @@ posix_test_lock(struct file *filp, struct file_lock *fl)

EXPORT_SYMBOL(posix_test_lock);

-/* This function tests for deadlock condition before putting a process to
- * sleep. The detection scheme is no longer recursive. Recursive was neat,
- * but dangerous - we risked stack corruption if the lock data was bad, or
- * if the recursion was too deep for any other reason.
- *
- * We rely on the fact that a task can only be on one lock's wait queue
- * at a time. When we find blocked_task on a wait queue we can re-search
- * with blocked_task equal to that queue's owner, until either blocked_task
- * isn't found, or blocked_task is found on a queue owned by my_task.
- *
- * Note: the above assumption may not be true when handling lock requests
- * from a broken NFS client. But broken NFS clients have a lot more to
- * worry about than proper deadlock detection anyway... --okir
- *
- * However, the failure of this assumption (also possible in the case of
- * multiple tasks sharing the same open file table) also means there's no
- * guarantee that the loop below will terminate. As a hack, we give up
- * after a few iterations.
+/*
+ * Deadlock detection:
+ *
+ * We attempt to detect deadlocks that are due purely to posix file
+ * locks.
+ *
+ * We assume that a task can be waiting for at most one lock at a time.
+ * So for any acquired lock, the process holding that lock may be
+ * waiting on at most one other lock. That lock in turns may be held by
+ * someone waiting for at most one other lock. Given a requested lock
+ * caller_fl which is about to wait for a conflicting lock block_fl, we
+ * follow this chain of waiters to ensure we are not about to create a
+ * cycle.
+ *
+ * Since we do this before we ever put a process to sleep on a lock, we
+ * are ensured that there is never a cycle; that is what guarantees that
+ * the while() loop in posix_locks_deadlock() eventually completes.
+ *
+ * Note: the above assumption may not be true when handling lock
+ * requests from a broken NFS client. It may also fail in the presence
+ * of tasks (such as posix threads) sharing the same open file table.
+ *
+ * We don't necessarily care about returning EDEALK correctly in such
+ * cases, but we do need to avoid cycles in the lock dependency graph in
+ * order to ensure the loop in posix_locks_deadlock eventually
+ * terminates. To that end, we enforce the assumption above by refusing
+ * to return EDEADLK or add to the list of blocked locks in any case
+ * where a lock owner might be able to block on more than one lock.
*/

-#define MAX_DEADLK_ITERATIONS 10
+static int posix_owner_shared(struct file_lock *caller_fl)
+{
+ /*
+ * The caller is a lock manager (lockd/nfsd), and won't
+ * necessarily guarantee that a single lock owner won't block on
+ * two locks at once:
+ */
+ if (caller_fl->fl_lmops && caller_fl->fl_lmops->fl_compare_owner)
+ return 1;
+ /*
+ * Multiple tasks share current->files, also allowing the same
+ * "owner" to block on two locks at once:
+ */
+ if (current->files == NULL || atomic_read(&current->files->count) > 1)
+ return 1;
+ /*
+ * The lock is not on behalf of a file manager, and no other
+ * tasks share this file owner (and, as long as this task is
+ * stuck waiting for a lock, that's not going to change):
+ */
+ return 0;
+}
+
+/* Find a lock that the owner of the given block_fl is blocking on. */
+static struct file_lock * what_owner_is_waiting_for(struct file_lock *block_fl)
+{
+ struct file_lock *fl;
+
+ list_for_each_entry(fl, &blocked_list, fl_link)
+ if (posix_same_owner(fl, block_fl))
+ return fl->fl_next;
+ return NULL;
+}

static int posix_locks_deadlock(struct file_lock *caller_fl,
struct file_lock *block_fl)
{
- struct file_lock *fl;
- int i = 0;
+ if (posix_owner_shared(caller_fl))
+ return 0;

-next_task:
- if (posix_same_owner(caller_fl, block_fl))
- return 1;
- list_for_each_entry(fl, &blocked_list, fl_link) {
- if (posix_same_owner(fl, block_fl)) {
- if (i++ > MAX_DEADLK_ITERATIONS)
- return 0;
- fl = fl->fl_next;
- block_fl = fl;
- goto next_task;
- }
- }
+ while ((block_fl = what_owner_is_waiting_for(block_fl)))
+ if (posix_same_owner(caller_fl, block_fl))
+ return 1;
return 0;
}

2007-10-30 15:52:17