2003-09-03 19:24:45

by linas

[permalink] [raw]
Subject: PATCH: kernel-2.4 brlock livelock




Hi,

I'm soliciting for review of a patch to brlock's in kernel 2.4

PROBLEM:
We've got a system here, a very heavily loaded 8-way SMP powerPC64
system, that seems to be getting stuck in br_write_lock(). It
appears that its stuck in a 'livelock' situation.

ANALYSIS:
The current 2.4 brlock implementation has a write-lock that spins
while waiting for the number of readers to drop to zero.
(This is true for both the 'atomic' and Dave Miller's 'non-atomic'
implementations.) The problem is that there are so many readers
and so many CPU's, that new readers come in as fast as the old
readers finish, that the length of the queue never drops to zero,
and thus the write lock is never obtained. (At leaset that's
what we think is happening, the evidence is indirect.)

After reviewing the brlock code, it appears that neither of the
implementations (atomic/non-atomic) gaurentee 'forward progress'.
The patch below does gaurentee forward progress (at some cost?)
(I can't tell what the cost is). Note: ppc64 uses the Dave Miller
non-atomic code.

DESCRIPTION OF PATCH:
The patch changes the non-atomic code. It grabs the write lock, and
then spins, waiting for all of the existing readers to finish.
New readers are held off. This seems (to me) to be a reasonable
thing to do, based on the following logic:

The current brlock design, in either the atomic or non-atomic case,
assumes that sooner or later, the number of readers will drop to
zero. Since the write locks spin, the current design also seems to
assume that each individual read lock is not held for very long.
(i.e. it assumes that its OK for the writer to spin until the readers
complete.) Based on this observation, I propose a modified write
lock that does gaurentee forward progress.

>From what I can tell, it should work just fine; but its minimally
tested. (I booted it and compiled a kernel.)

There is some performance impact, but I think its less than the
'atomic' code. The atomic code slowly chokes off new readers
by locking one cpu at a time. New readers wait and spin: which
is the same as in the patch below, and so this patch doesn't make
things worse.

--linas

--- brlock.c.orig 2003-09-02 18:18:06.000000000 -0500
+++ brlock.c 2003-09-03 12:05:28.000000000 -0500
@@ -48,14 +48,11 @@ void __br_write_lock (enum brlock_indice
{
int i;

-again:
spin_lock(&__br_write_locks[idx].lock);
+wait_for_readers_to_finish:
for (i = 0; i < smp_num_cpus; i++)
if (__brlock_array[cpu_logical_map(i)][idx] != 0) {
- spin_unlock(&__br_write_locks[idx].lock);
- barrier();
- cpu_relax();
- goto again;
+ goto wait_for_readers_to_finish;
}
}






2003-09-03 19:48:18

by Anton Blanchard

[permalink] [raw]
Subject: Re: PATCH: kernel-2.4 brlock livelock


Hi,

> The patch changes the non-atomic code. It grabs the write lock, and
> then spins, waiting for all of the existing readers to finish.
> New readers are held off. This seems (to me) to be a reasonable
> thing to do, based on the following logic:

The problem is with recursive readers. One cpu takes a br read lock then
wants to take the same lock again. It must be allowed to get that read lock.

We need to drop the write spinlock or else we will deadlock.

Anton

2003-09-03 20:13:32

by linas

[permalink] [raw]
Subject: Re: PATCH: kernel-2.4 brlock livelock

On Thu, Sep 04, 2003 at 05:44:02AM +1000, Anton Blanchard wrote:
>
> Hi,
>
> > The patch changes the non-atomic code. It grabs the write lock, and
> > then spins, waiting for all of the existing readers to finish.
> > New readers are held off. This seems (to me) to be a reasonable
> > thing to do, based on the following logic:
>
> The problem is with recursive readers. One cpu takes a br read lock then
> wants to take the same lock again. It must be allowed to get that read lock.
>
> We need to drop the write spinlock or else we will deadlock.

Whoops.

OK, how about the following: readers on a given cpu are held off
if the write lock is held *and* the read-count on that cpu is zero?

That way, 'recursive' readers on other CPU's can get a read-lock if
there's already a non-zero read-lock-count on that CPU.

That should work if the thread holding the lock can't get scheduled
to another cpu. Can these things wander around?

If they can wander around, then oone would have to order the cpus:
wait for read count to drop to zero on cpu 0 then on 1 then on 2,
meanwhile the read-lock can be gotten on the higher ordered CPUs ...

If this sounds reasonable, would you care to see a revised patch?

What else can go wrong?

--linas

2003-09-04 08:59:41

by Ingo Molnar

[permalink] [raw]
Subject: Re: PATCH: kernel-2.4 brlock livelock


On Wed, 3 Sep 2003 [email protected] wrote:

> OK, how about the following: readers on a given cpu are held off if the
> write lock is held *and* the read-count on that cpu is zero?
>
> That way, 'recursive' readers on other CPU's can get a read-lock if
> there's already a non-zero read-lock-count on that CPU.
>
> That should work if the thread holding the lock can't get scheduled to
> another cpu. Can these things wander around?
>
> If they can wander around, then oone would have to order the cpus: wait
> for read count to drop to zero on cpu 0 then on 1 then on 2, meanwhile
> the read-lock can be gotten on the higher ordered CPUs ...
>
> If this sounds reasonable, would you care to see a revised patch?

could you try this approach on your box that shows the livelock situation?

certainly we can add code that only triggers if there's some write attempt
- the important thing is to have the right read-mostly behavior.

Ingo