Here's a before-breakfast implementation of a recursive spinlock. That
is, the same thread can "take" the spinlock repeatedly. This is crude -
I just want to focus some attention on the issue (while I go out and
have breakfast :'E).
The idea is to implement trylock correctly, and then get lock and
unlock from that via the standard algebra. lock is a loop doing
a trylock until it succeeds. We emerge from a successful trylock
with the lock notionally held.
The "spinlock" is a register of the current pid, plus a recursion
counter, with atomic access. The pid is either -1 (unset, count is
zero) or some decent value (count is positive).
The trylock will succeed and set the pid if it is currently unset. It
will succeed if the pid matches ours, and increment the count of
holders.
Unlock just decrements the count. When we've unlocked enough times,
somebody else can take the lock.
The objection to this scheme that I have is
1) its notion of identity is limited to the pid, which is crude
2) it doesn't detect dead holders of the lock, which would be nice
3) it should probably be done in assembler, using whatever trick
rwlocks use
4) it doesn't actually use a real spinlock to "hold" the lock, which
makes me nervous.
typedef struct recursive_spinlock {
	spinlock_t lock;	/* guards pid and count */
	int pid;		/* -1 when unheld */
	int count;		/* recursion depth */
} recursive_spinlock_t;

int recursive_spin_trylock(recursive_spinlock_t *lock) {
	spin_lock(&lock->lock);
	if (lock->count <= 0) {
		// greenfield site
		lock->pid = current->pid;
		lock->count = 1;
		spin_unlock(&lock->lock);
		return 0;
	}
	// somebody has the lock
	if (lock->pid == current->pid) {
		// it was us! return ok
		lock->count++;
		spin_unlock(&lock->lock);
		return 0;
	}
	// somebody has the lock and it's not us! return fail
	spin_unlock(&lock->lock);
	return 1;
}

void recursive_spin_lock(recursive_spinlock_t *lock) {
	while (recursive_spin_trylock(lock) != 0)
		;
}

void recursive_spin_unlock(recursive_spinlock_t *lock) {
	spin_lock(&lock->lock);
	if (--lock->count <= 0) {
		lock->count = 0;
		lock->pid = -1;	/* keep the invariant: pid is -1 when count is 0 */
	}
	spin_unlock(&lock->lock);
}

void recursive_spinlock_init(recursive_spinlock_t *lock) {
	spin_lock_init(&lock->lock);
	lock->pid = -1;
	lock->count = 0;
}
Peter
> Here's a before-breakfast implementation of a recursive spinlock. That
> is, the same thread can "take" the spinlock repeatedly.
Why?
M.
At some point in the past, Peter Breuer's attribution was removed from:
>> Here's a before-breakfast implementation of a recursive spinlock. That
>> is, the same thread can "take" the spinlock repeatedly.
On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
> Why?
netconsole.
-- wli
At some point in the past, Peter Breuer's attribution was removed from:
>>>> Here's a before-breakfast implementation of a recursive spinlock. That
>>>> is, the same thread can "take" the spinlock repeatedly.
On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
>>> Why?
On Sun, 2003-05-18 at 18:35, William Lee Irwin III wrote:
>> netconsole.
On Sun, May 18, 2003 at 06:49:04PM +0200, Arjan van de Ven wrote:
> not really.
> the netconsole issue is tricky and recursive, but recursive locks aren't
> the solution; that would require a rewrite of the network drivers. It's
> far easier to solve it by moving the debug printk's instead.
Yes, there are better ways to fix it. But AIUI this is why some people
want it (the rest of us just don't want messy locking semantics around).
-- wli
On Sun, 2003-05-18 at 18:35, William Lee Irwin III wrote:
> At some point in the past, Peter Breuer's attribution was removed from:
> >> Here's a before-breakfast implementation of a recursive spinlock. That
> >> is, the same thread can "take" the spinlock repeatedly.
>
> On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
> > Why?
>
> netconsole.
not really.
the netconsole issue is tricky and recursive, but recursive locks aren't
the solution; that would require a rewrite of the network drivers. It's
far easier to solve it by moving the debug printk's instead.
--William Lee Irwin III <[email protected]> wrote (on Sunday, May 18, 2003 09:54:45 -0700):
> At some point in the past, Peter Breuer's attribution was removed from:
>>>>> Here's a before-breakfast implementation of a recursive spinlock. That
>>>>> is, the same thread can "take" the spinlock repeatedly.
>
> On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
>>>> Why?
>
> On Sun, 2003-05-18 at 18:35, William Lee Irwin III wrote:
>>> netconsole.
>
> On Sun, May 18, 2003 at 06:49:04PM +0200, Arjan van de Ven wrote:
>> not really.
>> the netconsole issue is tricky and recursive, but recursive locks aren't
>> the solution; that would require a rewrite of the network drivers. It's
>> far easier to solve it by moving the debug printk's instead.
>
> Yes, there are better ways to fix it. But AIUI this is why some people
> want it (the rest of us just don't want messy locking semantics around).
Right ... to me this just seems to create confusing code for no really
good reason that I can see right now.
M.
"A month of sundays ago William Lee Irwin III wrote:"
> At some point in the past, Peter Breuer's attribution was removed from:
> >> Here's a before-breakfast implementation of a recursive spinlock. That
> >> is, the same thread can "take" the spinlock repeatedly.
>
> On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
> > Why?
>
> netconsole.
That's a problem looking for a solution! No, the reason for wanting a
recursive spinlock is that nonrecursive locks make programming harder.
Though I've got quite good at finding and removing deadlocks in my old
age, there are still two popular ways that the rest of the world's
programmers often shoot themselves in the foot with a spinlock:
a) sleeping while holding the spinlock
b) taking the spinlock in a subroutine while you already have it
The first method leads to an early death if the spinlock is a popular
one, as the only thread that can release it doesn't seem to be running,
errr..
The second method is used by programmers who aren't aware that some
obscure subroutine takes a spinlock, and who recklessly take a lock
before calling a subroutine (the very thought sends shivers down my
spine ...). A popular scenario involves not /knowing/ that your routine
is called by the kernel with some obscure lock already held, and then
calling a subroutine that takes the same obscure lock. The request
function is one example, but that's hardly obscure (and in 2.5 the
situation has eased there!).
It's the case (b) that a recursive spinlock makes go away.
Hey, that's not bad for a small change! 50% of potential programming
errors sent to the dustbin without ever being encountered.
Peter
On Sun, 18 May 2003, Peter T. Breuer wrote:
>
> Here's a before-breakfast implementation of a recursive spinlock. That
> is, the same thread can "take" the spinlock repeatedly. This is crude -
> I just want to focus some attention on the issue (while I go out and
> have breakfast :'E).
>
> The idea is to implement trylock correctly, and then get lock and
> unlock from that via the standard algebra. lock is a loop doing
> a trylock until it succeeds. We emerge from a successful trylock
> with the lock notionally held.
>
> The "spinlock" is a register of the current pid, plus a recursion
> counter, with atomic access. The pid is either -1 (unset, count is
> zero) or some decent value (count is positive).
>
> The trylock will succeed and set the pid if it is currently unset. It
> will succeed if the pid matches ours, and increment the count of
> holders.
>
> Unlock just decrements the count. When we've unlocked enough times,
> somebody else can take the lock.
A looong time ago I gave to someone a recursive spinlock implementation
that they integrated in the USB code. I don't see it in the latest
kernels, so I have to guess that they found a better solution to do their
things. I'm biased to say that it must not be necessary to have the thing
if you structure your code correctly.
- Davide
In article <[email protected]> you wrote:
> On Sun, 18 May 2003, Peter T. Breuer wrote:
>> Here's a before-breakfast implementation of a recursive spinlock. That
> A looong time ago I gave to someone a recursive spinlock implementation
> that they integrated in the USB code. I don't see it in the latest
> kernels, so I have to guess that they found a better solution to do their
> things. I'm biased to say that it must not be necessary to have the thing
> if you structure your code correctly.
Well, you can get rid of anything that way. The question is if the
interface is an appropriate one to use or not - i.e. if it makes for
better code in general, or if it makes errors of programming less
likely.
I would argue that the latter is undoubtedly true - merely that
userspace flock/fcntl works that way would argue for it, but there
are a couple of other reasons too.
Going against is the point that it may be slower. Can you dig out your
implementation and show me it? I wasn't going for assembler in my hasty
example. I just wanted to establish that it's easy, so that it becomes
known that it's easy, and folks therefore aren't afraid of it. That both
you and I have had to write it implies that it's not obvious code to
everyone.
Peter
On Sun, 18 May 2003, Peter T. Breuer wrote:
> In article <[email protected]> you wrote:
> > On Sun, 18 May 2003, Peter T. Breuer wrote:
> >> Here's a before-breakfast implementation of a recursive spinlock. That
>
> > A looong time ago I gave to someone a recursive spinlock implementation
> > that they integrated in the USB code. I don't see it in the latest
> > kernels, so I have to guess that they found a better solution to do their
> > things. I'm biased to say that it must not be necessary to have the thing
> > if you structure your code correctly.
>
> Well, you can get rid of anything that way. The question is if the
> interface is an appropriate one to use or not - i.e. if it makes for
> better code in general, or if it make errors of programming less
> likely.
>
> I would argue that the latter is undoubtedly true - merely that
> userspace flock/fcntl works that way would argue for it, but there
> are a couple of other reasons too.
>
> Going against is the point that it may be slower. Can you dig out your
> implementation and show me it? I wasn't going for assembler in my hasty
> example. I just wanted to establish that it's easy, so that it becomes
> known that its easy, and folks therefore aren't afraid of it. That both
> you and I have had to write it implies that it's not obvious code to
> everyone.
It was something like this. The real code should be in 2.2.x I believe.
The problem at the time was nested ( recursive ) spinlocks.
typedef struct s_nestlock {
	spinlock_t lock;
	void *uniq;
	atomic_t count;
} nestlock_t;

#define nestlock_init(snl) \
	do { \
		spin_lock_init(&(snl)->lock); \
		(snl)->uniq = NULL; \
		atomic_set(&(snl)->count, 0); \
	} while (0)

#define nestlock_irqlock(snl, flags) \
	do { \
		if ((snl)->uniq == current) { \
			atomic_inc(&(snl)->count); \
			flags = 0; /* No warnings */ \
		} else { \
			spin_lock_irqsave(&(snl)->lock, flags); \
			atomic_inc(&(snl)->count); \
			(snl)->uniq = current; \
		} \
	} while (0)

#define nestlock_irqunlock(snl, flags) \
	do { \
		if (atomic_dec_and_test(&(snl)->count)) { \
			(snl)->uniq = NULL; \
			spin_unlock_irqrestore(&(snl)->lock, flags); \
		} \
	} while (0)

#define nestlock_lock(snl) \
	do { \
		if ((snl)->uniq == current) { \
			atomic_inc(&(snl)->count); \
		} else { \
			spin_lock(&(snl)->lock); \
			atomic_inc(&(snl)->count); \
			(snl)->uniq = current; \
		} \
	} while (0)

#define nestlock_unlock(snl) \
	do { \
		if (atomic_dec_and_test(&(snl)->count)) { \
			(snl)->uniq = NULL; \
			spin_unlock(&(snl)->lock); \
		} \
	} while (0)
- Davide
"Davide Libenzi wrote:"
> On Sun, 18 May 2003, Peter T. Breuer wrote:
> > Going against is the point that it may be slower. Can you dig out your
> > implementation and show me it? I wasn't going for assembler in my hasty
> > example. I just wanted to establish that it's easy, so that it becomes
>
> It was something like this. The real code should be in 2.2.x I believe.
> The problem at the time was nested ( recursive ) spinlocks.
>
>
> typedef struct s_nestlock {
> spinlock_t lock;
> void *uniq;
> atomic_t count;
> } nestlock_t;
This is essentially the same struct as mine. I used the pid of the task,
where you use the address of the task. You use an atomic count, whereas
I used an ordinary int, guarded by the embedded spinlock.
> #define nestlock_lock(snl) \
> do { \
> if ((snl)->uniq == current) { \
That would be able to read uniq while it is being written by something
else (which it can, according to the code below). It needs protection.
Probably you want the (< 24 bit) pid instead and to use atomic ops on
it.
> atomic_inc(&(snl)->count); \
OK, that's the same.
> } else { \
> spin_lock(&(snl)->lock); \
> atomic_inc(&(snl)->count); \
> (snl)->uniq = current; \
Hmm .. else we wait for the lock, and then set count and uniq, while
somebody else may have entered and be reading it :-). You exit with
the embedded lock held, while I used the lock only as a guard to
make the ops atomic.
> } \
> } while (0)
>
> #define nestlock_unlock(snl) \
> do { \
> if (atomic_dec_and_test(&(snl)->count)) { \
> (snl)->uniq = NULL; \
> spin_unlock(&(snl)->lock); \
That's OK.
> } \
> } while (0)
Well, it's not assembler either, but at least it's easily comparable
with the nonrecursive version. It's essentially got an extra if and
an inc in the lock. That's all.
I suspect that the rwlock assembler can be coerced to do the job.
Peter
On Sun, 18 May 2003, Peter T. Breuer wrote:
> This is essentially the same struct as mine. I used the pid of the task,
> where you use the address of the task. You use an atomic count, whereas
> I used an ordinary int, guarded by the embedded spinlock.
>
>
> > #define nestlock_lock(snl) \
> > do { \
> > if ((snl)->uniq == current) { \
>
> That would be able to read uniq while it is being written by something
> else (which it can, according to the code below). It needs protection.
No it does not, look better.
> > atomic_inc(&(snl)->count); \
>
> OK, that's the same.
>
> > } else { \
> > spin_lock(&(snl)->lock); \
> > atomic_inc(&(snl)->count); \
> > (snl)->uniq = current; \
>
> Hmm .. else we wait for the lock, and then set count and uniq, while
> somebody else may have entered and be reading it :-). You exit with
Nope, think about a case where it breaks. False negatives are not possible
because it is set by the same task, and neither are false positives.
> Well, it's not assembler either, but at least it's easily comparable
> with the nonrecursive version. It's essentially got an extra if and
> an inc in the lock. That's all.
Well, there's a little difference. In case of contention, you loop with
your custom try lock while I used the optimized asm code inside spin_lock.
But again, I believe we didn't lose anything with the removal of this code
(nested/recursive locks) from the kernel.
- Davide
On Sun, 2003-05-18 at 18:24, Peter T. Breuer wrote:
> The second method is used by programmers who aren't aware that some
> obscure subroutine takes a spinlock, and who recklessly take a lock
> before calling a subroutine (the very thought sends shivers down my
> spine ...). A popular scenario involves not /knowing/ that your routine
> is called by the kernel with some obscure lock already held, and then
> calling a subroutine that calls the same obscure lock. The request
> function is one example, but that's hardly obscure (and in 2.5 the
> situation has eased there!).
To be honest, if any programmer is capable of committing this error and
not finding and fixing it for themselves, then they're also capable, and
arguably _likely_, to introduce subtle lock ordering discrepancies which
will cause deadlock once in a blue moon.
I don't _want_ you to make life easier for this hypothetical programmer.
I want them to either learn to comprehend locking _properly_, or take up
gardening instead.
--
dwmw2
In article <[email protected]> you wrote:
>>
>> > #define nestlock_lock(snl) \
>> > do { \
>> > if ((snl)->uniq == current) { \
>>
>> That would be able to read uniq while it is being written by something
>> else (which it can, according to the code below). It needs protection.
> No it does not, look better.
I'm afraid I only see that it does!
>> > atomic_inc(&(snl)->count); \
>> > } else { \
>> > spin_lock(&(snl)->lock); \
>> > atomic_inc(&(snl)->count); \
>> > (snl)->uniq = current; \
>>
>> Hmm .. else we wait for the lock, and then set count and uniq, while
>> somebody else may have entered and be reading it :-). You exit with
> Nope, think about a case were it breaks. False negatives are not possible
> because it is set by the same task and false positives either.
No. This is not true. Imagine two threads, timed as follows ...
    thread 1                                thread 2

    if ((snl)->uniq == current) {              .
        atomic_inc(&(snl)->count);             .
    } else {                                   .
        spin_lock(&(snl)->lock);               .
        atomic_inc(&(snl)->count);             .
        (snl)->uniq = current;     <->     if ((snl)->uniq == current) {
                                               atomic_inc(&(snl)->count);
                                           } else {
                                               spin_lock(&(snl)->lock);
                                               atomic_inc(&(snl)->count);
                                               (snl)->uniq = current;
There you are. One hits the read exactly at the time the other does a
write. Bang.
>> Well, it's not assembler either, but at least it's easily comparable
>> with the nonrecursive version. It's essentially got an extra if and
>> an inc in the lock. That's all.
> Well, there's a little difference. In case of contention, you loop with
> your custom try lock while I used the optimized asm code inside spin_lock.
> But again, I believe we didn't lose anything with the removal of this code
> (nested/recursive locks) from the kernel.
We lose hours of programmers' time, looking for deadlocks caused by
accidentally taking the same spinlock twice and not knowing it.
A question in my mind is whether a fault in a third thread, like
sleeping with a spinlock held, can make a recursive spinlock into
a fault causer ... no, I don't see any way.
Peter
On Mon, 19 May 2003, Peter T. Breuer wrote:
> No. This is not true. Imagine two threads, timed as follows ...
>
> .
> .
> .
> .
> if ((snl)->uniq == current) {
> atomic_inc(&(snl)->count); .
> } else { .
> spin_lock(&(snl)->lock); .
> atomic_inc(&(snl)->count); .
> (snl)->uniq = current; <-> if ((snl)->uniq == current) {
> atomic_inc(&(snl)->count);
> } else {
> spin_lock(&(snl)->lock);
> atomic_inc(&(snl)->count);
> (snl)->uniq = current;
>
>
> There you are. One hits the read exactly at the time the other does a
> write. Bang.
So, what's bang for you? The second task (the one that reads "uniq")
will either see "uniq" as NULL or as (task1)->current. And it'll go
acquiring the lock, as expected. Check it again ...
- Davide
On Sun, May 18, 2003 at 07:24:17PM +0200, Peter T. Breuer wrote:
> Though I've got quite good at finding and removing deadlocks in my old
> age, there are still two popular ways that the rest of the world's
> prgrammers often shoot themselves in the foot with a spinlock:
>
> a) sleeping while holding the spinlock
> b) taking the spinlock in a subroutine while you already have it
[...]
> The second method is used by programmers who aren't aware that some
> obscure subroutine takes a spinlock, and who recklessly take a lock
> before calling a subroutine (the very thought sends shivers down my
> spine ...).
Recursive spinlocks only hide the problem - consider programmers who take
lock B and recklessly call a subroutine that takes lock A followed by lock
B. The resulting code will appear to work fine, but may have introduced a
subtle AB-BA deadlock. I'd rather have a coding defect that reliably and
consistently causes a deadlock than one that causes deadlocks in rare
timing related cases.
>A popular scenario involves not /knowing/ that your routine
> is called by the kernel with some obscure lock already held, and then
> calling a subroutine that calls the same obscure lock.
If the kernel invokes a callback with an obscure lock held (that is
promiscuous enough to be grabbed in other helper sub-routines), then it's
probably a bug - why not just fix it?
-Kevin
--
------------------------------------------------------------------------
| Kevin O'Connor "BTW, IMHO we need a FAQ for |
| [email protected] 'IMHO', 'FAQ', 'BTW', etc. !" |
------------------------------------------------------------------------
On Sun, May 18, 2003 at 07:24:17PM +0200, Peter T. Breuer wrote:
> "A month of sundays ago William Lee Irwin III wrote:"
> > At some point in the past, Peter Breuer's attribution was removed from:
> > >> Here's a before-breakfast implementation of a recursive spinlock. That
> > >> is, the same thread can "take" the spinlock repeatedly.
> >
> > On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
> > > Why?
> >
> > netconsole.
>
> That's a problem looking for a solution! No, the reason for wanting a
> recursive spinlock is that nonrecursive locks make programming harder.
>
> Though I've got quite good at finding and removing deadlocks in my old
> age, there are still two popular ways that the rest of the world's
> prgrammers often shoot themselves in the foot with a spinlock:
>
> a) sleeping while holding the spinlock
> b) taking the spinlock in a subroutine while you already have it
>
> The first method leads to an early death if the spinlock is a popular
> one, as the only thread that can release it doesn't seem to be running,
> errr..
>
> The second method is used by programmers who aren't aware that some
> obscure subroutine takes a spinlock, and who recklessly take a lock
> before calling a subroutine (the very thought sends shivers down my
> spine ...). A popular scenario involves not /knowing/ that your routine
> is called by the kernel with some obscure lock already held, and then
> calling a subroutine that calls the same obscure lock. The request
> function is one example, but that's hardly obscure (and in 2.5 the
> situation has eased there!).
>
> It's the case (b) that a recursive spinlock makes go away.
I thought we still had the spinlock debugging level 2, which DOES CHECK
THIS (it seems no one knows about it - I had to fix a trivial error when
I used it in the user-mode arch, but it worked well then). [check means it
gives a backtrace]
> Hey, that's not bad for a small change! 50% of potential programming
> errors sent to the dustbin without ever being encountered.
They are errors. Appropriate response to errors is an OOPS.
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <[email protected]>
Peter T. Breuer wrote:
> Hey, that's not bad for a small change! 50% of potential programming
> errors sent to the dustbin without ever being encountered.
Then you replace errors with inefficiency - nobody discovers that
you needlessly take a lock twice. They notice OOPSes though, the
lock gurus can then debug it.
Trading performance for simplicity is ok in some cases, but I have a strong
feeling this isn't one of them. Consider how people optimize locking
by shaving off a single cycle when they can, and try to avoid
locking as much as possible for that big smp scalability.
This is something better done right - people should just take the
trouble.
Helge Hafting
Helge Hafting writes:
> Peter T. Breuer wrote:
>
> > Hey, that's not bad for a small change! 50% of potential programming
> > errors sent to the dustbin without ever being encountered.
>
> Then you replace errors with inefficiency - nobody discovers that
> you needlessly take a lock twice. They notice OOPSes though, the
> lock gurus can then debug it.
>
> Trading performance for simplicity is ok in some cases, but I have a strong
> felling this isn't one of them. Consider how people optimize locking
> by shaving off a single cycle when they can, and try to avoid
> locking as much as possible for that big smp scalability.
>
> This is something better done right - people should just take the
> trouble.
There, however, are cases when recursive locking is needed. Take, for
example, top-to-bottom insertion into balanced tree with per-node
locking. Once modifications are done at the "leaf" level, parents should
be locked and modified, but one cannot tell in advance whether different
leaves have the same or different parents. Simplest (and, sometimes, the
only) solution here is to lock parents of all children in turn, even if
this may lock the same parent node several times---recursively.
>
> Helge Hafting
>
Nikita.
"A month of sundays ago Davide Libenzi wrote:"
> On Mon, 19 May 2003, Peter T. Breuer wrote:
>
> > No. This is not true. Imagine two threads, timed as follows ...
> >
> > .
> > .
> > .
> > .
> > if ((snl)->uniq == current) {
> > atomic_inc(&(snl)->count); .
> > } else { .
> > spin_lock(&(snl)->lock); .
> > atomic_inc(&(snl)->count); .
> > (snl)->uniq = current; <-> if ((snl)->uniq == current) {
> > atomic_inc(&(snl)->count);
> > } else {
> > spin_lock(&(snl)->lock);
> > atomic_inc(&(snl)->count);
> > (snl)->uniq = current;
> >
> >
> > There you are. One hits the read exactly at the time the other does a
> > write. Bang.
>
> So, what's bang for you ? The second task (the one that reads "uniq")
> will either see "uniq" as NULL or as (task1)->current. And it'll go
> acquiring the lock, as expected. Check it again ...
Perhaps I should expand on my earlier answer ...
(1) while, with some luck, writing may be atomic on ia32 (and I'm not
sure it is, I'm only prepared to guarantee it for the lower bits, and I
really don't know about zeroing the carry and so on), I actually doubt
that reading is atomic, or we wouldn't need the atomic_read
construction!
(2) I'm not prepared to bet that one either sees the answer from
before the write was done, or the answer after it is done. I would
suspect that one can get anything, including an explosion or reading
of the works of Tolstoy ...
(3) even if one gets either one or the other answer, one of them would
be the wrong answer, and you clearly intend the atomic_inc of the
counter to be done in the same atomic region as the setting to current,
which would be a programming hypothesis that is broken when the wrong
answer comes up.
Peter
"David Woodhouse wrote:"
> To be honest, if any programmer is capable of committing this error and
> not finding and fixing it for themselves, then they're also capable, and
> arguably _likely_, to introduce subtle lock ordering discrepancies which
> will cause deadlock once in a blue moon.
>
> I don't _want_ you to make life easier for this hypothetical programmer.
>
> I want them to either learn to comprehend locking _properly_, or take up
> gardening instead.
Let's quote the example from rubini & corbet of the sbull block device
driver. The request function ends like so:
spin_unlock_irq (&io_request_lock);
spin_lock(&device->lock);
/* Process all of the buffers in this (possibly clustered) request. */
do {
status = sbull_transfer(device, req);
} while (end_that_request_first(req, status, DEVICE_NAME));
spin_unlock(&device->lock);
spin_lock_irq (&io_request_lock);
end_that_request_last(req);
}
device->busy = 0;
}
Notice that he runs end_that_request_first outside the io_request_lock
and end_that_request_last under the lock. Do you know which is right?
(if any :-).
And he takes a device lock before calling the "transfer" routine.
Yes, he's ok because his transfer function is trivial and doesn't
take the lock, but anyone following his recipe is heading for
trouble.
Peter
On Mon, 2003-05-19 at 15:37, Peter T. Breuer wrote:
> "David Woodhouse wrote:"
> > To be honest, if any programmer is capable of committing this error and
> > not finding and fixing it for themselves, then they're also capable, and
> > arguably _likely_, to introduce subtle lock ordering discrepancies which
> > will cause deadlock once in a blue moon.
> >
> > I don't _want_ you to make life easier for this hypothetical programmer.
> >
> > I want them to either learn to comprehend locking _properly_, or take up
> > gardening instead.
>
> Let's quote the example from rubini & corbet of the sbull block device
> driver. The request function ends like so:
defective locking in a driver is no excuse to paper over it with
recursive shite.
On Mon, May 19 2003, Peter T. Breuer wrote:
> "David Woodhouse wrote:"
> > To be honest, if any programmer is capable of committing this error and
> > not finding and fixing it for themselves, then they're also capable, and
> > arguably _likely_, to introduce subtle lock ordering discrepancies which
> > will cause deadlock once in a blue moon.
> >
> > I don't _want_ you to make life easier for this hypothetical programmer.
> >
> > I want them to either learn to comprehend locking _properly_, or take up
> > gardening instead.
>
> Let's quote the example from rubini & corbet of the sbull block device
> driver. The request function ends like so:
>
>
> spin_unlock_irq (&io_request_lock);
> spin_lock(&device->lock);
>
> /* Process all of the buffers in this (possibly clustered) request. */
> do {
> status = sbull_transfer(device, req);
> } while (end_that_request_first(req, status, DEVICE_NAME));
> spin_unlock(&device->lock);
> spin_lock_irq (&io_request_lock);
> end_that_request_last(req);
> }
> device->busy = 0;
> }
>
>
> Notice that he runs end_that_request_first outside the io_request_lock
> and end_that_request_last under the lock. Do you know which is right?
> (if any :-).
Both are right, as it so happens.
> And he takes a device lock before calling the "transfer" routine.
> Yes, he's ok because his transfer function is trivial and doesn't
> take the lock, but anyone following his recipe is heading for
> trouble.
In 2.5, the device lock most likely would be the queue lock as well so
no confusion there. But what are you talking about? I'd assume that the
device lock must be held in the transfer function, why else would you
grab it in the first place in the function you quote? Please, if you are
going to find examples to support the recursive locking idea, find some
decent ones...
I think you are trying to make up problems that do not exist. I'd hate
to see recursive locks being used just because someone can't be bothered
to write the code correctly (recursive locks have their uses; the one you
are advocating definitely isn't one). Recursive locks are _not_ a remedy
for someone who doesn't understand locking or isn't capable enough to
design his locking correctly. Period.
--
Jens Axboe
> That's a problem looking for a solution! No, the reason for wanting a
> recursive spinlock is that nonrecursive locks make programming harder.
And more correct.
> Though I've got quite good at finding and removing deadlocks in my old
> age, there are still two popular ways that the rest of the world's
> prgrammers often shoot themselves in the foot with a spinlock:
>
> a) sleeping while holding the spinlock
> b) taking the spinlock in a subroutine while you already have it
So ... we should BUG() on both.
M.
On Mon, 19 May 2003, Peter T. Breuer wrote:
> "A month of sundays ago Davide Libenzi wrote:"
> > On Mon, 19 May 2003, Peter T. Breuer wrote:
> >
> > > No. This is not true. Imagine two threads, timed as follows ...
> > >
> > > .
> > > .
> > > .
> > > .
> > > if ((snl)->uniq == current) {
> > > atomic_inc(&(snl)->count); .
> > > } else { .
> > > spin_lock(&(snl)->lock); .
> > > atomic_inc(&(snl)->count); .
> > > (snl)->uniq = current; <-> if ((snl)->uniq == current) {
> > > atomic_inc(&(snl)->count);
> > > } else {
> > > spin_lock(&(snl)->lock);
> > > atomic_inc(&(snl)->count);
> > > (snl)->uniq = current;
> > >
> > >
> > > There you are. One hits the read exactly at the time the other does a
> > > write. Bang.
> >
> > So, what's bang for you ? The second task (the one that reads "uniq")
> > will either see "uniq" as NULL or as (task1)->current. And it'll go
> > acquiring the lock, as expected. Check it again ...
>
> Perhaps I should expand on my earlier answer ...
>
> (1) while, with some luck, writing may be atomic on ia32 (and I'm not
> sure it is, I'm only prepared to guarantee it for the lower bits, and I
> really don't know about zeroing the carry and so on), I actually doubt
> that reading is atomic, or we wouldn't need the atomic_read
> construction!
Look at atomic read :
$ emacs `find /usr/src/linux/include -name atomic.h | xargs`
> (3) even if one gets either one or the other answer, one of them would
> be the wrong answer, and you clearly intend the atomic_inc of the
> counter to be done in the same atomic region as the setting to current,
> which would be a programming hypothesis that is broken when the wrong
> answer comes up.
Atomic inc/dec/add/sub are different from plain reads/writes of an aligned
sizeof(int) memory location. An aligned sizeof(int) read/write must be
atomic for bare-bones CPU memory coherency. While inc/dec/add/sub are (or
at least can be) split into MEMOP(load)->ALUOP(?)->MEMOP(store), whose
full cycle is not guaranteed to be atomic, loads/stores of a
sizeof(int)-aligned memory location are not split on any CPU whose
designer was not drunk during the architectural phase. Or better: if they
are split due to some sort of HW limitation, the HW itself has to
guarantee the atomicity of the operation. Intel also tries to guarantee
the atomicity of non-aligned loads/stores by doing split locks on the
memory bus. If you have:
int a;
thread1:
for (;;)
a = 1;
thread2:
for (;;)
a = -1;
With "a" being an aligned memory location, a third thread reading "a"
reads either 1 or -1. Going back to the (doubtfully useful) code, you
still have to point out where it goes bang ...
- Davide
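Davide's claim above - that aligned int stores and loads don't tear - can be exercised in userspace. A sketch, not a proof: it uses C11 relaxed atomics (which on ia32 compile down to plain aligned mov instructions) and pthreads, and the iteration count is arbitrary:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Two writers flip "a" between 1 and -1; a concurrent reader must only
 * ever observe 1 or -1, never a torn mixture of the two bit patterns.
 * Relaxed ordering is used so the compiler emits plain aligned
 * loads/stores rather than locked instructions. */
static atomic_int a = 1;
static atomic_int stop = 0;

static void *writer(void *val)
{
    while (!atomic_load_explicit(&stop, memory_order_relaxed))
        atomic_store_explicit(&a, (int)(long)val, memory_order_relaxed);
    return NULL;
}

/* Returns 1 if any torn (neither 1 nor -1) value was observed. */
static int reader_saw_torn_value(void)
{
    for (int i = 0; i < 1000000; i++) {
        int v = atomic_load_explicit(&a, memory_order_relaxed);
        if (v != 1 && v != -1)
            return 1;
    }
    return 0;
}

static int run_demo(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, writer, (void *)1L);
    pthread_create(&t2, NULL, writer, (void *)-1L);
    int torn = reader_saw_torn_value();
    atomic_store(&stop, 1);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return torn;
}
```

A passing run demonstrates, but of course cannot prove, the no-tearing property for this platform.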
> > (1) while, with some luck, writing may be atomic on ia32 (and I'm not
> > sure it is, I'm only prepared to guarantee it for the lower bits, and I
> > really don't know about zeroing the carry and so on), I actually doubt
> > that reading is atomic, or we wouldn't need the atomic_read
> > construction!
>
> Look at atomic read :
>
> $ emacs `find /usr/src/linux/include -name atomic.h | xargs`
:-). Well, whaddya know. Both read and write of an int (declared
volatile) are atomic on ia32.
> > (3) even if one gets either one or the other answer, one of them would
> > be the wrong answer, and you clearly intend the atomic_inc of the
> > counter to be done in the same atomic region as the setting to current,
> > which would be a programming hypothesis that is broken when the wrong
> > answer comes up.
[snip atomicity of read/write]
> With "a" being an aligned memory location, a third thread reading "a"
> reads either 1 or -1. Going back to the (doubtfully useful) code, you
> still have to point out where it goes bang ...
OK. I'll have a look later.
Peter
On Llu, 2003-05-19 at 18:27, Peter T. Breuer wrote:
> :-). Well, whaddya know. Both read and write of an int (declared
> volatile) are atomic on ia32.
Except when they aren't. There are PPro SMP issues about ordering of
unlocked stores in some situations. That's why the unlock is generated as
lock movb $0, foo on PPro.
Fixable however by PPro-specific lock ops.
"Davide Libenzi wrote:"
> On Mon, 19 May 2003, Peter T. Breuer wrote:
> > > > .
> > > > .
> > > > .
> > > > if ((snl)->uniq == current) {
> > > > atomic_inc(&(snl)->count); .
> > > > } else { .
> > > > spin_lock(&(snl)->lock); .
> > > > atomic_inc(&(snl)->count); .
> > > > (snl)->uniq = current; <-> if ((snl)->uniq == current) {
> > > > atomic_inc(&(snl)->count);
> > > > } else {
> > > > spin_lock(&(snl)->lock);
> > > > atomic_inc(&(snl)->count);
> > > > (snl)->uniq = current;
> > > So, what's bang for you ? The second task (the one that reads "uniq")
> > > will either see "uniq" as NULL or as (task1)->current. And it'll go
> > > acquiring the lock, as expected. Check it again ...
Let's take that as hypothetically true.
> > (3) even if one gets either one or the other answer, one of them would
> > be the wrong answer, and you clearly intend the atomic_inc of the
> > counter to be done in the same atomic region as the setting to current,
> > which would be a programming hypothesis that is broken when the wrong
> > answer comes up.
> either 1 or -1. Going back to the (doubtfully useful) code, you still have
> to point out were it does bang ...
The "unexpected" situation is when the RH process reads the old value
of uniq, just before the LH process sets it. Let's suppose that it
erroneously reads NULL, since that's what was set last, at the unlock
that lets the LH spinlock open up. That's OK, because it's not equal to
its own task identifier either, and that's the only special thing it's
looking for.
Umm ... actually, the RH process could read uniq before the LH process
entered the spinlock. If it were at that time equal to its own task
identifier, then chaos would ensue. But if it were so equal, then
the RH process would have the spinlock, since it's taken before setting
uniq. OK, I agree, there's no problem if the read doesn't blow up.
Peter
Actually, it is safe to have the concurrent/race condition for the read of a
lock owner (outside the lock) and the write of a lock owner (inside the
lock) as long as the "current owner" value stored inside the lock is "sent
away" before the recursive lock is unlocked for the last time.
The owner value doesn't even have to be atomic (though it ought to be).
The only "hard" requirement is that there is no mid-update value that can
ever be somebody else. This requirement can be violated in some odd
segmenting systems if "me" is a pointer. If "me" is a machine-primitive
scalar, the probability drops to nearly nil that a partially written value
can match anybody else (on an x86 platform, you toss a bus-lock around the
assignment, which is what, I believe, the atomic type assignments do).
Since the logic reads "if not me then wait for lock else increment depth"
you only have to guard against false positives. That guard case is
accomplished by making sure you don't ever leave "yourself" in the owner
slot when you free up the lock.
So generically:
get_lock():
    if owner == me
    then
        ++depth;
    else
        wait on actual lock primitive
        possibly_atomic_assign(owner = me)
        depth = 1
    fi

free_lock():
    /* optional debug test */
    if owner != me
    then
        oops()
    fi
    if depth > 1
    then
        --depth
    else
        depth = 0
        possibly_atomic_assign(owner = nobody)
        release actual lock primitive
    fi
There needs must be a "nobody" (a well defined quasi-legal value that can
not occur in the legitimate set of owners, like -1 or NULL) assigned to the
owner item so that you cannot encounter your own id in the owner slot when
you don't actually have the lock primitive.
At several glances this structure seems odd and somehow vulnerable, but
when you consider that a thread is never actually competing with itself,
you get closure on the system. That is, once you really internalize the
fact that the same thread cannot be multiply entrant in both of these
routines at the same time, you see that all other threads will do a proper
wait on the "real" primitive lock because of conditional exclusion (they
are never the owner at the time of the test, no matter how you update
owner in the sequence, if they are, in fact, not the owner of the lock).
Hitting the read and write at the same time is not a "bang" because (if the
rest of the code is correct) the value being read, even if it is "trash"
because of a partial update, will not let the reader into the "is me" part
of the logic and the not-me branch is a wait for the "real" spinlock.
Again, you might want to use atomic_read() and atomic_set(); then again,
according to the headers you are only guaranteed 24 bits of atomic
protection anyway, so if the owner is a pointer, it probably won't help that
much.
The finer points include the fact that the depth and owner values are
actually protected values inside the domain of the real lock primitive.
They are just as safe as any other value protected by the lock. The
inequality test "outside" the lock is actually a microcosm of the
readers/writer kind of behavior. "Atomicizing" the owner value armors
against the possibility of CPU reading a half-written value that "happens"
to match the enquiring thread ID. But since the only transition that can
occur is from some existent owner to "nobody", or from "nobody" to the new
owner, a well chosen "nobody" can make the atomic assignment moot anyway.
I use this recursion layer all the time with posix mutex(es) in application
layer code where I don't actually have an atomic_t to begin with.
It is actually provable.
Rob.
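Rob's recursion layer over a POSIX mutex (which he says he uses in application code) might look like the sketch below. The owner encoding is an assumption: pthread_self() cast to uintptr_t, with 0 playing the role of "nobody" - common on mainstream platforms, but not guaranteed by POSIX:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Rob's recursion layer over a plain (non-recursive) pthread mutex.
 * owner is read outside the lock, which is safe per the argument above:
 * a stale value can never equal *our* id unless we really hold the lock. */
typedef struct {
    pthread_mutex_t lock;
    atomic_uintptr_t owner;     /* 0 plays the role of "nobody" */
    int depth;                  /* protected by lock */
} rec_lock_t;

#define REC_LOCK_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

static uintptr_t me(void)
{
    /* Assumption: pthread_self() fits in a uintptr_t and is never 0. */
    return (uintptr_t)pthread_self();
}

static void rec_lock(rec_lock_t *l)
{
    if (atomic_load(&l->owner) == me()) {
        ++l->depth;                     /* already ours: just recurse */
    } else {
        pthread_mutex_lock(&l->lock);   /* wait on the real primitive */
        atomic_store(&l->owner, me());
        l->depth = 1;
    }
}

static void rec_unlock(rec_lock_t *l)
{
    assert(atomic_load(&l->owner) == me()); /* optional debug test */
    if (l->depth > 1) {
        --l->depth;
    } else {
        l->depth = 0;
        atomic_store(&l->owner, 0);     /* never leave "me" behind */
        pthread_mutex_unlock(&l->lock);
    }
}
```

Clearing owner before the final unlock is the "send the owner value away" step: no thread can ever find its own id in the slot without actually holding the mutex.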
-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of Peter T. Breuer
Sent: Sunday, May 18, 2003 4:15 PM
To: Davide Libenzi
Cc: [email protected]
Subject: Re: recursive spinlocks. Shoot.
In article <[email protected]> you wrote:
>>
>> > #define nestlock_lock(snl) \
>> > do { \
>> > if ((snl)->uniq == current) { \
>>
>> That would be able to read uniq while it is being written by something
>> else (which it can, according to the code below). It needs protection.
> No it does not, look better.
I'm afraid I only see that it does!
>> > atomic_inc(&(snl)->count); \
>> > } else { \
>> > spin_lock(&(snl)->lock); \
>> > atomic_inc(&(snl)->count); \
>> > (snl)->uniq = current; \
>>
>> Hmm .. else we wait for the lock, and then set count and uniq, while
>> somebody else may have entered and be reading it :-). You exit with
> Nope, think about a case where it breaks. False negatives are not possible
> because it is set by the same task, and neither are false positives.
No. This is not true. Imagine two threads, timed as follows ...
.
.
.
.
if ((snl)->uniq == current) {
atomic_inc(&(snl)->count); .
} else { .
spin_lock(&(snl)->lock); .
atomic_inc(&(snl)->count); .
(snl)->uniq = current; <-> if ((snl)->uniq == current) {
atomic_inc(&(snl)->count);
} else {
spin_lock(&(snl)->lock);
atomic_inc(&(snl)->count);
(snl)->uniq = current;
There you are. One hits the read exactly at the time the other does a
write. Bang.
>> Well, it's not assembler either, but at least it's easily comparable
>> with the nonrecursive version. It's essentially got an extra if and
>> an inc in the lock. That's all.
> Well, there's a little difference. In case of contention, you loop with
> your custom try lock while I used the optimized asm code inside spin_lock.
> But again, I believe we didn't lose anything with the removal of this code
> (nested/recursive locks) from the kernel.
We lose hours of programmers' time, looking for deadlocks caused by
accidentally taking the same spinlock twice and not knowing it.
A question in my mind is whether a fault in a third thread, like
sleeping with a spinlock held, can make a recursive spinlock into
a fault causer ... no, I don't see any way.
Peter
On Sun, May 18, 2003 at 09:09:54PM +0200, Peter T. Breuer wrote:
> In article <[email protected]> you wrote:
> > On Sun, 18 May 2003, Peter T. Breuer wrote:
> >> Here's a before-breakfast implementation of a recursive spinlock. That
>
> > A looong time ago I gave to someone a recursive spinlock implementation
> > that they integrated in the USB code. I don't see it in the latest
> > kernels, so I have to guess that they found a better solution to do their
> > things. I'm biased to say that it must not be necessary to have the thing
> > if you structure your code correctly.
>
> Well, you can get rid of anything that way. The question is if the
> interface is an appropriate one to use or not - i.e. if it makes for
> better code in general, or if it makes errors of programming less
> likely.
I dare to disagree. It makes for more messy code in general and might
result in the obvious bugs being replaced by subtle ones that are far
harder to debug.
> I would argue that the latter is undoubtedly true - merely that
> userspace flock/fcntl works that way would argue for it, but there
> are a couple of other reasons too.
No ;-) Only fcntl does.
> Going against is the point that it may be slower. Can you dig out your
> implementation and show me it? I wasn't going for assembler in my hasty
It's also a waste of memory. There are many structures that have a lock
per instance, and four extra bytes (for the owner) would be noticeable.
Not that memory is such a precious resource, but cachelines may be.
> example. I just wanted to establish that it's easy, so that it becomes
> known that it's easy, and folks therefore aren't afraid of it. That both
> you and I have had to write it implies that it's not obvious code to
> everyone.
It's not about whether it's easy. It's about whether it's useful.
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <[email protected]>
>> Let's quote the example from rubini & corbet of the sbull block device
>> driver. The request function ends like so:
>
> defective locking in a driver is no excuse to paper over it with
> recursive shite.
Arjan is a little too harsh here, but on the principle I happen
to agree, because I worked with systems which allow recursive locks.
They very often cover up for a programmer's lack of basic understanding.
Worse, sometimes even experienced programmers can do poorly.
I ran into the latter category of code when fixing so-called
"presto" in Solaris (now replaced by Encore-originated code).
Normal spinlocks are not without problems, in particular people
tend to write:
void urb_rm_priv_locked(struct urb *) {
......
}
void urb_rm_priv(struct urb *u) {
spin_lock_irqsave();
urb_rm_priv_locked(u);
spin_unlock_irqrestore();
}
Which eats a stack frame. We make this tradeoff on purpose,
as a lesser evil.
BTW, I do not see Linus and his lieutenants rebuking the onslaught
of recursive ingenuity in this thread. Ignoring the hogwash,
or waiting and watching?
-- Pete
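The `_locked` convention Pete describes can be sketched in userspace terms (hypothetical names; this is the shape of the idiom, not real USB code): the `_locked` variant assumes the caller holds the lock, and the public entry point is a thin locking wrapper - which is exactly what costs the extra stack frame:

```c
#include <pthread.h>

/* Toy resource standing in for the urb private data. */
static pthread_mutex_t urb_lock = PTHREAD_MUTEX_INITIALIZER;
static int priv_count = 1;

/* Caller must hold urb_lock. */
static void urb_rm_priv_locked(void)
{
    priv_count--;               /* the real work happens here */
}

/* Public entry point: take the lock, delegate, release. */
static void urb_rm_priv(void)
{
    pthread_mutex_lock(&urb_lock);
    urb_rm_priv_locked();
    pthread_mutex_unlock(&urb_lock);
}
```

Code already holding the lock calls the `_locked` variant directly; everything else calls the wrapper, and no recursive lock is needed.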
On Mon, May 19, 2003, Pete Zaitcev <[email protected]> wrote:
> >> Let's quote the example from rubini & corbet of the sbull block device
> >> driver. The request function ends like so:
> >
> > defective locking in a driver is no excuse to pamper over it with
> > recusrive shite.
>
> Arjan is a little too harsh here, but on the principle I happen
> to agree, because I worked with systems which allow recursive locks.
> They very often cover up for programmer's lack of basic understanding.
> Worse, sometimes even experienced programmers can do poorly.
> I ran into the latter category of code when fixing so-called
> "presto" in Solaris (now replaced by Encore-originated code).
>
> Normal spinlocks are not without problems, in particular people
> tend to write:
>
> void urb_rm_priv_locked(struct urb *) {
> ......
> }
> void urb_rm_priv(struct urb *u) {
> spin_lock_irqsave();
> urb_rm_priv_locked(u);
> spin_unlock_irqrestore();
> }
>
> Which eats a stack frame. We make this tradeoff on purpose,
> as a lesser evil.
>
> BTW, I do not see Linus and his lieutenants rebuking the onslaught
> of recursive ingenuity in this thread. Ignoring the hogwash,
> or waiting and watching?
If past experience is any example, I don't think Linus is completely
against recursive spinlocks.
The uhci driver used a simple implementation at one point in time
because of a tricky locking situation. We eventually discovered a non
recursive method of handling it and ditched the code.
Linus actually helped with the code a little bit.
That being said, I'm happy we found an alternative solution and ditched
the recursive spinlock code. I agree with much of your sentiments about
them as well.
JE
On Mon, May 19, 2003 at 07:54:41PM -0400, Pete Zaitcev wrote:
> BTW, I do not see Linus and his lieutenants rebuking the onslaught
> of recursive ingenuity in this thread. Ignoring the hogwash,
> or waiting and watching?
Watching people flog the darkened spot where a dead horse used to be
5 years ago. Amusing sight, that...
Er..
The gnome wields a morality stick, the morality stick wields itself to the
gnome's hand...
---more---
The gnome hits you... you feel better...
Your "moral position"...
(quote) I want them to either learn to comprehend locking _properly_, or
take up gardening instead. (unquote)
...is critically flawed.
In point of fact, "proper" locking, when combined with "proper" definitions
of an interface, dictates that recursive locking is "better". Demanding that
a call_EE_ know what locks a call_ER_ (and all antecedents of the caller) will
have taken is not exactly good design.
Oh, don't get me wrong, I appreciate that in intimately interlocking code
you have to do some of this, but remember that "a lock" is "morally" a
declaration of need at the module interface level of abstraction.
(for instance) The current kernel design requires that (the editorial) we
need to know what locks are held when our file system hooks are called.
Statements about how some particular lock (let's call him "bob") will be
already held by the kernel when routine X is called are "bad" because they
lead to all sorts of problems when that statement becomes false later.
In the most "morally erect" interface design, the implementer of a module
would never have to worry about locking anything, or at least about the
current lock-state of anything. In the morally perfect universe, the
declarative statement "lock that thing there" would be a smart operator that
would know what all locks, in what dependency orders, needed to be acquired
to "lock that thing there" and the system could execute that pathway of
locking automagically.
Of course, in such a world, the domestic swine would be useful as a bulk
passenger carrier because of the easy way it traverses the sky...
In the current design "we" demand that each designer learn and know what is
locked and what is not. "We" have documents about how in version X the duty
to lock thing Y has been shifted from outside routine Z to inside.
Recursive locks (and a well defined resource allocation order, but who am I
kidding) would actually be a benefit to all on both sides of most _info and
_op structures, they would insulate design changes and improve portability.
A design where I lock things for only as long as I need them is optimal for
that contention.
If my caller does the same, regardless of my actions then his code is
optimal for those contentions too.
If neither is concerned with the other's needs then the lock intervals are
demand based and will tend to allow the bodies of code to be improved on a
"purely local" basis.
To achieve that, you need recursive locking.
I don't think for an instant that we are going to get recursive locking, but
it does make the specification of interfaces better.
Its only real downside is that it lets the lazy and the sloppy get away
with too much before getting burned. Recursive locking plus a "lock
everything I *might* touch because my callees won't care" mentality leads to
overlocking and hard-to-find deadlocks. Everything looks and works fine for
a while until everybody, willing to pre-reserve facilities for all their
callees, starts building pool-straddling devices (USB Hard Drives, Network
Extensible File Systems, etc) and only finds their deadlocks late in the
game when things get tangled.
So we won't get recursive locking, because of the 02% of people who can't
be trusted with it.
But that doesn't make the recursive locking facility anything less than
superior for defining and implementing superior interfaces.
Rob.
-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of David Woodhouse
Sent: Sunday, May 18, 2003 3:35 PM
To: [email protected]
Cc: William Lee Irwin III; Martin J. Bligh; [email protected]
Subject: Re: recursive spinlocks. Shoot.
On Sun, 2003-05-18 at 18:24, Peter T. Breuer wrote:
> The second method is used by programmers who aren't aware that some
> obscure subroutine takes a spinlock, and who recklessly take a lock
> before calling a subroutine (the very thought sends shivers down my
> spine ...). A popular scenario involves not /knowing/ that your routine
> is called by the kernel with some obscure lock already held, and then
> calling a subroutine that calls the same obscure lock. The request
> function is one example, but that's hardly obscure (and in 2.5 the
> situation has eased there!).
To be honest, if any programmer is capable of committing this error and
not finding and fixing it for themselves, then they're also capable, and
arguably _likely_, to introduce subtle lock ordering discrepancies which
will cause deadlock once in a blue moon.
I don't _want_ you to make life easier for this hypothetical programmer.
I want them to either learn to comprehend locking _properly_, or take up
gardening instead.
--
dwmw2
Robert White wrote:
> Er..
>
> The gnome wields a morality stick, the morality stick wields itself to the
> gnome's hand...
> ---more---
> The gnome hits you... you feel better...
>
>
> Your "moral position"...
>
> (quote) I want them to either learn to comprehend locking _properly_, or
> take up gardening instead. (unquote)
>
> ...is critically flawed.
>
> In point of fact, "proper" locking, when combined with "proper" definitions
> of an interface, dictates that recursive locking is "better". Demanding that
> a call_EE_ know what locks a call_ER_ (and all antecedents of caller) will
> have taken is not exactly good design.
That depends on how big the total system is. You can break things
down into independent modules and submodules that don't know each other, but
at some point people need to know a whole module to do it properly.
> Oh, don't get me wrong, I appreciate that in intimately interlocking code
> you have to do some of this, but remember that "a lock" is "morally" a
> declaration of need at the module interface level of abstraction.
>
> (for instance) The current kernel design requires that (the editorial) we
> need to know what locks are held when our file system hooks are called.
> Statements about how some particular lock (let's call him "bob") will be
> already held by the kernel when routine X is called are "bad" because they
> lead to all sorts of problems when that statement becomes false later.
>
> In the most "morally erect" interface design, the implementer of a module
> would never have to worry about locking anything, or at least about the
> current lock-state of anything.
How can you say that? Sure, an ideal module might not need to
worry about locks used by "others", but it might need a lock
of its own - one that didn't exist prior to the module.
Surely the implementor must know about this lock and
when to use it. The module's interface
may, after all, be called simultaneously on quite a few
cpus.
> In the morally perfect universe, the
> declarative statement "lock that thing there" would be a smart operator that
> would know what all locks, in what dependency orders, needed to be acquired
> to "lock that thing there" and the system could execute that pathway of
> locking automagically.
>
Such a smart operator is an interesting exercise, but can it be made
about as lightweight as the current locks? It is otherwise bad, because
understanding the kernel locking isn't black magic - only work.
> Of course, in such a world, the domestic swine would be useful as a bulk
> passenger carrier because of the easy way it traverses the sky...
:-)
> In the current design "we" demand that each designer learn and know what is
> locked and what is not. "We" have documents about how in version X the duty
> to lock thing Y has been shifted from outside routine Z to inside.
>
> Recursive locks (and a well defined resource allocation order, but who am I
> kidding) would actually be a benefit to all on both sides of most _info and
> _op structures, they would insulate design changes and improve portability.
>
> A design where I lock things for only as long as I need them is optimal for
> that contention.
>
> If my caller does the same, regardless of my actions then his code is
> optimal for those contentions too.
>
> If neither is concerned with the other's needs then the lock intervals are
> demand based and will tend to allow the bodies of code to be improved on a
> "purely local" basis.
>
> To achieve that, you need recursive locking.
Or a wrapper function that takes the lock and
calls the function that assumes the lock is taken.
You need to know which to use - but that isn't so
incredibly hard, and you avoid the overhead of
taking a lock twice (bus-locked memory access is slow,
conditional jumps ...).
> I don't think for an instant that we are going to get recursive locking, but
> it does make the specification of interfaces better.
>
> Its only real downside is that it lets the lazy and the sloppy get away
> with too much before getting burned. Recursive locking plus a "lock
> everything I *might* touch because my callees won't care" mentality leads to
> overlocking and hard-to-find deadlocks. Everything looks and works fine for
> a while until everybody, willing to pre-reserve facilities for all their
> callees, starts building pool-straddling devices (USB Hard Drives, Network
> Extensible File Systems, etc) and only finds their deadlocks late in the
> game when things get tangled.
>
> So we won't get recursive locking, because of the 02% of people who
> can't be trusted with it.
>
> But that doesn't make the recursive locking facility anything less than
> superior for defining and implementing superior interfaces.
Too much abstraction isn't superior.
Helge Hafting
On Tue, 20 May 2003, Helge Hafting wrote:
> Robert White wrote:
> > Er..
> >
> > The gnome wields a morality stick, the morality stick wields itself to the
> > gnome's hand...
> > ---more---
> > The gnome hits you... you feel better...
> >
> >
> > Your "moral position"...
> >
> > (quote) I want them to either learn to comprehend locking _properly_, or
> > take up gardening instead. (unquote)
> >
> > ...is critically flawed.
> >
> > In point of fact, "proper" locking, when combined with "proper" definitions
> > of an interface, dictates that recursive locking is "better". Demanding that
> > a call_EE_ know what locks a call_ER_ (and all antecedents of caller) will
> > have taken is not exactly good design.
>
> That depends on how big the total system is. You can break things
> down into independent modules and submodules that don't know each other, but
> at some point people need to know a whole module to do it properly.
>
[SNIPPED...]
Recursive locking is a misnomer. It does during run-time that which
should have been done during design-time. In fact, there cannot
be any recursion associated with locking. A locking mechanism that
allows reentry or recursion is defective, both in design, and
implementation.
The nature of a lock is required to be such that if the locked object
is in use, no access to that object is allowed. Recursive locking
implies that if the lock is in use by the same thread that locked
it before, then access to that object is allowed. In other words,
if the coder (as opposed to designer) screwed up, the locking
mechanism will allow it. If this is the way students are being
taught to write code at the present time, all software will
be moved off-shore in the not too distant future. There is
absolutely no possible justification for such garbage. Just
because some idiot wrote an article and got it published,
doesn't mean this diatribe has any value at all.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
-----Original Message-----
From: Richard B. Johnson [mailto:[email protected]]
Sent: Tuesday, May 20, 2003 5:23 AM
To: Helge Hafting
> Recursive locking is a misnomer. It does during run-time that which
> should have been done during design-time. In fact, there cannot
> be any recursion associated with locking. A locking mechanism that
> allows reentry or recursion is defective, both in design, and
> implementation.
Amusing... but false...
A lock serves, and is defined by, exactly _ONE_ trait. A lock asserts and
guarantees exclusive access to a domain (group of data or resources etc).
There is nothing inherently "one deep" about that assertion.
(Philosophically stated:) If I say "I own this car for the next hour" and
ten minutes later say "I am taking the car to the store for twenty minutes"
there is no "violation of truth" to the two assertions. The outer hour and
the inner twenty minutes are in no form of conflict. Further, since there
promise of the first assertion is _*NOT*_ that the car will be free for use
after an hour, if the twenty minutes of the second assertion begins at the
58 minute mark of the sort-of-enclosing hour, thus extending the hour, you
are still ok.
More mathematically, if I write an operation "X" that, say, needs exclusive
access to the dircache for some portion of its execution, for correctness I
should lock that dircache. Say I write a second operation "Y" that also
needs the dircache, and locks it appropriately. If someone wants/needs to
create an operation "Alpha" that contains a number of sub-operations
including X and Y, but needs to ensure the consistency of the dircache for a
range (of time/operations) larger than X, Y and any number of other
operations "n", what is to be done?
With your artificial definition of locking, the implementer of Alpha must do
one of the following:
1) Separately reimplement (copy) X and Y without locking, so that the lock
may be held by Alpha.
2) Restructure X and Y into X', X, Y', and Y such that all public uses of X
and Y remain unchanged, but X and Y are now locking wrappers around X' and
Y' so that X' and Y' may be used within Alpha.
3) (Multiply) Move the locks out of X and Y into all instances of
invocations of X and Y, so that Alpha has equal and unimpeded access to X
and Y that is "identical" to every (now revised) use of X and Y.
*ALL* of the above alternatives are wrong. At no time should a stable
operation "O" need to be recoded or restructured to be rationally used in an
enclosing operation Alpha.
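For reference, alternative (2) - the restructuring rejected here, but the convention the kernel actually uses - looks like this in userspace terms (a sketch; X, Y, alpha and the dircache stand-in are invented names):

```c
#include <pthread.h>

/* Toy stand-in for the dircache and its lock. */
static pthread_mutex_t dircache_lock = PTHREAD_MUTEX_INITIALIZER;
static int dircache_ops = 0;

/* X' and Y': assume dircache_lock is already held by the caller. */
static void x_locked(void) { dircache_ops++; }
static void y_locked(void) { dircache_ops++; }

/* Public X and Y keep their old contract: they lock for themselves. */
static void x(void)
{
    pthread_mutex_lock(&dircache_lock);
    x_locked();
    pthread_mutex_unlock(&dircache_lock);
}

static void y(void)
{
    pthread_mutex_lock(&dircache_lock);
    y_locked();
    pthread_mutex_unlock(&dircache_lock);
}

/* Alpha holds the lock across both sub-operations without any
 * recursive-lock support, by calling the inner variants. */
static void alpha(void)
{
    pthread_mutex_lock(&dircache_lock);
    x_locked();
    y_locked();
    pthread_mutex_unlock(&dircache_lock);
}
```

The dispute in this thread is precisely whether this split is reasonable interface hygiene or the "pathological restructuring" the next paragraph describes.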
In point of fact, if the lock used in X, Y, and Alpha are, by default,
recursive, Alpha can be coded as needed without having to revisit any of the
operations that Alpha invokes. The implementer of Alpha "probably ought to"
know what X and Y and all instances of "n" do, and examine the need to
pre-claim locks to prevent internal deadlock, as that is more expedient
than making
sure that X, Y and "n" all have proper anti-deadlock back-offs.
There is no rational argument against recursive locking that can justify any
system where the creation of an outer operation should view the pathological
restructuring of existent component operations as "the right thing to do".
If you think this doesn't happen, I point you to the documentation on the
VFS and the various notations about locks having moved in and out of
operations.
The simple truth is that your statement:
> The nature of a lock is required to be such that if the locked object
> is in use, no access to that object is allowed.
is PURE FANTASY and logically junct. Correctly stated:
The nature of a lock is required to be such that, if the locked object is in
use, no COMPETING OR OUT OF THREAD/BAND access to that object is allowed.
A recursive lock would "protect" accesses that are IN BAND and thus
completely allowed. Period.
> Recursive locking
> implies that if the lock is in use by the same thread that locked
> it before, then access to that object is allowed.
The statement above is logically useless to your argument, that is, if the
words "recursive locking" were replaced with the word "this" or indeed "any
lock", the statement remains tautological. "(This/Any lock) implies that if
the lock is in use by the same thread that locked it (before/previously)
then access to that object is allowed." See how nicely true that is?
> In other words,
> if the coder (as opposed to designer) screwed up, the locking
> mechanism will allow it. If this is the way students are being
> taught to write code at the present time, all software will
> be moved off-shore in the not too distant future. There is
> absolutely no possible justification for such garbage. Just
> because some idiot wrote an article and got it published,
> doesn't mean this diatribe has any value at all.
Your assertions do nothing to address how the coder of "X" (the inner lock)
has "screwed up" by correctly coding X with the necessary lock.
Your assertions do nothing to address how the coder of "Alpha" can NOT
"screw up" if Alpha requires exclusive access to the facility lock by "X"
*AND* needs to invoke "X".
Your stance is naive and prone to induce errors in the name of an
unreasonable and logically fallacious notion of purity.
Rob.
On Tue, 20 May 2003, Robert White wrote:
>
>
> -----Original Message-----
> From: Richard B. Johnson [mailto:[email protected]]
> Sent: Tuesday, May 20, 2003 5:23 AM
> To: Helge Hafting
>
> > Recursive locking is a misnomer. It does during run-time that which
> > should have been done during design-time. In fact, there cannot
> > be any recursion associated with locking. A locking mechanism that
> > allows reentry or recursion is defective, both in design, and
> > implementation.
>
> Amusing... but false...
>
> A lock serves, and is defined by, exactly _ONE_ trait. A lock asserts and
> guarantees exclusive access to a domain (group of data or resources etc).
>
The "ONE" trait is incorrect.
The lock must guarantee that recursion is not allowed. Or to put
it more bluntly, the lock must result in a single thread of execution
through the locked object.
> There is nothing inherently "one deep" about that assertion.
> (Philosophically stated:) If I say "I own this car for the next hour" and
> ten minutes later say "I am taking the car to the store for twenty minutes"
> there is no "violation of truth" to the two assertions. The outer hour and
> the inner twenty minutes are in no form of conflict. Further, since the
> promise of the first assertion is _*NOT*_ that the car will be free for use
> after an hour, if the twenty minutes of the second assertion begins at the
> 58 minute mark of the sort-of-enclosing hour, thus extending the hour, you
> are still ok.
>
Yes there is, in your example above after being granted access to
the "car", you start its trip to the store, but in the process you
decide to get another "permission" to use the car. The second permission
is invalid because somebody already has been granted exclusive use
of the car. If you don't already know that, then you will spend the
weekend in jail for a DUI.
> More mathematically, if I write an operation "X" that, say, needs exclusive
> access to the dircache for some portion of its execution, for correctness I
> should lock that dircache. Say I write a second operation "Y" that also
> needs the dircache, and locks it appropriately. If someone wants/needs to
> create an operation "Alpha" that contains a number of sub operations
> including X and Y, but needs to ensure the consistency of the dircache for a
> range (of time/operations) larger than X, Y, and any number of operations
> "n", what is to be done?
>
> With your artificial definition of locking, the implementer of Alpha must do
> one of the following:
>
> 1) Separately reimplement (copy) X and Y without locking, so that the lock
> may be held by Alpha.
>
> 2) Restructure X and Y into X', X, Y', and Y such that all public uses of X
> and Y remain unchanged, but X and Y are now locking wrappers around X' and
> Y' so that X' and Y' may be used within Alpha.
>
> 3) (Multiply) Move the locks out of X and Y into all instances of
> invocations of X and Y so that Alpha has equal and unimpeded access to X and
> Y that is "identical" to every (now revised) use of X and Y.
>
> *ALL* of the above alternatives are wrong. At no time should a stable
> operation "O" need to be recoded or restructured to be rationally used in an
> enclosing operation Alpha.
>
You are not making any sense at all. If X needs to see the identical
results of Y, while the cache is locked, then only one lock need exist
and that lock may provide one of the following:
(1) Either access to data before modification, i.e., before the
lock or..
(2) Access to data after modification, i.e., after the lock...
Both for X, Y, respectively, for all of N.
> In point of fact, if the lock used in X, Y, and Alpha are, by default,
> recursive, Alpha can be coded as needed without having to revisit any of the
> operations that Alpha invokes. The implementer of Alpha "probably ought to"
> know what X and Y and all instances of "n" do, and examine the need to
> pre-claim locks to prevent internal deadlock, as that is more expedient than
> making sure that X, Y and "n" all have proper anti-deadlock back-offs.
>
It is not fact so there is no point to the argument. If a lock cannot
retain the state of a locked object, and prevent it from being modified
out-of-order, then the lock isn't a lock at all. It may be an invention
that may have some use (somewhere), but it is not a lock.
> There is no rational argument against recursive locking that can justify any
> system where the creation of an outer operation should view the pathological
> restructuring of existent component operations as "the right thing to do".
>
> If you think this doesn't happen, I point you to the documentation on the
> VFS and the various notes about locks having moved in and out of
> operations.
>
> The simple truth is that your statement:
>
> > The nature of a lock is required to be such that if the locked object
> > is in use, no access to that object is allowed.
>
> is PURE FANTASY and logically unsound. Correctly stated:
>
No. It is correct.
> The nature of a lock is required to be such that, if the locked object is in
> use, no COMPETING OR OUT OF THREAD/BAND access to that object is allowed.
>
You just "invented" something that is not a lock. It is some kind
of procedure that keeps records of access and allows or disallows
access to an object based upon some knowledge of the locked resource
and policy.
> A recursive lock would "protect" accesses that are IN BAND and thus
> completely allowed. Period.
>
This policy will usually be unknown at compile-time, and would
require some kind of knowledge base at run-time. For instance,
one could not just allow the same PID access to the same object
because, even with threads, there will be a different PID for
every attempt at access unless there is a coding error that lets
the locker of a shared resource lock the same resource again --
and this is a coding error for which there can be no justification
whatsoever.
> > Recursive locking
> > implies that if the lock is in use by the same thread that locked
> > it before, then access to that object is allowed.
>
> The statement above is logically useless to your argument, that is, if the
> words "recursive locking" were replaced with the word "this" or indeed "any
> lock", the statement remains tautological. "(This/Any lock) implies that if
> the lock is in use by the same thread that locked it (before/previously)
> then access to that object is allowed." See how nicely true that is?
>
It is obvious that you have some kind of agenda that attempts
to justify some poor coding methods with some illogical logic.
> > In other words,
> > if the coder (as opposed to designer) screwed up, the locking
> > mechanism will allow it. If this is the way students are being
> > taught to write code at the present time, all software will
> > be moved off-shore in the not too distant future. There is
> > absolutely no possible justification for such garbage. Just
> > because some idiot wrote an article and got it published,
> > doesn't mean this diatribe has any value at all.
>
> Your assertions do nothing to address how the coder of "X" (the inner lock)
> has "screwed up" by correctly coding X with the necessary lock.
>
It was already locked. An attempt of the resource-holder to lock
the resource again is an error, pure and simple.
> Your assertions do nothing to address how the coder of "Alpha" can NOT
> "screw up" if Alpha requires exclusive access to the facility lock by "X"
> *AND* needs to invoke "X".
>
> Your stance is naive and prone to induce errors in the name of an
> unreasonable and logically fallacious notion of purity.
>
The stance is not unreasonable. It also corresponds to what the
QC folks call "Good Standards of Engineering Practice", etc.
Properly writing code so that, at compile-time, resources are known
to be locked and unlocked in the correct order goes a long way
towards producing the correct results.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
-----Original Message-----
From: Richard B. Johnson [mailto:[email protected]]
> The lock must guarantee that recursion is not allowed. Or to put
> it more bluntly, the lock must result in a single thread of execution
> through the locked object.
Where the HECK do you get that load of bull? The one requirement of a
MUTUAL EXCLUSION PRIMITIVE (a.k.a. mutex, a.k.a. lock) is *MUTUAL*
EXCLUSIVITY.
Nothing else. Let's look at the words:
Mutual - "directed by *each* toward the *other* or *others*" (e.g. not self,
duh) {all other definitions talk about group insurance, which applies too
8-)}
Exclusion -> exclude -> "To prevent or restrict entrance" (etc.) "to bar
from participation"
So, a mutex erects a "bar to/from participation" "directed by each (holder)
to other (would be holder(s))".
There is no concept of "Self Exclusion" in a lock (mutex et. al.) so
recursion, by definition, is (or should be) permitted.
All else is unfounded blither.
The fact is, that it is easier to write locks that will self dead-lock and
lazy people, acting in the name of expediency, decided that somehow, such
locks were "better" because they didn't want to expend the effort to make
them correct. Still others then try to stand on lazy precedent as if it is
somehow canon.
The only place/way you can argue this is if the constituent operations "X"
within a larger body of code Alpha are not considered part of Alpha (re, the
previous Alpha is composed of X and others example). But that is like
yelling "I locked it, so my arm, which is not all of me, should not be
allowed to use it because my arm is not me..."
From the standpoint of purely logical analysis, this is a little esoteric...
and obviously specious tripe.
Rob.
Richard B. Johnson writes:
> On Tue, 20 May 2003, Robert White wrote:
>
> >
> >
> > -----Original Message-----
> > From: Richard B. Johnson [mailto:[email protected]]
> > Sent: Tuesday, May 20, 2003 5:23 AM
> > To: Helge Hafting
> >
> > > Recursive locking is a misnomer. It does during run-time that which
> > > should have been done during design-time. In fact, there cannot
> > > be any recursion associated with locking. A locking mechanism that
> > > allows reentry or recursion is defective, both in design, and
> > > implementation.
> >
> > Amusing... but false...
> >
> > A lock serves, and is defined by, exactly _ONE_ trait. A lock asserts and
> > guarantees exclusive access to a domain (group of data or resources etc).
> >
>
> The "ONE" trait is incorrect.
>
> The lock must guarantee that recursion is not allowed. Or to put
> it more bluntly, the lock must result in a single thread of execution
> through the locked object.
>
Suppose you have a set of objects X, each containing a pointer to an object
from set Y, each y in Y being protected by a lock. To update the pointer in
x from y to y', one has to take the lock on y, and then the global lock G, in
this order. To read the pointer, it is enough to take G. Several x's may
point to the same y.
How to lock all y's pointed to by elements of X?
If you have recursive locking one may just:
1. take each x from X in turn
2. lock G
3. take y pointed to by x
4. unlock G
5. lock y
6. lock G
7. re-check that y is still pointed to by x, and if not unlock y and go to 3
8. unlock G
Without recursive locking what would one do, but emulate it?
Nikita.
I'll put this response in one final email.
First, I am not impressed with the attempt to use semantics
to justify some poor design. There are many words, especially
in English, that do not mean what they seem to say. Examples
are terminate (kill), impart force (make war), device (nuclear bomb),
etc.
A lock is supposed to allow one to obtain exclusive use of
a resource. It is a resource for which one obtains a lock, not
some lines of code. Some lines of code may constitute a resource,
but seldom do. There is generally an underlying device that
is involved. Since this device may eventually be shared by many,
it is essential that only one user be allowed to use this
device at a time so the operations upon this device are atomic.
Atomic is another word that does not mean what it seems to mean.
In the context of software operations, atomic means that an
operation must complete as intended or have no effect at all.
It is naive (defective) code that makes persons think they need
recursive locking of resources. It also forces code execution at
runtime that checks whether or not a lock has been obtained, by
the very execution thread that obtained the lock in the first place.
This means that whoever wrote the code didn't bother to check
whether or not they had already written some previous lines of
code. It's left up to the CPU to figure this out during runtime.
Here is an example:
procedure()
{
lock_resource();
read_directory();
unlock_resource();
}
Wonderful. Unfortunately read_directory() can be called by others
who don't have a lock on the resource. So the naive coder wants
to have a procedure like this for read_directory():
read_directory()
{
lock_resource();
do_stuff();
unlock_resource();
}
The idea is that lock_resource() should succeed if the caller
already has obtained the lock. This is the 'technique' of
so-called recursive locks. But, suppose some know-it-all
manager states in no uncertain terms that such stuff shall
not be allowed in production code, that it makes maintenance
impossible, etc.
So, our not-to-be out-done coder decides to beat the system by
passing some kind of flag to each of the procedures. This
will tell the procedure if a lock has already been obtained.
procedure()
{
lock_resource();
read_directory(TRUE);
unlock_resource();
}
read_directory(flag)
{
if(!flag)
lock_resource();
do_stuff();
unlock_resource();
}
So, this prevents attempts at double-locking, but this does not
convey information that the resource has been un-locked, so the
flag needs to be a pointer to something to which the first resource-
locker has access, like:
procedure()
{
lock_flag = FALSE;
lock_resource();
lock_flag = TRUE;
read_directory(&lock_flag);
if(lock_flag)
unlock_resource();
}
read_directory(*flag)
{
if(*flag != TRUE)
lock_resource();
*flag = TRUE;
do_stuff();
unlock_resource();
*flag = FALSE;
}
This works if the resource was only the procedure, read_directory(),
and fails miserably if the resource was some underlying device because
we now return from read_directory() without a lock.
So this is shown to be 'proof' that recursive locks are required.
The recursive lock simply moves the lock-flag into the locking
procedure and the procedure keeps track of recursion so that the
'real' unlocking only occurs after the Nth call if there were N
calls to lock. So the recursive lock comprises a counter and some
additional identification method. This seems to fix everything while,
in fact, it covers over the real problem, defective code.
In the example case, the problem would be fixed by inspection.
You look at the code and see what it is doing. In this case,
the code could be simplified as:
procedure()
{
read_directory();
}
read_directory()
{
lock_resource();
do_stuff();
unlock_resource();
}
Or simply reduced to calling read_directory() without the
intervening procedure. There is never any reason, ever, to
attempt to obtain a lock on a resource by the execution thread
that has already locked the resource. This is demonstrable proof
of a bug.
Often 'impossible' locking situations can be resolved by using
a relay procedure. In the days of old-and-bold engineers who
coded in octal because assembly was too high a level, such
procedures were commonplace. A relay procedure is some procedure
that assures a single path of execution to and from some code that
operates upon a shared resource. Such a procedure lines all
the "ducks up in a row" so that the code that operates upon
the shared resource operates atomically even though the code
itself contains no locks. Here is a trivial example:
relay()
{
mem = obtain_all_memory()
lock_all_resources();
execute_procedure(mem);
unlock_all_resources();
free(mem);
}
Since the only path to execute_procedure() is through this
relay code, there cannot be any failures to unlock the device
in certain execution paths or memory leaks, etc. Everything
necessary to execute that common procedure atomically is
set up ahead of time and torn down after it returns.
Of course such code in real life isn't very useful because one
generally doesn't know how much memory, or how many resources
are actually required until the procedure starts execution.
It is meant to demonstrate a method of executing a gigantic,
hard to comprehend, procedure with guaranteed atomicity. You
call it using a single execution path that obtains all its
resources first and releases them after the procedure returns.
The easiest way to emulate this in an operating system is to
have one global lock. Anybody who needs to have guaranteed
atomicity takes the global lock. Unfortunately this is not
efficient so some finer granularity locks needed to be
established in real operating systems. Every time somebody
decides that a section of code needs to be locked, the
possible execution paths that could result in a lack of
atomicity need to be reviewed. If there are none, then no
lock is needed. If there are some such paths, then locks
need to be acquired in those paths only.
In one email there was mention of the 'necessity' of
recursive locks in network file-systems.
Network file-systems create a bad problem because they
are really 'state-less', with hooks to make them seem
at least safe to use. An 'advisory' lock on a shared
resource is like getting almost pregnant. A lock should
be total and complete or it should not exist at all.
In the case of network file-system locking, one needs
a lock manager to watch over the whole mess so that
if a client dies, disconnects, or otherwise goes away,
its locked shared resources are unlocked after some
bookkeeping that may allow for reconnection. The lock
manager carries out policy so it isn't just the lock it's
concerned with, but the whole communications "plant".
So, this is not a "locking" issue, but a network file-
system management issue, for which there are at least
partial solutions, with new problems and newer solutions
being discovered almost monthly.
On Tue, 20 May 2003, Robert White wrote:
>
>
> -----Original Message-----
> From: Richard B. Johnson [mailto:[email protected]]
>
> > The lock must guarantee that recursion is not allowed. Or to put
> > it more bluntly, the lock must result in a single thread of execution
> > through the locked object.
>
> Where the HECK do you get that load of bull? The one requirement of a
> MUTUAL EXCLUSION PRIMITIVE (a.k.a. mutex, a.k.a. lock) is *MUTUAL*
> EXCLUSIVITY.
>
> Nothing else. Lets look at the words:
>
> Mutual - "directed by *each* toward the *other* or *others*" (e.g. not self,
> duh) {all other definitions talk about group insurance, which applies too
> 8-)}
>
> Exclusion -> exclude -> "To prevent or restrict entrance" (etc.) "to bar
> from participation"
>
> So, a mutex erects a "bar to/from participation" "directed by each (holder)
> to other (would be holder(s))".
>
> There is no concept of "Self Exclusion" in a lock (mutex et. al.) so
> recursion, by definition, is (or should be) permitted.
>
> All else is unfounded blither.
>
> The fact is, that it is easier to write locks that will self dead-lock and
> lazy people, acting in the name of expediency, decided that somehow, such
> locks were "better" because they didn't want to expend the effort to make
> them correct. Still others then try to stand on lazy precedent as if it is
> somehow canon.
>
> The only place/way you can argue this is if the constituent operations "X"
> within a larger body of code Alpha are not considered part of Alpha (re, the
> previous Alpha is composed of X and others example). But that is like
> yelling "I locked it, so my arm, which is not all of me, should not be
> allowed to use it because my arm is not me..."
>
> From the standpoint of purely logical analysis, this is a little esoteric...
> and obviously specious tripe.
>
> Rob.
>
Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
In article <[email protected]> you wrote:
> From: Richard B. Johnson [mailto:[email protected]]
>> The lock must guarantee that recursion is not allowed. Or to put
>> it more bluntly, the lock must result in a single thread of execution
>> through the locked object.
> Where the HECK do you get that load of bull? The one requirement of a
> MUTUAL EXCLUSION PRIMITIVE (a.k.a. mutex, a.k.a. lock) is *MUTUAL*
> EXCLUSIVITY.
:-) this reminds me of the post I didn't make.
> All else is unfounded blither.
Hey, that was the word I used in the post I didn't make! I'm sort of
glad I didn't say it out loud - it ruins the argument.
Anyhow, I have now written a tool that can detect various improper uses of
locks. I've started out by detecting sleep under spinlock, but I'll go
on to other deadlocks.
% ./c ../dbr/1/sbull.c
<function name=sbull_ioctl file=sbull.c line=171 attr=S_SLEEP>
<function name=sbull_request file=sbull.c line=361 attr=S_SLEEP>
%
Detects two functions in the target file that sleep. For that it had
to analyse 28K lines of C. Put a spinlock in the wrong place and ...
% ./c test16
sleep under spinlock at file="test16" line=30
<function name=sbull_open file=test16 line=1 attr=S_SLEEP>
(I excised a function to fiddle with and put a call to one of the other
two in under a lock).
I'll make this available as soon as I figure out some more of how
to define C contexts. It uses a database and I'm having trouble
deciding where something is first defined in C.
Peter
/Heavy Sigh
You insist on missing the simplest of points and then dressing your response
as canon.
Simple point 1:
We have public interfaces within the kernel. These interfaces include all
of the various whatever_ops structures (e.g. struct file_ops) and are
characterized by a list of pointers-to-functions, each with a "well known"
behavior.
Simple point 2:
We occasionally see changes to the aforementioned, and required, "well
known" behaviors, in the form of a change of responsibility for locking.
Not counting the move from the "big kernel lock" to all that came after,
whenever the definition of a routine changes from "called holding some_lock"
to "no longer called with some_lock held", a whole lot of code gets stirred
up. (not the key point though...)
Simple Point 3: (The real point)
In the existing system, it is impossible to aggregate the existing code
calls under an umbrella call, if that umbrella needs the same lock as the
constituent calls.
Let's say I have a file system with a perfectly implemented unlink and a
perfectly implemented rename. Both of these routines need to exist exactly
as they are. Both of these routines need to lock the vfs dentry subsystem
(look it up.)
(slightly contrived example follows)
But now let's say that I am building a virtualization layer for, say, a
network file system, or a caching file system, or a logical volume
concatenator or something. It doesn't really matter why exactly, but let's
say that to implement some semantic requirement of the existing network file
system I need to implement a "destructive rename", that must be atomic. The
"natural solution" is to implement something that looks like
destructive_rename(...)
{
lock_dircache
underlying_ops->unlink(target_name)
underlying_ops->rename(source_name, target_name)
unlock_dircache
}
But that natural synthesis of primitives won't work because both the unlink
and the rename on the underlying file system are implemented with the
absolutely correct lock_dircache they need internally. This "natural code"
will self-deadlock.
You assert that this can only happen if someone "did bad" in some form or
another. You are wrong. Your simple-minded view of locking is, well,
simple minded. In the stated example, for instance, there is no way to meet
the requirements without manhandling the members pointed to by
underlying_ops, violating the requirements of the outer layer, or hard
coding the theoretical upper-layer so that it only works with a particular
underlying file system. None of these are less-than-brain-dead options.
But, you say, no such problem sets exist...
Here is a problem statement that is virtually unimplementable in the current
Linux kernel because of the simple-minded locking model.
Create a meta file-system that takes two existing (backing) file system
devices (or existing mounted directories) and aggregates them such that:
(Ladies, and Gentlemen, The "deltafs" file system...)
1) It doesn't matter what types of file systems are used as the backing file
systems.
2) The aggregate file system is fully read-write
3) The base (first existing) file system is read-only
4) The front (second existing) file system is read-write
5) All operations are available on the aggregate file system (unlink,
rename, open for write, open for append, chown, set access time, etc.)
6) The aggregate file system engine will transcribe all the modified files
from the base to the front file system if the file is modified.
7) The aggregate file system will (probably using a reserved file name and a
journaling structure encoded therein) maintain a white-out list to hide
unlinked and renamed files.
8) After unmount, the front file system should be a minimal delta of the
base file system
9) Remounting the same combination of file systems should consistently
result in the same, consistent file system image.
10) (you get the point... 8-)
Now, the natural, and only rational implementation, would be to build a
module with a "Y" shape. The two short legs being the backing file systems
and the long leg being the presentation side that links under "real" mount
point. You'd probably need a kernel thread too. Regardless of the
secondary concerns, the aggregate file system will need to... wait for it...
aggregate.
For instance, when you search the dircache for a file, the dircache routines
for the aggregate file system will need to call into the dircache routines
of the backing file systems, and not necessarily in a symmetric way. There
are "interesting" permutations. A simple lookup within the aggregate system
will have to (typically) do three things: Look for the file on the front
component, or then check for the file in the white-out list, or then check
for the file on the read-only base file system.
Another interesting aggregate operation would be the various incarnations of
open (and subsequent use). If I open, or god forbid, map a file for
reading, and then someone else opens it for writing, I can't have just
passed out the inode from the base file system, because the write access
will not ever be reflected in the reading/mapping of the open-for-read case.
You can still almost do this as it exists today, but if you go and allow the
backing elements to be bound directories (a la mount --bind) instead of pure
devices everything flies right out the window. (The difference being the
shared visibility of the underlying dentry stuff to "third parties.")
You could get close to a good aggregate file system if you just don't lock
anything at the aggregate levels and then hope concurrent accesses don't
fall into some k-hole while complex operations (e.g. rename) are in for a
competitor. THAT, however, is really bad design.
So answer one question: in the case of say "mount -t deltafs -o
front=/dev/ram/0,back=/dev/hda3 aggregate /opt" where /dev/ram/0 is an ext2
file system, and /dev/hda3 is a reiserfs, who was the "stupid programmer"
that caused the self deadlock? The guy who wrote ext2? The guy who wrote
reiserfs? The guy who wrote deltafs? (probably the latter in your mind.)
Would the guy who wrote deltafs have been "smart" if he didn't lock things
and just hoped for the best? Would he have been smart to take the locks,
but then drop them quickly around the calls to his constituent file systems
and again hope for the best?
Simply put, the fault lies in defining a public interface
(dentry_operations, and file_ops) that can and will self-deadlock if used in
aggregate operations, and lo and behold, if the locks used in "public"
operations were reentrant then the problem would be solved for everyone.
Period.
There are other examples of highly desirable aggregate or reentrant access
requirements that will come up. To date they haven't really, because they
are flat out impossible to do more than fake up, but as hot-pluggable
devices and loopback based techniques come to the fore, the dependency list
starts to become unbounded.
The point you doubly miss is that I am not suggesting "all locks must be
reentrant" as some alternate pronouncement from on high, as a new truth to
replace the old in some quasi-religious coup. Many of the (spin)locks
should be just as they are. The per-cpu locks are an example. The paging
system locks are another. The should-be-self-exclusive locks are
characterized by a particular "privacy".
The counter-truth is that any lock that should/could/might be held by code
immediately on either side of a public interface should use recursive
locking. These are primarily locks with a direct relationship to the
various _operations structures. That is dentry_operations, file_ops, etc.
may each have one or more resources that naturally relate to them, said
resources are usually obvious because they are documented as locks that will
be taken before call, might be taken before call, should be taken during
phase X of what the call is supposed to do, or "must never be taken". These
locks should be recursive.
That way, when you want to have your file system on your (conditionally?)
(partially?) encrypted block device which is dependent on your removable USB
dongle key, the authors for each of the components don't have to go and
crack open proven code. (e.g. the block device is encrypted [a la CCS, but
not stupid] and is only accessible if the dongle key is inserted. Should
that arrangement require the provider to also provide a mega-patch to ext3?
I think not.)
Sure, all your super-simple examples support super-simple locking. The
interesting problems in the real world occasionally demand something more
than simple-mindedness.
All of your counter examples and complaints boil down to variations on one,
very simple and straightforward case:
> relay()
> {
> mem = obtain_all_memory()
> lock_all_resources();
> execute_procedure(mem);
> unlock_all_resources();
> free(mem);
> }
>
> Since the only path to execute_procedure() is through this
> relay code, there cannot be any failures to unlock the device
> in certain execution paths or memory leaks, etc. Everything
> necessary to execute that common procedure atomically is
> set up ahead of time and torn down after it returns.
The logical outcome of your proposals is that there would (at any given
level of abstraction) be some outer ring of relay functions that wrap the
"real implementations" of all that lies within. It would have to be an
outer ring so that, for my previous example, direct uses of ext3 and deltafs
would hit the locks but the internal uses where deltafs would call ext3
directly would call the routines behind the locks. Such a ring could only
be implemented by doubling the width of the public interface so that it had
the ring-surface calls and the intra-ring calls (like read_op and
read_by_peer_op). Yech.
So who is in charge of making sure that everything on both sides of any
possible combination of, say, dentry_operations is going to be re-dressed
with these relay functions?
Nobody, because that is naive and unworkable for aggregate code, you would
only be guessing who might want to aggregate what combinations of things.
It is a bad case of "magic happens here" and while such magic can be made in
the specific cases, making that magic happen in the general case is the road
back to the "BIG KERNEL LOCK".
Rob.
On Wed, May 21, 2003 at 02:56:12PM -0700, Robert White wrote:
> Lets say I have a file system with a perfectly implemented unlink and a
> perfectly implemented rename. Both of these routines need to exist exactly
> as they are. Both of these routines need to lock the vfs dentry subsystem
> (look it up.)
_Do_ look it up. Neither ->unlink() nor ->rename() need to do anything with
any sort of dentry locking or modifications.
Illustrates the point rather nicely, doesn't it? What was that about
taking locks out of laziness and ignorance, again? 2%? You really
like to feel yourself a member of a select group...
Unfortunately, that group is nowhere near that select - look up the
Sturgeon's Law somewhere. 90% of anything and all such...
yep, I kind of figured that was going to happen... (doh! 8-)
(I was groping for the example without my references at hand. My bad for
crediting someone with willingness to reason when I know they deliberately
do not want to get the point... My apologies to the rest of the list
here... 8-)
Did my error in routine selection render you so fixated on counting coup
that you TOTALLY missed the point about aggregation of operations?
I bet it did...
/sigh
-----Original Message-----
From: viro@www.linux.org.uk [mailto:viro@www.linux.org.uk] On Behalf Of
[email protected]
Sent: Wednesday, May 21, 2003 5:14 PM
To: Robert White
Cc: [email protected]; Helge Hafting; Linux kernel
Subject: Re: recursive spinlocks. Shoot.
Robert White wrote:
>
> Here is a problem statement that is virtually unimplementable in the current
> Linux kernel because of the simple-minded locking model.
>
> Create a meta file-system that takes two existing (backing) file system
> devices (or existing mounted directories) and aggregates them such that:
>
> (Ladies, and Gentlemen, The "deltafs" file system...)
Call it unionfs.
> 1) It doesn't matter what types of file systems are used as the backing file
> systems.
> 2) The aggregate file system is fully read-write
> 3) The base (first existing) file system is read-only
> 4) The front (second existing) file system is read-write
> 5) All operations are available on the aggregate file system (unlink,
> rename, open for write, open for append, chown, set access time, etc.)
> 6) The aggregate file system engine will transcribe all the modified files
> from the base to the front file system if the file is modified.
> 7) The aggregate file system will (probably using a reserved file name and a
> journaling structure encoded therein) maintain a white-out list to hide
> unlinked and renamed files.
> 8) After unmount, the front file system should be a minimal delta of the
> base file system
> 9) Remounting the same combination of file systems should consistently
> result in the same, consistent file system image.
> 10) (you get the point... 8-)
IIRC, Al Viro was working on this and we might have it in 2.6
http://www.ussg.iu.edu/hypermail/linux/kernel/0201.0/0745.html
Al?
Regards,
Carl-Daniel
On Mon, 19 May 2003, Robert White wrote:
> In point of fact, "proper" locking, when combined with "proper"
> definitions of an interface dictate that recursive locking is "better".
> Demanding that a call_EE_ know what locks a call_ER_ (and all
> antecedents of caller) will have taken is not exactly good design.
So call_EE_ messes with the data structure which call_ER_
has locked, unexpectedly because the recursive locking
doesn't show up as an error.
Looks like recursive locking would just make debugging
harder.
Rik
--
Engineers don't grow up, they grow sideways.
http://www.surriel.com/ http://kernelnewbies.org/
On Monday 19 May 2003 13:37, Nikita Danilov wrote:
> There, however, are cases when recursive locking is needed. Take, for
> example, top-to-bottom insertion into balanced tree with per-node
> locking. Once modifications are done at the "leaf" level, parents should
> be locked and modified, but one cannot tell in advance whether different
> leaves have the same or different parents. Simplest (and, sometimes, the
> only) solution here is to lock parents of all children in turn, even if
> this may lock the same parent node several times---recursively.
Having locked the parent of one child you then check whether the parent of the
second child is in fact the same, in which case you already hold the lock and
must remember to unlock it only once, otherwise lock the second parent. Do
you see a hole in that?
Regards,
Daniel
> From: Rik van Riel [mailto:[email protected]]On Behalf Of Rik van Riel
> So call_EE_ messes with the data structure which call_ER_
> has locked, unexpectedly because the recursive locking
> doesn't show up as an error.
Yes and no. It all hinges on that nonsensical use of "unexpectedly".
(I'll be using fully abstract examples from here on out to prevent the
function call police from busting me again on a technicality... 8-)
call_EE_ is an operator invoked by call_ER_; call_ER_ is changing the
protected structure/data/state it locked by invoking call_EE_ against
it. Nothing "unexpected" has happened, because call_ER_ invoked call_EE_ for
a reason and with the expectation that call_EE_ would do whatever it is
designed to do. Invoking call_EE_ is an imperative command to "mess with
the structure" if that is what call_EE_ is supposed to be doing. When you
call atomic_inc(something) you expect something to be "messed with" in a
particular way. Is the fact that "something" got bigger by one "an error"?
Nope. How about if it precision wrapped to zero? Probably not, but if it
is, that is a larger semantic problem, not a problem with atomic_inc() or
your use thereof. Whenever anybody holds a lock they are expected to
understand what the operations they invoke will do to the protected
resource. The fact that the operator asserts nested ownership for the
duration of its nested operation is mathematically provable as a non-issue.
If you don't know what an operator is going to do, you shouldn't be invoking
it in the kernel in the first place. (or anywhere else for that matter. 8-)
If an operator does something it isn't supposed to do, the operator is
broken by any and every measure regardless of locking.
How hard is this to understand?
(Looking back at "my original message" in this thread I exactly cede the
point that what recursive locking gives you on the well-behaved interface end
it can take back from you on the lazy and/or stupid programmer end.)
We already demand that the programmer who uses any operation inside the
kernel understand the requirements and dangers of that operation. It is
implicit that if you play with some pointer protected by a lock you are
going to lock it, and unlock it, and not trash it etc. That is already a
covenant inside the walls of the kernel.
Holding a lock nets you a guarantee that "only the operations you invoke
will change the protected data". The fact that some of those operations
might be function calls is a given. The recursive lock concept adds exactly
one "new feature": with recursive locks, you can call functions that
themselves contain sections that want to make that same assertion. Without
recursive locks you are instantly barred from calling an existing function
(that presumably exists in its own right, for its own purposes) that does exactly
what you need to do, if that function itself wants to make sure that only it
is changing the protected data.
How much sense does that make as a restriction? "You cannot call that
function, it demands exclusive access to the data you have already claimed
for the express purpose of modifying via function call(s)." That is like
getting your car towed away because you didn't move it after the police put a
boot on the wheel.
That is actually the whole thing. As it is today, if a move (of a file)
were implemented as a link and an unlink, (and if those two routines needed
to lock something which I am informed they do not 8-], and you were
implementing the move after-the-fact for some reason) then the move function
would have to be a copy of link and unlink function text now living in a new
third function with the locking rearranged. But now we are maintaining two
essentially identical copies of the code for links and unlinks. (Or you would
move the locks out of the "real" unlink and link and wrap them with ...
etc., ad nauseam, amen 8-)
When you take the steps to allow aggregation across/around an interface you
(just as you do today) define what any given routine is supposed to do and
what it will/should dirty. If the implementer of the "inside" of the
interface (e.g. the guy who filled in the file_operations structure in his
driver) does his job, then the code has a well-defined behavior. It is
expected that "whangle_inode_list" will "whangle" the inode list, whatever
"whangle" means in this usage.
The value of the recursive lock is that it lets the guy who comes along
later work completely within the clearly defined set of sub-operations.
That is, that programmer can know, if he has claimed the inode list lock,
that the only entity who "whangled" it was the entity he called, and that if
there was another operator "bojum" on the list, and he never called
"bojum_inode_list", then bojum didn't happen, or if it did, it happened in
the course of the explicitly invoked whangle. This knowledge exists
independently of whangle's need to make sure that nobody bojum(s) while the
whangle is taking place.
At the aggregation level it would be "dumb" to do something like
boo()
{
lock_thingy;
void* X = fetch_internal_thingy_pointer();
some_thingy_operations->whangle();
thingy_use_without_revalidation(X);
unlock_thingy;
}
without the sure and simple knowledge that whangle() will not invalidate X.
But this is no more difficult a mental leap than knowing whether or not
free(X) will invalidate X.
If you "know" X can not be invalidated by any "legal" whangle() then this
code is okay, and can be used with every possible some_thingy_operations
structure. If you don't know this, then this is bad code.
The thing is, by defining the thingy_operations standard, and including the
fact that lock_thingy protects certain data and states, and "whangle" never
"bojums" (or whatever assertions we can or can't make for "any thingy") then
we have clearly defined an interface.
Another programmer, before or after the creation of any given thingy driver,
who needs an "atomic" operation "snark" that is a whangle, a bojum,
and another whangle in that specific order, with no exposure to others trying
to whangle, bojum, or snark, could easily write snark. Without recursive
locking, "snark" would have to look like
snark() {
some_thingy->whangle()
some_thingy->bojum()
some_thingy->whangle()
}
but this snark would not be atomic; it doesn't lock anything. If each
operation is instead bracketed by a "hopefully minimal" unlock/relock, the
danger is still very real.
snark() { // meaningless snark
lock_thingy;
unlock_thingy; thingy->whangle(); lock_thingy;
unlock_thingy; thingy->bojum(); lock_thingy;
unlock_thingy; thingy->whangle(); lock_thingy;
unlock_thingy;
}
Clearly if snark is at a higher level of abstraction from whangle and bojum,
and snarks are mutually exclusive, then we can just put in a snark_lock.
But if we "want to" lock thingy(s) individually instead of blocking
all snarks, what really are the alternatives to recursive locking?
In fact, placing the entire onus on the aggregators to properly use their
parts (operators and function calls) and be wary of side effects, places
that onus exactly where it belongs. Someone capable of composing the
abstraction is not likely to be tripped up by it.
We know that "reading a file moves the read pointer" and so forth. If
someone implements a set of operations in a thingy_operations struct that
pisses all over things it shouldn't be touching, that person is to blame for
their bad code. If someone else using a thingy_operations struct does so
without a care as to what the operations do, then they are to blame. This
is actually no more complex than knowing that if I atomic_inc(some_value),
it's going to "get larger", with the caveat of zero-wrap hanging over my
head.
There will always be cases where some particular aggregation is not now
safe and never will be. An example from before: if the deltafs (unionfs?)
were to try to mount the same device as both the front and base elementary file
systems, it would be doomed to failure.
Aggregate use is invaluable, but it also requires as much understanding of
programming in this kind of environment as anything else. It also allows
for the construction of some incredible dependencies "at runtime" and a
proper aggregate entity should be ready to deal with that. These are not
insurmountable, but they are real costs. You can't protect everyone from
themselves.
The win is that, for the deltafs example, if you never violate your
constituent components' interfaces (e.g. the ext3 front-side file system
semantics and data), you can co-own their elements and make "bigger
promises" "for free" by leveraging the mature code.
A well-defined interface with recursive locks is no more difficult to
maintain, and generally exist within, than a well-designed interface with
self-deadlocking locks; it just has more "depth" (if you will forgive the
quasi-pun) because of the opportunities for aggregation.
Both beat the hell out of an interface definition that has to have all its
code "fixed" because a lock needs to move from "callers responsibility" to
"callees responsibility" or vice-versa in the next version.
This stuff is much clearer at the application level. Make an MT-Safe Deque
template that works "over" an MT-Safe List. The locking needs to be
promoted to the Deque. If you want to view the contents as both the Deque
and the List then the locking can't be in both separately, nor can it be in
just one (gets obscure, but has to do with the "wait for object" nature of a
Deque not being part of a List). If the locks are recursive, the Deque can
be implemented without private access to the list's naughty bits. User
calls Deque->Push, which calls List->PutOnTail; the lock is taken twice, the
end. View as list, view as deque, it all works. When you add the "wait for
object" version of Pop, List isn't suddenly considered broken, but it is
completely up to the writer of Deque to do what is necessary. There is some
magic to wrapping a recursive structure around posix_mutex to yield a
recursive entity that is useful with a posix_condition, but that is a
totally different post. 8-)
The point is, you *can* and indeed *must* apportion responsibility with any
public interface anyway, so using recursive locking in a public interface
adds far more than it costs.
Rob.
PS and again, I am not advocating that every lock "must be" recursive, just
the ones tied to public interfaces (say, those expected to be used by
functions named in any legal module _operations struct) need to be
considered for recursion. Clearly the per-CPU locks and such are not
candidates for recursion. There is a distinction to be made, and refusing
the value of recursive locking is not making a distinction, it is clapping
one's ears to one's head and dancing about under the assumption that since it
isn't right in some instances it is wrong in all instances.
Robert White wrote:
>>From: Rik van Riel [mailto:[email protected]]On Behalf Of Rik van Riel
>>
>
>>So call_EE_ messes with the data structure which call_ER_
>>has locked, unexpectedly because the recursive locking
>>doesn't show up as an error.
>>
>
>Yes and no. It all hinges on that nonsensical use of "unexpectedly".
>
>(I'll be using fully abstract examples from here on out to prevent the
>function call police from busting me again on a technicality... 8-)
>
Oh come on! Now I won't get into this argument, but you can't win
by saying "if this were implemented in such a way that recursive
locking is required, then recursive locking is required here"!!
Look: I could easily reimplement your snark so it doesn't have to call
the last "whangle" - now see how it can be done completely lockless
with a memory barrier between a and b?
Locking is an implementation issue, and as such I think you'll have
to come up with a real problem or some real code. You must have some
target problem in mind?
Best regards,
Nick
"Nick Piggin wrote:"
> Locking is an implementation issue, and as such I think you'll have
> to come up with a real problem or some real code. You must have some
> target problem in mind?
I'll butt back in here.
I found a problem when I made two drivers talk to each other.
Concretely a modified raid MD driver and a block device driver.
Each driver had ioctls that walked or modified a linear list, and locked
the list against other threads while running their subroutine in the
ioctl. To be concrete again, the lists were respectively the set of
raid arrays in which the block device found itself currently, and
the md drivers internal lists of arrays, component devices, etc.
The ioctls worked fine when used from user space.
Then I had the bright idea of getting the block driver to call the md
driver's ioctl automatically when a certain condition arose. Concretely
again, I had the block device driver tell the md driver "setfaulty" when
the block device noticed an internal problem, and "hotadd" when it felt
cured.
Unfortunately, I had already gotten the md driver to tell the block
driver when it was booted out from or newly included in an array
(so that it could know if it should tell md when it felt ill or well).
The result was a call from the block driver to the md driver with a
lock held, and a rather unexpected call back from the md driver that
impotently tried to take the same lock.
Same thread.
Peter
On Thu, May 22, 2003 at 06:42:15AM +0200, Peter T. Breuer wrote:
> "Nick Piggin wrote:"
> > Locking is an implementation issue, and as such I think you'll have
> > to come up with a real problem or some real code. You must have some
> > target problem in mind?
>
> I'll butt back in here.
>
snip
>
> The result was a call from the block driver to the md driver with a
> lock held, and a rather unexpected call back from the md driver that
> impotently tried to take the same lock.
>
> Same thread.
>
OK, in this case, it didn't sound like your block driver
expected to be re-entered. Lucky the problem immediately
caused a deadlock ;)
More seriously, let's say you get around the above with
recursive locks:
Thread 1                        Thread 2 (on another cpu)
enter the block driver
take the bd lock
                                issue an md ioctl
                                take the md lock
                                enter the block driver
                                spin on bd lock
automatically call md ioctl
spin on md lock
So the problem needs a rethink.
This will, hopefully, be my out-comment on this thread.
Various persons in different branches of this thread are making and
repeating various inferences they have taken from my statements as if those
were my position. I will state my position and then let most of this drop.
But First: At some point I dropped the spinlock focus and started talking
about locking in general. I'll take the onus for that completely. My Bad.
Sorry. Now that it is in the mix, however, I will have to make it part of
the position, else there will be all sorts of "but you said here that..."
kinds of possible responses.
My Position:
1) Recursive Locking has value, it is not absolutely inherently broken, bad,
or evil as some seem to believe.
There is no a-priori "somebody messed up" absolute that can be deduced
simply because a logical structure reaches code that wants to claim a
resource that the same thread, in an outer layer of abstraction, has already
claimed. The only time it is a-priori wrong is when the locking mechanism
is designed to self-deadlock, and it is only that self-deadlock that is
"wrong by absolute definition".
2) Any application of locking (spin or otherwise) that can be written with
recursive locking can be re-written without it.
This is nearly a tautology, even if it would be NP-complete to prove. This is
also _exactly_ the iterative vs. recursive design principle. Anything that
can be written iteratively can be written recursively and vice versa. It is
also true that many problems exhibit an inherent bias towards one or the
other of these approaches. Any stance, in the absence of overriding
pressure, that one of these approaches should be ruled out completely for a
program or design is detrimental to that design. A real-world example of an
overriding pressure is the absence of a real stack on the 8051
micro-controller causes the compiler to turn recursive calls into expensive
iteration anyway. In the kernel design there are almost no such overriding
pressures.
2a) Any application that "is correct" with self-deadlocking locks will be
semantically identical using recursive locks.
2b) Persons relying on the self-deadlock behavior of a lock (spin or
otherwise) to "detect semantic errors" in a production environment are doing
a bad thing.
This is not intended, in any way, to minimize the value of catching errors
this way during development, especially in an open source project where all
sorts of stuff is being joined together. I think, however, that we are all
disappointed when our production systems lock up, and would prefer to have
them notice the impending problem and at least print a diagnostic.
Switching a self-deadlock spinlock for a recursive spinlock with a ceiling
of one and a linked printk or stack-trace-and-oops() would be better than a
silent lockup in a production system. The atomic spinlock system, as it
exists today, has no sense of lock owner, and so can not tell a valid wait
from a self-deadlock in an SMP setting. All but the most critical and
private locks could do with detection and reporting in the wild because the
kernel is subject to post-development vagaries caused by people adding
modules.
3) Any lock (spin or otherwise) that is directly related to a public
interface should be considered for recursive implementation.
This point does not even come close to an assertion that all such locks
"must" or even "should" be recursive. It is the simple statement that the
locking policy and model for any public interface should be examined for
possible value that could be developed because of rational aggregation or
reentrancy. That decision, having been made, should be made an express part
of the interface.
4) All locks (spin or otherwise) should obviously be held for the shortest
amount of time reasonably possible which still produces the correct result.
If this needs explaining... 8-)
5) The entity who takes a lock (programmer or code, spin or otherwise) is
responsible for knowing what he is doing when he operates on a protected
resource, and even all the non-protected resources.
This is another corollary of the fact that any entity that uses a common
anything (in life or in code) is responsible for its actions. Locking does
not ensure proper or meaningful operations will be applied to the protected
object, it only ensures that nobody else will be getting in the way. Any
argument that contains "what if the programmer takes a lock and then
calls/invokes something that..." is specious. The programmer who takes a
lock and/or invokes an action, bears the responsibility for his operations
whether he wrote the code locally or not. That's just life.
=====
I will now provide one concrete and specific example of a 2.5 kernel
facility (public interface, though not one with an _operations structure)
that should be protected by a recursive lock. This is NOT because I have a
specific application in mind, but because I can conceive of several possible
scenarios and a related body of "overriding concern". I don't have the code
right here to say that this is, or isn't, a spinlock per se. Analysis of
the interface documentation and the obvious presence of "some kind of lock,
probably in spinlock kinds of time quanta" is enough.
The new workqueue system, and its associated work_struct, "really ought to
be" recursively protected.
That is, there should be a lock_workqueue() and an unlock_workqueue() added
to the public interface, and calling queue_work(), queue_delayed_work(), and
cancel_delayed_work() (etc) should be legal exactly as they exist today
(e.g. atomically), and also legal between calls of the proposed
lock_workqueue() and unlock_workqueue() functions.
Clearly using lock_workqueue() and unlock_workqueue() against "events" (the
predefined system workqueue) should be "frowned upon", but even that should
still be legal.
The reasoning is simple. It is almost certain, or at least far from
inconceivable, that a high-performance interface with obvious/high user
visibility (In one email I used an AGP rendering pipeline), or subsystem
significance, could find itself needing to atomically perform more than one
insert/remove/modify action on a (hopefully private) work queue.
I make no statement that this would be a "good design" but I can see cases
where it could come up. I also hope that the reader understands that there
could be other reasonable combinations/aggregations of operations other than
the two adjacent cancel operations I cite below; the dual cancel is just a
good simple but concrete possibility.
So anyway, in the current public self-deadlocking interface, if some module
writer needed to do, say, an atomic cancel of several particular
work_struct(s), but he didn't want to flush the entire workqueue, he would
be obliged to read the implementation, take the locks himself and then
hack-apart and reassemble the workqueue's linked list in local code.
This is bad, and God help the kernel if this module is loaded into a later
kernel with a tweaked workqueue semantic.
In keeping with principle 2, it is tautological that we could turn
queue_work(), queue_delayed_work(), cancel_delayed_work() et al. into a
two-layer affair, with each calling its respective queue_work_not_locked()
versions internally after having claimed the lock, then publicize the
lock/unlock_workqueue() calls and all the *_not_locked() calls for people
who need aggregation. This, however, would then more than double the size
of the interface, damage clarity, and otherwise make things ugly and un-fun.
Another principle 2 workaround would be to have a "technically not
recursive" (spin)lock and build tests into the various routines that detect
the outer lock_workqueue() activity and branch around the actual lock
attempts. That's just recursive locking in sheep's clothing.
It is much better to let this module writer construct a sequence that looks
like: lock, cancel, cancel, (whatever), unlock; that uses the regular and
well defined top-level calls like cancel_delayed_work().
The fact is, if the lock built into the workqueue system is recursive,
persons who need to make aggregate changes to a queue in an atomic way are
provided for in an optimally useful and safe way. It is, of course,
possible that an idiot could clog up an SMP machine by misusing this outer
lock facility. But an idiot can clog up an SMP machine by
misusing almost any (spin)lock he cares to take, so that is a push.
So here is one public interface that could stand the scrutiny and has a
fairly obvious case where recursive locking is a big win, or at least a much
bigger lose for non-recursive locking combined with a near certainty of
desirable aggregate but still short-duration activities.
The workqueue example came to mind almost instantly, and I would be stunned
if a simultaneously rigorous but open-minded survey didn't yield at least a
few more "fairly obvious" places where a recursive locking strategy would
pay off big-time.
==== for those who need some flesh on the example, that is all that follows
====
Disclaimer: this is given an informed lay-opinion of what happens in
graphics subsystems garnered mostly from sampling manual pages and reading
magazines. It does, however, map to other possible problem sets, so it is
good enough for this discussion.
Consider a graphics rendering module that has a "draw mesh" ioctl/interface.
A mesh is a vector of points (list of x, y coordinates and supporting data
like color etc) that combine together to make a bunch of triangles that
share vertices with their neighbors and get drawn and filled in by the
graphics card to create a possibly-non-flat object.
Consider that each graphics card has different features and optimal and
maximum sizes for things like meshes.
Now imagine you are designing the gaming driver interface (think DirectDraw
for windows etc.) you don't want to limit the largest mesh acceptable in
the interface definition to the smallest "largest mesh" you are likely to
encounter in the cheapest video card a user might buy.
So you build in a feature that, when a mesh is submitted to the rendering
system, it gets compared to the current card, and if the mesh is "too big"
it is cut into three parts. (the left part, the right part, and a meta-mesh
to "zipper over" the seam between the two.)
The system is also smart enough to detect an SMP machine and launch several
high-priority kernel threads to share the rendering. This is, after all,
for a game, where display speed is the most important thing... 8-)
So now, the game sends in a mesh, which just got carved into three (or maybe
more, but let's stick with three) pieces.
So then, the game discovers that that mesh really needs to be canceled, or
the rendering engine decides that the whole thing is behind a wall and
should just be discarded to save time, or the entire frame is going to be
canceled to make up for a network delay (etc.). If you cancel the mesh
fragments (take them out of their work queue) piece-at-a-time you might make
an ugly mess (say if only the small "zipper" segment made it onto the
display surface) and disappoint your fans and users (both players and game
programmers).
So there you have it, some programmer focused entirely on his pretty picture
simply _must_ *selectively* cull his workqueue to get rid of all of
"mesh/frame X".
He could, and probably will, walk the workqueue himself, which is highly
undesirable from a system stability standpoint. He doesn't really care if
he globs up the SMP CPUs to do it because they would otherwise be wasting
their cycles in the more intensive task of processing data that will never
see phosphor, and he has no choice but to ignore the published interface
"just this once" because that interface would leave him with public mud on
his face as bits leaked onto CPUs between the individual cancel operations.
I can see it happening... pretty easily in fact... and in all sorts of
other kinds of deferred tasking models...
I can see recursive locking making it useful for the programmer to stick
to the public interface and therefore make the system more stable...
I could also see the manual page for lock_workqueue() having a big bold
disclaimer reading "USING THIS ROUTINE IS ALMOST ALWAYS THE WRONG THING TO
DO but..."
Rob.
Robert White writes:
> This will, hopefully, be my out-comment on this thread.
>
[...]
>
> 4) All locks (spin or otherwise) should obviously be held for the shortest
> amount of time reasonably possible which still produces the correct result.
>
> If this needs explaining... 8-)
It surely does.
Consider two loops:
(1)

    spin_lock(&lock);
    list_for_each_entry(item, ...) {
        do something with item;
    }
    spin_unlock(&lock);

versus

(2)

    list_for_each_entry(item, ...) {
        spin_lock(&lock);
        do something with item;
        spin_unlock(&lock);
    }
and suppose they both are equally correct. Now, in (2) total amount of
time &lock is held is smaller than in (1), but (2) will usually perform
worse on SMP, because:
. spin_lock() is an optimization barrier
. taking even un-contended spin lock is an expensive operation, because
of the cache coherency issues.
>
[...]
>
> Rob.
>
Nikita.
Nikita Danilov wrote:
> Consider two loops:
>
> (1)
>
>     spin_lock(&lock);
>     list_for_each_entry(item, ...) {
>         do something with item;
>     }
>     spin_unlock(&lock);
>
> versus
>
> (2)
>
>     list_for_each_entry(item, ...) {
>         spin_lock(&lock);
>         do something with item;
>         spin_unlock(&lock);
>     }
>
> and suppose they both are equally correct. Now, in (2) total amount of
> time &lock is held is smaller than in (1), but (2) will usually perform
> worse on SMP, because:
>
> . spin_lock() is an optimization barrier
>
> . taking even un-contended spin lock is an expensive operation, because
> of the cache coherency issues.
This is a tradeoff. If the total running time is "short", use (1) for
performance.
If the running time is "long", use (2) to avoid lock contention.
A "long" time happens when the time wasted by other processors spinning
typically exceeds the time wasted by repeated lock+unlock, or when there
is excessive latency on some irq-blocking lock.
You can get the best of both worlds (low latency and few lock operations)
like this:
    while (more work to do) {
        spin_lock(&lock);
        process one suitably sized batch of items;
        spin_unlock(&lock);
    }
This sort of thing certainly helped the VM system.
Helge Hafting
On Fri, May 23, 2003 at 11:22:11AM +0400, Nikita Danilov wrote:
> and suppose they both are equally correct. Now, in (2) total amount of
> time &lock is held is smaller than in (1), but (2) will usually perform
> worse on SMP, because:
> . spin_lock() is an optimization barrier
> . taking even un-contended spin lock is an expensive operation, because
> of the cache coherency issues.
All good. Also, the arrival rate (i.e. frequency of lock acquisition)
is more important to lock contention than hold time, so they're actually
not being as friendly to big SMP as the comment from Robert White would
suggest. The arrival rate tends to be O(cpus) since whatever codepath
pounds on a lock on one cpu can be executed on all simultaneously.
-- wli
William's comment below, and its direct ancestor from Nikita, seem to
support the recursive locking approach to the locking (of some) resources.
I didn't go into number 4 myself because I was (rightly) sure that the terms
"shortest amount of time reasonably possible" and "correct result" would be
points of contention.
If Nikita's case 1 is indeed "better" on an SMP system (I take no official
position, even though one could be easily deduced from the sum of my
previous arguments... 8-)
>     spin_lock(&lock);
>     list_for_each_entry(item, ...) {
>         do something with item;
>     }
>     spin_unlock(&lock);
and case 2 is "worse"
>     list_for_each_entry(item, ...) {
>         spin_lock(&lock);
>         do something with item;
>         spin_unlock(&lock);
>     }
and since each workqueue primitive op is essentially:
    workqueue_op() {
        lock_workqueue();
        perform_actual_primitive_op();
        unlock_workqueue();
    }
Then the existing practice of calling several workqueue operations in
succession (provable by induction) always decomposes to case 2.
Calling just one workqueue_op is nominally case 1 (and indistinguishable
from case 2 when the list length is 1).
So the existing model, for any aggregate of more than one successive
workqueue op, nets the SMP-unfriendly and non-optimal result described in
case 2 (ignoring, for now, the ratio of locked to unlocked instructions in
each particular primitive).
IF, however, an aggregator knows he is going to do several ops in succession
(and wants the aggregation to be atomic) and takes the recursive lock in an
outer context manually, then the event stream:
    lock_workqueue();
    list_for_each_entry(item, ...) {
        some_workqueue_op(item);
    }
    unlock_workqueue();
by induction actually becomes:
    lock_workqueue();
    list_for_each_entry(item, ...) {
        lock_workqueue();      // Factoring point
        perform_actual_primitive_op();
        unlock_workqueue();    // Factoring point
    }
    unlock_workqueue();
At the marked "factoring points" the code is effectively reduced from taking
a lock to something like "if owner == me then increment/decrement
lock_depth" because the ownership is tautological at all factoring points.
(As is the presence of the lock data in the cache etc(?). But even if
everything is inline the compiler probably still can't get rid of the test
because the parts of the lock are almost certainly declared volatile.)
This leads to a couple of questions (not rhetorical, I genuinely don't
know, and the answers probably vary by platform):
1) Do the barrier() and test_and_set() operators (e.g. xchgb on x86),
when executed on the non-lock-owner CPU(s), invalidate the relevant cache
line(s) of the CPU that owns the lock?
If not:
2) Do we care that (in a recursive lock) the alterations of the "depth"
value will invalidate the cache lines of the non-owner CPU(s) that are
waiting for the lock to free up (given that the final unlock will commit at
least one such invalidation anyway)?
If the answer to #1 is that the owning CPU's cache is not invalidated by a
competitor's active spinning, then recursive locking is extremely cheap
even if the compiler can't get rid of the conditional.
Regardless (and this one is a little rhetorical 8-):
3) Do the always-evaluated conditional (owner == me) and its associated
low-level jump/branch impose enough cost, relative to the complexity of
the actual protected operations in the proposed public interface(s), to
justify always forcing the case-2 model? (E.g. most manhandle_workqueue
operations in the example are far more complex than the cost of the
always-evaluated conditional.)
and of course, for any considered public interface:
4) Do the primitive operations of each/any considered interface perform most
of their task (over the domain of time) with the lock held or released?
5) For a particular interface, does the flexibility of a recursive lock
outweigh the probability of someone coding each/any stupid error and/or the
costs of adding the code to prevent same?
That is, for any public interface that is selected for recursive locking
there will always be some set of primitive operations which can't *ever*
make sense when aggregated (invoked with the lock held in the aggregating
context). In the case of the workqueue example, if you take the lock and
then call the wait-for-workqueue-to-empty-itself call, you will wait forever
because no CPU can take a task off of the queue 'cause it's locked... And
as obvious as that may be to most people, there are going to be people out
there making those kinds of mistakes. The follow-on decision to add
check-primitives to various attractive nuisances is not free.
Rob.
-----Original Message-----
From: William Lee Irwin III [mailto:[email protected]]
Sent: Friday, May 23, 2003 5:19 AM
To: Nikita Danilov
Cc: Robert White; Nick Piggin; [email protected]; Rik van Riel; David
Woodhouse; [email protected]; Martin J. Bligh;
[email protected]; [email protected]
Subject: Re: recursive spinlocks. Shoot.
On Fri, May 23, 2003 at 11:22:11AM +0400, Nikita Danilov wrote:
> and suppose they both are equally correct. Now, in (2) total amount of
> time &lock is held is smaller than in (1), but (2) will usually perform
> worse on SMP, because:
> . spin_lock() is an optimization barrier
> . taking even un-contended spin lock is an expensive operation, because
> of the cache coherency issues.
All good. Also, the arrival rate (i.e. frequency of lock acquisition)
is more important to lock contention than hold time, so they're actually
not being as friendly to big SMP as the comment from Robert White would
suggest. The arrival rate tends to be O(cpus) since whatever codepath
pounds on a lock on one cpu can be executed on all simultaneously.
-- wli
Someone's probably already suggested this, but here goes...
If you REALLY have places where multiple levels may try to grab the same
spinlock, and you would therefore have lock contention with yourself,
why not just use multiple spinlocks? So if you often call B(), which
grabs lock X, but sometimes you call A() which also needs lock X, but
A() calls B() which therefore causes self contention, why not have A()
use lock Y instead? Treat what A() uses as a separate resource? Or
here's another idea: Why not have A() unlock before calling B()?
Oh, and there's the obvious idea of passing parameters around which
indicate whether or not a lock has been taken, although that's certainly
a pain.
I do see some value in tracking lock ownership. But what do you do
about the unlocks? Do you keep a refcount and unlock when it reaches
zero? Now, not only do you have to deal with locks across function
calls, but you have to make sure everything unravels properly. Sounds
difficult.