Greg KH wrote:
> On Sun, Jul 07, 2002 at 03:42:56PM -0700, Dave Hansen wrote:
>
>>BKL use isn't right or wrong -- it isn't a case of creating a deadlock
>>or a race. I'm picking a relatively random function from "grep -r
>>lock_kernel * | grep /usb/". I'll show what I think isn't optimal
>>about it.
>>
>>"up" is a local variable. There is no point in protecting its
>>allocation. If the goal is to protect data inside "up", there should
>>probably be a subsystem-level lock for all "struct uhci_hcd"s or a
>>lock contained inside of the structure itself. Is this the kind of
>>example you're looking for?
>
>
> Nice example, it proves my previous points:
> - you didn't send this to the author of the code, who is very
> responsive when you do.
> - you didn't send this to the linux-usb-devel list, which is a
> very responsive list.
> - this is in the file drivers/usb/host/uhci-debug.c, which by
> its very nature leads you to believe that this is not critical
> code at all. This is true if you look at the code.
> - it looks like you could just remove the BKL entirely from this
> call, and not just move it around the kmalloc() call. But
> since I don't understand all of the different locking rules
> inside the uhci-hcd.c driver, I'm not going to do this. I'll
> let the author of the driver do that, in case it really matters
> (and yes, the locking in the uhci-hcd driver is tricky, but
> nicely documented.)
> - this file is about to be radically rewritten, to use driverfs
> instead of procfs, as the recent messages on linux-usb-devel
> state. So any patch you might make will probably be moot in
> about 3 days :) Again, contacting the author/maintainer is
> the proper thing to do.
You are taking this example way too seriously. Thunder wanted an
example and I grabbed the first one that I saw (I created it in the
last hour). I showed you how I arrived at it, just a quick grepping.
It wasn't a real patch, only a quick little example snippet.
> - even if you remove the BKL from this code, what have you
> achieved? A faster kernel? A very tiny bit smaller kernel,
> yes, but not any faster overall. This is not on _any_
> critical path.
How many times do I have to say it? We're going around in circles
here. I _know_ that it isn't on a critical path, or saving a large
quantity of program text. I just think that it is better than it was
before.
--
Dave Hansen
[email protected]
On Sun, Jul 07, 2002 at 05:07:05PM -0700, Dave Hansen wrote:
>
> You are taking this example way too seriously. Thunder wanted an
> example and I grabbed the first one that I saw (I created it in the
> last hour). I showed you how I arrived at it, just a quick grepping.
> It wasn't a real patch, only a quick little example snippet.
I know that, you're taking my response way too seriously :)
I just showed the typical response to one of your posts; you just picked
the wrong example, or maybe any USB example you could have picked would
have elicited much the same response :)
> > - even if you remove the BKL from this code, what have you
> > achieved? A faster kernel? A very tiny bit smaller kernel,
> > yes, but not any faster overall. This is not on _any_
> > critical path.
>
> How many times do I have to say it? We're going around in circles
> here. I _know_ that it isn't on a critical path, or saving a large
> quantity of program text. I just think that it is better than it was
> before.
So you agree with me? Good. I know you think the code is better than
it was before, but beauty is in the eye of the beholder, or in this
case, the eye of the people that fully understand the code :)
If nothing else, I hope you will think twice before sending off your
next BKL removal patch in a subsystem that you haven't fully tested or
understood. That's the point I keep trying to get across here.
thanks,
greg k-h
So you agree with me? Good. I know you think the code is better
than it was before, but beauty is in the eye of the beholder, or in
this case, the eye of the people that fully understand the code :)
The problem is, of course, that to responsibly use the BKL, you must
fully understand ALL the code that utilizes it, so that you know your
new use of it doesn't conflict or interfere with existing code and
usage. That's the same problem we have when it DOES show contention --
is the problem in the functions which can't grab it (for trying
unnecessarily), or in the functions that can (for holding it
unnecessarily)?
If you are the person who understands the BKL in all its usages
throughout the kernel, then thankfully, our search is over. We've been
looking for you to help resolve some of these discussions. If you
aren't that person, though, then you can't accurately say your use of
it doesn't affect anybody else adversely. All you can assert is that
in your corner of the kernel, *you* use it for X, Y, and Z and they
don't interact poorly with each other.
With a narrowly defined and used lock, it is much less difficult to
determine who uses it, what it is guarding, and what impact yet another
use of it will have. With the BKL (and a few other poorly documented
locks which have recently been cleaned up) nobody has had a hope of
understanding the interactions.
If nothing else, I hope you will think twice before sending off
> your next BKL removal patch in a subsystem that you haven't fully
tested or understood. That's the point I keep trying to get across
here.
So can you define for me under what conditions the BKL is appropriate
to use? Removing it from legitimate uses would be bad, of course, but
part of the problem here is that it's currently used for a variety of
unrelated purposes.
Rick
On Mon, Jul 08, 2002 at 06:46:12PM -0700, Rick Lindsley wrote:
> So you agree with me? Good. I know you think the code is better
> than it was before, but beauty is in the eye of the beholder, or in
> this case, the eye of the people that fully understand the code :)
>
> The problem is, of course, that to responsibly use the BKL, you must
> fully understand ALL the code that utilizes it, so that you know your
> new use of it doesn't conflict or interfere with existing code and
> usage. That's the same problem we have when it DOES show contention --
> is the problem in the functions which can't grab it (for trying
> unnecessarily), or in the functions that can (for holding it
> unnecessarily)?
{Sigh}
But the driverfs and USB code do _not_ show contention. Or if they do
(as Dave pointed out in the driverfs code) it's in a non-critical
point in the kernel's life (boot time, USB device removal, etc.)
> If you are the person who understands the BKL in all its usages
> throughout the kernel, then thankfully, our search is over. We've been
> looking for you to help resolve some of these discussions. If you
> aren't that person, though, then you can't accurately say your use of
> it doesn't affect anybody else adversely. All you can assert is that
> in your corner of the kernel, *you* use it for X, Y, and Z and they
> don't interact poorly with each other.
Yes, that's what I am asserting, _and_ that when my "corner" of the
kernel uses it, it doesn't really matter if ext3, ext2, or even the
entire VFS layer grinds to a halt. No one will care, or even notice.
> So can you define for me under what conditions the BKL is appropriate
> to use? Removing it from legitimate uses would be bad, of course, but
> part of the problem here is that it's currently used for a variety of
> unrelated purposes.
Wait! You're still not getting it. I'm not against removing the BKL.
I'm not saying we should add it to more places. What I am complaining
about (and I'm not the only one) is the method that people who are
trying to remove the BKL from various kernel subsystems are using. I
think Oliver said it best in this quote from this thread:
So you do the greps and question locking usage on the
right lists. It's quite important, you might uncover
bugs and speed up the kernel. And you need to be
persistent about it. Take your time to find out why
the lock is used. Or why it makes no sense.
So please, no more "hit and run" BKL patches that break things. Please
come offering to help, with detailed reasons why BKL usage is wrong in
the specific portions of the code, and how possibly it might be cleaned
up, with patches that have _actually been tested_. And then
flames^Wdiscussions like this one will not happen at all.
thanks,
greg k-h
> If nothing else, I hope you will think twice before sending off
> your next BKL removal patch in a subsystem that you haven't fully
> tested or understood. That's the point I keep trying to get across
> here.
>
>So can you define for me under what conditions the BKL is appropriate
>to use? Removing it from legitimate uses would be bad, of course, but
>part of the problem here is that it's currently used for a variety of
>unrelated purposes.
If the trade-offs weigh in about the same, removing the BKL from
legitimate uses in favor of a different (neither better nor worse)
approach would be more than acceptable, would it not?
Would creating a few new names for lock_kernel() and friends be
acceptable? Just a few macros to give slightly more meaningful names to
each function call for 2.5. Then take lock_kernel() entirely away (the
name, not the function) in 2.7. By 2.9 it should be possible to remove it
from nearly all "inappropriate" uses. This seems like it would encourage
more explicit usage of the BKL, while giving maintainers ample time to
comply.
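To make the idea concrete, here is a minimal sketch of what such renaming
macros might look like. The *_for_ioctl / *_for_mount names are invented for
illustration and are not an existing kernel API; each wrapper still expands
to the plain 2.5-era BKL, so behaviour would be unchanged during the
transition.

#include <linux/smp_lock.h>

/*
 * Hypothetical wrappers around the BKL. Each name documents *why* the lock
 * is being taken while still compiling down to the same big lock.
 */
#define lock_kernel_for_ioctl()         lock_kernel()
#define unlock_kernel_for_ioctl()       unlock_kernel()
#define lock_kernel_for_mount()         lock_kernel()
#define unlock_kernel_for_mount()       unlock_kernel()

Grepping for the individual names would then show which "meanings" of the
BKL are still in use in a given release.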
Note that I have never added or removed a lock from the kernel. I am
simply thinking aloud; half hoping to be corrected.
--Drew Vogel
> The problem is, of course, that to responsibly use the BKL, you must
> fully understand ALL the code that utilizes it, so that you know your
> new use of it doesn't conflict or interfere with existing code and
> usage.
Indeed.
v
> With a narrowly defined and used lock, it is much less difficult to
^
If you were talking about replacing a big lock with one lock, you
might have a point. But you aren't. You can't be, because by
definition if you take out the big lock you have to put in lots
of little locks. And then you get to discover all the problems that
the BKL was hiding that you just exposed by removing it.
If you think that managing those is easier than managing the BKL,
you don't understand the first thing about threading.
I think the kernel crowd is starting to sense how complex things
are getting and are pushing back a bit. Don't fight them, this
isn't IRIX/Dynix/PTX/AIX/Solaris. It's Linux and part of the
appeal, part of the reason you are here, is that it is (was) simple.
All you are doing is suggesting that it should be more complex.
I don't agree at all.
> So can you define for me under what conditions the BKL is appropriate
> to use?
Can you tell me for sure that there are no races introduced by your
proposed change?
Can you tell me the list of locks and what they cover in the last
multi threaded OS you worked in? I thought not. Nobody could.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
Hi,
On Mon, 8 Jul 2002, Larry McVoy wrote:
> I think the kernel crowd is starting to sense how complex things
> are getting and are pushing back a bit. Don't fight them, this
> isn't IRIX/Dynix/PTX/AIX/Solaris. It's Linux and part of the
> appeal, part of the reason you are here, is that it is (was) simple.
> All you are doing is suggesting that it should be more complex.
> I don't agree at all.
Why do you think it was simple before?
bye, Roman
I wrote:
> The problem is, of course, that to responsibly use the BKL, you must
> fully understand ALL the code that utilizes it, so that you know your
> new use of it doesn't conflict or interfere with existing code and
> usage. That's the same problem we have when it DOES show contention --
> is the problem in the functions which can't grab it (for trying
> unnecessarily), or in the functions that can (for holding it
> unnecessarily)?
Greg responded:
But the driverfs and USB code do _not_ show contention. Or if they
do (as Dave pointed out in the driverfs code) it's in a
non-critical point in the kernel's life (boot time, USB device
removal, etc.)
Ok, I think this underscores something I haven't seen clearly stated
here before. Yes, if you can demonstrate contention, you have a
problem that needs fixing, and I think few would argue it shouldn't be
addressed.
However, I look at this from a supportability point of view as well --
a lock such as the BKL which is used for multiple reasons by multiple
subsystems, when it DOES show contention, is harder to fix. People
trying to fix old code wonder if a particular section really needs the
BKL, and give up figuring it out because it's too hard.
So while contention can bring a lot of (unwanted) attention to a
spinlock, including the BKL, I want to see the BKL reduced even if it's
NOT showing contention, because it makes it easier to debug the cases
where it IS being used. Unless a developer is relying on the
release-on-sleep mechanism or the nested-locks-without-deadlock
mechanism, there's no reason an instance of the BKL can't be replaced
with another spinlock.
Arguing that efforts to replace it have indicated we don't know
how it is really used just supports my point. I don't think anybody
exists who can claim (or will admit :) that they know why the BKL is
used everywhere it is. We either need to develop that person (and keep
them safe from harm!) or make it easier to understand "on the fly".
I believe that reducing the varied usages, even harmless ones, will
make it easier to either clearly document the remaining usages or
to understand it "on the fly".
Rick
Wait! You're still not getting it. I'm not against removing the
BKL. I'm not saying we should add it to more places. What I am
complaining about (and I'm not the only one) is the method that
people who are trying to remove the BKL from various kernel
subsystems are using.
[ ... ]
So please, no more "hit and run" BKL patches that break things.
Please come offering to help, with detailed reasons why BKL usage
is wrong in the specific portions of the code, and how possibly it
might be cleaned up, with patches that have _actually been
tested_.
I understand. But there's a bit of a problem here. We don't have
every device available to us that uses the BKL. The maintainers, I
presume, do. While we can inspect to our little hearts' content, and
code up "obviously" correct patches, nothing really tests it like,
well, testing it. In the past, in writing to maintainers, we have
sometimes found that a) they ignore email that does not include actual
patches, or b) they don't know themselves if it is safe, and are afraid
or unwilling to try it themselves. So there's no means to discuss
changes, no means to test them, and (some) maintainers then refuse to
apply ANY changes.
(Others take the opposite approach, and apply anything that looks like
a patch -- ALSO probably not the desired result :)
This is not to say that you, as a maintainer, behave this way. But we
HAVE observed this behavior in our efforts. What is the proper
response when a maintainer won't apply a change, won't help test it,
and won't even help derive a correct one? About all that's left at
that point, it seems, is to provide a patch, appeal to the community at
large, and ask them to try it or comment upon it.
Rick
On Tue, Jul 09, 2002 at 12:33:13PM -0700, Rick Lindsley wrote:
> Wait! You're still not getting it. I'm not against removing the
> BKL. I'm not saying we should add it to more places. What I am
> complaining about (and I'm not the only one) is the method that
> people who are trying to remove the BKL from various kernel
> subsystems are using.
>
> [ ... ]
>
> So please, no more "hit and run" BKL patches that break things.
> Please come offering to help, with detailed reasons why BKL usage
> is wrong in the specific portions of the code, and how possibly it
> might be cleaned up, with patches that have _actually been
> tested_.
>
> I understand. But there's a bit of a problem here. We don't have
> every device available to us that uses the BKL.
Wait. The patches I specifically mention are in regards to devices that
both of you should have access to. If not, you could have simply walked
over and asked me for one. A single USB mouse or keyboard would have
proved that both the input and the driverfs patches were wrong. Since
the patch was for these types of subsystems, not even attempting to test
it out is indefensible.
> The maintainers, I presume, do.
This too is incorrect. I do not have some devices for drivers that I
have created and maintain. I think this is also true for the majority
of the joystick drivers. Granted, this isn't the best thing, but
companies aren't giving us free devices to write our free drivers for.
We rely on users for testing sometimes (debugging through email does
work, albeit slowly :)
> What is the proper response when a maintainer won't apply a change,
> won't help test it, and won't even help derive a correct one?
Post it to a public mailing list (like lkml) where people in general
care about these things. Enough hounding will either shame the
maintainer into replying, or you might find yourself becoming the
maintainer if no one steps up (like ATM :)
greg k-h
On Tue, Jul 09, 2002 at 12:31:13PM -0700, Rick Lindsley wrote:
> Unless a developer is relying on the release-on-sleep mechanism or the
> nested-locks-without-deadlock mechanism, there's no reason an instance
> of the BKL can't be replaced with another spinlock.
Um, not true. You can call schedule with the BKL held, not true for a
spinlock.
And see the oft repeated messages on lkml about the spinlock/semaphore
hell that some oses have turned into when they try to do this. Let's
learn from history this time around please.
greg k-h
I wrote:
> Unless a developer is relying on the release-on-sleep mechanism or the
> nested-locks-without-deadlock mechanism, there's no reason an instance
> of the BKL can't be replaced with another spinlock.
Greg replied:
Um, not true. You can call schedule with the BKL held, not true for a
spinlock.
Well, this is what I meant by the release-on-sleep. When you go to
sleep, the BKL is actually released in the scheduler.
Anything which is relying on this behavior needs to be redesigned. It
is holding, at best, an advisory lock. The lock, after all, is only in
effect until you go to sleep. No other lock behaves this way. Is
there any code in the kernel which MUST have this feature? The code
should be written so that the code itself releases any locks it holds
before putting itself to sleep. "magic" which happens simply because
you down'ed a semaphore is just more obfuscated code to the uninitiated
and more trouble to even the informed.
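As a rough sketch of the rewrite being argued for here, assuming the
2.5-era lock_kernel()/unlock_kernel() and an invented wait queue and
condition (this is not code from any real driver):

#include <linux/smp_lock.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(some_wait_queue);        /* hypothetical */
static int condition;                                   /* hypothetical */

/* Before: relies on schedule() silently dropping and re-taking the BKL. */
static void old_style(void)
{
        lock_kernel();
        /* ... touch shared state ... */
        sleep_on(&some_wait_queue);     /* BKL quietly released while asleep */
        /* ... more work; BKL quietly re-acquired on wakeup ... */
        unlock_kernel();
}

/* After: the release is explicit, so a reader can see the lock is not
 * actually held across the sleep. */
static void new_style(void)
{
        lock_kernel();
        /* ... touch shared state ... */
        unlock_kernel();

        wait_event(some_wait_queue, condition);

        lock_kernel();
        /* ... more work ... */
        unlock_kernel();
}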
And see the oft repeated messages on lkml about the
spinlock/semaphore hell that some oses have turned into when they
try to do this. Let's learn from history this time around please.
Fire burns, but we seem to have gotten a handle on it.
Proper locking is hard, but that's not a reason to ignore it. Be
careful, yes. Avoid, no.
Rick
On Tue, Jul 09, 2002 at 01:55:55PM -0700, Rick Lindsley wrote:
> Well, this is what I meant by the release-on-sleep. When you go to
> sleep, the BKL is actually released in the scheduler.
> Anything which is relying on this behavior needs to be redesigned. It
This is an ugly aspect. But AFAICT the most that's needed to clean it
up is an explicit release before potentially sleeping.
Cheers,
Bill
On Tue, 2002-07-09 at 14:00, William Lee Irwin III wrote:
> This is an ugly aspect. But AFAICT the most that's needed to clean it
> up is an explicit release before potentially sleeping.
Yep that is all we need... remove the release_kernel_lock() and
reacquire_kernel_lock() from schedule and do it explicitly everywhere it
is needed.
The problem is, it is needed in a _lot_ of places. Mostly instances
where the lock is held across something that may implicitly sleep.
Places that call schedule() explicitly holding the BKL are rare enough
we can probably handle them. I have a patch that does so (thus turning
all cond_resched() calls into no-ops with the preemptive kernel -- my
goal). The other implicit situations are near impossible to handle.
Summary is, I would love to do things like dismantle the BKLs odd-ball
features... cleanly and safely. Good luck ;)
Robert Love
On Tue, Jul 09, 2002 at 02:12:55PM -0700, Robert Love wrote:
> Summary is, I would love to do things like dismantle the BKLs odd-ball
> features... cleanly and safely. Good luck ;)
This is even automatable. "Can doing this schedule me away?" and "Is
the BKL held here?" can be handled by machines just fine so long as the
answer to the latter is not "sometimes", and when it is, it can be auto-
identified and fixed up by hand. Given something that can parse a cscope
database and maybe do some transitive closure operations on subgraphs of
the call graph this should be doable in a few days on decent hardware.
There's also a nice advantage in it being harder to screw up than removal.
Callbacks can of course be handled by relaxing the condition to "may call"
and propagating values around, though that might be done better by
stealing a lint front end as opposed to a cscope database, especially for
taking advantage of type information. For instance, the callbacks will be
a component with a given name in a struct of a given type. So finding the
set of all functions assigned to that component of a variable having that
struct as its type should suffice to bang out the "may call" relation then.
All of the above actually applies to "call function F under lock L", so
the effort could be reused to, for instance, find sleep under spinlock
scenarios, or failures to hold L while calling F. Last, but not least,
it may well be the case that all the code has already been written, and
is waiting for someone to use it. There are some open-source static
checkers coming out it seems.
... if only it weren't such a PITA to write...
Cheers,
Bill
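A toy illustration of the propagation step Bill describes, written as
ordinary user-space C over an invented in-memory call graph; a real tool
would populate the graph from a cscope database or a lint front end rather
than by hand.

#include <stdbool.h>
#include <stddef.h>

struct func {
        const char      *name;
        bool            may_sleep;      /* seeded true for known sleepers */
        struct func     **callees;      /* NULL-terminated list */
};

/* Iterate to a fixed point: a caller of a possible sleeper may sleep too. */
static void propagate_may_sleep(struct func **funcs, size_t n)
{
        bool changed = true;

        while (changed) {
                changed = false;
                for (size_t i = 0; i < n; i++) {
                        struct func *f = funcs[i];

                        if (f->may_sleep)
                                continue;
                        for (struct func **c = f->callees; *c; c++) {
                                if ((*c)->may_sleep) {
                                        f->may_sleep = true;
                                        changed = true;
                                        break;
                                }
                        }
                }
        }
}

The same loop, run over a "holds the BKL" flag instead, gives the other half
of the cross-check described above.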
> Places that call schedule() explicitly holding the BKL are rare enough
> we can probably handle them. I have a patch that does so (thus turning
> all cond_resched() calls into no-ops with the preemptive kernel -- my
> goal). The other implicit situations are near impossible to handle.
There are lots of them hiding 8)
> Summary is, I would love to do things like dismantle the BKLs odd-ball
> features... cleanly and safely. Good luck ;)
You can actually do it with some testing to catch the missed cases. Move
them to spinlocks, lob a check for whether the lock is held into the
schedule code, run it for a while through its various code paths, then
remove the debug once you trust it.
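One way to read that suggestion, sketched with invented names (owned_lock,
foo_lock). Note that a bare spin_is_locked() in schedule() would not do,
since on SMP it cannot tell whether *this* task is the holder, so the sketch
records an owner field for debug builds.

#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/sched.h>

/* Debug-only owner tracking for a lock that used to be under the BKL. */
struct owned_lock {
        spinlock_t              lock;
        struct task_struct      *owner;         /* debug only */
};

static struct owned_lock foo_lock = { .lock = SPIN_LOCK_UNLOCKED };

static inline void foo_lock_acquire(void)
{
        spin_lock(&foo_lock.lock);
        foo_lock.owner = current;
}

static inline void foo_lock_release(void)
{
        foo_lock.owner = NULL;
        spin_unlock(&foo_lock.lock);
}

/*
 * Temporary check to drop into schedule(): any path that still relies on
 * the old BKL behaviour of being released across a sleep will trip this.
 */
static inline void debug_check_foo_lock(void)
{
        BUG_ON(foo_lock.owner == current);
}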
[kernel-janitor people please Cc: me if you're going to remove l-k from
the Cc: list]
On Mon, 8 Jul 2002, Larry McVoy wrote:
[...]
> If you were talking about replacing a big lock with one lock, you
> might have a point. But you aren't. You can't be, because by
> definition if you take out the big lock you have to put in lots
> of little locks. And then you get to discover all the problems that
> the BKL was hiding that you just exposed by removing it.
[...]
> Can you tell me the list of locks and what they cover in the last
> multi threaded OS you worked in? I thought not. Nobody could.
Larry, there's something I've always wanted to ask you about your
idea of the "locking cliff": when you're counting the number of locks,
are you looking at the running image of an OS or at the source?
I mean, suppose we have a file object, with its private lock used
to protect access to its "internals" (whatever that means): of course
we may have thousands of these locks living in the running image of
the OS at a given time (most of them unused, BTW). At source level,
it is only one lock, instead. So are you worried about the proliferation
of locks in the source or in the live image of the OS? (I can see
arguments against both.)
.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ [email protected]
On Tue, Jul 09, 2002 at 11:21:25PM +0100, Alan Cox wrote:
>
> There are lots of them hiding 8)
Just out of curiosity. If I remember correctly SMP came to Linux when
Caldera hired you to make it work. Did you invent the BKL?
Thanks,
Jim
> On Tue, Jul 09, 2002 at 11:21:25PM +0100, Alan Cox wrote:
> >
> > There are lots of them hiding 8)
>
> Just out of curiosity. If I remember correctly SMP came to Linux when
> Caldera hired you to make it work. Did you invent the BKL?
Caldera bought the hardware, rather than hiring me. Having said that, at
the time the dual P90 board + processors was not exactly cheap. The board
btw is alive and well and currently owned by Dave Jones.
As far as the locking goes I invented the big kernel lock, but the basis of
that is all taken directly from "Unix systems for modern architectures"
by Schimmel which is required reading for anyone who cares about caches,
SMP and locking.
"I'd prefer the trees to be separate for testing purposes: it
doesn't make much sense to have SMP support as a normal kernel
feature when most people won't have SMP anyway"
-- Linus Torvalds
Alan
On Wed, Jul 10, 2002 at 12:03:08PM +0200, Marco Colombo wrote:
> Larry, there's something I've always wanted to ask you about your
> idea of the "locking cliff": when you're counting the number of locks,
> are you looking at the running image of an OS or at the source?
Larry normally talks about the number of conceptual locks. So in order
to manipulate a `struct file', it really doesn't matter whether you have
to grab the BKL, the files_struct lock or the filp->lock. There's a big
difference if you have to grab the filp->pos_lock, the filp->ra_lock and
the filp->iobuf_lock. You'd have to know what order to grab them in,
for a start.
--
Revolutions do not require corporate support.
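A toy illustration of the ordering burden Matthew is pointing at; the struct
and lock names below just echo his hypothetical per-file locks and are not
real fields of struct file.

#include <linux/spinlock.h>

struct toy_file {
        spinlock_t      pos_lock;       /* protects the file position */
        spinlock_t      ra_lock;        /* protects readahead state */
};

/*
 * Every path that needs more than one of the fine-grained locks must agree
 * on a single documented order (here: pos_lock before ra_lock), or two
 * paths taking them in opposite orders can deadlock against each other.
 */
static void update_pos_and_ra(struct toy_file *f)
{
        spin_lock(&f->pos_lock);
        spin_lock(&f->ra_lock);
        /* ... update position and readahead state together ... */
        spin_unlock(&f->ra_lock);
        spin_unlock(&f->pos_lock);
}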
On Wed, Jul 10, 2002 at 12:03:08PM +0200, Marco Colombo wrote:
>> Larry, there's something I've always wanted to ask you about your
>> idea of the "locking cliff": when you're counting the number of locks,
>> are you looking at the running image of an OS or at the source?
On Wed, Jul 10, 2002 at 03:40:03PM +0100, Matthew Wilcox wrote:
> Larry normally talks about the number of conceptual locks. So in order
> to manipulate a `struct file', it really doesn't matter whether you have
> to grab the BKL, the files_struct lock or the filp->lock. There's a big
> difference if you have to grab the filp->pos_lock, the filp->ra_lock and
> the filp->iobuf_lock. You'd have to know what order to grab them in,
> for a start.
This is called "lock depth" and is not related to the total number of
locks declared in the source. AFAIK no one wants to increase lock depth.
Cheers,
Bill
I wrote:
v
> With a narrowly defined and used lock, it is much less difficult to
^
Larry responded:
If you were talking about replacing a big lock with one lock, you
might have a point. But you aren't. You can't be, because by
definition if you take out the big lock you have to put in lots
of little locks. And then you get to discover all the problems that
the BKL was hiding that you just exposed by removing it.
And this is bad ... how? A bad locking mechanism, or incorrect usage of
a good one, is exposed. Should it not be?
If you think that managing those is easier than managing the BKL,
you don't understand the first thing about threading.
So threading is difficult to manage or even impossible without a global
lock for everyone to use? True -- I suppose that would force people to
think about what they are locking and which lock is appropriate to avoid
unnecessary contention. It would require that the new locks' scopes and
assumptions be documented instead of handed down verbally from
teacher to student. It would have the side effect of making it easier
for a newcomer to come up to speed on a particular section of code, thus
allowing a greater number of people to understand the code and offer fixes
or enhancements.
I think the kernel crowd is starting to sense how complex things
are getting and are pushing back a bit. Don't fight them, this
isn't IRIX/Dynix/PTX/AIX/Solaris.
In some ways, that's a shame. All of those have valuable parts to them.
It's Linux and part of the
appeal, part of the reason you are here, is that it is (was) simple.
Feel free to return to version 7 anytime. Life was simpler with single
cpus and only physical memory. It is true that more complex hardware
has demanded more complex software.
All you are doing is suggesting that it should be more complex.
I don't agree at all.
I respect your right to disagree, but I contest the assertion that
"all" we are doing is making it more complex. In general, finer
grained locking is a good thing, as it isolates contention (both
present AND future) in areas that truly are contending for the same
resource. As with all things, it is possible to overdo it. While
certainly true on SMP machines, this is especially underscored with
NUMA machines. Sometimes the cost of allowing other processors to have
access to a lock or set of data (and the attendant flushing and
reloading of caches) can actually make it more efficient to hold the
lock longer than is necessary for the simple operation you are
attempting. This is the exception, however, not the rule, and is not a
reason to abandon finer-grained locking.
> So can you define for me under what conditions the BKL is appropriate
> to use?
Can you tell me for sure that there are no races introduced by your
proposed change?
In many cases, no more than you can tell me, "for sure", what the
weather will be tomorrow. I can tell you, after inspection, that I
*believe* there are no races and you would learn, after enough times,
that I'm right more often than not. You'd learn that my word (and the
implied testing behind it) is sufficient. I don't expect anybody to
grant that right off. And even then, the world will find I'm still
wrong sometimes. This truly is the way Linux is supported now.
(And you can replace "I" with, I think, about any Linux luminary.)
The reason I asked the question is to point out that nobody CAN answer
it today. That's bad. Every subsystem should have either a maintainer
who can answer these sorts of questions, or clear documentation that
answers it for us. The BKL has (unfortunately) evolved beyond that.
This is exactly the supportability issue that reduction of the BKL will
address. Replacing an instance of it with a well-defined,
well-documented lock should make the new lock's use obvious to
newcomers, and the BKL's remaining uses marginally easier to
understand.
Can you tell me the list of locks and what they cover in the last
multi threaded OS you worked in? I thought not. Nobody could.
Off the top of my head? no. I'm not a walking encyclopedia. From
comments and supporting documentation and supporting staff? about
80%. The 20% that could not be clearly identified tended to provide a
disproportionate number of bugs, and I think most people agreed they
were undesirable from a support standpoint.
Your argument seems to be, "things are very complicated right now but
they often work, so let's not change anything." I would argue that
unless these things ARE understood, we are destined to spend a lot of
time uncovering bugs in the future -- so let's improve it now.
Rick
On Wednesday 10 July 2002 23:28, Rick Lindsley wrote:
> So threading is difficult to manage or even impossible without a global
> lock for everyone to use? True -- I suppose that would force people to
> think about what they are locking and which lock is appropriate to avoid
> unnecessary contention. It would require that the new locks' scopes and
> assumptions be documented instead of handed down verbally from
> teacher to student.
Hear hear! Well, Al at least has made a pretty good attempt to do that
for VFS locks. The rest of the kernel is pretty much a disaster. If
we're lucky, we find the odd comment here and there: 'must be called
holding such-and-such lock', and on a good day the comment is even
correct. Which reminds me of a janitorial idea I discussed briefly with
Acme, which is to replace all those above-the-function lock coverage
comments with assert-like thingies:
spin_assert(&pagemap_lru_lock);
And everybody knows what that does: when compiled with no spinlock
debugging it does nothing, but with spinlock debugging enabled, it oopses
unless pagemap_lru_lock is held at that point in the code. The practical
effect of this is that lots of 3 line comments get replaced with a
one line assert that actually does something useful. That is, besides
documenting the lock coverage, this thing will actually check to see if
you're telling the truth, if you ask it to.
Oh, and they will stay up to date much better than the comments do,
because nobody needs to be an ueber-hacker to turn on the option and
post any resulting oopses to lkml.
> It would have the side effect of making it easier
> for a newcomer to come up to speed on a particular section of code, thus
> allowing a greater number of people to understand the code and offer fixes
> or enhancements.
Yup, and double-yup.
--
Daniel
On Thu, Jul 11, 2002 at 12:24:06AM +0200, Daniel Phillips wrote:
> Acme, which is to replace all those above-the-function lock coverage
> comments with assert-like thingies:
>
> spin_assert(&pagemap_lru_lock);
>
> And everybody knows what that does: when compiled with no spinlock
> debugging it does nothing, but with spinlock debugging enabled, it oopses
> unless pagemap_lru_lock is held at that point in the code. The practical
> effect of this is that lots of 3 line comments get replaced with a
> one line assert that actually does something useful. That is, besides
> documenting the lock coverage, this thing will actually check to see if
> you're telling the truth, if you ask it to.
>
> Oh, and they will stay up to date much better than the comments do,
> because nobody needs to be an ueber-hacker to turn on the option and
> post any resulting oopses to lkml.
Sounds like a great idea to me. Were you thinking of something along
the lines of what I have below or perhaps something more
sophisticated? I suppose it would be helpful to have the name of the
lock in addition to the file and line number...
Jesse
diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/fs/inode.c linux-2.5.25-spinassert/fs/inode.c
--- linux-2.5.25/fs/inode.c Fri Jul 5 16:42:38 2002
+++ linux-2.5.25-spinassert/fs/inode.c Wed Jul 10 16:30:18 2002
@@ -183,6 +183,8 @@
*/
void __iget(struct inode * inode)
{
+ spin_assert_locked(&inode_lock);
+
if (atomic_read(&inode->i_count)) {
atomic_inc(&inode->i_count);
return;
diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/include/linux/spinlock.h linux-2.5.25-spinassert/include/linux/spinlock.h
--- linux-2.5.25/include/linux/spinlock.h Fri Jul 5 16:42:24 2002
+++ linux-2.5.25-spinassert/include/linux/spinlock.h Wed Jul 10 16:20:06 2002
@@ -118,6 +118,18 @@
#endif /* !SMP */
+/*
+ * Simple lock assertions for debugging and documenting where locks need
+ * to be locked/unlocked.
+ */
+#ifdef CONFIG_DEBUG_SPINLOCK
+#define spin_assert_locked(lock) if (!spin_is_locked(lock)) { printk("lock assertion failure: lock at %s:%d should be locked!\n", __FILE__, __LINE__); }
+#define spin_assert_unlocked(lock) if (spin_is_locked(lock)) { printk("lock assertion failure: lock at %s:%d should be unlocked!\n", __FILE__, __LINE__); }
+#else
+#define spin_assert_locked(lock) do { } while(0)
+#define spin_assert_unlocked(lock) do { } while(0)
+#endif /* CONFIG_DEBUG_SPINLOCK */
+
#ifdef CONFIG_PREEMPT
asmlinkage void preempt_schedule(void);
On Jul 10, 2002 16:36 -0700, Jesse Barnes wrote:
> Sounds like a great idea to me. Were you thinking of something along
> the lines of what I have below or perhaps something more
> sophisticated? I suppose it would be helpful to have the name of the
> lock in addition to the file and line number...
>
> Jesse
>
>
> diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/fs/inode.c linux-2.5.25-spinassert/fs/inode.c
> --- linux-2.5.25/fs/inode.c Fri Jul 5 16:42:38 2002
> +++ linux-2.5.25-spinassert/fs/inode.c Wed Jul 10 16:30:18 2002
> @@ -183,6 +183,8 @@
> */
> void __iget(struct inode * inode)
> {
> + spin_assert_locked(&inode_lock);
> +
> if (atomic_read(&inode->i_count)) {
> atomic_inc(&inode->i_count);
> return;
> diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/include/linux/spinlock.h linux-2.5.25-spinassert/include/linux/spinlock.h
> --- linux-2.5.25/include/linux/spinlock.h Fri Jul 5 16:42:24 2002
> +++ linux-2.5.25-spinassert/include/linux/spinlock.h Wed Jul 10 16:20:06 2002
> @@ -118,6 +118,18 @@
>
> #endif /* !SMP */
>
> +/*
> + * Simple lock assertions for debugging and documenting where locks need
> + * to be locked/unlocked.
> + */
> +#ifdef CONFIG_DEBUG_SPINLOCK
> +#define spin_assert_locked(lock) if (!spin_is_locked(lock)) { printk("lock assertion failure: lock at %s:%d should be locked!\n", __FILE__, __LINE__); }
You can use CPP to add in the lock name like:
#define spin_assert_locked(lock) if (!spin_is_locked(lock)) \
printk("lock assertion error: %s:%d: " #lock \
" should be locked!\n", __FILE__, __LINE__)
#define spin_assert_unlocked(lock) if (spin_is_locked(lock)) \
printk("lock assertion error: %s:%d: " #lock \
" should be unlocked!\n", __FILE__, __LINE__)
Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
On Wed, Jul 10, 2002 at 06:54:24PM -0600, Andreas Dilger wrote:
> You can use CPP to add in the lock name like:
>
> #define spin_assert_locked(lock) if (!spin_is_locked(lock)) \
> printk("lock assertion error: %s:%d: " #lock \
> " should be locked!\n", __FILE__, __LINE__)
>
> #define spin_assert_unlocked(lock) if (spin_is_locked(lock)) \
> printk("lock assertion error: %s:%d: " #lock \
> " should be unlocked!\n", __FILE__, __LINE__)
Oh yeah, I should have done that the first time. How does this look?
Thanks,
Jesse
diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/fs/inode.c linux-2.5.25-spinassert/fs/inode.c
--- linux-2.5.25/fs/inode.c Fri Jul 5 16:42:38 2002
+++ linux-2.5.25-spinassert/fs/inode.c Wed Jul 10 16:30:18 2002
@@ -183,6 +183,8 @@
*/
void __iget(struct inode * inode)
{
+ spin_assert_locked(&inode_lock);
+
if (atomic_read(&inode->i_count)) {
atomic_inc(&inode->i_count);
return;
diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/include/linux/spinlock.h linux-2.5.25-spinassert/include/linux/spinlock.h
--- linux-2.5.25/include/linux/spinlock.h Fri Jul 5 16:42:24 2002
+++ linux-2.5.25-spinassert/include/linux/spinlock.h Wed Jul 10 18:05:13 2002
@@ -118,6 +118,22 @@
#endif /* !SMP */
+/*
+ * Simple lock assertions for debugging and documenting where locks need
+ * to be locked/unlocked.
+ */
+#if defined(CONFIG_DEBUG_SPINLOCK) && defined(SMP)
+#define spin_assert_locked(lock) if (!spin_is_locked(lock)) { \
+ printk("lock assertion failure: %s:%d lock " #lock \
+ "should be locked!\n", __FILE__, __LINE__); }
+#define spin_assert_unlocked(lock) if (spin_is_locked(lock)) { \
+ printk("lock assertion failure: %s:%d lock " #lock \
+ "should be unlocked!\n", __FILE__, __LINE__); }
+#else
+#define spin_assert_locked(lock) do { } while(0)
+#define spin_assert_unlocked(lock) do { } while(0)
+#endif /* CONFIG_DEBUG_SPINLOCK && SMP */
+
#ifdef CONFIG_PREEMPT
asmlinkage void preempt_schedule(void);
On Thursday 11 July 2002 01:36, Jesse Barnes wrote:
> On Thu, Jul 11, 2002 at 12:24:06AM +0200, Daniel Phillips wrote:
> > Acme, which is to replace all those above-the-function lock coverage
> > comments with assert-like thingies:
> >
> > spin_assert(&pagemap_lru_lock);
> >
> > And everybody knows what that does: when compiled with no spinlock
> > debugging it does nothing, but with spinlock debugging enabled, it oopses
> > unless pagemap_lru_lock is held at that point in the code. The practical
> > effect of this is that lots of 3 line comments get replaced with a
> > one line assert that actually does something useful. That is, besides
> > documenting the lock coverage, this thing will actually check to see if
> > you're telling the truth, if you ask it to.
> >
> > Oh, and they will stay up to date much better than the comments do,
> > because nobody needs to be an ueber-hacker to turn on the option and
> > post any resulting oopses to lkml.
>
> Sounds like a great idea to me. Were you thinking of something along
> the lines of what I have below or perhaps something more
> sophisticated? I suppose it would be helpful to have the name of the
> lock in addition to the file and line number...
I was thinking of something as simple as:
#define spin_assert_locked(LOCK) BUG_ON(!spin_is_locked(LOCK))
but in truth I'd be happy regardless of the internal implementation. A note
on names: Linus likes to shout the names of his BUG macros. I've never been
one for shouting, but it's not my kernel, and anyway, I'm happy he now likes
asserts. I bet he'd like it more spelled like this though:
MUST_HOLD(&lock);
And, dare I say it, what I'd *really* like to happen when the thing triggers
is to get dropped into kdb. Ah well, perhaps in a parallel universe...
When one of these things triggers I do think you want everything to come to
a screeching halt, since, to misquote Matrix, "you're already dead", and you
don't want any one-per-year warnings to slip off into the gloomy depths of
some forgotten log file.
--
Daniel
On Thursday 11 July 2002 01:36, Jesse Barnes wrote:
+#define spin_assert_unlocked(lock) if (spin_is_locked(lock)) { printk("lock assertion failure: lock at %s:%d should be unlocked!\n", __FILE__, __LINE__); }
I suppose what would be at least as helpful is to check whether _any_
lock is held, e.g. when calling a potentially sleeping function.
Something along these lines:
#ifdef CONFIG_DEBUG_SPINLOCK
extern char *volatile last_spinlock[NR_CPUS];
#define spin_assert_unlocked_all() ({ \
char *lock = last_spinlock[smp_processor_id()]; \
if (lock) { \
printk (KERN_CRIT "%s:%d: lock %s is held\n", \
__func__, __LINE__, lock); \
BUG(); \
} \
})
#define spin_lock(lock) ({ \
last_spinlock[smp_processor_id()] = __stringify(lock) "@" \
__FILE__ ":" __stringify(__LINE__); \
__really_spin_lock(lock); \
})
#endif
Probably, a per-cpu lock depth should also be used to catch
spin_lock(foo_lock);
spin_lock(bar_lock);
spin_unlock(bar_lock);
spin_assert_unlocked_all();
Arnd <><
On Wed, 10 Jul 2002, William Lee Irwin III wrote:
> On Wed, Jul 10, 2002 at 12:03:08PM +0200, Marco Colombo wrote:
> >> Larry, there's something I've always wanted to ask you about your
> >> idea of the "locking cliff": when you're counting the number of locks,
> >> are you looking at the running image of an OS or at the source?
>
> On Wed, Jul 10, 2002 at 03:40:03PM +0100, Matthew Wilcox wrote:
> > Larry normally talks about the number of conceptual locks. So in order
> > to manipulate a `struct file', it really doesn't matter whether you have
> > to grab the BKL, the files_struct lock or the filp->lock. There's a big
> > difference if you have to grab the filp->pos_lock, the filp->ra_lock and
> > the filp->iobuf_lock. You'd have to know what order to grab them in,
> > for a start.
>
> This is called "lock depth" and is not related to the total number of
> locks declared in the source. AFAIK no one wants to increase lock depth.
Not directly of course. But with one BKL, depth is limited to one.
With one lock per subsystem, it's hardly more than one. But with 10000
locks spread into the source, it's quite unlikely you can do *anything*
without holding two locks at least. I think one of the points of Larry's
"locking cliff" idea is that when the lock depth grows beyond your
ability (or time) to really handle it, the only solution is to create
"your own" lock, just to play safe. But this way you increase both the
global number of locks *and* the lock depth of your code (by one at least).
I don't fully agree with Larry: I blame the lack of a clean locking model.
If you start from one BKL and push it "down", just increasing the number
of locks every time you feel you need to, you get a mess. Locking should be
done globally at the same level: either at the top (BKL), at the top of
each subsystem, or at "object" level.
But I realize I'm speaking out of imagination, Larry out of experience.
And it really makes a difference (to me, at least).
.TM.
On Thu, 11 Jul 2002, Daniel Phillips wrote:
> I was thinking of something as simple as:
>
> #define spin_assert_locked(LOCK) BUG_ON(!spin_is_locked(LOCK))
>
> but in truth I'd be happy regardless of the internal implementation. A note
> on names: Linus likes to shout the names of his BUG macros. I've never been
> one for shouting, but it's not my kernel, and anyway, I'm happy he now likes
> asserts. I bet he'd like it more spelled like this though:
>
> MUST_HOLD(&lock);
I prefer that form too.
> And, dare I say it, what I'd *really* like to happen when the thing triggers
> is to get dropped into kdb. Ah well, perhaps in a parallel universe...
It ought to.
As long as we're talking about spinlock debugging, I've found it extremely
useful to add an entry to the spinlock to record where the spinlock was
taken.
--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."
On Thu, Jul 11, 2002 at 07:31:09AM +0200, Daniel Phillips wrote:
> I was thinking of something as simple as:
>
> #define spin_assert_locked(LOCK) BUG_ON(!spin_is_locked(LOCK))
>
> but in truth I'd be happy regardless of the internal implementation. A note
> on names: Linus likes to shout the names of his BUG macros. I've never been
> one for shouting, but it's not my kernel, and anyway, I'm happy he now likes
> asserts. I bet he'd like it more spelled like this though:
>
> MUST_HOLD(&lock);
I like lowercase better too, but you're right that Linus likes to
shout...
> And, dare I say it, what I'd *really* like to happen when the thing triggers
> is to get dropped into kdb. Ah well, perhaps in a parallel universe...
As long as you've got kdb patched in, this _should_ happen on BUG().
How about this? Are there simple *_is_locked() calls for the other
mutex mechanisms? If so, I could add those too...
Thanks,
Jesse
diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/fs/inode.c linux-2.5.25-spinassert/fs/inode.c
--- linux-2.5.25/fs/inode.c Fri Jul 5 16:42:38 2002
+++ linux-2.5.25-spinassert/fs/inode.c Thu Jul 11 10:59:23 2002
@@ -183,6 +183,8 @@
*/
void __iget(struct inode * inode)
{
+ MUST_HOLD(&inode_lock);
+
if (atomic_read(&inode->i_count)) {
atomic_inc(&inode->i_count);
return;
diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/include/linux/spinlock.h linux-2.5.25-spinassert/include/linux/spinlock.h
--- linux-2.5.25/include/linux/spinlock.h Fri Jul 5 16:42:24 2002
+++ linux-2.5.25-spinassert/include/linux/spinlock.h Thu Jul 11 11:02:17 2002
@@ -116,7 +116,19 @@
#define _raw_write_lock(lock) (void)(lock) /* Not "unused variable". */
#define _raw_write_unlock(lock) do { } while(0)
-#endif /* !SMP */
+#endif /* !CONFIG_SMP */
+
+/*
+ * Simple lock assertions for debugging and documenting where locks need
+ * to be locked/unlocked.
+ */
+#if defined(CONFIG_DEBUG_SPINLOCK) && defined(CONFIG_SMP)
+#define MUST_HOLD(lock) BUG_ON(!spin_is_locked(lock))
+#define MUST_NOT_HOLD(lock) BUG_ON(spin_is_locked(lock))
+#else
+#define MUST_HOLD(lock) do { } while(0)
+#define MUST_NOT_HOLD(lock) do { } while(0)
+#endif /* CONFIG_DEBUG_SPINLOCK && CONFIG_SMP */
#ifdef CONFIG_PREEMPT
On Thursday 11 July 2002 20:03, Jesse Barnes wrote:
> How about this?
It looks good, the obvious thing we don't get is what the actual lock
count is, and actually, we don't care because we know what it is in
this case.
> Are there simple *_is_locked() calls for the other
> mutex mechanisms?
I didn't look, but my guess is no and you're in for some reverse
engineering, or you could just ask Ben.
> If so, I could add those too...
Definitely needed. Note there are both counting and mutex forms of
semaphore and they are actually the same primitive - this doesn't mean
the asserts should treat them the same. That is, it doesn't make
sense to think of the counting form of semaphore in terms of lock
coverage, so actually, we're never interested in semaphore count
values other than 0 and 1, the simple BUG_ON is all we need. This
is nice, not only because it reads well, but because it doesn't
generate a lot of code. This whole feature will be a lot more useful
if code size isn't a problem, in which case I'd tend to run with the
option enabled in the normal course of development.
> Thanks,
> Jesse
>
>
> diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/fs/inode.c linux-2.5.25-spinassert/fs/inode.c
> --- linux-2.5.25/fs/inode.c Fri Jul 5 16:42:38 2002
> +++ linux-2.5.25-spinassert/fs/inode.c Thu Jul 11 10:59:23 2002
> @@ -183,6 +183,8 @@
> */
> void __iget(struct inode * inode)
> {
> + MUST_HOLD(&inode_lock);
> +
> if (atomic_read(&inode->i_count)) {
> atomic_inc(&inode->i_count);
> return;
> diff -Naur -X /home/jbarnes/dontdiff linux-2.5.25/include/linux/spinlock.h linux-2.5.25-spinassert/include/linux/spinlock.h
> --- linux-2.5.25/include/linux/spinlock.h Fri Jul 5 16:42:24 2002
> +++ linux-2.5.25-spinassert/include/linux/spinlock.h Thu Jul 11 11:02:17 2002
> @@ -116,7 +116,19 @@
> #define _raw_write_lock(lock) (void)(lock) /* Not "unused variable". */
> #define _raw_write_unlock(lock) do { } while(0)
>
> -#endif /* !SMP */
> +#endif /* !CONFIG_SMP */
> +
> +/*
> + * Simple lock assertions for debugging and documenting where locks need
> + * to be locked/unlocked.
> + */
> +#if defined(CONFIG_DEBUG_SPINLOCK) && defined(CONFIG_SMP)
> +#define MUST_HOLD(lock) BUG_ON(!spin_is_locked(lock))
> +#define MUST_NOT_HOLD(lock) BUG_ON(spin_is_locked(lock))
> +#else
> +#define MUST_HOLD(lock) do { } while(0)
> +#define MUST_NOT_HOLD(lock) do { } while(0)
> +#endif /* CONFIG_DEBUG_SPINLOCK && CONFIG_SMP */
>
> #ifdef CONFIG_PREEMPT
>
--
Daniel
On Thu, Jul 11, 2002 at 09:17:44PM +0200, Daniel Phillips wrote:
> On Thursday 11 July 2002 20:03, Jesse Barnes wrote:
> > How about this?
>
> It looks good, the obvious thing we don't get is what the actual lock
> count is, and actually, we don't care because we know what it is in
> this case.
Something I've been meaning to hack up for a while is some spinlock
debugging code that adds a FUNCTION_SLEEPS() to (ta-da) functions that
may sleep. This macro then checks whether we're currently holding any
locks, and if so printk's the names of locks held, and where they were taken.
When I came up with the idea[1] I envisioned some linked-lists frobbing,
but in more recent times, we can now check the preempt_count for a
quick-n-dirty implementation (without the additional info of which locks
we hold, lock-taker, etc).
Dave
[1] Not an original idea, in fact I think Ingo came up with an
implementation back in `98 or so.
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
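A quick-and-dirty sketch of the preempt_count() variant Dave describes. The
macro name and message are made up here, and it assumes the 2.5
CONFIG_PREEMPT accounting where every spin_lock bumps preempt_count;
mainline later grew a similar check under the name might_sleep(), but this
is not that implementation.

#include <linux/kernel.h>
#include <linux/preempt.h>

/*
 * Quick-and-dirty "this function may sleep" annotation: with CONFIG_PREEMPT,
 * a non-zero preempt_count() means a spinlock is held (or we are in
 * interrupt context), so sleeping here would be a bug. Without any
 * linked-list bookkeeping we cannot say *which* lock, only that one is held.
 */
#ifdef CONFIG_PREEMPT
#define FUNCTION_SLEEPS()                                               \
        do {                                                            \
                if (preempt_count() != 0)                               \
                        printk(KERN_ERR "%s:%d: sleeping function "     \
                               "called while atomic (count %d)\n",      \
                               __FILE__, __LINE__, preempt_count());    \
        } while (0)
#else
#define FUNCTION_SLEEPS()       do { } while (0)
#endif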
On Friday 12 July 2002 14:07, Dave Jones wrote:
> On Thu, Jul 11, 2002 at 09:17:44PM +0200, Daniel Phillips wrote:
> > On Thursday 11 July 2002 20:03, Jesse Barnes wrote:
> > > How about this?
> >
> > It looks good, the obvious thing we don't get is what the actual lock
> > count is, and actually, we don't care because we know what it is in
> > this case.
>
> Something I've been meaning to hack up for a while is some spinlock
> debugging code that adds a FUNCTION_SLEEPS() to (ta-da) functions that
> may sleep.
Yesss. May I suggest simply SLEEPS()? (Chances are, we know it's a
function.)
> This macro then checks whether we're currently holding any
> locks, and if so printk's the names of locks held, and where they were taken.
And then oopses?
> When I came up with the idea[1] I envisioned some linked-lists frobbing,
> but in more recent times, we can now check the preempt_count for a
> quick-n-dirty implementation (without the additional info of which locks
> we hold, lock-taker, etc).
Spin_lock just has to store the address/location of the lock in a
per-cpu vector, and the assert prints that out when it oopses. Such
bugs won't live too long under those conditions.
Any idea how one might implement NEVER_SLEEPS()? Maybe as:
NEVER_ [code goes here] _SLEEPS
which inc/dec the preempt count, triggering a BUG in schedule().
--
Daniel
On Fri, 2002-07-12 at 05:07, Dave Jones wrote:
> When I came up with the idea[1] I envisioned some linked-lists frobbing,
> but in more recent times, we can now check the preempt_count for a
> quick-n-dirty implementation (without the additional info of which locks
> we hold, lock-taker, etc).
Neat idea. I have seen some other good similar ideas: check
preempt_count on schedule(), check preempt_count in usleep/msleep
(Arjan's idea), and check preempt_count in wakeup/context switch/etc.
code...
Note some of these need one or both of: subtracting out
current->lock_depth+1 since we can sleep with the BKL and NAND'ing out
PREEMPT_ACTIVE as that is set before entering schedule off a preemption.
As we move preempt_count to more of a generic "are we atomic" count in
2.5, these become easier and more useful...
Robert Love
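A sketch of the adjustment Robert describes, with invented helper names; the
exact accounting depends on how lock_kernel() interacts with preempt_count()
in a given tree, so treat this as a paraphrase of his description rather
than working code from any particular kernel.

#include <linux/sched.h>
#include <linux/preempt.h>

/*
 * How atomic are we really? Mask out PREEMPT_ACTIVE (set when schedule()
 * was entered via a preemption) and subtract the BKL's contribution, since
 * sleeping with the BKL held is legal: it is dropped and re-taken for us.
 */
static inline int debug_atomic_count(void)
{
        int count = preempt_count() & ~PREEMPT_ACTIVE;

        if (current->lock_depth >= 0)
                count -= current->lock_depth + 1;

        return count;
}

/*
 * A check like this could then sit in schedule(), in usleep/msleep-style
 * helpers, or in the wakeup and context switch paths:
 *
 *      if (debug_atomic_count() != 0)
 *              printk(KERN_ERR "scheduling while atomic\n");
 */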
On Fri, Jul 12, 2002 at 10:49:42AM -0700, Robert Love wrote:
> > When I came up with the idea[1] I envisioned some linked-lists frobbing,
> > but in more recent times, we can now check the preempt_count for a
> > quick-n-dirty implementation (without the additional info of which locks
> > we hold, lock-taker, etc).
>
> Neat idea. I have seen some other good similar ideas: check
> preempt_count on schedule(), check preempt_count in usleep/msleep
> (Arjan's idea), and check preempt_count in wakeup/context switch/etc.
> code...
Sounds sensible. I'd like to see more self-checking bits added for
preemption. It may be the only way we ever pin down some of the
outstanding "don't make any sense" bugs.
Dave
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
On Fri, 12 Jul 2002, Daniel Phillips wrote:
> On Friday 12 July 2002 14:07, Dave Jones wrote:
> > On Thu, Jul 11, 2002 at 09:17:44PM +0200, Daniel Phillips wrote:
> > > On Thursday 11 July 2002 20:03, Jesse Barnes wrote:
> > > > How about this?
> > >
> > > It looks good, the obvious thing we don't get is what the actual lock
> > > count is, and actually, we don't care because we know what it is in
> > > this case.
> >
> > Something I've been meaning to hack up for a while is some spinlock
> > debugging code that adds a FUNCTION_SLEEPS() to (ta-da) functions that
> > may sleep.
>
> Yesss. May I suggest simply SLEEPS()? (Chances are, we know it's a
> function.)
>
> > This macro then checks whether we're currently holding any
> > locks, and if so printk's the names of locks held, and where they were taken.
>
> And then oopses?
>
> > When I came up with the idea[1] I envisioned some linked-lists frobbing,
> > but in more recent times, we can now check the preempt_count for a
> > quick-n-dirty implementation (without the additional info of which locks
> > we hold, lock-taker, etc).
>
> Spin_lock just has to store the address/location of the lock in a
> per-cpu vector, and the assert prints that out when it oopses. Such
> bugs won't live too long under those conditions.
Store it in the task struct, and store a pointer to the previous (outer
lock) in that lock, then you've got a linked list of locks per task - very
useful. You'll need a helper function - current() is hard to get at from
spinlock.h due to tangled dependencies. As I mentioned before, it can also
be very handy to stash the address of the code that took the lock in the
lock itself.
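A sketch of the bookkeeping Oliver describes, with invented type and field
names; it assumes a held_locks pointer has been added to struct task_struct
for debug builds, and it sidesteps the spinlock.h/current header tangle he
mentions by pretending the helpers live somewhere that can already see both.

#include <linux/spinlock.h>
#include <linux/sched.h>

struct dbg_spinlock {
        spinlock_t              raw;
        void                    *taker;         /* code address that took it */
        struct dbg_spinlock     *outer;         /* lock already held, or NULL */
};

/* assumed debug-only field added to struct task_struct:
 *      struct dbg_spinlock *held_locks;
 */

static inline void dbg_spin_lock(struct dbg_spinlock *l)
{
        spin_lock(&l->raw);
        l->taker = __builtin_return_address(0);
        l->outer = current->held_locks;         /* chain onto whatever we hold */
        current->held_locks = l;
}

static inline void dbg_spin_unlock(struct dbg_spinlock *l)
{
        current->held_locks = l->outer;         /* assumes strictly nested release */
        l->outer = NULL;
        spin_unlock(&l->raw);
}

A SLEEPS()-style check then only has to walk current->held_locks and print
each ->taker to show every lock held and where it was taken.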
> Any idea how one might implement NEVER_SLEEPS()? Maybe as:
>
> NEVER_ [code goes here] _SLEEPS
>
> which inc/dec the preempt count, triggering a BUG in schedule().
NEVER_SLEEPS will only trigger on the rare events that blow up anyway,
while the MAY_SLEEP version catches _potential_ problems even when the
fatal sleep doesn't happen.
--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."
On Friday 12 July 2002 22:41, Oliver Xymoron wrote:
> On Fri, 12 Jul 2002, Daniel Phillips wrote:
> > On Friday 12 July 2002 14:07, Dave Jones wrote:
> > > Something I've been meaning to hack up for a while is some spinlock
> > > debugging code that adds a FUNCTION_SLEEPS() to (ta-da) functions that
> > > may sleep.
> >
> > Yesss. May I suggest simply SLEEPS()? (Chances are, we know it's a
> > function.)
> >
> > > This macro then checks whether we're currently holding any
> > > locks, and if so printk's the names of locks held, and where they were taken.
> >
> > And then oopses?
> >
> > > When I came up with the idea[1] I envisioned some linked-lists frobbing,
> > > but in more recent times, we can now check the preempt_count for a
> > > quick-n-dirty implementation (without the additional info of which locks
> > > we hold, lock-taker, etc).
> >
> > Spin_lock just has to store the address/location of the lock in a
> > per-cpu vector, and the assert prints that out when it oopses. Such
> > bugs won't live too long under those conditions.
>
> Store it in the task struct, and store a pointer to the previous (outer
> lock) in that lock, then you've got a linked list of locks per task - very
> useful.
Yes, thanks, I was in fact feeling a little guilty about proposing that
non-nested solution, even if it would in practice point you at the
correct culprit most of the time.
So in schedule() we check the task struct field and oops if it's non-null.
In other words, one instance of SLEEPS goes in schedule() and others are
sprinkled around the kernel as executable documentation.
> You'll need a helper function - current() is hard to get at from
> spinlock.h due to tangled dependencies.
What is the problem? It looks like (struct thread_info *) can be forward
declared, and hence current_thread_info and get_current can be defined
early. Maybe I didn't sniff at enough architectures.
Anyway, if there are nasty dependencies then they are probably similar
to the ones I unravelled with my early-page patch, which allows struct
page to be declared right at the start of mm.h instead of way later,
below a lot of things that want to reference it.
> As I mentioned before, it can also
> be very handy to stash the address of the code that took the lock in the
> lock itself.
And since the stack is chained through the locks, that just works.
> > Any idea how one might implement NEVER_SLEEPS()? Maybe as:
> >
> > NEVER_ [code goes here] _SLEEPS
> >
> > which inc/dec the preempt count, triggering a BUG in schedule().
>
> NEVER_SLEEPS will only trigger on the rare events that blow up anyway,
> while the MAY_SLEEP version catches _potential_ problems even when the
> fatal sleep doesn't happen.
Yes, that one's already been rejected as largely pointless.
--
Daniel
Alan Cox <[email protected]> writes:
> > On Tue, Jul 09, 2002 at 11:21:25PM +0100, Alan Cox wrote:
> > >
> > > There are lots of them hiding 8)
> >
> > Just out of curiosity. If I remember correctly SMP came to Linux when
> > Caldera hired you to make it work. Did you invent the BKL?
>
> Caldera bought the hardware, rather than hiring me. Having said that, at
> the time the dual P90 board + processors was not exactly cheap. The board
> btw is alive and well and currently owned by Dave Jones.
>
> As far as the locking goes I invented the big kernel lock, but the basis of
> that is all taken directly from "Unix systems for modern architectures"
> by Schimmel which is required reading for anyone who cares about caches,
> SMP and locking.
>
> "I'd prefer the trees to be separate for testing purposes: it
> doesn't make much sense to have SMP support as a normal kernel
> feature when most people won't have SMP anyway"
> -- Linus Torvalds
>
That takes Linux even one step closer to the big commercial world; we
now have a statement to qualify for membership in the same club as
Mr. Gates' "Who needs more than 640kb anyway?" and Olsen's (Digital?)
"There will only be a handful of computers in the nation". :)
mvh,
A
--
Alexander Hoogerhuis | [email protected]
CCNP - CCDP - MCNE - CCSE | +47 908 21 485
"You have zero privacy anyway. Get over it." --Scott McNealy
On 15 Jul 2002, Alexander Hoogerhuis wrote:
> That takes Linux even one step closer to the big commercial world; we
> now have a statement to qualify for membership in the same club as
> Mr. Gates' "Who needs more than 640kb anyway?" and Olsen's (Digital?)
> "There will only be a handful of computers in the nation". :)
With ongoing miniaturisation, Olsen might just be right again ...
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Hi,
On Mon, 15 Jul 2002, Rik van Riel wrote:
> > "There will only be a handfull of computers in the nation". :)
>
> With ongoing miniaturisation, Olsen might just be right again ...
Note: a handful of computers, not a handful of computer.
Miniaturization is a Good Thing[tm], IMHO. I've worked on computers for
quite a while, and I was quite happy watching them shrink. Remember the
Honeywell DDP-110, or whichever it was? The first computer which I managed to
hide in a single corner of a room. With the incredible amount of 12 kB of
storage. Were we happy...
Today, I can hide a computer in my pocket, playing mp3 files. No need to
pull your cupboard behind you. Miniaturization is quite useful when it
comes to carrying things around.
Regards,
Thunder
--
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y-
------END GEEK CODE BLOCK------