2007-05-09 21:06:07

by Jonathan Corbet

[permalink] [raw]
Subject: [PATCH] "volatile considered harmful" document

OK, here's an updated version of the volatile document - as a plain text
file this time. It drops a new file in Documentation/, but might it be
better as an addition to CodingStyle?

Comments welcome,

jon

Tell kernel developers why they shouldn't use volatile.

Signed-off-by: Jonathan Corbet <[email protected]>

diff -ruNp linux-2.6/Documentation/volatile.txt volatile/Documentation/volatile.txt
--- linux-2.6/Documentation/volatile.txt 1969-12-31 17:00:00.000000000 -0700
+++ volatile/Documentation/volatile.txt 2007-05-09 14:56:40.000000000 -0600
@@ -0,0 +1,127 @@
+Why the "volatile" type class should not be used
+------------------------------------------------
+
+The C Programming Language, Second Edition (copyright 1988, still known as
+"the new C book") has the following to say about the volatile keyword:
+
+ The purpose of volatile is to force an implementation to suppress
+ optimization that could otherwise occur. For example, for a
+ machine with memory-mapped input/output, a pointer to a device
+ register might be declared as a pointer to volatile, in
+ order to prevent the compiler from removing apparently redundant
+ references through the pointer.
+
+C programmers have often taken volatile to mean that the variable could be
+changed outside of the current thread of execution; as a result, they are
+sometimes tempted to use it in kernel code when shared data structures are
+being used. Andrew Morton recently called out[1] use of volatile in a
+submitted patch, saying:
+
+ The volatiles are a worry - volatile is said to be
+ basically-always-wrong in-kernel, although we've never managed to
+ document why, and i386 cheerfully uses it in readb() and friends.
+
+In response, Randy Dunlap pulled together some email from Linus[2] on the
+topic and suggested that we could maybe "document why." Here is the
+result.
+
+The point that Linus often makes with regard to volatile is that
+its purpose is to suppress optimization, which is almost never what one
+really wants to do. In the kernel, one must protect accesses to data
+against race conditions, which is very much a different task.
+
+Like volatile, the kernel primitives which make concurrent access to data
+safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
+unwanted optimization. If they are being used properly, there will be no
+need to use volatile as well. If volatile is still necessary, there is
+almost certainly a bug in the code somewhere. In properly-written kernel
+code, volatile can only serve to slow things down.
+
+Consider a typical block of kernel code:
+
+ spin_lock(&the_lock);
+ do_something_on(&shared_data);
+ do_something_else_with(&shared_data);
+ spin_unlock(&the_lock);
+
+If all the code follows the locking rules, the value of shared_data cannot
+change unexpectedly while the_lock is held. Any other code which might
+want to play with that data will be waiting on the lock. The spinlock
+primitives act as memory barriers - they are explicitly written to do so -
+meaning that data accesses will not be optimized across them. So the
+compiler might think it knows what will be in some_data, but the
+spin_lock() call will force it to forget anything it knows. There will be
+no optimization problems with accesses to that data.
+
+If shared_data were declared volatile, the locking would
+still be necessary. But the compiler would also be prevented from
+optimizing access to shared _within_ the critical section,
+when we know that nobody else can be working with it. While the lock is
+held, shared_data is not volatile. This is why Linus says:
+
+ Also, more importantly, "volatile" is on the wrong _part_ of the
+ whole system. In C, it's "data" that is volatile, but that is
+ insane. Data isn't volatile - _accesses_ are volatile. So it may
+ make sense to say "make this particular _access_ be careful", but
+ not "make all accesses to this data use some random strategy".
+
+When dealing with shared data, proper locking makes volatile unnecessary -
+and potentially harmful.
+
+The volatile storage class was originally meant for memory-mapped I/O
+registers. Within the kernel, register accesses, too, should be protected
+by locks, but one also does not want the compiler "optimizing" register
+accesses within a critical section. But, within the kernel, I/O memory
+accesses are always done through accessor functions; accessing I/O memory
+directly through pointers is frowned upon and does not work on all
+architectures. Those accessors are written to prevent unwanted
+optimization, so, once again, volatile is unnecessary.
+
+Another situation where one might be tempted to use volatile is
+when the processor is busy-waiting on the value of a variable. The right
+way to perform a busy wait is:
+
+ while (my_variable != what_i_want)
+ cpu_relax();
+
+The cpu_relax() call can lower CPU power consumption or yield to a
+hyperthreaded twin processor; it also happens to serve as a memory barrier,
+so, once again, volatile is unnecessary. Of course, busy-waiting is
+generally an anti-social act to begin with.
+
+There are still a few rare situations where volatile makes sense in the
+kernel:
+
+ - The above-mentioned accessor functions might use volatile on
+ architectures where direct I/O memory access does work. Essentially,
+ each accessor call becomes a little critical section on its own and
+ ensures that the access happens as expected by the programmer.
+
+ - Inline assembly code which changes memory, but which has no other
+ visible side effects, risks being deleted by GCC. Adding the volatile
+ keyword to asm statements will prevent this removal.
+
+ - The jiffies variable is special in that it can have a different value
+ every time it is referenced, but it can be read without any special
+ locking. So jiffies can be volatile, but the addition of other
+ variables of this type is frowned upon. Jiffies is considered to be a
+ "stupid legacy" issue in this regard.
+
+For most code, none of the above justifications for volatile
+apply. As a result, the use of volatile is likely to be seen as a
+bug and will bring additional scrutiny to the code. Developers who are
+tempted to use volatile should take a step back and think about
+what they are truly trying to accomplish.
+
+NOTES
+-----
+
+[1] http://lwn.net/Articles/233481/
+[2] http://lwn.net/Articles/233482/
+
+CREDITS
+-------
+
+Original impetus and research by Randy Dunlap
+Written by Jonathan Corbet
+Improvements via coments from Satyam Sharma Johannes Stezenbach


2007-05-09 21:45:49

by Jesper Juhl

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

On 09/05/07, Jonathan Corbet <[email protected]> wrote:
> OK, here's an updated version of the volatile document - as a plain text
> file this time. It drops a new file in Documentation/, but might it be
> better as an addition to CodingStyle?
>
IMHO this is better as a sepperate document rather than an adition to
CodingStyle. The use of volatile is not a style issue, it's a
correctness issue, so it doesn't belong in the CodingStyle document.


> Comments welcome,
>
Find a few below :)


> jon
>
> Tell kernel developers why they shouldn't use volatile.
>
> Signed-off-by: Jonathan Corbet <[email protected]>
>
> diff -ruNp linux-2.6/Documentation/volatile.txt volatile/Documentation/volatile.txt

Might I suggest a different filename: volatile-considered-harmful.txt


> --- linux-2.6/Documentation/volatile.txt 1969-12-31 17:00:00.000000000 -0700
> +++ volatile/Documentation/volatile.txt 2007-05-09 14:56:40.000000000 -0600
> @@ -0,0 +1,127 @@
> +Why the "volatile" type class should not be used
> +------------------------------------------------
> +
> +The C Programming Language, Second Edition (copyright 1988, still known as
> +"the new C book") has the following to say about the volatile keyword:
> +
> + The purpose of volatile is to force an implementation to suppress
> + optimization that could otherwise occur. For example, for a
> + machine with memory-mapped input/output, a pointer to a device
> + register might be declared as a pointer to volatile, in
> + order to prevent the compiler from removing apparently redundant
> + references through the pointer.
> +
> +C programmers have often taken volatile to mean that the variable could be
> +changed outside of the current thread of execution; as a result, they are

you write: "... that the variable could be changed outside of the
current thread of execution ..."

I suggest: "... that the variable could be changed outside of the
current thread of execution - a sort of simple atomic variable ..."

> +sometimes tempted to use it in kernel code when shared data structures are
> +being used. Andrew Morton recently called out[1] use of volatile in a
> +submitted patch, saying:
> +
> + The volatiles are a worry - volatile is said to be
> + basically-always-wrong in-kernel, although we've never managed to
> + document why, and i386 cheerfully uses it in readb() and friends.
> +
> +In response, Randy Dunlap pulled together some email from Linus[2] on the
> +topic and suggested that we could maybe "document why." Here is the
> +result.
> +
> +The point that Linus often makes with regard to volatile is that
> +its purpose is to suppress optimization, which is almost never what one
> +really wants to do. In the kernel, one must protect accesses to data
> +against race conditions, which is very much a different task.
> +
> +Like volatile, the kernel primitives which make concurrent access to data
> +safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
> +unwanted optimization. If they are being used properly, there will be no
> +need to use volatile as well. If volatile is still necessary, there is
> +almost certainly a bug in the code somewhere. In properly-written kernel
> +code, volatile can only serve to slow things down.
> +
> +Consider a typical block of kernel code:
> +
> + spin_lock(&the_lock);
> + do_something_on(&shared_data);
> + do_something_else_with(&shared_data);
> + spin_unlock(&the_lock);
> +
> +If all the code follows the locking rules, the value of shared_data cannot
> +change unexpectedly while the_lock is held. Any other code which might
> +want to play with that data will be waiting on the lock. The spinlock
> +primitives act as memory barriers - they are explicitly written to do so -
> +meaning that data accesses will not be optimized across them. So the
> +compiler might think it knows what will be in some_data, but the
> +spin_lock() call will force it to forget anything it knows. There will be
> +no optimization problems with accesses to that data.
> +
> +If shared_data were declared volatile, the locking would
> +still be necessary. But the compiler would also be prevented from
> +optimizing access to shared _within_ the critical section,
> +when we know that nobody else can be working with it. While the lock is
> +held, shared_data is not volatile. This is why Linus says:
> +
> + Also, more importantly, "volatile" is on the wrong _part_ of the
> + whole system. In C, it's "data" that is volatile, but that is
> + insane. Data isn't volatile - _accesses_ are volatile. So it may
> + make sense to say "make this particular _access_ be careful", but
> + not "make all accesses to this data use some random strategy".
> +
> +When dealing with shared data, proper locking makes volatile unnecessary -
> +and potentially harmful.
> +
> +The volatile storage class was originally meant for memory-mapped I/O
> +registers. Within the kernel, register accesses, too, should be protected
> +by locks, but one also does not want the compiler "optimizing" register
> +accesses within a critical section. But, within the kernel, I/O memory
> +accesses are always done through accessor functions; accessing I/O memory
> +directly through pointers is frowned upon and does not work on all
> +architectures. Those accessors are written to prevent unwanted
> +optimization, so, once again, volatile is unnecessary.
> +
> +Another situation where one might be tempted to use volatile is
> +when the processor is busy-waiting on the value of a variable. The right
> +way to perform a busy wait is:
> +
> + while (my_variable != what_i_want)
> + cpu_relax();
> +
> +The cpu_relax() call can lower CPU power consumption or yield to a
> +hyperthreaded twin processor; it also happens to serve as a memory barrier,
> +so, once again, volatile is unnecessary. Of course, busy-waiting is
> +generally an anti-social act to begin with.
> +
> +There are still a few rare situations where volatile makes sense in the
> +kernel:
> +
> + - The above-mentioned accessor functions might use volatile on
> + architectures where direct I/O memory access does work. Essentially,
> + each accessor call becomes a little critical section on its own and
> + ensures that the access happens as expected by the programmer.
> +
> + - Inline assembly code which changes memory, but which has no other
> + visible side effects, risks being deleted by GCC. Adding the volatile
> + keyword to asm statements will prevent this removal.
> +
> + - The jiffies variable is special in that it can have a different value
> + every time it is referenced, but it can be read without any special
> + locking. So jiffies can be volatile, but the addition of other
> + variables of this type is frowned upon. Jiffies is considered to be a

suggestion: "frowned strongly upon"

> + "stupid legacy" issue in this regard.
> +
> +For most code, none of the above justifications for volatile
> +apply. As a result, the use of volatile is likely to be seen as a
> +bug and will bring additional scrutiny to the code. Developers who are
> +tempted to use volatile should take a step back and think about
> +what they are truly trying to accomplish.
> +

Suggested addition :

Patches that remove volatile from current code (provided there's a
good explanation of why the volatile can be removed and how the bug it
was hiding has been dealt with) are a good thing.

Friends don't let friends use volatile.


> +NOTES
> +-----
> +
> +[1] http://lwn.net/Articles/233481/
> +[2] http://lwn.net/Articles/233482/
> +
> +CREDITS
> +-------
> +
> +Original impetus and research by Randy Dunlap
> +Written by Jonathan Corbet
> +Improvements via coments from Satyam Sharma Johannes Stezenbach


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-05-09 22:16:37

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

Heikki Orsila wrote:
> Imo, the best style to relay information is by directly stating facts.
> I'm going to try to suggest some improvements on this..
>
> On Wed, May 09, 2007 at 03:05:44PM -0600, Jonathan Corbet wrote:
>> Andrew Morton recently called out[1] use of volatile in a
>> +submitted patch, saying:
>> +
>> + The volatiles are a worry - volatile is said to be
>> + basically-always-wrong in-kernel, although we've never managed to
>> + document why, and i386 cheerfully uses it in readb() and friends.
>> +
>> +In response, Randy Dunlap pulled together some email from Linus[2] on the
>> +topic and suggested that we could maybe "document why." Here is the
>> +result.
>
> I think previous text is unnecessary. Just tell the reasons why
> volatile is bad and what should be used instead, no need to quote
> people.

Ack, the doc. history isn't needed.

>> +The point that Linus often makes with regard to volatile is that
>> +its purpose is to suppress optimization, which is almost never what one
>> +really wants to do. In the kernel, one must protect accesses to data
>> +against race conditions, which is very much a different task.
>
> Again, I would write this in non-personal way:
>
> "Volatile is often used to prevent optimization, but it is not the
> behavior that is actually wanted. Access to data must be protected and
> handled with kernel provided synchronization, mutual exclusion and
> barriers tools. These tools make volatile useless."
>
>> +Like volatile, the kernel primitives which make concurrent access to data
>> +safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
>> +unwanted optimization. If they are being used properly, there will be no
>> +need to use volatile as well.
>
> The previous suggestion takes care of explaining that, remove this.
>
>> If volatile is still necessary, there is
>> +almost certainly a bug in the code somewhere. In properly-written kernel
>> +code, volatile can only serve to slow things down.
>> +
>> +Consider a typical block of kernel code:
>> +
>> + spin_lock(&the_lock);
>> + do_something_on(&shared_data);
>> + do_something_else_with(&shared_data);
>> + spin_unlock(&the_lock);
>> +
>> +If all the code follows the locking rules, the value of shared_data cannot
>> +change unexpectedly while the_lock is held. Any other code which might
>> +want to play with that data will be waiting on the lock.
>
> Ok
>
>> +The spinlock
>> +primitives act as memory barriers - they are explicitly written to do so -
>> +meaning that data accesses will not be optimized across them.
>
> "Spinlock primitives will serialise execution of code regions, and
> memory barrier functions must be used inside spinlocks to force
> loads and stores."
>
>> So the
>> +compiler might think it knows what will be in some_data, but the
>> +spin_lock() call will force it to forget anything it knows. There will be
>> +no optimization problems with accesses to that data.
>
> I would say it directly:
>
> "Spinlock will force an implicit memory barrier, thus preventing
> optimizations to data access."
>
>> +If shared_data were declared volatile, the locking would
>> +still be necessary. But the compiler would also be prevented from
>> +optimizing access to shared _within_ the critical section,
>> +when we know that nobody else can be working with it. While the lock is
>> +held, shared_data is not volatile.
>
> ok
>
>> This is why Linus says:
>> +
>> + Also, more importantly, "volatile" is on the wrong _part_ of the
>> + whole system. In C, it's "data" that is volatile, but that is
>> + insane. Data isn't volatile - _accesses_ are volatile. So it may
>> + make sense to say "make this particular _access_ be careful", but
>> + not "make all accesses to this data use some random strategy".
>
> Unnecessary quoting, imo. Tell the same information directly without
> personifying the statement.


--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-05-09 22:31:24

by Satyam Sharma

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

On 5/10/07, Jesper Juhl <[email protected]> wrote:
> On 09/05/07, Jonathan Corbet <[email protected]> wrote:
> > OK, here's an updated version of the volatile document - as a plain text
> > file this time. It drops a new file in Documentation/, but might it be
> > better as an addition to CodingStyle?
> >
> IMHO this is better as a sepperate document rather than an adition to
> CodingStyle. The use of volatile is not a style issue, it's a
> correctness issue, so it doesn't belong in the CodingStyle document.

Yes, definitely.

> > diff -ruNp linux-2.6/Documentation/volatile.txt volatile/Documentation/volatile.txt
>
> Might I suggest a different filename: volatile-considered-harmful.txt

Yes, I feel it's important to have a filename that strongly states the
dim view its contents take on its subject (the "volatile" type
qualifier in this case). Sort of like stable-api-nonsense.txt vs
stable-api.txt

> > --- linux-2.6/Documentation/volatile.txt 1969-12-31 17:00:00.000000000 -0700
> > +++ volatile/Documentation/volatile.txt 2007-05-09 14:56:40.000000000 -0600
> > @@ -0,0 +1,127 @@
> > +Why the "volatile" type class should not be used
> > +------------------------------------------------
> > +
> > +The C Programming Language, Second Edition (copyright 1988, still known as
> > +"the new C book") has the following to say about the volatile keyword:
> > +
> > + The purpose of volatile is to force an implementation to suppress
> > + optimization that could otherwise occur. For example, for a
> > + machine with memory-mapped input/output, a pointer to a device
> > + register might be declared as a pointer to volatile, in
> > + order to prevent the compiler from removing apparently redundant
> > + references through the pointer.
> > +
> > +C programmers have often taken volatile to mean that the variable could be
> > +changed outside of the current thread of execution; as a result, they are
>
> you write: "... that the variable could be changed outside of the
> current thread of execution ..."
>
> I suggest: "... that the variable could be changed outside of the
> current thread of execution - a sort of simple atomic variable ..."

I'm not so sure here. Why would any C programmer (worth his weight in
salt) think that volatile objects are automatically _atomic_? At
worst, the mistake someone might make would be to _implement_ locking
primitives using volatile. "that the variable could be changed outside
of the current thread of execution" sounds sufficient to me, and after
all, that is exactly what volatile hints to the compiler.

> > +For most code, none of the above justifications for volatile
> > +apply. As a result, the use of volatile is likely to be seen as a
> > +bug and will bring additional scrutiny to the code. Developers who are
> > +tempted to use volatile should take a step back and think about
> > +what they are truly trying to accomplish.
> > +
>
> Suggested addition :
>
> Patches that remove volatile from current code (provided there's a
> good explanation of why the volatile can be removed and how the bug it
> was hiding has been dealt with) are a good thing.
>
> Friends don't let friends use volatile.

Yes, this addition would be nice :-)

2007-05-09 22:36:17

by Scott Preece

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

On 5/9/07, Jonathan Corbet <[email protected]> wrote:
> OK, here's an updated version of the volatile document - as a plain text
> file this time. It drops a new file in Documentation/, but might it be
> better as an addition to CodingStyle?
> ...
---

I think the size of this file is OK as a separate file, but much too
big for CodingStyle. It would be good to also write a pithy paragraph
to go into CodingStyle to convey the rule and point at this file.

scott

2007-05-09 22:38:13

by Heikki Orsila

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

Imo, the best style to relay information is by directly stating facts.
I'm going to try to suggest some improvements on this..

On Wed, May 09, 2007 at 03:05:44PM -0600, Jonathan Corbet wrote:
> Andrew Morton recently called out[1] use of volatile in a
> +submitted patch, saying:
> +
> + The volatiles are a worry - volatile is said to be
> + basically-always-wrong in-kernel, although we've never managed to
> + document why, and i386 cheerfully uses it in readb() and friends.
> +
> +In response, Randy Dunlap pulled together some email from Linus[2] on the
> +topic and suggested that we could maybe "document why." Here is the
> +result.

I think previous text is unnecessary. Just tell the reasons why
volatile is bad and what should be used instead, no need to quote
people.

> +The point that Linus often makes with regard to volatile is that
> +its purpose is to suppress optimization, which is almost never what one
> +really wants to do. In the kernel, one must protect accesses to data
> +against race conditions, which is very much a different task.

Again, I would write this in non-personal way:

"Volatile is often used to prevent optimization, but it is not the
behavior that is actually wanted. Access to data must be protected and
handled with kernel provided synchronization, mutual exclusion and
barriers tools. These tools make volatile useless."

> +Like volatile, the kernel primitives which make concurrent access to data
> +safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
> +unwanted optimization. If they are being used properly, there will be no
> +need to use volatile as well.

The previous suggestion takes care of explaining that, remove this.

> If volatile is still necessary, there is
> +almost certainly a bug in the code somewhere. In properly-written kernel
> +code, volatile can only serve to slow things down.
> +
> +Consider a typical block of kernel code:
> +
> + spin_lock(&the_lock);
> + do_something_on(&shared_data);
> + do_something_else_with(&shared_data);
> + spin_unlock(&the_lock);
> +
> +If all the code follows the locking rules, the value of shared_data cannot
> +change unexpectedly while the_lock is held. Any other code which might
> +want to play with that data will be waiting on the lock.

Ok

> +The spinlock
> +primitives act as memory barriers - they are explicitly written to do so -
> +meaning that data accesses will not be optimized across them.

"Spinlock primitives will serialise execution of code regions, and
memory barrier functions must be used inside spinlocks to force
loads and stores."

> So the
> +compiler might think it knows what will be in some_data, but the
> +spin_lock() call will force it to forget anything it knows. There will be
> +no optimization problems with accesses to that data.

I would say it directly:

"Spinlock will force an implicit memory barrier, thus preventing
optimizations to data access."

> +If shared_data were declared volatile, the locking would
> +still be necessary. But the compiler would also be prevented from
> +optimizing access to shared _within_ the critical section,
> +when we know that nobody else can be working with it. While the lock is
> +held, shared_data is not volatile.

ok

> This is why Linus says:
> +
> + Also, more importantly, "volatile" is on the wrong _part_ of the
> + whole system. In C, it's "data" that is volatile, but that is
> + insane. Data isn't volatile - _accesses_ are volatile. So it may
> + make sense to say "make this particular _access_ be careful", but
> + not "make all accesses to this data use some random strategy".

Unnecessary quoting, imo. Tell the same information directly without
personifying the statement.

--
Heikki Orsila Barbie's law:
[email protected] "Math is hard, let's go shopping!"
http://www.iki.fi/shd

2007-05-09 22:48:10

by Jesper Juhl

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

On 10/05/07, Satyam Sharma <[email protected]> wrote:
> On 5/10/07, Jesper Juhl <[email protected]> wrote:
> > On 09/05/07, Jonathan Corbet <[email protected]> wrote:
<snip>
> > > +"the new C book") has the following to say about the volatile keyword:
> > > +
> > > + The purpose of volatile is to force an implementation to suppress
> > > + optimization that could otherwise occur. For example, for a
> > > + machine with memory-mapped input/output, a pointer to a device
> > > + register might be declared as a pointer to volatile, in
> > > + order to prevent the compiler from removing apparently redundant
> > > + references through the pointer.
> > > +
> > > +C programmers have often taken volatile to mean that the variable could be
> > > +changed outside of the current thread of execution; as a result, they are
> >
> > you write: "... that the variable could be changed outside of the
> > current thread of execution ..."
> >
> > I suggest: "... that the variable could be changed outside of the
> > current thread of execution - a sort of simple atomic variable ..."
>
> I'm not so sure here. Why would any C programmer (worth his weight in
> salt) think that volatile objects are automatically _atomic_? At

I honestly don't really know, but I've encountered that confusion a
few times. Both from friends who (for some reason) believed that and
from documents on the web that implied it, aparently it's a common
confusion - a few examples:

http://lists.freebsd.org/pipermail/freebsd-perl/2004-June/000124.html
"... volatile (atomic) fixes the problem. ..."

http://blogs.msdn.com/ricom/archive/2006/04/28/586406.aspx
"That's the point of the volatile keyword. It makes sure that
the line "dict = d;" is atomic."

http://forum.java.sun.com/thread.jspa?threadID=5126877&start=0
"A volatile variable is also guaranteed to be read or written
as an atomic operation ..." (yes, this link talks about Java, which I
don't know, but if java volatile means atomic, that might explain why
some people assume the same for C).

In any case, it's not an uncommon belief, so I just thought it made
sense to also make that little note.


> worst, the mistake someone might make would be to _implement_ locking
> primitives using volatile. "that the variable could be changed outside
> of the current thread of execution" sounds sufficient to me, and after
> all, that is exactly what volatile hints to the compiler.
>


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-05-09 23:20:52

by Satyam Sharma

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

On 5/10/07, Jesper Juhl <[email protected]> wrote:
> On 10/05/07, Satyam Sharma <[email protected]> wrote:
> > On 5/10/07, Jesper Juhl <[email protected]> wrote:
> > > [snip]
> > > you write: "... that the variable could be changed outside of the
> > > current thread of execution ..."
> > >
> > > I suggest: "... that the variable could be changed outside of the
> > > current thread of execution - a sort of simple atomic variable ..."
> >
> > I'm not so sure here. Why would any C programmer (worth his weight in
> > salt) think that volatile objects are automatically _atomic_? At
>
> I honestly don't really know, but I've encountered that confusion a
> few times. Both from friends who (for some reason) believed that and
> from documents on the web that implied it, aparently it's a common
> confusion - a few examples:
>
> http://lists.freebsd.org/pipermail/freebsd-perl/2004-June/000124.html
> "... volatile (atomic) fixes the problem. ..."
>
> http://blogs.msdn.com/ricom/archive/2006/04/28/586406.aspx
> "That's the point of the volatile keyword. It makes sure that
> the line "dict = d;" is atomic."
>
> http://forum.java.sun.com/thread.jspa?threadID=5126877&start=0
> "A volatile variable is also guaranteed to be read or written
> as an atomic operation ..." (yes, this link talks about Java, which I
> don't know, but if java volatile means atomic, that might explain why
> some people assume the same for C).

Perl / Microsoft / Java programmers are probably not worth their
weight in salt anyway :-)

I'm not an expert in any of the above platforms either, so don't know
if the semantics of "volatile" in those other languages are different
from that in C -- and this document clearly applies to only the kernel
(and thus C). But if this volatile == atomic disease is indeed common
among _C_ programmers too, then your suggested addition would make
sense.

2007-05-10 16:20:32

by Giacomo Catenazzi

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

Jonathan Corbet wrote:
> +The volatile storage class was originally meant for memory-mapped I/O
> +registers. Within the kernel, register accesses, too, should be protected

I don't think it deserves to be added in documentation, but just for
reference: in userspace "volatile" is needed in signals (posix mandates
some variables to be volatile, as API, not as funtionality). I don't
know if this was also on the original signal handling.

Anyway user space APIs are not kernel problem ;-)

ciao
cate

2007-05-10 19:39:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

Jonathan Corbet wrote:
> OK, here's an updated version of the volatile document - as a plain text
> file this time. It drops a new file in Documentation/, but might it be
> better as an addition to CodingStyle?
>
> Comments welcome,

I have found one use of volatile which I consider legitimate: pointers
to data structures read or written by I/O devices in coherent memory
(typically pci_coherent memory.) This is local to device drivers, but
as far as I can tell, the use of volatile here is legitimate, although
arguably it will be redundant in > 90% of all cases due to the
incidental presence of other memory barriers.

In Ethernet drivers, for example, it is common for the network card to
maintain a pointer in host memory the the latest descriptor written; you
will generally have a loop of the form:

while ((this_pointer = *pointer_ptr) > my_last_pointer) {
for (pkt = my_last_pointer; pkt < this_pointer; pkt++)
receeive_packet(pkt);
my_last_pointer = this_pointer;
}

pointer_p can then be a volatile pointer into said coherent memory.

-hpa


2007-05-10 21:54:58

by Bill Davidsen

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

Jonathan Corbet wrote:

> +There are still a few rare situations where volatile makes sense in the
> +kernel:
> +
> + - The above-mentioned accessor functions might use volatile on
> + architectures where direct I/O memory access does work. Essentially,
> + each accessor call becomes a little critical section on its own and
> + ensures that the access happens as expected by the programmer.
> +
> + - Inline assembly code which changes memory, but which has no other
> + visible side effects, risks being deleted by GCC. Adding the volatile
> + keyword to asm statements will prevent this removal.
> +
> + - The jiffies variable is special in that it can have a different value
> + every time it is referenced, but it can be read without any special
> + locking. So jiffies can be volatile, but the addition of other
> + variables of this type is frowned upon. Jiffies is considered to be a
> + "stupid legacy" issue in this regard.

It would seem that any variable which is (a) subject to change by other
threads or hardware, and (b) the value of which is going to be used
without writing the variable, would be a valid use for volatile.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2007-05-10 21:56:35

by Alan

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

> In Ethernet drivers, for example, it is common for the network card to
> maintain a pointer in host memory the the latest descriptor written; you
> will generally have a loop of the form:
>
> while ((this_pointer = *pointer_ptr) > my_last_pointer) {
> for (pkt = my_last_pointer; pkt < this_pointer; pkt++)
> receeive_packet(pkt);
> my_last_pointer = this_pointer;
> }
>
> pointer_p can then be a volatile pointer into said coherent memory.

True but you can happily use rmb/wmb for this which are clearer and fit
the rest of the Linux model better.

Alan

2007-05-12 18:32:45

by Robert Hancock

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

Bill Davidsen wrote:
> Jonathan Corbet wrote:
>
>> +There are still a few rare situations where volatile makes sense in the
>> +kernel:
>> +
>> + - The above-mentioned accessor functions might use volatile on
>> + architectures where direct I/O memory access does work.
>> Essentially,
>> + each accessor call becomes a little critical section on its own and
>> + ensures that the access happens as expected by the programmer.
>> +
>> + - Inline assembly code which changes memory, but which has no other
>> + visible side effects, risks being deleted by GCC. Adding the
>> volatile
>> + keyword to asm statements will prevent this removal.
>> +
>> + - The jiffies variable is special in that it can have a different
>> value
>> + every time it is referenced, but it can be read without any special
>> + locking. So jiffies can be volatile, but the addition of other
>> + variables of this type is frowned upon. Jiffies is considered to
>> be a
>> + "stupid legacy" issue in this regard.
>
> It would seem that any variable which is (a) subject to change by other
> threads or hardware, and (b) the value of which is going to be used
> without writing the variable, would be a valid use for volatile.

You don't need volatile in that case, rmb() can be used.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-13 16:30:17

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

Robert Hancock <[email protected]> writes:

> You don't need volatile in that case, rmb() can be used.

rmb() invalidates all compiler assumptions, it can be much slower.
--
Krzysztof Halasa

2007-05-13 23:26:03

by Bill Davidsen

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

Krzysztof Halasa wrote:
> Robert Hancock <[email protected]> writes:
>
>
>> You don't need volatile in that case, rmb() can be used.
>>
>
> rmb() invalidates all compiler assumptions, it can be much slower.
>
Yes, why would you use rmb() when a read of a volatile generates optimal
code?

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

2007-05-13 23:53:51

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

On Sun, May 13, 2007 at 07:26:13PM -0400, Bill Davidsen wrote:
> Krzysztof Halasa wrote:
> >Robert Hancock <[email protected]> writes:
> >>You don't need volatile in that case, rmb() can be used.

> >rmb() invalidates all compiler assumptions, it can be much slower.

It does not invalidate /all/ assumptions.


> Yes, why would you use rmb() when a read of a volatile generates optimal
> code?

Read of a volatile is guaranteed to generate the least optimal code.
That's what volatile does, guarantee no optimization of that particular
access.

Jeff



2007-05-14 14:10:33

by Bill Davidsen

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

Jeff Garzik wrote:
> On Sun, May 13, 2007 at 07:26:13PM -0400, Bill Davidsen wrote:
>
>> Krzysztof Halasa wrote:
>>
>>> Robert Hancock <[email protected]> writes:
>>>
>>>> You don't need volatile in that case, rmb() can be used.
>>>>
>
>
>>> rmb() invalidates all compiler assumptions, it can be much slower.
>>>
>
> It does not invalidate /all/ assumptions.
>
>
>
>> Yes, why would you use rmb() when a read of a volatile generates optimal
>> code?
>>
>
> Read of a volatile is guaranteed to generate the least optimal code.
> That's what volatile does, guarantee no optimization of that particular
> access.
>
By optimal you seem to mean "generate fewer CPU cycles by risking use of
an obsolete value," while by the same term I mean read the correct and
current value from the memory location without the overhead of locks. If
your logic doesn't require the correct value, why read it at all? And if
it does, how fewer cycles and cache impact can anything have than a
single register load from memory?

Locks are useful when the value will be changed by a thread, or when the
value must not be changed briefly. That's not always the case.

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

2007-05-14 14:28:33

by folkert

[permalink] [raw]
Subject: [2.6.21] circular locking dependency found in QUOTA OFF

Hi,

When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
and system on an 1-filesystem IDE disk, I get the following circular
locking dependency error:

[330961.226405] =======================================================
[330961.226489] [ INFO: possible circular locking dependency detected ]
[330961.226531] 2.6.21 #5
[330961.226569] -------------------------------------------------------
[330961.226611] quotaoff/12249 is trying to acquire lock:
[330961.226652] (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
mutex_lock+0x8/0xa
[330961.226861]
[330961.226862] but task is already holding lock:
[330961.226938] (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
mutex_lock+0x8/0xa
[330961.227111]
[330961.227111] which lock already depends on the new lock.
[330961.227112]
[330961.227225]
[330961.227225] the existing dependency chain (in reverse order) is:
[330961.227303]
[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
[330961.227473] [<c1039b02>] check_prev_add+0x15b/0x281
[330961.227766] [<c1039cb3>] check_prevs_add+0x8b/0xe8
[330961.228056] [<c103b683>] __lock_acquire+0x692/0xb81
[330961.228353] [<c103bfda>] lock_acquire+0x62/0x81
[330961.228643] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
[330961.228934] [<c120e2a1>] mutex_lock+0x8/0xa
[330961.229221] [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
[330961.229513] [<c109fdd1>] vfs_quota_on+0x75/0x79
[330961.229803] [<c10bc92d>] ext3_quota_on+0x95/0xb0
[330961.230093] [<c10a1eb2>] do_quotactl+0xc9/0x2dd
[330961.230384] [<c10a214a>] sys_quotactl+0x84/0xd6
[330961.230673] [<c1003f74>] syscall_call+0x7/0xb
[330961.230963] [<ffffffff>] 0xffffffff
[330961.231268]
[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
[330961.231469] [<c10399db>] check_prev_add+0x34/0x281
[330961.231759] [<c1039cb3>] check_prevs_add+0x8b/0xe8
[330961.232049] [<c103b683>] __lock_acquire+0x692/0xb81
[330961.232344] [<c103bfda>] lock_acquire+0x62/0x81
[330961.232632] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
[330961.232923] [<c120e2a1>] mutex_lock+0x8/0xa
[330961.233211] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
[330961.233500] [<c10a2088>] do_quotactl+0x29f/0x2dd
[330961.233792] [<c10a214a>] sys_quotactl+0x84/0xd6
[330961.234081] [<c1003f74>] syscall_call+0x7/0xb
[330961.234503] [<ffffffff>] 0xffffffff
[330961.234795]
[330961.234795] other info that might help us debug this:
[330961.234796]
[330961.234908] 2 locks held by quotaoff/12249:
[330961.234947] #0: (&type->s_umount_key#15){----}, at: [<c1070b5d>]
get_super+0x53/0x94
[330961.235183] #1: (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
mutex_lock+0x8/0xa
[330961.235386]
[330961.235387] stack backtrace:
[330961.235462] [<c1004d53>] show_trace_log_lvl+0x1a/0x30
[330961.235535] [<c1004d7b>] show_trace+0x12/0x14
[330961.235606] [<c1004e75>] dump_stack+0x16/0x18
[330961.235679] [<c1039352>] print_circular_bug_tail+0x6f/0x71
[330961.235753] [<c10399db>] check_prev_add+0x34/0x281
[330961.235825] [<c1039cb3>] check_prevs_add+0x8b/0xe8
[330961.235897] [<c103b683>] __lock_acquire+0x692/0xb81
[330961.235969] [<c103bfda>] lock_acquire+0x62/0x81
[330961.236041] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
[330961.236113] [<c120e2a1>] mutex_lock+0x8/0xa
[330961.236185] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
[330961.236257] [<c10a2088>] do_quotactl+0x29f/0x2dd
[330961.236330] [<c10a214a>] sys_quotactl+0x84/0xd6
[330961.236402] [<c1003f74>] syscall_call+0x7/0xb
[330961.236473] =======================

2007-05-14 17:43:30

by Michal Piotrowski

[permalink] [raw]
Subject: Re: [2.6.21] circular locking dependency found in QUOTA OFF

[adding Jan and fsdevel to CC]

Hi Folkert,

On 14/05/07, Folkert van Heusden <[email protected]> wrote:
> Hi,
>
> When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> and system on an 1-filesystem IDE disk, I get the following circular
> locking dependency error:
>
> [330961.226405] =======================================================
> [330961.226489] [ INFO: possible circular locking dependency detected ]
> [330961.226531] 2.6.21 #5
> [330961.226569] -------------------------------------------------------
> [330961.226611] quotaoff/12249 is trying to acquire lock:
> [330961.226652] (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> mutex_lock+0x8/0xa
> [330961.226861]
> [330961.226862] but task is already holding lock:
> [330961.226938] (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> mutex_lock+0x8/0xa
> [330961.227111]
> [330961.227111] which lock already depends on the new lock.
> [330961.227112]
> [330961.227225]
> [330961.227225] the existing dependency chain (in reverse order) is:
> [330961.227303]
> [330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> [330961.227473] [<c1039b02>] check_prev_add+0x15b/0x281
> [330961.227766] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> [330961.228056] [<c103b683>] __lock_acquire+0x692/0xb81
> [330961.228353] [<c103bfda>] lock_acquire+0x62/0x81
> [330961.228643] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> [330961.228934] [<c120e2a1>] mutex_lock+0x8/0xa
> [330961.229221] [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> [330961.229513] [<c109fdd1>] vfs_quota_on+0x75/0x79
> [330961.229803] [<c10bc92d>] ext3_quota_on+0x95/0xb0
> [330961.230093] [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> [330961.230384] [<c10a214a>] sys_quotactl+0x84/0xd6
> [330961.230673] [<c1003f74>] syscall_call+0x7/0xb
> [330961.230963] [<ffffffff>] 0xffffffff
> [330961.231268]
> [330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> [330961.231469] [<c10399db>] check_prev_add+0x34/0x281
> [330961.231759] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> [330961.232049] [<c103b683>] __lock_acquire+0x692/0xb81
> [330961.232344] [<c103bfda>] lock_acquire+0x62/0x81
> [330961.232632] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> [330961.232923] [<c120e2a1>] mutex_lock+0x8/0xa
> [330961.233211] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> [330961.233500] [<c10a2088>] do_quotactl+0x29f/0x2dd
> [330961.233792] [<c10a214a>] sys_quotactl+0x84/0xd6
> [330961.234081] [<c1003f74>] syscall_call+0x7/0xb
> [330961.234503] [<ffffffff>] 0xffffffff
> [330961.234795]
> [330961.234795] other info that might help us debug this:
> [330961.234796]
> [330961.234908] 2 locks held by quotaoff/12249:
> [330961.234947] #0: (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> get_super+0x53/0x94
> [330961.235183] #1: (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> mutex_lock+0x8/0xa
> [330961.235386]
> [330961.235387] stack backtrace:
> [330961.235462] [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> [330961.235535] [<c1004d7b>] show_trace+0x12/0x14
> [330961.235606] [<c1004e75>] dump_stack+0x16/0x18
> [330961.235679] [<c1039352>] print_circular_bug_tail+0x6f/0x71
> [330961.235753] [<c10399db>] check_prev_add+0x34/0x281
> [330961.235825] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> [330961.235897] [<c103b683>] __lock_acquire+0x692/0xb81
> [330961.235969] [<c103bfda>] lock_acquire+0x62/0x81
> [330961.236041] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> [330961.236113] [<c120e2a1>] mutex_lock+0x8/0xa
> [330961.236185] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> [330961.236257] [<c10a2088>] do_quotactl+0x29f/0x2dd
> [330961.236330] [<c10a214a>] sys_quotactl+0x84/0xd6
> [330961.236402] [<c1003f74>] syscall_call+0x7/0xb
> [330961.236473] =======================
>

Is this a 2.6.21 regression?

Regards,
Michal

--
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)

2007-05-14 17:44:41

by folkert

[permalink] [raw]
Subject: Re: [2.6.21] circular locking dependency found in QUOTA OFF

> [adding Jan and fsdevel to CC]
> Hi Folkert,
> >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> >and system on an 1-filesystem IDE disk, I get the following circular
> >locking dependency error:
> >
> >[330961.226405] =======================================================
> >[330961.226489] [ INFO: possible circular locking dependency detected ]
> >[330961.226531] 2.6.21 #5
...
> >[330961.236402] [<c1003f74>] syscall_call+0x7/0xb
> >[330961.236473] =======================
>
> Is this a 2.6.21 regression?

This is new for 2.6.21, yes.


Folkert van Heusden

--
MultiTail est un flexible tool pour suivre de logfiles et execution de
commandements. Filtrer, pourvoir de couleur, merge, 'diff-view', etc.
http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2007-05-15 13:59:19

by Jan Kara

[permalink] [raw]
Subject: Re: [2.6.21] circular locking dependency found in QUOTA OFF

Hi,

Thanks for report. It seems like a lockdep problem. i_mutex for quota
files is below dqonoff_sem. We were already fixing this for filesystem
specific quota IO functions but obviously we've missed a few cases. I
wonder why it showed up only now... Anyway, attached is a fix. Andrew,
would you add it? Thanks.

Honza

On Mon 14-05-07 19:43:18, Michal Piotrowski wrote:
> [adding Jan and fsdevel to CC]
>
> Hi Folkert,
>
> On 14/05/07, Folkert van Heusden <[email protected]> wrote:
> >Hi,
> >
> >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> >and system on an 1-filesystem IDE disk, I get the following circular
> >locking dependency error:
> >
> >[330961.226405] =======================================================
> >[330961.226489] [ INFO: possible circular locking dependency detected ]
> >[330961.226531] 2.6.21 #5
> >[330961.226569] -------------------------------------------------------
> >[330961.226611] quotaoff/12249 is trying to acquire lock:
> >[330961.226652] (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> >mutex_lock+0x8/0xa
> >[330961.226861]
> >[330961.226862] but task is already holding lock:
> >[330961.226938] (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> >mutex_lock+0x8/0xa
> >[330961.227111]
> >[330961.227111] which lock already depends on the new lock.
> >[330961.227112]
> >[330961.227225]
> >[330961.227225] the existing dependency chain (in reverse order) is:
> >[330961.227303]
> >[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> >[330961.227473] [<c1039b02>] check_prev_add+0x15b/0x281
> >[330961.227766] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> >[330961.228056] [<c103b683>] __lock_acquire+0x692/0xb81
> >[330961.228353] [<c103bfda>] lock_acquire+0x62/0x81
> >[330961.228643] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> >[330961.228934] [<c120e2a1>] mutex_lock+0x8/0xa
> >[330961.229221] [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> >[330961.229513] [<c109fdd1>] vfs_quota_on+0x75/0x79
> >[330961.229803] [<c10bc92d>] ext3_quota_on+0x95/0xb0
> >[330961.230093] [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> >[330961.230384] [<c10a214a>] sys_quotactl+0x84/0xd6
> >[330961.230673] [<c1003f74>] syscall_call+0x7/0xb
> >[330961.230963] [<ffffffff>] 0xffffffff
> >[330961.231268]
> >[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> >[330961.231469] [<c10399db>] check_prev_add+0x34/0x281
> >[330961.231759] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> >[330961.232049] [<c103b683>] __lock_acquire+0x692/0xb81
> >[330961.232344] [<c103bfda>] lock_acquire+0x62/0x81
> >[330961.232632] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> >[330961.232923] [<c120e2a1>] mutex_lock+0x8/0xa
> >[330961.233211] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> >[330961.233500] [<c10a2088>] do_quotactl+0x29f/0x2dd
> >[330961.233792] [<c10a214a>] sys_quotactl+0x84/0xd6
> >[330961.234081] [<c1003f74>] syscall_call+0x7/0xb
> >[330961.234503] [<ffffffff>] 0xffffffff
> >[330961.234795]
> >[330961.234795] other info that might help us debug this:
> >[330961.234796]
> >[330961.234908] 2 locks held by quotaoff/12249:
> >[330961.234947] #0: (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> >get_super+0x53/0x94
> >[330961.235183] #1: (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> >mutex_lock+0x8/0xa
> >[330961.235386]
> >[330961.235387] stack backtrace:
> >[330961.235462] [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> >[330961.235535] [<c1004d7b>] show_trace+0x12/0x14
> >[330961.235606] [<c1004e75>] dump_stack+0x16/0x18
> >[330961.235679] [<c1039352>] print_circular_bug_tail+0x6f/0x71
> >[330961.235753] [<c10399db>] check_prev_add+0x34/0x281
> >[330961.235825] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> >[330961.235897] [<c103b683>] __lock_acquire+0x692/0xb81
> >[330961.235969] [<c103bfda>] lock_acquire+0x62/0x81
> >[330961.236041] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> >[330961.236113] [<c120e2a1>] mutex_lock+0x8/0xa
> >[330961.236185] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> >[330961.236257] [<c10a2088>] do_quotactl+0x29f/0x2dd
> >[330961.236330] [<c10a214a>] sys_quotactl+0x84/0xd6
> >[330961.236402] [<c1003f74>] syscall_call+0x7/0xb
> >[330961.236473] =======================
> >
>
> Is this a 2.6.21 regression?
>
> Regards,
> Michal
>
> --
> Michal K. K. Piotrowski
> Kernel Monkeys
> (http://kernel.wikidot.com/start)
--
Jan Kara <[email protected]>
SuSE CR Labs


Attachments:
(No filename) (4.47 kB)
quota-2.6.21-1-quota_lockdep.diff (2.87 kB)
Download all attachments

2007-05-15 17:52:21

by folkert

[permalink] [raw]
Subject: Re: [2.6.21] circular locking dependency found in QUOTA OFF

I'm afraid it doesn't compile:

CHK include/linux/version.h
CHK include/linux/utsrelease.h
CHK include/linux/compile.h
CC fs/dquot.o
CC fs/quota.o
fs/quota.c: In function `quota_sync_sb':
fs/quota.c:180: error: `I_MUTEX_NESTED' undeclared (first use in this function)
fs/quota.c:180: error: (Each undeclared identifier is reported only once
fs/quota.c:180: error: for each function it appears in.)
make[1]: *** [fs/quota.o] Error 1
make: *** [fs] Error 2


On Tue, May 15, 2007 at 04:09:39PM +0200, Jan Kara wrote:
> Hi,
>
> Thanks for report. It seems like a lockdep problem. i_mutex for quota
> files is below dqonoff_sem. We were already fixing this for filesystem
> specific quota IO functions but obviously we've missed a few cases. I
> wonder why it showed up only now... Anyway, attached is a fix. Andrew,
> would you add it? Thanks.
>
> Honza
>
> On Mon 14-05-07 19:43:18, Michal Piotrowski wrote:
> > [adding Jan and fsdevel to CC]
> >
> > Hi Folkert,
> >
> > On 14/05/07, Folkert van Heusden <[email protected]> wrote:
> > >Hi,
> > >
> > >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> > >and system on an 1-filesystem IDE disk, I get the following circular
> > >locking dependency error:
> > >
> > >[330961.226405] =======================================================
> > >[330961.226489] [ INFO: possible circular locking dependency detected ]
> > >[330961.226531] 2.6.21 #5
> > >[330961.226569] -------------------------------------------------------
> > >[330961.226611] quotaoff/12249 is trying to acquire lock:
> > >[330961.226652] (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> > >mutex_lock+0x8/0xa
> > >[330961.226861]
> > >[330961.226862] but task is already holding lock:
> > >[330961.226938] (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > >mutex_lock+0x8/0xa
> > >[330961.227111]
> > >[330961.227111] which lock already depends on the new lock.
> > >[330961.227112]
> > >[330961.227225]
> > >[330961.227225] the existing dependency chain (in reverse order) is:
> > >[330961.227303]
> > >[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> > >[330961.227473] [<c1039b02>] check_prev_add+0x15b/0x281
> > >[330961.227766] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > >[330961.228056] [<c103b683>] __lock_acquire+0x692/0xb81
> > >[330961.228353] [<c103bfda>] lock_acquire+0x62/0x81
> > >[330961.228643] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > >[330961.228934] [<c120e2a1>] mutex_lock+0x8/0xa
> > >[330961.229221] [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> > >[330961.229513] [<c109fdd1>] vfs_quota_on+0x75/0x79
> > >[330961.229803] [<c10bc92d>] ext3_quota_on+0x95/0xb0
> > >[330961.230093] [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> > >[330961.230384] [<c10a214a>] sys_quotactl+0x84/0xd6
> > >[330961.230673] [<c1003f74>] syscall_call+0x7/0xb
> > >[330961.230963] [<ffffffff>] 0xffffffff
> > >[330961.231268]
> > >[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> > >[330961.231469] [<c10399db>] check_prev_add+0x34/0x281
> > >[330961.231759] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > >[330961.232049] [<c103b683>] __lock_acquire+0x692/0xb81
> > >[330961.232344] [<c103bfda>] lock_acquire+0x62/0x81
> > >[330961.232632] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > >[330961.232923] [<c120e2a1>] mutex_lock+0x8/0xa
> > >[330961.233211] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > >[330961.233500] [<c10a2088>] do_quotactl+0x29f/0x2dd
> > >[330961.233792] [<c10a214a>] sys_quotactl+0x84/0xd6
> > >[330961.234081] [<c1003f74>] syscall_call+0x7/0xb
> > >[330961.234503] [<ffffffff>] 0xffffffff
> > >[330961.234795]
> > >[330961.234795] other info that might help us debug this:
> > >[330961.234796]
> > >[330961.234908] 2 locks held by quotaoff/12249:
> > >[330961.234947] #0: (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> > >get_super+0x53/0x94
> > >[330961.235183] #1: (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > >mutex_lock+0x8/0xa
> > >[330961.235386]
> > >[330961.235387] stack backtrace:
> > >[330961.235462] [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> > >[330961.235535] [<c1004d7b>] show_trace+0x12/0x14
> > >[330961.235606] [<c1004e75>] dump_stack+0x16/0x18
> > >[330961.235679] [<c1039352>] print_circular_bug_tail+0x6f/0x71
> > >[330961.235753] [<c10399db>] check_prev_add+0x34/0x281
> > >[330961.235825] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > >[330961.235897] [<c103b683>] __lock_acquire+0x692/0xb81
> > >[330961.235969] [<c103bfda>] lock_acquire+0x62/0x81
> > >[330961.236041] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > >[330961.236113] [<c120e2a1>] mutex_lock+0x8/0xa
> > >[330961.236185] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > >[330961.236257] [<c10a2088>] do_quotactl+0x29f/0x2dd
> > >[330961.236330] [<c10a214a>] sys_quotactl+0x84/0xd6
> > >[330961.236402] [<c1003f74>] syscall_call+0x7/0xb
> > >[330961.236473] =======================
> > >
> >
> > Is this a 2.6.21 regression?
> >
> > Regards,
> > Michal
> >
> > --
> > Michal K. K. Piotrowski
> > Kernel Monkeys
> > (http://kernel.wikidot.com/start)
> --
> Jan Kara <[email protected]>
> SuSE CR Labs

> i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
> acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
> comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
> file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
> call under dqonoff_mutex and save some problems with races...
>
> Signed-off-by: Jan Kara <[email protected]>
>
> diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/dquot.c linux-2.6.21-1-quota_lockdep/fs/dquot.c
> --- linux-2.6.21/fs/dquot.c 2007-05-15 14:18:47.000000000 +0200
> +++ linux-2.6.21-1-quota_lockdep/fs/dquot.c 2007-05-15 14:22:47.000000000 +0200
> @@ -1421,7 +1421,7 @@ int vfs_quota_off(struct super_block *sb
> /* If quota was reenabled in the meantime, we have
> * nothing to do */
> if (!sb_has_quota_enabled(sb, cnt)) {
> - mutex_lock(&toputinode[cnt]->i_mutex);
> + mutex_lock_nested(&toputinode[cnt]->i_mutex, I_MUTEX_QUOTA);
> toputinode[cnt]->i_flags &= ~(S_IMMUTABLE |
> S_NOATIME | S_NOQUOTA);
> truncate_inode_pages(&toputinode[cnt]->i_data, 0);
> diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/quota.c linux-2.6.21-1-quota_lockdep/fs/quota.c
> --- linux-2.6.21/fs/quota.c 2006-11-29 22:57:37.000000000 +0100
> +++ linux-2.6.21-1-quota_lockdep/fs/quota.c 2007-05-15 15:15:44.000000000 +0200
> @@ -158,7 +158,6 @@ static int check_quotactl_valid(struct s
> static void quota_sync_sb(struct super_block *sb, int type)
> {
> int cnt;
> - struct inode *discard[MAXQUOTAS];
>
> sb->s_qcop->quota_sync(sb, type);
> /* This is not very clever (and fast) but currently I don't know about
> @@ -168,29 +167,21 @@ static void quota_sync_sb(struct super_b
> sb->s_op->sync_fs(sb, 1);
> sync_blockdev(sb->s_bdev);
>
> - /* Now when everything is written we can discard the pagecache so
> - * that userspace sees the changes. We need i_mutex and so we could
> - * not do it inside dqonoff_mutex. Moreover we need to be carefull
> - * about races with quotaoff() (that is the reason why we have own
> - * reference to inode). */
> + /*
> + * Now when everything is written we can discard the pagecache so
> + * that userspace sees the changes.
> + */
> mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> - discard[cnt] = NULL;
> if (type != -1 && cnt != type)
> continue;
> if (!sb_has_quota_enabled(sb, cnt))
> continue;
> - discard[cnt] = igrab(sb_dqopt(sb)->files[cnt]);
> + mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex, I_MUTEX_NESTED);
> + truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
> + mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
> }
> mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> - for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> - if (discard[cnt]) {
> - mutex_lock(&discard[cnt]->i_mutex);
> - truncate_inode_pages(&discard[cnt]->i_data, 0);
> - mutex_unlock(&discard[cnt]->i_mutex);
> - iput(discard[cnt]);
> - }
> - }
> }
>
> void sync_dquots(struct super_block *sb, int type)



Folkert van Heusden

--
http://www.vanheusden.com/multitail - win een vlaai van multivlaai! zorg
ervoor dat multitail opgenomen wordt in Fedora Core, AIX, Solaris of
HP/UX en win een vlaai naar keuze
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2007-05-15 18:04:30

by Jan Kara

[permalink] [raw]
Subject: Re: [2.6.21] circular locking dependency found in QUOTA OFF

On Tue 15-05-07 19:52:03, Folkert van Heusden wrote:
> I'm afraid it doesn't compile:
>
> CHK include/linux/version.h
> CHK include/linux/utsrelease.h
> CHK include/linux/compile.h
> CC fs/dquot.o
> CC fs/quota.o
> fs/quota.c: In function `quota_sync_sb':
> fs/quota.c:180: error: `I_MUTEX_NESTED' undeclared (first use in this function)
> fs/quota.c:180: error: (Each undeclared identifier is reported only once
> fs/quota.c:180: error: for each function it appears in.)
> make[1]: *** [fs/quota.o] Error 1
> make: *** [fs] Error 2
Huh, I has compiled just fine for me... Can you send me your .config?
Thanks.

Honza

> On Tue, May 15, 2007 at 04:09:39PM +0200, Jan Kara wrote:
> > Hi,
> >
> > Thanks for report. It seems like a lockdep problem. i_mutex for quota
> > files is below dqonoff_sem. We were already fixing this for filesystem
> > specific quota IO functions but obviously we've missed a few cases. I
> > wonder why it showed up only now... Anyway, attached is a fix. Andrew,
> > would you add it? Thanks.
> >
> > Honza
> >
> > On Mon 14-05-07 19:43:18, Michal Piotrowski wrote:
> > > [adding Jan and fsdevel to CC]
> > >
> > > Hi Folkert,
> > >
> > > On 14/05/07, Folkert van Heusden <[email protected]> wrote:
> > > >Hi,
> > > >
> > > >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> > > >and system on an 1-filesystem IDE disk, I get the following circular
> > > >locking dependency error:
> > > >
> > > >[330961.226405] =======================================================
> > > >[330961.226489] [ INFO: possible circular locking dependency detected ]
> > > >[330961.226531] 2.6.21 #5
> > > >[330961.226569] -------------------------------------------------------
> > > >[330961.226611] quotaoff/12249 is trying to acquire lock:
> > > >[330961.226652] (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.226861]
> > > >[330961.226862] but task is already holding lock:
> > > >[330961.226938] (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.227111]
> > > >[330961.227111] which lock already depends on the new lock.
> > > >[330961.227112]
> > > >[330961.227225]
> > > >[330961.227225] the existing dependency chain (in reverse order) is:
> > > >[330961.227303]
> > > >[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> > > >[330961.227473] [<c1039b02>] check_prev_add+0x15b/0x281
> > > >[330961.227766] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.228056] [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.228353] [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.228643] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.228934] [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.229221] [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> > > >[330961.229513] [<c109fdd1>] vfs_quota_on+0x75/0x79
> > > >[330961.229803] [<c10bc92d>] ext3_quota_on+0x95/0xb0
> > > >[330961.230093] [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> > > >[330961.230384] [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.230673] [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.230963] [<ffffffff>] 0xffffffff
> > > >[330961.231268]
> > > >[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> > > >[330961.231469] [<c10399db>] check_prev_add+0x34/0x281
> > > >[330961.231759] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.232049] [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.232344] [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.232632] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.232923] [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.233211] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > > >[330961.233500] [<c10a2088>] do_quotactl+0x29f/0x2dd
> > > >[330961.233792] [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.234081] [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.234503] [<ffffffff>] 0xffffffff
> > > >[330961.234795]
> > > >[330961.234795] other info that might help us debug this:
> > > >[330961.234796]
> > > >[330961.234908] 2 locks held by quotaoff/12249:
> > > >[330961.234947] #0: (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> > > >get_super+0x53/0x94
> > > >[330961.235183] #1: (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.235386]
> > > >[330961.235387] stack backtrace:
> > > >[330961.235462] [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> > > >[330961.235535] [<c1004d7b>] show_trace+0x12/0x14
> > > >[330961.235606] [<c1004e75>] dump_stack+0x16/0x18
> > > >[330961.235679] [<c1039352>] print_circular_bug_tail+0x6f/0x71
> > > >[330961.235753] [<c10399db>] check_prev_add+0x34/0x281
> > > >[330961.235825] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.235897] [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.235969] [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.236041] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.236113] [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.236185] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > > >[330961.236257] [<c10a2088>] do_quotactl+0x29f/0x2dd
> > > >[330961.236330] [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.236402] [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.236473] =======================
> > > >
> > >
> > > Is this a 2.6.21 regression?
> > >
> > > Regards,
> > > Michal
> > >
> > > --
> > > Michal K. K. Piotrowski
> > > Kernel Monkeys
> > > (http://kernel.wikidot.com/start)
> > --
> > Jan Kara <[email protected]>
> > SuSE CR Labs
>
> > i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
> > acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
> > comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
> > file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
> > call under dqonoff_mutex and save some problems with races...
> >
> > Signed-off-by: Jan Kara <[email protected]>
> >
> > diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/dquot.c linux-2.6.21-1-quota_lockdep/fs/dquot.c
> > --- linux-2.6.21/fs/dquot.c 2007-05-15 14:18:47.000000000 +0200
> > +++ linux-2.6.21-1-quota_lockdep/fs/dquot.c 2007-05-15 14:22:47.000000000 +0200
> > @@ -1421,7 +1421,7 @@ int vfs_quota_off(struct super_block *sb
> > /* If quota was reenabled in the meantime, we have
> > * nothing to do */
> > if (!sb_has_quota_enabled(sb, cnt)) {
> > - mutex_lock(&toputinode[cnt]->i_mutex);
> > + mutex_lock_nested(&toputinode[cnt]->i_mutex, I_MUTEX_QUOTA);
> > toputinode[cnt]->i_flags &= ~(S_IMMUTABLE |
> > S_NOATIME | S_NOQUOTA);
> > truncate_inode_pages(&toputinode[cnt]->i_data, 0);
> > diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/quota.c linux-2.6.21-1-quota_lockdep/fs/quota.c
> > --- linux-2.6.21/fs/quota.c 2006-11-29 22:57:37.000000000 +0100
> > +++ linux-2.6.21-1-quota_lockdep/fs/quota.c 2007-05-15 15:15:44.000000000 +0200
> > @@ -158,7 +158,6 @@ static int check_quotactl_valid(struct s
> > static void quota_sync_sb(struct super_block *sb, int type)
> > {
> > int cnt;
> > - struct inode *discard[MAXQUOTAS];
> >
> > sb->s_qcop->quota_sync(sb, type);
> > /* This is not very clever (and fast) but currently I don't know about
> > @@ -168,29 +167,21 @@ static void quota_sync_sb(struct super_b
> > sb->s_op->sync_fs(sb, 1);
> > sync_blockdev(sb->s_bdev);
> >
> > - /* Now when everything is written we can discard the pagecache so
> > - * that userspace sees the changes. We need i_mutex and so we could
> > - * not do it inside dqonoff_mutex. Moreover we need to be carefull
> > - * about races with quotaoff() (that is the reason why we have own
> > - * reference to inode). */
> > + /*
> > + * Now when everything is written we can discard the pagecache so
> > + * that userspace sees the changes.
> > + */
> > mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
> > for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> > - discard[cnt] = NULL;
> > if (type != -1 && cnt != type)
> > continue;
> > if (!sb_has_quota_enabled(sb, cnt))
> > continue;
> > - discard[cnt] = igrab(sb_dqopt(sb)->files[cnt]);
> > + mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex, I_MUTEX_NESTED);
> > + truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
> > + mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
> > }
> > mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> > - for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> > - if (discard[cnt]) {
> > - mutex_lock(&discard[cnt]->i_mutex);
> > - truncate_inode_pages(&discard[cnt]->i_data, 0);
> > - mutex_unlock(&discard[cnt]->i_mutex);
> > - iput(discard[cnt]);
> > - }
> > - }
> > }
> >
> > void sync_dquots(struct super_block *sb, int type)
>
>
>
> Folkert van Heusden
>
> --
> http://www.vanheusden.com/multitail - win een vlaai van multivlaai! zorg
> ervoor dat multitail opgenomen wordt in Fedora Core, AIX, Solaris of
> HP/UX en win een vlaai naar keuze
> ----------------------------------------------------------------------
> Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com
--
Jan Kara <[email protected]>
SuSE CR Labs

2007-05-15 19:08:07

by Jan Kara

[permalink] [raw]
Subject: Re: [2.6.21] circular locking dependency found in QUOTA OFF

On Tue 15-05-07 19:52:03, Folkert van Heusden wrote:
> I'm afraid it doesn't compile:
>
> CHK include/linux/version.h
> CHK include/linux/utsrelease.h
> CHK include/linux/compile.h
> CC fs/dquot.o
> CC fs/quota.o
> fs/quota.c: In function `quota_sync_sb':
> fs/quota.c:180: error: `I_MUTEX_NESTED' undeclared (first use in this function)
> fs/quota.c:180: error: (Each undeclared identifier is reported only once
> fs/quota.c:180: error: for each function it appears in.)
> make[1]: *** [fs/quota.o] Error 1
> make: *** [fs] Error 2
Fixed patch follows (I messed up testing as I thought that lockdep is
under config option Debug mutexes but it is not...). I'm sorry for the
stupid mistake. Andrew, please discard my previous patch and use this new
one...

Honza
> On Tue, May 15, 2007 at 04:09:39PM +0200, Jan Kara wrote:
> > Hi,
> >
> > Thanks for report. It seems like a lockdep problem. i_mutex for quota
> > files is below dqonoff_sem. We were already fixing this for filesystem
> > specific quota IO functions but obviously we've missed a few cases. I
> > wonder why it showed up only now... Anyway, attached is a fix. Andrew,
> > would you add it? Thanks.
> >
> > Honza
> >
> > On Mon 14-05-07 19:43:18, Michal Piotrowski wrote:
> > > [adding Jan and fsdevel to CC]
> > >
> > > Hi Folkert,
> > >
> > > On 14/05/07, Folkert van Heusden <[email protected]> wrote:
> > > >Hi,
> > > >
> > > >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> > > >and system on an 1-filesystem IDE disk, I get the following circular
> > > >locking dependency error:
> > > >
> > > >[330961.226405] =======================================================
> > > >[330961.226489] [ INFO: possible circular locking dependency detected ]
> > > >[330961.226531] 2.6.21 #5
> > > >[330961.226569] -------------------------------------------------------
> > > >[330961.226611] quotaoff/12249 is trying to acquire lock:
> > > >[330961.226652] (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.226861]
> > > >[330961.226862] but task is already holding lock:
> > > >[330961.226938] (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.227111]
> > > >[330961.227111] which lock already depends on the new lock.
> > > >[330961.227112]
> > > >[330961.227225]
> > > >[330961.227225] the existing dependency chain (in reverse order) is:
> > > >[330961.227303]
> > > >[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> > > >[330961.227473] [<c1039b02>] check_prev_add+0x15b/0x281
> > > >[330961.227766] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.228056] [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.228353] [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.228643] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.228934] [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.229221] [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> > > >[330961.229513] [<c109fdd1>] vfs_quota_on+0x75/0x79
> > > >[330961.229803] [<c10bc92d>] ext3_quota_on+0x95/0xb0
> > > >[330961.230093] [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> > > >[330961.230384] [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.230673] [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.230963] [<ffffffff>] 0xffffffff
> > > >[330961.231268]
> > > >[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> > > >[330961.231469] [<c10399db>] check_prev_add+0x34/0x281
> > > >[330961.231759] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.232049] [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.232344] [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.232632] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.232923] [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.233211] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > > >[330961.233500] [<c10a2088>] do_quotactl+0x29f/0x2dd
> > > >[330961.233792] [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.234081] [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.234503] [<ffffffff>] 0xffffffff
> > > >[330961.234795]
> > > >[330961.234795] other info that might help us debug this:
> > > >[330961.234796]
> > > >[330961.234908] 2 locks held by quotaoff/12249:
> > > >[330961.234947] #0: (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> > > >get_super+0x53/0x94
> > > >[330961.235183] #1: (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.235386]
> > > >[330961.235387] stack backtrace:
> > > >[330961.235462] [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> > > >[330961.235535] [<c1004d7b>] show_trace+0x12/0x14
> > > >[330961.235606] [<c1004e75>] dump_stack+0x16/0x18
> > > >[330961.235679] [<c1039352>] print_circular_bug_tail+0x6f/0x71
> > > >[330961.235753] [<c10399db>] check_prev_add+0x34/0x281
> > > >[330961.235825] [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.235897] [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.235969] [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.236041] [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.236113] [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.236185] [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > > >[330961.236257] [<c10a2088>] do_quotactl+0x29f/0x2dd
> > > >[330961.236330] [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.236402] [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.236473] =======================
> > > >
> > >
> > > Is this a 2.6.21 regression?
> > >
> > > Regards,
> > > Michal
> > >
> > > --
> > > Michal K. K. Piotrowski
> > > Kernel Monkeys
> > > (http://kernel.wikidot.com/start)
> > --
> > Jan Kara <[email protected]>
> > SuSE CR Labs
>
> > i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
> > acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
> > comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
> > file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
> > call under dqonoff_mutex and save some problems with races...
> >
> > Signed-off-by: Jan Kara <[email protected]>
> >
> > diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/dquot.c linux-2.6.21-1-quota_lockdep/fs/dquot.c
> > --- linux-2.6.21/fs/dquot.c 2007-05-15 14:18:47.000000000 +0200
> > +++ linux-2.6.21-1-quota_lockdep/fs/dquot.c 2007-05-15 14:22:47.000000000 +0200
> > @@ -1421,7 +1421,7 @@ int vfs_quota_off(struct super_block *sb
> > /* If quota was reenabled in the meantime, we have
> > * nothing to do */
> > if (!sb_has_quota_enabled(sb, cnt)) {
> > - mutex_lock(&toputinode[cnt]->i_mutex);
> > + mutex_lock_nested(&toputinode[cnt]->i_mutex, I_MUTEX_QUOTA);
> > toputinode[cnt]->i_flags &= ~(S_IMMUTABLE |
> > S_NOATIME | S_NOQUOTA);
> > truncate_inode_pages(&toputinode[cnt]->i_data, 0);
> > diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/quota.c linux-2.6.21-1-quota_lockdep/fs/quota.c
> > --- linux-2.6.21/fs/quota.c 2006-11-29 22:57:37.000000000 +0100
> > +++ linux-2.6.21-1-quota_lockdep/fs/quota.c 2007-05-15 15:15:44.000000000 +0200
> > @@ -158,7 +158,6 @@ static int check_quotactl_valid(struct s
> > static void quota_sync_sb(struct super_block *sb, int type)
> > {
> > int cnt;
> > - struct inode *discard[MAXQUOTAS];
> >
> > sb->s_qcop->quota_sync(sb, type);
> > /* This is not very clever (and fast) but currently I don't know about
> > @@ -168,29 +167,21 @@ static void quota_sync_sb(struct super_b
> > sb->s_op->sync_fs(sb, 1);
> > sync_blockdev(sb->s_bdev);
> >
> > - /* Now when everything is written we can discard the pagecache so
> > - * that userspace sees the changes. We need i_mutex and so we could
> > - * not do it inside dqonoff_mutex. Moreover we need to be carefull
> > - * about races with quotaoff() (that is the reason why we have own
> > - * reference to inode). */
> > + /*
> > + * Now when everything is written we can discard the pagecache so
> > + * that userspace sees the changes.
> > + */
> > mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
> > for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> > - discard[cnt] = NULL;
> > if (type != -1 && cnt != type)
> > continue;
> > if (!sb_has_quota_enabled(sb, cnt))
> > continue;
> > - discard[cnt] = igrab(sb_dqopt(sb)->files[cnt]);
> > + mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex, I_MUTEX_NESTED);
> > + truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
> > + mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
> > }
> > mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> > - for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> > - if (discard[cnt]) {
> > - mutex_lock(&discard[cnt]->i_mutex);
> > - truncate_inode_pages(&discard[cnt]->i_data, 0);
> > - mutex_unlock(&discard[cnt]->i_mutex);
> > - iput(discard[cnt]);
> > - }
> > - }
> > }
> >
> > void sync_dquots(struct super_block *sb, int type)
>
>
>
> Folkert van Heusden
>
> --
> http://www.vanheusden.com/multitail - win een vlaai van multivlaai! zorg
> ervoor dat multitail opgenomen wordt in Fedora Core, AIX, Solaris of
> HP/UX en win een vlaai naar keuze
> ----------------------------------------------------------------------
> Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com


Attachments:
(No filename) (9.21 kB)
quota-2.6.21-1-quota_lockdep.diff (2.87 kB)
Download all attachments

2007-05-16 09:30:27

by Thomas De Schampheleire

[permalink] [raw]
Subject: Re: [PATCH] "volatile considered harmful" document

On 5/9/07, Jonathan Corbet <[email protected]> wrote:
> +CREDITS
> +-------
> +
> +Original impetus and research by Randy Dunlap
> +Written by Jonathan Corbet
> +Improvements via coments from Satyam Sharma Johannes Stezenbach
>


Just a small spelling mistake: coments should be comments.