Avoid lockdep warn by having both lock acquirements sleepable.
BTM also have to go to sleep if lock failed at the first time, otherwise it
will race with tty_open()->tty_lock().
Signed-off-by: Jie Liu <[email protected]>
---
drivers/tty/tty_io.c | 35 +++++++++++++++++++++++++++++++----
1 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index b425c79..2b4664d 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1098,12 +1098,40 @@ out:
*
* We must still hold the BTM and test the CLOSING flag for the moment.
*/
-
void tty_write_message(struct tty_struct *tty, char *msg)
{
+retry:
if (tty) {
- mutex_lock(&tty->atomic_write_lock);
- tty_lock();
+ /*
+ * tty_write_message() will invoked by print_warning()
+ * at fs/quota/dquot.c if CONFIG_PRINT_QUOTA_WARNING
+ * is enabled when a user running out of disk quota limits.
+ * It will end up call tty_write(). Here is a potential race
+ * situation since tty_write() call copy_from_user() which
+ * might produce page faults and turn to invoke inodes dirty
+ * process on the underlying file systems if needed, and
+ * it will try to acquire JBD/JBD2 lock accordingly. This
+ * might make lockdep unhappy to print dead lock warning.
+ * To solve this issue, we have to go to sleep and relinquish
+ * the CPU power to another process until the atomic_write_lock
+ * became available.
+ */
+ if (!mutex_trylock(&tty->atomic_write_lock)) {
+ DEFINE_WAIT(wait);
+ prepare_to_wait_exclusive(&tty->write_wait, &wait,
+ TASK_INTERRUPTIBLE);
+ schedule();
+ finish_wait(&tty->write_wait, &wait);
+ goto retry;
+ }
+
+ /*
+ * Call tty_lock() directly might race with tty_open() even
+ * if we have already got the atomic_write_lock. However, we
+ * must perform TTY_CLOSING check up with the BTM hold, so call
+ * tty_lock_wait() which is sleepable instead.
+ */
+ tty_lock_wait();
if (tty->ops->write && !test_bit(TTY_CLOSING, &tty->flags)) {
tty_unlock();
tty->ops->write(tty, msg, strlen(msg));
@@ -1114,7 +1142,6 @@ void tty_write_message(struct tty_struct *tty, char *msg)
return;
}
-
/**
* tty_write - write method for tty device file
* @file: tty file pointer
--
1.7.9
> + * tty_write_message() will invoked by print_warning()
> + * at fs/quota/dquot.c if CONFIG_PRINT_QUOTA_WARNING
> + * is enabled when a user running out of disk quota limits.
> + * It will end up call tty_write(). Here is a potential race
tty->ops->write is the low level write method, not tty_write.
This appears to be even more wrong than the other one in other ways too -
it uses interruptible sleeps but doesn't handle the signal case so will
spin on a signal and kill the box.
NAK
Looking gat the traces I suspect what you've actually got is a much more
complicated deadlock where a process doing perfectly normal I/O to the
tty has faulted and there is a chain of dependancies through the file
system code to the thread which is doing the dquot_alloc_inode.
If that is the case then dquot_alloc_inode shouldn't be making blocking
calls to tty_write_message and probably the right thing to do is to queue
work for it so the tty_write_message is done asynchronously.
There are a very limited number of events that need reporting so probably
something like a per mount flags and workqueue would allow you to do
set_bit(DQUOT_INODEOVER, &foo->events);
schedule_work()
and the work queue can just xchg the events long for 0 and spew any
messages required.
Alan
Hey Alan,
On 06/30/2012 08:44 PM, Alan Cox wrote:
>> + * tty_write_message() will invoked by print_warning()
>> + * at fs/quota/dquot.c if CONFIG_PRINT_QUOTA_WARNING
>> + * is enabled when a user running out of disk quota limits.
>> + * It will end up call tty_write(). Here is a potential race
>
> tty->ops->write is the low level write method, not tty_write.
I was wondering if below call trace is come from tty_write_message()->tty->ops->write()?
[ 2739.802106] -> #1 (&mm->mmap_sem){++++++}:
[ 2739.802120] [<c10fa825>] lock_acquire+0x14e/0x189
[ 2739.802133] [<c11e3e83>] might_fault+0xbf/0xf8
[ 2739.802154] [<c13bcbdf>] _copy_from_user+0x40/0x8a
[ 2739.802175] [<c14addc3>] copy_from_user+0x16/0x26
[ 2739.802195] [<c14b00b4>] tty_write+0x282/0x3c7
[ 2739.802212] [<c14b02dd>] redirected_tty_write+0xe4/0xfd
[ 2739.802226] [<c1231879>] vfs_write+0xf5/0x1a3
[ 2739.802239] [<c1231bdc>] sys_write+0x6c/0xa9
[ 2739.802253] [<c186281f>] sysenter_do_call+0x12/0x38
>
> This appears to be even more wrong than the other one in other ways too -
> it uses interruptible sleeps but doesn't handle the signal case so will
> spin on a signal and kill the box.
>
> NAK
>
> Looking gat the traces I suspect what you've actually got is a much more
> complicated deadlock where a process doing perfectly normal I/O to the
> tty has faulted and there is a chain of dependancies through the file
> system code to the thread which is doing the dquot_alloc_inode.
>
> If that is the case then dquot_alloc_inode shouldn't be making blocking
> calls to tty_write_message and probably the right thing to do is to queue
> work for it so the tty_write_message is done asynchronously.
>
> There are a very limited number of events that need reporting so probably
> something like a per mount flags and workqueue would allow you to do
>
> set_bit(DQUOT_INODEOVER, &foo->events);
> schedule_work()
>
> and the work queue can just xchg the events long for 0 and spew any
> messages required.
Thanks for the teaching, I'll give a try.
-Jeff
>
> Alan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html