Jens,
Can you please take a look at this patch, and if you think it's sane,
add it to your explicit i/o plugging patchset? Would it make sense in
any of these paths to use io_schedule() instead of schedule()?
I hadn't looked at your patchset until I discovered that jfs was easy to
hang in the -mm kernel. I think jfs may be able to add explicit
plugging and unplugging in a couple of places, but I'd like to fix the
hang right away and take my time with any later patches.
Thanks,
Shaggy
JFS: Avoid deadlock introduced by explicit I/O plugging
jfs is pretty easy to deadlock with Jens' explicit i/o plugging patchset.
Just try building a kernel.
The problem occurs when a synchronous transaction initiates some I/O, then
waits in lmGroupCommit for the transaction to be committed to the journal.
This requires action by the commit thread, which ends up waiting on a page
to complete writeback. The commit thread did not initiate the I/O, so it
cannot unplug the io queue, and deadlock occurs.
The fix is for the first thread to call blk_replug_current_nested() before
going to sleep. This patch also adds the call to a couple other places that
look like they need it.
Signed-off-by: Dave Kleikamp <[email protected]>
diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h linux/fs/jfs/jfs_lock.h
--- linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h 2006-11-29 15:57:37.000000000 -0600
+++ linux/fs/jfs/jfs_lock.h 2007-01-17 15:30:19.000000000 -0600
@@ -22,6 +22,7 @@
#include <linux/spinlock.h>
#include <linux/mutex.h>
#include <linux/sched.h>
+#include <linux/blkdev.h>
/*
* jfs_lock.h
@@ -42,6 +43,7 @@ do { \
if (cond) \
break; \
unlock_cmd; \
+ blk_replug_current_nested(); \
schedule(); \
lock_cmd; \
} \
diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_metapage.c linux/fs/jfs/jfs_metapage.c
--- linux-2.6.20-rc4-mm1/fs/jfs/jfs_metapage.c 2007-01-12 09:50:45.000000000 -0600
+++ linux/fs/jfs/jfs_metapage.c 2007-01-17 15:28:46.000000000 -0600
@@ -23,6 +23,7 @@
#include <linux/init.h>
#include <linux/buffer_head.h>
#include <linux/mempool.h>
+#include <linux/blkdev.h>
#include "jfs_incore.h"
#include "jfs_superblock.h"
#include "jfs_filsys.h"
@@ -56,6 +57,7 @@ static inline void __lock_metapage(struc
set_current_state(TASK_UNINTERRUPTIBLE);
if (metapage_locked(mp)) {
unlock_page(mp->page);
+ blk_replug_current_nested();
schedule();
lock_page(mp->page);
}
diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_txnmgr.c linux/fs/jfs/jfs_txnmgr.c
--- linux-2.6.20-rc4-mm1/fs/jfs/jfs_txnmgr.c 2007-01-12 09:50:45.000000000 -0600
+++ linux/fs/jfs/jfs_txnmgr.c 2007-01-17 15:29:04.000000000 -0600
@@ -50,6 +50,7 @@
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/kthread.h>
+#include <linux/blkdev.h>
#include "jfs_incore.h"
#include "jfs_inode.h"
#include "jfs_filsys.h"
@@ -135,6 +136,7 @@ static inline void TXN_SLEEP_DROP_LOCK(w
add_wait_queue(event, &wait);
set_current_state(TASK_UNINTERRUPTIBLE);
TXN_UNLOCK();
+ blk_replug_current_nested();
schedule();
current->state = TASK_RUNNING;
remove_wait_queue(event, &wait);
--
David Kleikamp
IBM Linux Technology Center
On Wed, Jan 17 2007, Dave Kleikamp wrote:
> Jens,
> Can you please take a look at this patch, and if you think it's sane,
> add it to your explicit i/o plugging patchset? Would it make sense in
> any of these paths to use io_schedule() instead of schedule()?
I'm glad you bring that up, actually. One of the "downsides" of the new
unplugging is that it really requires anyone waiting for IO in a path
like the file system or device driver to use io_schedule() instead of
schedule() to get the blk_replug_current_nested() done to avoid
deadlocks. While it is annoying that it could introduce some deadlocks
until we get things fixed it, I do consider it a correctness fix even in
the generic kernel, as you are really waiting for IO and as such should
use io_schedule() in the first place.
Perhaps I should add a WARN_ON() check for this to catch these bugs
upfront.
> I hadn't looked at your patchset until I discovered that jfs was easy to
> hang in the -mm kernel. I think jfs may be able to add explicit
> plugging and unplugging in a couple of places, but I'd like to fix the
> hang right away and take my time with any later patches.
Can you try io_schedule() and verify that things just work?
--
Jens Axboe
On Thu, 2007-01-18 at 10:18 +1100, Jens Axboe wrote:
> Can you try io_schedule() and verify that things just work?
I actually did do that in the first place, but wondered if it was the
right thing to introduce the accounting changes that came with that.
I'll change it back to io_schedule() and test it again, just to make
sure.
If that's the right fix, I can push it directly since it won't have any
dependencies on your patches.
Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center
On Wed, Jan 17 2007, Dave Kleikamp wrote:
> On Thu, 2007-01-18 at 10:18 +1100, Jens Axboe wrote:
>
> > Can you try io_schedule() and verify that things just work?
>
> I actually did do that in the first place, but wondered if it was the
> right thing to introduce the accounting changes that came with that.
> I'll change it back to io_schedule() and test it again, just to make
> sure.
It appears to be the correct change to me - you really are waiting for
IO resources (otherwise it would not hang with the plug change), so
doing an inc/dec of iowait around the schedule should be done.
> If that's the right fix, I can push it directly since it won't have any
> dependencies on your patches.
Perfect!
--
Jens Axboe
On Thu, 2007-01-18 at 10:46 +1100, Jens Axboe wrote:
> On Wed, Jan 17 2007, Dave Kleikamp wrote:
> > On Thu, 2007-01-18 at 10:18 +1100, Jens Axboe wrote:
> >
> > > Can you try io_schedule() and verify that things just work?
> >
> > I actually did do that in the first place, but wondered if it was the
> > right thing to introduce the accounting changes that came with that.
> > I'll change it back to io_schedule() and test it again, just to make
> > sure.
>
> It appears to be the correct change to me - you really are waiting for
> IO resources (otherwise it would not hang with the plug change), so
> doing an inc/dec of iowait around the schedule should be done.
Okay, here it is.
> > If that's the right fix, I can push it directly since it won't have any
> > dependencies on your patches.
>
> Perfect!
It should make the next -mm.
JFS: call io_schedule() instead of schedule() to avoid deadlock
The introduction of Jens Axboe's explicit i/o plugging patches introduced a
deadlock in jfs. This was caused by the process initiating I/O not
unplugging the queue before waiting on the commit thread. The commit
thread itself was waiting for that I/O to complete. Calling io_schedule()
rather than schedule() unplugs the I/O queue avoiding the deadlock, and it
appears to be the right function to call in any case.
Signed-off-by: Dave Kleikamp <[email protected]>
---
commit 4aa0d230c2cfc1ac4bcf7c5466f9943cf14233a9
tree b873dce6146f4880c6c48ab53c0079566f52a60b
parent 82d5b9a7c63054a9a2cd838ffd177697f86e7e34
author Dave Kleikamp <[email protected]> Wed, 17 Jan 2007 21:18:35 -0600
committer Dave Kleikamp <[email protected]> Wed, 17 Jan 2007 21:18:35 -0600
fs/jfs/jfs_lock.h | 2 +-
fs/jfs/jfs_metapage.c | 2 +-
fs/jfs/jfs_txnmgr.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/jfs/jfs_lock.h b/fs/jfs/jfs_lock.h
index 7d78e83..df48ece 100644
--- a/fs/jfs/jfs_lock.h
+++ b/fs/jfs/jfs_lock.h
@@ -42,7 +42,7 @@ do { \
if (cond) \
break; \
unlock_cmd; \
- schedule(); \
+ io_schedule(); \
lock_cmd; \
} \
current->state = TASK_RUNNING; \
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index ceaf03b..58deae0 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -56,7 +56,7 @@ static inline void __lock_metapage(struct metapage *mp)
set_current_state(TASK_UNINTERRUPTIBLE);
if (metapage_locked(mp)) {
unlock_page(mp->page);
- schedule();
+ io_schedule();
lock_page(mp->page);
}
} while (trylock_metapage(mp));
diff --git a/fs/jfs/jfs_txnmgr.c b/fs/jfs/jfs_txnmgr.c
index d558e51..6988a10 100644
--- a/fs/jfs/jfs_txnmgr.c
+++ b/fs/jfs/jfs_txnmgr.c
@@ -135,7 +135,7 @@ static inline void TXN_SLEEP_DROP_LOCK(wait_queue_head_t * event)
add_wait_queue(event, &wait);
set_current_state(TASK_UNINTERRUPTIBLE);
TXN_UNLOCK();
- schedule();
+ io_schedule();
current->state = TASK_RUNNING;
remove_wait_queue(event, &wait);
}
--
David Kleikamp
IBM Linux Technology Center
On Wed, Jan 17, 2007 at 04:55:49PM -0600, Dave Kleikamp wrote:
...
> diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h linux/fs/jfs/jfs_lock.h
> --- linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h 2006-11-29 15:57:37.000000000 -0600
> +++ linux/fs/jfs/jfs_lock.h 2007-01-17 15:30:19.000000000 -0600
> @@ -22,6 +22,7 @@
> #include <linux/spinlock.h>
> #include <linux/mutex.h>
> #include <linux/sched.h>
> +#include <linux/blkdev.h>
>
> /*
> * jfs_lock.h
> @@ -42,6 +43,7 @@ do { \
> if (cond) \
> break; \
> unlock_cmd; \
> + blk_replug_current_nested(); \
> schedule(); \
> lock_cmd; \
> } \
Is {,un}lock_cmd a macro? ...
Jeff.
--
Keyboard not found!
Press F1 to enter Setup
On Thu, 2007-01-18 at 01:30 -0500, Josef Sipek wrote:
> On Wed, Jan 17, 2007 at 04:55:49PM -0600, Dave Kleikamp wrote:
> > /*
> > * jfs_lock.h
> > @@ -42,6 +43,7 @@ do { \
> > if (cond) \
> > break; \
> > unlock_cmd; \
> > + blk_replug_current_nested(); \
> > schedule(); \
> > lock_cmd; \
> > } \
>
> Is {,un}lock_cmd a macro? ...
They are the commands passed into this macro (as arguments) to
release/take a lock. This is a home-grown wait_on_event type macro
where the condition must be checked while holding a lock. And the
commands passed in are themselves macros. The jfs code could probably
be cleaned up a bit as far as excessive use of macros for locking, but
it's been working for a few years with few problems.
--
David Kleikamp
IBM Linux Technology Center