2007-01-20 00:47:42

by Junichi Nomura

Subject: [PATCH 2.6.20-rc5] dm-multipath: fix stall on noflush suspend/resume

Allow noflush suspend/resume of a device-mapper device only when
the device size is unchanged.

Otherwise, dm-multipath devices can stall when resumed if noflush
was used when suspending them, all paths have failed and
queue_if_no_path is set.

Explanation:
1. Some process is doing fsync() on the block device,
   holding inode->i_sem.
2. The fsync write is blocked by all-paths-down and queue_if_no_path.
3. Someone requests a suspend of the dm device with noflush.
   Pending writes are left in the queue.
4. In the middle of dm_resume(), __bind() tries to get
   inode->i_sem to do __set_size() and waits forever.
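
As a rough illustration of the rule described above (a noflush-suspended
device may only take a new table of the same size), here is a minimal
userspace sketch. It is a model under stated assumptions, not the code
in drivers/md/dm.c: struct model_dev and model_may_swap_table() are
hypothetical names, and the "must be suspended" check is an assumption
about how table swapping is gated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the structures used in drivers/md/dm.c. */
struct model_dev {
	uint64_t size_sectors;	/* current device size, in sectors */
	bool suspended;		/* device is currently suspended */
	bool noflush;		/* suspend left pending I/O in the queue */
};

/*
 * Return true if a table of new_size sectors may be bound to the device.
 * After a noflush suspend a queued fsync() may still hold the inode lock,
 * so any swap that would have to change the device size is refused
 * instead of being allowed to block forever during resume.
 */
static bool model_may_swap_table(const struct model_dev *dev, uint64_t new_size)
{
	if (!dev->suspended)
		return false;	/* assumption: swap only while suspended */
	if (dev->noflush && new_size != dev->size_sectors)
		return false;	/* resize would need the held inode lock */
	return true;
}
```

In this model, a device suspended with noflush at 2048 sectors accepts a
2048-sector table and rejects a 4096-sector one, while the same resize
is accepted after an ordinary (flushing) suspend.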

Signed-off-by: Jun'ichi Nomura <[email protected]>

---
'noflush suspend' is a new device-mapper feature introduced early
in 2.6.20, so I hope the fix can be included before 2.6.20 is
released.

Example reproducer:
1. Create a multipath device with dmsetup
2. Fail all paths during mkfs
3. Do dmsetup suspend --noflush and load a new map with healthy paths
4. Do dmsetup resume


drivers/md/dm.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)


Attachments:
dm-noflush-fix-stall-on-resume.patch (1.63 kB)