just a simple test I did...
xfs_freeze -f /mnt/newtest
cp /etc/fstab /mnt/newtest
xfs_freeze -u /mnt/newtest
2007-01-04 01:44:30.341979500 <4>BUG: warning at kernel/mutex-debug.c:80/debug_mutex_unlock()
2007-01-04 01:44:30.385771500 <4> [<c0103cfb>] dump_trace+0x215/0x21a
2007-01-04 01:44:30.385774500 <4> [<c0103da3>] show_trace_log_lvl+0x1a/0x30
2007-01-04 01:44:30.385775500 <4> [<c0103dcb>] show_trace+0x12/0x14
2007-01-04 01:44:30.385777500 <4> [<c0103ec8>] dump_stack+0x19/0x1b
2007-01-04 01:44:30.385778500 <4> [<c013a3af>] debug_mutex_unlock+0x69/0x120
2007-01-04 01:44:30.385779500 <4> [<c04b4aac>] __mutex_unlock_slowpath+0x44/0xf0
2007-01-04 01:44:30.385780500 <4> [<c04b4887>] mutex_unlock+0x8/0xa
2007-01-04 01:44:30.385782500 <4> [<c018d0ba>] thaw_bdev+0x57/0x6e
2007-01-04 01:44:30.385791500 <4> [<c026a6cf>] xfs_ioctl+0x7ce/0x7d3
2007-01-04 01:44:30.385793500 <4> [<c0269158>] xfs_file_ioctl+0x33/0x54
2007-01-04 01:44:30.385794500 <4> [<c01793f2>] do_ioctl+0x76/0x85
2007-01-04 01:44:30.385795500 <4> [<c0179570>] vfs_ioctl+0x59/0x1aa
2007-01-04 01:44:30.385796500 <4> [<c0179728>] sys_ioctl+0x67/0x77
2007-01-04 01:44:30.385797500 <4> [<c0102e73>] syscall_call+0x7/0xb
2007-01-04 01:44:30.385799500 <4> [<001be410>] 0x1be410
2007-01-04 01:44:30.385800500 <4> =======================
fstab was there just fine after -u.
Linux 2.6.19.1 SMP on Pentium D.
--
On Thu, Jan 04, 2007 at 02:14:21AM +0200, Sami Farin wrote:
> just a simple test I did...
> xfs_freeze -f /mnt/newtest
> cp /etc/fstab /mnt/newtest
> xfs_freeze -u /mnt/newtest
>
> 2007-01-04 01:44:30.341979500 <4>BUG: warning at kernel/mutex-debug.c:80/debug_mutex_unlock()
> ...
>
> fstab was there just fine after -u.
Oh, that still hasn't been fixed? Generic bug, not XFS - the global
semaphore->mutex cleanup converted the bd_mount_sem to a mutex, and
mutexes complain loudly when the process unlocking the mutex is
not the process that locked it.
Basically, the generic code is broken - the bd_mount_mutex needs to
be reverted back to a semaphore because it is locked and unlocked
by different processes. The following patch does this....
BTW, Sami, can you cc [email protected] on XFS bug reports in future;
you'll get more XFS savvy eyes there.....
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
---
Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.
Signed-off-by: Dave Chinner <[email protected]>
---
fs/block_dev.c | 2 +-
fs/buffer.c | 6 +++---
fs/gfs2/ops_fstype.c | 4 ++--
fs/super.c | 4 ++--
include/linux/fs.h | 2 +-
5 files changed, 9 insertions(+), 9 deletions(-)
Index: 2.6.x-xfs-new/fs/block_dev.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/block_dev.c 2006-12-22 10:53:20.000000000 +1100
+++ 2.6.x-xfs-new/fs/block_dev.c 2007-01-08 08:26:15.843378600 +1100
@@ -263,7 +263,7 @@ static void init_once(void * foo, kmem_c
{
memset(bdev, 0, sizeof(*bdev));
mutex_init(&bdev->bd_mutex);
- mutex_init(&bdev->bd_mount_mutex);
+ sema_init(&bdev->bd_mount_sem, 1);
INIT_LIST_HEAD(&bdev->bd_inodes);
INIT_LIST_HEAD(&bdev->bd_list);
#ifdef CONFIG_SYSFS
Index: 2.6.x-xfs-new/fs/buffer.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/buffer.c 2006-12-12 12:04:51.000000000 +1100
+++ 2.6.x-xfs-new/fs/buffer.c 2007-01-08 08:28:40.832542651 +1100
@@ -179,7 +179,7 @@ int fsync_bdev(struct block_device *bdev
* freeze_bdev -- lock a filesystem and force it into a consistent state
* @bdev: blockdevice to lock
*
- * This takes the block device bd_mount_mutex to make sure no new mounts
+ * This takes the block device bd_mount_sem to make sure no new mounts
* happen on bdev until thaw_bdev() is called.
* If a superblock is found on this device, we take the s_umount semaphore
* on it to make sure nobody unmounts until the snapshot creation is done.
@@ -188,7 +188,7 @@ struct super_block *freeze_bdev(struct b
{
struct super_block *sb;
- mutex_lock(&bdev->bd_mount_mutex);
+ down(&bdev->bd_mount_sem);
sb = get_super(bdev);
if (sb && !(sb->s_flags & MS_RDONLY)) {
sb->s_frozen = SB_FREEZE_WRITE;
@@ -230,7 +230,7 @@ void thaw_bdev(struct block_device *bdev
drop_super(sb);
}
- mutex_unlock(&bdev->bd_mount_mutex);
+ up(&bdev->bd_mount_sem);
}
EXPORT_SYMBOL(thaw_bdev);
Index: 2.6.x-xfs-new/fs/gfs2/ops_fstype.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/gfs2/ops_fstype.c 2006-12-12 12:04:58.000000000 +1100
+++ 2.6.x-xfs-new/fs/gfs2/ops_fstype.c 2007-01-08 08:27:12.847973663 +1100
@@ -867,9 +867,9 @@ static int gfs2_get_sb_meta(struct file_
error = -EBUSY;
goto error;
}
- mutex_lock(&sb->s_bdev->bd_mount_mutex);
+ down(&sb->s_bdev->bd_mount_sem);
new = sget(fs_type, test_bdev_super, set_bdev_super, sb->s_bdev);
- mutex_unlock(&sb->s_bdev->bd_mount_mutex);
+ up(&sb->s_bdev->bd_mount_sem);
if (IS_ERR(new)) {
error = PTR_ERR(new);
goto error;
Index: 2.6.x-xfs-new/fs/super.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/super.c 2006-12-22 11:45:59.000000000 +1100
+++ 2.6.x-xfs-new/fs/super.c 2007-01-08 08:24:20.718330640 +1100
@@ -736,9 +736,9 @@ int get_sb_bdev(struct file_system_type
* will protect the lockfs code from trying to start a snapshot
* while we are mounting
*/
- mutex_lock(&bdev->bd_mount_mutex);
+ down(&bdev->bd_mount_sem);
s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
- mutex_unlock(&bdev->bd_mount_mutex);
+ up(&bdev->bd_mount_sem);
if (IS_ERR(s))
goto error_s;
Index: 2.6.x-xfs-new/include/linux/fs.h
===================================================================
--- 2.6.x-xfs-new.orig/include/linux/fs.h 2006-12-12 12:06:31.000000000 +1100
+++ 2.6.x-xfs-new/include/linux/fs.h 2007-01-08 08:24:53.602060200 +1100
@@ -456,7 +456,7 @@ struct block_device {
struct inode * bd_inode; /* will die */
int bd_openers;
struct mutex bd_mutex; /* open/close mutex */
- struct mutex bd_mount_mutex; /* mount mutex */
+ struct semaphore bd_mount_sem;
struct list_head bd_inodes;
void * bd_holder;
int bd_holders;
On Mon, Jan 08, 2007 at 08:37:34 +1100, David Chinner wrote:
...
> > fstab was there just fine after -u.
>
> Oh, that still hasn't been fixed?
Looked like it =)
> Generic bug, not XFS - the global
> semaphore->mutex cleanup converted the bd_mount_sem to a mutex, and
> mutexes complain loudly when the process unlocking the mutex is
> not the process that locked it.
>
> Basically, the generic code is broken - the bd_mount_mutex needs to
> be reverted back to a semaphore because it is locked and unlocked
> by different processes. The following patch does this....
>
> BTW, Sami, can you cc [email protected] on XFS bug reports in future;
> you'll get more XFS savvy eyes there.....
Forgot to.
Thanks for the patch. It fixed the issue; no more warnings.
BTW, the fix is not in 2.6.git, either.
--
Do what you love because life is too short for anything else.
Sami Farin wrote:
> On Mon, Jan 08, 2007 at 08:37:34 +1100, David Chinner wrote:
> ...
>>> fstab was there just fine after -u.
>> Oh, that still hasn't been fixed?
>
> Looked like it =)
Hm, it was proposed upstream a while ago:
http://lkml.org/lkml/2006/9/27/137
I guess it got lost?
-Eric
On Mon, Jan 08, 2007 at 10:40:54AM -0600, Eric Sandeen wrote:
> Sami Farin wrote:
> > On Mon, Jan 08, 2007 at 08:37:34 +1100, David Chinner wrote:
> > ...
> >>> fstab was there just fine after -u.
> >> Oh, that still hasn't been fixed?
> >
> > Looked like it =)
>
> Hm, it was proposed upstream a while ago:
>
> http://lkml.org/lkml/2006/9/27/137
>
> I guess it got lost?
Seems like it. Andrew, did this ever get queued for merge?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
On Mon, 8 Jan 2007 08:37:34 +1100
David Chinner <[email protected]> wrote:
> On Thu, Jan 04, 2007 at 02:14:21AM +0200, Sami Farin wrote:
> > just a simple test I did...
> > xfs_freeze -f /mnt/newtest
> > cp /etc/fstab /mnt/newtest
> > xfs_freeze -u /mnt/newtest
> >
> > 2007-01-04 01:44:30.341979500 <4>BUG: warning at kernel/mutex-debug.c:80/debug_mutex_unlock()
> > ...
> >
> > fstab was there just fine after -u.
>
> Oh, that still hasn't been fixed? Generic bug, not XFS - the global
> semaphore->mutex cleanup converted the bd_mount_sem to a mutex, and
> mutexes complain loudly when the process unlocking the mutex is
> not the process that locked it.
>
> Basically, the generic code is broken - the bd_mount_mutex needs to
> be reverted back to a semaphore because it is locked and unlocked
> by different processes. The following patch does this....
>
> ...
>
> Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
> xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.
Sad. The alternative would be to implement
mutex_unlock_dont_warn_if_a_different_task_did_it(). Ingo? Possible?
On Tue, 9 Jan 2007 10:47:28 +1100
David Chinner <[email protected]> wrote:
> On Mon, Jan 08, 2007 at 10:40:54AM -0600, Eric Sandeen wrote:
> > Sami Farin wrote:
> > > On Mon, Jan 08, 2007 at 08:37:34 +1100, David Chinner wrote:
> > > ...
> > >>> fstab was there just fine after -u.
> > >> Oh, that still hasn't been fixed?
> > >
> > > Looked like it =)
> >
> > Hm, it was proposed upstream a while ago:
> >
> > http://lkml.org/lkml/2006/9/27/137
> >
> > I guess it got lost?
>
> Seems like it. Andrew, did this ever get queued for merge?
Seems not. I think people were hoping that various nasties in there
would go away. We return to userspace with a kernel lock held??
On Mon, 08 Jan 2007 21:12:40 -0600
Eric Sandeen <[email protected]> wrote:
> Andrew Morton wrote:
> > On Tue, 9 Jan 2007 10:47:28 +1100
> > David Chinner <[email protected]> wrote:
> >
> >> On Mon, Jan 08, 2007 at 10:40:54AM -0600, Eric Sandeen wrote:
> >>> ...
> >> Seems like it. Andrew, did this ever get queued for merge?
> >
> > Seems not. I think people were hoping that various nasties in there
> > would go away. We return to userspace with a kernel lock held??
>
> Is a semaphore any worse than the current mutex in this respect? At
> least unlocking from another thread doesn't violate semaphore rules. :)
I assume that if we weren't returning to userspace with a lock held, this
mutex problem would simply go away.
Andrew Morton wrote:
> On Tue, 9 Jan 2007 10:47:28 +1100
> David Chinner <[email protected]> wrote:
>
>> On Mon, Jan 08, 2007 at 10:40:54AM -0600, Eric Sandeen wrote:
>>> ...
>> Seems like it. Andrew, did this ever get queued for merge?
>
> Seems not. I think people were hoping that various nasties in there
> would go away. We return to userspace with a kernel lock held??
Is a semaphore any worse than the current mutex in this respect? At
least unlocking from another thread doesn't violate semaphore rules. :)
-Eric
Andrew Morton wrote:
> On Mon, 08 Jan 2007 21:12:40 -0600
> Eric Sandeen <[email protected]> wrote:
>
>> Andrew Morton wrote:
>>> ...
>> Is a semaphore any worse than the current mutex in this respect? At
>> least unlocking from another thread doesn't violate semaphore rules. :)
>
> I assume that if we weren't returning to userspace with a lock held, this
> mutex problem would simply go away.
>
Well, nobody's asserting that the filesystem must always be locked &
unlocked by the same thread, are they? That'd be a strange rule to
enforce upon the userspace doing the filesystem management, wouldn't it?
Or am I thinking about this wrong...
-Eric
On Mon, 08 Jan 2007 21:38:05 -0600
Eric Sandeen <[email protected]> wrote:
> Andrew Morton wrote:
> > ...
> > I assume that if we weren't returning to userspace with a lock held, this
> > mutex problem would simply go away.
> >
>
> Well nobody's asserting that the filesystem must always be locked &
> unlocked by the same thread, are they? That'd be a strange rule to
> enforce upon the userspace doing the filesystem management wouldn't it?
> Or am I thinking about this wrong...
I don't even know what code we're talking about here...
I'm under the impression that XFS will return to userspace with a
filesystem lock held, under the expectation (ie: requirement) that
userspace will later come in and release that lock.
If that's not true, then what _is_ happening in there?
If that _is_ true then, well, that sucks a bit.
On Mon, 2007-01-08 at 19:51 -0800, Andrew Morton wrote:
> On Mon, 08 Jan 2007 21:38:05 -0600
> Eric Sandeen <[email protected]> wrote:
>
> > ...
> > Well nobody's asserting that the filesystem must always be locked &
> > unlocked by the same thread, are they? That'd be a strange rule to
> > enforce upon the userspace doing the filesystem management wouldn't it?
> > Or am I thinking about this wrong...
>
> I don't even know what code we're talking about here...
>
> I'm under the impression that XFS will return to userspace with a
> filesystem lock held, under the expectation (ie: requirement) that
> userspace will later come in and release that lock.
It's not really XFS; it's more the generic device freezing code
(freeze_bdev), which is called by both XFS and the device mapper
suspend interface (both of which are exposed to userspace via
ioctls). These interfaces are used when doing block-level
snapshots which are "filesystem coherent".
> If that's not true, then what _is_ happening in there?
This particular case was a device mapper stack trace, hence the
confusion, I think. Both XFS and DM are making the same generic
block layer call here though (freeze_bdev).
> If that _is_ true then, well, that sucks a bit.
Indeed, it's a fairly ordinary interface, but that's too late to go
fix now, I guess (since it's exposed to userspace already). A remount
flag along the lines of readonly may have been a better way to go...
perhaps. *shrug*... not clear - I guess the problem the original
authors had there (whoever they were, I dunno) was that the block
layer wants to call up to the filesystem to quiesce itself, and at
some later user-defined point to unquiesce itself... which is a bit
of a layering violation.
From a quick look, there seems to be a bug in the original patch - it
is passing -EAGAIN back without wrapping it in ERR_PTR(), which
it needs to do, since freeze_bdev returns a struct super_block pointer.
cheers.
--
Nathan
On Tue, Jan 09, 2007 at 03:17:03PM +1100, Nathan Scott wrote:
> On Mon, 2007-01-08 at 19:51 -0800, Andrew Morton wrote:
> > On Mon, 08 Jan 2007 21:38:05 -0600
> > Eric Sandeen <[email protected]> wrote:
> >
> > > ...
> > > Well nobody's asserting that the filesystem must always be locked &
> > > unlocked by the same thread, are they? That'd be a strange rule to
> > > enforce upon the userspace doing the filesystem management wouldn't it?
> > > Or am I thinking about this wrong...
> >
> > I don't even know what code we're talking about here...
> >
> > I'm under the impression that XFS will return to userspace with a
> > filesystem lock held, under the expectation (ie: requirement) that
> > userspace will later come in and release that lock.
>
> Its not really XFS, its more the generic device freezing code
> (freeze_bdev) which is called by both XFS and the device mapper
> suspend interface (both of which are exposed to userspace via
> ioctls). These interfaces are used when doing block level
> snapshots which are "filesystem coherent".
>
> > If that's not true, then what _is_ happening in there?
>
> This particular case was a device mapper stack trace, hence the
> confusion, I think. Both XFS and DM are making the same generic
> block layer call here though (freeze_bdev).
Yup, it's the freeze_bdev/thaw_bdev use of bd_mount_mutex
that's the problem. I fail to see _why_ we need to hold a lock
across the freeze/thaw - the only reason I can think of is to
hold out new calls to sget() (via get_sb_bdev()) while the
filesystem is frozen, though I'm not sure why you'd need to
do that. Can someone explain why we are holding the lock from
freeze to thaw?
FWIW, the comment in get_sb_bdev() seems to imply s_umount is supposed
to ensure that we don't get unmounted while frozen.
Indeed, in the comment above freeze_bdev:
* If a superblock is found on this device, we take the s_umount semaphore
* on it to make sure nobody unmounts until the snapshot creation is done.
implies this as well, but freeze_bdev does not take the s_umount
semaphore, nor does any filesystem that implements ->write_super_lockfs().
So the code is clearly at odds with the comments here.
IMO, you should be able to unmount a frozen filesystem - behaviour
should be the same as crashing while frozen, so I think the comments
about "snapshots" are pretty dodgy as well.
> > If that _is_ true then, well, that sucks a bit.
>
> Indeed, its a fairly ordinary interface, but thats too late to go
> fix now I guess (since its exposed to userspace already).
Userspace knows nothing about that lock, so we can change that without
changing the userspace API.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
On Tue, 2007-01-09 at 15:49 +1100, David Chinner wrote:
> On Tue, Jan 09, 2007 at 03:17:03PM +1100, Nathan Scott wrote:
> > On Mon, 2007-01-08 at 19:51 -0800, Andrew Morton wrote:
> > > If that's not true, then what _is_ happening in there?
> >
> > This particular case was a device mapper stack trace, hence the
> > confusion, I think. Both XFS and DM are making the same generic
> > block layer call here though (freeze_bdev).
>
> Yup. it's the freeze_bdev/thaw_bdev use of the bd_mount_mutex()
> that's the problem. I fail to see _why_ we need to hold a lock
> across the freeze/thaw - the only reason i can think of is to
> hold out new calls to sget() (via get_sb_bdev()) while the
> filesystem is frozen though I'm not sure why you'd need to
> do that. Can someone explain why we are holding the lock from
> freeze to thaw?
Not me. If it's really not needed, then...
> > > If that _is_ true then, well, that sucks a bit.
> >
> > Indeed, its a fairly ordinary interface, but thats too late to go
> > fix now I guess (since its exposed to userspace already).
>
> Userspace knows nothing about that lock, so we can change that without
> changing the userspace API.
...that would be true, AFAICS.
cheers.
--
Nathan
* Andrew Morton <[email protected]> wrote:
> > Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f
> > /mnt/newtest; xfs_freeze -u /mnt/newtest works safely and doesn't
> > produce lockdep warnings.
>
> Sad. The alternative would be to implement
> mutex_unlock_dont_warn_if_a_different_task_did_it(). Ingo? Possible?
i'd like to avoid it as much as i'd like to avoid having to add
spin_unlock_dont_warn_if_a_different_task_did_it(). Unlocking by a
different task is usually a sign of messy locking and bugs lurking. Is
it really true that XFS's use of bd_mount_mutex is safe and justified?
Ingo
On Mon, Jan 08, 2007 at 04:19:17PM -0800, Andrew Morton wrote:
> Seems not. I think people were hoping that various nasties in there
> would go away. We return to userspace with a kernel lock held??
Well, there might be nicer solutions, but for now we should revert the
broken commit that changed the lock type.
On Mon, Jan 08, 2007 at 07:51:27PM -0800, Andrew Morton wrote:
> I don't even know what code we're talking about here...
>
> I'm under the impression that XFS will return to userspace with a
> filesystem lock held, under the expectation (ie: requirement) that
> userspace will later come in and release that lock.
>
> If that's not true, then what _is_ happening in there?
>
> If that _is_ true then, well, that sucks a bit.
It's not a filesystem lock. It's a per-blockdevice lock that prevents
multiple people from freezing the filesystem at the same time, as well
as providing exclusion between a frozen filesystem and mount-related
activity. It's a traditional textbook example for a semaphore.
On Tue, Jan 09, 2007 at 10:04:20AM +0000, Christoph Hellwig wrote:
> On Mon, Jan 08, 2007 at 07:51:27PM -0800, Andrew Morton wrote:
> > I don't even know what code we're talking about here...
> >
> > I'm under the impression that XFS will return to userspace with a
> > filesystem lock held, under the expectation (ie: requirement) that
> > userspace will later come in and release that lock.
> >
> > If that's not true, then what _is_ happening in there?
> >
> > If that _is_ true then, well, that sucks a bit.
>
> It's not a filesystem lock. It's a per-blockdevice lock that prevents
> multiple people from freezing the filesystem at the same time, as well
> as providing exclusion between a frozen filesystem and mount-related
> activity. It's a traditional textbook example for a semaphore.
This can be done without needing to hold a semaphore across the
freeze/thaw.
In the XFS case, we never try to lock the semaphore a second
time - the freeze code checks if the filesystem is not already
(being) frozen before calling freeze_bdev(). On thaw it also
checks that the filesystem is frozen before calling thaw_bdev().
IOWs, you can safely do:
# xfs_freeze -f /dev/sda1; xfs_freeze -f /dev/sda1; xfs_freeze -f /dev/sda1;
# xfs_freeze -u /dev/sda1; xfs_freeze -u /dev/sda1; xfs_freeze -u /dev/sda1;
And the filesystem will only be frozen once and thawed once. The second
and subsequent incantations of the freeze/thaw are effectively ignored
and don't block.
IMO, if we need to prevent certain operations from occurring when the
filesystem is frozen, those operations need to explicitly check the
frozen state and block, i.e. do something like:
wait_event(sb->s_wait_unfrozen, (sb->s_frozen < SB_FREEZE_WRITE));
If you need to prevent unmounts from occurring while snapshotting a
frozen filesystem, then the snapshot code needs to take the s_umount
semaphore while the snapshot is in progress. We should not be making
frozen filesystems impossible to unmount....
Thoughts?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group