Just like get_tree_bdev needs to drop s_umount when opening the main
device, we need to do the same for the xfs log and RT devices to avoid a
potential lock order reversal with s_unmount for the mark_dead path.
It might be preferable to just drop s_umount over ->fill_super entirely,
but that will require a fairly massive audit first, so we'll do the easy
version here first.
Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/xfs/xfs_super.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8185102431301d..d5042419ed9997 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -448,17 +448,21 @@ STATIC int
xfs_open_devices(
struct xfs_mount *mp)
{
- struct block_device *ddev = mp->m_super->s_bdev;
+ struct super_block *sb = mp->m_super;
+ struct block_device *ddev = sb->s_bdev;
struct block_device *logdev = NULL, *rtdev = NULL;
int error;
+ /* see get_tree_bdev why this is needed and safe */
+ up_write(&sb->s_umount);
+
/*
* Open real time and log devices - order is important.
*/
if (mp->m_logname) {
error = xfs_blkdev_get(mp, mp->m_logname, &logdev);
if (error)
- return error;
+ goto out_unlock;
}
if (mp->m_rtname) {
@@ -496,7 +500,10 @@ xfs_open_devices(
mp->m_logdev_targp = mp->m_ddev_targp;
}
- return 0;
+ error = 0;
+out_unlock:
+ down_write(&sb->s_umount);
+ return error;
out_free_rtdev_targ:
if (mp->m_rtdev_targp)
@@ -508,7 +515,7 @@ xfs_open_devices(
out_close_logdev:
if (logdev && logdev != ddev)
xfs_blkdev_put(mp, logdev);
- return error;
+ goto out_unlock;
}
/*
--
2.39.2
On Wed, Aug 02, 2023 at 05:41:30PM +0200, Christoph Hellwig wrote:
> Just like get_tree_bdev needs to drop s_umount when opening the main
> device, we need to do the same for the xfs log and RT devices to avoid a
> potential lock order reversal with s_unmount for the mark_dead path.
>
> It might be preferable to just drop s_umount over ->fill_super entirely,
> but that will require a fairly massive audit first, so we'll do the easy
> version here first.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> fs/xfs/xfs_super.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 8185102431301d..d5042419ed9997 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -448,17 +448,21 @@ STATIC int
> xfs_open_devices(
> struct xfs_mount *mp)
> {
> - struct block_device *ddev = mp->m_super->s_bdev;
> + struct super_block *sb = mp->m_super;
> + struct block_device *ddev = sb->s_bdev;
> struct block_device *logdev = NULL, *rtdev = NULL;
> int error;
>
> + /* see get_tree_bdev why this is needed and safe */
Which part of get_tree_bdev? Is it this?
/*
* s_umount nests inside open_mutex during
* __invalidate_device(). blkdev_put() acquires
* open_mutex and can't be called under s_umount. Drop
* s_umount temporarily. This is safe as we're
* holding an active reference.
*/
up_write(&s->s_umount);
blkdev_put(bdev, fc->fs_type);
down_write(&s->s_umount);
<confused>
> + up_write(&sb->s_umount);
> +
> /*
> * Open real time and log devices - order is important.
> */
> if (mp->m_logname) {
> error = xfs_blkdev_get(mp, mp->m_logname, &logdev);
> if (error)
> - return error;
> + goto out_unlock;
> }
>
> if (mp->m_rtname) {
> @@ -496,7 +500,10 @@ xfs_open_devices(
> mp->m_logdev_targp = mp->m_ddev_targp;
> }
>
> - return 0;
> + error = 0;
> +out_unlock:
> + down_write(&sb->s_umount);
Isn't down_write taking s_umount? I think the label should be
out_relock or something less misleading.
--D
> + return error;
>
> out_free_rtdev_targ:
> if (mp->m_rtdev_targp)
> @@ -508,7 +515,7 @@ xfs_open_devices(
> out_close_logdev:
> if (logdev && logdev != ddev)
> xfs_blkdev_put(mp, logdev);
> - return error;
> + goto out_unlock;
> }
>
> /*
> --
> 2.39.2
>
On Wed, Aug 02, 2023 at 09:32:19AM -0700, Darrick J. Wong wrote:
> > + /* see get_tree_bdev why this is needed and safe */
>
> Which part of get_tree_bdev? Is it this?
>
> /*
> * s_umount nests inside open_mutex during
> * __invalidate_device(). blkdev_put() acquires
> * open_mutex and can't be called under s_umount. Drop
> * s_umount temporarily. This is safe as we're
> * holding an active reference.
> */
> up_write(&s->s_umount);
> blkdev_put(bdev, fc->fs_type);
> down_write(&s->s_umount);
Yes. With the refactoring earlier in the series get_tree_bdev should
be trivial enough to not need a more specific reference. If you
think there's a better way to refer to it I can update the comment,
though.
> > mp->m_logdev_targp = mp->m_ddev_targp;
> > }
> >
> > - return 0;
> > + error = 0;
> > +out_unlock:
> > + down_write(&sb->s_umount);
>
> Isn't down_write taking s_umount? I think the label should be
> out_relock or something less misleading.
Agreed. Christian, can you just change this in your branch, or should
I send an incremental patch?
On Sat, Aug 05, 2023 at 10:32:39AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 02, 2023 at 09:32:19AM -0700, Darrick J. Wong wrote:
> > > + /* see get_tree_bdev why this is needed and safe */
> >
> > Which part of get_tree_bdev? Is it this?
> >
> > /*
> > * s_umount nests inside open_mutex during
> > * __invalidate_device(). blkdev_put() acquires
> > * open_mutex and can't be called under s_umount. Drop
> > * s_umount temporarily. This is safe as we're
> > * holding an active reference.
> > */
> > up_write(&s->s_umount);
> > blkdev_put(bdev, fc->fs_type);
> > down_write(&s->s_umount);
>
> Yes. With the refactoring earlier in the series get_tree_bdev should
> be trivial enough to not need a more specific reference. If you
> think there's a better way to refer to it I can update the comment,
> though.
>
> > > mp->m_logdev_targp = mp->m_ddev_targp;
> > > }
> > >
> > > - return 0;
> > > + error = 0;
> > > +out_unlock:
> > > + down_write(&sb->s_umount);
> >
> > Isn't down_write taking s_umount? I think the label should be
> > out_relock or something less misleading.
>
> Agreed. Christian, can you just change this in your branch, or should
> I send an incremental patch?
No need to send an incremental patch. I just s/out_unlock/out_relock/g
in-tree. Thanks!
On Sat, Aug 05, 2023 at 10:32:39AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 02, 2023 at 09:32:19AM -0700, Darrick J. Wong wrote:
> > > + /* see get_tree_bdev why this is needed and safe */
> >
> > Which part of get_tree_bdev? Is it this?
> >
> > /*
> > * s_umount nests inside open_mutex during
> > * __invalidate_device(). blkdev_put() acquires
> > * open_mutex and can't be called under s_umount. Drop
> > * s_umount temporarily. This is safe as we're
> > * holding an active reference.
> > */
> > up_write(&s->s_umount);
> > blkdev_put(bdev, fc->fs_type);
> > down_write(&s->s_umount);
>
> Yes. With the refactoring earlier in the series get_tree_bdev should
> be trivial enough to not need a more specific reference. If you
> think there's a better way to refer to it I can update the comment,
> though.
How about:
/*
* blkdev_put can't be called under s_umount, see the comment in
* get_tree_bdev for more details
*/
with that and the label name change,
Reviewed-by: Darrick J. Wong <[email protected]>
--D
> > > mp->m_logdev_targp = mp->m_ddev_targp;
> > > }
> > >
> > > - return 0;
> > > + error = 0;
> > > +out_unlock:
> > > + down_write(&sb->s_umount);
> >
> > Isn't down_write taking s_umount? I think the label should be
> > out_relock or something less misleading.
>
> Agreed. Christian, can you just change this in your branch, or should
> I send an incremental patch?
>
On Sat, Aug 05, 2023 at 09:19:04AM -0700, Darrick J. Wong wrote:
> On Sat, Aug 05, 2023 at 10:32:39AM +0200, Christoph Hellwig wrote:
> > On Wed, Aug 02, 2023 at 09:32:19AM -0700, Darrick J. Wong wrote:
> > > > + /* see get_tree_bdev why this is needed and safe */
> > >
> > > Which part of get_tree_bdev? Is it this?
> > >
> > > /*
> > > * s_umount nests inside open_mutex during
> > > * __invalidate_device(). blkdev_put() acquires
> > > * open_mutex and can't be called under s_umount. Drop
> > > * s_umount temporarily. This is safe as we're
> > > * holding an active reference.
> > > */
> > > up_write(&s->s_umount);
> > > blkdev_put(bdev, fc->fs_type);
> > > down_write(&s->s_umount);
> >
> > Yes. With the refactoring earlier in the series get_tree_bdev should
> > be trivial enough to not need a more specific reference. If you
> > think there's a better way to refer to it I can update the comment,
> > though.
>
> How about:
>
> /*
> * blkdev_put can't be called under s_umount, see the comment in
> * get_tree_bdev for more details
> */
>
> with that and the label name change,
> Reviewed-by: Darrick J. Wong <[email protected]>
Added that comment and you rvb in-tree.