On Thu, Dec 21, 2023 at 04:57:04PM +0800, Yu Kuai wrote:
> @@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> * Drop the page of the primary superblock, so later read will
> * always read from the device.
> */
> - invalidate_inode_pages2_range(mapping,
> - bytenr >> PAGE_SHIFT,
> + invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
> (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
> }
>
> - page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> - if (IS_ERR(page))
> - return ERR_CAST(page);
> + nofs_flag = memalloc_nofs_save();
> + folio = bdev_read_folio(bdev, bytenr);
> + memalloc_nofs_restore(nofs_flag);
This is the wrong way to use memalloc_nofs_save/restore. They should be
used at the point that the filesystem takes/releases whatever lock is
also used during reclaim. I don't know btrfs well enough to suggest
what lock is missing these annotations.
On Sat, Dec 23, 2023 at 05:31:55PM +0000, Matthew Wilcox wrote:
> On Thu, Dec 21, 2023 at 04:57:04PM +0800, Yu Kuai wrote:
> > @@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> > * Drop the page of the primary superblock, so later read will
> > * always read from the device.
> > */
> > - invalidate_inode_pages2_range(mapping,
> > - bytenr >> PAGE_SHIFT,
> > + invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
> > (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
> > }
> >
> > - page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> > - if (IS_ERR(page))
> > - return ERR_CAST(page);
> > + nofs_flag = memalloc_nofs_save();
> > + folio = bdev_read_folio(bdev, bytenr);
> > + memalloc_nofs_restore(nofs_flag);
>
> This is the wrong way to use memalloc_nofs_save/restore. They should be
> used at the point that the filesystem takes/releases whatever lock is
> also used during reclaim. I don't know btrfs well enough to suggest
> what lock is missing these annotations.
Yes, but considering this is a cross-filesystem cleanup I wouldn't want
to address that in this patchset. And the easier, more incremental
approach for the conversion would be to first convert every GFP_NOFS
usage to memalloc_nofs_save() like this patch does, as small local
changes, and then let the btrfs people combine them and move them to the
approproate location in a separate patchstet.
On Sat 23-12-23 17:31:55, Matthew Wilcox wrote:
> On Thu, Dec 21, 2023 at 04:57:04PM +0800, Yu Kuai wrote:
> > @@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> > * Drop the page of the primary superblock, so later read will
> > * always read from the device.
> > */
> > - invalidate_inode_pages2_range(mapping,
> > - bytenr >> PAGE_SHIFT,
> > + invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
> > (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
> > }
> >
> > - page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> > - if (IS_ERR(page))
> > - return ERR_CAST(page);
> > + nofs_flag = memalloc_nofs_save();
> > + folio = bdev_read_folio(bdev, bytenr);
> > + memalloc_nofs_restore(nofs_flag);
>
> This is the wrong way to use memalloc_nofs_save/restore. They should be
> used at the point that the filesystem takes/releases whatever lock is
> also used during reclaim. I don't know btrfs well enough to suggest
> what lock is missing these annotations.
In principle I agree with you but in this particular case I agree the ask
is just too big. I suspect it is one of btrfs btree locks or maybe
chunk_mutex but I doubt even btrfs developers know and maybe it is just a
cargo cult. And it is not like this would be the first occurence of this
anti-pattern in btrfs - see e.g. device_list_add(), add_missing_dev(),
btrfs_destroy_delalloc_inodes() (here the wrapping around
invalidate_inode_pages2() looks really weird), and many others...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Thu, Jan 04, 2024 at 12:49:58PM +0100, Jan Kara wrote:
> On Sat 23-12-23 17:31:55, Matthew Wilcox wrote:
> > On Thu, Dec 21, 2023 at 04:57:04PM +0800, Yu Kuai wrote:
> > > @@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> > > * Drop the page of the primary superblock, so later read will
> > > * always read from the device.
> > > */
> > > - invalidate_inode_pages2_range(mapping,
> > > - bytenr >> PAGE_SHIFT,
> > > + invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
> > > (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
> > > }
> > >
> > > - page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> > > - if (IS_ERR(page))
> > > - return ERR_CAST(page);
> > > + nofs_flag = memalloc_nofs_save();
> > > + folio = bdev_read_folio(bdev, bytenr);
> > > + memalloc_nofs_restore(nofs_flag);
> >
> > This is the wrong way to use memalloc_nofs_save/restore. They should be
> > used at the point that the filesystem takes/releases whatever lock is
> > also used during reclaim. I don't know btrfs well enough to suggest
> > what lock is missing these annotations.
>
> In principle I agree with you but in this particular case I agree the ask
> is just too big. I suspect it is one of btrfs btree locks or maybe
> chunk_mutex but I doubt even btrfs developers know and maybe it is just a
> cargo cult. And it is not like this would be the first occurence of this
> anti-pattern in btrfs - see e.g. device_list_add(), add_missing_dev(),
> btrfs_destroy_delalloc_inodes() (here the wrapping around
> invalidate_inode_pages2() looks really weird), and many others...
The pattern is intentional and a temporary solution before we could
implement the scoped NOFS. Functions calling allocations get converted
from GFP_NOFS to GFP_KERNEL but in case they're called from a context
that either holds big locks or can recursively enter the filesystem then
it's protected by the memalloc calls. This should not be surprising.
What may not be obvious is which locks or kmalloc calling functions it
could be, this depends on the analysis of the function call chain and
usually there's enough evidence why it's needed.