2022-07-12 10:55:29

by Jan Kara

Subject: [PATCH 0/10 v3] ext4: Fix possible fs corruption due to xattr races

Hello!

I've noticed this series didn't get merged yet. I was waiting for more review
feedback from Ritesh but somehow that didn't happen. So this is the third
submission of the series fixing races in ext4 xattr block reuse, with the
few changes that have accumulated since v2. Ted, do you think you can add this
series to your tree so that we can merge it during the merge window? Thanks!

Changes since v1:
* Reworked the series to fix all corner cases and make the API less error-prone.

Changes since v2:
* Renamed mb_cache_entry_try_delete() to mb_cache_entry_delete_or_get()
* Added Tested-by tag from Ritesh

Honza

Previous versions:
Link: http://lore.kernel.org/r/[email protected] # v1
Link: http://lore.kernel.org/r/[email protected] # v2


2022-07-12 10:55:36

by Jan Kara

Subject: [PATCH 04/10] ext4: Unindent codeblock in ext4_xattr_block_set()

Remove an unnecessary else (and thus an indentation level) from a code
block in ext4_xattr_block_set(). It will also make the following code
changes easier. No functional changes.

CC: [email protected]
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/xattr.c | 79 ++++++++++++++++++++++++-------------------------
1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 7fc40fb1e6b3..aadfae53d055 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1850,6 +1850,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
#define header(x) ((struct ext4_xattr_header *)(x))

if (s->base) {
+ int offset = (char *)s->here - bs->bh->b_data;
+
BUFFER_TRACE(bs->bh, "get_write_access");
error = ext4_journal_get_write_access(handle, sb, bs->bh,
EXT4_JTR_NONE);
@@ -1882,50 +1884,47 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
if (error)
goto cleanup;
goto inserted;
- } else {
- int offset = (char *)s->here - bs->bh->b_data;
+ }
+ unlock_buffer(bs->bh);
+ ea_bdebug(bs->bh, "cloning");
+ s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
+ error = -ENOMEM;
+ if (s->base == NULL)
+ goto cleanup;
+ memcpy(s->base, BHDR(bs->bh), bs->bh->b_size);
+ s->first = ENTRY(header(s->base)+1);
+ header(s->base)->h_refcount = cpu_to_le32(1);
+ s->here = ENTRY(s->base + offset);
+ s->end = s->base + bs->bh->b_size;

- unlock_buffer(bs->bh);
- ea_bdebug(bs->bh, "cloning");
- s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
- error = -ENOMEM;
- if (s->base == NULL)
+ /*
+ * If existing entry points to an xattr inode, we need
+ * to prevent ext4_xattr_set_entry() from decrementing
+ * ref count on it because the reference belongs to the
+ * original block. In this case, make the entry look
+ * like it has an empty value.
+ */
+ if (!s->not_found && s->here->e_value_inum) {
+ ea_ino = le32_to_cpu(s->here->e_value_inum);
+ error = ext4_xattr_inode_iget(inode, ea_ino,
+ le32_to_cpu(s->here->e_hash),
+ &tmp_inode);
+ if (error)
goto cleanup;
- memcpy(s->base, BHDR(bs->bh), bs->bh->b_size);
- s->first = ENTRY(header(s->base)+1);
- header(s->base)->h_refcount = cpu_to_le32(1);
- s->here = ENTRY(s->base + offset);
- s->end = s->base + bs->bh->b_size;

- /*
- * If existing entry points to an xattr inode, we need
- * to prevent ext4_xattr_set_entry() from decrementing
- * ref count on it because the reference belongs to the
- * original block. In this case, make the entry look
- * like it has an empty value.
- */
- if (!s->not_found && s->here->e_value_inum) {
- ea_ino = le32_to_cpu(s->here->e_value_inum);
- error = ext4_xattr_inode_iget(inode, ea_ino,
- le32_to_cpu(s->here->e_hash),
- &tmp_inode);
- if (error)
- goto cleanup;
-
- if (!ext4_test_inode_state(tmp_inode,
- EXT4_STATE_LUSTRE_EA_INODE)) {
- /*
- * Defer quota free call for previous
- * inode until success is guaranteed.
- */
- old_ea_inode_quota = le32_to_cpu(
- s->here->e_value_size);
- }
- iput(tmp_inode);
-
- s->here->e_value_inum = 0;
- s->here->e_value_size = 0;
+ if (!ext4_test_inode_state(tmp_inode,
+ EXT4_STATE_LUSTRE_EA_INODE)) {
+ /*
+ * Defer quota free call for previous
+ * inode until success is guaranteed.
+ */
+ old_ea_inode_quota = le32_to_cpu(
+ s->here->e_value_size);
}
+ iput(tmp_inode);
+
+ s->here->e_value_inum = 0;
+ s->here->e_value_size = 0;
}
} else {
/* Allocate a buffer where we construct the new block. */
--
2.35.3

2022-07-12 10:55:40

by Jan Kara

Subject: [PATCH 03/10] ext4: Remove EA inode entry from mbcache on inode eviction

Currently we remove the EA inode from the mbcache as soon as its xattr
refcount drops to zero. However, there can be pending attempts to reuse
the inode and thus the refcount handling code has to handle the situation
when the refcount increases from zero anyway. So save some work and just
keep the EA inode in the mbcache until it is evicted. At that moment we
are sure any following iget() of the EA inode will fail anyway (or will
wait for the eviction to finish and load things from the disk again), so
removing the mbcache entry at that moment is fine and simplifies the code
a bit.

CC: [email protected]
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 2 ++
fs/ext4/xattr.c | 24 ++++++++----------------
fs/ext4/xattr.h | 1 +
3 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3dce7d058985..7450ee734262 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -177,6 +177,8 @@ void ext4_evict_inode(struct inode *inode)

trace_ext4_evict_inode(inode);

+ if (EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)
+ ext4_evict_ea_inode(inode);
if (inode->i_nlink) {
/*
* When journalling data dirty buffers are tracked only in the
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 042325349098..7fc40fb1e6b3 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -436,6 +436,14 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
return err;
}

+/* Remove entry from mbcache when EA inode is getting evicted */
+void ext4_evict_ea_inode(struct inode *inode)
+{
+ if (EA_INODE_CACHE(inode))
+ mb_cache_entry_delete(EA_INODE_CACHE(inode),
+ ext4_xattr_inode_get_hash(inode), inode->i_ino);
+}
+
static int
ext4_xattr_inode_verify_hashes(struct inode *ea_inode,
struct ext4_xattr_entry *entry, void *buffer,
@@ -976,10 +984,8 @@ int __ext4_xattr_set_credits(struct super_block *sb, struct inode *inode,
static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,
int ref_change)
{
- struct mb_cache *ea_inode_cache = EA_INODE_CACHE(ea_inode);
struct ext4_iloc iloc;
s64 ref_count;
- u32 hash;
int ret;

inode_lock(ea_inode);
@@ -1002,14 +1008,6 @@ static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,

set_nlink(ea_inode, 1);
ext4_orphan_del(handle, ea_inode);
-
- if (ea_inode_cache) {
- hash = ext4_xattr_inode_get_hash(ea_inode);
- mb_cache_entry_create(ea_inode_cache,
- GFP_NOFS, hash,
- ea_inode->i_ino,
- true /* reusable */);
- }
}
} else {
WARN_ONCE(ref_count < 0, "EA inode %lu ref_count=%lld",
@@ -1022,12 +1020,6 @@ static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,

clear_nlink(ea_inode);
ext4_orphan_add(handle, ea_inode);
-
- if (ea_inode_cache) {
- hash = ext4_xattr_inode_get_hash(ea_inode);
- mb_cache_entry_delete(ea_inode_cache, hash,
- ea_inode->i_ino);
- }
}
}

diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index 77efb9a627ad..060b7a2f485c 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -178,6 +178,7 @@ extern void ext4_xattr_inode_array_free(struct ext4_xattr_inode_array *array);

extern int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
struct ext4_inode *raw_inode, handle_t *handle);
+extern void ext4_evict_ea_inode(struct inode *inode);

extern const struct xattr_handler *ext4_xattr_handlers[];

--
2.35.3

2022-07-12 10:55:46

by Jan Kara

Subject: [PATCH 09/10] mbcache: Remove mb_cache_entry_delete()

Nobody uses mb_cache_entry_delete() anymore. Remove it.

Signed-off-by: Jan Kara <[email protected]>
---
fs/mbcache.c | 37 -------------------------------------
include/linux/mbcache.h | 1 -
2 files changed, 38 deletions(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index 2010bc80a3f2..d1ebb5df2856 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -230,43 +230,6 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
}
EXPORT_SYMBOL(mb_cache_entry_get);

-/* mb_cache_entry_delete - try to remove a cache entry
- * @cache - cache we work with
- * @key - key
- * @value - value
- *
- * Remove entry from cache @cache with key @key and value @value.
- */
-void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
-{
- struct hlist_bl_node *node;
- struct hlist_bl_head *head;
- struct mb_cache_entry *entry;
-
- head = mb_cache_entry_head(cache, key);
- hlist_bl_lock(head);
- hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
- if (entry->e_key == key && entry->e_value == value) {
- /* We keep hash list reference to keep entry alive */
- hlist_bl_del_init(&entry->e_hash_list);
- hlist_bl_unlock(head);
- spin_lock(&cache->c_list_lock);
- if (!list_empty(&entry->e_list)) {
- list_del_init(&entry->e_list);
- if (!WARN_ONCE(cache->c_entry_count == 0,
- "mbcache: attempt to decrement c_entry_count past zero"))
- cache->c_entry_count--;
- atomic_dec(&entry->e_refcnt);
- }
- spin_unlock(&cache->c_list_lock);
- mb_cache_entry_put(cache, entry);
- return;
- }
- }
- hlist_bl_unlock(head);
-}
-EXPORT_SYMBOL(mb_cache_entry_delete);
-
/* mb_cache_entry_delete_or_get - remove a cache entry if it has no users
* @cache - cache we work with
* @key - key
diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h
index 8eca7f25c432..452b579856d4 100644
--- a/include/linux/mbcache.h
+++ b/include/linux/mbcache.h
@@ -47,7 +47,6 @@ static inline int mb_cache_entry_put(struct mb_cache *cache,

struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
u32 key, u64 value);
-void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value);
struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
u64 value);
struct mb_cache_entry *mb_cache_entry_find_first(struct mb_cache *cache,
--
2.35.3

2022-07-12 10:56:17

by Jan Kara

Subject: [PATCH 02/10] mbcache: Add functions to delete entry if unused

Add a function mb_cache_entry_delete_or_get() to delete an mbcache entry
if it is unused, and also add a function to wait for an entry to become
unused - mb_cache_entry_wait_unused(). We do not share code between the
two deleting functions as one of them will go away soon.

CC: [email protected]
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <[email protected]>
---
fs/mbcache.c | 66 +++++++++++++++++++++++++++++++++++++++--
include/linux/mbcache.h | 10 ++++++-
2 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index cfc28129fb6f..2010bc80a3f2 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -11,7 +11,7 @@
/*
* Mbcache is a simple key-value store. Keys need not be unique, however
* key-value pairs are expected to be unique (we use this fact in
- * mb_cache_entry_delete()).
+ * mb_cache_entry_delete_or_get()).
*
* Ext2 and ext4 use this cache for deduplication of extended attribute blocks.
* Ext4 also uses it for deduplication of xattr values stored in inodes.
@@ -125,6 +125,19 @@ void __mb_cache_entry_free(struct mb_cache_entry *entry)
}
EXPORT_SYMBOL(__mb_cache_entry_free);

+/*
+ * mb_cache_entry_wait_unused - wait to be the last user of the entry
+ *
+ * @entry - entry to work on
+ *
+ * Wait to be the last user of the entry.
+ */
+void mb_cache_entry_wait_unused(struct mb_cache_entry *entry)
+{
+ wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3);
+}
+EXPORT_SYMBOL(mb_cache_entry_wait_unused);
+
static struct mb_cache_entry *__entry_find(struct mb_cache *cache,
struct mb_cache_entry *entry,
u32 key)
@@ -217,7 +230,7 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
}
EXPORT_SYMBOL(mb_cache_entry_get);

-/* mb_cache_entry_delete - remove a cache entry
+/* mb_cache_entry_delete - try to remove a cache entry
* @cache - cache we work with
* @key - key
* @value - value
@@ -254,6 +267,55 @@ void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
}
EXPORT_SYMBOL(mb_cache_entry_delete);

+/* mb_cache_entry_delete_or_get - remove a cache entry if it has no users
+ * @cache - cache we work with
+ * @key - key
+ * @value - value
+ *
+ * Remove entry from cache @cache with key @key and value @value. The removal
+ * happens only if the entry is unused. The function returns NULL in case the
+ * entry was successfully removed or there's no entry in cache. Otherwise the
+ * function grabs reference of the entry that we failed to delete because it
+ * still has users and return it.
+ */
+struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
+ u32 key, u64 value)
+{
+ struct hlist_bl_node *node;
+ struct hlist_bl_head *head;
+ struct mb_cache_entry *entry;
+
+ head = mb_cache_entry_head(cache, key);
+ hlist_bl_lock(head);
+ hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
+ if (entry->e_key == key && entry->e_value == value) {
+ if (atomic_read(&entry->e_refcnt) > 2) {
+ atomic_inc(&entry->e_refcnt);
+ hlist_bl_unlock(head);
+ return entry;
+ }
+ /* We keep hash list reference to keep entry alive */
+ hlist_bl_del_init(&entry->e_hash_list);
+ hlist_bl_unlock(head);
+ spin_lock(&cache->c_list_lock);
+ if (!list_empty(&entry->e_list)) {
+ list_del_init(&entry->e_list);
+ if (!WARN_ONCE(cache->c_entry_count == 0,
+ "mbcache: attempt to decrement c_entry_count past zero"))
+ cache->c_entry_count--;
+ atomic_dec(&entry->e_refcnt);
+ }
+ spin_unlock(&cache->c_list_lock);
+ mb_cache_entry_put(cache, entry);
+ return NULL;
+ }
+ }
+ hlist_bl_unlock(head);
+
+ return NULL;
+}
+EXPORT_SYMBOL(mb_cache_entry_delete_or_get);
+
/* mb_cache_entry_touch - cache entry got used
* @cache - cache the entry belongs to
* @entry - entry that got used
diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h
index 20f1e3ff6013..8eca7f25c432 100644
--- a/include/linux/mbcache.h
+++ b/include/linux/mbcache.h
@@ -30,15 +30,23 @@ void mb_cache_destroy(struct mb_cache *cache);
int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
u64 value, bool reusable);
void __mb_cache_entry_free(struct mb_cache_entry *entry);
+void mb_cache_entry_wait_unused(struct mb_cache_entry *entry);
static inline int mb_cache_entry_put(struct mb_cache *cache,
struct mb_cache_entry *entry)
{
- if (!atomic_dec_and_test(&entry->e_refcnt))
+ unsigned int cnt = atomic_dec_return(&entry->e_refcnt);
+
+ if (cnt > 0) {
+ if (cnt <= 3)
+ wake_up_var(&entry->e_refcnt);
return 0;
+ }
__mb_cache_entry_free(entry);
return 1;
}

+struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
+ u32 key, u64 value);
void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value);
struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
u64 value);
--
2.35.3
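
The two helpers added above are meant to be used together; the later patches
in this series use exactly this pattern (for example ext4_evict_ea_inode() in
patch 05/10). A minimal sketch of that caller pattern, with the locking of the
real callers omitted and the function name invented for illustration:

static void drop_entry_when_unused(struct mb_cache *cache, u32 hash, u64 value)
{
        struct mb_cache_entry *oe;

        /*
         * With this patch an entry starts with a refcount of 2 (hash
         * list + LRU list), so anything above 2 means another user.
         * mb_cache_entry_delete_or_get() returns NULL once the entry is
         * gone (or was unused and has been removed); otherwise it
         * returns the entry with an extra reference so we can wait.
         */
        while ((oe = mb_cache_entry_delete_or_get(cache, hash, value))) {
                /* sleep until hash + LRU + our reference are the only ones */
                mb_cache_entry_wait_unused(oe);
                /* drop our reference and retry the delete */
                mb_cache_entry_put(cache, oe);
        }
}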

2022-07-12 10:56:18

by Jan Kara

Subject: [PATCH 07/10] ext2: Unindent codeblock in ext2_xattr_set()

Replace one else in ext2_xattr_set() with a goto. This makes the
following code changes simpler to follow. No functional changes.

Signed-off-by: Jan Kara <[email protected]>
---
fs/ext2/xattr.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
index 9885294993ef..37ce495eb279 100644
--- a/fs/ext2/xattr.c
+++ b/fs/ext2/xattr.c
@@ -517,7 +517,8 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
/* Here we know that we can set the new attribute. */

if (header) {
- /* assert(header == HDR(bh)); */
+ int offset;
+
lock_buffer(bh);
if (header->h_refcount == cpu_to_le32(1)) {
__u32 hash = le32_to_cpu(header->h_hash);
@@ -531,22 +532,20 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
bh->b_blocknr);

/* keep the buffer locked while modifying it. */
- } else {
- int offset;
-
- unlock_buffer(bh);
- ea_bdebug(bh, "cloning");
- header = kmemdup(HDR(bh), bh->b_size, GFP_KERNEL);
- error = -ENOMEM;
- if (header == NULL)
- goto cleanup;
- header->h_refcount = cpu_to_le32(1);
-
- offset = (char *)here - bh->b_data;
- here = ENTRY((char *)header + offset);
- offset = (char *)last - bh->b_data;
- last = ENTRY((char *)header + offset);
+ goto update_block;
}
+ unlock_buffer(bh);
+ ea_bdebug(bh, "cloning");
+ header = kmemdup(HDR(bh), bh->b_size, GFP_KERNEL);
+ error = -ENOMEM;
+ if (header == NULL)
+ goto cleanup;
+ header->h_refcount = cpu_to_le32(1);
+
+ offset = (char *)here - bh->b_data;
+ here = ENTRY((char *)header + offset);
+ offset = (char *)last - bh->b_data;
+ last = ENTRY((char *)header + offset);
} else {
/* Allocate a buffer where we construct the new block. */
header = kzalloc(sb->s_blocksize, GFP_KERNEL);
@@ -559,6 +558,7 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
last = here = ENTRY(header+1);
}

+update_block:
/* Iff we are modifying the block in-place, bh is locked here. */

if (not_found) {
--
2.35.3

2022-07-12 10:56:19

by Jan Kara

Subject: [PATCH 06/10] ext2: Factor out freeing of xattr block reference

Freeing of an xattr block reference is open coded in two places. Factor
it out into a separate function and use it.

Signed-off-by: Jan Kara <[email protected]>
---
fs/ext2/xattr.c | 90 +++++++++++++++++++++----------------------------
1 file changed, 38 insertions(+), 52 deletions(-)

diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
index 841fa6d9d744..9885294993ef 100644
--- a/fs/ext2/xattr.c
+++ b/fs/ext2/xattr.c
@@ -651,6 +651,42 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
return error;
}

+static void ext2_xattr_release_block(struct inode *inode,
+ struct buffer_head *bh)
+{
+ struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode);
+
+ lock_buffer(bh);
+ if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
+ __u32 hash = le32_to_cpu(HDR(bh)->h_hash);
+
+ /*
+ * This must happen under buffer lock for
+ * ext2_xattr_set2() to reliably detect freed block
+ */
+ mb_cache_entry_delete(ea_block_cache, hash,
+ bh->b_blocknr);
+ /* Free the old block. */
+ ea_bdebug(bh, "freeing");
+ ext2_free_blocks(inode, bh->b_blocknr, 1);
+ /* We let our caller release bh, so we
+ * need to duplicate the buffer before. */
+ get_bh(bh);
+ bforget(bh);
+ unlock_buffer(bh);
+ } else {
+ /* Decrement the refcount only. */
+ le32_add_cpu(&HDR(bh)->h_refcount, -1);
+ dquot_free_block(inode, 1);
+ mark_buffer_dirty(bh);
+ unlock_buffer(bh);
+ ea_bdebug(bh, "refcount now=%d",
+ le32_to_cpu(HDR(bh)->h_refcount));
+ if (IS_SYNC(inode))
+ sync_dirty_buffer(bh);
+ }
+}
+
/*
* Second half of ext2_xattr_set(): Update the file system.
*/
@@ -747,34 +783,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh,
* If there was an old block and we are no longer using it,
* release the old block.
*/
- lock_buffer(old_bh);
- if (HDR(old_bh)->h_refcount == cpu_to_le32(1)) {
- __u32 hash = le32_to_cpu(HDR(old_bh)->h_hash);
-
- /*
- * This must happen under buffer lock for
- * ext2_xattr_set2() to reliably detect freed block
- */
- mb_cache_entry_delete(ea_block_cache, hash,
- old_bh->b_blocknr);
- /* Free the old block. */
- ea_bdebug(old_bh, "freeing");
- ext2_free_blocks(inode, old_bh->b_blocknr, 1);
- mark_inode_dirty(inode);
- /* We let our caller release old_bh, so we
- * need to duplicate the buffer before. */
- get_bh(old_bh);
- bforget(old_bh);
- } else {
- /* Decrement the refcount only. */
- le32_add_cpu(&HDR(old_bh)->h_refcount, -1);
- dquot_free_block_nodirty(inode, 1);
- mark_inode_dirty(inode);
- mark_buffer_dirty(old_bh);
- ea_bdebug(old_bh, "refcount now=%d",
- le32_to_cpu(HDR(old_bh)->h_refcount));
- }
- unlock_buffer(old_bh);
+ ext2_xattr_release_block(inode, old_bh);
}

cleanup:
@@ -828,30 +837,7 @@ ext2_xattr_delete_inode(struct inode *inode)
EXT2_I(inode)->i_file_acl);
goto cleanup;
}
- lock_buffer(bh);
- if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
- __u32 hash = le32_to_cpu(HDR(bh)->h_hash);
-
- /*
- * This must happen under buffer lock for ext2_xattr_set2() to
- * reliably detect freed block
- */
- mb_cache_entry_delete(EA_BLOCK_CACHE(inode), hash,
- bh->b_blocknr);
- ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1);
- get_bh(bh);
- bforget(bh);
- unlock_buffer(bh);
- } else {
- le32_add_cpu(&HDR(bh)->h_refcount, -1);
- ea_bdebug(bh, "refcount now=%d",
- le32_to_cpu(HDR(bh)->h_refcount));
- unlock_buffer(bh);
- mark_buffer_dirty(bh);
- if (IS_SYNC(inode))
- sync_dirty_buffer(bh);
- dquot_free_block_nodirty(inode, 1);
- }
+ ext2_xattr_release_block(inode, bh);
EXT2_I(inode)->i_file_acl = 0;

cleanup:
--
2.35.3

2022-07-12 10:56:40

by Jan Kara

Subject: [PATCH 08/10] ext2: Avoid deleting xattr block that is being reused

Currently, when we decide to reuse an xattr block, we detect the case
where the last reference to the xattr block is being dropped at the same
time and cancel the reuse attempt. Convert ext2 to a new scheme: as soon
as a matching mbcache entry is found, we wait with dropping the last
xattr block reference until the mbcache entry reference is dropped
(meaning either the xattr block reference was increased or we decided not
to reuse the block).

Signed-off-by: Jan Kara <[email protected]>
---
fs/ext2/xattr.c | 58 ++++++++++++++++++++++++-------------------------
1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
index 37ce495eb279..641abfa4b718 100644
--- a/fs/ext2/xattr.c
+++ b/fs/ext2/xattr.c
@@ -522,17 +522,18 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
lock_buffer(bh);
if (header->h_refcount == cpu_to_le32(1)) {
__u32 hash = le32_to_cpu(header->h_hash);
+ struct mb_cache_entry *oe;

- ea_bdebug(bh, "modifying in-place");
+ oe = mb_cache_entry_delete_or_get(EA_BLOCK_CACHE(inode),
+ hash, bh->b_blocknr);
+ if (!oe) {
+ ea_bdebug(bh, "modifying in-place");
+ goto update_block;
+ }
/*
- * This must happen under buffer lock for
- * ext2_xattr_set2() to reliably detect modified block
+ * Someone is trying to reuse the block, leave it alone
*/
- mb_cache_entry_delete(EA_BLOCK_CACHE(inode), hash,
- bh->b_blocknr);
-
- /* keep the buffer locked while modifying it. */
- goto update_block;
+ mb_cache_entry_put(EA_BLOCK_CACHE(inode), oe);
}
unlock_buffer(bh);
ea_bdebug(bh, "cloning");
@@ -656,16 +657,29 @@ static void ext2_xattr_release_block(struct inode *inode,
{
struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode);

+retry_ref:
lock_buffer(bh);
if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
__u32 hash = le32_to_cpu(HDR(bh)->h_hash);
+ struct mb_cache_entry *oe;

/*
- * This must happen under buffer lock for
- * ext2_xattr_set2() to reliably detect freed block
+ * This must happen under buffer lock to properly
+ * serialize with ext2_xattr_set() reusing the block.
*/
- mb_cache_entry_delete(ea_block_cache, hash,
- bh->b_blocknr);
+ oe = mb_cache_entry_delete_or_get(ea_block_cache, hash,
+ bh->b_blocknr);
+ if (oe) {
+ /*
+ * Someone is trying to reuse the block. Wait
+ * and retry.
+ */
+ unlock_buffer(bh);
+ mb_cache_entry_wait_unused(oe);
+ mb_cache_entry_put(ea_block_cache, oe);
+ goto retry_ref;
+ }
+
/* Free the old block. */
ea_bdebug(bh, "freeing");
ext2_free_blocks(inode, bh->b_blocknr, 1);
@@ -929,7 +943,7 @@ ext2_xattr_cache_find(struct inode *inode, struct ext2_xattr_header *header)
if (!header->h_hash)
return NULL; /* never share */
ea_idebug(inode, "looking for cached blocks [%x]", (int)hash);
-again:
+
ce = mb_cache_entry_find_first(ea_block_cache, hash);
while (ce) {
struct buffer_head *bh;
@@ -941,22 +955,8 @@ ext2_xattr_cache_find(struct inode *inode, struct ext2_xattr_header *header)
inode->i_ino, (unsigned long) ce->e_value);
} else {
lock_buffer(bh);
- /*
- * We have to be careful about races with freeing or
- * rehashing of xattr block. Once we hold buffer lock
- * xattr block's state is stable so we can check
- * whether the block got freed / rehashed or not.
- * Since we unhash mbcache entry under buffer lock when
- * freeing / rehashing xattr block, checking whether
- * entry is still hashed is reliable.
- */
- if (hlist_bl_unhashed(&ce->e_hash_list)) {
- mb_cache_entry_put(ea_block_cache, ce);
- unlock_buffer(bh);
- brelse(bh);
- goto again;
- } else if (le32_to_cpu(HDR(bh)->h_refcount) >
- EXT2_XATTR_REFCOUNT_MAX) {
+ if (le32_to_cpu(HDR(bh)->h_refcount) >
+ EXT2_XATTR_REFCOUNT_MAX) {
ea_idebug(inode, "block %ld refcount %d>%d",
(unsigned long) ce->e_value,
le32_to_cpu(HDR(bh)->h_refcount),
--
2.35.3
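
One detail of the release path above that is easy to miss: the buffer lock is
dropped before sleeping in mb_cache_entry_wait_unused(), and the whole check is
redone from scratch afterwards, so the wait cannot deadlock against a reuser
that has already pinned the mbcache entry and still needs the buffer lock to
bump the block's refcount. A simplified sketch of just that retry pattern (the
function name is invented; the real code goes on to free the block or drop its
on-disk refcount):

static void wait_for_xattr_block_unused(struct mb_cache *cache,
                                        struct buffer_head *bh)
{
        struct mb_cache_entry *oe;

retry:
        lock_buffer(bh);
        if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
                oe = mb_cache_entry_delete_or_get(cache,
                                le32_to_cpu(HDR(bh)->h_hash), bh->b_blocknr);
                if (oe) {
                        /* someone is trying to reuse the block: back off */
                        unlock_buffer(bh);
                        mb_cache_entry_wait_unused(oe);
                        mb_cache_entry_put(cache, oe);
                        goto retry;
                }
                /* here the real code frees the block */
        }
        unlock_buffer(bh);
}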

2022-07-12 10:57:05

by Jan Kara

Subject: [PATCH 01/10] mbcache: Don't reclaim used entries

Do not reclaim entries that are currently used by somebody from the
shrinker. Firstly, these entries are likely useful. Secondly, we will
need to keep such entries to protect a pending increment of the xattr
block refcount.

CC: [email protected]
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <[email protected]>
---
fs/mbcache.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index 97c54d3a2227..cfc28129fb6f 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -288,7 +288,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
while (nr_to_scan-- && !list_empty(&cache->c_list)) {
entry = list_first_entry(&cache->c_list,
struct mb_cache_entry, e_list);
- if (entry->e_referenced) {
+ if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
entry->e_referenced = 0;
list_move_tail(&entry->e_list, &cache->c_list);
continue;
@@ -302,6 +302,14 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
spin_unlock(&cache->c_list_lock);
head = mb_cache_entry_head(cache, entry->e_key);
hlist_bl_lock(head);
+ /* Now a reliable check if the entry didn't get used... */
+ if (atomic_read(&entry->e_refcnt) > 2) {
+ hlist_bl_unlock(head);
+ spin_lock(&cache->c_list_lock);
+ list_add_tail(&entry->e_list, &cache->c_list);
+ cache->c_entry_count++;
+ continue;
+ }
if (!hlist_bl_unhashed(&entry->e_hash_list)) {
hlist_bl_del_init(&entry->e_hash_list);
atomic_dec(&entry->e_refcnt);
--
2.35.3

2022-07-12 10:57:06

by Jan Kara

Subject: [PATCH 05/10] ext4: Fix race when reusing xattr blocks

When ext4_xattr_block_set() decides to remove xattr block the following
race can happen:

CPU1                                          CPU2
ext4_xattr_block_set()                        ext4_xattr_release_block()
  new_bh = ext4_xattr_block_cache_find()

                                                lock_buffer(bh);
                                                ref = le32_to_cpu(BHDR(bh)->h_refcount);
                                                if (ref == 1) {
                                                        ...
                                                        mb_cache_entry_delete();
                                                        unlock_buffer(bh);
                                                        ext4_free_blocks();
                                                          ...
                                                        ext4_forget(..., bh, ...);
                                                        jbd2_journal_revoke(..., bh);

  ext4_journal_get_write_access(..., new_bh, ...)
    do_get_write_access()
      jbd2_journal_cancel_revoke(..., new_bh);

Later the code in ext4_xattr_block_set() finds out the block got freed
and cancels reuse of the block, but the revoke stays canceled, so in case
of block reuse and journal replay the filesystem can get corrupted. If
the race works out slightly differently, we can also hit assertions in
the jbd2 code.

Fix the problem by making sure that once a matching mbcache entry is
found, the code dropping the last xattr block reference (or trying to
modify the xattr block in place) waits until the mbcache entry reference
is dropped. This way the code trying to reuse the xattr block is
protected from someone trying to drop the last reference to the xattr
block.

Reported-and-tested-by: Ritesh Harjani <[email protected]>
CC: [email protected]
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/xattr.c | 67 +++++++++++++++++++++++++++++++++----------------
1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index aadfae53d055..3a0928c8720e 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -439,9 +439,16 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
/* Remove entry from mbcache when EA inode is getting evicted */
void ext4_evict_ea_inode(struct inode *inode)
{
- if (EA_INODE_CACHE(inode))
- mb_cache_entry_delete(EA_INODE_CACHE(inode),
- ext4_xattr_inode_get_hash(inode), inode->i_ino);
+ struct mb_cache_entry *oe;
+
+ if (!EA_INODE_CACHE(inode))
+ return;
+ /* Wait for entry to get unused so that we can remove it */
+ while ((oe = mb_cache_entry_delete_or_get(EA_INODE_CACHE(inode),
+ ext4_xattr_inode_get_hash(inode), inode->i_ino))) {
+ mb_cache_entry_wait_unused(oe);
+ mb_cache_entry_put(EA_INODE_CACHE(inode), oe);
+ }
}

static int
@@ -1229,6 +1236,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
if (error)
goto out;

+retry_ref:
lock_buffer(bh);
hash = le32_to_cpu(BHDR(bh)->h_hash);
ref = le32_to_cpu(BHDR(bh)->h_refcount);
@@ -1238,9 +1246,18 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
* This must happen under buffer lock for
* ext4_xattr_block_set() to reliably detect freed block
*/
- if (ea_block_cache)
- mb_cache_entry_delete(ea_block_cache, hash,
- bh->b_blocknr);
+ if (ea_block_cache) {
+ struct mb_cache_entry *oe;
+
+ oe = mb_cache_entry_delete_or_get(ea_block_cache, hash,
+ bh->b_blocknr);
+ if (oe) {
+ unlock_buffer(bh);
+ mb_cache_entry_wait_unused(oe);
+ mb_cache_entry_put(ea_block_cache, oe);
+ goto retry_ref;
+ }
+ }
get_bh(bh);
unlock_buffer(bh);

@@ -1867,9 +1884,20 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
* ext4_xattr_block_set() to reliably detect modified
* block
*/
- if (ea_block_cache)
- mb_cache_entry_delete(ea_block_cache, hash,
- bs->bh->b_blocknr);
+ if (ea_block_cache) {
+ struct mb_cache_entry *oe;
+
+ oe = mb_cache_entry_delete_or_get(ea_block_cache,
+ hash, bs->bh->b_blocknr);
+ if (oe) {
+ /*
+ * Xattr block is getting reused. Leave
+ * it alone.
+ */
+ mb_cache_entry_put(ea_block_cache, oe);
+ goto clone_block;
+ }
+ }
ea_bdebug(bs->bh, "modifying in-place");
error = ext4_xattr_set_entry(i, s, handle, inode,
true /* is_block */);
@@ -1885,6 +1913,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
goto cleanup;
goto inserted;
}
+clone_block:
unlock_buffer(bs->bh);
ea_bdebug(bs->bh, "cloning");
s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
@@ -1991,18 +2020,13 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
lock_buffer(new_bh);
/*
* We have to be careful about races with
- * freeing, rehashing or adding references to
- * xattr block. Once we hold buffer lock xattr
- * block's state is stable so we can check
- * whether the block got freed / rehashed or
- * not. Since we unhash mbcache entry under
- * buffer lock when freeing / rehashing xattr
- * block, checking whether entry is still
- * hashed is reliable. Same rules hold for
- * e_reusable handling.
+ * adding references to xattr block. Once we
+ * hold buffer lock xattr block's state is
+ * stable so we can check the additional
+ * reference fits.
*/
- if (hlist_bl_unhashed(&ce->e_hash_list) ||
- !ce->e_reusable) {
+ ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
+ if (ref > EXT4_XATTR_REFCOUNT_MAX) {
/*
* Undo everything and check mbcache
* again.
@@ -2017,9 +2041,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
new_bh = NULL;
goto inserted;
}
- ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
BHDR(new_bh)->h_refcount = cpu_to_le32(ref);
- if (ref >= EXT4_XATTR_REFCOUNT_MAX)
+ if (ref == EXT4_XATTR_REFCOUNT_MAX)
ce->e_reusable = 0;
ea_bdebug(new_bh, "reusing; refcount now=%d",
ref);
--
2.35.3
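
For reference, the core of the new modify-in-place decision above can be
condensed as follows (a sketch only: the helper name is invented and
journalling, error handling and the ea_block_cache == NULL case are left out).
The point is that the mbcache entry is removed only while nobody holds a
reference to it, and since this happens under the buffer lock it cannot race
with a reuser that has already pinned the entry:

/* caller holds lock_buffer(bh), as ext4_xattr_block_set() does */
static bool can_modify_block_in_place(struct mb_cache *ea_block_cache,
                                      struct buffer_head *bh, u32 hash)
{
        struct mb_cache_entry *oe;

        if (BHDR(bh)->h_refcount != cpu_to_le32(1))
                return false;           /* block is shared: must clone */

        oe = mb_cache_entry_delete_or_get(ea_block_cache, hash, bh->b_blocknr);
        if (!oe)
                return true;            /* no pending reuser: modify in place */

        /* a reuser pinned the entry: leave the block alone and clone it */
        mb_cache_entry_put(ea_block_cache, oe);
        return false;
}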

2022-07-12 10:58:11

by Jan Kara

Subject: [PATCH 10/10] mbcache: Automatically delete entries from cache on freeing

Use the fact that entries with an elevated refcount are not removed from
the hash and just move removal of the entry from the hash to entry
freeing time. When doing this we also change the generic code to hold
one reference to the cache entry, not two of them, which makes the code
somewhat more obvious.

Signed-off-by: Jan Kara <[email protected]>
---
fs/mbcache.c | 108 +++++++++++++++-------------------------
include/linux/mbcache.h | 24 ++++++---
2 files changed, 55 insertions(+), 77 deletions(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index d1ebb5df2856..96f1d49d30a5 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -90,7 +90,7 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
return -ENOMEM;

INIT_LIST_HEAD(&entry->e_list);
- /* One ref for hash, one ref returned */
+ /* Initial hash reference */
atomic_set(&entry->e_refcnt, 1);
entry->e_key = key;
entry->e_value = value;
@@ -106,21 +106,28 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
}
}
hlist_bl_add_head(&entry->e_hash_list, head);
- hlist_bl_unlock(head);
-
+ /*
+ * Add entry to LRU list before it can be found by
+ * mb_cache_entry_delete() to avoid races
+ */
spin_lock(&cache->c_list_lock);
list_add_tail(&entry->e_list, &cache->c_list);
- /* Grab ref for LRU list */
- atomic_inc(&entry->e_refcnt);
cache->c_entry_count++;
spin_unlock(&cache->c_list_lock);
+ hlist_bl_unlock(head);

return 0;
}
EXPORT_SYMBOL(mb_cache_entry_create);

-void __mb_cache_entry_free(struct mb_cache_entry *entry)
+void __mb_cache_entry_free(struct mb_cache *cache, struct mb_cache_entry *entry)
{
+ struct hlist_bl_head *head;
+
+ head = mb_cache_entry_head(cache, entry->e_key);
+ hlist_bl_lock(head);
+ hlist_bl_del(&entry->e_hash_list);
+ hlist_bl_unlock(head);
kmem_cache_free(mb_entry_cache, entry);
}
EXPORT_SYMBOL(__mb_cache_entry_free);
@@ -134,7 +141,7 @@ EXPORT_SYMBOL(__mb_cache_entry_free);
*/
void mb_cache_entry_wait_unused(struct mb_cache_entry *entry)
{
- wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3);
+ wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 2);
}
EXPORT_SYMBOL(mb_cache_entry_wait_unused);

@@ -155,10 +162,9 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache,
while (node) {
entry = hlist_bl_entry(node, struct mb_cache_entry,
e_hash_list);
- if (entry->e_key == key && entry->e_reusable) {
- atomic_inc(&entry->e_refcnt);
+ if (entry->e_key == key && entry->e_reusable &&
+ atomic_inc_not_zero(&entry->e_refcnt))
goto out;
- }
node = node->next;
}
entry = NULL;
@@ -218,10 +224,9 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
head = mb_cache_entry_head(cache, key);
hlist_bl_lock(head);
hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
- if (entry->e_key == key && entry->e_value == value) {
- atomic_inc(&entry->e_refcnt);
+ if (entry->e_key == key && entry->e_value == value &&
+ atomic_inc_not_zero(&entry->e_refcnt))
goto out;
- }
}
entry = NULL;
out:
@@ -244,37 +249,25 @@ EXPORT_SYMBOL(mb_cache_entry_get);
struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
u32 key, u64 value)
{
- struct hlist_bl_node *node;
- struct hlist_bl_head *head;
struct mb_cache_entry *entry;

- head = mb_cache_entry_head(cache, key);
- hlist_bl_lock(head);
- hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
- if (entry->e_key == key && entry->e_value == value) {
- if (atomic_read(&entry->e_refcnt) > 2) {
- atomic_inc(&entry->e_refcnt);
- hlist_bl_unlock(head);
- return entry;
- }
- /* We keep hash list reference to keep entry alive */
- hlist_bl_del_init(&entry->e_hash_list);
- hlist_bl_unlock(head);
- spin_lock(&cache->c_list_lock);
- if (!list_empty(&entry->e_list)) {
- list_del_init(&entry->e_list);
- if (!WARN_ONCE(cache->c_entry_count == 0,
- "mbcache: attempt to decrement c_entry_count past zero"))
- cache->c_entry_count--;
- atomic_dec(&entry->e_refcnt);
- }
- spin_unlock(&cache->c_list_lock);
- mb_cache_entry_put(cache, entry);
- return NULL;
- }
- }
- hlist_bl_unlock(head);
+ entry = mb_cache_entry_get(cache, key, value);
+ if (!entry)
+ return NULL;
+
+ /*
+ * Drop the ref we got from mb_cache_entry_get() and the initial hash
+ * ref if we are the last user
+ */
+ if (atomic_cmpxchg(&entry->e_refcnt, 2, 0) != 2)
+ return entry;

+ spin_lock(&cache->c_list_lock);
+ if (!list_empty(&entry->e_list))
+ list_del_init(&entry->e_list);
+ cache->c_entry_count--;
+ spin_unlock(&cache->c_list_lock);
+ __mb_cache_entry_free(cache, entry);
return NULL;
}
EXPORT_SYMBOL(mb_cache_entry_delete_or_get);
@@ -306,42 +299,24 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
unsigned long nr_to_scan)
{
struct mb_cache_entry *entry;
- struct hlist_bl_head *head;
unsigned long shrunk = 0;

spin_lock(&cache->c_list_lock);
while (nr_to_scan-- && !list_empty(&cache->c_list)) {
entry = list_first_entry(&cache->c_list,
struct mb_cache_entry, e_list);
- if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
+ /* Drop initial hash reference if there is no user */
+ if (entry->e_referenced ||
+ atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) {
entry->e_referenced = 0;
list_move_tail(&entry->e_list, &cache->c_list);
continue;
}
list_del_init(&entry->e_list);
cache->c_entry_count--;
- /*
- * We keep LRU list reference so that entry doesn't go away
- * from under us.
- */
spin_unlock(&cache->c_list_lock);
- head = mb_cache_entry_head(cache, entry->e_key);
- hlist_bl_lock(head);
- /* Now a reliable check if the entry didn't get used... */
- if (atomic_read(&entry->e_refcnt) > 2) {
- hlist_bl_unlock(head);
- spin_lock(&cache->c_list_lock);
- list_add_tail(&entry->e_list, &cache->c_list);
- cache->c_entry_count++;
- continue;
- }
- if (!hlist_bl_unhashed(&entry->e_hash_list)) {
- hlist_bl_del_init(&entry->e_hash_list);
- atomic_dec(&entry->e_refcnt);
- }
- hlist_bl_unlock(head);
- if (mb_cache_entry_put(cache, entry))
- shrunk++;
+ __mb_cache_entry_free(cache, entry);
+ shrunk++;
cond_resched();
spin_lock(&cache->c_list_lock);
}
@@ -433,11 +408,6 @@ void mb_cache_destroy(struct mb_cache *cache)
* point.
*/
list_for_each_entry_safe(entry, next, &cache->c_list, e_list) {
- if (!hlist_bl_unhashed(&entry->e_hash_list)) {
- hlist_bl_del_init(&entry->e_hash_list);
- atomic_dec(&entry->e_refcnt);
- } else
- WARN_ON(1);
list_del(&entry->e_list);
WARN_ON(atomic_read(&entry->e_refcnt) != 1);
mb_cache_entry_put(cache, entry);
diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h
index 452b579856d4..2da63fd7b98f 100644
--- a/include/linux/mbcache.h
+++ b/include/linux/mbcache.h
@@ -13,8 +13,16 @@ struct mb_cache;
struct mb_cache_entry {
/* List of entries in cache - protected by cache->c_list_lock */
struct list_head e_list;
- /* Hash table list - protected by hash chain bitlock */
+ /*
+ * Hash table list - protected by hash chain bitlock. The entry is
+ * guaranteed to be hashed while e_refcnt > 0.
+ */
struct hlist_bl_node e_hash_list;
+ /*
+ * Entry refcount. Once it reaches zero, entry is unhashed and freed.
+ * While refcount > 0, the entry is guaranteed to stay in the hash and
+ * e.g. mb_cache_entry_try_delete() will fail.
+ */
atomic_t e_refcnt;
/* Key in hash - stable during lifetime of the entry */
u32 e_key;
@@ -29,20 +37,20 @@ void mb_cache_destroy(struct mb_cache *cache);

int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
u64 value, bool reusable);
-void __mb_cache_entry_free(struct mb_cache_entry *entry);
+void __mb_cache_entry_free(struct mb_cache *cache,
+ struct mb_cache_entry *entry);
void mb_cache_entry_wait_unused(struct mb_cache_entry *entry);
-static inline int mb_cache_entry_put(struct mb_cache *cache,
- struct mb_cache_entry *entry)
+static inline void mb_cache_entry_put(struct mb_cache *cache,
+ struct mb_cache_entry *entry)
{
unsigned int cnt = atomic_dec_return(&entry->e_refcnt);

if (cnt > 0) {
- if (cnt <= 3)
+ if (cnt <= 2)
wake_up_var(&entry->e_refcnt);
- return 0;
+ return;
}
- __mb_cache_entry_free(entry);
- return 1;
+ __mb_cache_entry_free(cache, entry);
}

struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
--
2.35.3
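
The interesting trick above is that dropping the caller's lookup reference
together with the hash reference is folded into a single
atomic_cmpxchg(&e_refcnt, 2, 0), so the "last user deletes the entry" case
needs no extra hash-lock dance. A toy user-space model of just that idiom
(C11 atomics, invented names, single-threaded for clarity; the real
mb_cache_entry_delete_or_get() keeps its reference and returns the entry in
the busy case so the caller can wait on it):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct entry { atomic_int refcnt; };        /* 1 == only the hash reference */

static bool entry_get(struct entry *e)      /* like atomic_inc_not_zero() */
{
        int c = atomic_load(&e->refcnt);

        while (c != 0)
                if (atomic_compare_exchange_weak(&e->refcnt, &c, c + 1))
                        return true;
        return false;                       /* entry is already dying */
}

static bool entry_delete_if_unused(struct entry *e)
{
        int expected = 2;

        if (!entry_get(e))
                return true;                /* already gone */
        /* drop our ref and the hash ref at once, iff nobody else holds one */
        if (atomic_compare_exchange_strong(&e->refcnt, &expected, 0))
                return true;                /* caller would unhash and free */
        atomic_fetch_sub(&e->refcnt, 1);    /* busy: just drop our ref */
        return false;
}

int main(void)
{
        struct entry e = { .refcnt = 1 };   /* created: hash reference only */

        entry_get(&e);                                      /* a user appears */
        printf("delete while busy: %d\n", entry_delete_if_unused(&e)); /* 0 */
        atomic_fetch_sub(&e.refcnt, 1);                     /* user drops ref */
        printf("delete while idle: %d\n", entry_delete_if_unused(&e)); /* 1 */
        return 0;
}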

2022-07-12 12:48:49

by Ritesh Harjani

Subject: Re: [PATCH 0/10 v3] ext4: Fix possible fs corruption due to xattr races

On 22/07/12 12:54PM, Jan Kara wrote:
> Hello!
>
> I've noticed this series didn't get merged yet. I was waiting for more review
> feedback from Ritesh but somehow that didn't happen. So this is the third

Hello Jan,

I had reviewed this series up to 05/10, which were the patches meant as
stable fixes too. But I didn't add any Reviewed-by tags because I didn't
find any obvious problems (also, my familiarity with the mbcache and
revoke code paths is not as good).

But fair point, I did want to continue reviewing the later patches in the
series too. I will complete those before our next call (btw, I forgot to
check on these in the last call actually).

But this doesn't have to delay picking up this patch series for merging
any further. Please feel free to pick it up, if required.

Thanks again for your help!!
-ritesh

> submission of the series fixing the races of ext4 xattr block reuse with the
> few changes that have accumulated since v2. Ted, do you think you can add this
> series to your tree so that we can merge it during the merge window? Thanks!
>
> Changes since v1:
> * Reworked the series to fix all corner cases and make API less errorprone.
>
> Changes since v2:
> * Renamed mb_cache_entry_try_delete() to mb_cache_entry_delete_or_get()
> * Added Tested-by tag from Ritesh
>
> Honza
>
> Previous versions:
> Link: http://lore.kernel.org/r/[email protected] # v1
> Link: http://lore.kernel.org/r/[email protected] # v2

2022-07-14 11:56:53

by Ritesh Harjani

Subject: Re: [PATCH 01/10] mbcache: Don't reclaim used entries

On 22/07/12 12:54PM, Jan Kara wrote:
> Do not reclaim entries that are currently used by somebody from a
> shrinker. Firstly, these entries are likely useful. Secondly, we will
> need to keep such entries to protect pending increment of xattr block
> refcount.
>
> CC: [email protected]
> Fixes: 82939d7999df ("ext4: convert to mbcache2")
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/mbcache.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/mbcache.c b/fs/mbcache.c
> index 97c54d3a2227..cfc28129fb6f 100644
> --- a/fs/mbcache.c
> +++ b/fs/mbcache.c
> @@ -288,7 +288,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> while (nr_to_scan-- && !list_empty(&cache->c_list)) {
> entry = list_first_entry(&cache->c_list,
> struct mb_cache_entry, e_list);
> - if (entry->e_referenced) {
> + if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
> entry->e_referenced = 0;
> list_move_tail(&entry->e_list, &cache->c_list);
> continue;
> @@ -302,6 +302,14 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> spin_unlock(&cache->c_list_lock);
> head = mb_cache_entry_head(cache, entry->e_key);
> hlist_bl_lock(head);
> + /* Now a reliable check if the entry didn't get used... */
> + if (atomic_read(&entry->e_refcnt) > 2) {

Taking a look at this patchset again: I think if we move this "if"
condition checking the refcnt up, i.e. before we delete the entry from
c_list, then we can avoid removing the entry -> checking its refcnt under
the lock -> adding it back if the refcnt is elevated.

Thoughts?

-ritesh

> + hlist_bl_unlock(head);
> + spin_lock(&cache->c_list_lock);
> + list_add_tail(&entry->e_list, &cache->c_list);
> + cache->c_entry_count++;
> + continue;
> + }
> if (!hlist_bl_unhashed(&entry->e_hash_list)) {
> hlist_bl_del_init(&entry->e_hash_list);
> atomic_dec(&entry->e_refcnt);
> --
> 2.35.3
>

2022-07-14 12:18:35

by Ritesh Harjani

Subject: Re: [PATCH 02/10] mbcache: Add functions to delete entry if unused

On 22/07/12 12:54PM, Jan Kara wrote:
> Add function mb_cache_entry_delete_or_get() to delete mbcache entry if
> it is unused and also add a function to wait for entry to become unused
> - mb_cache_entry_wait_unused(). We do not share code between the two
> deleting function as one of them will go away soon.
>
> CC: [email protected]
> Fixes: 82939d7999df ("ext4: convert to mbcache2")
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/mbcache.c | 66 +++++++++++++++++++++++++++++++++++++++--
> include/linux/mbcache.h | 10 ++++++-
> 2 files changed, 73 insertions(+), 3 deletions(-)
>
> diff --git a/fs/mbcache.c b/fs/mbcache.c
> index cfc28129fb6f..2010bc80a3f2 100644
> --- a/fs/mbcache.c
> +++ b/fs/mbcache.c
> @@ -11,7 +11,7 @@
> /*
> * Mbcache is a simple key-value store. Keys need not be unique, however
> * key-value pairs are expected to be unique (we use this fact in
> - * mb_cache_entry_delete()).
> + * mb_cache_entry_delete_or_get()).
> *
> * Ext2 and ext4 use this cache for deduplication of extended attribute blocks.
> * Ext4 also uses it for deduplication of xattr values stored in inodes.
> @@ -125,6 +125,19 @@ void __mb_cache_entry_free(struct mb_cache_entry *entry)
> }
> EXPORT_SYMBOL(__mb_cache_entry_free);
>
> +/*
> + * mb_cache_entry_wait_unused - wait to be the last user of the entry
> + *
> + * @entry - entry to work on
> + *
> + * Wait to be the last user of the entry.
> + */
> +void mb_cache_entry_wait_unused(struct mb_cache_entry *entry)
> +{
> + wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3);

It's not very intuitive why we check for refcnt <= 3.
A small note at the top of this function might be helpful.
IIUC, it is because by default, when anyone creates an entry, we start with
a refcnt of 2 (in mb_cache_entry_create()).
- Now when the user of the entry wants to delete it, it will try and call
mb_cache_entry_delete_or_get(). If during this function call it sees that the
refcnt is elevated above 2, that means there is another user of this entry
currently active and hence we should wait before we remove this entry from the
cache. So it will take an extra refcnt and return.
- Then this caller will call mb_cache_entry_wait_unused() to wait for the
refcnt to drop to <= 3, so that the entry can be deleted.

Quick question -
So is the design now that ext4_evict_ea_inode() will wait indefinitely
until the other user of this mb_cache entry releases the reference, right?
And that will not happen until
- either the shrinker removes this entry from the cache, during which we
check whether the refcnt is <= 3 and then trigger a wakeup event,
- or the user removes/deletes the xattr entry.
Is the above understanding correct?

-ritesh


> +}
> +EXPORT_SYMBOL(mb_cache_entry_wait_unused);
> +
> static struct mb_cache_entry *__entry_find(struct mb_cache *cache,
> struct mb_cache_entry *entry,
> u32 key)
> @@ -217,7 +230,7 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
> }
> EXPORT_SYMBOL(mb_cache_entry_get);
>
> -/* mb_cache_entry_delete - remove a cache entry
> +/* mb_cache_entry_delete - try to remove a cache entry
> * @cache - cache we work with
> * @key - key
> * @value - value
> @@ -254,6 +267,55 @@ void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
> }
> EXPORT_SYMBOL(mb_cache_entry_delete);
>
> +/* mb_cache_entry_delete_or_get - remove a cache entry if it has no users
> + * @cache - cache we work with
> + * @key - key
> + * @value - value
> + *
> + * Remove entry from cache @cache with key @key and value @value. The removal
> + * happens only if the entry is unused. The function returns NULL in case the
> + * entry was successfully removed or there's no entry in cache. Otherwise the
> + * function grabs reference of the entry that we failed to delete because it
> + * still has users and return it.
> + */
> +struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
> + u32 key, u64 value)
> +{
> + struct hlist_bl_node *node;
> + struct hlist_bl_head *head;
> + struct mb_cache_entry *entry;
> +
> + head = mb_cache_entry_head(cache, key);
> + hlist_bl_lock(head);
> + hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
> + if (entry->e_key == key && entry->e_value == value) {
> + if (atomic_read(&entry->e_refcnt) > 2) {
> + atomic_inc(&entry->e_refcnt);
> + hlist_bl_unlock(head);
> + return entry;
> + }
> + /* We keep hash list reference to keep entry alive */
> + hlist_bl_del_init(&entry->e_hash_list);
> + hlist_bl_unlock(head);
> + spin_lock(&cache->c_list_lock);
> + if (!list_empty(&entry->e_list)) {
> + list_del_init(&entry->e_list);
> + if (!WARN_ONCE(cache->c_entry_count == 0,
> + "mbcache: attempt to decrement c_entry_count past zero"))
> + cache->c_entry_count--;
> + atomic_dec(&entry->e_refcnt);
> + }
> + spin_unlock(&cache->c_list_lock);
> + mb_cache_entry_put(cache, entry);
> + return NULL;
> + }
> + }
> + hlist_bl_unlock(head);
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL(mb_cache_entry_delete_or_get);
> +
> /* mb_cache_entry_touch - cache entry got used
> * @cache - cache the entry belongs to
> * @entry - entry that got used
> diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h
> index 20f1e3ff6013..8eca7f25c432 100644
> --- a/include/linux/mbcache.h
> +++ b/include/linux/mbcache.h
> @@ -30,15 +30,23 @@ void mb_cache_destroy(struct mb_cache *cache);
> int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
> u64 value, bool reusable);
> void __mb_cache_entry_free(struct mb_cache_entry *entry);
> +void mb_cache_entry_wait_unused(struct mb_cache_entry *entry);
> static inline int mb_cache_entry_put(struct mb_cache *cache,
> struct mb_cache_entry *entry)
> {
> - if (!atomic_dec_and_test(&entry->e_refcnt))
> + unsigned int cnt = atomic_dec_return(&entry->e_refcnt);
> +
> + if (cnt > 0) {
> + if (cnt <= 3)
> + wake_up_var(&entry->e_refcnt);
> return 0;
> + }
> __mb_cache_entry_free(entry);
> return 1;
> }
>
> +struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
> + u32 key, u64 value);
> void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value);
> struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
> u64 value);
> --
> 2.35.3
>
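
To spell out the refcount arithmetic behind the question above (for the scheme
as of this patch; patch 10/10 later changes the numbers), a worked trace rather
than code:

/*
 *  create:                 e_refcnt = 2   (hash list ref + LRU list ref)
 *  a reuser finds it:      e_refcnt = 3   (an external user now exists)
 *  delete_or_get():        sees refcnt > 2, takes its own reference
 *                          -> e_refcnt = 4 and the entry is returned
 *  wait_unused():          sleeps until e_refcnt <= 3, i.e. until only
 *                          hash + LRU + the waiter itself remain
 *  reuser puts its ref:    e_refcnt = 4 -> 3, mb_cache_entry_put()
 *                          calls wake_up_var() because cnt <= 3
 *  the waiter then puts its reference and retries delete_or_get(),
 *  which now sees refcnt == 2 and removes the entry
 */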

2022-07-14 12:21:00

by Ritesh Harjani

Subject: Re: [PATCH 04/10] ext4: Unindent codeblock in ext4_xattr_block_set()

On 22/07/12 12:54PM, Jan Kara wrote:
> Remove unnecessary else (and thus indentation level) from a code block
> in ext4_xattr_block_set(). It will also make following code changes
> easier. No functional changes.

The patch looks good to me. Just a note: while applying this patch on the
ext4-dev tree, I found a minor conflict with the patch below.

ext4: use kmemdup() to replace kmalloc + memcpy

Replace kmalloc + memcpy with kmemdup()

-ritesh

>
> CC: [email protected]
> Fixes: 82939d7999df ("ext4: convert to mbcache2")
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/ext4/xattr.c | 79 ++++++++++++++++++++++++-------------------------
> 1 file changed, 39 insertions(+), 40 deletions(-)
>
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 7fc40fb1e6b3..aadfae53d055 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1850,6 +1850,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
> #define header(x) ((struct ext4_xattr_header *)(x))
>
> if (s->base) {
> + int offset = (char *)s->here - bs->bh->b_data;
> +
> BUFFER_TRACE(bs->bh, "get_write_access");
> error = ext4_journal_get_write_access(handle, sb, bs->bh,
> EXT4_JTR_NONE);
> @@ -1882,50 +1884,47 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
> if (error)
> goto cleanup;
> goto inserted;
> - } else {
> - int offset = (char *)s->here - bs->bh->b_data;
> + }
> + unlock_buffer(bs->bh);
> + ea_bdebug(bs->bh, "cloning");
> + s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
> + error = -ENOMEM;
> + if (s->base == NULL)
> + goto cleanup;
> + memcpy(s->base, BHDR(bs->bh), bs->bh->b_size);
> + s->first = ENTRY(header(s->base)+1);
> + header(s->base)->h_refcount = cpu_to_le32(1);
> + s->here = ENTRY(s->base + offset);
> + s->end = s->base + bs->bh->b_size;
>
> - unlock_buffer(bs->bh);
> - ea_bdebug(bs->bh, "cloning");
> - s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
> - error = -ENOMEM;
> - if (s->base == NULL)
> + /*
> + * If existing entry points to an xattr inode, we need
> + * to prevent ext4_xattr_set_entry() from decrementing
> + * ref count on it because the reference belongs to the
> + * original block. In this case, make the entry look
> + * like it has an empty value.
> + */
> + if (!s->not_found && s->here->e_value_inum) {
> + ea_ino = le32_to_cpu(s->here->e_value_inum);
> + error = ext4_xattr_inode_iget(inode, ea_ino,
> + le32_to_cpu(s->here->e_hash),
> + &tmp_inode);
> + if (error)
> goto cleanup;
> - memcpy(s->base, BHDR(bs->bh), bs->bh->b_size);
> - s->first = ENTRY(header(s->base)+1);
> - header(s->base)->h_refcount = cpu_to_le32(1);
> - s->here = ENTRY(s->base + offset);
> - s->end = s->base + bs->bh->b_size;
>
> - /*
> - * If existing entry points to an xattr inode, we need
> - * to prevent ext4_xattr_set_entry() from decrementing
> - * ref count on it because the reference belongs to the
> - * original block. In this case, make the entry look
> - * like it has an empty value.
> - */
> - if (!s->not_found && s->here->e_value_inum) {
> - ea_ino = le32_to_cpu(s->here->e_value_inum);
> - error = ext4_xattr_inode_iget(inode, ea_ino,
> - le32_to_cpu(s->here->e_hash),
> - &tmp_inode);
> - if (error)
> - goto cleanup;
> -
> - if (!ext4_test_inode_state(tmp_inode,
> - EXT4_STATE_LUSTRE_EA_INODE)) {
> - /*
> - * Defer quota free call for previous
> - * inode until success is guaranteed.
> - */
> - old_ea_inode_quota = le32_to_cpu(
> - s->here->e_value_size);
> - }
> - iput(tmp_inode);
> -
> - s->here->e_value_inum = 0;
> - s->here->e_value_size = 0;
> + if (!ext4_test_inode_state(tmp_inode,
> + EXT4_STATE_LUSTRE_EA_INODE)) {
> + /*
> + * Defer quota free call for previous
> + * inode until success is guaranteed.
> + */
> + old_ea_inode_quota = le32_to_cpu(
> + s->here->e_value_size);
> }
> + iput(tmp_inode);
> +
> + s->here->e_value_inum = 0;
> + s->here->e_value_size = 0;
> }
> } else {
> /* Allocate a buffer where we construct the new block. */
> --
> 2.35.3
>

2022-07-14 12:30:11

by Ritesh Harjani

Subject: Re: [PATCH 05/10] ext4: Fix race when reusing xattr blocks

On 22/07/12 12:54PM, Jan Kara wrote:
> When ext4_xattr_block_set() decides to remove xattr block the following
> race can happen:
>
> CPU1                                          CPU2
> ext4_xattr_block_set()                        ext4_xattr_release_block()
>   new_bh = ext4_xattr_block_cache_find()
>
>                                                 lock_buffer(bh);
>                                                 ref = le32_to_cpu(BHDR(bh)->h_refcount);
>                                                 if (ref == 1) {
>                                                         ...
>                                                         mb_cache_entry_delete();
>                                                         unlock_buffer(bh);
>                                                         ext4_free_blocks();
>                                                           ...
>                                                         ext4_forget(..., bh, ...);
>                                                         jbd2_journal_revoke(..., bh);
>
>   ext4_journal_get_write_access(..., new_bh, ...)
>     do_get_write_access()
>       jbd2_journal_cancel_revoke(..., new_bh);
>
> Later the code in ext4_xattr_block_set() finds out the block got freed
> and cancels reusal of the block but the revoke stays canceled and so in
> case of block reuse and journal replay the filesystem can get corrupted.
> If the race works out slightly differently, we can also hit assertions
> in the jbd2 code.
>
> Fix the problem by making sure that once matching mbcache entry is
> found, code dropping the last xattr block reference (or trying to modify
> xattr block in place) waits until the mbcache entry reference is
> dropped. This way code trying to reuse xattr block is protected from
> someone trying to drop the last reference to xattr block.
>
> Reported-and-tested-by: Ritesh Harjani <[email protected]>
> CC: [email protected]
> Fixes: 82939d7999df ("ext4: convert to mbcache2")
> Signed-off-by: Jan Kara <[email protected]>

Thanks Jan,
Just a note - I retested the patches only up to here (the ones marked for
stable) with stress-ng --xattr 16, and I didn't find any issues so far for
ext2, ext3, and ext4 with default mkfs options.

Also, I re-ran the full v3 patch series with the same test case on all 3
filesystems, and I didn't find any failures there either.

-ritesh




> ---
> fs/ext4/xattr.c | 67 +++++++++++++++++++++++++++++++++----------------
> 1 file changed, 45 insertions(+), 22 deletions(-)
>
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index aadfae53d055..3a0928c8720e 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -439,9 +439,16 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
> /* Remove entry from mbcache when EA inode is getting evicted */
> void ext4_evict_ea_inode(struct inode *inode)
> {
> - if (EA_INODE_CACHE(inode))
> - mb_cache_entry_delete(EA_INODE_CACHE(inode),
> - ext4_xattr_inode_get_hash(inode), inode->i_ino);
> + struct mb_cache_entry *oe;
> +
> + if (!EA_INODE_CACHE(inode))
> + return;
> + /* Wait for entry to get unused so that we can remove it */
> + while ((oe = mb_cache_entry_delete_or_get(EA_INODE_CACHE(inode),
> + ext4_xattr_inode_get_hash(inode), inode->i_ino))) {
> + mb_cache_entry_wait_unused(oe);
> + mb_cache_entry_put(EA_INODE_CACHE(inode), oe);
> + }
> }
>
> static int
> @@ -1229,6 +1236,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
> if (error)
> goto out;
>
> +retry_ref:
> lock_buffer(bh);
> hash = le32_to_cpu(BHDR(bh)->h_hash);
> ref = le32_to_cpu(BHDR(bh)->h_refcount);
> @@ -1238,9 +1246,18 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
> * This must happen under buffer lock for
> * ext4_xattr_block_set() to reliably detect freed block
> */
> - if (ea_block_cache)
> - mb_cache_entry_delete(ea_block_cache, hash,
> - bh->b_blocknr);
> + if (ea_block_cache) {
> + struct mb_cache_entry *oe;
> +
> + oe = mb_cache_entry_delete_or_get(ea_block_cache, hash,
> + bh->b_blocknr);
> + if (oe) {
> + unlock_buffer(bh);
> + mb_cache_entry_wait_unused(oe);
> + mb_cache_entry_put(ea_block_cache, oe);
> + goto retry_ref;
> + }
> + }
> get_bh(bh);
> unlock_buffer(bh);
>
> @@ -1867,9 +1884,20 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
> * ext4_xattr_block_set() to reliably detect modified
> * block
> */
> - if (ea_block_cache)
> - mb_cache_entry_delete(ea_block_cache, hash,
> - bs->bh->b_blocknr);
> + if (ea_block_cache) {
> + struct mb_cache_entry *oe;
> +
> + oe = mb_cache_entry_delete_or_get(ea_block_cache,
> + hash, bs->bh->b_blocknr);
> + if (oe) {
> + /*
> + * Xattr block is getting reused. Leave
> + * it alone.
> + */
> + mb_cache_entry_put(ea_block_cache, oe);
> + goto clone_block;
> + }
> + }
> ea_bdebug(bs->bh, "modifying in-place");
> error = ext4_xattr_set_entry(i, s, handle, inode,
> true /* is_block */);
> @@ -1885,6 +1913,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
> goto cleanup;
> goto inserted;
> }
> +clone_block:
> unlock_buffer(bs->bh);
> ea_bdebug(bs->bh, "cloning");
> s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
> @@ -1991,18 +2020,13 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
> lock_buffer(new_bh);
> /*
> * We have to be careful about races with
> - * freeing, rehashing or adding references to
> - * xattr block. Once we hold buffer lock xattr
> - * block's state is stable so we can check
> - * whether the block got freed / rehashed or
> - * not. Since we unhash mbcache entry under
> - * buffer lock when freeing / rehashing xattr
> - * block, checking whether entry is still
> - * hashed is reliable. Same rules hold for
> - * e_reusable handling.
> + * adding references to xattr block. Once we
> + * hold buffer lock xattr block's state is
> + * stable so we can check the additional
> + * reference fits.
> */
> - if (hlist_bl_unhashed(&ce->e_hash_list) ||
> - !ce->e_reusable) {
> + ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
> + if (ref > EXT4_XATTR_REFCOUNT_MAX) {
> /*
> * Undo everything and check mbcache
> * again.
> @@ -2017,9 +2041,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
> new_bh = NULL;
> goto inserted;
> }
> - ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
> BHDR(new_bh)->h_refcount = cpu_to_le32(ref);
> - if (ref >= EXT4_XATTR_REFCOUNT_MAX)
> + if (ref == EXT4_XATTR_REFCOUNT_MAX)
> ce->e_reusable = 0;
> ea_bdebug(new_bh, "reusing; refcount now=%d",
> ref);
> --
> 2.35.3
>

2022-07-14 12:37:51

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH 06/10] ext2: Factor out freeing of xattr block reference

On 22/07/12 12:54PM, Jan Kara wrote:
> Freeing of xattr block reference is open-coded in two places. Factor it out
> into a separate function and use it.

Looked into the refactoring logic. The patch looks good to me.
Small queries below -

>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/ext2/xattr.c | 90 +++++++++++++++++++++----------------------------
> 1 file changed, 38 insertions(+), 52 deletions(-)
>
> diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
> index 841fa6d9d744..9885294993ef 100644
> --- a/fs/ext2/xattr.c
> +++ b/fs/ext2/xattr.c
> @@ -651,6 +651,42 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
> return error;
> }
>
> +static void ext2_xattr_release_block(struct inode *inode,
> + struct buffer_head *bh)
> +{
> + struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode);
> +
> + lock_buffer(bh);
> + if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
> + __u32 hash = le32_to_cpu(HDR(bh)->h_hash);
> +
> + /*
> + * This must happen under buffer lock for
> + * ext2_xattr_set2() to reliably detect freed block
> + */
> + mb_cache_entry_delete(ea_block_cache, hash,
> + bh->b_blocknr);
> + /* Free the old block. */
> + ea_bdebug(bh, "freeing");
> + ext2_free_blocks(inode, bh->b_blocknr, 1);
> + /* We let our caller release bh, so we
> + * need to duplicate the buffer before. */
> + get_bh(bh);
> + bforget(bh);
> + unlock_buffer(bh);
> + } else {
> + /* Decrement the refcount only. */
> + le32_add_cpu(&HDR(bh)->h_refcount, -1);
> + dquot_free_block(inode, 1);
> + mark_buffer_dirty(bh);
> + unlock_buffer(bh);
> + ea_bdebug(bh, "refcount now=%d",
> + le32_to_cpu(HDR(bh)->h_refcount));
> + if (IS_SYNC(inode))
> + sync_dirty_buffer(bh);
> + }
> +}
> +
> /*
> * Second half of ext2_xattr_set(): Update the file system.
> */
> @@ -747,34 +783,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh,
> * If there was an old block and we are no longer using it,
> * release the old block.
> */
> - lock_buffer(old_bh);
> - if (HDR(old_bh)->h_refcount == cpu_to_le32(1)) {
> - __u32 hash = le32_to_cpu(HDR(old_bh)->h_hash);
> -
> - /*
> - * This must happen under buffer lock for
> - * ext2_xattr_set2() to reliably detect freed block
> - */
> - mb_cache_entry_delete(ea_block_cache, hash,
> - old_bh->b_blocknr);
> - /* Free the old block. */
> - ea_bdebug(old_bh, "freeing");
> - ext2_free_blocks(inode, old_bh->b_blocknr, 1);
> - mark_inode_dirty(inode);

^^^ this is not needed because ext2_free_blocks() will take care of it.
Hence you have dropped this in ext2_xattr_release_block()

> - /* We let our caller release old_bh, so we
> - * need to duplicate the buffer before. */
> - get_bh(old_bh);
> - bforget(old_bh);
> - } else {
> - /* Decrement the refcount only. */
> - le32_add_cpu(&HDR(old_bh)->h_refcount, -1);
> - dquot_free_block_nodirty(inode, 1);
> - mark_inode_dirty(inode);

Quick qn -> Don't we need mark_inode_dirty() here?

> - mark_buffer_dirty(old_bh);
> - ea_bdebug(old_bh, "refcount now=%d",
> - le32_to_cpu(HDR(old_bh)->h_refcount));
> - }
> - unlock_buffer(old_bh);
> + ext2_xattr_release_block(inode, old_bh);
> }
>
> cleanup:
> @@ -828,30 +837,7 @@ ext2_xattr_delete_inode(struct inode *inode)
> EXT2_I(inode)->i_file_acl);
> goto cleanup;
> }
> - lock_buffer(bh);
> - if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
> - __u32 hash = le32_to_cpu(HDR(bh)->h_hash);
> -
> - /*
> - * This must happen under buffer lock for ext2_xattr_set2() to
> - * reliably detect freed block
> - */
> - mb_cache_entry_delete(EA_BLOCK_CACHE(inode), hash,
> - bh->b_blocknr);
> - ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1);
> - get_bh(bh);
> - bforget(bh);
> - unlock_buffer(bh);
> - } else {
> - le32_add_cpu(&HDR(bh)->h_refcount, -1);
> - ea_bdebug(bh, "refcount now=%d",
> - le32_to_cpu(HDR(bh)->h_refcount));
> - unlock_buffer(bh);
> - mark_buffer_dirty(bh);
> - if (IS_SYNC(inode))
> - sync_dirty_buffer(bh);
> - dquot_free_block_nodirty(inode, 1);
> - }
> + ext2_xattr_release_block(inode, bh);
> EXT2_I(inode)->i_file_acl = 0;
>
> cleanup:
> --
> 2.35.3
>

2022-07-14 12:46:29

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH 07/10] ext2: Unindent codeblock in ext2_xattr_set()

On 22/07/12 12:54PM, Jan Kara wrote:
> Replace one else in ext2_xattr_set() with a goto. This makes following
> code changes simpler to follow. No functional changes.

Straightforward refactoring. Looks good to me.

Reviewed-by: Ritesh Harjani (IBM) <[email protected]>

>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/ext2/xattr.c | 32 ++++++++++++++++----------------
> 1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
> index 9885294993ef..37ce495eb279 100644
> --- a/fs/ext2/xattr.c
> +++ b/fs/ext2/xattr.c
> @@ -517,7 +517,8 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
> /* Here we know that we can set the new attribute. */
>
> if (header) {
> - /* assert(header == HDR(bh)); */
> + int offset;
> +
> lock_buffer(bh);
> if (header->h_refcount == cpu_to_le32(1)) {
> __u32 hash = le32_to_cpu(header->h_hash);
> @@ -531,22 +532,20 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
> bh->b_blocknr);
>
> /* keep the buffer locked while modifying it. */
> - } else {
> - int offset;
> -
> - unlock_buffer(bh);
> - ea_bdebug(bh, "cloning");
> - header = kmemdup(HDR(bh), bh->b_size, GFP_KERNEL);
> - error = -ENOMEM;
> - if (header == NULL)
> - goto cleanup;
> - header->h_refcount = cpu_to_le32(1);
> -
> - offset = (char *)here - bh->b_data;
> - here = ENTRY((char *)header + offset);
> - offset = (char *)last - bh->b_data;
> - last = ENTRY((char *)header + offset);
> + goto update_block;
> }
> + unlock_buffer(bh);
> + ea_bdebug(bh, "cloning");
> + header = kmemdup(HDR(bh), bh->b_size, GFP_KERNEL);
> + error = -ENOMEM;
> + if (header == NULL)
> + goto cleanup;
> + header->h_refcount = cpu_to_le32(1);
> +
> + offset = (char *)here - bh->b_data;
> + here = ENTRY((char *)header + offset);
> + offset = (char *)last - bh->b_data;
> + last = ENTRY((char *)header + offset);
> } else {
> /* Allocate a buffer where we construct the new block. */
> header = kzalloc(sb->s_blocksize, GFP_KERNEL);
> @@ -559,6 +558,7 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
> last = here = ENTRY(header+1);
> }
>
> +update_block:
> /* Iff we are modifying the block in-place, bh is locked here. */
>
> if (not_found) {
> --
> 2.35.3
>

2022-07-14 13:13:38

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH 10/10] mbcache: Automatically delete entries from cache on freeing

On 22/07/12 12:54PM, Jan Kara wrote:
> Use the fact that entries with elevated refcount are not removed from

The elevated refcnt means >= 2?

> the hash and just move removal of the entry from the hash to the entry
> freeing time. When doing this we also change the generic code to hold
> one reference to the cache entry, not two of them, which makes code
> somewhat more obvious.
>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/mbcache.c | 108 +++++++++++++++-------------------------
> include/linux/mbcache.h | 24 ++++++---
> 2 files changed, 55 insertions(+), 77 deletions(-)
>
> diff --git a/fs/mbcache.c b/fs/mbcache.c
> index d1ebb5df2856..96f1d49d30a5 100644
> --- a/fs/mbcache.c
> +++ b/fs/mbcache.c
> @@ -90,7 +90,7 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
> return -ENOMEM;
>
> INIT_LIST_HEAD(&entry->e_list);
> - /* One ref for hash, one ref returned */
> + /* Initial hash reference */
> atomic_set(&entry->e_refcnt, 1);
> entry->e_key = key;
> entry->e_value = value;
> @@ -106,21 +106,28 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
> }
> }
> hlist_bl_add_head(&entry->e_hash_list, head);
> - hlist_bl_unlock(head);
> -
> + /*
> + * Add entry to LRU list before it can be found by
> + * mb_cache_entry_delete() to avoid races
> + */

No reference to mb_cache_entry_delete() now. It is
mb_cache_entry_delete_or_get()

> spin_lock(&cache->c_list_lock);
> list_add_tail(&entry->e_list, &cache->c_list);
> - /* Grab ref for LRU list */
> - atomic_inc(&entry->e_refcnt);
> cache->c_entry_count++;
> spin_unlock(&cache->c_list_lock);
> + hlist_bl_unlock(head);
>
> return 0;
> }
> EXPORT_SYMBOL(mb_cache_entry_create);
>
> -void __mb_cache_entry_free(struct mb_cache_entry *entry)
> +void __mb_cache_entry_free(struct mb_cache *cache, struct mb_cache_entry *entry)
> {
> + struct hlist_bl_head *head;
> +
> + head = mb_cache_entry_head(cache, entry->e_key);
> + hlist_bl_lock(head);
> + hlist_bl_del(&entry->e_hash_list);
> + hlist_bl_unlock(head);
> kmem_cache_free(mb_entry_cache, entry);
> }
> EXPORT_SYMBOL(__mb_cache_entry_free);
> @@ -134,7 +141,7 @@ EXPORT_SYMBOL(__mb_cache_entry_free);
> */
> void mb_cache_entry_wait_unused(struct mb_cache_entry *entry)
> {
> - wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3);
> + wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 2);
> }
> EXPORT_SYMBOL(mb_cache_entry_wait_unused);
>
> @@ -155,10 +162,9 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache,
> while (node) {
> entry = hlist_bl_entry(node, struct mb_cache_entry,
> e_hash_list);
> - if (entry->e_key == key && entry->e_reusable) {
> - atomic_inc(&entry->e_refcnt);
> + if (entry->e_key == key && entry->e_reusable &&
> + atomic_inc_not_zero(&entry->e_refcnt))
> goto out;
> - }
> node = node->next;
> }
> entry = NULL;
> @@ -218,10 +224,9 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
> head = mb_cache_entry_head(cache, key);
> hlist_bl_lock(head);
> hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
> - if (entry->e_key == key && entry->e_value == value) {
> - atomic_inc(&entry->e_refcnt);
> + if (entry->e_key == key && entry->e_value == value &&
> + atomic_inc_not_zero(&entry->e_refcnt))
> goto out;
> - }
> }
> entry = NULL;
> out:
> @@ -244,37 +249,25 @@ EXPORT_SYMBOL(mb_cache_entry_get);
> struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
> u32 key, u64 value)
> {
> - struct hlist_bl_node *node;
> - struct hlist_bl_head *head;
> struct mb_cache_entry *entry;
>
> - head = mb_cache_entry_head(cache, key);
> - hlist_bl_lock(head);
> - hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
> - if (entry->e_key == key && entry->e_value == value) {
> - if (atomic_read(&entry->e_refcnt) > 2) {
> - atomic_inc(&entry->e_refcnt);
> - hlist_bl_unlock(head);
> - return entry;
> - }
> - /* We keep hash list reference to keep entry alive */
> - hlist_bl_del_init(&entry->e_hash_list);
> - hlist_bl_unlock(head);
> - spin_lock(&cache->c_list_lock);
> - if (!list_empty(&entry->e_list)) {
> - list_del_init(&entry->e_list);
> - if (!WARN_ONCE(cache->c_entry_count == 0,
> - "mbcache: attempt to decrement c_entry_count past zero"))
> - cache->c_entry_count--;
> - atomic_dec(&entry->e_refcnt);
> - }
> - spin_unlock(&cache->c_list_lock);
> - mb_cache_entry_put(cache, entry);
> - return NULL;
> - }
> - }
> - hlist_bl_unlock(head);
> + entry = mb_cache_entry_get(cache, key, value);
> + if (!entry)
> + return NULL;
> +
> + /*
> + * Drop the ref we got from mb_cache_entry_get() and the initial hash
> + * ref if we are the last user
> + */
> + if (atomic_cmpxchg(&entry->e_refcnt, 2, 0) != 2)
> + return entry;
>
> + spin_lock(&cache->c_list_lock);
> + if (!list_empty(&entry->e_list))
> + list_del_init(&entry->e_list);
> + cache->c_entry_count--;
> + spin_unlock(&cache->c_list_lock);
> + __mb_cache_entry_free(cache, entry);
> return NULL;
> }
> EXPORT_SYMBOL(mb_cache_entry_delete_or_get);
> @@ -306,42 +299,24 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> unsigned long nr_to_scan)
> {
> struct mb_cache_entry *entry;
> - struct hlist_bl_head *head;
> unsigned long shrunk = 0;
>
> spin_lock(&cache->c_list_lock);
> while (nr_to_scan-- && !list_empty(&cache->c_list)) {
> entry = list_first_entry(&cache->c_list,
> struct mb_cache_entry, e_list);
> - if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
> + /* Drop initial hash reference if there is no user */
> + if (entry->e_referenced ||
> + atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) {

So here, if the refcnt of an entry is 1, that means it is still in use, right?
So the shrinker will do the atomic_cmpxchg and bring it to 0, and then
delete the entry from the cache?
This will only happen for an entry with just 1 reference count.

Is that the correct understanding?

-ritesh


> entry->e_referenced = 0;
> list_move_tail(&entry->e_list, &cache->c_list);
> continue;
> }
> list_del_init(&entry->e_list);
> cache->c_entry_count--;
> - /*
> - * We keep LRU list reference so that entry doesn't go away
> - * from under us.
> - */
> spin_unlock(&cache->c_list_lock);
> - head = mb_cache_entry_head(cache, entry->e_key);
> - hlist_bl_lock(head);
> - /* Now a reliable check if the entry didn't get used... */
> - if (atomic_read(&entry->e_refcnt) > 2) {
> - hlist_bl_unlock(head);
> - spin_lock(&cache->c_list_lock);
> - list_add_tail(&entry->e_list, &cache->c_list);
> - cache->c_entry_count++;
> - continue;
> - }
> - if (!hlist_bl_unhashed(&entry->e_hash_list)) {
> - hlist_bl_del_init(&entry->e_hash_list);
> - atomic_dec(&entry->e_refcnt);
> - }
> - hlist_bl_unlock(head);
> - if (mb_cache_entry_put(cache, entry))
> - shrunk++;
> + __mb_cache_entry_free(cache, entry);
> + shrunk++;
> cond_resched();
> spin_lock(&cache->c_list_lock);
> }
> @@ -433,11 +408,6 @@ void mb_cache_destroy(struct mb_cache *cache)
> * point.
> */
> list_for_each_entry_safe(entry, next, &cache->c_list, e_list) {
> - if (!hlist_bl_unhashed(&entry->e_hash_list)) {
> - hlist_bl_del_init(&entry->e_hash_list);
> - atomic_dec(&entry->e_refcnt);
> - } else
> - WARN_ON(1);
> list_del(&entry->e_list);
> WARN_ON(atomic_read(&entry->e_refcnt) != 1);
> mb_cache_entry_put(cache, entry);
> diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h
> index 452b579856d4..2da63fd7b98f 100644
> --- a/include/linux/mbcache.h
> +++ b/include/linux/mbcache.h
> @@ -13,8 +13,16 @@ struct mb_cache;
> struct mb_cache_entry {
> /* List of entries in cache - protected by cache->c_list_lock */
> struct list_head e_list;
> - /* Hash table list - protected by hash chain bitlock */
> + /*
> + * Hash table list - protected by hash chain bitlock. The entry is
> + * guaranteed to be hashed while e_refcnt > 0.
> + */
> struct hlist_bl_node e_hash_list;
> + /*
> + * Entry refcount. Once it reaches zero, entry is unhashed and freed.
> + * While refcount > 0, the entry is guaranteed to stay in the hash and
> + * e.g. mb_cache_entry_try_delete() will fail.
> + */
> atomic_t e_refcnt;
> /* Key in hash - stable during lifetime of the entry */
> u32 e_key;
> @@ -29,20 +37,20 @@ void mb_cache_destroy(struct mb_cache *cache);
>
> int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
> u64 value, bool reusable);
> -void __mb_cache_entry_free(struct mb_cache_entry *entry);
> +void __mb_cache_entry_free(struct mb_cache *cache,
> + struct mb_cache_entry *entry);
> void mb_cache_entry_wait_unused(struct mb_cache_entry *entry);
> -static inline int mb_cache_entry_put(struct mb_cache *cache,
> - struct mb_cache_entry *entry)
> +static inline void mb_cache_entry_put(struct mb_cache *cache,
> + struct mb_cache_entry *entry)
> {
> unsigned int cnt = atomic_dec_return(&entry->e_refcnt);
>
> if (cnt > 0) {
> - if (cnt <= 3)
> + if (cnt <= 2)
> wake_up_var(&entry->e_refcnt);
> - return 0;
> + return;
> }
> - __mb_cache_entry_free(entry);
> - return 1;
> + __mb_cache_entry_free(cache, entry);
> }
>
> struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
> --
> 2.35.3
>

2022-07-14 14:41:31

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 01/10] mbcache: Don't reclaim used entries

On Thu 14-07-22 17:17:02, Ritesh Harjani wrote:
> On 22/07/12 12:54PM, Jan Kara wrote:
> > Do not reclaim entries that are currently used by somebody from a
> > shrinker. Firstly, these entries are likely useful. Secondly, we will
> > need to keep such entries to protect pending increment of xattr block
> > refcount.
> >
> > CC: [email protected]
> > Fixes: 82939d7999df ("ext4: convert to mbcache2")
> > Signed-off-by: Jan Kara <[email protected]>
> > ---
> > fs/mbcache.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/mbcache.c b/fs/mbcache.c
> > index 97c54d3a2227..cfc28129fb6f 100644
> > --- a/fs/mbcache.c
> > +++ b/fs/mbcache.c
> > @@ -288,7 +288,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> > while (nr_to_scan-- && !list_empty(&cache->c_list)) {
> > entry = list_first_entry(&cache->c_list,
> > struct mb_cache_entry, e_list);
> > - if (entry->e_referenced) {
> > + if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
> > entry->e_referenced = 0;
> > list_move_tail(&entry->e_list, &cache->c_list);
> > continue;
> > @@ -302,6 +302,14 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> > spin_unlock(&cache->c_list_lock);
> > head = mb_cache_entry_head(cache, entry->e_key);
> > hlist_bl_lock(head);
> > + /* Now a reliable check if the entry didn't get used... */
> > + if (atomic_read(&entry->e_refcnt) > 2) {
>
> On taking a look at this patchset again, I think that if we move this "if"
> condition checking the refcnt above, i.e. before we delete the entry from
> c_list, then we can avoid:
> removing the entry -> checking its refcnt under lock -> adding it back
> if the refcnt is elevated.
>
> Thoughts?

Well, but synchronization would get more complicated because we don't want
to acquire hlist_bl_lock() under c_list_lock (technically we could at this
point in the series but it would make life harder for the last patch in the
series). And we need c_list_lock to remove entry from the LRU list. It
could be all done but I don't think what you suggest is really that simpler
and this code will go away later in the patchset anyway...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-07-14 14:56:50

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH 01/10] mbcache: Don't reclaim used entries

On 22/07/14 04:36PM, Jan Kara wrote:
> On Thu 14-07-22 17:17:02, Ritesh Harjani wrote:
> > On 22/07/12 12:54PM, Jan Kara wrote:
> > > Do not reclaim entries that are currently used by somebody from a
> > > shrinker. Firstly, these entries are likely useful. Secondly, we will
> > > need to keep such entries to protect pending increment of xattr block
> > > refcount.
> > >
> > > CC: [email protected]
> > > Fixes: 82939d7999df ("ext4: convert to mbcache2")
> > > Signed-off-by: Jan Kara <[email protected]>
> > > ---
> > > fs/mbcache.c | 10 +++++++++-
> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/mbcache.c b/fs/mbcache.c
> > > index 97c54d3a2227..cfc28129fb6f 100644
> > > --- a/fs/mbcache.c
> > > +++ b/fs/mbcache.c
> > > @@ -288,7 +288,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> > > while (nr_to_scan-- && !list_empty(&cache->c_list)) {
> > > entry = list_first_entry(&cache->c_list,
> > > struct mb_cache_entry, e_list);
> > > - if (entry->e_referenced) {
> > > + if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
> > > entry->e_referenced = 0;
> > > list_move_tail(&entry->e_list, &cache->c_list);
> > > continue;
> > > @@ -302,6 +302,14 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> > > spin_unlock(&cache->c_list_lock);
> > > head = mb_cache_entry_head(cache, entry->e_key);
> > > hlist_bl_lock(head);
> > > + /* Now a reliable check if the entry didn't get used... */
> > > + if (atomic_read(&entry->e_refcnt) > 2) {
> >
> > On taking a look at this patchset again, I think that if we move this "if"
> > condition checking the refcnt above, i.e. before we delete the entry from
> > c_list, then we can avoid:
> > removing the entry -> checking its refcnt under lock -> adding it back
> > if the refcnt is elevated.
> >
> > Thoughts?
>
> Well, but synchronization would get more complicated because we don't want
> to acquire hlist_bl_lock() under c_list_lock (technically we could at this
Ok, yes. I tried implementing it and it becomes lock()/unlock() mess.

> point in the series but it would make life harder for the last patch in the
> series). And we need c_list_lock to remove entry from the LRU list. It
> could be all done but I don't think what you suggest is really that simpler
> and this code will go away later in the patchset anyway...

I agree. Thanks for re-checking it.

-ritesh

2022-07-14 14:56:52

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 02/10] mbcache: Add functions to delete entry if unused

On Thu 14-07-22 17:45:32, Ritesh Harjani wrote:
> On 22/07/12 12:54PM, Jan Kara wrote:
> > Add function mb_cache_entry_delete_or_get() to delete mbcache entry if
> > it is unused and also add a function to wait for entry to become unused
> > - mb_cache_entry_wait_unused(). We do not share code between the two
> > deleting function as one of them will go away soon.
> >
> > CC: [email protected]
> > Fixes: 82939d7999df ("ext4: convert to mbcache2")
> > Signed-off-by: Jan Kara <[email protected]>
> > ---
> > fs/mbcache.c | 66 +++++++++++++++++++++++++++++++++++++++--
> > include/linux/mbcache.h | 10 ++++++-
> > 2 files changed, 73 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/mbcache.c b/fs/mbcache.c
> > index cfc28129fb6f..2010bc80a3f2 100644
> > --- a/fs/mbcache.c
> > +++ b/fs/mbcache.c
> > @@ -11,7 +11,7 @@
> > /*
> > * Mbcache is a simple key-value store. Keys need not be unique, however
> > * key-value pairs are expected to be unique (we use this fact in
> > - * mb_cache_entry_delete()).
> > + * mb_cache_entry_delete_or_get()).
> > *
> > * Ext2 and ext4 use this cache for deduplication of extended attribute blocks.
> > * Ext4 also uses it for deduplication of xattr values stored in inodes.
> > @@ -125,6 +125,19 @@ void __mb_cache_entry_free(struct mb_cache_entry *entry)
> > }
> > EXPORT_SYMBOL(__mb_cache_entry_free);
> >
> > +/*
> > + * mb_cache_entry_wait_unused - wait to be the last user of the entry
> > + *
> > + * @entry - entry to work on
> > + *
> > + * Wait to be the last user of the entry.
> > + */
> > +void mb_cache_entry_wait_unused(struct mb_cache_entry *entry)
> > +{
> > + wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3);
>
> It's not very intuitive why we check for refcnt <= 3.
> A small note at the top of this function might be helpful.
> IIUC, it is because by default when anyone creates an entry we start with
> a refcnt of 2 (in mb_cache_entry_create).
> - Now when the user of the entry wants to delete this, it will try and call
> mb_cache_entry_delete_or_get(). If during this function call it sees that the
> refcnt is elevated more than 2, that means there is another user of this entry
> currently active and hence we should wait before we remove this entry from the
> cache. So it will take an extra refcnt and return.
> - So then this caller will call mb_cache_entry_wait_unused() for the refcnt to
> be <= 3, so that the entry can be deleted.

Correct. I will add a comment as you suggest.

> Quick qn -
> So now is the design like, ext4_evict_ea_inode() will be waiting indefinitely
> until the other user of this mb_cache entry releases the reference right?

Correct. Similarly for ext4_xattr_release_block().

> And that will not happen until,
> - either the shrinker removes this entry from the cache during which we are
> checking if the refcnt <= 3, then we call a wakeup event

No, shrinker will not touch these entries with active users anymore.

> - Or the user removes/deletes the xattr entry

No. We hold a reference to the mbcache entry only while we are trying to reuse
it. So ext4_xattr_block_cache_find() and ext4_xattr_inode_cache_find() will
look up a potential mbcache entry that may have the same contents and get a
reference to it. Then we do comparisons verifying whether the contents really
match; if yes, we increment the on-disk inode/block refcount. Then we drop the
mbcache entry reference, which unblocks waiters in mb_cache_entry_wait_unused().
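
Schematically, the two sides interact roughly like this (a rough sketch only,
ignoring buffer locking, journaling and error handling; 'cache', 'hash',
'value', 'oe' and 'ce' are placeholders, and mb_cache_entry_get() stands in
for the find_first/find_next iteration):

	/* freeing / in-place modification side */
	while ((oe = mb_cache_entry_delete_or_get(cache, hash, value))) {
		mb_cache_entry_wait_unused(oe);  /* sleep until we are the last user */
		mb_cache_entry_put(cache, oe);   /* drop temporary ref and retry */
	}
	/* entry is gone, nobody can be reusing the block anymore */

	/* reuse side */
	ce = mb_cache_entry_get(cache, hash, value);  /* pins the entry */
	if (ce) {
		/* compare contents; on a match bump the on-disk refcount
		 * of the shared xattr block / EA inode */
		mb_cache_entry_put(cache, ce);   /* unblocks wait_unused() */
	}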

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-07-14 14:59:15

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 06/10] ext2: Factor out freeing of xattr block reference

On Thu 14-07-22 18:07:14, Ritesh Harjani wrote:
> On 22/07/12 12:54PM, Jan Kara wrote:
> > Freeing of xattr block reference is open-coded in two places. Factor it out
> > into a separate function and use it.
>
> Looked into the refactoring logic. The patch looks good to me.
> Small queries below -
>
> >
> > Signed-off-by: Jan Kara <[email protected]>
> > ---
> > fs/ext2/xattr.c | 90 +++++++++++++++++++++----------------------------
> > 1 file changed, 38 insertions(+), 52 deletions(-)
> >
> > diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
> > index 841fa6d9d744..9885294993ef 100644
> > --- a/fs/ext2/xattr.c
> > +++ b/fs/ext2/xattr.c
> > @@ -651,6 +651,42 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
> > return error;
> > }
> >
> > +static void ext2_xattr_release_block(struct inode *inode,
> > + struct buffer_head *bh)
> > +{
> > + struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode);
> > +
> > + lock_buffer(bh);
> > + if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
> > + __u32 hash = le32_to_cpu(HDR(bh)->h_hash);
> > +
> > + /*
> > + * This must happen under buffer lock for
> > + * ext2_xattr_set2() to reliably detect freed block
> > + */
> > + mb_cache_entry_delete(ea_block_cache, hash,
> > + bh->b_blocknr);
> > + /* Free the old block. */
> > + ea_bdebug(bh, "freeing");
> > + ext2_free_blocks(inode, bh->b_blocknr, 1);
> > + /* We let our caller release bh, so we
> > + * need to duplicate the buffer before. */
> > + get_bh(bh);
> > + bforget(bh);
> > + unlock_buffer(bh);
> > + } else {
> > + /* Decrement the refcount only. */
> > + le32_add_cpu(&HDR(bh)->h_refcount, -1);
> > + dquot_free_block(inode, 1);
> > + mark_buffer_dirty(bh);
> > + unlock_buffer(bh);
> > + ea_bdebug(bh, "refcount now=%d",
> > + le32_to_cpu(HDR(bh)->h_refcount));
> > + if (IS_SYNC(inode))
> > + sync_dirty_buffer(bh);
> > + }
> > +}
> > +
> > /*
> > * Second half of ext2_xattr_set(): Update the file system.
> > */
> > @@ -747,34 +783,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh,
> > * If there was an old block and we are no longer using it,
> > * release the old block.
> > */
> > - lock_buffer(old_bh);
> > - if (HDR(old_bh)->h_refcount == cpu_to_le32(1)) {
> > - __u32 hash = le32_to_cpu(HDR(old_bh)->h_hash);
> > -
> > - /*
> > - * This must happen under buffer lock for
> > - * ext2_xattr_set2() to reliably detect freed block
> > - */
> > - mb_cache_entry_delete(ea_block_cache, hash,
> > - old_bh->b_blocknr);
> > - /* Free the old block. */
> > - ea_bdebug(old_bh, "freeing");
> > - ext2_free_blocks(inode, old_bh->b_blocknr, 1);
> > - mark_inode_dirty(inode);
>
> ^^^ this is not needed because ext2_free_blocks() will take care of it.
> Hence you have dropped this in ext2_xattr_release_block()

Correct. ext2_free_blocks() always dirties the inode (unless there is
metadata inconsistency found in which case we don't really care).

> > - /* We let our caller release old_bh, so we
> > - * need to duplicate the buffer before. */
> > - get_bh(old_bh);
> > - bforget(old_bh);
> > - } else {
> > - /* Decrement the refcount only. */
> > - le32_add_cpu(&HDR(old_bh)->h_refcount, -1);
> > - dquot_free_block_nodirty(inode, 1);
> > - mark_inode_dirty(inode);
>
> Quick qn -> Don't we need mark_inode_dirty() here?

Notice that I've changed dquot_free_block_nodirty() to dquot_free_block()
because quota info update is the only reason why we need to dirty the inode
so why not let quota code handle it...
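
Put concretely (just shorthand for the change, not a quote of the quota code;
that dquot_free_block() dirties the inode itself is taken from the explanation
above):

	/* before: free the quota without dirtying, then dirty by hand */
	dquot_free_block_nodirty(inode, 1);
	mark_inode_dirty(inode);

	/* after: let the quota code dirty the inode for us */
	dquot_free_block(inode, 1);
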

>
> > - mark_buffer_dirty(old_bh);
> > - ea_bdebug(old_bh, "refcount now=%d",
> > - le32_to_cpu(HDR(old_bh)->h_refcount));
> > - }
> > - unlock_buffer(old_bh);
> > + ext2_xattr_release_block(inode, old_bh);
> > }

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-07-14 15:00:50

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH 02/10] mbcache: Add functions to delete entry if unused

On 22/07/14 04:49PM, Jan Kara wrote:
> On Thu 14-07-22 17:45:32, Ritesh Harjani wrote:
> > On 22/07/12 12:54PM, Jan Kara wrote:
> > > Add function mb_cache_entry_delete_or_get() to delete mbcache entry if
> > > it is unused and also add a function to wait for entry to become unused
> > > - mb_cache_entry_wait_unused(). We do not share code between the two
> > > deleting function as one of them will go away soon.
> > >
> > > CC: [email protected]
> > > Fixes: 82939d7999df ("ext4: convert to mbcache2")
> > > Signed-off-by: Jan Kara <[email protected]>
> > > ---
> > > fs/mbcache.c | 66 +++++++++++++++++++++++++++++++++++++++--
> > > include/linux/mbcache.h | 10 ++++++-
> > > 2 files changed, 73 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/mbcache.c b/fs/mbcache.c
> > > index cfc28129fb6f..2010bc80a3f2 100644
> > > --- a/fs/mbcache.c
> > > +++ b/fs/mbcache.c
> > > @@ -11,7 +11,7 @@
> > > /*
> > > * Mbcache is a simple key-value store. Keys need not be unique, however
> > > * key-value pairs are expected to be unique (we use this fact in
> > > - * mb_cache_entry_delete()).
> > > + * mb_cache_entry_delete_or_get()).
> > > *
> > > * Ext2 and ext4 use this cache for deduplication of extended attribute blocks.
> > > * Ext4 also uses it for deduplication of xattr values stored in inodes.
> > > @@ -125,6 +125,19 @@ void __mb_cache_entry_free(struct mb_cache_entry *entry)
> > > }
> > > EXPORT_SYMBOL(__mb_cache_entry_free);
> > >
> > > +/*
> > > + * mb_cache_entry_wait_unused - wait to be the last user of the entry
> > > + *
> > > + * @entry - entry to work on
> > > + *
> > > + * Wait to be the last user of the entry.
> > > + */
> > > +void mb_cache_entry_wait_unused(struct mb_cache_entry *entry)
> > > +{
> > > + wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3);
> >
> > It's not very intuitive why we check for refcnt <= 3.
> > A small note at the top of this function might be helpful.
> > IIUC, it is because by default when anyone creates an entry we start with
> > a refcnt of 2 (in mb_cache_entry_create).
> > - Now when the user of the entry wants to delete this, it will try and call
> > mb_cache_entry_delete_or_get(). If during this function call it sees that the
> > refcnt is elevated more than 2, that means there is another user of this entry
> > currently active and hence we should wait before we remove this entry from the
> > cache. So it will take an extra refcnt and return.
> > - So then this caller will call mb_cache_entry_wait_unused() for the refcnt to
> > be <= 3, so that the entry can be deleted.
>
> Correct. I will add a comment as you suggest.
>
> > Quick qn -
> > So now is the design like, ext4_evict_ea_inode() will be waiting indefinitely
> > until the other user of this mb_cache entry releases the reference right?
>
> Correct. Similarly for ext4_xattr_release_block().
>
> > And that will not happen until,
> > - either the shrinker removes this entry from the cache during which we are
> > checking if the refcnt <= 3, then we call a wakeup event
>
> No, shrinker will not touch these entries with active users anymore.
>
> > - Or the user removes/deletes the xattr entry
>
> No. We hold a reference to the mbcache entry only while we are trying to reuse
> it. So ext4_xattr_block_cache_find() and ext4_xattr_inode_cache_find() will
> look up a potential mbcache entry that may have the same contents and get a
> reference to it. Then we do comparisons verifying whether the contents really
> match; if yes, we increment the on-disk inode/block refcount. Then we drop the
> mbcache entry reference, which unblocks waiters in mb_cache_entry_wait_unused().
>

ohk, yes. This is where I was a bit confused.
Thanks for explaining it. This makes more sense. I did go through the mbcache
implementation, but I was missing the info on how the callers are using it.

-ritesh

> Honza
>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2022-07-14 15:07:59

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 10/10] mbcache: Automatically delete entries from cache on freeing

On Thu 14-07-22 18:39:51, Ritesh Harjani wrote:
> On 22/07/12 12:54PM, Jan Kara wrote:
> > Use the fact that entries with elevated refcount are not removed from
>
> The elevated refcnt means >= 2?

Well, it means the refcount when there is a real user of the mbcache entry,
so 3 before this patch, 2 after this patch...

> > the hash and just move removal of the entry from the hash to the entry
> > freeing time. When doing this we also change the generic code to hold
> > one reference to the cache entry, not two of them, which makes code
> > somewhat more obvious.
> >
> > Signed-off-by: Jan Kara <[email protected]>
> > ---
> > fs/mbcache.c | 108 +++++++++++++++-------------------------
> > include/linux/mbcache.h | 24 ++++++---
> > 2 files changed, 55 insertions(+), 77 deletions(-)
> >
> > diff --git a/fs/mbcache.c b/fs/mbcache.c
> > index d1ebb5df2856..96f1d49d30a5 100644
> > --- a/fs/mbcache.c
> > +++ b/fs/mbcache.c
> > @@ -90,7 +90,7 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
> > return -ENOMEM;
> >
> > INIT_LIST_HEAD(&entry->e_list);
> > - /* One ref for hash, one ref returned */
> > + /* Initial hash reference */
> > atomic_set(&entry->e_refcnt, 1);
> > entry->e_key = key;
> > entry->e_value = value;
> > @@ -106,21 +106,28 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
> > }
> > }
> > hlist_bl_add_head(&entry->e_hash_list, head);
> > - hlist_bl_unlock(head);
> > -
> > + /*
> > + * Add entry to LRU list before it can be found by
> > + * mb_cache_entry_delete() to avoid races
> > + */
>
> No reference to mb_cache_entry_delete() now. It is
> mb_cache_entry_delete_or_get()

Thanks, will fix.

> > spin_lock(&cache->c_list_lock);
> > list_add_tail(&entry->e_list, &cache->c_list);
> > - /* Grab ref for LRU list */
> > - atomic_inc(&entry->e_refcnt);
> > cache->c_entry_count++;
> > spin_unlock(&cache->c_list_lock);
> > + hlist_bl_unlock(head);
> >
> > return 0;
> > }
> > EXPORT_SYMBOL(mb_cache_entry_create);
> >
> > -void __mb_cache_entry_free(struct mb_cache_entry *entry)
> > +void __mb_cache_entry_free(struct mb_cache *cache, struct mb_cache_entry *entry)
> > {
> > + struct hlist_bl_head *head;
> > +
> > + head = mb_cache_entry_head(cache, entry->e_key);
> > + hlist_bl_lock(head);
> > + hlist_bl_del(&entry->e_hash_list);
> > + hlist_bl_unlock(head);
> > kmem_cache_free(mb_entry_cache, entry);
> > }
> > EXPORT_SYMBOL(__mb_cache_entry_free);
> > @@ -134,7 +141,7 @@ EXPORT_SYMBOL(__mb_cache_entry_free);
> > */
> > void mb_cache_entry_wait_unused(struct mb_cache_entry *entry)
> > {
> > - wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3);
> > + wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 2);
> > }
> > EXPORT_SYMBOL(mb_cache_entry_wait_unused);
> >
> > @@ -155,10 +162,9 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache,
> > while (node) {
> > entry = hlist_bl_entry(node, struct mb_cache_entry,
> > e_hash_list);
> > - if (entry->e_key == key && entry->e_reusable) {
> > - atomic_inc(&entry->e_refcnt);
> > + if (entry->e_key == key && entry->e_reusable &&
> > + atomic_inc_not_zero(&entry->e_refcnt))
> > goto out;
> > - }
> > node = node->next;
> > }
> > entry = NULL;
> > @@ -218,10 +224,9 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
> > head = mb_cache_entry_head(cache, key);
> > hlist_bl_lock(head);
> > hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
> > - if (entry->e_key == key && entry->e_value == value) {
> > - atomic_inc(&entry->e_refcnt);
> > + if (entry->e_key == key && entry->e_value == value &&
> > + atomic_inc_not_zero(&entry->e_refcnt))
> > goto out;
> > - }
> > }
> > entry = NULL;
> > out:
> > @@ -244,37 +249,25 @@ EXPORT_SYMBOL(mb_cache_entry_get);
> > struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
> > u32 key, u64 value)
> > {
> > - struct hlist_bl_node *node;
> > - struct hlist_bl_head *head;
> > struct mb_cache_entry *entry;
> >
> > - head = mb_cache_entry_head(cache, key);
> > - hlist_bl_lock(head);
> > - hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
> > - if (entry->e_key == key && entry->e_value == value) {
> > - if (atomic_read(&entry->e_refcnt) > 2) {
> > - atomic_inc(&entry->e_refcnt);
> > - hlist_bl_unlock(head);
> > - return entry;
> > - }
> > - /* We keep hash list reference to keep entry alive */
> > - hlist_bl_del_init(&entry->e_hash_list);
> > - hlist_bl_unlock(head);
> > - spin_lock(&cache->c_list_lock);
> > - if (!list_empty(&entry->e_list)) {
> > - list_del_init(&entry->e_list);
> > - if (!WARN_ONCE(cache->c_entry_count == 0,
> > - "mbcache: attempt to decrement c_entry_count past zero"))
> > - cache->c_entry_count--;
> > - atomic_dec(&entry->e_refcnt);
> > - }
> > - spin_unlock(&cache->c_list_lock);
> > - mb_cache_entry_put(cache, entry);
> > - return NULL;
> > - }
> > - }
> > - hlist_bl_unlock(head);
> > + entry = mb_cache_entry_get(cache, key, value);
> > + if (!entry)
> > + return NULL;
> > +
> > + /*
> > + * Drop the ref we got from mb_cache_entry_get() and the initial hash
> > + * ref if we are the last user
> > + */
> > + if (atomic_cmpxchg(&entry->e_refcnt, 2, 0) != 2)
> > + return entry;
> >
> > + spin_lock(&cache->c_list_lock);
> > + if (!list_empty(&entry->e_list))
> > + list_del_init(&entry->e_list);
> > + cache->c_entry_count--;
> > + spin_unlock(&cache->c_list_lock);
> > + __mb_cache_entry_free(cache, entry);
> > return NULL;
> > }
> > EXPORT_SYMBOL(mb_cache_entry_delete_or_get);
> > @@ -306,42 +299,24 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> > unsigned long nr_to_scan)
> > {
> > struct mb_cache_entry *entry;
> > - struct hlist_bl_head *head;
> > unsigned long shrunk = 0;
> >
> > spin_lock(&cache->c_list_lock);
> > while (nr_to_scan-- && !list_empty(&cache->c_list)) {
> > entry = list_first_entry(&cache->c_list,
> > struct mb_cache_entry, e_list);
> > - if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
> > + /* Drop initial hash reference if there is no user */
> > + if (entry->e_referenced ||
> > + atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) {
>
> So here, if the refcnt of an entry is 1, that means it is still in use, right?

No. One reference is held by the hashtable/LRU itself. So 1 means entry is
free.

> So the shrinker will do the atomic_cmpxchg and bring it to 0, and then
> delete the entry from the cache?
> This will only happen for an entry with just 1 reference count.
>
> Is that the correct understanding?

Correct. Basically the atomic 1 -> 0 transition makes sure we are not
racing with anybody else doing the 1 -> 2 transition. And once reference
gets to 0, we make sure no new references can be created.
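
If you want to play with the two transitions outside the kernel, they can be
modelled with C11 atomics roughly as below. This is a simplified userspace
sketch, not the mbcache code; the struct and function names are made up:

	#include <stdatomic.h>
	#include <stdbool.h>

	struct entry {
		atomic_int refcnt;  /* 1 == only the cache's own reference */
	};

	/* lookup side: pin the entry only while it is still alive,
	 * analogous to atomic_inc_not_zero() in the patch */
	static bool entry_get(struct entry *e)
	{
		int old = atomic_load(&e->refcnt);

		while (old > 0) {
			/* on failure the CAS reloads 'old' and we retry */
			if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
				return true;  /* we now hold a reference */
		}
		return false;  /* refcnt already hit 0, entry is being freed */
	}

	/* shrinker side: claim the entry for freeing only if nobody else
	 * holds a reference, analogous to atomic_cmpxchg(&refcnt, 1, 0) */
	static bool entry_try_free(struct entry *e)
	{
		int expected = 1;

		return atomic_compare_exchange_strong(&e->refcnt, &expected, 0);
	}

Once either side wins, the other necessarily fails: entry_get() cannot succeed
after the 1 -> 0 transition, and entry_try_free() cannot succeed after a
1 -> 2 transition.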

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-07-14 16:23:41

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH 06/10] ext2: Factor out freeing of xattr block reference

On 22/07/14 04:55PM, Jan Kara wrote:
> On Thu 14-07-22 18:07:14, Ritesh Harjani wrote:
> > On 22/07/12 12:54PM, Jan Kara wrote:
> > > Freeing of xattr block reference is open-coded in two places. Factor it out
> > > into a separate function and use it.
> >
> > Looked into the refactoring logic. The patch looks good to me.
> > Small queries below -
> >
> > >
> > > Signed-off-by: Jan Kara <[email protected]>
> > > ---
> > > fs/ext2/xattr.c | 90 +++++++++++++++++++++----------------------------
> > > 1 file changed, 38 insertions(+), 52 deletions(-)
> > >
> > > diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
> > > index 841fa6d9d744..9885294993ef 100644
> > > --- a/fs/ext2/xattr.c
> > > +++ b/fs/ext2/xattr.c
> > > @@ -651,6 +651,42 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
> > > return error;
> > > }
> > >
> > > +static void ext2_xattr_release_block(struct inode *inode,
> > > + struct buffer_head *bh)
> > > +{
> > > + struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode);
> > > +
> > > + lock_buffer(bh);
> > > + if (HDR(bh)->h_refcount == cpu_to_le32(1)) {
> > > + __u32 hash = le32_to_cpu(HDR(bh)->h_hash);
> > > +
> > > + /*
> > > + * This must happen under buffer lock for
> > > + * ext2_xattr_set2() to reliably detect freed block
> > > + */
> > > + mb_cache_entry_delete(ea_block_cache, hash,
> > > + bh->b_blocknr);
> > > + /* Free the old block. */
> > > + ea_bdebug(bh, "freeing");
> > > + ext2_free_blocks(inode, bh->b_blocknr, 1);
> > > + /* We let our caller release bh, so we
> > > + * need to duplicate the buffer before. */
> > > + get_bh(bh);
> > > + bforget(bh);
> > > + unlock_buffer(bh);
> > > + } else {
> > > + /* Decrement the refcount only. */
> > > + le32_add_cpu(&HDR(bh)->h_refcount, -1);
> > > + dquot_free_block(inode, 1);
> > > + mark_buffer_dirty(bh);
> > > + unlock_buffer(bh);
> > > + ea_bdebug(bh, "refcount now=%d",
> > > + le32_to_cpu(HDR(bh)->h_refcount));
> > > + if (IS_SYNC(inode))
> > > + sync_dirty_buffer(bh);
> > > + }
> > > +}
> > > +
> > > /*
> > > * Second half of ext2_xattr_set(): Update the file system.
> > > */
> > > @@ -747,34 +783,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh,
> > > * If there was an old block and we are no longer using it,
> > > * release the old block.
> > > */
> > > - lock_buffer(old_bh);
> > > - if (HDR(old_bh)->h_refcount == cpu_to_le32(1)) {
> > > - __u32 hash = le32_to_cpu(HDR(old_bh)->h_hash);
> > > -
> > > - /*
> > > - * This must happen under buffer lock for
> > > - * ext2_xattr_set2() to reliably detect freed block
> > > - */
> > > - mb_cache_entry_delete(ea_block_cache, hash,
> > > - old_bh->b_blocknr);
> > > - /* Free the old block. */
> > > - ea_bdebug(old_bh, "freeing");
> > > - ext2_free_blocks(inode, old_bh->b_blocknr, 1);
> > > - mark_inode_dirty(inode);
> >
> > ^^^ this is not needed because ext2_free_blocks() will take care of it.
> > Hence you have dropped this in ext2_xattr_release_block()
>
> Correct. ext2_free_blocks() always dirties the inode (unless there is
> metadata inconsistency found in which case we don't really care).
>
> > > - /* We let our caller release old_bh, so we
> > > - * need to duplicate the buffer before. */
> > > - get_bh(old_bh);
> > > - bforget(old_bh);
> > > - } else {
> > > - /* Decrement the refcount only. */
> > > - le32_add_cpu(&HDR(old_bh)->h_refcount, -1);
> > > - dquot_free_block_nodirty(inode, 1);
> > > - mark_inode_dirty(inode);
> >
> > Quick qn -> Don't we need mark_inode_dirty() here?
>
> Notice that I've changed dquot_free_block_nodirty() to dquot_free_block()
> because quota info update is the only reason why we need to dirty the inode
> so why not let quota code handle it...
>

Ok, yes. Missed it. Thanks for pointing it out.

-ritesh

2022-07-22 14:01:14

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 01/10] mbcache: Don't reclaim used entries

On Tue, 12 Jul 2022 12:54:20 +0200, Jan Kara wrote:
> Do not reclaim entries that are currently used by somebody from a
> shrinker. Firstly, these entries are likely useful. Secondly, we will
> need to keep such entries to protect pending increment of xattr block
> refcount.
>

Applied, thanks! (Some slight adjustments were needed to resolve a
merge conflict.)

[01/10] mbcache: Don't reclaim used entries
commit: ee595bcf21a86af4cff673000e2728d61c7c0e7b
[02/10] mbcache: Add functions to delete entry if unused
commit: ad3923aa44185f5f65e17764fe5c30501c6dfd22
[03/10] ext4: Remove EA inode entry from mbcache on inode eviction
commit: 428dc374a6cb6c0cbbf6fe8984b667ef78dc7d75
[04/10] ext4: Unindent codeblock in ext4_xattr_block_set()
commit: d52086dcf26a6284b08b5544210a7475b4837d52
[05/10] ext4: Fix race when reusing xattr blocks
commit: 132991ed28822cfb4be41ac72195f00fc0baf3c8
[06/10] ext2: Factor out freeing of xattr block reference
commit: c30e78a5f165244985aa346bdd460d459094470e
[07/10] ext2: Unindent codeblock in ext2_xattr_set()
commit: 0e85fb030d13e427deca44a95aabb2475614f8d2
[08/10] ext2: Avoid deleting xattr block that is being reused
commit: 44ce98e77ab4583b17ff4f501c2076eec3b759d7
[09/10] mbcache: Remove mb_cache_entry_delete()
commit: c3671ffa0919f2d433576c99c4e211cd367afda0
[10/10] mbcache: Automatically delete entries from cache on freeing
commit: b51539a7d04fb7d05b28ab9387364ccde88b6b6d

Best regards,
--
Theodore Ts'o <[email protected]>

2022-07-25 15:28:05

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 05/10] ext4: Fix race when reusing xattr blocks

On Sat 16-07-22 11:00:46, Zhihao Cheng wrote:
> On 2022/7/12 18:54, Jan Kara wrote:
> Hi Jan, one little question confuses me:
> > When ext4_xattr_block_set() decides to remove xattr block the following
> > race can happen:
> >
> > CPU1 CPU2
> > ext4_xattr_block_set() ext4_xattr_release_block()
> > new_bh = ext4_xattr_block_cache_find()
> >
> > lock_buffer(bh);
> > ref = le32_to_cpu(BHDR(bh)->h_refcount);
> > if (ref == 1) {
> > ...
> > mb_cache_entry_delete();
> > unlock_buffer(bh);
> > ext4_free_blocks();
> > ...
> > ext4_forget(..., bh, ...);
> > jbd2_journal_revoke(..., bh);
> >
> > ext4_journal_get_write_access(..., new_bh, ...)
> > do_get_write_access()
> > jbd2_journal_cancel_revoke(..., new_bh);
> >
> > Later the code in ext4_xattr_block_set() finds out the block got freed
> > and cancels reusal of the block but the revoke stays canceled and so in
> > case of block reuse and journal replay the filesystem can get corrupted.
> > If the race works out slightly differently, we can also hit assertions
> > in the jbd2 code.
> >
> > Fix the problem by making sure that once matching mbcache entry is
> > found, code dropping the last xattr block reference (or trying to modify
> > xattr block in place) waits until the mbcache entry reference is
> > dropped. This way code trying to reuse xattr block is protected from
> > someone trying to drop the last reference to xattr block.
> >
> > Reported-and-tested-by: Ritesh Harjani <[email protected]>
> > CC: [email protected]
> > Fixes: 82939d7999df ("ext4: convert to mbcache2")
> > Signed-off-by: Jan Kara <[email protected]>

...

> > @@ -1991,18 +2020,13 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
> > lock_buffer(new_bh);
> > /*
> > * We have to be careful about races with
> > - * freeing, rehashing or adding references to
> > - * xattr block. Once we hold buffer lock xattr
> > - * block's state is stable so we can check
> > - * whether the block got freed / rehashed or
> > - * not. Since we unhash mbcache entry under
> > - * buffer lock when freeing / rehashing xattr
> > - * block, checking whether entry is still
> > - * hashed is reliable. Same rules hold for
> > - * e_reusable handling.
> > + * adding references to xattr block. Once we
> > + * hold buffer lock xattr block's state is
> > + * stable so we can check the additional
> > + * reference fits.
> > */
> > - if (hlist_bl_unhashed(&ce->e_hash_list) ||
> > - !ce->e_reusable) {
> > + ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
> > + if (ref > EXT4_XATTR_REFCOUNT_MAX) {
>
> So far, we have mb_cache_entry_delete_or_get() and
> mb_cache_entry_wait_unused(), so used cache entry cannot be concurrently
> removed. Removing check 'hlist_bl_unhashed(&ce->e_hash_list)' is okay.
>
> What's the effect of changing the other two checks, 'ref >=
> EXT4_XATTR_REFCOUNT_MAX' and '!ce->e_reusable'? Is it to make the code more
> obvious, e.g. to point out the condition 'ref <= EXT4_XATTR_REFCOUNT_MAX'
> rather than 'ce->e_reusable', since we have already checked 'ce->e_reusable'
> in ext4_xattr_block_cache_find() before?

Well, ce->e_reusable is set if and only if BHDR(new_bh)->h_refcount <
EXT4_XATTR_REFCOUNT_MAX. So checking whether the refcount is small enough
is all that is needed and we don't need the ce->e_reusable check here.
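
Stated as the invariant the new check relies on (informal shorthand, not code
from the patch):

	/* maintained by the ext4 callers for block cache entries */
	ce->e_reusable == (le32_to_cpu(BHDR(new_bh)->h_refcount) < EXT4_XATTR_REFCOUNT_MAX)

so once we verify under the buffer lock that one more reference still fits,
a separate e_reusable test adds nothing.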

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR