Hi,
this update brings a few minor performance improvements; otherwise
there's a lot of refactoring, cleanups and other changes that are not
user visible.
Please pull, thanks.
Performance improvements

- inline b-tree locking functions, improvement in metadata-heavy changes

- relax locking on a range that's being reflinked, allows read operations to
  run in parallel

- speed up NOCOW write checks (throughput +9% on a sample test)

- extent locking ranges have been reduced in several places, namely
  around delayed ref processing

Core

- more page to folio conversions
  - relocation
  - send
  - compression
  - inline extent handling
  - super block write and wait

- extent_map structure optimizations
  - reduced structure size
  - code simplifications
  - add shrinker for allocated objects, the numbers can go high and could
    exhaust memory on smaller systems (reported) as they may not get an
    opportunity to be freed fast enough

- extent locking optimizations
  - reduce locking ranges where it does not seem to be necessary and
    are safe due to other means of synchronization
  - potential improvements due to lower contention, allocation/freeing
    and state management operations of extent state tracking structures

- delayed ref cleanups and simplifications

- updated trace points

- improved error handling, warnings and assertions

- cleanups and refactoring, unification of error handling paths

----------------------------------------------------------------
The following changes since commit dccb07f2914cdab2ac3a5b6c98406f765acab803:
Merge tag 'for-6.9-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux (2024-05-06 13:43:13 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag
for you to fetch changes up to 0e39c9e524479b85c1b83134df0cfc6e3cb5353a:
btrfs: qgroup: fix initialization of auto inherit array (2024-05-07 21:31:11 +0200)
----------------------------------------------------------------
Anand Jain (20):
btrfs: rename err to ret in btrfs_initxattrs()
btrfs: rename err to ret in btrfs_rmdir()
btrfs: rename err to ret in btrfs_cont_expand()
btrfs: rename err to ret in btrfs_ioctl_snap_destroy()
btrfs: rename err to ret in __set_extent_bit()
btrfs: rename err to ret in convert_extent_bit()
btrfs: rename err to ret in __btrfs_end_transaction()
btrfs: rename err to ret in create_reloc_inode()
btrfs: rename err to ret in btrfs_dirty_pages()
btrfs: rename err to ret in prepare_pages()
btrfs: rename err to ret in btrfs_direct_write()
btrfs: report filemap_fdata<write|wait>_range() error
btrfs: rename werr and err to ret in btrfs_write_marked_extents()
btrfs: rename werr and err to ret in __btrfs_wait_marked_extents()
btrfs: rename err and ret to ret in build_backref_tree()
btrfs: reuse ret instead of err in relocate_tree_blocks()
btrfs: drop variable err in quick_update_accounting()
btrfs: rename return variables in btrfs_qgroup_rescan_worker()
btrfs: simplify return variables in lookup_extent_data_ref()
btrfs: simplify return variables in btrfs_drop_subtree()
Boris Burkov (1):
btrfs: free PERTRANS at the end of cleanup_transaction()
Dan Carpenter (2):
btrfs: qgroup: delete unnecessary check in btrfs_qgroup_check_inherit()
btrfs: qgroup: fix initialization of auto inherit array
David Sterba (1):
btrfs: use btrfs_is_testing() everywhere
Filipe Manana (39):
btrfs: remove pointless BUG_ON() when creating snapshot
btrfs: locking: inline btrfs_tree_lock() and btrfs_tree_read_lock()
btrfs: locking: rename __btrfs_tree_lock() and __btrfs_tree_read_lock()
btrfs: remove pointless readahead callback wrapper
btrfs: remove pointless writepages callback wrapper
btrfs: avoid pointless wake ups of drew lock readers
btrfs: stop locking the source extent range during reflink
btrfs: remove not needed mod_start and mod_len from struct extent_map
btrfs: remove pointless return value assignment at btrfs_finish_one_ordered()
btrfs: remove list_empty() check at warn_about_uncommitted_trans()
btrfs: remove no longer used btrfs_clone_chunk_map()
btrfs: move btrfs_page_mkwrite() from inode.c into file.c
btrfs: add function comment to btrfs_lookup_csums_list()
btrfs: remove search_commit parameter from btrfs_lookup_csums_list()
btrfs: remove use of a temporary list at btrfs_lookup_csums_list()
btrfs: simplify error path for btrfs_lookup_csums_list()
btrfs: make NOCOW checks for existence of checksums in a range more efficient
btrfs: open code csum_exist_in_range()
btrfs: pass an inode to btrfs_add_extent_mapping()
btrfs: tests: error out on unexpected extent map reference count
btrfs: simplify add_extent_mapping() by removing pointless label
btrfs: export find_next_inode() as btrfs_find_first_inode()
btrfs: use btrfs_find_first_inode() at btrfs_prune_dentries()
btrfs: pass the extent map tree's inode to add_extent_mapping()
btrfs: pass the extent map tree's inode to clear_em_logging()
btrfs: pass the extent map tree's inode to remove_extent_mapping()
btrfs: pass the extent map tree's inode to replace_extent_mapping()
btrfs: pass the extent map tree's inode to setup_extent_mapping()
btrfs: pass the extent map tree's inode to try_merge_map()
btrfs: add a global per cpu counter to track number of used extent maps
btrfs: add a shrinker for extent maps
btrfs: update comment for btrfs_set_inode_full_sync() about locking
btrfs: add tracepoints for extent map shrinker events
btrfs: rename some variables at try_release_extent_mapping()
btrfs: use btrfs_get_fs_generation() at try_release_extent_mapping()
btrfs: remove i_size restriction at try_release_extent_mapping()
btrfs: be better releasing extent maps at try_release_extent_mapping()
btrfs: make try_release_extent_mapping() return a bool
btrfs: initialize delayed inodes xarray without GFP_ATOMIC
Goldwyn Rodrigues (3):
btrfs: page to folio conversion: prealloc_file_extent_cluster()
btrfs: convert relocate_one_page() to folios and rename
btrfs: convert put_file_data() to folios
Josef Bacik (38):
btrfs: add a helper to get the delayed ref node from the data/tree ref
btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node
btrfs: do not use a function to initialize btrfs_ref
btrfs: move ref_root into btrfs_ref
btrfs: pass btrfs_ref to init_delayed_ref_common
btrfs: initialize btrfs_delayed_ref_head with btrfs_ref
btrfs: move ref specific initialization into init_delayed_ref_common
btrfs: simplify delayed ref tracepoints
btrfs: unify the btrfs_add_delayed_*_ref helpers into one helper
btrfs: rename ->len to ->num_bytes in btrfs_ref
btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node
btrfs: rename btrfs_data_ref->ino to ->objectid
btrfs: make __btrfs_inc_extent_ref take a btrfs_delayed_ref_node
btrfs: drop unnecessary arguments from __btrfs_free_extent
btrfs: make the insert backref helpers take a btrfs_delayed_ref_node
btrfs: stop referencing btrfs_delayed_data_ref directly
btrfs: stop referencing btrfs_delayed_tree_ref directly
btrfs: remove the btrfs_delayed_ref_node container helpers
btrfs: replace btrfs_delayed_*_ref with btrfs_*_ref
btrfs: set start on clone before calling copy_extent_buffer_full
btrfs: change root->root_key.objectid to btrfs_root_id()
btrfs: handle errors in btrfs_reloc_clone_csums properly
btrfs: push all inline logic into cow_file_range
btrfs: unlock all the pages with successful inline extent creation
btrfs: move extent bit and page cleanup into cow_file_range_inline
btrfs: lock extent when doing inline extent in compression
btrfs: push the extent lock into btrfs_run_delalloc_range
btrfs: push extent lock into run_delalloc_nocow
btrfs: adjust while loop condition in run_delalloc_nocow
btrfs: push extent lock down in run_delalloc_nocow
btrfs: remove unlock_extent from run_delalloc_compressed
btrfs: push extent lock into run_delalloc_cow
btrfs: push extent lock into cow_file_range
btrfs: push lock_extent into cow_file_range_inline
btrfs: move can_cow_file_range_inline() outside of the extent lock
btrfs: push lock_extent down in cow_file_range()
btrfs: push extent lock down in submit_one_async_extent
btrfs: add a cached state to extent_clear_unlock_delalloc
Matthew Wilcox (Oracle) (5):
bio: Export bio_add_folio_nofail to modules
btrfs: convert super block writes to folio in wait_dev_supers()
btrfs: convert super block writes to folio in write_dev_supers()
btrfs: use the folio iterator in btrfs_end_super_write()
btrfs: count super block write errors in device instead of tracking folio error state
Naohiro Aota (1):
btrfs: drop unused argument of calcu_metadata_size()
Qu Wenruo (9):
btrfs: compression: add error handling for missed page cache
btrfs: compression: convert page allocation to folio interfaces
btrfs: make insert_inline_extent() accept one page directly
btrfs: migrate insert_inline_extent() to folio interfaces
btrfs: introduce btrfs_alloc_folio_array()
btrfs: compression: migrate compression/decompression paths to folios
btrfs: add extra comments on extent_map members
btrfs: simplify the inline extent map creation
btrfs: add extra sanity checks for create_io_em()
Tavian Barnes (2):
btrfs: add helper to clear EXTENT_BUFFER_READING
btrfs: warn if EXTENT_BUFFER_UPTODATE is set while reading
Thorsten Blum (1):
btrfs: remove duplicate included header from fs.h
block/bio.c | 1 +
fs/btrfs/backref.c | 48 +-
fs/btrfs/block-rsv.c | 11 +-
fs/btrfs/btrfs_inode.h | 10 +-
fs/btrfs/compression.c | 119 +++--
fs/btrfs/compression.h | 42 +-
fs/btrfs/ctree.c | 51 +--
fs/btrfs/defrag.c | 2 +-
fs/btrfs/delayed-inode.c | 2 +-
fs/btrfs/delayed-ref.c | 365 +++++----------
fs/btrfs/delayed-ref.h | 148 +++---
fs/btrfs/disk-io.c | 157 +++----
fs/btrfs/export.c | 8 +-
fs/btrfs/extent-io-tree.c | 58 +--
fs/btrfs/extent-tree.c | 366 +++++++--------
fs/btrfs/extent_io.c | 223 +++++----
fs/btrfs/extent_io.h | 11 +-
fs/btrfs/extent_map.c | 316 ++++++++++---
fs/btrfs/extent_map.h | 67 ++-
fs/btrfs/file-item.c | 90 ++--
fs/btrfs/file-item.h | 3 +-
fs/btrfs/file.c | 327 ++++++++++----
fs/btrfs/fs.h | 5 +-
fs/btrfs/inode-item.c | 16 +-
fs/btrfs/inode.c | 923 +++++++++++++++++---------------------
fs/btrfs/ioctl.c | 86 ++--
fs/btrfs/locking.c | 26 +-
fs/btrfs/locking.h | 18 +-
fs/btrfs/lzo.c | 89 ++--
fs/btrfs/ordered-data.c | 8 +-
fs/btrfs/ordered-data.h | 1 +
fs/btrfs/props.c | 2 +-
fs/btrfs/qgroup.c | 79 ++--
fs/btrfs/ref-verify.c | 8 +-
fs/btrfs/reflink.c | 56 +--
fs/btrfs/relocation.c | 417 ++++++++---------
fs/btrfs/root-tree.c | 3 +-
fs/btrfs/send.c | 74 +--
fs/btrfs/super.c | 33 +-
fs/btrfs/sysfs.c | 8 +-
fs/btrfs/tests/btrfs-tests.c | 3 +-
fs/btrfs/tests/extent-map-tests.c | 216 +++++----
fs/btrfs/transaction.c | 76 ++--
fs/btrfs/tree-checker.c | 2 +-
fs/btrfs/tree-log.c | 46 +-
fs/btrfs/tree-mod-log.c | 2 +-
fs/btrfs/volumes.c | 15 -
fs/btrfs/volumes.h | 10 +-
fs/btrfs/xattr.c | 10 +-
fs/btrfs/zlib.c | 112 ++---
fs/btrfs/zstd.c | 80 ++--
include/trace/events/btrfs.h | 158 +++++--
52 files changed, 2650 insertions(+), 2357 deletions(-)
The pull request you sent on Mon, 13 May 2024 18:20:55 +0200:
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a3d1f54d7aa4c3be2c6a10768d4ffa1dcb620da9
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
On Mon, 13 May 2024 at 09:28, David Sterba <[email protected]> wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag
So I initially blamed a GPU driver for the following problem, but Dave
Airlie seems to think it's unlikely that problem would cause this kind
of corruption, so now it looks like it might just be btrfs itself:
BUG: Bad page state in process kworker/u261:13 pfn:31fb9a
page: refcount:0 mapcount:0 mapping:00000000ff0b239e index:0x37ce8
pfn:0x31fb9a
aops:btree_aops ino:1
flags: 0x2fffc600000020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x3fff)
page_type: 0xffffffff()
raw: 02fffc600000020c dead000000000100 dead000000000122 ffff9b191efb0338
raw: 0000000000037ce8 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: non-NULL mapping
CPU: 18 PID: 141351 Comm: kworker/u261:13 Tainted: G W
6.9.0-07381-g3860ca371740 #60
Workqueue: btrfs-delayed-meta btrfs_work_helper
Call Trace:
bad_page+0xe0/0xf0
free_unref_page_prepare+0x363/0x380
? __count_memcg_events+0x63/0xd0
free_unref_page+0x33/0x1f0
? __mem_cgroup_uncharge+0x80/0xb0
__folio_put+0x62/0x80
release_extent_buffer+0xad/0x110
btrfs_force_cow_block+0x68f/0x890
btrfs_cow_block+0xe5/0x240
btrfs_search_slot+0x30e/0x9f0
btrfs_lookup_inode+0x31/0xb0
__btrfs_update_delayed_inode+0x5c/0x350
? kfree+0x80/0x250
__btrfs_commit_inode_delayed_items+0x7a1/0x7d0
btrfs_async_run_delayed_root+0xf7/0x1b0
btrfs_work_helper+0xc0/0x320
process_scheduled_works+0x196/0x360
worker_thread+0x2b8/0x370
? pr_cont_work+0x190/0x190
kthread+0x111/0x120
? kthread_blkcg+0x30/0x30
ret_from_fork+0x30/0x40
? kthread_blkcg+0x30/0x30
ret_from_fork_asm+0x11/0x20
Note the line
page dumped because: non-NULL mapping
but the actual mapping pointer isn't a valid kernel pointer. I suspect
that may be due to pointer hashing, though. I'm not convinced that's a
great idea for this case, but hey, here we are. Sometimes those "don't
leak kernel pointers" things cause problems for debugging.
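
For readers following the pointer-hashing remark: the "mapping:" value in the summary line is printed through the kernel's hashed %p formatting, while the "raw:" lines dump the words of struct page as they are in memory. The sketch below is only an illustration under an assumption that is not stated in the thread: that on this 64-bit kernel the fourth word of struct page is the mapping pointer and the fifth is the index. The helper names are made up for the example, and the pointer check is a rough heuristic.

```python
# The two "raw:" lines copied from the report above (words of struct page).
RAW_LINES = [
    "raw: 02fffc600000020c dead000000000100 dead000000000122 ffff9b191efb0338",
    "raw: 0000000000037ce8 0000000000000000 00000000ffffffff 0000000000000000",
]

# ASSUMPTION: word positions within struct page on a 64-bit kernel of this era.
MAPPING_WORD = 3
INDEX_WORD = 4


def parse_raw(lines):
    """Flatten the 'raw:' lines into a list of 64-bit integers."""
    words = []
    for line in lines:
        # Skip the leading "raw:" token, parse the rest as hex words.
        words.extend(int(tok, 16) for tok in line.split()[1:])
    return words


def looks_like_kernel_pointer(value):
    """Heuristic only: x86-64 kernel addresses sit in the upper canonical half."""
    return value >= 0xFFFF_8000_0000_0000


words = parse_raw(RAW_LINES)
mapping = words[MAPPING_WORD]
hashed = 0x00000000FF0B239E  # the "mapping:" value from the summary line

print(f"raw mapping   : {mapping:#018x} kernel-like={looks_like_kernel_pointer(mapping)}")
print(f"hashed mapping: {hashed:#018x} kernel-like={looks_like_kernel_pointer(hashed)}")
print(f"index         : {words[INDEX_WORD]:#x}")
```

The fifth raw word decodes to the same value as "index:0x37ce8" in the summary line, which is at least consistent with the assumed word order.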
Anyway, it looks like the btrfs_cow_block -> btrfs_force_cow_block ->
release_extent_buffer -> __folio_put path might be releasing a page
that is still attached to a mapping. Perhaps some page counting
imbalance?
This all happened under fairly normal - for me - workstation loads. I
was (of course) doing an allmodconfig kernel build after a pull, and I
had a handful of terminals and the web browser open. Nothing
particularly interesting or odd.
Does the above make any btrfs people go "Ahh, I see how that would be
a problem"?
Linus
On 2024/5/16 10:01, Linus Torvalds wrote:
> On Mon, 13 May 2024 at 09:28, David Sterba <[email protected]> wrote:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag
>
> So I initially blamed a GPU driver for the following problem, but Dave
> Airlie seems to think it's unlikely that problem would cause this kind
> of corruption, so now it looks like it might just be btrfs itself:
>
> BUG: Bad page state in process kworker/u261:13 pfn:31fb9a
> page: refcount:0 mapcount:0 mapping:00000000ff0b239e index:0x37ce8
> pfn:0x31fb9a
> aops:btree_aops ino:1
> flags: 0x2fffc600000020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x3fff)
> page_type: 0xffffffff()
> raw: 02fffc600000020c dead000000000100 dead000000000122 ffff9b191efb0338
> raw: 0000000000037ce8 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: non-NULL mapping
> CPU: 18 PID: 141351 Comm: kworker/u261:13 Tainted: G W
> 6.9.0-07381-g3860ca371740 #60
> Workqueue: btrfs-delayed-meta btrfs_work_helper
> Call Trace:
> bad_page+0xe0/0xf0
> free_unref_page_prepare+0x363/0x380
> ? __count_memcg_events+0x63/0xd0
> free_unref_page+0x33/0x1f0
> ? __mem_cgroup_uncharge+0x80/0xb0
> __folio_put+0x62/0x80
> release_extent_buffer+0xad/0x110
> btrfs_force_cow_block+0x68f/0x890
> btrfs_cow_block+0xe5/0x240
> btrfs_search_slot+0x30e/0x9f0
> btrfs_lookup_inode+0x31/0xb0
> __btrfs_update_delayed_inode+0x5c/0x350
> ? kfree+0x80/0x250
> __btrfs_commit_inode_delayed_items+0x7a1/0x7d0
> btrfs_async_run_delayed_root+0xf7/0x1b0
> btrfs_work_helper+0xc0/0x320
> process_scheduled_works+0x196/0x360
> worker_thread+0x2b8/0x370
> ? pr_cont_work+0x190/0x190
> kthread+0x111/0x120
> ? kthread_blkcg+0x30/0x30
> ret_from_fork+0x30/0x40
> ? kthread_blkcg+0x30/0x30
> ret_from_fork_asm+0x11/0x20
>
> Note the line
>
> page dumped because: non-NULL mapping
>
> but the actual mapping pointer isn't a valid kernel pointer. I suspect
> that may be due to pointer hashing, though. I'm not convinced that's a
> great idea for this case, but hey, here we are. Sometimes those "don't
> leak kernel pointers" things cause problems for debugging.
>
> Anyway, it looks like the btrfs_cow_block -> btrfs_force_cow_block ->
> release_extent_buffer -> __folio_put path might be releasing a page
> that is still attached to a mapping. Perhaps some page counting
> imbalance?
>
> This all happened under fairly normal - for me - workstation loads. I
> was (of course) doing an allmodconfig kernel build after a pull, and I
> had a handful of terminals and the web browser open. Nothing
> particularly interesting or odd.
Considering aarch64 is becoming more and more common, is the workstation
also an aarch64 platform? (the Ampere one?)
If so, would you mind sharing the page size and the fs sectorsize?
That would at least help us to know if it's the subpage routine or the
regular routine.
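
As background to the question: btrfs uses its subpage handling when the filesystem sector size is smaller than the CPU page size (for example 4K sectors on a kernel built with 64K pages). A rough sketch of that comparison follows. It assumes the sector size can be read from /sys/fs/btrfs/<UUID>/sectorsize, and the helper names are illustrative, not part of any btrfs tooling.

```python
import os
from pathlib import Path


def uses_subpage(page_size, sectorsize):
    """Subpage handling applies when a sector is smaller than a page."""
    return sectorsize < page_size


def read_sectorsize(fs_uuid, sysfs_root="/sys/fs/btrfs"):
    """ASSUMPTION: sectorsize is exposed in sysfs under the filesystem UUID."""
    return int(Path(sysfs_root, fs_uuid, "sectorsize").read_text().strip())


# Page size of the running system, as reported by the C library.
page_size = os.sysconf("SC_PAGE_SIZE")
print(f"page size: {page_size}")

# Two fixed examples, independent of the machine running the sketch.
print("4K sectors, 64K pages ->", "subpage" if uses_subpage(65536, 4096) else "regular")
print("4K sectors, 4K pages  ->", "subpage" if uses_subpage(4096, 4096) else "regular")
```

On a mounted filesystem, `uses_subpage(page_size, read_sectorsize(uuid))` would give the answer for that system, with `uuid` taken from the directory names under /sys/fs/btrfs.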
Thanks,
Qu
>
> Does the above make any btrfs people go "Ahh, I see how that would be
> a problem"?
>
> Linus
>
On Thu, 16 May 2024 at 02:02, Qu Wenruo <[email protected]> wrote:
>
> Considering aarch64 is becoming more and more common, is the workstation
> also an aarch64 platform? (the Ampere one?)
No, this happened on my regular old AMD Threadripper.
Linus