Hi Linus, few patches for you - plus a simple merge conflict with VFS
changes:
Cheers,
Kent
diff --cc fs/bcachefs/super-io.c
index 010daebf987b,bd64eb68e84a..000000000000
--- a/fs/bcachefs/super-io.c
+++ b/fs/bcachefs/super-io.c
@@@ -723,12 -715,11 +723,12 @@@ retry
 			opt_set(*opts, nochanges, true);
 	}
-	if (IS_ERR(sb->bdev_handle)) {
-		ret = PTR_ERR(sb->bdev_handle);
+	if (IS_ERR(sb->s_bdev_file)) {
+		ret = PTR_ERR(sb->s_bdev_file);
+		prt_printf(&err, "error opening %s: %s", path, bch2_err_str(ret));
 		goto err;
 	}
-	sb->bdev = sb->bdev_handle->bdev;
+	sb->bdev = file_bdev(sb->s_bdev_file);
 	ret = bch2_sb_realloc(sb, 0);
 	if (ret) {
The following changes since commit d206a76d7d2726f3b096037f2079ce0bd3ba329b:
Linux 6.8-rc6 (2024-02-25 15:46:06 -0800)
are available in the Git repository at:
https://evilpiepirate.org/git/bcachefs.git tags/bcachefs-20240312
for you to fetch changes up to 243c934566b7b0f9103201e259f5373ba38126c6:
bcachefs: reconstruct_alloc cleanup (2024-03-12 02:19:54 -0400)
----------------------------------------------------------------
bcachefs updates for 6.9
- Subvolume children btree; this is needed for providing a userspace
interface for walking subvolumes, which will come later
- Lots of improvements to directory structure checking
- Improved journal pipelining, significantly improving performance on
high iodepth write workloads
- Discard path improvements: the discard path is more efficient, and no
longer flushes the journal unnecessarily
- Buffered write path can now avoid taking the inode lock
- Pull out various library code for use in XFS: time stats,
mean_and_variance, darray, eytzinger, thread_with_file
- new mm helper: memalloc_flags_{save,restore}
- mempool now does kvmalloc mempools
----------------------------------------------------------------
Brian Foster (1):
bcachefs: fix lost journal buf wakeup due to improved pipelining
Calvin Owens (1):
bcachefs: Silence gcc warnings about arm arch ABI drift
Colin Ian King (1):
bcachefs: remove redundant assignment to variable ret
Daniel Hill (1):
bcachefs: rebalance_status now shows correct units
Darrick J. Wong (13):
time_stats: report lifetime of the stats object
time_stats: split stats-with-quantiles into a separate structure
time_stats: fix struct layout bloat
time_stats: add larger units
time_stats: don't print any output if event count is zero
time_stats: allow custom epoch names
mean_and_variance: put struct mean_and_variance_weighted on a diet
time_stats: shrink time_stat_buffer for better alignment
time_stats: report information in json format
thread_with_file: allow creation of readonly files
thread_with_file: fix various printf problems
thread_with_file: create ops structure for thread_with_stdio
thread_with_file: allow ioctls against these files
Erick Archer (1):
bcachefs: Prefer struct_size over open coded arithmetic
Guoyu Ou (1):
bcachefs: skip invisible entries in empty subvolume checking
Hongbo Li (3):
bcachefs: fix the error code when mounting with incorrect options.
bcachefs: avoid returning private error code in bch2_xattr_bcachefs_set
bcachefs: intercept mountoption value for bool type
Kent Overstreet (116):
bcachefs: journal_seq_blacklist_add() now handles entries being added out of order
bcachefs: extent_entry_next_safe()
bcachefs: no_splitbrain_check option
bcachefs: fix check_inode_deleted_list()
bcachefs: Fix journal replay with unreadable btree roots
bcachefs: Fix degraded mode fsck
bcachefs: Correctly validate k->u64s in btree node read path
bcachefs: Set path->uptodate when no node at level
bcachefs: fix split brain message
bcachefs: Kill unnecessary wakeups in journal reclaim
bcachefs: Split out journal workqueue
bcachefs: Avoid setting j->write_work unnecessarily
bcachefs: Journal writes should be REQ_SYNC|REQ_META
bcachefs: Avoid taking journal lock unnecessarily
bcachefs: fixup for building in userspace
bcachefs: Improve bch2_dirent_to_text()
bcachefs: Workqueues should be WQ_HIGHPRI
bcachefs: bch2_hash_set_snapshot() -> bch2_hash_set_in_snapshot()
bcachefs: Cleanup bch2_dirent_lookup_trans()
bcachefs: convert journal replay ptrs to darray
bcachefs: improve journal entry read fsck error messages
bcachefs: jset_entry_datetime
bcachefs: bio per journal buf
bcachefs: closure per journal buf
bcachefs: better journal pipelining
bcachefs: btree_and_journal_iter.trans
bcachefs: btree node prefetching in check_topology
bcachefs: Subvolumes may now be renamed
bcachefs: Switch to uuid_to_fsid()
bcachefs: Initialize super_block->s_uuid
bcachefs: move fsck_write_inode() to inode.c
bcachefs: bump max_active on btree_interior_update_worker
bcachefs: Kill some -EINVALs
bcachefs: Factor out check_subvol_dirent()
bcachefs: factor out check_inode_backpointer()
mm: introduce memalloc_flags_{save,restore}
mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN
bcachefs: bch2_inode_insert()
bcachefs: bch2_lookup() gives better error message on inode not found
mean and variance: Promote to lib/math
eytzinger: Promote to include/linux/
bcachefs: bch2_time_stats_to_seq_buf()
time_stats: Promote to lib/
bcache: Convert to lib/time_stats
time_stats: Kill TIME_STATS_HAVE_QUANTILES
mempool: kvmalloc pool
bcachefs: kill kvpmalloc()
bcachefs: thread_with_stdio: eliminate double buffering
bcachefs: thread_with_stdio: convert to darray
bcachefs: thread_with_stdio: kill thread_with_stdio_done()
bcachefs: thread_with_stdio: fix bch2_stdio_redirect_readline()
bcachefs: Thread with file documentation
darray: lift from bcachefs
thread_with_file: Lift from bcachefs
thread_with_stdio: Mark completed in ->release()
kernel/hung_task.c: export sysctl_hung_task_timeout_secs
thread_with_stdio: suppress hung task warning
bcachefs: Kill more -EIO error codes
bcachefs: Check subvol <-> inode pointers in check_subvol()
bcachefs: Check subvol <-> inode pointers in check_inode()
bcachefs: check_inode_dirent_inode()
bcachefs: better log message in lookup_inode_for_snapshot()
bcachefs: check bi_parent_subvol in check_inode()
bcachefs: simplify check_dirent_inode_dirent()
bcachefs: delete duplicated checks in check_dirent_to_subvol()
bcachefs: check inode->bi_parent_subvol against dirent
bcachefs: check dirent->d_parent_subvol
bcachefs: Repair subvol dirents that point to non subvols
bcachefs: bch_subvolume::parent -> creation_parent
bcachefs: Fix path where dirent -> subvol missing and we don't fix
bcachefs: Pass inode bkey to check_path()
bcachefs: check_path() now prints full inode when reattaching
bcachefs: Correctly reattach subvolumes
bcachefs: bch2_btree_bit_mod -> bch2_btree_bit_mod_buffered
bcachefs: bch2_btree_bit_mod()
bcachefs: bch_subvolume::fs_path_parent
bcachefs: BTREE_ID_subvolume_children
bcachefs: Check for subvolume children when deleting subvolumes
bcachefs: Pin btree cache in ram for random access in fsck
bcachefs: Save key_cache_path in peek_slot()
bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter()
bcachefs: Use kvzalloc() when dynamically allocating btree paths
bcachefs: Improve error messages in device remove path
bcachefs: bch2_print_opts()
thread_with_file: Fix missing va_end()
bcachefs: bch2_trigger_alloc() handles state changes better
bcachefs: bch2_check_subvolume_structure()
bcachefs: check_path() now only needs to walk up to subvolume root
bcachefs: more informative write path error message
bcachefs: Drop redundant btree_path_downgrade()s
bcachefs: improve bch2_journal_buf_to_text()
bcachefs: Split out discard fastpath
bcachefs: Fix journal_buf bitfield accesses
bcachefs: Add journal.blocked to journal_debug_to_text()
thread_with_file: add f_ops.flush
bcachefs: Errcode tracepoint, documentation
bcachefs: jset_entry for loops declare loop iter
bcachefs: Rename journal_keys.d -> journal_keys.data
bcachefs: journal_keys now uses darray helpers
bcachefs: improve move_gap()
bcachefs: split out ignore_blacklisted, ignore_not_dirty
bcachefs: Fix bch2_journal_noflush_seq()
fs: file_remove_privs_flags()
bcachefs: Buffered write path now can avoid the inode lock
bcachefs: Split out bkey_types.h
bcachefs: copy_(to|from)_user_errcode()
lib/generic-radix-tree.c: Make nodes more reasonably sized
bcachefs: fix bch2_journal_buf_to_text()
bcachefs: Check for writing superblocks with nonsense member seq fields
bcachefs: Kill unused flags argument to btree_split()
bcachefs: fix deletion of indirect extents in btree_gc
bcachefs: Fix order of gc_done passes
bcachefs: Always flush write buffer in delete_dead_inodes()
bcachefs: Fix btree key cache coherency during replay
bcachefs: fix bch_folio_sector padding
bcachefs: reconstruct_alloc cleanup
Li Zetao (1):
bcachefs: Fix null-ptr-deref in bch2_fs_alloc()
Lukas Bulwahn (1):
MAINTAINERS: repair file entries in THREAD WITH FILE
Thomas Bertschinger (1):
bcachefs: omit alignment attribute on big endian struct bkey
Documentation/filesystems/bcachefs/errorcodes.rst | 30 +
MAINTAINERS | 39 +
drivers/md/bcache/Kconfig | 1 +
drivers/md/bcache/bcache.h | 1 +
drivers/md/bcache/bset.c | 6 +-
drivers/md/bcache/bset.h | 1 +
drivers/md/bcache/btree.c | 6 +-
drivers/md/bcache/super.c | 7 +
drivers/md/bcache/sysfs.c | 25 +-
drivers/md/bcache/util.c | 30 -
drivers/md/bcache/util.h | 52 +-
fs/bcachefs/Kconfig | 11 +-
fs/bcachefs/Makefile | 6 +-
fs/bcachefs/alloc_background.c | 219 +++++-
fs/bcachefs/alloc_background.h | 1 +
fs/bcachefs/alloc_foreground.c | 13 +-
fs/bcachefs/backpointers.c | 143 ++--
fs/bcachefs/bbpos_types.h | 2 +-
fs/bcachefs/bcachefs.h | 29 +-
fs/bcachefs/bcachefs_format.h | 53 +-
fs/bcachefs/bkey.h | 207 +----
fs/bcachefs/bkey_types.h | 213 ++++++
fs/bcachefs/bset.c | 2 +-
fs/bcachefs/btree_cache.c | 39 +-
fs/bcachefs/btree_gc.c | 153 ++--
fs/bcachefs/btree_io.c | 30 +-
fs/bcachefs/btree_iter.c | 28 +-
fs/bcachefs/btree_journal_iter.c | 180 +++--
fs/bcachefs/btree_journal_iter.h | 14 +-
fs/bcachefs/btree_key_cache.c | 8 +-
fs/bcachefs/btree_locking.c | 3 +-
fs/bcachefs/btree_locking.h | 2 +-
fs/bcachefs/btree_types.h | 11 +-
fs/bcachefs/btree_update.c | 25 +-
fs/bcachefs/btree_update.h | 3 +-
fs/bcachefs/btree_update_interior.c | 91 ++-
fs/bcachefs/btree_update_interior.h | 2 +
fs/bcachefs/btree_write_buffer.c | 4 +-
fs/bcachefs/btree_write_buffer_types.h | 2 +-
fs/bcachefs/buckets.c | 32 +-
fs/bcachefs/chardev.c | 63 +-
fs/bcachefs/checksum.c | 2 +-
fs/bcachefs/compress.c | 14 +-
fs/bcachefs/debug.c | 6 +-
fs/bcachefs/dirent.c | 143 ++--
fs/bcachefs/dirent.h | 6 +-
fs/bcachefs/ec.c | 4 +-
fs/bcachefs/errcode.c | 15 +-
fs/bcachefs/errcode.h | 18 +-
fs/bcachefs/error.c | 14 +-
fs/bcachefs/error.h | 2 +-
fs/bcachefs/extents.h | 11 +-
fs/bcachefs/fifo.h | 4 +-
fs/bcachefs/fs-common.c | 74 +-
fs/bcachefs/fs-io-buffered.c | 149 +++-
fs/bcachefs/fs-io-pagecache.h | 9 +-
fs/bcachefs/fs.c | 222 ++++--
fs/bcachefs/fsck.c | 849 ++++++++++++++-------
fs/bcachefs/fsck.h | 1 +
fs/bcachefs/inode.c | 55 +-
fs/bcachefs/inode.h | 19 +
fs/bcachefs/io_read.c | 6 +-
fs/bcachefs/io_write.c | 20 +-
fs/bcachefs/journal.c | 282 ++++---
fs/bcachefs/journal.h | 7 +-
fs/bcachefs/journal_io.c | 409 +++++-----
fs/bcachefs/journal_io.h | 47 +-
fs/bcachefs/journal_reclaim.c | 29 +-
fs/bcachefs/journal_sb.c | 2 +-
fs/bcachefs/journal_seq_blacklist.c | 75 +-
fs/bcachefs/journal_types.h | 36 +-
fs/bcachefs/lru.c | 7 +-
fs/bcachefs/migrate.c | 8 +-
fs/bcachefs/nocow_locking.c | 2 +-
fs/bcachefs/opts.c | 8 +-
fs/bcachefs/opts.h | 10 +
fs/bcachefs/rebalance.c | 4 +-
fs/bcachefs/recovery.c | 88 ++-
fs/bcachefs/recovery_types.h | 2 +
fs/bcachefs/replicas.c | 19 +-
fs/bcachefs/replicas.h | 3 +-
fs/bcachefs/sb-clean.c | 16 -
fs/bcachefs/sb-downgrade.c | 13 +-
fs/bcachefs/sb-errors_types.h | 21 +-
fs/bcachefs/sb-members.h | 2 +-
fs/bcachefs/str_hash.h | 15 +-
fs/bcachefs/subvolume.c | 187 ++++-
fs/bcachefs/subvolume.h | 9 +-
fs/bcachefs/subvolume_format.h | 4 +-
fs/bcachefs/subvolume_types.h | 2 +-
fs/bcachefs/super-io.c | 22 +-
fs/bcachefs/super-io.h | 2 +-
fs/bcachefs/super.c | 97 ++-
fs/bcachefs/sysfs.c | 4 +-
fs/bcachefs/thread_with_file.c | 299 --------
fs/bcachefs/thread_with_file.h | 41 -
fs/bcachefs/thread_with_file_types.h | 16 -
fs/bcachefs/trace.h | 19 +
fs/bcachefs/util.c | 374 +--------
fs/bcachefs/util.h | 180 +----
fs/bcachefs/xattr.c | 5 +-
fs/inode.c | 7 +-
{fs/bcachefs => include/linux}/darray.h | 59 +-
include/linux/darray_types.h | 22 +
{fs/bcachefs => include/linux}/eytzinger.h | 58 +-
include/linux/fs.h | 1 +
include/linux/generic-radix-tree.h | 29 +-
{fs/bcachefs => include/linux}/mean_and_variance.h | 14 +-
include/linux/mempool.h | 13 +
include/linux/sched.h | 4 +-
include/linux/sched/mm.h | 60 +-
include/linux/thread_with_file.h | 79 ++
include/linux/thread_with_file_types.h | 25 +
include/linux/time_stats.h | 167 ++++
kernel/hung_task.c | 1 +
lib/Kconfig | 7 +
lib/Kconfig.debug | 9 +
lib/Makefile | 5 +-
{fs/bcachefs => lib}/darray.c | 12 +-
lib/generic-radix-tree.c | 35 +-
lib/math/Kconfig | 3 +
lib/math/Makefile | 2 +
{fs/bcachefs => lib/math}/mean_and_variance.c | 31 +-
{fs/bcachefs => lib/math}/mean_and_variance_test.c | 83 +-
lib/sort.c | 89 +++
lib/thread_with_file.c | 454 +++++++++++
lib/time_stats.c | 373 +++++++++
mm/mempool.c | 13 +
128 files changed, 4583 insertions(+), 2868 deletions(-)
create mode 100644 Documentation/filesystems/bcachefs/errorcodes.rst
create mode 100644 fs/bcachefs/bkey_types.h
delete mode 100644 fs/bcachefs/thread_with_file.c
delete mode 100644 fs/bcachefs/thread_with_file.h
delete mode 100644 fs/bcachefs/thread_with_file_types.h
rename {fs/bcachefs => include/linux}/darray.h (66%)
create mode 100644 include/linux/darray_types.h
rename {fs/bcachefs => include/linux}/eytzinger.h (77%)
rename {fs/bcachefs => include/linux}/mean_and_variance.h (96%)
create mode 100644 include/linux/thread_with_file.h
create mode 100644 include/linux/thread_with_file_types.h
create mode 100644 include/linux/time_stats.h
rename {fs/bcachefs => lib}/darray.c (56%)
rename {fs/bcachefs => lib/math}/mean_and_variance.c (90%)
rename {fs/bcachefs => lib/math}/mean_and_variance_test.c (78%)
create mode 100644 lib/thread_with_file.c
create mode 100644 lib/time_stats.c
On Tue, 12 Mar 2024 at 18:10, Kent Overstreet <[email protected]> wrote:
>
> Hi Linus, few patches for you - plus a simple merge conflict with VFS
> changes:
The conflicts are trivial.
The "make random bcachefs code be a library function" stuff I looked
at, decided is senseless, and ended up meaning that I'm not pulling
this without a lot more explanation (and honestly, I don't think the
explanations would hold water).
That "stdio_redirect_printf()" and darray_char stuff is just
horrendous interfaces with no explanations. The interfaces are
disgusting.
Keep it in your own code where it belongs, don't try to make it some
generic library thing.
And if you *do* make it a library thing, it needs to be
(a) much more explained
(b) have much saner naming, and fewer disgusting and completely
nonsensical interfaces ("DARRAY()").
And no, finding one other filesystem to share this kind of code is not
sufficient to try to claim it's a sane interface and sane naming.
But the main dealbreaker is the insane math.
And dammit, we talked about the idiotic "mean and variance" garbage
long ago. It was wrong back then, it's *still* wrong.
You didn't explain why it couldn't use the *much* simpler MAD (median
absolute deviation) instead of using variance.
That bad decision directly results in that pointless use of overly
complex 128-bit math.
I called it insanely over-engineered back then, and as far as I can
tell, absolutely *NOTHING* has changed apart from some slight type
name details.
As long as you made it some kind of bcachefs-only thing, I don't mind.
But now you're trying to push this garbage as some kind of generic
library code that others would use, and that immediately means that I
*do* mind insanely overengineered interfaces.
The time_stats stuff otherwise looks at least like a sane interface
with names and uses, but the use of that horrendous infrastructure
scuttles it.
Linus
On Wed, Mar 13, 2024 at 01:47:59PM -0700, Linus Torvalds wrote:
> On Tue, 12 Mar 2024 at 18:10, Kent Overstreet <[email protected]> wrote:
> >
> > Hi Linus, few patches for you - plus a simple merge conflict with VFS
> > changes:
>
> The conflicts are trivial.
>
> The "make random bcachefs code be a library function" stuff I looked
> at, decided is senseless, and ended up meaning that I'm not pulling
> this without a lot more explanation (and honestly, I don't think the
> explanations would hold water).
>
> That "stdio_redirect_printf()" and darray_char stuff is just
> horrendous interfaces with no explanations. The interfaces are
> disgusting.
It's a bidirectional pipe between a kthread and an fd. Not sure what's
complicated about that?
> And if you *do* make it a library thing, it needs to be
>
> (a) much more explained
>
> (b) have much saner naming, and fewer disgusting and completely
> nonsensical interfaces ("DARRAY()").
DARRAY() is just a dynamic array, aka a C++ vector; we open code those so
much it's _stupid_. I wouldn't be opposed to changing the name to
something more standard (Rust calls it a vector too); I started out with
the CCAN version and rewrote it later for the kernel.
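For readers who have not met the idea, a dynamic array in C might look
roughly like the userspace sketch below. The names (struct int_vec,
int_vec_push, int_vec_exit) are hypothetical and chosen for illustration
only; this is not the darray API from the series, just the general pattern
that tends to get open coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of ints; NOT the kernel's darray API. */
struct int_vec {
	int	*data;
	size_t	nr;	/* elements in use */
	size_t	size;	/* elements allocated */
};

/* Append one element, doubling the allocation when full. Returns 0 or -1. */
static int int_vec_push(struct int_vec *v, int x)
{
	assert(v->nr <= v->size);	/* basic invariant */

	if (v->nr == v->size) {
		size_t new_size = v->size ? v->size * 2 : 8;
		int *p = realloc(v->data, new_size * sizeof(*p));

		if (!p)
			return -1;	/* the old buffer is still valid */
		v->data = p;
		v->size = new_size;
	}
	v->data[v->nr++] = x;
	return 0;
}

/* Free the backing storage and reset to the empty state. */
static void int_vec_exit(struct int_vec *v)
{
	free(v->data);
	v->data = NULL;
	v->nr = v->size = 0;
}
```

A zero-initialized struct int_vec is a valid empty array, so callers only
need int_vec_push() and int_vec_exit().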
> And no, finding one other filesystem to share this kind of code is not
> sufficient to try to claim it's a sane interface and sane naming.
>
> But the main dealbreaker is the insane math.
>
> And dammit, we talked about the idiotic "mean and variance" garbage
> long ago. It was wrong back then, it's *still* wrong.
>
> You didn't explain why it couldn't use the *much* simpler MAD (median
> absolute deviation) instead of using variance.
I most certainly did.
I liked your MAD suggestion, but the catch was that we need an
exponentially weighted version, not just the standard version, and I
haven't seen a derivation of exponentially weighted MAD and doing that
is a bit above my statistical pay grade. I explained all this at the
time.
Besides that, the existing code works fine, the u128 stuff is right out
of Knuth (divide is the only even vaguely tricky one), and it's nicely
self contained. It's fine.
> I called it insanely over-engineered back then, and as far as I can
> tell, absolutely *NOTHING* has changed apart from some slight type
> name details.
>
> As long as you made it some kind of bcachefs-only thing, I don't mind.
>
> But now you're trying to push this garbage as some kind of generic
> library code that others would use, and that immediately means that I
> *do* mind insanely overengineered interfaces.
>
> The time_stats stuff otherwise looks at least like a sane interface
> with names and uses, but the use of that horrendous infrastructure
> scuttles it.
Well, that leaves us at a bit of an impasse then because Darrick wants
this stuff for XFS (he was discovering useful stuff with it pretty much
right away) and I'm just not doing a MAD conversion, sorry. I'm just
being practical here, I like MAD in principle but that's too far outside
my wheelhouse.
Maybe we can get someone else interested? I have a feeling Peter could
whip it out in about 5 minutes...
On Wed, 13 Mar 2024 at 14:34, Kent Overstreet <[email protected]> wrote:
>
> I liked your MAD suggestion, but the catch was that we need an
> exponentially weighted version,
The code for the weighted version literally doesn't change.
The variance value is different, but the difference between MAD and
standard deviation is basically just a constant factor (which will be
different for different distributions, but so what? Any _particular_
case will have a particular distribution).
So why would a constant factor make _any_ difference for any
exponential weighting?
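To make the constant-factor point concrete, assume normally distributed
samples with standard deviation sigma (an assumption for illustration;
the thread does not name a distribution). Under that assumption, both
quantities that get abbreviated as MAD are fixed multiples of sigma:

```latex
\mathbb{E}\,|X - \mu| = \sigma \sqrt{2/\pi} \approx 0.798\,\sigma
\qquad
\operatorname{median}\,|X - \mu| = \sigma\,\Phi^{-1}(3/4) \approx 0.674\,\sigma
```

Other distributions give different constants, which is the caveat in the
parenthetical above.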
Anyway, feel free to keep your code in bcachefs.
And maybe xfs even wants to copy that code. I don't care, it seems
stupid, but that's a filesystem choice.
But if we're making it a generic kernel library, it needs to be sane.
Not making people do 64-bit square roots and 128-bit divides just for
a random statistical element.
Linus
On Wed, Mar 13, 2024 at 02:51:38PM -0700, Linus Torvalds wrote:
> On Wed, 13 Mar 2024 at 14:34, Kent Overstreet <[email protected]> wrote:
> >
> > I liked your MAD suggestion, but the catch was that we need an
> > exponentially weighted version,
>
> The code for the weighted version literally doesn't change.
Well, no, and there's another problem I can't believe I missed until
now. MAD is defined as median of the absolute deviations, not mean, and
you can't compute a median incrementally.
So MAD doesn't work here at all.
On Wed, Mar 13, 2024 at 06:22:57PM -0400, Kent Overstreet wrote:
> On Wed, Mar 13, 2024 at 02:51:38PM -0700, Linus Torvalds wrote:
> > On Wed, 13 Mar 2024 at 14:34, Kent Overstreet <[email protected]> wrote:
> > >
> > > I liked your MAD suggestion, but the catch was that we need an
> > > exponentially weighted version,
> >
> > The code for the weighted version literally doesn't change.
>
> Well, no, and there's another problem I can't believe I missed until
> now. MAD is defined as median of the absolute deviations, not mean, and
> you can't compute a median incrementally.
>
> So MAD doesn't work here at all.
Sorry, you were talking about mean absolute deviation; that does work
here.
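A minimal userspace sketch of what an exponentially weighted mean absolute
deviation could look like, using only 64-bit integer arithmetic and a
power-of-two weight. The names (struct ewm_stats, ewm_update) and the
choice of weight 1/2^shift are assumptions for illustration; this is not
code from the series or from either participant, and it assumes samples
stay far enough from the int64_t limits that the subtraction cannot
overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical running statistics; each new sample has weight 1/2^shift. */
struct ewm_stats {
	int64_t	mean;		/* exponentially weighted mean */
	int64_t	abs_dev;	/* exponentially weighted mean absolute deviation */
	int	initialized;
};

/* Fold one sample into the running mean and absolute deviation. */
static void ewm_update(struct ewm_stats *s, int64_t x, unsigned shift)
{
	int64_t dev;

	assert(shift < 32);

	if (!s->initialized) {
		/* The first sample seeds the mean; no deviation yet. */
		s->mean = x;
		s->abs_dev = 0;
		s->initialized = 1;
		return;
	}

	dev = x - s->mean;

	/* mean += (x - mean) / 2^shift */
	s->mean += dev / ((int64_t)1 << shift);

	if (dev < 0)
		dev = -dev;

	/* abs_dev += (|x - old mean| - abs_dev) / 2^shift */
	s->abs_dev += (dev - s->abs_dev) / ((int64_t)1 << shift);
}
```

The state is two 64-bit values and each update is a subtract, an absolute
value and divisions by a power of two, so no square root or 128-bit
arithmetic is involved. Integer truncation means small deviations are
rounded toward zero, which a real implementation might address by keeping
extra fractional bits.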
On Wed, 13 Mar 2024 at 15:28, Kent Overstreet <[email protected]> wrote:
>
> Sorry, you were talking about mean absolute deviation; that does work
> here.
Yes, I meant mean, not median.
But the confusion is my fault - I wrote MAD and then to "explain"
that, I put "median" in my own email - so you read it right the first
time, and it was just me being sloppy and confusing things.
They are both called MAD in their own contexts, and they are much too
easy to confuse.
My bad,
Linus