2024-02-24 01:01:07

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCHBOMB] time_stats, thread_with_file: lifting generic code to lib

Hi all,

Kent and I went on a little sprint of figuring out if there were any
pieces of bcachefs that we could steal for XFS. It turns out that there
are -- the timestats code is useful for measuring delays due to lock
contention, and thread_with_file will be very helpful for exporting
filesystem metadata health events to userspace.

So here's a pile of patchsets lifting those pieces of bcachefs to lib
and fixing a bunch of bugs in them. These patches have already been
soaking in Kent's testing tree (and -next) for a few days, but
apparently not all of them got emailed so here I am blasting out the
entire thing.

If you want to see what XFS does with this, have a look at [1] and [2].
For 6.9 it'd be helpful to get these modules lifted.

--Darrick

[1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=health-monitoring_2024-02-23
[2] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tag/?h=contention-timestats_2024-02-23


2024-02-24 01:07:59

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCHSET 1/6] time_stats: promote to lib/

Hi all,

This is Kent Overstreet's series to lift the mean and variance
computation code, as well as the time statistics code, to become
generic library code.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems. Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=timestats-hoist
---
Commits in this patchset:
* mean and variance: Promote to lib/math
* eytzinger: Promote to include/linux/
* bcachefs: bch2_time_stats_to_seq_buf()
* time_stats: Promote to lib/
---
MAINTAINERS | 22 ++
fs/bcachefs/Kconfig | 10 -
fs/bcachefs/Makefile | 3
fs/bcachefs/alloc_foreground.c | 13 -
fs/bcachefs/bcachefs.h | 11 +
fs/bcachefs/bset.c | 2
fs/bcachefs/btree_cache.c | 2
fs/bcachefs/btree_gc.c | 2
fs/bcachefs/btree_io.c | 8 -
fs/bcachefs/btree_iter.c | 8 -
fs/bcachefs/btree_locking.h | 2
fs/bcachefs/btree_update_interior.c | 8 -
fs/bcachefs/io_read.c | 4
fs/bcachefs/io_write.c | 4
fs/bcachefs/journal.c | 5 -
fs/bcachefs/journal_io.c | 9 -
fs/bcachefs/journal_reclaim.c | 9 -
fs/bcachefs/journal_seq_blacklist.c | 6 -
fs/bcachefs/journal_types.h | 11 -
fs/bcachefs/nocow_locking.c | 2
fs/bcachefs/replicas.c | 19 +-
fs/bcachefs/replicas.h | 3
fs/bcachefs/super-io.h | 2
fs/bcachefs/super.c | 14 +
fs/bcachefs/util.c | 339 ++--------------------------------
fs/bcachefs/util.h | 86 ---------
include/linux/eytzinger.h | 58 +++---
include/linux/mean_and_variance.h | 0
include/linux/time_stats.h | 134 +++++++++++++
lib/Kconfig | 4
lib/Kconfig.debug | 9 +
lib/Makefile | 2
lib/math/Kconfig | 3
lib/math/Makefile | 2
lib/math/mean_and_variance.c | 3
lib/math/mean_and_variance_test.c | 3
lib/sort.c | 89 +++++++++
lib/time_stats.c | 271 +++++++++++++++++++++++++++
38 files changed, 662 insertions(+), 520 deletions(-)
rename fs/bcachefs/eytzinger.h => include/linux/eytzinger.h (77%)
rename fs/bcachefs/mean_and_variance.h => include/linux/mean_and_variance.h (100%)
create mode 100644 include/linux/time_stats.h
rename fs/bcachefs/mean_and_variance.c => lib/math/mean_and_variance.c (99%)
rename fs/bcachefs/mean_and_variance_test.c => lib/math/mean_and_variance_test.c (99%)
create mode 100644 lib/time_stats.c


2024-02-24 01:08:12

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCHSET 2/6] time_stats: cleanups and fixes

Hi all,

This series reduces the memory consumption of individual time_stats
objects, and adds reporting of how long each stat counter has been
making observations. It's a prep patch for adding some counters to XFS,
for which we'll want as low overhead as possible to maximimze
shotgunning effect.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems. Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=timestats-cleanups
---
Commits in this patchset:
* time_stats: report lifetime of the stats object
* time_stats: split stats-with-quantiles into a separate structure
* time_stats: fix struct layout bloat
* time_stats: add larger units
* time_stats: don't print any output if event count is zero
* time_stats: allow custom epoch names
* mean_and_variance: put struct mean_and_variance_weighted on a diet
* time_stats: shrink time_stat_buffer for better alignment
* time_stats: report information in json format
* time_stats: Kill TIME_STATS_HAVE_QUANTILES
---
fs/bcachefs/bcachefs.h | 2 -
fs/bcachefs/io_write.c | 2 -
fs/bcachefs/super.c | 10 +--
fs/bcachefs/sysfs.c | 4 +
fs/bcachefs/util.c | 15 ++--
include/linux/mean_and_variance.h | 14 ++--
include/linux/time_stats.h | 43 ++++++++++-
lib/math/mean_and_variance.c | 28 +++++--
lib/math/mean_and_variance_test.c | 80 +++++++++++---------
lib/time_stats.c | 148 +++++++++++++++++++++++++++++++------
10 files changed, 248 insertions(+), 98 deletions(-)


2024-02-24 01:08:51

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCHSET 4/6] thread_with_file: promote to lib/

Hi all,

This is Kent Overstreet's series to lift the thread_with_file support
code to generic library code. This enables the kernel to create a
pseudo file that userspace can use to read deeply structured event
information from the kernel. kthreads are used to manage the buffers
underlying the file operations.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems. Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=twf-hoist
---
Commits in this patchset:
* bcachefs: thread_with_stdio: eliminate double buffering
* bcachefs: thread_with_stdio: convert to darray
* bcachefs: thread_with_stdio: kill thread_with_stdio_done()
* bcachefs: thread_with_stdio: fix bch2_stdio_redirect_readline()
* bcachefs: Thread with file documentation
* darray: lift from bcachefs
* thread_with_file: Lift from bcachefs
* thread_with_stdio: Mark completed in ->release()
* kernel/hung_task.c: export sysctl_hung_task_timeout_secs
* thread_with_stdio: suppress hung task warning
---
MAINTAINERS | 16 +
fs/bcachefs/Kconfig | 1
fs/bcachefs/Makefile | 2
fs/bcachefs/bcachefs.h | 2
fs/bcachefs/btree_types.h | 2
fs/bcachefs/btree_update.c | 2
fs/bcachefs/btree_write_buffer_types.h | 2
fs/bcachefs/chardev.c | 24 +-
fs/bcachefs/error.c | 4
fs/bcachefs/fsck.c | 2
fs/bcachefs/journal_sb.c | 2
fs/bcachefs/sb-downgrade.c | 3
fs/bcachefs/sb-errors_types.h | 2
fs/bcachefs/sb-members.h | 2
fs/bcachefs/subvolume.h | 1
fs/bcachefs/subvolume_types.h | 2
fs/bcachefs/super.c | 9 -
fs/bcachefs/thread_with_file.c | 299 -------------------------
fs/bcachefs/thread_with_file.h | 41 ---
fs/bcachefs/thread_with_file_types.h | 16 -
fs/bcachefs/util.h | 29 --
include/linux/darray.h | 59 +++--
include/linux/darray_types.h | 22 ++
include/linux/thread_with_file.h | 71 ++++++
include/linux/thread_with_file_types.h | 25 ++
kernel/hung_task.c | 1
lib/Kconfig | 3
lib/Makefile | 3
lib/darray.c | 12 +
lib/thread_with_file.c | 379 ++++++++++++++++++++++++++++++++
30 files changed, 596 insertions(+), 442 deletions(-)
delete mode 100644 fs/bcachefs/thread_with_file.c
delete mode 100644 fs/bcachefs/thread_with_file.h
delete mode 100644 fs/bcachefs/thread_with_file_types.h
rename fs/bcachefs/darray.h => include/linux/darray.h (66%)
create mode 100644 include/linux/darray_types.h
create mode 100644 include/linux/thread_with_file.h
create mode 100644 include/linux/thread_with_file_types.h
rename fs/bcachefs/darray.c => lib/darray.c (56%)
create mode 100644 lib/thread_with_file.c


2024-02-24 01:09:06

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCHSET 5/6] thread_with_file: cleanups and fixes

Hi all,

This series fixes some problems with the thread_with_file code -- namely
that blocking stdout writes attempt a non-atomic memory allocation while
holding a spinlock. It also cleans up the ops handling so that we can
support ioctls on the thread_with_file itself.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems. Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=twf-cleanups
---
Commits in this patchset:
* thread_with_file: allow creation of readonly files
* thread_with_file: fix various printf problems
* thread_with_file: create ops structure for thread_with_stdio
* thread_with_file: allow ioctls against these files
* thread_with_file: Fix missing va_end()
---
fs/bcachefs/chardev.c | 18 ++++--
include/linux/thread_with_file.h | 20 +++++--
lib/thread_with_file.c | 113 ++++++++++++++++++++++++++++++--------
3 files changed, 115 insertions(+), 36 deletions(-)


2024-02-24 01:09:28

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 1/4] mean and variance: Promote to lib/math

From: Kent Overstreet <[email protected]>

Small statistics library, for taking in a series of value and computing
mean, weighted mean, standard deviation and weighted deviation.

The main use case is for statistics on latency measurements.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Daniel Hill <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
MAINTAINERS | 9 +++++++++
fs/bcachefs/Kconfig | 10 +---------
fs/bcachefs/Makefile | 3 ---
fs/bcachefs/util.c | 2 +-
fs/bcachefs/util.h | 3 +--
include/linux/mean_and_variance.h | 0
lib/Kconfig.debug | 9 +++++++++
lib/math/Kconfig | 3 +++
lib/math/Makefile | 2 ++
lib/math/mean_and_variance.c | 3 +--
lib/math/mean_and_variance_test.c | 3 +--
11 files changed, 28 insertions(+), 19 deletions(-)
rename fs/bcachefs/mean_and_variance.h => include/linux/mean_and_variance.h (100%)
rename fs/bcachefs/mean_and_variance.c => lib/math/mean_and_variance.c (99%)
rename fs/bcachefs/mean_and_variance_test.c => lib/math/mean_and_variance_test.c (99%)


diff --git a/MAINTAINERS b/MAINTAINERS
index 9ed4d38685394..3e13de69b7f07 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13387,6 +13387,15 @@ S: Maintained
F: drivers/net/mdio/mdio-regmap.c
F: include/linux/mdio/mdio-regmap.h

+MEAN AND VARIANCE LIBRARY
+M: Daniel B. Hill <[email protected]>
+M: Kent Overstreet <[email protected]>
+S: Maintained
+T: git https://github.com/YellowOnion/linux/
+F: include/linux/mean_and_variance.h
+F: lib/math/mean_and_variance.c
+F: lib/math/mean_and_variance_test.c
+
MEASUREMENT COMPUTING CIO-DAC IIO DRIVER
M: William Breathitt Gray <[email protected]>
L: [email protected]
diff --git a/fs/bcachefs/Kconfig b/fs/bcachefs/Kconfig
index 5cdfef3b551a7..72d1179262b33 100644
--- a/fs/bcachefs/Kconfig
+++ b/fs/bcachefs/Kconfig
@@ -24,6 +24,7 @@ config BCACHEFS_FS
select XXHASH
select SRCU
select SYMBOLIC_ERRNAME
+ select MEAN_AND_VARIANCE
help
The bcachefs filesystem - a modern, copy on write filesystem, with
support for multiple devices, compression, checksumming, etc.
@@ -86,12 +87,3 @@ config BCACHEFS_SIX_OPTIMISTIC_SPIN
Instead of immediately sleeping when attempting to take a six lock that
is held by another thread, spin for a short while, as long as the
thread owning the lock is running.
-
-config MEAN_AND_VARIANCE_UNIT_TEST
- tristate "mean_and_variance unit tests" if !KUNIT_ALL_TESTS
- depends on KUNIT
- depends on BCACHEFS_FS
- default KUNIT_ALL_TESTS
- help
- This option enables the kunit tests for mean_and_variance module.
- If unsure, say N.
diff --git a/fs/bcachefs/Makefile b/fs/bcachefs/Makefile
index 1a05cecda7cc5..b11ba74b8ad41 100644
--- a/fs/bcachefs/Makefile
+++ b/fs/bcachefs/Makefile
@@ -57,7 +57,6 @@ bcachefs-y := \
keylist.o \
logged_ops.o \
lru.o \
- mean_and_variance.o \
migrate.o \
move.o \
movinggc.o \
@@ -88,5 +87,3 @@ bcachefs-y := \
util.o \
varint.o \
xattr.o
-
-obj-$(CONFIG_MEAN_AND_VARIANCE_UNIT_TEST) += mean_and_variance_test.o
diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index 231003b405efc..c9d13dcf3ef1a 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -22,9 +22,9 @@
#include <linux/string.h>
#include <linux/types.h>
#include <linux/sched/clock.h>
+#include <linux/mean_and_variance.h>

#include "eytzinger.h"
-#include "mean_and_variance.h"
#include "util.h"

static const char si_units[] = "?kMGTPEZY";
diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index b414736d59a5b..0059481995ef7 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -17,8 +17,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/workqueue.h>
-
-#include "mean_and_variance.h"
+#include <linux/mean_and_variance.h>

#include "darray.h"

diff --git a/fs/bcachefs/mean_and_variance.h b/include/linux/mean_and_variance.h
similarity index 100%
rename from fs/bcachefs/mean_and_variance.h
rename to include/linux/mean_and_variance.h
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 975a07f9f1cc0..817ddfe132cda 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2191,6 +2191,15 @@ config CPUMASK_KUNIT_TEST

If unsure, say N.

+config MEAN_AND_VARIANCE_UNIT_TEST
+ tristate "mean_and_variance unit tests" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ select MEAN_AND_VARIANCE
+ default KUNIT_ALL_TESTS
+ help
+ This option enables the kunit tests for mean_and_variance module.
+ If unsure, say N.
+
config TEST_LIST_SORT
tristate "Linked list sorting test" if !KUNIT_ALL_TESTS
depends on KUNIT
diff --git a/lib/math/Kconfig b/lib/math/Kconfig
index 0634b428d0cb7..7530ae9a3584f 100644
--- a/lib/math/Kconfig
+++ b/lib/math/Kconfig
@@ -15,3 +15,6 @@ config PRIME_NUMBERS

config RATIONAL
tristate
+
+config MEAN_AND_VARIANCE
+ tristate
diff --git a/lib/math/Makefile b/lib/math/Makefile
index 91fcdb0c9efe4..8cdfa13a67ce0 100644
--- a/lib/math/Makefile
+++ b/lib/math/Makefile
@@ -4,6 +4,8 @@ obj-y += div64.o gcd.o lcm.o int_log.o int_pow.o int_sqrt.o reciprocal_div.o
obj-$(CONFIG_CORDIC) += cordic.o
obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o
obj-$(CONFIG_RATIONAL) += rational.o
+obj-$(CONFIG_MEAN_AND_VARIANCE) += mean_and_variance.o

obj-$(CONFIG_TEST_DIV64) += test_div64.o
obj-$(CONFIG_RATIONAL_KUNIT_TEST) += rational-test.o
+obj-$(CONFIG_MEAN_AND_VARIANCE_UNIT_TEST) += mean_and_variance_test.o
diff --git a/fs/bcachefs/mean_and_variance.c b/lib/math/mean_and_variance.c
similarity index 99%
rename from fs/bcachefs/mean_and_variance.c
rename to lib/math/mean_and_variance.c
index bf0ef668fd383..ba90293204bae 100644
--- a/fs/bcachefs/mean_and_variance.c
+++ b/lib/math/mean_and_variance.c
@@ -40,10 +40,9 @@
#include <linux/limits.h>
#include <linux/math.h>
#include <linux/math64.h>
+#include <linux/mean_and_variance.h>
#include <linux/module.h>

-#include "mean_and_variance.h"
-
u128_u u128_div(u128_u n, u64 d)
{
u128_u r;
diff --git a/fs/bcachefs/mean_and_variance_test.c b/lib/math/mean_and_variance_test.c
similarity index 99%
rename from fs/bcachefs/mean_and_variance_test.c
rename to lib/math/mean_and_variance_test.c
index 019583c3ca0ea..f45591a169d87 100644
--- a/fs/bcachefs/mean_and_variance_test.c
+++ b/lib/math/mean_and_variance_test.c
@@ -1,7 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include <kunit/test.h>
-
-#include "mean_and_variance.h"
+#include <linux/mean_and_variance.h>

#define MAX_SQR (SQRT_U64_MAX*SQRT_U64_MAX)



2024-02-24 01:09:44

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 2/4] eytzinger: Promote to include/linux/

From: Kent Overstreet <[email protected]>

eytzinger trees are a faster alternative to binary search. They're a bit
more expensive to setup, but lookups perform much better assuming the
tree isn't entirely in cache.

Binary search is a worst case scenario for branch prediction and
prefetching, but eytzinger trees have children adjacent in memory and
thus we can prefetch before knowing the result of a comparison.

An eytzinger tree is a binary tree laid out in an array, with the same
geometry as the usual binary heap construction, but used as a search
tree instead.

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
MAINTAINERS | 6 +
fs/bcachefs/bset.c | 2
fs/bcachefs/journal_seq_blacklist.c | 6 +
fs/bcachefs/replicas.c | 19 +++--
fs/bcachefs/replicas.h | 3 -
fs/bcachefs/super-io.h | 2
fs/bcachefs/util.c | 145 -----------------------------------
fs/bcachefs/util.h | 4 -
include/linux/eytzinger.h | 58 ++++++++------
lib/sort.c | 89 +++++++++++++++++++++
10 files changed, 148 insertions(+), 186 deletions(-)
rename fs/bcachefs/eytzinger.h => include/linux/eytzinger.h (77%)


diff --git a/MAINTAINERS b/MAINTAINERS
index 3e13de69b7f07..98a17270566d3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8066,6 +8066,12 @@ L: [email protected]
S: Maintained
F: drivers/iommu/exynos-iommu.c

+EYTZINGER TREE LIB
+M: Kent Overstreet <[email protected]>
+L: [email protected]
+S: Maintained
+F: include/linux/eytzinger.h
+
F2FS FILE SYSTEM
M: Jaegeuk Kim <[email protected]>
M: Chao Yu <[email protected]>
diff --git a/fs/bcachefs/bset.c b/fs/bcachefs/bset.c
index 3fd1085b6c61e..1d77aa55d641c 100644
--- a/fs/bcachefs/bset.c
+++ b/fs/bcachefs/bset.c
@@ -9,12 +9,12 @@
#include "bcachefs.h"
#include "btree_cache.h"
#include "bset.h"
-#include "eytzinger.h"
#include "trace.h"
#include "util.h"

#include <asm/unaligned.h>
#include <linux/console.h>
+#include <linux/eytzinger.h>
#include <linux/random.h>
#include <linux/prefetch.h>

diff --git a/fs/bcachefs/journal_seq_blacklist.c b/fs/bcachefs/journal_seq_blacklist.c
index 0200e299cfbb9..024c9b1b323f8 100644
--- a/fs/bcachefs/journal_seq_blacklist.c
+++ b/fs/bcachefs/journal_seq_blacklist.c
@@ -2,10 +2,11 @@

#include "bcachefs.h"
#include "btree_iter.h"
-#include "eytzinger.h"
#include "journal_seq_blacklist.h"
#include "super-io.h"

+#include <linux/eytzinger.h>
+
/*
* journal_seq_blacklist machinery:
*
@@ -119,8 +120,7 @@ int bch2_journal_seq_blacklist_add(struct bch_fs *c, u64 start, u64 end)
return ret ?: bch2_blacklist_table_initialize(c);
}

-static int journal_seq_blacklist_table_cmp(const void *_l,
- const void *_r, size_t size)
+static int journal_seq_blacklist_table_cmp(const void *_l, const void *_r)
{
const struct journal_seq_blacklist_table_entry *l = _l;
const struct journal_seq_blacklist_table_entry *r = _r;
diff --git a/fs/bcachefs/replicas.c b/fs/bcachefs/replicas.c
index cc2672c120312..678b9c20e2514 100644
--- a/fs/bcachefs/replicas.c
+++ b/fs/bcachefs/replicas.c
@@ -6,12 +6,15 @@
#include "replicas.h"
#include "super-io.h"

+#include <linux/sort.h>
+
static int bch2_cpu_replicas_to_sb_replicas(struct bch_fs *,
struct bch_replicas_cpu *);

/* Some (buggy!) compilers don't allow memcmp to be passed as a pointer */
-static int bch2_memcmp(const void *l, const void *r, size_t size)
+static int bch2_memcmp(const void *l, const void *r, const void *priv)
{
+ size_t size = (size_t) priv;
return memcmp(l, r, size);
}

@@ -39,7 +42,8 @@ void bch2_replicas_entry_sort(struct bch_replicas_entry_v1 *e)

static void bch2_cpu_replicas_sort(struct bch_replicas_cpu *r)
{
- eytzinger0_sort(r->entries, r->nr, r->entry_size, bch2_memcmp, NULL);
+ eytzinger0_sort_r(r->entries, r->nr, r->entry_size,
+ bch2_memcmp, NULL, (void *)(size_t)r->entry_size);
}

static void bch2_replicas_entry_v0_to_text(struct printbuf *out,
@@ -228,7 +232,7 @@ static inline int __replicas_entry_idx(struct bch_replicas_cpu *r,

verify_replicas_entry(search);

-#define entry_cmp(_l, _r, size) memcmp(_l, _r, entry_size)
+#define entry_cmp(_l, _r) memcmp(_l, _r, entry_size)
idx = eytzinger0_find(r->entries, r->nr, r->entry_size,
entry_cmp, search);
#undef entry_cmp
@@ -824,10 +828,11 @@ static int bch2_cpu_replicas_validate(struct bch_replicas_cpu *cpu_r,
{
unsigned i;

- sort_cmp_size(cpu_r->entries,
- cpu_r->nr,
- cpu_r->entry_size,
- bch2_memcmp, NULL);
+ sort_r(cpu_r->entries,
+ cpu_r->nr,
+ cpu_r->entry_size,
+ bch2_memcmp, NULL,
+ (void *)(size_t)cpu_r->entry_size);

for (i = 0; i < cpu_r->nr; i++) {
struct bch_replicas_entry_v1 *e =
diff --git a/fs/bcachefs/replicas.h b/fs/bcachefs/replicas.h
index 654a4b26d3a3c..983cce782ac2a 100644
--- a/fs/bcachefs/replicas.h
+++ b/fs/bcachefs/replicas.h
@@ -3,9 +3,10 @@
#define _BCACHEFS_REPLICAS_H

#include "bkey.h"
-#include "eytzinger.h"
#include "replicas_types.h"

+#include <linux/eytzinger.h>
+
void bch2_replicas_entry_sort(struct bch_replicas_entry_v1 *);
void bch2_replicas_entry_to_text(struct printbuf *,
struct bch_replicas_entry_v1 *);
diff --git a/fs/bcachefs/super-io.h b/fs/bcachefs/super-io.h
index 95e80e06316bf..f37620919e11a 100644
--- a/fs/bcachefs/super-io.h
+++ b/fs/bcachefs/super-io.h
@@ -3,12 +3,12 @@
#define _BCACHEFS_SUPER_IO_H

#include "extents.h"
-#include "eytzinger.h"
#include "super_types.h"
#include "super.h"
#include "sb-members.h"

#include <asm/byteorder.h>
+#include <linux/eytzinger.h>

static inline bool bch2_version_compatible(u16 version)
{
diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index c9d13dcf3ef1a..902f6b1a8a142 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -11,6 +11,7 @@
#include <linux/console.h>
#include <linux/ctype.h>
#include <linux/debugfs.h>
+#include <linux/eytzinger.h>
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/log2.h>
@@ -24,7 +25,6 @@
#include <linux/sched/clock.h>
#include <linux/mean_and_variance.h>

-#include "eytzinger.h"
#include "util.h"

static const char si_units[] = "?kMGTPEZY";
@@ -864,149 +864,6 @@ void memcpy_from_bio(void *dst, struct bio *src, struct bvec_iter src_iter)
}
}

-static int alignment_ok(const void *base, size_t align)
-{
- return IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
- ((unsigned long)base & (align - 1)) == 0;
-}
-
-static void u32_swap(void *a, void *b, size_t size)
-{
- u32 t = *(u32 *)a;
- *(u32 *)a = *(u32 *)b;
- *(u32 *)b = t;
-}
-
-static void u64_swap(void *a, void *b, size_t size)
-{
- u64 t = *(u64 *)a;
- *(u64 *)a = *(u64 *)b;
- *(u64 *)b = t;
-}
-
-static void generic_swap(void *a, void *b, size_t size)
-{
- char t;
-
- do {
- t = *(char *)a;
- *(char *)a++ = *(char *)b;
- *(char *)b++ = t;
- } while (--size > 0);
-}
-
-static inline int do_cmp(void *base, size_t n, size_t size,
- int (*cmp_func)(const void *, const void *, size_t),
- size_t l, size_t r)
-{
- return cmp_func(base + inorder_to_eytzinger0(l, n) * size,
- base + inorder_to_eytzinger0(r, n) * size,
- size);
-}
-
-static inline void do_swap(void *base, size_t n, size_t size,
- void (*swap_func)(void *, void *, size_t),
- size_t l, size_t r)
-{
- swap_func(base + inorder_to_eytzinger0(l, n) * size,
- base + inorder_to_eytzinger0(r, n) * size,
- size);
-}
-
-void eytzinger0_sort(void *base, size_t n, size_t size,
- int (*cmp_func)(const void *, const void *, size_t),
- void (*swap_func)(void *, void *, size_t))
-{
- int i, c, r;
-
- if (!swap_func) {
- if (size == 4 && alignment_ok(base, 4))
- swap_func = u32_swap;
- else if (size == 8 && alignment_ok(base, 8))
- swap_func = u64_swap;
- else
- swap_func = generic_swap;
- }
-
- /* heapify */
- for (i = n / 2 - 1; i >= 0; --i) {
- for (r = i; r * 2 + 1 < n; r = c) {
- c = r * 2 + 1;
-
- if (c + 1 < n &&
- do_cmp(base, n, size, cmp_func, c, c + 1) < 0)
- c++;
-
- if (do_cmp(base, n, size, cmp_func, r, c) >= 0)
- break;
-
- do_swap(base, n, size, swap_func, r, c);
- }
- }
-
- /* sort */
- for (i = n - 1; i > 0; --i) {
- do_swap(base, n, size, swap_func, 0, i);
-
- for (r = 0; r * 2 + 1 < i; r = c) {
- c = r * 2 + 1;
-
- if (c + 1 < i &&
- do_cmp(base, n, size, cmp_func, c, c + 1) < 0)
- c++;
-
- if (do_cmp(base, n, size, cmp_func, r, c) >= 0)
- break;
-
- do_swap(base, n, size, swap_func, r, c);
- }
- }
-}
-
-void sort_cmp_size(void *base, size_t num, size_t size,
- int (*cmp_func)(const void *, const void *, size_t),
- void (*swap_func)(void *, void *, size_t size))
-{
- /* pre-scale counters for performance */
- int i = (num/2 - 1) * size, n = num * size, c, r;
-
- if (!swap_func) {
- if (size == 4 && alignment_ok(base, 4))
- swap_func = u32_swap;
- else if (size == 8 && alignment_ok(base, 8))
- swap_func = u64_swap;
- else
- swap_func = generic_swap;
- }
-
- /* heapify */
- for ( ; i >= 0; i -= size) {
- for (r = i; r * 2 + size < n; r = c) {
- c = r * 2 + size;
- if (c < n - size &&
- cmp_func(base + c, base + c + size, size) < 0)
- c += size;
- if (cmp_func(base + r, base + c, size) >= 0)
- break;
- swap_func(base + r, base + c, size);
- }
- }
-
- /* sort */
- for (i = n - size; i > 0; i -= size) {
- swap_func(base, base + i, size);
- for (r = 0; r * 2 + size < i; r = c) {
- c = r * 2 + size;
- if (c < i - size &&
- cmp_func(base + c, base + c + size, size) < 0)
- c += size;
- if (cmp_func(base + r, base + c, size) >= 0)
- break;
- swap_func(base + r, base + c, size);
- }
- }
-}
-
static void mempool_free_vp(void *element, void *pool_data)
{
size_t size = (size_t) pool_data;
diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index 0059481995ef7..c3b11c3d24ea9 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -737,10 +737,6 @@ static inline void memset_u64s_tail(void *s, int c, unsigned bytes)
memset(s + bytes, c, rem);
}

-void sort_cmp_size(void *base, size_t num, size_t size,
- int (*cmp_func)(const void *, const void *, size_t),
- void (*swap_func)(void *, void *, size_t));
-
/* just the memmove, doesn't update @_nr */
#define __array_insert_item(_array, _nr, _pos) \
memmove(&(_array)[(_pos) + 1], \
diff --git a/fs/bcachefs/eytzinger.h b/include/linux/eytzinger.h
similarity index 77%
rename from fs/bcachefs/eytzinger.h
rename to include/linux/eytzinger.h
index b04750dbf870b..1031501030449 100644
--- a/fs/bcachefs/eytzinger.h
+++ b/include/linux/eytzinger.h
@@ -1,27 +1,37 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _EYTZINGER_H
-#define _EYTZINGER_H
+#ifndef _LINUX_EYTZINGER_H
+#define _LINUX_EYTZINGER_H

#include <linux/bitops.h>
#include <linux/log2.h>

-#include "util.h"
+#ifdef EYTZINGER_DEBUG
+#define EYTZINGER_BUG_ON(cond) BUG_ON(cond)
+#else
+#define EYTZINGER_BUG_ON(cond)
+#endif

/*
* Traversal for trees in eytzinger layout - a full binary tree layed out in an
- * array
- */
-
-/*
- * One based indexing version:
+ * array.
*
- * With one based indexing each level of the tree starts at a power of two -
- * good for cacheline alignment:
+ * Consider using an eytzinger tree any time you would otherwise be doing binary
+ * search over an array. Binary search is a worst case scenario for branch
+ * prediction and prefetching, but in an eytzinger tree every node's children
+ * are adjacent in memory, thus we can prefetch children before knowing the
+ * result of the comparison, assuming multiple nodes fit on a cacheline.
+ *
+ * Two variants are provided, for one based indexing and zero based indexing.
+ *
+ * Zero based indexing is more convenient, but one based indexing has better
+ * alignment and thus better performance because each new level of the tree
+ * starts at a power of two, and thus if element 0 was cacheline aligned, each
+ * new level will be as well.
*/

static inline unsigned eytzinger1_child(unsigned i, unsigned child)
{
- EBUG_ON(child > 1);
+ EYTZINGER_BUG_ON(child > 1);

return (i << 1) + child;
}
@@ -58,7 +68,7 @@ static inline unsigned eytzinger1_last(unsigned size)

static inline unsigned eytzinger1_next(unsigned i, unsigned size)
{
- EBUG_ON(i > size);
+ EYTZINGER_BUG_ON(i > size);

if (eytzinger1_right_child(i) <= size) {
i = eytzinger1_right_child(i);
@@ -74,7 +84,7 @@ static inline unsigned eytzinger1_next(unsigned i, unsigned size)

static inline unsigned eytzinger1_prev(unsigned i, unsigned size)
{
- EBUG_ON(i > size);
+ EYTZINGER_BUG_ON(i > size);

if (eytzinger1_left_child(i) <= size) {
i = eytzinger1_left_child(i) + 1;
@@ -101,7 +111,7 @@ static inline unsigned __eytzinger1_to_inorder(unsigned i, unsigned size,
unsigned shift = __fls(size) - b;
int s;

- EBUG_ON(!i || i > size);
+ EYTZINGER_BUG_ON(!i || i > size);

i ^= 1U << b;
i <<= 1;
@@ -126,7 +136,7 @@ static inline unsigned __inorder_to_eytzinger1(unsigned i, unsigned size,
unsigned shift;
int s;

- EBUG_ON(!i || i > size);
+ EYTZINGER_BUG_ON(!i || i > size);

/*
* sign bit trick:
@@ -164,7 +174,7 @@ static inline unsigned inorder_to_eytzinger1(unsigned i, unsigned size)

static inline unsigned eytzinger0_child(unsigned i, unsigned child)
{
- EBUG_ON(child > 1);
+ EYTZINGER_BUG_ON(child > 1);

return (i << 1) + 1 + child;
}
@@ -231,11 +241,9 @@ static inline unsigned inorder_to_eytzinger0(unsigned i, unsigned size)
(_i) != -1; \
(_i) = eytzinger0_next((_i), (_size)))

-typedef int (*eytzinger_cmp_fn)(const void *l, const void *r, size_t size);
-
/* return greatest node <= @search, or -1 if not found */
static inline ssize_t eytzinger0_find_le(void *base, size_t nr, size_t size,
- eytzinger_cmp_fn cmp, const void *search)
+ cmp_func_t cmp, const void *search)
{
unsigned i, n = 0;

@@ -244,7 +252,7 @@ static inline ssize_t eytzinger0_find_le(void *base, size_t nr, size_t size,

do {
i = n;
- n = eytzinger0_child(i, cmp(search, base + i * size, size) >= 0);
+ n = eytzinger0_child(i, cmp(search, base + i * size) >= 0);
} while (n < nr);

if (n & 1) {
@@ -269,13 +277,13 @@ static inline ssize_t eytzinger0_find_le(void *base, size_t nr, size_t size,
int _res; \
\
while (_i < _nr && \
- (_res = _cmp(_search, _base + _i * _size, _size))) \
+ (_res = _cmp(_search, _base + _i * _size))) \
_i = eytzinger0_child(_i, _res > 0); \
_i; \
})

-void eytzinger0_sort(void *, size_t, size_t,
- int (*cmp_func)(const void *, const void *, size_t),
- void (*swap_func)(void *, void *, size_t));
+void eytzinger0_sort_r(void *, size_t, size_t,
+ cmp_r_func_t, swap_r_func_t, const void *);
+void eytzinger0_sort(void *, size_t, size_t, cmp_func_t, swap_func_t);

-#endif /* _EYTZINGER_H */
+#endif /* _LINUX_EYTZINGER_H */
diff --git a/lib/sort.c b/lib/sort.c
index b399bf10d6759..f5b2206c73461 100644
--- a/lib/sort.c
+++ b/lib/sort.c
@@ -290,3 +290,92 @@ void sort(void *base, size_t num, size_t size,
return sort_r(base, num, size, _CMP_WRAPPER, SWAP_WRAPPER, &w);
}
EXPORT_SYMBOL(sort);
+
+#include <linux/eytzinger.h>
+
+static inline int eytzinger0_do_cmp(void *base, size_t n, size_t size,
+ cmp_r_func_t cmp_func, const void *priv,
+ size_t l, size_t r)
+{
+ return do_cmp(base + inorder_to_eytzinger0(l, n) * size,
+ base + inorder_to_eytzinger0(r, n) * size,
+ cmp_func, priv);
+}
+
+static inline void eytzinger0_do_swap(void *base, size_t n, size_t size,
+ swap_r_func_t swap_func, const void *priv,
+ size_t l, size_t r)
+{
+ do_swap(base + inorder_to_eytzinger0(l, n) * size,
+ base + inorder_to_eytzinger0(r, n) * size,
+ size, swap_func, priv);
+}
+
+void eytzinger0_sort_r(void *base, size_t n, size_t size,
+ cmp_r_func_t cmp_func,
+ swap_r_func_t swap_func,
+ const void *priv)
+{
+ int i, c, r;
+
+ /* called from 'sort' without swap function, let's pick the default */
+ if (swap_func == SWAP_WRAPPER && !((struct wrapper *)priv)->swap)
+ swap_func = NULL;
+
+ if (!swap_func) {
+ if (is_aligned(base, size, 8))
+ swap_func = SWAP_WORDS_64;
+ else if (is_aligned(base, size, 4))
+ swap_func = SWAP_WORDS_32;
+ else
+ swap_func = SWAP_BYTES;
+ }
+
+ /* heapify */
+ for (i = n / 2 - 1; i >= 0; --i) {
+ for (r = i; r * 2 + 1 < n; r = c) {
+ c = r * 2 + 1;
+
+ if (c + 1 < n &&
+ eytzinger0_do_cmp(base, n, size, cmp_func, priv, c, c + 1) < 0)
+ c++;
+
+ if (eytzinger0_do_cmp(base, n, size, cmp_func, priv, r, c) >= 0)
+ break;
+
+ eytzinger0_do_swap(base, n, size, swap_func, priv, r, c);
+ }
+ }
+
+ /* sort */
+ for (i = n - 1; i > 0; --i) {
+ eytzinger0_do_swap(base, n, size, swap_func, priv, 0, i);
+
+ for (r = 0; r * 2 + 1 < i; r = c) {
+ c = r * 2 + 1;
+
+ if (c + 1 < i &&
+ eytzinger0_do_cmp(base, n, size, cmp_func, priv, c, c + 1) < 0)
+ c++;
+
+ if (eytzinger0_do_cmp(base, n, size, cmp_func, priv, r, c) >= 0)
+ break;
+
+ eytzinger0_do_swap(base, n, size, swap_func, priv, r, c);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(eytzinger0_sort_r);
+
+void eytzinger0_sort(void *base, size_t n, size_t size,
+ cmp_func_t cmp_func,
+ swap_func_t swap_func)
+{
+ struct wrapper w = {
+ .cmp = cmp_func,
+ .swap = swap_func,
+ };
+
+ return eytzinger0_sort_r(base, n, size, _CMP_WRAPPER, SWAP_WRAPPER, &w);
+}
+EXPORT_SYMBOL_GPL(eytzinger0_sort);


2024-02-24 01:10:04

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 3/4] bcachefs: bch2_time_stats_to_seq_buf()

From: Kent Overstreet <[email protected]>

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/bcachefs/super.c | 2 +
fs/bcachefs/util.c | 133 +++++++++++++++++++++++++++++++++++++++++++++------
fs/bcachefs/util.h | 4 ++
3 files changed, 123 insertions(+), 16 deletions(-)


diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index 6b23e11825e6d..b9e2c1032b920 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -1262,6 +1262,8 @@ static struct bch_dev *__bch2_dev_alloc(struct bch_fs *c,

bch2_time_stats_init(&ca->io_latency[READ]);
bch2_time_stats_init(&ca->io_latency[WRITE]);
+ ca->io_latency[READ].quantiles_enabled = true;
+ ca->io_latency[WRITE].quantiles_enabled = true;

ca->mi = bch2_mi_to_cpu(member);

diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index 902f6b1a8a142..4c63f81e18bc4 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -506,10 +506,8 @@ static inline void pr_name_and_units(struct printbuf *out, const char *name, u64

void bch2_time_stats_to_text(struct printbuf *out, struct bch2_time_stats *stats)
{
- const struct time_unit *u;
s64 f_mean = 0, d_mean = 0;
- u64 q, last_q = 0, f_stddev = 0, d_stddev = 0;
- int i;
+ u64 f_stddev = 0, d_stddev = 0;

if (stats->buffer) {
int cpu;
@@ -608,19 +606,122 @@ void bch2_time_stats_to_text(struct printbuf *out, struct bch2_time_stats *stats

printbuf_tabstops_reset(out);

- i = eytzinger0_first(NR_QUANTILES);
- u = pick_time_units(stats->quantiles.entries[i].m);
-
- prt_printf(out, "quantiles (%s):\t", u->name);
- eytzinger0_for_each(i, NR_QUANTILES) {
- bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;
-
- q = max(stats->quantiles.entries[i].m, last_q);
- prt_printf(out, "%llu ",
- div_u64(q, u->nsecs));
- if (is_last)
- prt_newline(out);
- last_q = q;
+ if (stats->quantiles_enabled) {
+ int i = eytzinger0_first(NR_QUANTILES);
+ const struct time_unit *u =
+ pick_time_units(stats->quantiles.entries[i].m);
+ u64 last_q = 0;
+
+ prt_printf(out, "quantiles (%s):\t", u->name);
+ eytzinger0_for_each(i, NR_QUANTILES) {
+ bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;
+
+ u64 q = max(stats->quantiles.entries[i].m, last_q);
+ prt_printf(out, "%llu ", div_u64(q, u->nsecs));
+ if (is_last)
+ prt_newline(out);
+ last_q = q;
+ }
+ }
+}
+
+#include <linux/seq_buf.h>
+
+static void seq_buf_time_units_aligned(struct seq_buf *out, u64 ns)
+{
+ const struct time_unit *u = pick_time_units(ns);
+
+ seq_buf_printf(out, "%8llu %s", div64_u64(ns, u->nsecs), u->name);
+}
+
+void bch2_time_stats_to_seq_buf(struct seq_buf *out, struct bch2_time_stats *stats)
+{
+ s64 f_mean = 0, d_mean = 0;
+ u64 f_stddev = 0, d_stddev = 0;
+
+ if (stats->buffer) {
+ int cpu;
+
+ spin_lock_irq(&stats->lock);
+ for_each_possible_cpu(cpu)
+ __bch2_time_stats_clear_buffer(stats, per_cpu_ptr(stats->buffer, cpu));
+ spin_unlock_irq(&stats->lock);
+ }
+
+ /*
+ * avoid divide by zero
+ */
+ if (stats->freq_stats.n) {
+ f_mean = mean_and_variance_get_mean(stats->freq_stats);
+ f_stddev = mean_and_variance_get_stddev(stats->freq_stats);
+ d_mean = mean_and_variance_get_mean(stats->duration_stats);
+ d_stddev = mean_and_variance_get_stddev(stats->duration_stats);
+ }
+
+ seq_buf_printf(out, "count: %llu\n", stats->duration_stats.n);
+
+ seq_buf_printf(out, " since mount recent\n");
+
+ seq_buf_printf(out, "duration of events\n");
+
+ seq_buf_printf(out, " min: ");
+ seq_buf_time_units_aligned(out, stats->min_duration);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " max: ");
+ seq_buf_time_units_aligned(out, stats->max_duration);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " total: ");
+ seq_buf_time_units_aligned(out, stats->total_duration);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " mean: ");
+ seq_buf_time_units_aligned(out, d_mean);
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->duration_stats_weighted));
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " stddev: ");
+ seq_buf_time_units_aligned(out, d_stddev);
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted));
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, "time between events\n");
+
+ seq_buf_printf(out, " min: ");
+ seq_buf_time_units_aligned(out, stats->min_freq);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " max: ");
+ seq_buf_time_units_aligned(out, stats->max_freq);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " mean: ");
+ seq_buf_time_units_aligned(out, f_mean);
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->freq_stats_weighted));
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " stddev: ");
+ seq_buf_time_units_aligned(out, f_stddev);
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted));
+ seq_buf_printf(out, "\n");
+
+ if (stats->quantiles_enabled) {
+ int i = eytzinger0_first(NR_QUANTILES);
+ const struct time_unit *u =
+ pick_time_units(stats->quantiles.entries[i].m);
+ u64 last_q = 0;
+
+ prt_printf(out, "quantiles (%s):\t", u->name);
+ eytzinger0_for_each(i, NR_QUANTILES) {
+ bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;
+
+ u64 q = max(stats->quantiles.entries[i].m, last_q);
+ seq_buf_printf(out, "%llu ", div_u64(q, u->nsecs));
+ if (is_last)
+ seq_buf_printf(out, "\n");
+ last_q = q;
+ }
}
}
#else
diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index c3b11c3d24ea9..7ff2d4fe26f68 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -382,6 +382,7 @@ struct bch2_time_stat_buffer {

struct bch2_time_stats {
spinlock_t lock;
+ bool quantiles_enabled;
/* all fields are in nanoseconds */
u64 min_duration;
u64 max_duration;
@@ -435,6 +436,9 @@ static inline bool track_event_change(struct bch2_time_stats *stats,

void bch2_time_stats_to_text(struct printbuf *, struct bch2_time_stats *);

+struct seq_buf;
+void bch2_time_stats_to_seq_buf(struct seq_buf *, struct bch2_time_stats *);
+
void bch2_time_stats_exit(struct bch2_time_stats *);
void bch2_time_stats_init(struct bch2_time_stats *);



2024-02-24 01:10:23

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 4/4] time_stats: Promote to lib/

From: Kent Overstreet <[email protected]>

Library code from bcachefs for tracking latency measurements.

The main interface is
time_stats_update(stats, start_time);

which collects a new event with an end time of the current time.

It features percpu buffering of input values, making it very low
overhead, and nicely formatted output to printbufs or seq_buf.

Sample output, from the bcache conversion:

root@moria-kvm:/sys/fs/bcache/bdaedb8c-4554-4dd2-87e4-276e51eb47cc# cat internal/btree_sort_times
count: 6414
since mount recent
duration of events
min: 440 ns
max: 1102 us
total: 674 ms
mean: 105 us 102 us
stddev: 101 us 88 us
time between events
min: 881 ns
max: 3 s
mean: 7 ms 6 ms
stddev: 52 ms 6 ms

Cc: Darrick J. Wong <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Coly Li <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
MAINTAINERS | 7 +
fs/bcachefs/Kconfig | 2
fs/bcachefs/alloc_foreground.c | 13 +-
fs/bcachefs/bcachefs.h | 11 +
fs/bcachefs/btree_cache.c | 2
fs/bcachefs/btree_gc.c | 2
fs/bcachefs/btree_io.c | 8 +
fs/bcachefs/btree_iter.c | 8 +
fs/bcachefs/btree_locking.h | 2
fs/bcachefs/btree_update_interior.c | 8 +
fs/bcachefs/io_read.c | 4 -
fs/bcachefs/io_write.c | 4 -
fs/bcachefs/journal.c | 5 -
fs/bcachefs/journal_io.c | 9 +
fs/bcachefs/journal_reclaim.c | 9 -
fs/bcachefs/journal_types.h | 11 -
fs/bcachefs/nocow_locking.c | 2
fs/bcachefs/super.c | 12 +-
fs/bcachefs/util.c | 263 ----------------------------------
fs/bcachefs/util.h | 83 -----------
include/linux/time_stats.h | 134 +++++++++++++++++
lib/Kconfig | 4 +
lib/Makefile | 2
lib/time_stats.c | 271 +++++++++++++++++++++++++++++++++++
24 files changed, 470 insertions(+), 406 deletions(-)
create mode 100644 include/linux/time_stats.h
create mode 100644 lib/time_stats.c


diff --git a/MAINTAINERS b/MAINTAINERS
index 98a17270566d3..aa762fe654e3e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22157,6 +22157,13 @@ F: kernel/time/ntp.c
F: kernel/time/time*.c
F: tools/testing/selftests/timers/

+TIME STATS:
+M: Kent Overstreet <[email protected]>
+M: Darrick J. Wong <[email protected]>
+S: Maintained
+F: include/linux/time_stats.h
+F: lib/time_stats.c
+
TIPC NETWORK LAYER
M: Jon Maloy <[email protected]>
M: Ying Xue <[email protected]>
diff --git a/fs/bcachefs/Kconfig b/fs/bcachefs/Kconfig
index 72d1179262b33..8c587ddd2f85e 100644
--- a/fs/bcachefs/Kconfig
+++ b/fs/bcachefs/Kconfig
@@ -24,7 +24,7 @@ config BCACHEFS_FS
select XXHASH
select SRCU
select SYMBOLIC_ERRNAME
- select MEAN_AND_VARIANCE
+ select TIME_STATS
help
The bcachefs filesystem - a modern, copy on write filesystem, with
support for multiple devices, compression, checksumming, etc.
diff --git a/fs/bcachefs/alloc_foreground.c b/fs/bcachefs/alloc_foreground.c
index 633d3223b353f..ca58193dd9027 100644
--- a/fs/bcachefs/alloc_foreground.c
+++ b/fs/bcachefs/alloc_foreground.c
@@ -236,8 +236,7 @@ static struct open_bucket *__try_alloc_bucket(struct bch_fs *c, struct bch_dev *
if (cl)
closure_wait(&c->open_buckets_wait, cl);

- track_event_change(&c->times[BCH_TIME_blocked_allocate_open_bucket],
- &c->blocked_allocate_open_bucket, true);
+ track_event_change(&c->times[BCH_TIME_blocked_allocate_open_bucket], true);
spin_unlock(&c->freelist_lock);
return ERR_PTR(-BCH_ERR_open_buckets_empty);
}
@@ -263,11 +262,8 @@ static struct open_bucket *__try_alloc_bucket(struct bch_fs *c, struct bch_dev *
ca->nr_open_buckets++;
bch2_open_bucket_hash_add(c, ob);

- track_event_change(&c->times[BCH_TIME_blocked_allocate_open_bucket],
- &c->blocked_allocate_open_bucket, false);
-
- track_event_change(&c->times[BCH_TIME_blocked_allocate],
- &c->blocked_allocate, false);
+ track_event_change(&c->times[BCH_TIME_blocked_allocate_open_bucket], false);
+ track_event_change(&c->times[BCH_TIME_blocked_allocate], false);

spin_unlock(&c->freelist_lock);
return ob;
@@ -555,8 +551,7 @@ static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
goto again;
}

- track_event_change(&c->times[BCH_TIME_blocked_allocate],
- &c->blocked_allocate, true);
+ track_event_change(&c->times[BCH_TIME_blocked_allocate], true);

ob = ERR_PTR(-BCH_ERR_freelist_empty);
goto err;
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 69d0d60d50e36..92547d6fd2d95 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -200,6 +200,7 @@
#include <linux/seqlock.h>
#include <linux/shrinker.h>
#include <linux/srcu.h>
+#include <linux/time_stats.h>
#include <linux/types.h>
#include <linux/workqueue.h>
#include <linux/zstd.h>
@@ -593,7 +594,7 @@ struct bch_dev {

/* The rest of this all shows up in sysfs */
atomic64_t cur_latency[2];
- struct bch2_time_stats io_latency[2];
+ struct time_stats io_latency[2];

#define CONGESTED_MAX 1024
atomic_t congested;
@@ -640,8 +641,8 @@ struct btree_debug {
#define BCH_TRANSACTIONS_NR 128

struct btree_transaction_stats {
- struct bch2_time_stats duration;
- struct bch2_time_stats lock_hold_times;
+ struct time_stats duration;
+ struct time_stats lock_hold_times;
struct mutex lock;
unsigned nr_max_paths;
unsigned journal_entries_size;
@@ -919,8 +920,6 @@ struct bch_fs {
/* ALLOCATOR */
spinlock_t freelist_lock;
struct closure_waitlist freelist_wait;
- u64 blocked_allocate;
- u64 blocked_allocate_open_bucket;

open_bucket_idx_t open_buckets_freelist;
open_bucket_idx_t open_buckets_nr_free;
@@ -1104,7 +1103,7 @@ struct bch_fs {
unsigned copy_gc_enabled:1;
bool promote_whole_extents;

- struct bch2_time_stats times[BCH_TIME_STAT_NR];
+ struct time_stats times[BCH_TIME_STAT_NR];

struct btree_transaction_stats btree_transaction_stats[BCH_TRANSACTIONS_NR];

diff --git a/fs/bcachefs/btree_cache.c b/fs/bcachefs/btree_cache.c
index d7c81beac14af..8b3c04fc406f5 100644
--- a/fs/bcachefs/btree_cache.c
+++ b/fs/bcachefs/btree_cache.c
@@ -648,7 +648,7 @@ struct btree *bch2_btree_node_mem_alloc(struct btree_trans *trans, bool pcpu_rea
bch2_btree_keys_init(b);
set_btree_node_accessed(b);

- bch2_time_stats_update(&c->times[BCH_TIME_btree_node_mem_alloc],
+ time_stats_update(&c->times[BCH_TIME_btree_node_mem_alloc],
start_time);

memalloc_nofs_restore(flags);
diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c
index 1102995643b13..774df395e4c73 100644
--- a/fs/bcachefs/btree_gc.c
+++ b/fs/bcachefs/btree_gc.c
@@ -1970,7 +1970,7 @@ int bch2_gc_gens(struct bch_fs *c)

c->gc_count++;

- bch2_time_stats_update(&c->times[BCH_TIME_btree_gc], start_time);
+ time_stats_update(&c->times[BCH_TIME_btree_gc], start_time);
trace_and_count(c, gc_gens_end, c);
err:
for_each_member_device(c, ca) {
diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index aa9b6cbe32269..a56dcabb7ace7 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -327,7 +327,7 @@ static void btree_node_sort(struct bch_fs *c, struct btree *b,
BUG_ON(vstruct_end(&out->keys) > (void *) out + bytes);

if (sorting_entire_node)
- bch2_time_stats_update(&c->times[BCH_TIME_btree_node_sort],
+ time_stats_update(&c->times[BCH_TIME_btree_node_sort],
start_time);

/* Make sure we preserve bset journal_seq: */
@@ -397,7 +397,7 @@ void bch2_btree_sort_into(struct bch_fs *c,
&dst->format,
true);

- bch2_time_stats_update(&c->times[BCH_TIME_btree_node_sort],
+ time_stats_update(&c->times[BCH_TIME_btree_node_sort],
start_time);

set_btree_bset_end(dst, dst->set);
@@ -1251,7 +1251,7 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca,
out:
mempool_free(iter, &c->fill_iter);
printbuf_exit(&buf);
- bch2_time_stats_update(&c->times[BCH_TIME_btree_node_read_done], start_time);
+ time_stats_update(&c->times[BCH_TIME_btree_node_read_done], start_time);
return retry_read;
fsck_err:
if (ret == -BCH_ERR_btree_node_read_err_want_retry ||
@@ -1323,7 +1323,7 @@ static void btree_node_read_work(struct work_struct *work)
}
}

- bch2_time_stats_update(&c->times[BCH_TIME_btree_node_read],
+ time_stats_update(&c->times[BCH_TIME_btree_node_read],
rb->start_time);
bio_put(&rb->bio);

diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c
index 5467a8635be11..f2d7b1dabcfbb 100644
--- a/fs/bcachefs/btree_iter.c
+++ b/fs/bcachefs/btree_iter.c
@@ -2899,7 +2899,7 @@ u32 bch2_trans_begin(struct btree_trans *trans)

if (!IS_ENABLED(CONFIG_BCACHEFS_NO_LATENCY_ACCT) &&
time_after64(now, trans->last_begin_time + 10))
- __bch2_time_stats_update(&btree_trans_stats(trans)->duration,
+ __time_stats_update(&btree_trans_stats(trans)->duration,
trans->last_begin_time, now);

if (!trans->restarted &&
@@ -3224,7 +3224,7 @@ void bch2_fs_btree_iter_exit(struct bch_fs *c)
s < c->btree_transaction_stats + ARRAY_SIZE(c->btree_transaction_stats);
s++) {
kfree(s->max_paths_text);
- bch2_time_stats_exit(&s->lock_hold_times);
+ time_stats_exit(&s->lock_hold_times);
}

if (c->btree_trans_barrier_initialized)
@@ -3240,8 +3240,8 @@ void bch2_fs_btree_iter_init_early(struct bch_fs *c)
for (s = c->btree_transaction_stats;
s < c->btree_transaction_stats + ARRAY_SIZE(c->btree_transaction_stats);
s++) {
- bch2_time_stats_init(&s->duration);
- bch2_time_stats_init(&s->lock_hold_times);
+ time_stats_init(&s->duration);
+ time_stats_init(&s->lock_hold_times);
mutex_init(&s->lock);
}

diff --git a/fs/bcachefs/btree_locking.h b/fs/bcachefs/btree_locking.h
index 4bd72c855da1a..f2e2c5881b7e4 100644
--- a/fs/bcachefs/btree_locking.h
+++ b/fs/bcachefs/btree_locking.h
@@ -122,7 +122,7 @@ static void btree_trans_lock_hold_time_update(struct btree_trans *trans,
struct btree_path *path, unsigned level)
{
#ifdef CONFIG_BCACHEFS_LOCK_TIME_STATS
- __bch2_time_stats_update(&btree_trans_stats(trans)->lock_hold_times,
+ __time_stats_update(&btree_trans_stats(trans)->lock_hold_times,
path->l[level].lock_taken_time,
local_clock());
#endif
diff --git a/fs/bcachefs/btree_update_interior.c b/fs/bcachefs/btree_update_interior.c
index 4530b14ff2c37..669379eaea2f0 100644
--- a/fs/bcachefs/btree_update_interior.c
+++ b/fs/bcachefs/btree_update_interior.c
@@ -517,7 +517,7 @@ static void bch2_btree_update_free(struct btree_update *as, struct btree_trans *
bch2_disk_reservation_put(c, &as->disk_res);
bch2_btree_reserve_put(as, trans);

- bch2_time_stats_update(&c->times[BCH_TIME_btree_interior_update_total],
+ time_stats_update(&c->times[BCH_TIME_btree_interior_update_total],
as->start_time);

mutex_lock(&c->btree_interior_update_lock);
@@ -1039,7 +1039,7 @@ static void bch2_btree_update_done(struct btree_update *as, struct btree_trans *
continue_at(&as->cl, btree_update_set_nodes_written,
as->c->btree_interior_update_worker);

- bch2_time_stats_update(&c->times[BCH_TIME_btree_interior_update_foreground],
+ time_stats_update(&c->times[BCH_TIME_btree_interior_update_foreground],
start_time);
}

@@ -1630,7 +1630,7 @@ static int btree_split(struct btree_update *as, struct btree_trans *trans,

bch2_trans_verify_locks(trans);

- bch2_time_stats_update(&c->times[n2
+ time_stats_update(&c->times[n2
? BCH_TIME_btree_node_split
: BCH_TIME_btree_node_compact],
start_time);
@@ -1936,7 +1936,7 @@ int __bch2_foreground_maybe_merge(struct btree_trans *trans,

bch2_btree_update_done(as, trans);

- bch2_time_stats_update(&c->times[BCH_TIME_btree_node_merge], start_time);
+ time_stats_update(&c->times[BCH_TIME_btree_node_merge], start_time);
out:
err:
if (new_path)
diff --git a/fs/bcachefs/io_read.c b/fs/bcachefs/io_read.c
index 3c574d8873a1e..dce136cd22713 100644
--- a/fs/bcachefs/io_read.c
+++ b/fs/bcachefs/io_read.c
@@ -134,7 +134,7 @@ static void promote_done(struct bch_write_op *wop)
container_of(wop, struct promote_op, write.op);
struct bch_fs *c = op->write.op.c;

- bch2_time_stats_update(&c->times[BCH_TIME_data_promote],
+ time_stats_update(&c->times[BCH_TIME_data_promote],
op->start_time);
promote_free(c, op);
}
@@ -356,7 +356,7 @@ static inline struct bch_read_bio *bch2_rbio_free(struct bch_read_bio *rbio)
static void bch2_rbio_done(struct bch_read_bio *rbio)
{
if (rbio->start_time)
- bch2_time_stats_update(&rbio->c->times[BCH_TIME_data_read],
+ time_stats_update(&rbio->c->times[BCH_TIME_data_read],
rbio->start_time);
bio_endio(&rbio->bio);
}
diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
index 2c098ac017b30..8123a84320e3f 100644
--- a/fs/bcachefs/io_write.c
+++ b/fs/bcachefs/io_write.c
@@ -88,7 +88,7 @@ void bch2_latency_acct(struct bch_dev *ca, u64 submit_time, int rw)

bch2_congested_acct(ca, io_latency, now, rw);

- __bch2_time_stats_update(&ca->io_latency[rw], submit_time, now);
+ __time_stats_update(&ca->io_latency[rw], submit_time, now);
}

#endif
@@ -457,7 +457,7 @@ static void bch2_write_done(struct closure *cl)

EBUG_ON(op->open_buckets.nr);

- bch2_time_stats_update(&c->times[BCH_TIME_data_write], op->start_time);
+ time_stats_update(&c->times[BCH_TIME_data_write], op->start_time);
bch2_disk_reservation_put(c, &op->res);

if (!(op->flags & BCH_WRITE_MOVE))
diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c
index bc890776eb579..c5d6cc29be870 100644
--- a/fs/bcachefs/journal.c
+++ b/fs/bcachefs/journal.c
@@ -518,8 +518,7 @@ static int __journal_res_get(struct journal *j, struct journal_res *res,
ret = journal_entry_open(j);

if (ret == JOURNAL_ERR_max_in_flight) {
- track_event_change(&c->times[BCH_TIME_blocked_journal_max_in_flight],
- &j->max_in_flight_start, true);
+ track_event_change(&c->times[BCH_TIME_blocked_journal_max_in_flight], true);
if (trace_journal_entry_full_enabled()) {
struct printbuf buf = PRINTBUF;
buf.atomic++;
@@ -727,7 +726,7 @@ int bch2_journal_flush_seq(struct journal *j, u64 seq)
ret = wait_event_interruptible(j->wait, (ret2 = bch2_journal_flush_seq_async(j, seq, NULL)));

if (!ret)
- bch2_time_stats_update(j->flush_seq_time, start_time);
+ time_stats_update(j->flush_seq_time, start_time);

return ret ?: ret2 < 0 ? ret2 : 0;
}
diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c
index 47805193f18cc..23401e9686eed 100644
--- a/fs/bcachefs/journal_io.c
+++ b/fs/bcachefs/journal_io.c
@@ -1576,9 +1576,9 @@ static CLOSURE_CALLBACK(journal_write_done)
u64 v, seq;
int err = 0;

- bch2_time_stats_update(!JSET_NO_FLUSH(w->data)
- ? j->flush_write_time
- : j->noflush_write_time, j->write_start_time);
+ time_stats_update(!JSET_NO_FLUSH(w->data)
+ ? j->flush_write_time
+ : j->noflush_write_time, j->write_start_time);

if (!w->devs_written.nr) {
bch_err(c, "unable to write journal to sufficient devices");
@@ -1639,8 +1639,7 @@ static CLOSURE_CALLBACK(journal_write_done)
bch2_journal_reclaim_fast(j);
bch2_journal_space_available(j);

- track_event_change(&c->times[BCH_TIME_blocked_journal_max_in_flight],
- &j->max_in_flight_start, false);
+ track_event_change(&c->times[BCH_TIME_blocked_journal_max_in_flight], false);

closure_wake_up(&w->wait);
journal_wake(j);
diff --git a/fs/bcachefs/journal_reclaim.c b/fs/bcachefs/journal_reclaim.c
index 2cf626315652c..d58503b73b966 100644
--- a/fs/bcachefs/journal_reclaim.c
+++ b/fs/bcachefs/journal_reclaim.c
@@ -62,12 +62,9 @@ void bch2_journal_set_watermark(struct journal *j)
? BCH_WATERMARK_reclaim
: BCH_WATERMARK_stripe;

- if (track_event_change(&c->times[BCH_TIME_blocked_journal_low_on_space],
- &j->low_on_space_start, low_on_space) ||
- track_event_change(&c->times[BCH_TIME_blocked_journal_low_on_pin],
- &j->low_on_pin_start, low_on_pin) ||
- track_event_change(&c->times[BCH_TIME_blocked_write_buffer_full],
- &j->write_buffer_full_start, low_on_wb))
+ if (track_event_change(&c->times[BCH_TIME_blocked_journal_low_on_space], low_on_space) ||
+ track_event_change(&c->times[BCH_TIME_blocked_journal_low_on_pin], low_on_pin) ||
+ track_event_change(&c->times[BCH_TIME_blocked_write_buffer_full], low_on_wb))
trace_and_count(c, journal_full, c);

swap(watermark, j->watermark);
diff --git a/fs/bcachefs/journal_types.h b/fs/bcachefs/journal_types.h
index 38817c7a08515..b93e02c0a178a 100644
--- a/fs/bcachefs/journal_types.h
+++ b/fs/bcachefs/journal_types.h
@@ -274,14 +274,9 @@ struct journal {
u64 nr_noflush_writes;
u64 entry_bytes_written;

- u64 low_on_space_start;
- u64 low_on_pin_start;
- u64 max_in_flight_start;
- u64 write_buffer_full_start;
-
- struct bch2_time_stats *flush_write_time;
- struct bch2_time_stats *noflush_write_time;
- struct bch2_time_stats *flush_seq_time;
+ struct time_stats *flush_write_time;
+ struct time_stats *noflush_write_time;
+ struct time_stats *flush_seq_time;

#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map res_map;
diff --git a/fs/bcachefs/nocow_locking.c b/fs/bcachefs/nocow_locking.c
index 3c21981a4a1c0..181efa4a83fa1 100644
--- a/fs/bcachefs/nocow_locking.c
+++ b/fs/bcachefs/nocow_locking.c
@@ -85,7 +85,7 @@ void __bch2_bucket_nocow_lock(struct bucket_nocow_lock_table *t,
u64 start_time = local_clock();

__closure_wait_event(&l->wait, __bch2_bucket_nocow_trylock(l, dev_bucket, flags));
- bch2_time_stats_update(&c->times[BCH_TIME_nocow_lock_contended], start_time);
+ time_stats_update(&c->times[BCH_TIME_nocow_lock_contended], start_time);
}
}

diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index b9e2c1032b920..c491c5e102287 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -520,7 +520,7 @@ static void __bch2_fs_free(struct bch_fs *c)
unsigned i;

for (i = 0; i < BCH_TIME_STAT_NR; i++)
- bch2_time_stats_exit(&c->times[i]);
+ time_stats_exit(&c->times[i]);

bch2_free_pending_node_rewrites(c);
bch2_fs_sb_errors_exit(c);
@@ -753,7 +753,7 @@ static struct bch_fs *bch2_fs_alloc(struct bch_sb *sb, struct bch_opts opts)
c->journal_keys.initial_ref_held = true;

for (i = 0; i < BCH_TIME_STAT_NR; i++)
- bch2_time_stats_init(&c->times[i]);
+ time_stats_init(&c->times[i]);

bch2_fs_copygc_init(c);
bch2_fs_btree_key_cache_init_early(&c->btree_key_cache);
@@ -1168,8 +1168,8 @@ static void bch2_dev_free(struct bch_dev *ca)
bch2_dev_buckets_free(ca);
free_page((unsigned long) ca->sb_read_scratch);

- bch2_time_stats_exit(&ca->io_latency[WRITE]);
- bch2_time_stats_exit(&ca->io_latency[READ]);
+ time_stats_exit(&ca->io_latency[WRITE]);
+ time_stats_exit(&ca->io_latency[READ]);

percpu_ref_exit(&ca->io_ref);
percpu_ref_exit(&ca->ref);
@@ -1260,8 +1260,8 @@ static struct bch_dev *__bch2_dev_alloc(struct bch_fs *c,

INIT_WORK(&ca->io_error_work, bch2_io_error_work);

- bch2_time_stats_init(&ca->io_latency[READ]);
- bch2_time_stats_init(&ca->io_latency[WRITE]);
+ time_stats_init(&ca->io_latency[READ]);
+ time_stats_init(&ca->io_latency[WRITE]);
ca->io_latency[READ].quantiles_enabled = true;
ca->io_latency[WRITE].quantiles_enabled = true;

diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index 4c63f81e18bc4..88853513a15fa 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -337,32 +337,6 @@ void bch2_prt_datetime(struct printbuf *out, time64_t sec)
}
#endif

-static const struct time_unit {
- const char *name;
- u64 nsecs;
-} time_units[] = {
- { "ns", 1 },
- { "us", NSEC_PER_USEC },
- { "ms", NSEC_PER_MSEC },
- { "s", NSEC_PER_SEC },
- { "m", (u64) NSEC_PER_SEC * 60},
- { "h", (u64) NSEC_PER_SEC * 3600},
- { "eon", U64_MAX },
-};
-
-static const struct time_unit *pick_time_units(u64 ns)
-{
- const struct time_unit *u;
-
- for (u = time_units;
- u + 1 < time_units + ARRAY_SIZE(time_units) &&
- ns >= u[1].nsecs << 1;
- u++)
- ;
-
- return u;
-}
-
void bch2_pr_time_units(struct printbuf *out, u64 ns)
{
const struct time_unit *u = pick_time_units(ns);
@@ -370,121 +344,6 @@ void bch2_pr_time_units(struct printbuf *out, u64 ns)
prt_printf(out, "%llu %s", div_u64(ns, u->nsecs), u->name);
}

-/* time stats: */
-
-#ifndef CONFIG_BCACHEFS_NO_LATENCY_ACCT
-static void bch2_quantiles_update(struct bch2_quantiles *q, u64 v)
-{
- unsigned i = 0;
-
- while (i < ARRAY_SIZE(q->entries)) {
- struct bch2_quantile_entry *e = q->entries + i;
-
- if (unlikely(!e->step)) {
- e->m = v;
- e->step = max_t(unsigned, v / 2, 1024);
- } else if (e->m > v) {
- e->m = e->m >= e->step
- ? e->m - e->step
- : 0;
- } else if (e->m < v) {
- e->m = e->m + e->step > e->m
- ? e->m + e->step
- : U32_MAX;
- }
-
- if ((e->m > v ? e->m - v : v - e->m) < e->step)
- e->step = max_t(unsigned, e->step / 2, 1);
-
- if (v >= e->m)
- break;
-
- i = eytzinger0_child(i, v > e->m);
- }
-}
-
-static inline void bch2_time_stats_update_one(struct bch2_time_stats *stats,
- u64 start, u64 end)
-{
- u64 duration, freq;
-
- if (time_after64(end, start)) {
- duration = end - start;
- mean_and_variance_update(&stats->duration_stats, duration);
- mean_and_variance_weighted_update(&stats->duration_stats_weighted, duration);
- stats->max_duration = max(stats->max_duration, duration);
- stats->min_duration = min(stats->min_duration, duration);
- stats->total_duration += duration;
- bch2_quantiles_update(&stats->quantiles, duration);
- }
-
- if (stats->last_event && time_after64(end, stats->last_event)) {
- freq = end - stats->last_event;
- mean_and_variance_update(&stats->freq_stats, freq);
- mean_and_variance_weighted_update(&stats->freq_stats_weighted, freq);
- stats->max_freq = max(stats->max_freq, freq);
- stats->min_freq = min(stats->min_freq, freq);
- }
-
- stats->last_event = end;
-}
-
-static void __bch2_time_stats_clear_buffer(struct bch2_time_stats *stats,
- struct bch2_time_stat_buffer *b)
-{
- for (struct bch2_time_stat_buffer_entry *i = b->entries;
- i < b->entries + ARRAY_SIZE(b->entries);
- i++)
- bch2_time_stats_update_one(stats, i->start, i->end);
- b->nr = 0;
-}
-
-static noinline void bch2_time_stats_clear_buffer(struct bch2_time_stats *stats,
- struct bch2_time_stat_buffer *b)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&stats->lock, flags);
- __bch2_time_stats_clear_buffer(stats, b);
- spin_unlock_irqrestore(&stats->lock, flags);
-}
-
-void __bch2_time_stats_update(struct bch2_time_stats *stats, u64 start, u64 end)
-{
- unsigned long flags;
-
- WARN_ONCE(!stats->duration_stats_weighted.weight ||
- !stats->freq_stats_weighted.weight,
- "uninitialized time_stats");
-
- if (!stats->buffer) {
- spin_lock_irqsave(&stats->lock, flags);
- bch2_time_stats_update_one(stats, start, end);
-
- if (mean_and_variance_weighted_get_mean(stats->freq_stats_weighted) < 32 &&
- stats->duration_stats.n > 1024)
- stats->buffer =
- alloc_percpu_gfp(struct bch2_time_stat_buffer,
- GFP_ATOMIC);
- spin_unlock_irqrestore(&stats->lock, flags);
- } else {
- struct bch2_time_stat_buffer *b;
-
- preempt_disable();
- b = this_cpu_ptr(stats->buffer);
-
- BUG_ON(b->nr >= ARRAY_SIZE(b->entries));
- b->entries[b->nr++] = (struct bch2_time_stat_buffer_entry) {
- .start = start,
- .end = end
- };
-
- if (unlikely(b->nr == ARRAY_SIZE(b->entries)))
- bch2_time_stats_clear_buffer(stats, b);
- preempt_enable();
- }
-}
-
static void bch2_pr_time_units_aligned(struct printbuf *out, u64 ns)
{
const struct time_unit *u = pick_time_units(ns);
@@ -504,7 +363,7 @@ static inline void pr_name_and_units(struct printbuf *out, const char *name, u64

#define TABSTOP_SIZE 12

-void bch2_time_stats_to_text(struct printbuf *out, struct bch2_time_stats *stats)
+void bch2_time_stats_to_text(struct printbuf *out, struct time_stats *stats)
{
s64 f_mean = 0, d_mean = 0;
u64 f_stddev = 0, d_stddev = 0;
@@ -514,7 +373,7 @@ void bch2_time_stats_to_text(struct printbuf *out, struct bch2_time_stats *stats

spin_lock_irq(&stats->lock);
for_each_possible_cpu(cpu)
- __bch2_time_stats_clear_buffer(stats, per_cpu_ptr(stats->buffer, cpu));
+ __time_stats_clear_buffer(stats, per_cpu_ptr(stats->buffer, cpu));
spin_unlock_irq(&stats->lock);
}

@@ -625,124 +484,6 @@ void bch2_time_stats_to_text(struct printbuf *out, struct bch2_time_stats *stats
}
}

-#include <linux/seq_buf.h>
-
-static void seq_buf_time_units_aligned(struct seq_buf *out, u64 ns)
-{
- const struct time_unit *u = pick_time_units(ns);
-
- seq_buf_printf(out, "%8llu %s", div64_u64(ns, u->nsecs), u->name);
-}
-
-void bch2_time_stats_to_seq_buf(struct seq_buf *out, struct bch2_time_stats *stats)
-{
- s64 f_mean = 0, d_mean = 0;
- u64 f_stddev = 0, d_stddev = 0;
-
- if (stats->buffer) {
- int cpu;
-
- spin_lock_irq(&stats->lock);
- for_each_possible_cpu(cpu)
- __bch2_time_stats_clear_buffer(stats, per_cpu_ptr(stats->buffer, cpu));
- spin_unlock_irq(&stats->lock);
- }
-
- /*
- * avoid divide by zero
- */
- if (stats->freq_stats.n) {
- f_mean = mean_and_variance_get_mean(stats->freq_stats);
- f_stddev = mean_and_variance_get_stddev(stats->freq_stats);
- d_mean = mean_and_variance_get_mean(stats->duration_stats);
- d_stddev = mean_and_variance_get_stddev(stats->duration_stats);
- }
-
- seq_buf_printf(out, "count: %llu\n", stats->duration_stats.n);
-
- seq_buf_printf(out, " since mount recent\n");
-
- seq_buf_printf(out, "duration of events\n");
-
- seq_buf_printf(out, " min: ");
- seq_buf_time_units_aligned(out, stats->min_duration);
- seq_buf_printf(out, "\n");
-
- seq_buf_printf(out, " max: ");
- seq_buf_time_units_aligned(out, stats->max_duration);
- seq_buf_printf(out, "\n");
-
- seq_buf_printf(out, " total: ");
- seq_buf_time_units_aligned(out, stats->total_duration);
- seq_buf_printf(out, "\n");
-
- seq_buf_printf(out, " mean: ");
- seq_buf_time_units_aligned(out, d_mean);
- seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->duration_stats_weighted));
- seq_buf_printf(out, "\n");
-
- seq_buf_printf(out, " stddev: ");
- seq_buf_time_units_aligned(out, d_stddev);
- seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted));
- seq_buf_printf(out, "\n");
-
- seq_buf_printf(out, "time between events\n");
-
- seq_buf_printf(out, " min: ");
- seq_buf_time_units_aligned(out, stats->min_freq);
- seq_buf_printf(out, "\n");
-
- seq_buf_printf(out, " max: ");
- seq_buf_time_units_aligned(out, stats->max_freq);
- seq_buf_printf(out, "\n");
-
- seq_buf_printf(out, " mean: ");
- seq_buf_time_units_aligned(out, f_mean);
- seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->freq_stats_weighted));
- seq_buf_printf(out, "\n");
-
- seq_buf_printf(out, " stddev: ");
- seq_buf_time_units_aligned(out, f_stddev);
- seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted));
- seq_buf_printf(out, "\n");
-
- if (stats->quantiles_enabled) {
- int i = eytzinger0_first(NR_QUANTILES);
- const struct time_unit *u =
- pick_time_units(stats->quantiles.entries[i].m);
- u64 last_q = 0;
-
- prt_printf(out, "quantiles (%s):\t", u->name);
- eytzinger0_for_each(i, NR_QUANTILES) {
- bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;
-
- u64 q = max(stats->quantiles.entries[i].m, last_q);
- seq_buf_printf(out, "%llu ", div_u64(q, u->nsecs));
- if (is_last)
- seq_buf_printf(out, "\n");
- last_q = q;
- }
- }
-}
-#else
-void bch2_time_stats_to_text(struct printbuf *out, struct bch2_time_stats *stats) {}
-#endif
-
-void bch2_time_stats_exit(struct bch2_time_stats *stats)
-{
- free_percpu(stats->buffer);
-}
-
-void bch2_time_stats_init(struct bch2_time_stats *stats)
-{
- memset(stats, 0, sizeof(*stats));
- stats->duration_stats_weighted.weight = 8;
- stats->freq_stats_weighted.weight = 8;
- stats->min_duration = U64_MAX;
- stats->min_freq = U64_MAX;
- spin_lock_init(&stats->lock);
-}
-
/* ratelimit: */

/**
diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index 7ff2d4fe26f68..cf8d16a911622 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -15,6 +15,7 @@
#include <linux/preempt.h>
#include <linux/ratelimit.h>
#include <linux/slab.h>
+#include <linux/time_stats.h>
#include <linux/vmalloc.h>
#include <linux/workqueue.h>
#include <linux/mean_and_variance.h>
@@ -360,87 +361,7 @@ static inline void prt_bdevname(struct printbuf *out, struct block_device *bdev)
#endif
}

-#define NR_QUANTILES 15
-#define QUANTILE_IDX(i) inorder_to_eytzinger0(i, NR_QUANTILES)
-#define QUANTILE_FIRST eytzinger0_first(NR_QUANTILES)
-#define QUANTILE_LAST eytzinger0_last(NR_QUANTILES)
-
-struct bch2_quantiles {
- struct bch2_quantile_entry {
- u64 m;
- u64 step;
- } entries[NR_QUANTILES];
-};
-
-struct bch2_time_stat_buffer {
- unsigned nr;
- struct bch2_time_stat_buffer_entry {
- u64 start;
- u64 end;
- } entries[32];
-};
-
-struct bch2_time_stats {
- spinlock_t lock;
- bool quantiles_enabled;
- /* all fields are in nanoseconds */
- u64 min_duration;
- u64 max_duration;
- u64 total_duration;
- u64 max_freq;
- u64 min_freq;
- u64 last_event;
- struct bch2_quantiles quantiles;
-
- struct mean_and_variance duration_stats;
- struct mean_and_variance_weighted duration_stats_weighted;
- struct mean_and_variance freq_stats;
- struct mean_and_variance_weighted freq_stats_weighted;
- struct bch2_time_stat_buffer __percpu *buffer;
-};
-
-#ifndef CONFIG_BCACHEFS_NO_LATENCY_ACCT
-void __bch2_time_stats_update(struct bch2_time_stats *stats, u64, u64);
-
-static inline void bch2_time_stats_update(struct bch2_time_stats *stats, u64 start)
-{
- __bch2_time_stats_update(stats, start, local_clock());
-}
-
-static inline bool track_event_change(struct bch2_time_stats *stats,
- u64 *start, bool v)
-{
- if (v != !!*start) {
- if (!v) {
- bch2_time_stats_update(stats, *start);
- *start = 0;
- } else {
- *start = local_clock() ?: 1;
- return true;
- }
- }
-
- return false;
-}
-#else
-static inline void __bch2_time_stats_update(struct bch2_time_stats *stats, u64 start, u64 end) {}
-static inline void bch2_time_stats_update(struct bch2_time_stats *stats, u64 start) {}
-static inline bool track_event_change(struct bch2_time_stats *stats,
- u64 *start, bool v)
-{
- bool ret = v && !*start;
- *start = v;
- return ret;
-}
-#endif
-
-void bch2_time_stats_to_text(struct printbuf *, struct bch2_time_stats *);
-
-struct seq_buf;
-void bch2_time_stats_to_seq_buf(struct seq_buf *, struct bch2_time_stats *);
-
-void bch2_time_stats_exit(struct bch2_time_stats *);
-void bch2_time_stats_init(struct bch2_time_stats *);
+void bch2_time_stats_to_text(struct printbuf *, struct time_stats *);

#define ewma_add(ewma, val, weight) \
({ \
diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
new file mode 100644
index 0000000000000..caefa7aba65a0
--- /dev/null
+++ b/include/linux/time_stats.h
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * time_stats - collect statistics on events that have a duration, with nicely
+ * formatted textual output on demand
+ *
+ * - percpu buffering of event collection: cheap enough to shotgun
+ * everywhere without worrying about overhead
+ *
+ * tracks:
+ * - number of events
+ * - maximum event duration ever seen
+ * - sum of all event durations
+ * - average event duration, standard and weighted
+ * - standard deviation of event durations, standard and weighted
+ * and analagous statistics for the frequency of events
+ *
+ * We provide both mean and weighted mean (exponentially weighted), and standard
+ * deviation and weighted standard deviation, to give an efficient-to-compute
+ * view of current behaviour versus. average behaviour - "did this event source
+ * just become wonky, or is this typical?".
+ *
+ * Particularly useful for tracking down latency issues.
+ */
+#ifndef _LINUX_TIME_STATS_H
+#define _LINUX_TIME_STATS_H
+
+#include <linux/mean_and_variance.h>
+#include <linux/sched/clock.h>
+#include <linux/spinlock_types.h>
+
+struct time_unit {
+ const char *name;
+ u64 nsecs;
+};
+
+/*
+ * given a nanosecond value, pick the preferred time units for printing:
+ */
+const struct time_unit *pick_time_units(u64 ns);
+
+/*
+ * quantiles - do not use:
+ *
+ * Only enabled if time_stats->quantiles_enabled has been manually set - don't
+ * use in new code.
+ */
+
+#define NR_QUANTILES 15
+#define QUANTILE_IDX(i) inorder_to_eytzinger0(i, NR_QUANTILES)
+#define QUANTILE_FIRST eytzinger0_first(NR_QUANTILES)
+#define QUANTILE_LAST eytzinger0_last(NR_QUANTILES)
+
+struct quantiles {
+ struct quantile_entry {
+ u64 m;
+ u64 step;
+ } entries[NR_QUANTILES];
+};
+
+struct time_stat_buffer {
+ unsigned nr;
+ struct time_stat_buffer_entry {
+ u64 start;
+ u64 end;
+ } entries[32];
+};
+
+struct time_stats {
+ spinlock_t lock;
+ bool quantiles_enabled;
+ /* all fields are in nanoseconds */
+ u64 min_duration;
+ u64 max_duration;
+ u64 total_duration;
+ u64 max_freq;
+ u64 min_freq;
+ u64 last_event;
+ u64 last_event_start;
+ struct quantiles quantiles;
+
+ struct mean_and_variance duration_stats;
+ struct mean_and_variance_weighted duration_stats_weighted;
+ struct mean_and_variance freq_stats;
+ struct mean_and_variance_weighted freq_stats_weighted;
+ struct time_stat_buffer __percpu *buffer;
+};
+
+void __time_stats_clear_buffer(struct time_stats *, struct time_stat_buffer *);
+void __time_stats_update(struct time_stats *stats, u64, u64);
+
+/**
+ * time_stats_update - collect a new event being tracked
+ *
+ * @stats - time_stats to update
+ * @start - start time of event, recorded with local_clock()
+ *
+ * The end duration of the event will be the current time
+ */
+static inline void time_stats_update(struct time_stats *stats, u64 start)
+{
+ __time_stats_update(stats, start, local_clock());
+}
+
+/**
+ * track_event_change - track state change events
+ *
+ * @stats - time_stats to update
+ * @v - new state, true or false
+ *
+ * Use this when tracking time stats for state changes, i.e. resource X becoming
+ * blocked/unblocked.
+ */
+static inline bool track_event_change(struct time_stats *stats, bool v)
+{
+ if (v != !!stats->last_event_start) {
+ if (!v) {
+ time_stats_update(stats, stats->last_event_start);
+ stats->last_event_start = 0;
+ } else {
+ stats->last_event_start = local_clock() ?: 1;
+ return true;
+ }
+ }
+
+ return false;
+}
+
+struct seq_buf;
+void time_stats_to_seq_buf(struct seq_buf *, struct time_stats *);
+
+void time_stats_exit(struct time_stats *);
+void time_stats_init(struct time_stats *);
+
+#endif /* _LINUX_TIME_STATS_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 5ddda7c2ed9b3..3ba8b965f8c7e 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -785,3 +785,7 @@ config POLYNOMIAL

config FIRMWARE_TABLE
bool
+
+config TIME_STATS
+ tristate
+ select MEAN_AND_VARIANCE
diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e619..57858997c87aa 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -370,6 +370,8 @@ obj-$(CONFIG_SBITMAP) += sbitmap.o

obj-$(CONFIG_PARMAN) += parman.o

+obj-$(CONFIG_TIME_STATS) += time_stats.o
+
obj-y += group_cpus.o

# GCC library routines
diff --git a/lib/time_stats.c b/lib/time_stats.c
new file mode 100644
index 0000000000000..081aeba88b535
--- /dev/null
+++ b/lib/time_stats.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/eytzinger.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/time.h>
+#include <linux/time_stats.h>
+#include <linux/spinlock.h>
+
+static const struct time_unit time_units[] = {
+ { "ns", 1 },
+ { "us", NSEC_PER_USEC },
+ { "ms", NSEC_PER_MSEC },
+ { "s", NSEC_PER_SEC },
+ { "m", (u64) NSEC_PER_SEC * 60},
+ { "h", (u64) NSEC_PER_SEC * 3600},
+ { "eon", U64_MAX },
+};
+
+const struct time_unit *pick_time_units(u64 ns)
+{
+ const struct time_unit *u;
+
+ for (u = time_units;
+ u + 1 < time_units + ARRAY_SIZE(time_units) &&
+ ns >= u[1].nsecs << 1;
+ u++)
+ ;
+
+ return u;
+}
+EXPORT_SYMBOL_GPL(pick_time_units);
+
+static void quantiles_update(struct quantiles *q, u64 v)
+{
+ unsigned i = 0;
+
+ while (i < ARRAY_SIZE(q->entries)) {
+ struct quantile_entry *e = q->entries + i;
+
+ if (unlikely(!e->step)) {
+ e->m = v;
+ e->step = max_t(unsigned, v / 2, 1024);
+ } else if (e->m > v) {
+ e->m = e->m >= e->step
+ ? e->m - e->step
+ : 0;
+ } else if (e->m < v) {
+ e->m = e->m + e->step > e->m
+ ? e->m + e->step
+ : U32_MAX;
+ }
+
+ if ((e->m > v ? e->m - v : v - e->m) < e->step)
+ e->step = max_t(unsigned, e->step / 2, 1);
+
+ if (v >= e->m)
+ break;
+
+ i = eytzinger0_child(i, v > e->m);
+ }
+}
+
+static inline void time_stats_update_one(struct time_stats *stats,
+ u64 start, u64 end)
+{
+ u64 duration, freq;
+
+ if (time_after64(end, start)) {
+ duration = end - start;
+ mean_and_variance_update(&stats->duration_stats, duration);
+ mean_and_variance_weighted_update(&stats->duration_stats_weighted, duration);
+ stats->max_duration = max(stats->max_duration, duration);
+ stats->min_duration = min(stats->min_duration, duration);
+ stats->total_duration += duration;
+
+ if (stats->quantiles_enabled)
+ quantiles_update(&stats->quantiles, duration);
+ }
+
+ if (stats->last_event && time_after64(end, stats->last_event)) {
+ freq = end - stats->last_event;
+ mean_and_variance_update(&stats->freq_stats, freq);
+ mean_and_variance_weighted_update(&stats->freq_stats_weighted, freq);
+ stats->max_freq = max(stats->max_freq, freq);
+ stats->min_freq = min(stats->min_freq, freq);
+ }
+
+ stats->last_event = end;
+}
+
+void __time_stats_clear_buffer(struct time_stats *stats,
+ struct time_stat_buffer *b)
+{
+ for (struct time_stat_buffer_entry *i = b->entries;
+ i < b->entries + ARRAY_SIZE(b->entries);
+ i++)
+ time_stats_update_one(stats, i->start, i->end);
+ b->nr = 0;
+}
+EXPORT_SYMBOL_GPL(__time_stats_clear_buffer);
+
+static noinline void time_stats_clear_buffer(struct time_stats *stats,
+ struct time_stat_buffer *b)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&stats->lock, flags);
+ __time_stats_clear_buffer(stats, b);
+ spin_unlock_irqrestore(&stats->lock, flags);
+}
+
+void __time_stats_update(struct time_stats *stats, u64 start, u64 end)
+{
+ unsigned long flags;
+
+ WARN_ONCE(!stats->duration_stats_weighted.weight ||
+ !stats->freq_stats_weighted.weight,
+ "uninitialized time_stats");
+
+ if (!stats->buffer) {
+ spin_lock_irqsave(&stats->lock, flags);
+ time_stats_update_one(stats, start, end);
+
+ if (mean_and_variance_weighted_get_mean(stats->freq_stats_weighted) < 32 &&
+ stats->duration_stats.n > 1024)
+ stats->buffer =
+ alloc_percpu_gfp(struct time_stat_buffer,
+ GFP_ATOMIC);
+ spin_unlock_irqrestore(&stats->lock, flags);
+ } else {
+ struct time_stat_buffer *b;
+
+ preempt_disable();
+ b = this_cpu_ptr(stats->buffer);
+
+ BUG_ON(b->nr >= ARRAY_SIZE(b->entries));
+ b->entries[b->nr++] = (struct time_stat_buffer_entry) {
+ .start = start,
+ .end = end
+ };
+
+ if (unlikely(b->nr == ARRAY_SIZE(b->entries)))
+ time_stats_clear_buffer(stats, b);
+ preempt_enable();
+ }
+}
+EXPORT_SYMBOL_GPL(__time_stats_update);
+
+#include <linux/seq_buf.h>
+
+static void seq_buf_time_units_aligned(struct seq_buf *out, u64 ns)
+{
+ const struct time_unit *u = pick_time_units(ns);
+
+ seq_buf_printf(out, "%8llu %s", div64_u64(ns, u->nsecs), u->name);
+}
+
+void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats)
+{
+ s64 f_mean = 0, d_mean = 0;
+ u64 f_stddev = 0, d_stddev = 0;
+
+ if (stats->buffer) {
+ int cpu;
+
+ spin_lock_irq(&stats->lock);
+ for_each_possible_cpu(cpu)
+ __time_stats_clear_buffer(stats, per_cpu_ptr(stats->buffer, cpu));
+ spin_unlock_irq(&stats->lock);
+ }
+
+ /*
+ * avoid divide by zero
+ */
+ if (stats->freq_stats.n) {
+ f_mean = mean_and_variance_get_mean(stats->freq_stats);
+ f_stddev = mean_and_variance_get_stddev(stats->freq_stats);
+ d_mean = mean_and_variance_get_mean(stats->duration_stats);
+ d_stddev = mean_and_variance_get_stddev(stats->duration_stats);
+ }
+
+ seq_buf_printf(out, "count: %llu\n", stats->duration_stats.n);
+
+ seq_buf_printf(out, " since mount recent\n");
+
+ seq_buf_printf(out, "duration of events\n");
+
+ seq_buf_printf(out, " min: ");
+ seq_buf_time_units_aligned(out, stats->min_duration);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " max: ");
+ seq_buf_time_units_aligned(out, stats->max_duration);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " total: ");
+ seq_buf_time_units_aligned(out, stats->total_duration);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " mean: ");
+ seq_buf_time_units_aligned(out, d_mean);
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->duration_stats_weighted));
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " stddev: ");
+ seq_buf_time_units_aligned(out, d_stddev);
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted));
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, "time between events\n");
+
+ seq_buf_printf(out, " min: ");
+ seq_buf_time_units_aligned(out, stats->min_freq);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " max: ");
+ seq_buf_time_units_aligned(out, stats->max_freq);
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " mean: ");
+ seq_buf_time_units_aligned(out, f_mean);
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->freq_stats_weighted));
+ seq_buf_printf(out, "\n");
+
+ seq_buf_printf(out, " stddev: ");
+ seq_buf_time_units_aligned(out, f_stddev);
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted));
+ seq_buf_printf(out, "\n");
+
+ if (stats->quantiles_enabled) {
+ int i = eytzinger0_first(NR_QUANTILES);
+ const struct time_unit *u =
+ pick_time_units(stats->quantiles.entries[i].m);
+ u64 last_q = 0;
+
+ seq_buf_printf(out, "quantiles (%s):\t", u->name);
+ eytzinger0_for_each(i, NR_QUANTILES) {
+ bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;
+
+ u64 q = max(stats->quantiles.entries[i].m, last_q);
+ seq_buf_printf(out, "%llu ", div_u64(q, u->nsecs));
+ if (is_last)
+ seq_buf_printf(out, "\n");
+ last_q = q;
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(time_stats_to_seq_buf);
+
+void time_stats_exit(struct time_stats *stats)
+{
+ free_percpu(stats->buffer);
+}
+EXPORT_SYMBOL_GPL(time_stats_exit);
+
+void time_stats_init(struct time_stats *stats)
+{
+ memset(stats, 0, sizeof(*stats));
+ stats->duration_stats_weighted.weight = 8;
+ stats->freq_stats_weighted.weight = 8;
+ stats->min_duration = U64_MAX;
+ stats->min_freq = U64_MAX;
+ spin_lock_init(&stats->lock);
+}
+EXPORT_SYMBOL_GPL(time_stats_init);
+
+MODULE_AUTHOR("Kent Overstreet");
+MODULE_LICENSE("GPL");


2024-02-24 01:10:40

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 01/10] time_stats: report lifetime of the stats object

From: Darrick J. Wong <[email protected]>

Capture the initialization time of the time_stats object so that we can
report how long the counter has been observing data.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/time_stats.h | 2 ++
lib/time_stats.c | 10 ++++++++++
2 files changed, 12 insertions(+)


diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
index caefa7aba65a0..eb1957cb77c0d 100644
--- a/include/linux/time_stats.h
+++ b/include/linux/time_stats.h
@@ -78,6 +78,8 @@ struct time_stats {
u64 last_event_start;
struct quantiles quantiles;

+ u64 start_time;
+
struct mean_and_variance duration_stats;
struct mean_and_variance_weighted duration_stats_weighted;
struct mean_and_variance freq_stats;
diff --git a/lib/time_stats.c b/lib/time_stats.c
index 081aeba88b535..8df4b55fc6337 100644
--- a/lib/time_stats.c
+++ b/lib/time_stats.c
@@ -158,10 +158,16 @@ static void seq_buf_time_units_aligned(struct seq_buf *out, u64 ns)
seq_buf_printf(out, "%8llu %s", div64_u64(ns, u->nsecs), u->name);
}

+static inline u64 time_stats_lifetime(const struct time_stats *stats)
+{
+ return local_clock() - stats->start_time;
+}
+
void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats)
{
s64 f_mean = 0, d_mean = 0;
u64 f_stddev = 0, d_stddev = 0;
+ u64 lifetime = time_stats_lifetime(stats);

if (stats->buffer) {
int cpu;
@@ -183,6 +189,9 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats)
}

seq_buf_printf(out, "count: %llu\n", stats->duration_stats.n);
+ seq_buf_printf(out, "lifetime: ");
+ seq_buf_time_units_aligned(out, lifetime);
+ seq_buf_printf(out, "\n");

seq_buf_printf(out, " since mount recent\n");

@@ -263,6 +272,7 @@ void time_stats_init(struct time_stats *stats)
stats->freq_stats_weighted.weight = 8;
stats->min_duration = U64_MAX;
stats->min_freq = U64_MAX;
+ stats->start_time = local_clock();
spin_lock_init(&stats->lock);
}
EXPORT_SYMBOL_GPL(time_stats_init);


2024-02-24 01:10:56

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 02/10] time_stats: split stats-with-quantiles into a separate structure

From: Darrick J. Wong <[email protected]>

Currently, struct time_stats has the optional ability to quantize the
information that it collects. This is /probably/ useful for callers who
want to see quantized information, but it more than doubles the size of
the structure from 224 bytes to 464. For users who don't care about
that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per
counter, split the two into separate pieces.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
fs/bcachefs/bcachefs.h | 2 +-
fs/bcachefs/io_write.c | 2 +-
fs/bcachefs/super.c | 10 ++++------
fs/bcachefs/sysfs.c | 4 ++--
fs/bcachefs/util.c | 7 ++++---
include/linux/time_stats.h | 36 ++++++++++++++++++++++++++++++++++--
lib/time_stats.c | 17 ++++++++++-------
7 files changed, 56 insertions(+), 22 deletions(-)


diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 92547d6fd2d95..04e4a65909a4f 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -594,7 +594,7 @@ struct bch_dev {

/* The rest of this all shows up in sysfs */
atomic64_t cur_latency[2];
- struct time_stats io_latency[2];
+ struct time_stats_quantiles io_latency[2];

#define CONGESTED_MAX 1024
atomic_t congested;
diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
index 8123a84320e3f..3fa2cb1d5b13a 100644
--- a/fs/bcachefs/io_write.c
+++ b/fs/bcachefs/io_write.c
@@ -88,7 +88,7 @@ void bch2_latency_acct(struct bch_dev *ca, u64 submit_time, int rw)

bch2_congested_acct(ca, io_latency, now, rw);

- __time_stats_update(&ca->io_latency[rw], submit_time, now);
+ __time_stats_update(&ca->io_latency[rw].stats, submit_time, now);
}

#endif
diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index c491c5e102287..2c238030fb5d7 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -1168,8 +1168,8 @@ static void bch2_dev_free(struct bch_dev *ca)
bch2_dev_buckets_free(ca);
free_page((unsigned long) ca->sb_read_scratch);

- time_stats_exit(&ca->io_latency[WRITE]);
- time_stats_exit(&ca->io_latency[READ]);
+ time_stats_quantiles_exit(&ca->io_latency[WRITE]);
+ time_stats_quantiles_exit(&ca->io_latency[READ]);

percpu_ref_exit(&ca->io_ref);
percpu_ref_exit(&ca->ref);
@@ -1260,10 +1260,8 @@ static struct bch_dev *__bch2_dev_alloc(struct bch_fs *c,

INIT_WORK(&ca->io_error_work, bch2_io_error_work);

- time_stats_init(&ca->io_latency[READ]);
- time_stats_init(&ca->io_latency[WRITE]);
- ca->io_latency[READ].quantiles_enabled = true;
- ca->io_latency[WRITE].quantiles_enabled = true;
+ time_stats_quantiles_init(&ca->io_latency[READ]);
+ time_stats_quantiles_init(&ca->io_latency[WRITE]);

ca->mi = bch2_mi_to_cpu(member);

diff --git a/fs/bcachefs/sysfs.c b/fs/bcachefs/sysfs.c
index cee80c47feea2..c86a93a8d8fc8 100644
--- a/fs/bcachefs/sysfs.c
+++ b/fs/bcachefs/sysfs.c
@@ -930,10 +930,10 @@ SHOW(bch2_dev)
sysfs_print(io_latency_write, atomic64_read(&ca->cur_latency[WRITE]));

if (attr == &sysfs_io_latency_stats_read)
- bch2_time_stats_to_text(out, &ca->io_latency[READ]);
+ bch2_time_stats_to_text(out, &ca->io_latency[READ].stats);

if (attr == &sysfs_io_latency_stats_write)
- bch2_time_stats_to_text(out, &ca->io_latency[WRITE]);
+ bch2_time_stats_to_text(out, &ca->io_latency[WRITE].stats);

sysfs_printf(congested, "%u%%",
clamp(atomic_read(&ca->congested), 0, CONGESTED_MAX)
diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index 88853513a15fa..ef620bfe76cd2 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -365,6 +365,7 @@ static inline void pr_name_and_units(struct printbuf *out, const char *name, u64

void bch2_time_stats_to_text(struct printbuf *out, struct time_stats *stats)
{
+ struct quantiles *quantiles = time_stats_to_quantiles(stats);
s64 f_mean = 0, d_mean = 0;
u64 f_stddev = 0, d_stddev = 0;

@@ -465,17 +466,17 @@ void bch2_time_stats_to_text(struct printbuf *out, struct time_stats *stats)

printbuf_tabstops_reset(out);

- if (stats->quantiles_enabled) {
+ if (quantiles) {
int i = eytzinger0_first(NR_QUANTILES);
const struct time_unit *u =
- pick_time_units(stats->quantiles.entries[i].m);
+ pick_time_units(quantiles->entries[i].m);
u64 last_q = 0;

prt_printf(out, "quantiles (%s):\t", u->name);
eytzinger0_for_each(i, NR_QUANTILES) {
bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;

- u64 q = max(stats->quantiles.entries[i].m, last_q);
+ u64 q = max(quantiles->entries[i].m, last_q);
prt_printf(out, "%llu ", div_u64(q, u->nsecs));
if (is_last)
prt_newline(out);
diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
index eb1957cb77c0d..c05490101d197 100644
--- a/include/linux/time_stats.h
+++ b/include/linux/time_stats.h
@@ -27,6 +27,7 @@
#include <linux/mean_and_variance.h>
#include <linux/sched/clock.h>
#include <linux/spinlock_types.h>
+#include <linux/string.h>

struct time_unit {
const char *name;
@@ -67,7 +68,6 @@ struct time_stat_buffer {

struct time_stats {
spinlock_t lock;
- bool quantiles_enabled;
/* all fields are in nanoseconds */
u64 min_duration;
u64 max_duration;
@@ -76,7 +76,12 @@ struct time_stats {
u64 min_freq;
u64 last_event;
u64 last_event_start;
- struct quantiles quantiles;
+
+/*
+ * Is this really a struct time_stats_quantiled? Hide this flag in the least
+ * significant bit of the start time to avoid blowing up the structure size.
+ */
+#define TIME_STATS_HAVE_QUANTILES (1ULL << 0)

u64 start_time;

@@ -87,6 +92,22 @@ struct time_stats {
struct time_stat_buffer __percpu *buffer;
};

+struct time_stats_quantiles {
+ struct time_stats stats;
+ struct quantiles quantiles;
+};
+
+static inline struct quantiles *time_stats_to_quantiles(struct time_stats *stats)
+{
+ struct time_stats_quantiles *statq;
+
+ if (!(stats->start_time & TIME_STATS_HAVE_QUANTILES))
+ return NULL;
+
+ statq = container_of(stats, struct time_stats_quantiles, stats);
+ return &statq->quantiles;
+}
+
void __time_stats_clear_buffer(struct time_stats *, struct time_stat_buffer *);
void __time_stats_update(struct time_stats *stats, u64, u64);

@@ -133,4 +154,15 @@ void time_stats_to_seq_buf(struct seq_buf *, struct time_stats *);
void time_stats_exit(struct time_stats *);
void time_stats_init(struct time_stats *);

+static inline void time_stats_quantiles_exit(struct time_stats_quantiles *statq)
+{
+ time_stats_exit(&statq->stats);
+}
+static inline void time_stats_quantiles_init(struct time_stats_quantiles *statq)
+{
+ time_stats_init(&statq->stats);
+ statq->stats.start_time |= TIME_STATS_HAVE_QUANTILES;
+ memset(&statq->quantiles, 0, sizeof(statq->quantiles));
+}
+
#endif /* _LINUX_TIME_STATS_H */
diff --git a/lib/time_stats.c b/lib/time_stats.c
index 8df4b55fc6337..767b1a340e805 100644
--- a/lib/time_stats.c
+++ b/lib/time_stats.c
@@ -69,6 +69,8 @@ static inline void time_stats_update_one(struct time_stats *stats,
u64 duration, freq;

if (time_after64(end, start)) {
+ struct quantiles *quantiles = time_stats_to_quantiles(stats);
+
duration = end - start;
mean_and_variance_update(&stats->duration_stats, duration);
mean_and_variance_weighted_update(&stats->duration_stats_weighted, duration);
@@ -76,8 +78,8 @@ static inline void time_stats_update_one(struct time_stats *stats,
stats->min_duration = min(stats->min_duration, duration);
stats->total_duration += duration;

- if (stats->quantiles_enabled)
- quantiles_update(&stats->quantiles, duration);
+ if (quantiles)
+ quantiles_update(quantiles, duration);
}

if (stats->last_event && time_after64(end, stats->last_event)) {
@@ -160,11 +162,12 @@ static void seq_buf_time_units_aligned(struct seq_buf *out, u64 ns)

static inline u64 time_stats_lifetime(const struct time_stats *stats)
{
- return local_clock() - stats->start_time;
+ return local_clock() - (stats->start_time & ~TIME_STATS_HAVE_QUANTILES);
}

void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats)
{
+ struct quantiles *quantiles = time_stats_to_quantiles(stats);
s64 f_mean = 0, d_mean = 0;
u64 f_stddev = 0, d_stddev = 0;
u64 lifetime = time_stats_lifetime(stats);
@@ -239,17 +242,17 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats)
seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted));
seq_buf_printf(out, "\n");

- if (stats->quantiles_enabled) {
+ if (quantiles) {
int i = eytzinger0_first(NR_QUANTILES);
const struct time_unit *u =
- pick_time_units(stats->quantiles.entries[i].m);
+ pick_time_units(quantiles->entries[i].m);
u64 last_q = 0;

seq_buf_printf(out, "quantiles (%s):\t", u->name);
eytzinger0_for_each(i, NR_QUANTILES) {
bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;

- u64 q = max(stats->quantiles.entries[i].m, last_q);
+ u64 q = max(quantiles->entries[i].m, last_q);
seq_buf_printf(out, "%llu ", div_u64(q, u->nsecs));
if (is_last)
seq_buf_printf(out, "\n");
@@ -272,7 +275,7 @@ void time_stats_init(struct time_stats *stats)
stats->freq_stats_weighted.weight = 8;
stats->min_duration = U64_MAX;
stats->min_freq = U64_MAX;
- stats->start_time = local_clock();
+ stats->start_time = local_clock() & ~TIME_STATS_HAVE_QUANTILES;
spin_lock_init(&stats->lock);
}
EXPORT_SYMBOL_GPL(time_stats_init);


2024-02-24 01:11:29

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 03/10] time_stats: fix struct layout bloat

From: Darrick J. Wong <[email protected]>

Make these more efficient by getting rid of the holes. This reduces the
structure size from 224 bytes to 208 bytes.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/time_stats.h | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)


diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
index c05490101d197..1c1ba8efa7bfe 100644
--- a/include/linux/time_stats.h
+++ b/include/linux/time_stats.h
@@ -77,19 +77,19 @@ struct time_stats {
u64 last_event;
u64 last_event_start;

-/*
- * Is this really a struct time_stats_quantiled? Hide this flag in the least
- * significant bit of the start time to avoid blowing up the structure size.
- */
-#define TIME_STATS_HAVE_QUANTILES (1ULL << 0)
-
- u64 start_time;
-
struct mean_and_variance duration_stats;
- struct mean_and_variance_weighted duration_stats_weighted;
struct mean_and_variance freq_stats;
+ struct mean_and_variance_weighted duration_stats_weighted;
struct mean_and_variance_weighted freq_stats_weighted;
struct time_stat_buffer __percpu *buffer;
+
+/*
+ * Is this really a struct time_stats_quantiled? Hide this flag in the least
+ * significant bit of the start time to avoid blowing up the structure size.
+ */
+#define TIME_STATS_HAVE_QUANTILES (1ULL << 0)
+
+ u64 start_time;
};

struct time_stats_quantiles {


2024-02-24 01:11:59

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 06/10] time_stats: allow custom epoch names

From: Darrick J. Wong <[email protected]>

Let callers of time_stats_to_seq_buf define the epoch name; "mount"
doesn't make sense generally.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/time_stats.h | 2 +-
lib/time_stats.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)


diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
index 994823c17bca9..b2f71e3862c0f 100644
--- a/include/linux/time_stats.h
+++ b/include/linux/time_stats.h
@@ -151,7 +151,7 @@ static inline bool track_event_change(struct time_stats *stats, bool v)
#define TIME_STATS_PRINT_NO_ZEROES (1U << 0) /* print nothing if zero count */
struct seq_buf;
void time_stats_to_seq_buf(struct seq_buf *, struct time_stats *,
- unsigned int flags);
+ const char *epoch_name, unsigned int flags);

void time_stats_exit(struct time_stats *);
void time_stats_init(struct time_stats *);
diff --git a/lib/time_stats.c b/lib/time_stats.c
index 382935979f8f7..f4a21409006bd 100644
--- a/lib/time_stats.c
+++ b/lib/time_stats.c
@@ -169,7 +169,7 @@ static inline u64 time_stats_lifetime(const struct time_stats *stats)
}

void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,
- unsigned int flags)
+ const char *epoch_name, unsigned int flags)
{
struct quantiles *quantiles = time_stats_to_quantiles(stats);
s64 f_mean = 0, d_mean = 0;
@@ -201,7 +201,7 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,
seq_buf_time_units_aligned(out, lifetime);
seq_buf_printf(out, "\n");

- seq_buf_printf(out, " since mount recent\n");
+ seq_buf_printf(out, " since %-12s recent\n", epoch_name);

seq_buf_printf(out, "duration of events\n");



2024-02-24 01:12:12

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 07/10] mean_and_variance: put struct mean_and_variance_weighted on a diet

From: Darrick J. Wong <[email protected]>

The only caller of this code (time_stats) always knows the weights and
whether or not any information has been collected. Pass this
information into the mean and variance code so that it doesn't have to
store that information. This reduces the structure size from 24 to 16
bytes, which shrinks each time_stats counter to 192 bytes from 208.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
fs/bcachefs/util.c | 8 ++--
include/linux/mean_and_variance.h | 14 ++++--
include/linux/time_stats.h | 4 ++
lib/math/mean_and_variance.c | 28 ++++++++-----
lib/math/mean_and_variance_test.c | 80 ++++++++++++++++++++-----------------
lib/time_stats.c | 23 +++++------
6 files changed, 87 insertions(+), 70 deletions(-)


diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index ef620bfe76cd2..4c3e19d562852 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -429,14 +429,14 @@ void bch2_time_stats_to_text(struct printbuf *out, struct time_stats *stats)
prt_tab(out);
bch2_pr_time_units_aligned(out, d_mean);
prt_tab(out);
- bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->duration_stats_weighted));
+ bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT));
prt_newline(out);

prt_printf(out, "stddev:");
prt_tab(out);
bch2_pr_time_units_aligned(out, d_stddev);
prt_tab(out);
- bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted));
+ bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT));

printbuf_indent_sub(out, 2);
prt_newline(out);
@@ -452,14 +452,14 @@ void bch2_time_stats_to_text(struct printbuf *out, struct time_stats *stats)
prt_tab(out);
bch2_pr_time_units_aligned(out, f_mean);
prt_tab(out);
- bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->freq_stats_weighted));
+ bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT));
prt_newline(out);

prt_printf(out, "stddev:");
prt_tab(out);
bch2_pr_time_units_aligned(out, f_stddev);
prt_tab(out);
- bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted));
+ bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT));

printbuf_indent_sub(out, 2);
prt_newline(out);
diff --git a/include/linux/mean_and_variance.h b/include/linux/mean_and_variance.h
index 64df11ab422bf..4fcf062dd22c7 100644
--- a/include/linux/mean_and_variance.h
+++ b/include/linux/mean_and_variance.h
@@ -154,8 +154,6 @@ struct mean_and_variance {

/* expontentially weighted variant */
struct mean_and_variance_weighted {
- bool init;
- u8 weight; /* base 2 logarithim */
s64 mean;
u64 variance;
};
@@ -192,10 +190,14 @@ s64 mean_and_variance_get_mean(struct mean_and_variance s);
u64 mean_and_variance_get_variance(struct mean_and_variance s1);
u32 mean_and_variance_get_stddev(struct mean_and_variance s);

-void mean_and_variance_weighted_update(struct mean_and_variance_weighted *s, s64 v);
+void mean_and_variance_weighted_update(struct mean_and_variance_weighted *s,
+ s64 v, bool initted, u8 weight);

-s64 mean_and_variance_weighted_get_mean(struct mean_and_variance_weighted s);
-u64 mean_and_variance_weighted_get_variance(struct mean_and_variance_weighted s);
-u32 mean_and_variance_weighted_get_stddev(struct mean_and_variance_weighted s);
+s64 mean_and_variance_weighted_get_mean(struct mean_and_variance_weighted s,
+ u8 weight);
+u64 mean_and_variance_weighted_get_variance(struct mean_and_variance_weighted s,
+ u8 weight);
+u32 mean_and_variance_weighted_get_stddev(struct mean_and_variance_weighted s,
+ u8 weight);

#endif // MEAN_AND_VAIRANCE_H_
diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
index b2f71e3862c0f..dc539123f7997 100644
--- a/include/linux/time_stats.h
+++ b/include/linux/time_stats.h
@@ -79,6 +79,10 @@ struct time_stats {

struct mean_and_variance duration_stats;
struct mean_and_variance freq_stats;
+
+/* default weight for weighted mean and variance calculations */
+#define TIME_STATS_MV_WEIGHT 8
+
struct mean_and_variance_weighted duration_stats_weighted;
struct mean_and_variance_weighted freq_stats_weighted;
struct time_stat_buffer __percpu *buffer;
diff --git a/lib/math/mean_and_variance.c b/lib/math/mean_and_variance.c
index ba90293204bae..21ec6afc67884 100644
--- a/lib/math/mean_and_variance.c
+++ b/lib/math/mean_and_variance.c
@@ -102,14 +102,17 @@ EXPORT_SYMBOL_GPL(mean_and_variance_get_stddev);
* mean_and_variance_weighted_update() - exponentially weighted variant of mean_and_variance_update()
* @s: mean and variance number of samples and their sums
* @x: new value to include in the &mean_and_variance_weighted
+ * @initted: caller must track whether this is the first use or not
+ * @weight: ewma weight
*
* see linked pdf: function derived from equations 140-143 where alpha = 2^w.
* values are stored bitshifted for performance and added precision.
*/
-void mean_and_variance_weighted_update(struct mean_and_variance_weighted *s, s64 x)
+void mean_and_variance_weighted_update(struct mean_and_variance_weighted *s,
+ s64 x, bool initted, u8 weight)
{
// previous weighted variance.
- u8 w = s->weight;
+ u8 w = weight;
u64 var_w0 = s->variance;
// new value weighted.
s64 x_w = x << w;
@@ -118,45 +121,50 @@ void mean_and_variance_weighted_update(struct mean_and_variance_weighted *s, s64
// new mean weighted.
s64 u_w1 = s->mean + diff;

- if (!s->init) {
+ if (!initted) {
s->mean = x_w;
s->variance = 0;
} else {
s->mean = u_w1;
s->variance = ((var_w0 << w) - var_w0 + ((diff_w * (x_w - u_w1)) >> w)) >> w;
}
- s->init = true;
}
EXPORT_SYMBOL_GPL(mean_and_variance_weighted_update);

/**
* mean_and_variance_weighted_get_mean() - get mean from @s
* @s: mean and variance number of samples and their sums
+ * @weight: ewma weight
*/
-s64 mean_and_variance_weighted_get_mean(struct mean_and_variance_weighted s)
+s64 mean_and_variance_weighted_get_mean(struct mean_and_variance_weighted s,
+ u8 weight)
{
- return fast_divpow2(s.mean, s.weight);
+ return fast_divpow2(s.mean, weight);
}
EXPORT_SYMBOL_GPL(mean_and_variance_weighted_get_mean);

/**
* mean_and_variance_weighted_get_variance() -- get variance from @s
* @s: mean and variance number of samples and their sums
+ * @weight: ewma weight
*/
-u64 mean_and_variance_weighted_get_variance(struct mean_and_variance_weighted s)
+u64 mean_and_variance_weighted_get_variance(struct mean_and_variance_weighted s,
+ u8 weight)
{
// always positive don't need fast divpow2
- return s.variance >> s.weight;
+ return s.variance >> weight;
}
EXPORT_SYMBOL_GPL(mean_and_variance_weighted_get_variance);

/**
* mean_and_variance_weighted_get_stddev() - get standard deviation from @s
* @s: mean and variance number of samples and their sums
+ * @weight: ewma weight
*/
-u32 mean_and_variance_weighted_get_stddev(struct mean_and_variance_weighted s)
+u32 mean_and_variance_weighted_get_stddev(struct mean_and_variance_weighted s,
+ u8 weight)
{
- return int_sqrt64(mean_and_variance_weighted_get_variance(s));
+ return int_sqrt64(mean_and_variance_weighted_get_variance(s, weight));
}
EXPORT_SYMBOL_GPL(mean_and_variance_weighted_get_stddev);

diff --git a/lib/math/mean_and_variance_test.c b/lib/math/mean_and_variance_test.c
index f45591a169d87..0d8c2451a8588 100644
--- a/lib/math/mean_and_variance_test.c
+++ b/lib/math/mean_and_variance_test.c
@@ -30,53 +30,59 @@ static void mean_and_variance_basic_test(struct kunit *test)

static void mean_and_variance_weighted_test(struct kunit *test)
{
- struct mean_and_variance_weighted s = { .weight = 2 };
+ struct mean_and_variance_weighted s = { };

- mean_and_variance_weighted_update(&s, 10);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s), 10);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s), 0);
+ mean_and_variance_weighted_update(&s, 10, false, 2);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s, 2), 10);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s, 2), 0);

- mean_and_variance_weighted_update(&s, 20);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s), 12);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s), 18);
+ mean_and_variance_weighted_update(&s, 20, true, 2);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s, 2), 12);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s, 2), 18);

- mean_and_variance_weighted_update(&s, 30);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s), 16);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s), 72);
+ mean_and_variance_weighted_update(&s, 30, true, 2);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s, 2), 16);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s, 2), 72);

- s = (struct mean_and_variance_weighted) { .weight = 2 };
+ s = (struct mean_and_variance_weighted) { };

- mean_and_variance_weighted_update(&s, -10);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s), -10);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s), 0);
+ mean_and_variance_weighted_update(&s, -10, false, 2);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s, 2), -10);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s, 2), 0);

- mean_and_variance_weighted_update(&s, -20);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s), -12);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s), 18);
+ mean_and_variance_weighted_update(&s, -20, true, 2);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s, 2), -12);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s, 2), 18);

- mean_and_variance_weighted_update(&s, -30);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s), -16);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s), 72);
+ mean_and_variance_weighted_update(&s, -30, true, 2);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s, 2), -16);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s, 2), 72);
}

static void mean_and_variance_weighted_advanced_test(struct kunit *test)
{
- struct mean_and_variance_weighted s = { .weight = 8 };
+ struct mean_and_variance_weighted s = { };
+ bool initted = false;
s64 i;

- for (i = 10; i <= 100; i += 10)
- mean_and_variance_weighted_update(&s, i);
+ for (i = 10; i <= 100; i += 10) {
+ mean_and_variance_weighted_update(&s, i, initted, 8);
+ initted = true;
+ }

- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s), 11);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s), 107);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s, 8), 11);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s, 8), 107);

- s = (struct mean_and_variance_weighted) { .weight = 8 };
+ s = (struct mean_and_variance_weighted) { };
+ initted = false;

- for (i = -10; i >= -100; i -= 10)
- mean_and_variance_weighted_update(&s, i);
+ for (i = -10; i >= -100; i -= 10) {
+ mean_and_variance_weighted_update(&s, i, initted, 8);
+ initted = true;
+ }

- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s), -11);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s), 107);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(s, 8), -11);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_variance(s, 8), 107);
}

static void do_mean_and_variance_test(struct kunit *test,
@@ -91,26 +97,26 @@ static void do_mean_and_variance_test(struct kunit *test,
s64 *weighted_stddev)
{
struct mean_and_variance mv = {};
- struct mean_and_variance_weighted vw = { .weight = weight };
+ struct mean_and_variance_weighted vw = { };

for (unsigned i = 0; i < initial_n; i++) {
mean_and_variance_update(&mv, initial_value);
- mean_and_variance_weighted_update(&vw, initial_value);
+ mean_and_variance_weighted_update(&vw, initial_value, false, weight);

KUNIT_EXPECT_EQ(test, mean_and_variance_get_mean(mv), initial_value);
KUNIT_EXPECT_EQ(test, mean_and_variance_get_stddev(mv), 0);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(vw), initial_value);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_stddev(vw),0);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(vw, weight), initial_value);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_stddev(vw, weight),0);
}

for (unsigned i = 0; i < n; i++) {
mean_and_variance_update(&mv, data[i]);
- mean_and_variance_weighted_update(&vw, data[i]);
+ mean_and_variance_weighted_update(&vw, data[i], true, weight);

KUNIT_EXPECT_EQ(test, mean_and_variance_get_mean(mv), mean[i]);
KUNIT_EXPECT_EQ(test, mean_and_variance_get_stddev(mv), stddev[i]);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(vw), weighted_mean[i]);
- KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_stddev(vw),weighted_stddev[i]);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_mean(vw, weight), weighted_mean[i]);
+ KUNIT_EXPECT_EQ(test, mean_and_variance_weighted_get_stddev(vw, weight),weighted_stddev[i]);
}

KUNIT_EXPECT_EQ(test, mv.n, initial_n + n);
diff --git a/lib/time_stats.c b/lib/time_stats.c
index f4a21409006bd..0fb3d854e503b 100644
--- a/lib/time_stats.c
+++ b/lib/time_stats.c
@@ -70,13 +70,15 @@ static inline void time_stats_update_one(struct time_stats *stats,
u64 start, u64 end)
{
u64 duration, freq;
+ bool initted = stats->last_event != 0;

if (time_after64(end, start)) {
struct quantiles *quantiles = time_stats_to_quantiles(stats);

duration = end - start;
mean_and_variance_update(&stats->duration_stats, duration);
- mean_and_variance_weighted_update(&stats->duration_stats_weighted, duration);
+ mean_and_variance_weighted_update(&stats->duration_stats_weighted,
+ duration, initted, TIME_STATS_MV_WEIGHT);
stats->max_duration = max(stats->max_duration, duration);
stats->min_duration = min(stats->min_duration, duration);
stats->total_duration += duration;
@@ -88,7 +90,8 @@ static inline void time_stats_update_one(struct time_stats *stats,
if (stats->last_event && time_after64(end, stats->last_event)) {
freq = end - stats->last_event;
mean_and_variance_update(&stats->freq_stats, freq);
- mean_and_variance_weighted_update(&stats->freq_stats_weighted, freq);
+ mean_and_variance_weighted_update(&stats->freq_stats_weighted,
+ freq, initted, TIME_STATS_MV_WEIGHT);
stats->max_freq = max(stats->max_freq, freq);
stats->min_freq = min(stats->min_freq, freq);
}
@@ -121,15 +124,11 @@ void __time_stats_update(struct time_stats *stats, u64 start, u64 end)
{
unsigned long flags;

- WARN_ONCE(!stats->duration_stats_weighted.weight ||
- !stats->freq_stats_weighted.weight,
- "uninitialized time_stats");
-
if (!stats->buffer) {
spin_lock_irqsave(&stats->lock, flags);
time_stats_update_one(stats, start, end);

- if (mean_and_variance_weighted_get_mean(stats->freq_stats_weighted) < 32 &&
+ if (mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT) < 32 &&
stats->duration_stats.n > 1024)
stats->buffer =
alloc_percpu_gfp(struct time_stat_buffer,
@@ -219,12 +218,12 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,

seq_buf_printf(out, " mean: ");
seq_buf_time_units_aligned(out, d_mean);
- seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->duration_stats_weighted));
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT));
seq_buf_printf(out, "\n");

seq_buf_printf(out, " stddev: ");
seq_buf_time_units_aligned(out, d_stddev);
- seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted));
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT));
seq_buf_printf(out, "\n");

seq_buf_printf(out, "time between events\n");
@@ -239,12 +238,12 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,

seq_buf_printf(out, " mean: ");
seq_buf_time_units_aligned(out, f_mean);
- seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->freq_stats_weighted));
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT));
seq_buf_printf(out, "\n");

seq_buf_printf(out, " stddev: ");
seq_buf_time_units_aligned(out, f_stddev);
- seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted));
+ seq_buf_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT));
seq_buf_printf(out, "\n");

if (quantiles) {
@@ -276,8 +275,6 @@ EXPORT_SYMBOL_GPL(time_stats_exit);
void time_stats_init(struct time_stats *stats)
{
memset(stats, 0, sizeof(*stats));
- stats->duration_stats_weighted.weight = 8;
- stats->freq_stats_weighted.weight = 8;
stats->min_duration = U64_MAX;
stats->min_freq = U64_MAX;
stats->start_time = local_clock() & ~TIME_STATS_HAVE_QUANTILES;


2024-02-24 01:12:27

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 08/10] time_stats: shrink time_stat_buffer for better alignment

From: Darrick J. Wong <[email protected]>

Shrink this percpu object by one array element so that the object size
becomes exactly 512 bytes. This will lead to more efficient memory use,
hopefully.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/time_stats.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
index dc539123f7997..b3c810fff963a 100644
--- a/include/linux/time_stats.h
+++ b/include/linux/time_stats.h
@@ -63,7 +63,7 @@ struct time_stat_buffer {
struct time_stat_buffer_entry {
u64 start;
u64 end;
- } entries[32];
+ } entries[31];
};

struct time_stats {


2024-02-24 01:13:00

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 10/10] time_stats: Kill TIME_STATS_HAVE_QUANTILES

From: Kent Overstreet <[email protected]>

We have 4 spare bytes next to the spinlock, no need for bit stuffing

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
include/linux/time_stats.h | 19 +++++--------------
lib/time_stats.c | 4 ++--
2 files changed, 7 insertions(+), 16 deletions(-)


diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
index 4e1f5485ed039..6df2b34aa274b 100644
--- a/include/linux/time_stats.h
+++ b/include/linux/time_stats.h
@@ -68,6 +68,7 @@ struct time_stat_buffer {

struct time_stats {
spinlock_t lock;
+ bool have_quantiles;
/* all fields are in nanoseconds */
u64 min_duration;
u64 max_duration;
@@ -87,12 +88,6 @@ struct time_stats {
struct mean_and_variance_weighted freq_stats_weighted;
struct time_stat_buffer __percpu *buffer;

-/*
- * Is this really a struct time_stats_quantiled? Hide this flag in the least
- * significant bit of the start time to avoid blowing up the structure size.
- */
-#define TIME_STATS_HAVE_QUANTILES (1ULL << 0)
-
u64 start_time;
};

@@ -103,13 +98,9 @@ struct time_stats_quantiles {

static inline struct quantiles *time_stats_to_quantiles(struct time_stats *stats)
{
- struct time_stats_quantiles *statq;
-
- if (!(stats->start_time & TIME_STATS_HAVE_QUANTILES))
- return NULL;
-
- statq = container_of(stats, struct time_stats_quantiles, stats);
- return &statq->quantiles;
+ return stats->have_quantiles
+ ? &container_of(stats, struct time_stats_quantiles, stats)->quantiles
+ : NULL;
}

void __time_stats_clear_buffer(struct time_stats *, struct time_stat_buffer *);
@@ -169,7 +160,7 @@ static inline void time_stats_quantiles_exit(struct time_stats_quantiles *statq)
static inline void time_stats_quantiles_init(struct time_stats_quantiles *statq)
{
time_stats_init(&statq->stats);
- statq->stats.start_time |= TIME_STATS_HAVE_QUANTILES;
+ statq->stats.have_quantiles = true;
memset(&statq->quantiles, 0, sizeof(statq->quantiles));
}

diff --git a/lib/time_stats.c b/lib/time_stats.c
index c0f209dd9f6dd..0b90c80cba9f1 100644
--- a/lib/time_stats.c
+++ b/lib/time_stats.c
@@ -164,7 +164,7 @@ static void seq_buf_time_units_aligned(struct seq_buf *out, u64 ns)

static inline u64 time_stats_lifetime(const struct time_stats *stats)
{
- return local_clock() - (stats->start_time & ~TIME_STATS_HAVE_QUANTILES);
+ return local_clock() - stats->start_time;
}

void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,
@@ -364,7 +364,7 @@ void time_stats_init(struct time_stats *stats)
memset(stats, 0, sizeof(*stats));
stats->min_duration = U64_MAX;
stats->min_freq = U64_MAX;
- stats->start_time = local_clock() & ~TIME_STATS_HAVE_QUANTILES;
+ stats->start_time = local_clock();
spin_lock_init(&stats->lock);
}
EXPORT_SYMBOL_GPL(time_stats_init);


2024-02-24 01:13:15

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 04/10] time_stats: add larger units

From: Darrick J. Wong <[email protected]>

Filesystems can stay mounted for a very long time, so add some larger
units.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
lib/time_stats.c | 3 +++
1 file changed, 3 insertions(+)


diff --git a/lib/time_stats.c b/lib/time_stats.c
index 767b1a340e805..43106bda43a92 100644
--- a/lib/time_stats.c
+++ b/lib/time_stats.c
@@ -16,6 +16,9 @@ static const struct time_unit time_units[] = {
{ "s", NSEC_PER_SEC },
{ "m", (u64) NSEC_PER_SEC * 60},
{ "h", (u64) NSEC_PER_SEC * 3600},
+ { "d", (u64) NSEC_PER_SEC * 3600 * 24},
+ { "w", (u64) NSEC_PER_SEC * 3600 * 24 * 7},
+ { "y", (u64) NSEC_PER_SEC * ((3600 * 24 * 7 * 365) + (3600 * (24 / 4) * 7))}, /* 365.25d */
{ "eon", U64_MAX },
};



2024-02-24 01:13:36

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 05/10] time_stats: don't print any output if event count is zero

From: Darrick J. Wong <[email protected]>

There's no point in printing an empty report for no data, so add a flag
that allows us to do that.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/time_stats.h | 4 +++-
lib/time_stats.c | 10 ++++++----
2 files changed, 9 insertions(+), 5 deletions(-)


diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
index 1c1ba8efa7bfe..994823c17bca9 100644
--- a/include/linux/time_stats.h
+++ b/include/linux/time_stats.h
@@ -148,8 +148,10 @@ static inline bool track_event_change(struct time_stats *stats, bool v)
return false;
}

+#define TIME_STATS_PRINT_NO_ZEROES (1U << 0) /* print nothing if zero count */
struct seq_buf;
-void time_stats_to_seq_buf(struct seq_buf *, struct time_stats *);
+void time_stats_to_seq_buf(struct seq_buf *, struct time_stats *,
+ unsigned int flags);

void time_stats_exit(struct time_stats *);
void time_stats_init(struct time_stats *);
diff --git a/lib/time_stats.c b/lib/time_stats.c
index 43106bda43a92..382935979f8f7 100644
--- a/lib/time_stats.c
+++ b/lib/time_stats.c
@@ -168,7 +168,8 @@ static inline u64 time_stats_lifetime(const struct time_stats *stats)
return local_clock() - (stats->start_time & ~TIME_STATS_HAVE_QUANTILES);
}

-void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats)
+void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,
+ unsigned int flags)
{
struct quantiles *quantiles = time_stats_to_quantiles(stats);
s64 f_mean = 0, d_mean = 0;
@@ -184,14 +185,15 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats)
spin_unlock_irq(&stats->lock);
}

- /*
- * avoid divide by zero
- */
if (stats->freq_stats.n) {
+ /* avoid divide by zero */
f_mean = mean_and_variance_get_mean(stats->freq_stats);
f_stddev = mean_and_variance_get_stddev(stats->freq_stats);
d_mean = mean_and_variance_get_mean(stats->duration_stats);
d_stddev = mean_and_variance_get_stddev(stats->duration_stats);
+ } else if (flags & TIME_STATS_PRINT_NO_ZEROES) {
+ /* unless we didn't want zeroes anyway */
+ return;
}

seq_buf_printf(out, "count: %llu\n", stats->duration_stats.n);


2024-02-24 01:14:26

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 02/10] bcachefs: thread_with_stdio: convert to darray

From: Kent Overstreet <[email protected]>

- eliminate the dependency on printbufs, so that we can lift
thread_with_file for use in xfs
- add a nonblocking parameter to stdio_redirect_printf(), and either
block if the buffer is full or drop it on the floor - don't buffer
infinitely

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/bcachefs/super.c | 9 -
fs/bcachefs/thread_with_file.c | 229 +++++++++++++++++++++-------------
fs/bcachefs/thread_with_file.h | 7 +
fs/bcachefs/thread_with_file_types.h | 15 ++
4 files changed, 160 insertions(+), 100 deletions(-)


diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index 2c238030fb5d7..0cff8c5f3c104 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -56,6 +56,7 @@
#include "super.h"
#include "super-io.h"
#include "sysfs.h"
+#include "thread_with_file.h"
#include "trace.h"

#include <linux/backing-dev.h>
@@ -95,16 +96,10 @@ void __bch2_print(struct bch_fs *c, const char *fmt, ...)
if (likely(!stdio)) {
vprintk(fmt, args);
} else {
- unsigned long flags;
-
if (fmt[0] == KERN_SOH[0])
fmt += 2;

- spin_lock_irqsave(&stdio->output_lock, flags);
- prt_vprintf(&stdio->output_buf, fmt, args);
- spin_unlock_irqrestore(&stdio->output_lock, flags);
-
- wake_up(&stdio->output_wait);
+ bch2_stdio_redirect_vprintf(stdio, true, fmt, args);
}
va_end(args);
}
diff --git a/fs/bcachefs/thread_with_file.c b/fs/bcachefs/thread_with_file.c
index 8c3afb4c3204f..ca81d3fec3eef 100644
--- a/fs/bcachefs/thread_with_file.c
+++ b/fs/bcachefs/thread_with_file.c
@@ -2,7 +2,6 @@
#ifndef NO_BCACHEFS_FS

#include "bcachefs.h"
-#include "printbuf.h"
#include "thread_with_file.h"

#include <linux/anon_inodes.h>
@@ -65,48 +64,74 @@ int bch2_run_thread_with_file(struct thread_with_file *thr,
return ret;
}

-static inline bool thread_with_stdio_has_output(struct thread_with_stdio *thr)
+/* stdio_redirect */
+
+static bool stdio_redirect_has_input(struct stdio_redirect *stdio)
{
- return thr->stdio.output_buf.pos || thr->thr.done;
+ return stdio->input.buf.nr || stdio->done;
}

+static bool stdio_redirect_has_output(struct stdio_redirect *stdio)
+{
+ return stdio->output.buf.nr || stdio->done;
+}
+
+#define WRITE_BUFFER 4096
+
+static bool stdio_redirect_has_input_space(struct stdio_redirect *stdio)
+{
+ return stdio->input.buf.nr < WRITE_BUFFER || stdio->done;
+}
+
+static bool stdio_redirect_has_output_space(struct stdio_redirect *stdio)
+{
+ return stdio->output.buf.nr < WRITE_BUFFER || stdio->done;
+}
+
+static void stdio_buf_init(struct stdio_buf *buf)
+{
+ spin_lock_init(&buf->lock);
+ init_waitqueue_head(&buf->wait);
+ darray_init(&buf->buf);
+}
+
+/* thread_with_stdio */
+
static ssize_t thread_with_stdio_read(struct file *file, char __user *ubuf,
size_t len, loff_t *ppos)
{
struct thread_with_stdio *thr =
container_of(file->private_data, struct thread_with_stdio, thr);
- struct printbuf *buf = &thr->stdio.output_buf;
+ struct stdio_buf *buf = &thr->stdio.output;
size_t copied = 0, b;
int ret = 0;

- if ((file->f_flags & O_NONBLOCK) &&
- !thread_with_stdio_has_output(thr))
+ if (!(file->f_flags & O_NONBLOCK)) {
+ ret = wait_event_interruptible(buf->wait, stdio_redirect_has_output(&thr->stdio));
+ if (ret)
+ return ret;
+ } else if (!stdio_redirect_has_output(&thr->stdio))
return -EAGAIN;

- ret = wait_event_interruptible(thr->stdio.output_wait,
- thread_with_stdio_has_output(thr));
- if (ret)
- return ret;
-
- while (len && buf->pos) {
+ while (len && buf->buf.nr) {
if (fault_in_writeable(ubuf, len) == len) {
ret = -EFAULT;
break;
}

- spin_lock_irq(&thr->stdio.output_lock);
- b = min_t(size_t, len, buf->pos);
+ spin_lock_irq(&buf->lock);
+ b = min_t(size_t, len, buf->buf.nr);

- if (b && !copy_to_user_nofault(ubuf, buf->buf, b)) {
- memmove(buf->buf,
- buf->buf + b,
- buf->pos - b);
- buf->pos -= b;
+ if (b && !copy_to_user_nofault(ubuf, buf->buf.data, b)) {
ubuf += b;
len -= b;
copied += b;
+ buf->buf.nr -= b;
+ memmove(buf->buf.data,
+ buf->buf.data + b,
+ buf->buf.nr);
}
- spin_unlock_irq(&thr->stdio.output_lock);
+ spin_unlock_irq(&buf->lock);
}

return copied ?: ret;
@@ -118,25 +143,18 @@ static int thread_with_stdio_release(struct inode *inode, struct file *file)
container_of(file->private_data, struct thread_with_stdio, thr);

bch2_thread_with_file_exit(&thr->thr);
- printbuf_exit(&thr->stdio.input_buf);
- printbuf_exit(&thr->stdio.output_buf);
+ darray_exit(&thr->stdio.input.buf);
+ darray_exit(&thr->stdio.output.buf);
thr->exit(thr);
return 0;
}

-#define WRITE_BUFFER 4096
-
-static inline bool thread_with_stdio_has_input_space(struct thread_with_stdio *thr)
-{
- return thr->stdio.input_buf.pos < WRITE_BUFFER || thr->thr.done;
-}
-
static ssize_t thread_with_stdio_write(struct file *file, const char __user *ubuf,
size_t len, loff_t *ppos)
{
struct thread_with_stdio *thr =
container_of(file->private_data, struct thread_with_stdio, thr);
- struct printbuf *buf = &thr->stdio.input_buf;
+ struct stdio_buf *buf = &thr->stdio.input;
size_t copied = 0;
ssize_t ret = 0;

@@ -152,29 +170,29 @@ static ssize_t thread_with_stdio_write(struct file *file, const char __user *ubu
break;
}

- spin_lock(&thr->stdio.input_lock);
- if (buf->pos < WRITE_BUFFER)
- bch2_printbuf_make_room(buf, min(b, WRITE_BUFFER - buf->pos));
- b = min(len, printbuf_remaining_size(buf));
+ spin_lock(&buf->lock);
+ if (buf->buf.nr < WRITE_BUFFER)
+ darray_make_room_gfp(&buf->buf, min(b, WRITE_BUFFER - buf->buf.nr), __GFP_NOWARN);
+ b = min(len, darray_room(buf->buf));

- if (b && !copy_from_user_nofault(&buf->buf[buf->pos], ubuf, b)) {
- ubuf += b;
- len -= b;
- copied += b;
- buf->pos += b;
+ if (b && !copy_from_user_nofault(&buf->buf.data[buf->buf.nr], ubuf, b)) {
+ buf->buf.nr += b;
+ ubuf += b;
+ len -= b;
+ copied += b;
}
- spin_unlock(&thr->stdio.input_lock);
+ spin_unlock(&buf->lock);

if (b) {
- wake_up(&thr->stdio.input_wait);
+ wake_up(&buf->wait);
} else {
if ((file->f_flags & O_NONBLOCK)) {
ret = -EAGAIN;
break;
}

- ret = wait_event_interruptible(thr->stdio.input_wait,
- thread_with_stdio_has_input_space(thr));
+ ret = wait_event_interruptible(buf->wait,
+ stdio_redirect_has_input_space(&thr->stdio));
if (ret)
break;
}
@@ -188,14 +206,14 @@ static __poll_t thread_with_stdio_poll(struct file *file, struct poll_table_stru
struct thread_with_stdio *thr =
container_of(file->private_data, struct thread_with_stdio, thr);

- poll_wait(file, &thr->stdio.output_wait, wait);
- poll_wait(file, &thr->stdio.input_wait, wait);
+ poll_wait(file, &thr->stdio.output.wait, wait);
+ poll_wait(file, &thr->stdio.input.wait, wait);

__poll_t mask = 0;

- if (thread_with_stdio_has_output(thr))
+ if (stdio_redirect_has_output(&thr->stdio))
mask |= EPOLLIN;
- if (thread_with_stdio_has_input_space(thr))
+ if (stdio_redirect_has_input_space(&thr->stdio))
mask |= EPOLLOUT;
if (thr->thr.done)
mask |= EPOLLHUP|EPOLLERR;
@@ -203,75 +221,112 @@ static __poll_t thread_with_stdio_poll(struct file *file, struct poll_table_stru
}

static const struct file_operations thread_with_stdio_fops = {
- .release = thread_with_stdio_release,
+ .llseek = no_llseek,
.read = thread_with_stdio_read,
.write = thread_with_stdio_write,
.poll = thread_with_stdio_poll,
- .llseek = no_llseek,
+ .release = thread_with_stdio_release,
};

int bch2_run_thread_with_stdio(struct thread_with_stdio *thr,
void (*exit)(struct thread_with_stdio *),
int (*fn)(void *))
{
- thr->stdio.input_buf = PRINTBUF;
- thr->stdio.input_buf.atomic++;
- spin_lock_init(&thr->stdio.input_lock);
- init_waitqueue_head(&thr->stdio.input_wait);
-
- thr->stdio.output_buf = PRINTBUF;
- thr->stdio.output_buf.atomic++;
- spin_lock_init(&thr->stdio.output_lock);
- init_waitqueue_head(&thr->stdio.output_wait);
-
+ stdio_buf_init(&thr->stdio.input);
+ stdio_buf_init(&thr->stdio.output);
thr->exit = exit;

return bch2_run_thread_with_file(&thr->thr, &thread_with_stdio_fops, fn);
}

-int bch2_stdio_redirect_read(struct stdio_redirect *stdio, char *buf, size_t len)
+int bch2_stdio_redirect_read(struct stdio_redirect *stdio, char *ubuf, size_t len)
{
- wait_event(stdio->input_wait,
- stdio->input_buf.pos || stdio->done);
+ struct stdio_buf *buf = &stdio->input;

+ wait_event(buf->wait, stdio_redirect_has_input(stdio));
if (stdio->done)
return -1;

- spin_lock(&stdio->input_lock);
- int ret = min(len, stdio->input_buf.pos);
- stdio->input_buf.pos -= ret;
- memcpy(buf, stdio->input_buf.buf, ret);
- memmove(stdio->input_buf.buf,
- stdio->input_buf.buf + ret,
- stdio->input_buf.pos);
- spin_unlock(&stdio->input_lock);
+ spin_lock(&buf->lock);
+ int ret = min(len, buf->buf.nr);
+ buf->buf.nr -= ret;
+ memcpy(ubuf, buf->buf.data, ret);
+ memmove(buf->buf.data,
+ buf->buf.data + ret,
+ buf->buf.nr);
+ spin_unlock(&buf->lock);

- wake_up(&stdio->input_wait);
+ wake_up(&buf->wait);
return ret;
}

-int bch2_stdio_redirect_readline(struct stdio_redirect *stdio, char *buf, size_t len)
+int bch2_stdio_redirect_readline(struct stdio_redirect *stdio, char *ubuf, size_t len)
{
- wait_event(stdio->input_wait,
- stdio->input_buf.pos || stdio->done);
+ struct stdio_buf *buf = &stdio->input;

+ wait_event(buf->wait, stdio_redirect_has_input(stdio));
if (stdio->done)
return -1;

- spin_lock(&stdio->input_lock);
- int ret = min(len, stdio->input_buf.pos);
- char *n = memchr(stdio->input_buf.buf, '\n', ret);
- if (n)
- ret = min(ret, n + 1 - stdio->input_buf.buf);
- stdio->input_buf.pos -= ret;
- memcpy(buf, stdio->input_buf.buf, ret);
- memmove(stdio->input_buf.buf,
- stdio->input_buf.buf + ret,
- stdio->input_buf.pos);
- spin_unlock(&stdio->input_lock);
-
- wake_up(&stdio->input_wait);
+ spin_lock(&buf->lock);
+ int ret = min(len, buf->buf.nr);
+ char *n = memchr(buf->buf.data, '\n', ret);
+ if (!n)
+ ret = min(ret, n + 1 - buf->buf.data);
+ buf->buf.nr -= ret;
+ memcpy(ubuf, buf->buf.data, ret);
+ memmove(buf->buf.data,
+ buf->buf.data + ret,
+ buf->buf.nr);
+ spin_unlock(&buf->lock);
+
+ wake_up(&buf->wait);
return ret;
}

+__printf(3, 0)
+static void bch2_darray_vprintf(darray_char *out, gfp_t gfp, const char *fmt, va_list args)
+{
+ size_t len;
+
+ do {
+ va_list args2;
+ va_copy(args2, args);
+
+ len = vsnprintf(out->data + out->nr, darray_room(*out), fmt, args2);
+ } while (len + 1 > darray_room(*out) && !darray_make_room_gfp(out, len + 1, gfp));
+
+ out->nr += min(len, darray_room(*out));
+}
+
+void bch2_stdio_redirect_vprintf(struct stdio_redirect *stdio, bool nonblocking,
+ const char *fmt, va_list args)
+{
+ struct stdio_buf *buf = &stdio->output;
+ unsigned long flags;
+
+ if (!nonblocking)
+ wait_event(buf->wait, stdio_redirect_has_output_space(stdio));
+ else if (!stdio_redirect_has_output_space(stdio))
+ return;
+ if (stdio->done)
+ return;
+
+ spin_lock_irqsave(&buf->lock, flags);
+ bch2_darray_vprintf(&buf->buf, nonblocking ? __GFP_NOWARN : GFP_KERNEL, fmt, args);
+ spin_unlock_irqrestore(&buf->lock, flags);
+
+ wake_up(&buf->wait);
+}
+
+void bch2_stdio_redirect_printf(struct stdio_redirect *stdio, bool nonblocking,
+ const char *fmt, ...)
+{
+
+ va_list args;
+ va_start(args, fmt);
+ bch2_stdio_redirect_vprintf(stdio, nonblocking, fmt, args);
+ va_end(args);
+}
+
#endif /* NO_BCACHEFS_FS */
diff --git a/fs/bcachefs/thread_with_file.h b/fs/bcachefs/thread_with_file.h
index b5098b52db709..4243c7c5ad3f3 100644
--- a/fs/bcachefs/thread_with_file.h
+++ b/fs/bcachefs/thread_with_file.h
@@ -27,8 +27,8 @@ static inline void thread_with_stdio_done(struct thread_with_stdio *thr)
{
thr->thr.done = true;
thr->stdio.done = true;
- wake_up(&thr->stdio.input_wait);
- wake_up(&thr->stdio.output_wait);
+ wake_up(&thr->stdio.input.wait);
+ wake_up(&thr->stdio.output.wait);
}

int bch2_run_thread_with_stdio(struct thread_with_stdio *,
@@ -37,4 +37,7 @@ int bch2_run_thread_with_stdio(struct thread_with_stdio *,
int bch2_stdio_redirect_read(struct stdio_redirect *, char *, size_t);
int bch2_stdio_redirect_readline(struct stdio_redirect *, char *, size_t);

+__printf(3, 0) void bch2_stdio_redirect_vprintf(struct stdio_redirect *, bool, const char *, va_list);
+__printf(3, 4) void bch2_stdio_redirect_printf(struct stdio_redirect *, bool, const char *, ...);
+
#endif /* _BCACHEFS_THREAD_WITH_FILE_H */
diff --git a/fs/bcachefs/thread_with_file_types.h b/fs/bcachefs/thread_with_file_types.h
index 90b5e645e98ce..e0daf4eec341e 100644
--- a/fs/bcachefs/thread_with_file_types.h
+++ b/fs/bcachefs/thread_with_file_types.h
@@ -2,14 +2,21 @@
#ifndef _BCACHEFS_THREAD_WITH_FILE_TYPES_H
#define _BCACHEFS_THREAD_WITH_FILE_TYPES_H

+#include "darray.h"
+
+struct stdio_buf {
+ spinlock_t lock;
+ wait_queue_head_t wait;
+ darray_char buf;
+};
+
struct stdio_redirect {
- spinlock_t output_lock;
- wait_queue_head_t output_wait;
- struct printbuf output_buf;
+ struct stdio_buf input;
+ struct stdio_buf output;

spinlock_t input_lock;
wait_queue_head_t input_wait;
- struct printbuf input_buf;
+ darray_char input_buf;
bool done;
};



2024-02-24 01:14:42

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 03/10] bcachefs: thread_with_stdio: kill thread_with_stdio_done()

From: Kent Overstreet <[email protected]>

Move the cleanup code to a wrapper function, where we can call it after
the thread_with_stdio fn exits.

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/bcachefs/chardev.c | 14 ++++----------
fs/bcachefs/thread_with_file.c | 20 +++++++++++++++++---
fs/bcachefs/thread_with_file.h | 11 ++---------
3 files changed, 23 insertions(+), 22 deletions(-)


diff --git a/fs/bcachefs/chardev.c b/fs/bcachefs/chardev.c
index 226b39c176673..11711f54057e1 100644
--- a/fs/bcachefs/chardev.c
+++ b/fs/bcachefs/chardev.c
@@ -155,17 +155,14 @@ static void bch2_fsck_thread_exit(struct thread_with_stdio *_thr)
kfree(thr);
}

-static int bch2_fsck_offline_thread_fn(void *arg)
+static void bch2_fsck_offline_thread_fn(struct thread_with_stdio *stdio)
{
- struct fsck_thread *thr = container_of(arg, struct fsck_thread, thr);
+ struct fsck_thread *thr = container_of(stdio, struct fsck_thread, thr);
struct bch_fs *c = bch2_fs_open(thr->devs, thr->nr_devs, thr->opts);

thr->thr.thr.ret = PTR_ERR_OR_ZERO(c);
if (!thr->thr.thr.ret)
bch2_fs_stop(c);
-
- thread_with_stdio_done(&thr->thr);
- return 0;
}

static long bch2_ioctl_fsck_offline(struct bch_ioctl_fsck_offline __user *user_arg)
@@ -763,9 +760,9 @@ static long bch2_ioctl_disk_resize_journal(struct bch_fs *c,
return ret;
}

-static int bch2_fsck_online_thread_fn(void *arg)
+static void bch2_fsck_online_thread_fn(struct thread_with_stdio *stdio)
{
- struct fsck_thread *thr = container_of(arg, struct fsck_thread, thr);
+ struct fsck_thread *thr = container_of(stdio, struct fsck_thread, thr);
struct bch_fs *c = thr->c;

c->stdio_filter = current;
@@ -793,11 +790,8 @@ static int bch2_fsck_online_thread_fn(void *arg)
c->stdio_filter = NULL;
c->opts.fix_errors = old_fix_errors;

- thread_with_stdio_done(&thr->thr);
-
up(&c->online_fsck_mutex);
bch2_ro_ref_put(c);
- return 0;
}

static long bch2_ioctl_fsck_online(struct bch_fs *c,
diff --git a/fs/bcachefs/thread_with_file.c b/fs/bcachefs/thread_with_file.c
index ca81d3fec3eef..eb8ab4c47a94b 100644
--- a/fs/bcachefs/thread_with_file.c
+++ b/fs/bcachefs/thread_with_file.c
@@ -228,15 +228,29 @@ static const struct file_operations thread_with_stdio_fops = {
.release = thread_with_stdio_release,
};

+static int thread_with_stdio_fn(void *arg)
+{
+ struct thread_with_stdio *thr = arg;
+
+ thr->fn(thr);
+
+ thr->thr.done = true;
+ thr->stdio.done = true;
+ wake_up(&thr->stdio.input.wait);
+ wake_up(&thr->stdio.output.wait);
+ return 0;
+}
+
int bch2_run_thread_with_stdio(struct thread_with_stdio *thr,
void (*exit)(struct thread_with_stdio *),
- int (*fn)(void *))
+ void (*fn)(struct thread_with_stdio *))
{
stdio_buf_init(&thr->stdio.input);
stdio_buf_init(&thr->stdio.output);
- thr->exit = exit;
+ thr->exit = exit;
+ thr->fn = fn;

- return bch2_run_thread_with_file(&thr->thr, &thread_with_stdio_fops, fn);
+ return bch2_run_thread_with_file(&thr->thr, &thread_with_stdio_fops, thread_with_stdio_fn);
}

int bch2_stdio_redirect_read(struct stdio_redirect *stdio, char *ubuf, size_t len)
diff --git a/fs/bcachefs/thread_with_file.h b/fs/bcachefs/thread_with_file.h
index 4243c7c5ad3f3..66212fcae226a 100644
--- a/fs/bcachefs/thread_with_file.h
+++ b/fs/bcachefs/thread_with_file.h
@@ -21,19 +21,12 @@ struct thread_with_stdio {
struct thread_with_file thr;
struct stdio_redirect stdio;
void (*exit)(struct thread_with_stdio *);
+ void (*fn)(struct thread_with_stdio *);
};

-static inline void thread_with_stdio_done(struct thread_with_stdio *thr)
-{
- thr->thr.done = true;
- thr->stdio.done = true;
- wake_up(&thr->stdio.input.wait);
- wake_up(&thr->stdio.output.wait);
-}
-
int bch2_run_thread_with_stdio(struct thread_with_stdio *,
void (*exit)(struct thread_with_stdio *),
- int (*fn)(void *));
+ void (*fn)(struct thread_with_stdio *));
int bch2_stdio_redirect_read(struct stdio_redirect *, char *, size_t);
int bch2_stdio_redirect_readline(struct stdio_redirect *, char *, size_t);



2024-02-24 01:15:02

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 09/10] time_stats: report information in json format

From: Darrick J. Wong <[email protected]>

Export json versions of time statistics information. Given the tabular
nature of the numbers exposed, this will make it a lot easier for higher
(than C) level languages (e.g. python) to import information without
needing to write yet another clumsy string parser.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/time_stats.h | 2 +
lib/time_stats.c | 87 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 89 insertions(+)


diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
index b3c810fff963a..4e1f5485ed039 100644
--- a/include/linux/time_stats.h
+++ b/include/linux/time_stats.h
@@ -156,6 +156,8 @@ static inline bool track_event_change(struct time_stats *stats, bool v)
struct seq_buf;
void time_stats_to_seq_buf(struct seq_buf *, struct time_stats *,
const char *epoch_name, unsigned int flags);
+void time_stats_to_json(struct seq_buf *, struct time_stats *,
+ const char *epoch_name, unsigned int flags);

void time_stats_exit(struct time_stats *);
void time_stats_init(struct time_stats *);
diff --git a/lib/time_stats.c b/lib/time_stats.c
index 0fb3d854e503b..c0f209dd9f6dd 100644
--- a/lib/time_stats.c
+++ b/lib/time_stats.c
@@ -266,6 +266,93 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,
}
EXPORT_SYMBOL_GPL(time_stats_to_seq_buf);

+void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
+ const char *epoch_name, unsigned int flags)
+{
+ struct quantiles *quantiles = time_stats_to_quantiles(stats);
+ s64 f_mean = 0, d_mean = 0;
+ u64 f_stddev = 0, d_stddev = 0;
+
+ if (stats->buffer) {
+ int cpu;
+
+ spin_lock_irq(&stats->lock);
+ for_each_possible_cpu(cpu)
+ __time_stats_clear_buffer(stats, per_cpu_ptr(stats->buffer, cpu));
+ spin_unlock_irq(&stats->lock);
+ }
+
+ if (stats->freq_stats.n) {
+ /* avoid divide by zero */
+ f_mean = mean_and_variance_get_mean(stats->freq_stats);
+ f_stddev = mean_and_variance_get_stddev(stats->freq_stats);
+ d_mean = mean_and_variance_get_mean(stats->duration_stats);
+ d_stddev = mean_and_variance_get_stddev(stats->duration_stats);
+ } else if (flags & TIME_STATS_PRINT_NO_ZEROES) {
+ /* unless we didn't want zeroes anyway */
+ return;
+ }
+
+ seq_buf_printf(out, "{\n");
+ seq_buf_printf(out, " \"epoch\": \"%s\",\n", epoch_name);
+ seq_buf_printf(out, " \"count\": %llu,\n", stats->duration_stats.n);
+
+ seq_buf_printf(out, " \"duration_ns\": {\n");
+ seq_buf_printf(out, " \"min\": %llu,\n", stats->min_duration);
+ seq_buf_printf(out, " \"max\": %llu,\n", stats->max_duration);
+ seq_buf_printf(out, " \"total\": %llu,\n", stats->total_duration);
+ seq_buf_printf(out, " \"mean\": %llu,\n", d_mean);
+ seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
+ seq_buf_printf(out, " },\n");
+
+ d_mean = mean_and_variance_weighted_get_mean(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT);
+ d_stddev = mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT);
+
+ seq_buf_printf(out, " \"duration_ewma_ns\": {\n");
+ seq_buf_printf(out, " \"mean\": %llu,\n", d_mean);
+ seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
+ seq_buf_printf(out, " },\n");
+
+ seq_buf_printf(out, " \"frequency_ns\": {\n");
+ seq_buf_printf(out, " \"min\": %llu,\n", stats->min_freq);
+ seq_buf_printf(out, " \"max\": %llu,\n", stats->max_freq);
+ seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
+ seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);
+ seq_buf_printf(out, " },\n");
+
+ f_mean = mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
+ f_stddev = mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
+
+ seq_buf_printf(out, " \"frequency_ewma_ns\": {\n");
+ seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
+ seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);
+
+ if (quantiles) {
+ u64 last_q = 0;
+
+ /* close frequency_ewma_ns but signal more items */
+ seq_buf_printf(out, " },\n");
+
+ seq_buf_printf(out, " \"quantiles_ns\": [\n");
+ eytzinger0_for_each(i, NR_QUANTILES) {
+ bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;
+
+ u64 q = max(quantiles->entries[i].m, last_q);
+ seq_buf_printf(out, " %llu", q);
+ if (!is_last)
+ seq_buf_printf(out, ", ");
+ last_q = q;
+ }
+ seq_buf_printf(out, " ]\n");
+ } else {
+ /* close frequency_ewma_ns without dumping further */
+ seq_buf_printf(out, " }\n");
+ }
+
+ seq_buf_printf(out, "}\n");
+}
+EXPORT_SYMBOL_GPL(time_stats_to_json);
+
void time_stats_exit(struct time_stats *stats)
{
free_percpu(stats->buffer);


2024-02-24 01:15:55

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 01/10] bcachefs: thread_with_stdio: eliminate double buffering

From: Kent Overstreet <[email protected]>

The output buffer lock has to be a spinlock so that we can write to it
from interrupt context, so we can't use a direct copy_to_user; this
switches thread_with_file_read() to use fault_in_writeable() and
copy_to_user_nofault(), similar to how thread_with_file_write() works.

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/bcachefs/thread_with_file.c | 56 ++++++++++++----------------------------
fs/bcachefs/thread_with_file.h | 1 -
2 files changed, 17 insertions(+), 40 deletions(-)


diff --git a/fs/bcachefs/thread_with_file.c b/fs/bcachefs/thread_with_file.c
index 9220d7de10db6..8c3afb4c3204f 100644
--- a/fs/bcachefs/thread_with_file.c
+++ b/fs/bcachefs/thread_with_file.c
@@ -67,16 +67,15 @@ int bch2_run_thread_with_file(struct thread_with_file *thr,

static inline bool thread_with_stdio_has_output(struct thread_with_stdio *thr)
{
- return thr->stdio.output_buf.pos ||
- thr->output2.nr ||
- thr->thr.done;
+ return thr->stdio.output_buf.pos || thr->thr.done;
}

-static ssize_t thread_with_stdio_read(struct file *file, char __user *buf,
+static ssize_t thread_with_stdio_read(struct file *file, char __user *ubuf,
size_t len, loff_t *ppos)
{
struct thread_with_stdio *thr =
container_of(file->private_data, struct thread_with_stdio, thr);
+ struct printbuf *buf = &thr->stdio.output_buf;
size_t copied = 0, b;
int ret = 0;

@@ -89,44 +88,25 @@ static ssize_t thread_with_stdio_read(struct file *file, char __user *buf,
if (ret)
return ret;

- if (thr->thr.done)
- return 0;
-
- while (len) {
- ret = darray_make_room(&thr->output2, thr->stdio.output_buf.pos);
- if (ret)
+ while (len && buf->pos) {
+ if (fault_in_writeable(ubuf, len) == len) {
+ ret = -EFAULT;
break;
+ }

spin_lock_irq(&thr->stdio.output_lock);
- b = min_t(size_t, darray_room(thr->output2), thr->stdio.output_buf.pos);
+ b = min_t(size_t, len, buf->pos);

- memcpy(&darray_top(thr->output2), thr->stdio.output_buf.buf, b);
- memmove(thr->stdio.output_buf.buf,
- thr->stdio.output_buf.buf + b,
- thr->stdio.output_buf.pos - b);
-
- thr->output2.nr += b;
- thr->stdio.output_buf.pos -= b;
+ if (b && !copy_to_user_nofault(ubuf, buf->buf, b)) {
+ memmove(buf->buf,
+ buf->buf + b,
+ buf->pos - b);
+ buf->pos -= b;
+ ubuf += b;
+ len -= b;
+ copied += b;
+ }
spin_unlock_irq(&thr->stdio.output_lock);
-
- b = min(len, thr->output2.nr);
- if (!b)
- break;
-
- b -= copy_to_user(buf, thr->output2.data, b);
- if (!b) {
- ret = -EFAULT;
- break;
- }
-
- copied += b;
- buf += b;
- len -= b;
-
- memmove(thr->output2.data,
- thr->output2.data + b,
- thr->output2.nr - b);
- thr->output2.nr -= b;
}

return copied ?: ret;
@@ -140,7 +120,6 @@ static int thread_with_stdio_release(struct inode *inode, struct file *file)
bch2_thread_with_file_exit(&thr->thr);
printbuf_exit(&thr->stdio.input_buf);
printbuf_exit(&thr->stdio.output_buf);
- darray_exit(&thr->output2);
thr->exit(thr);
return 0;
}
@@ -245,7 +224,6 @@ int bch2_run_thread_with_stdio(struct thread_with_stdio *thr,
spin_lock_init(&thr->stdio.output_lock);
init_waitqueue_head(&thr->stdio.output_wait);

- darray_init(&thr->output2);
thr->exit = exit;

return bch2_run_thread_with_file(&thr->thr, &thread_with_stdio_fops, fn);
diff --git a/fs/bcachefs/thread_with_file.h b/fs/bcachefs/thread_with_file.h
index 05879c5048c87..b5098b52db709 100644
--- a/fs/bcachefs/thread_with_file.h
+++ b/fs/bcachefs/thread_with_file.h
@@ -20,7 +20,6 @@ int bch2_run_thread_with_file(struct thread_with_file *,
struct thread_with_stdio {
struct thread_with_file thr;
struct stdio_redirect stdio;
- DARRAY(char) output2;
void (*exit)(struct thread_with_stdio *);
};



2024-02-24 01:16:17

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 06/10] darray: lift from bcachefs

From: Kent Overstreet <[email protected]>

dynamic arrays - inspired from CCAN darrays, basically c++ stl vectors.

Used by thread_with_stdio, which is also being lifted from bcachefs for
xfs.

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
MAINTAINERS | 7 ++++
fs/bcachefs/Makefile | 1 -
fs/bcachefs/btree_types.h | 2 +
fs/bcachefs/btree_update.c | 2 +
fs/bcachefs/btree_write_buffer_types.h | 2 +
fs/bcachefs/fsck.c | 2 +
fs/bcachefs/journal_sb.c | 2 +
fs/bcachefs/sb-downgrade.c | 3 +-
fs/bcachefs/sb-errors_types.h | 2 +
fs/bcachefs/sb-members.h | 2 +
fs/bcachefs/subvolume.h | 1 -
fs/bcachefs/subvolume_types.h | 2 +
fs/bcachefs/thread_with_file_types.h | 2 +
fs/bcachefs/util.h | 29 +---------------
include/linux/darray.h | 59 +++++++++++++++++++++-----------
include/linux/darray_types.h | 22 ++++++++++++
lib/Makefile | 2 +
lib/darray.c | 12 +++++--
18 files changed, 93 insertions(+), 61 deletions(-)
rename fs/bcachefs/darray.h => include/linux/darray.h (66%)
create mode 100644 include/linux/darray_types.h
rename fs/bcachefs/darray.c => lib/darray.c (56%)


diff --git a/MAINTAINERS b/MAINTAINERS
index aa762fe654e3e..97905e0d57a52 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5810,6 +5810,13 @@ F: net/ax25/ax25_out.c
F: net/ax25/ax25_timer.c
F: net/ax25/sysctl_net_ax25.c

+DARRAY
+M: Kent Overstreet <[email protected]>
+L: [email protected]
+S: Maintained
+F: include/linux/darray.h
+F: include/linux/darray_types.h
+
DATA ACCESS MONITOR
M: SeongJae Park <[email protected]>
L: [email protected]
diff --git a/fs/bcachefs/Makefile b/fs/bcachefs/Makefile
index b11ba74b8ad41..bb17d146b0900 100644
--- a/fs/bcachefs/Makefile
+++ b/fs/bcachefs/Makefile
@@ -27,7 +27,6 @@ bcachefs-y := \
checksum.o \
clock.o \
compress.o \
- darray.o \
debug.o \
dirent.o \
disk_groups.o \
diff --git a/fs/bcachefs/btree_types.h b/fs/bcachefs/btree_types.h
index 4a5a64499eb76..0d5eecbd3e9cf 100644
--- a/fs/bcachefs/btree_types.h
+++ b/fs/bcachefs/btree_types.h
@@ -2,12 +2,12 @@
#ifndef _BCACHEFS_BTREE_TYPES_H
#define _BCACHEFS_BTREE_TYPES_H

+#include <linux/darray_types.h>
#include <linux/list.h>
#include <linux/rhashtable.h>

#include "btree_key_cache_types.h"
#include "buckets_types.h"
-#include "darray.h"
#include "errcode.h"
#include "journal_types.h"
#include "replicas_types.h"
diff --git a/fs/bcachefs/btree_update.c b/fs/bcachefs/btree_update.c
index c3ff365acce9a..e5193116b092f 100644
--- a/fs/bcachefs/btree_update.c
+++ b/fs/bcachefs/btree_update.c
@@ -14,6 +14,8 @@
#include "snapshot.h"
#include "trace.h"

+#include <linux/darray.h>
+
static inline int btree_insert_entry_cmp(const struct btree_insert_entry *l,
const struct btree_insert_entry *r)
{
diff --git a/fs/bcachefs/btree_write_buffer_types.h b/fs/bcachefs/btree_write_buffer_types.h
index 9b9433de9c368..5f248873087c3 100644
--- a/fs/bcachefs/btree_write_buffer_types.h
+++ b/fs/bcachefs/btree_write_buffer_types.h
@@ -2,7 +2,7 @@
#ifndef _BCACHEFS_BTREE_WRITE_BUFFER_TYPES_H
#define _BCACHEFS_BTREE_WRITE_BUFFER_TYPES_H

-#include "darray.h"
+#include <linux/darray_types.h>
#include "journal_types.h"

#define BTREE_WRITE_BUFERED_VAL_U64s_MAX 4
diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c
index 6a760777bafb0..04d3d9957a203 100644
--- a/fs/bcachefs/fsck.c
+++ b/fs/bcachefs/fsck.c
@@ -5,7 +5,6 @@
#include "btree_cache.h"
#include "btree_update.h"
#include "buckets.h"
-#include "darray.h"
#include "dirent.h"
#include "error.h"
#include "fs-common.h"
@@ -18,6 +17,7 @@
#include "xattr.h"

#include <linux/bsearch.h>
+#include <linux/darray.h>
#include <linux/dcache.h> /* struct qstr */

/*
diff --git a/fs/bcachefs/journal_sb.c b/fs/bcachefs/journal_sb.c
index ae4fb8c3a2bc2..156691c203bef 100644
--- a/fs/bcachefs/journal_sb.c
+++ b/fs/bcachefs/journal_sb.c
@@ -2,8 +2,8 @@

#include "bcachefs.h"
#include "journal_sb.h"
-#include "darray.h"

+#include <linux/darray.h>
#include <linux/sort.h>

/* BCH_SB_FIELD_journal: */
diff --git a/fs/bcachefs/sb-downgrade.c b/fs/bcachefs/sb-downgrade.c
index 441dcb1bf160e..626eaaea5b01d 100644
--- a/fs/bcachefs/sb-downgrade.c
+++ b/fs/bcachefs/sb-downgrade.c
@@ -6,12 +6,13 @@
*/

#include "bcachefs.h"
-#include "darray.h"
#include "recovery.h"
#include "sb-downgrade.h"
#include "sb-errors.h"
#include "super-io.h"

+#include <linux/darray.h>
+
#define RECOVERY_PASS_ALL_FSCK BIT_ULL(63)

/*
diff --git a/fs/bcachefs/sb-errors_types.h b/fs/bcachefs/sb-errors_types.h
index c08aacdfd073c..9a3a74ca0806b 100644
--- a/fs/bcachefs/sb-errors_types.h
+++ b/fs/bcachefs/sb-errors_types.h
@@ -2,7 +2,7 @@
#ifndef _BCACHEFS_SB_ERRORS_TYPES_H
#define _BCACHEFS_SB_ERRORS_TYPES_H

-#include "darray.h"
+#include <linux/darray_types.h>

#define BCH_SB_ERRS() \
x(clean_but_journal_not_empty, 0) \
diff --git a/fs/bcachefs/sb-members.h b/fs/bcachefs/sb-members.h
index be0a941832715..e4d4d842229a6 100644
--- a/fs/bcachefs/sb-members.h
+++ b/fs/bcachefs/sb-members.h
@@ -2,7 +2,7 @@
#ifndef _BCACHEFS_SB_MEMBERS_H
#define _BCACHEFS_SB_MEMBERS_H

-#include "darray.h"
+#include <linux/darray.h>

extern char * const bch2_member_error_strs[];

diff --git a/fs/bcachefs/subvolume.h b/fs/bcachefs/subvolume.h
index a6f56f66e27cb..3ca1d183369c5 100644
--- a/fs/bcachefs/subvolume.h
+++ b/fs/bcachefs/subvolume.h
@@ -2,7 +2,6 @@
#ifndef _BCACHEFS_SUBVOLUME_H
#define _BCACHEFS_SUBVOLUME_H

-#include "darray.h"
#include "subvolume_types.h"

enum bkey_invalid_flags;
diff --git a/fs/bcachefs/subvolume_types.h b/fs/bcachefs/subvolume_types.h
index ae644adfc3916..40f16e3a6dd04 100644
--- a/fs/bcachefs/subvolume_types.h
+++ b/fs/bcachefs/subvolume_types.h
@@ -2,7 +2,7 @@
#ifndef _BCACHEFS_SUBVOLUME_TYPES_H
#define _BCACHEFS_SUBVOLUME_TYPES_H

-#include "darray.h"
+#include <linux/darray_types.h>

typedef DARRAY(u32) snapshot_id_list;

diff --git a/fs/bcachefs/thread_with_file_types.h b/fs/bcachefs/thread_with_file_types.h
index e0daf4eec341e..41990756aa261 100644
--- a/fs/bcachefs/thread_with_file_types.h
+++ b/fs/bcachefs/thread_with_file_types.h
@@ -2,7 +2,7 @@
#ifndef _BCACHEFS_THREAD_WITH_FILE_TYPES_H
#define _BCACHEFS_THREAD_WITH_FILE_TYPES_H

-#include "darray.h"
+#include <linux/darray_types.h>

struct stdio_buf {
spinlock_t lock;
diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index cf8d16a911622..b354307903057 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -5,23 +5,22 @@
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/closure.h>
+#include <linux/darray.h>
#include <linux/errno.h>
#include <linux/freezer.h>
#include <linux/kernel.h>
-#include <linux/sched/clock.h>
#include <linux/llist.h>
#include <linux/log2.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/ratelimit.h>
+#include <linux/sched/clock.h>
#include <linux/slab.h>
#include <linux/time_stats.h>
#include <linux/vmalloc.h>
#include <linux/workqueue.h>
#include <linux/mean_and_variance.h>

-#include "darray.h"
-
struct closure;

#ifdef CONFIG_BCACHEFS_DEBUG
@@ -662,30 +661,6 @@ static inline void memset_u64s_tail(void *s, int c, unsigned bytes)
memset(s + bytes, c, rem);
}

-/* just the memmove, doesn't update @_nr */
-#define __array_insert_item(_array, _nr, _pos) \
- memmove(&(_array)[(_pos) + 1], \
- &(_array)[(_pos)], \
- sizeof((_array)[0]) * ((_nr) - (_pos)))
-
-#define array_insert_item(_array, _nr, _pos, _new_item) \
-do { \
- __array_insert_item(_array, _nr, _pos); \
- (_nr)++; \
- (_array)[(_pos)] = (_new_item); \
-} while (0)
-
-#define array_remove_items(_array, _nr, _pos, _nr_to_remove) \
-do { \
- (_nr) -= (_nr_to_remove); \
- memmove(&(_array)[(_pos)], \
- &(_array)[(_pos) + (_nr_to_remove)], \
- sizeof((_array)[0]) * ((_nr) - (_pos))); \
-} while (0)
-
-#define array_remove_item(_array, _nr, _pos) \
- array_remove_items(_array, _nr, _pos, 1)
-
static inline void __move_gap(void *array, size_t element_size,
size_t nr, size_t size,
size_t old_gap, size_t new_gap)
diff --git a/fs/bcachefs/darray.h b/include/linux/darray.h
similarity index 66%
rename from fs/bcachefs/darray.h
rename to include/linux/darray.h
index 4b340d13caace..ff167eb795f22 100644
--- a/fs/bcachefs/darray.h
+++ b/include/linux/darray.h
@@ -1,34 +1,26 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _BCACHEFS_DARRAY_H
-#define _BCACHEFS_DARRAY_H
+/*
+ * (C) 2022-2024 Kent Overstreet <[email protected]>
+ */
+#ifndef _LINUX_DARRAY_H
+#define _LINUX_DARRAY_H

/*
- * Dynamic arrays:
+ * Dynamic arrays
*
* Inspired by CCAN's darray
*/

+#include <linux/darray_types.h>
#include <linux/slab.h>

-#define DARRAY_PREALLOCATED(_type, _nr) \
-struct { \
- size_t nr, size; \
- _type *data; \
- _type preallocated[_nr]; \
-}
-
-#define DARRAY(_type) DARRAY_PREALLOCATED(_type, 0)
-
-typedef DARRAY(char) darray_char;
-typedef DARRAY(char *) darray_str;
-
-int __bch2_darray_resize(darray_char *, size_t, size_t, gfp_t);
+int __darray_resize_slowpath(darray_char *, size_t, size_t, gfp_t);

static inline int __darray_resize(darray_char *d, size_t element_size,
size_t new_size, gfp_t gfp)
{
return unlikely(new_size > d->size)
- ? __bch2_darray_resize(d, element_size, new_size, gfp)
+ ? __darray_resize_slowpath(d, element_size, new_size, gfp)
: 0;
}

@@ -69,6 +61,28 @@ static inline int __darray_make_room(darray_char *d, size_t t_size, size_t more,
#define darray_first(_d) ((_d).data[0])
#define darray_last(_d) ((_d).data[(_d).nr - 1])

+/* Insert/remove items into the middle of a darray: */
+
+#define array_insert_item(_array, _nr, _pos, _new_item) \
+do { \
+ memmove(&(_array)[(_pos) + 1], \
+ &(_array)[(_pos)], \
+ sizeof((_array)[0]) * ((_nr) - (_pos))); \
+ (_nr)++; \
+ (_array)[(_pos)] = (_new_item); \
+} while (0)
+
+#define array_remove_items(_array, _nr, _pos, _nr_to_remove) \
+do { \
+ (_nr) -= (_nr_to_remove); \
+ memmove(&(_array)[(_pos)], \
+ &(_array)[(_pos) + (_nr_to_remove)], \
+ sizeof((_array)[0]) * ((_nr) - (_pos))); \
+} while (0)
+
+#define array_remove_item(_array, _nr, _pos) \
+ array_remove_items(_array, _nr, _pos, 1)
+
#define darray_insert_item(_d, pos, _item) \
({ \
size_t _pos = (pos); \
@@ -79,10 +93,15 @@ static inline int __darray_make_room(darray_char *d, size_t t_size, size_t more,
_ret; \
})

+#define darray_remove_items(_d, _pos, _nr_to_remove) \
+ array_remove_items((_d)->data, (_d)->nr, (_pos) - (_d)->data, _nr_to_remove)
+
#define darray_remove_item(_d, _pos) \
- array_remove_item((_d)->data, (_d)->nr, (_pos) - (_d)->data)
+ darray_remove_items(_d, _pos, 1)

-#define __darray_for_each(_d, _i) \
+/* Iteration: */
+
+#define __darray_for_each(_d, _i) \
for ((_i) = (_d).data; _i < (_d).data + (_d).nr; _i++)

#define darray_for_each(_d, _i) \
@@ -106,4 +125,4 @@ do { \
darray_init(_d); \
} while (0)

-#endif /* _BCACHEFS_DARRAY_H */
+#endif /* _LINUX_DARRAY_H */
diff --git a/include/linux/darray_types.h b/include/linux/darray_types.h
new file mode 100644
index 0000000000000..a400a0c3600d8
--- /dev/null
+++ b/include/linux/darray_types.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * (C) 2022-2024 Kent Overstreet <[email protected]>
+ */
+#ifndef _LINUX_DARRAY_TYpES_H
+#define _LINUX_DARRAY_TYpES_H
+
+#include <linux/types.h>
+
+#define DARRAY_PREALLOCATED(_type, _nr) \
+struct { \
+ size_t nr, size; \
+ _type *data; \
+ _type preallocated[_nr]; \
+}
+
+#define DARRAY(_type) DARRAY_PREALLOCATED(_type, 0)
+
+typedef DARRAY(char) darray_char;
+typedef DARRAY(char *) darray_str;
+
+#endif /* _LINUX_DARRAY_TYpES_H */
diff --git a/lib/Makefile b/lib/Makefile
index 57858997c87aa..830907bb8fc85 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -48,7 +48,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
percpu-refcount.o rhashtable.o base64.o \
once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
- generic-radix-tree.o bitmap-str.o
+ generic-radix-tree.o bitmap-str.o darray.o
obj-$(CONFIG_STRING_SELFTEST) += test_string.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/fs/bcachefs/darray.c b/lib/darray.c
similarity index 56%
rename from fs/bcachefs/darray.c
rename to lib/darray.c
index ac35b8b705ae1..7cb064f14b391 100644
--- a/fs/bcachefs/darray.c
+++ b/lib/darray.c
@@ -1,10 +1,14 @@
// SPDX-License-Identifier: GPL-2.0
+/*
+ * (C) 2022-2024 Kent Overstreet <[email protected]>
+ */

+#include <linux/darray.h>
#include <linux/log2.h>
+#include <linux/module.h>
#include <linux/slab.h>
-#include "darray.h"

-int __bch2_darray_resize(darray_char *d, size_t element_size, size_t new_size, gfp_t gfp)
+int __darray_resize_slowpath(darray_char *d, size_t element_size, size_t new_size, gfp_t gfp)
{
if (new_size > d->size) {
new_size = roundup_pow_of_two(new_size);
@@ -22,3 +26,7 @@ int __bch2_darray_resize(darray_char *d, size_t element_size, size_t new_size, g

return 0;
}
+EXPORT_SYMBOL_GPL(__darray_resize_slowpath);
+
+MODULE_AUTHOR("Kent Overstreet");
+MODULE_LICENSE("GPL");


2024-02-24 01:16:51

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 08/10] thread_with_stdio: Mark completed in ->release()

From: Kent Overstreet <[email protected]>

This fixes stdio_redirect_read() getting stuck, not noticing that the
pipe has been closed.

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/thread_with_file.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)


diff --git a/lib/thread_with_file.c b/lib/thread_with_file.c
index 092996ca43fe7..f4946a437332a 100644
--- a/lib/thread_with_file.c
+++ b/lib/thread_with_file.c
@@ -201,6 +201,14 @@ EXPORT_SYMBOL_GPL(run_thread_with_file);

/* thread_with_stdio */

+static void thread_with_stdio_done(struct thread_with_stdio *thr)
+{
+ thr->thr.done = true;
+ thr->stdio.done = true;
+ wake_up(&thr->stdio.input.wait);
+ wake_up(&thr->stdio.output.wait);
+}
+
static ssize_t thread_with_stdio_read(struct file *file, char __user *ubuf,
size_t len, loff_t *ppos)
{
@@ -315,6 +323,7 @@ static int thread_with_stdio_release(struct inode *inode, struct file *file)
struct thread_with_stdio *thr =
container_of(file->private_data, struct thread_with_stdio, thr);

+ thread_with_stdio_done(thr);
thread_with_file_exit(&thr->thr);
darray_exit(&thr->stdio.input.buf);
darray_exit(&thr->stdio.output.buf);
@@ -336,10 +345,7 @@ static int thread_with_stdio_fn(void *arg)

thr->fn(thr);

- thr->thr.done = true;
- thr->stdio.done = true;
- wake_up(&thr->stdio.input.wait);
- wake_up(&thr->stdio.output.wait);
+ thread_with_stdio_done(thr);
return 0;
}



2024-02-24 01:17:07

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 09/10] kernel/hung_task.c: export sysctl_hung_task_timeout_secs

From: Kent Overstreet <[email protected]>

needed for thread_with_file; also rare but not unheard of to need this
in module code, when blocking on user input.

one workaround used by some code is wait_event_interruptible() - but
that can be buggy if the outer context isn't expecting unwinding.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: fuyuanli <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
kernel/hung_task.c | 1 +
1 file changed, 1 insertion(+)


diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 9a24574988d23..b2fc2727d6544 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -43,6 +43,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
* Zero means infinite timeout - no checking done:
*/
unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
+EXPORT_SYMBOL_GPL(sysctl_hung_task_timeout_secs);

/*
* Zero (default value) means use sysctl_hung_task_timeout_secs:


2024-02-24 01:17:25

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 10/10] thread_with_stdio: suppress hung task warning

From: Kent Overstreet <[email protected]>

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/thread_with_file.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)


diff --git a/lib/thread_with_file.c b/lib/thread_with_file.c
index f4946a437332a..b09dc60ba6280 100644
--- a/lib/thread_with_file.c
+++ b/lib/thread_with_file.c
@@ -9,6 +9,7 @@
#include <linux/module.h>
#include <linux/pagemap.h>
#include <linux/poll.h>
+#include <linux/sched/sysctl.h>
#include <linux/thread_with_file.h>

/* stdio_redirect */
@@ -46,7 +47,15 @@ int stdio_redirect_read(struct stdio_redirect *stdio, char *ubuf, size_t len)
{
struct stdio_buf *buf = &stdio->input;

- wait_event(buf->wait, stdio_redirect_has_input(stdio));
+ /*
+ * we're waiting on user input (or for the file descriptor to be
+ * closed), don't want a hung task warning:
+ */
+ do {
+ wait_event_timeout(buf->wait, stdio_redirect_has_input(stdio),
+ sysctl_hung_task_timeout_secs * HZ / 2);
+ } while (!stdio_redirect_has_input(stdio));
+
if (stdio->done)
return -1;

@@ -67,7 +76,11 @@ int stdio_redirect_readline(struct stdio_redirect *stdio, char *ubuf, size_t len
size_t copied = 0;
ssize_t ret = 0;
again:
- wait_event(buf->wait, stdio_redirect_has_input(stdio));
+ do {
+ wait_event_timeout(buf->wait, stdio_redirect_has_input(stdio),
+ sysctl_hung_task_timeout_secs * HZ / 2);
+ } while (!stdio_redirect_has_input(stdio));
+
if (stdio->done) {
ret = -1;
goto out;


2024-02-24 01:18:07

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 04/10] bcachefs: thread_with_stdio: fix bch2_stdio_redirect_readline()

From: Kent Overstreet <[email protected]>

This fixes a bug where we'd return data without waiting for a newline,
if data was present but a newline was not.

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/bcachefs/thread_with_file.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)


diff --git a/fs/bcachefs/thread_with_file.c b/fs/bcachefs/thread_with_file.c
index eb8ab4c47a94b..830efb06ef0be 100644
--- a/fs/bcachefs/thread_with_file.c
+++ b/fs/bcachefs/thread_with_file.c
@@ -277,25 +277,36 @@ int bch2_stdio_redirect_read(struct stdio_redirect *stdio, char *ubuf, size_t le
int bch2_stdio_redirect_readline(struct stdio_redirect *stdio, char *ubuf, size_t len)
{
struct stdio_buf *buf = &stdio->input;
-
+ size_t copied = 0;
+ ssize_t ret = 0;
+again:
wait_event(buf->wait, stdio_redirect_has_input(stdio));
- if (stdio->done)
- return -1;
+ if (stdio->done) {
+ ret = -1;
+ goto out;
+ }

spin_lock(&buf->lock);
- int ret = min(len, buf->buf.nr);
- char *n = memchr(buf->buf.data, '\n', ret);
- if (!n)
- ret = min(ret, n + 1 - buf->buf.data);
- buf->buf.nr -= ret;
- memcpy(ubuf, buf->buf.data, ret);
+ size_t b = min(len, buf->buf.nr);
+ char *n = memchr(buf->buf.data, '\n', b);
+ if (n)
+ b = min_t(size_t, b, n + 1 - buf->buf.data);
+ buf->buf.nr -= b;
+ memcpy(ubuf, buf->buf.data, b);
memmove(buf->buf.data,
- buf->buf.data + ret,
+ buf->buf.data + b,
buf->buf.nr);
+ ubuf += b;
+ len -= b;
+ copied += b;
spin_unlock(&buf->lock);

wake_up(&buf->wait);
- return ret;
+
+ if (!n && len)
+ goto again;
+out:
+ return copied ?: ret;
}

__printf(3, 0)


2024-02-24 01:18:30

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 05/10] bcachefs: Thread with file documentation

From: Kent Overstreet <[email protected]>

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/bcachefs/thread_with_file.c | 15 ++++++++-------
fs/bcachefs/thread_with_file.h | 32 ++++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+), 7 deletions(-)


diff --git a/fs/bcachefs/thread_with_file.c b/fs/bcachefs/thread_with_file.c
index 830efb06ef0be..dde9679b68b42 100644
--- a/fs/bcachefs/thread_with_file.c
+++ b/fs/bcachefs/thread_with_file.c
@@ -76,16 +76,16 @@ static bool stdio_redirect_has_output(struct stdio_redirect *stdio)
return stdio->output.buf.nr || stdio->done;
}

-#define WRITE_BUFFER 4096
+#define STDIO_REDIRECT_BUFSIZE 4096

static bool stdio_redirect_has_input_space(struct stdio_redirect *stdio)
{
- return stdio->input.buf.nr < WRITE_BUFFER || stdio->done;
+ return stdio->input.buf.nr < STDIO_REDIRECT_BUFSIZE || stdio->done;
}

static bool stdio_redirect_has_output_space(struct stdio_redirect *stdio)
{
- return stdio->output.buf.nr < WRITE_BUFFER || stdio->done;
+ return stdio->output.buf.nr < STDIO_REDIRECT_BUFSIZE || stdio->done;
}

static void stdio_buf_init(struct stdio_buf *buf)
@@ -171,11 +171,12 @@ static ssize_t thread_with_stdio_write(struct file *file, const char __user *ubu
}

spin_lock(&buf->lock);
- if (buf->buf.nr < WRITE_BUFFER)
- darray_make_room_gfp(&buf->buf, min(b, WRITE_BUFFER - buf->buf.nr), __GFP_NOWARN);
+ if (buf->buf.nr < STDIO_REDIRECT_BUFSIZE)
+ darray_make_room_gfp(&buf->buf,
+ min(b, STDIO_REDIRECT_BUFSIZE - buf->buf.nr), GFP_NOWAIT);
b = min(len, darray_room(buf->buf));

- if (b && !copy_from_user_nofault(&buf->buf.data[buf->buf.nr], ubuf, b)) {
+ if (b && !copy_from_user_nofault(&darray_top(buf->buf), ubuf, b)) {
buf->buf.nr += b;
ubuf += b;
len -= b;
@@ -338,7 +339,7 @@ void bch2_stdio_redirect_vprintf(struct stdio_redirect *stdio, bool nonblocking,
return;

spin_lock_irqsave(&buf->lock, flags);
- bch2_darray_vprintf(&buf->buf, nonblocking ? __GFP_NOWARN : GFP_KERNEL, fmt, args);
+ bch2_darray_vprintf(&buf->buf, nonblocking ? GFP_NOWAIT : GFP_KERNEL, fmt, args);
spin_unlock_irqrestore(&buf->lock, flags);

wake_up(&buf->wait);
diff --git a/fs/bcachefs/thread_with_file.h b/fs/bcachefs/thread_with_file.h
index 66212fcae226a..f06f8ff19a790 100644
--- a/fs/bcachefs/thread_with_file.h
+++ b/fs/bcachefs/thread_with_file.h
@@ -4,6 +4,38 @@

#include "thread_with_file_types.h"

+/*
+ * Thread with file: Run a kthread and connect it to a file descriptor, so that
+ * it can be interacted with via fd read/write methods and closing the file
+ * descriptor stops the kthread.
+ *
+ * We have two different APIs:
+ *
+ * thread_with_file, the low level version.
+ * You get to define the full file_operations, including your release function,
+ * which means that you must call bch2_thread_with_file_exit() from your
+ * .release method
+ *
+ * thread_with_stdio, the higher level version
+ * This implements full piping of input and output, including .poll.
+ *
+ * Notes on behaviour:
+ * - kthread shutdown behaves like writing or reading from a pipe that has been
+ * closed
+ * - Input and output buffers are 4096 bytes, although buffers may in some
+ * situations slightly exceed that limit so as to avoid chopping off a
+ * message in the middle in nonblocking mode.
+ * - Input/output buffers are lazily allocated, with GFP_NOWAIT allocations -
+ * should be fine but might change in future revisions.
+ * - Output buffer may grow past 4096 bytes to deal with messages that are
+ * bigger than 4096 bytes
+ * - Writing may be done blocking or nonblocking; in nonblocking mode, we only
+ * drop entire messages.
+ *
+ * To write, use stdio_redirect_printf()
+ * To read, use stdio_redirect_read() or stdio_redirect_readline()
+ */
+
struct task_struct;

struct thread_with_file {


2024-02-24 01:18:48

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 5/5] thread_with_file: Fix missing va_end()

From: Kent Overstreet <[email protected]>

Fixes: https://lore.kernel.org/linux-bcachefs/202402131603.E953E2CF@keescook/T/#u
Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/thread_with_file.c | 2 ++
1 file changed, 2 insertions(+)


diff --git a/lib/thread_with_file.c b/lib/thread_with_file.c
index 8b129744a48a3..37a1ea22823ca 100644
--- a/lib/thread_with_file.c
+++ b/lib/thread_with_file.c
@@ -118,6 +118,8 @@ static ssize_t darray_vprintf(darray_char *out, gfp_t gfp, const char *fmt, va_l

va_copy(args2, args);
len = vsnprintf(out->data + out->nr, darray_room(*out), fmt, args2);
+ va_end(args2);
+
if (len + 1 <= darray_room(*out)) {
out->nr += len;
return len;


2024-02-24 01:19:43

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 07/10] thread_with_file: Lift from bcachefs

From: Kent Overstreet <[email protected]>

thread_with_file and thread_with_stdio are abstractions for connecting
kthreads to file descriptors, which is handy for all sorts of things -
the running kthread has its lifetime connected to the file descriptor,
which means an asynchronous job running in the kernel can easily exit in
response to a ctrl-c, and the file descriptor also provides a
communications channel.

Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
MAINTAINERS | 9 +
fs/bcachefs/Kconfig | 1
fs/bcachefs/Makefile | 1
fs/bcachefs/bcachefs.h | 2
fs/bcachefs/chardev.c | 10 -
fs/bcachefs/error.c | 4
fs/bcachefs/super.c | 4
include/linux/thread_with_file.h | 35 ++-
include/linux/thread_with_file_types.h | 8 -
lib/Kconfig | 3
lib/Makefile | 1
lib/thread_with_file.c | 326 ++++++++++++++++----------------
12 files changed, 212 insertions(+), 192 deletions(-)
rename fs/bcachefs/thread_with_file.h => include/linux/thread_with_file.h (63%)
rename fs/bcachefs/thread_with_file_types.h => include/linux/thread_with_file_types.h (64%)
rename fs/bcachefs/thread_with_file.c => lib/thread_with_file.c (79%)


diff --git a/MAINTAINERS b/MAINTAINERS
index 97905e0d57a52..5799134b24737 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21888,6 +21888,15 @@ F: Documentation/userspace-api/media/drivers/thp7312.rst
F: drivers/media/i2c/thp7312.c
F: include/uapi/linux/thp7312.h

+THREAD WITH FILE
+M: Kent Overstreet <[email protected]>
+M: Darrick J. Wong <[email protected]>
+L: [email protected]
+S: Maintained
+F: include/linux/thread_with_file.c
+F: include/linux/thread_with_file_types.c
+F: lib/thread_with_file.c
+
THUNDERBOLT DMA TRAFFIC TEST DRIVER
M: Isaac Hazan <[email protected]>
L: [email protected]
diff --git a/fs/bcachefs/Kconfig b/fs/bcachefs/Kconfig
index 8c587ddd2f85e..08073d76e5a42 100644
--- a/fs/bcachefs/Kconfig
+++ b/fs/bcachefs/Kconfig
@@ -25,6 +25,7 @@ config BCACHEFS_FS
select SRCU
select SYMBOLIC_ERRNAME
select TIME_STATS
+ select THREAD_WITH_FILE
help
The bcachefs filesystem - a modern, copy on write filesystem, with
support for multiple devices, compression, checksumming, etc.
diff --git a/fs/bcachefs/Makefile b/fs/bcachefs/Makefile
index bb17d146b0900..d335b6572d72d 100644
--- a/fs/bcachefs/Makefile
+++ b/fs/bcachefs/Makefile
@@ -80,7 +80,6 @@ bcachefs-y := \
super-io.o \
sysfs.o \
tests.o \
- thread_with_file.o \
trace.o \
two_state_shared_lock.o \
util.o \
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 04e4a65909a4f..5f801256e8740 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -200,6 +200,7 @@
#include <linux/seqlock.h>
#include <linux/shrinker.h>
#include <linux/srcu.h>
+#include <linux/thread_with_file_types.h>
#include <linux/time_stats.h>
#include <linux/types.h>
#include <linux/workqueue.h>
@@ -466,7 +467,6 @@ enum bch_time_stats {
#include "replicas_types.h"
#include "subvolume_types.h"
#include "super_types.h"
-#include "thread_with_file_types.h"

/* Number of nodes btree coalesce will try to coalesce at once */
#define GC_MERGE_NODES 4U
diff --git a/fs/bcachefs/chardev.c b/fs/bcachefs/chardev.c
index 11711f54057e1..4cbda66bb6e0f 100644
--- a/fs/bcachefs/chardev.c
+++ b/fs/bcachefs/chardev.c
@@ -11,7 +11,6 @@
#include "replicas.h"
#include "super.h"
#include "super-io.h"
-#include "thread_with_file.h"

#include <linux/cdev.h>
#include <linux/device.h>
@@ -20,6 +19,7 @@
#include <linux/major.h>
#include <linux/sched/task.h>
#include <linux/slab.h>
+#include <linux/thread_with_file.h>
#include <linux/uaccess.h>

__must_check
@@ -217,7 +217,7 @@ static long bch2_ioctl_fsck_offline(struct bch_ioctl_fsck_offline __user *user_a

opt_set(thr->opts, stdio, (u64)(unsigned long)&thr->thr.stdio);

- ret = bch2_run_thread_with_stdio(&thr->thr,
+ ret = run_thread_with_stdio(&thr->thr,
bch2_fsck_thread_exit,
bch2_fsck_offline_thread_fn);
err:
@@ -422,7 +422,7 @@ static int bch2_data_job_release(struct inode *inode, struct file *file)
{
struct bch_data_ctx *ctx = container_of(file->private_data, struct bch_data_ctx, thr);

- bch2_thread_with_file_exit(&ctx->thr);
+ thread_with_file_exit(&ctx->thr);
kfree(ctx);
return 0;
}
@@ -472,7 +472,7 @@ static long bch2_ioctl_data(struct bch_fs *c,
ctx->c = c;
ctx->arg = arg;

- ret = bch2_run_thread_with_file(&ctx->thr,
+ ret = run_thread_with_file(&ctx->thr,
&bcachefs_data_ops,
bch2_data_thread);
if (ret < 0)
@@ -834,7 +834,7 @@ static long bch2_ioctl_fsck_online(struct bch_fs *c,
goto err;
}

- ret = bch2_run_thread_with_stdio(&thr->thr,
+ ret = run_thread_with_stdio(&thr->thr,
bch2_fsck_thread_exit,
bch2_fsck_online_thread_fn);
err:
diff --git a/fs/bcachefs/error.c b/fs/bcachefs/error.c
index d32c8bebe46c3..70a1253959740 100644
--- a/fs/bcachefs/error.c
+++ b/fs/bcachefs/error.c
@@ -2,7 +2,7 @@
#include "bcachefs.h"
#include "error.h"
#include "super.h"
-#include "thread_with_file.h"
+#include <linux/thread_with_file.h>

#define FSCK_ERR_RATELIMIT_NR 10

@@ -105,7 +105,7 @@ static enum ask_yn bch2_fsck_ask_yn(struct bch_fs *c)
do {
bch2_print(c, " (y,n, or Y,N for all errors of this type) ");

- int r = bch2_stdio_redirect_readline(stdio, buf, sizeof(buf) - 1);
+ int r = stdio_redirect_readline(stdio, buf, sizeof(buf) - 1);
if (r < 0)
return YN_NO;
buf[r] = '\0';
diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index 0cff8c5f3c104..38a87c8fc8235 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -56,7 +56,6 @@
#include "super.h"
#include "super-io.h"
#include "sysfs.h"
-#include "thread_with_file.h"
#include "trace.h"

#include <linux/backing-dev.h>
@@ -68,6 +67,7 @@
#include <linux/percpu.h>
#include <linux/random.h>
#include <linux/sysfs.h>
+#include <linux/thread_with_file.h>
#include <crypto/hash.h>

MODULE_LICENSE("GPL");
@@ -99,7 +99,7 @@ void __bch2_print(struct bch_fs *c, const char *fmt, ...)
if (fmt[0] == KERN_SOH[0])
fmt += 2;

- bch2_stdio_redirect_vprintf(stdio, true, fmt, args);
+ stdio_redirect_vprintf(stdio, true, fmt, args);
}
va_end(args);
}
diff --git a/fs/bcachefs/thread_with_file.h b/include/linux/thread_with_file.h
similarity index 63%
rename from fs/bcachefs/thread_with_file.h
rename to include/linux/thread_with_file.h
index f06f8ff19a790..54091f7ff3383 100644
--- a/fs/bcachefs/thread_with_file.h
+++ b/include/linux/thread_with_file.h
@@ -1,8 +1,11 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _BCACHEFS_THREAD_WITH_FILE_H
-#define _BCACHEFS_THREAD_WITH_FILE_H
+/*
+ * (C) 2022-2024 Kent Overstreet <[email protected]>
+ */
+#ifndef _LINUX_THREAD_WITH_FILE_H
+#define _LINUX_THREAD_WITH_FILE_H

-#include "thread_with_file_types.h"
+#include <linux/thread_with_file_types.h>

/*
* Thread with file: Run a kthread and connect it to a file descriptor, so that
@@ -13,7 +16,7 @@
*
* thread_with_file, the low level version.
* You get to define the full file_operations, including your release function,
- * which means that you must call bch2_thread_with_file_exit() from your
+ * which means that you must call thread_with_file_exit() from your
* .release method
*
* thread_with_stdio, the higher level version
@@ -44,10 +47,10 @@ struct thread_with_file {
bool done;
};

-void bch2_thread_with_file_exit(struct thread_with_file *);
-int bch2_run_thread_with_file(struct thread_with_file *,
- const struct file_operations *,
- int (*fn)(void *));
+void thread_with_file_exit(struct thread_with_file *);
+int run_thread_with_file(struct thread_with_file *,
+ const struct file_operations *,
+ int (*fn)(void *));

struct thread_with_stdio {
struct thread_with_file thr;
@@ -56,13 +59,13 @@ struct thread_with_stdio {
void (*fn)(struct thread_with_stdio *);
};

-int bch2_run_thread_with_stdio(struct thread_with_stdio *,
- void (*exit)(struct thread_with_stdio *),
- void (*fn)(struct thread_with_stdio *));
-int bch2_stdio_redirect_read(struct stdio_redirect *, char *, size_t);
-int bch2_stdio_redirect_readline(struct stdio_redirect *, char *, size_t);
+int run_thread_with_stdio(struct thread_with_stdio *,
+ void (*exit)(struct thread_with_stdio *),
+ void (*fn)(struct thread_with_stdio *));
+int stdio_redirect_read(struct stdio_redirect *, char *, size_t);
+int stdio_redirect_readline(struct stdio_redirect *, char *, size_t);

-__printf(3, 0) void bch2_stdio_redirect_vprintf(struct stdio_redirect *, bool, const char *, va_list);
-__printf(3, 4) void bch2_stdio_redirect_printf(struct stdio_redirect *, bool, const char *, ...);
+__printf(3, 0) void stdio_redirect_vprintf(struct stdio_redirect *, bool, const char *, va_list);
+__printf(3, 4) void stdio_redirect_printf(struct stdio_redirect *, bool, const char *, ...);

-#endif /* _BCACHEFS_THREAD_WITH_FILE_H */
+#endif /* _LINUX_THREAD_WITH_FILE_H */
diff --git a/fs/bcachefs/thread_with_file_types.h b/include/linux/thread_with_file_types.h
similarity index 64%
rename from fs/bcachefs/thread_with_file_types.h
rename to include/linux/thread_with_file_types.h
index 41990756aa261..98d0ad1253221 100644
--- a/fs/bcachefs/thread_with_file_types.h
+++ b/include/linux/thread_with_file_types.h
@@ -1,8 +1,10 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _BCACHEFS_THREAD_WITH_FILE_TYPES_H
-#define _BCACHEFS_THREAD_WITH_FILE_TYPES_H
+#ifndef _LINUX_THREAD_WITH_FILE_TYPES_H
+#define _LINUX_THREAD_WITH_FILE_TYPES_H

#include <linux/darray_types.h>
+#include <linux/spinlock_types.h>
+#include <linux/wait.h>

struct stdio_buf {
spinlock_t lock;
@@ -20,4 +22,4 @@ struct stdio_redirect {
bool done;
};

-#endif /* _BCACHEFS_THREAD_WITH_FILE_TYPES_H */
+#endif /* _LINUX_THREAD_WITH_FILE_TYPES_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 3ba8b965f8c7e..9258d04e939db 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -789,3 +789,6 @@ config FIRMWARE_TABLE
config TIME_STATS
tristate
select MEAN_AND_VARIANCE
+
+config THREAD_WITH_FILE
+ tristate
diff --git a/lib/Makefile b/lib/Makefile
index 830907bb8fc85..e77304f69df03 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -371,6 +371,7 @@ obj-$(CONFIG_SBITMAP) += sbitmap.o
obj-$(CONFIG_PARMAN) += parman.o

obj-$(CONFIG_TIME_STATS) += time_stats.o
+obj-$(CONFIG_THREAD_WITH_FILE) += thread_with_file.o

obj-y += group_cpus.o

diff --git a/fs/bcachefs/thread_with_file.c b/lib/thread_with_file.c
similarity index 79%
rename from fs/bcachefs/thread_with_file.c
rename to lib/thread_with_file.c
index dde9679b68b42..092996ca43fe7 100644
--- a/fs/bcachefs/thread_with_file.c
+++ b/lib/thread_with_file.c
@@ -1,26 +1,160 @@
// SPDX-License-Identifier: GPL-2.0
-#ifndef NO_BCACHEFS_FS
-
-#include "bcachefs.h"
-#include "thread_with_file.h"
-
+/*
+ * (C) 2022-2024 Kent Overstreet <[email protected]>
+ */
#include <linux/anon_inodes.h>
+#include <linux/darray.h>
#include <linux/file.h>
#include <linux/kthread.h>
+#include <linux/module.h>
#include <linux/pagemap.h>
#include <linux/poll.h>
+#include <linux/thread_with_file.h>

-void bch2_thread_with_file_exit(struct thread_with_file *thr)
+/* stdio_redirect */
+
+#define STDIO_REDIRECT_BUFSIZE 4096
+
+static bool stdio_redirect_has_input(struct stdio_redirect *stdio)
+{
+ return stdio->input.buf.nr || stdio->done;
+}
+
+static bool stdio_redirect_has_output(struct stdio_redirect *stdio)
+{
+ return stdio->output.buf.nr || stdio->done;
+}
+
+static bool stdio_redirect_has_input_space(struct stdio_redirect *stdio)
+{
+ return stdio->input.buf.nr < STDIO_REDIRECT_BUFSIZE || stdio->done;
+}
+
+static bool stdio_redirect_has_output_space(struct stdio_redirect *stdio)
+{
+ return stdio->output.buf.nr < STDIO_REDIRECT_BUFSIZE || stdio->done;
+}
+
+static void stdio_buf_init(struct stdio_buf *buf)
+{
+ spin_lock_init(&buf->lock);
+ init_waitqueue_head(&buf->wait);
+ darray_init(&buf->buf);
+}
+
+int stdio_redirect_read(struct stdio_redirect *stdio, char *ubuf, size_t len)
+{
+ struct stdio_buf *buf = &stdio->input;
+
+ wait_event(buf->wait, stdio_redirect_has_input(stdio));
+ if (stdio->done)
+ return -1;
+
+ spin_lock(&buf->lock);
+ int ret = min(len, buf->buf.nr);
+ memcpy(ubuf, buf->buf.data, ret);
+ darray_remove_items(&buf->buf, buf->buf.data, ret);
+ spin_unlock(&buf->lock);
+
+ wake_up(&buf->wait);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(stdio_redirect_read);
+
+int stdio_redirect_readline(struct stdio_redirect *stdio, char *ubuf, size_t len)
+{
+ struct stdio_buf *buf = &stdio->input;
+ size_t copied = 0;
+ ssize_t ret = 0;
+again:
+ wait_event(buf->wait, stdio_redirect_has_input(stdio));
+ if (stdio->done) {
+ ret = -1;
+ goto out;
+ }
+
+ spin_lock(&buf->lock);
+ size_t b = min(len, buf->buf.nr);
+ char *n = memchr(buf->buf.data, '\n', b);
+ if (n)
+ b = min_t(size_t, b, n + 1 - buf->buf.data);
+ memcpy(ubuf, buf->buf.data, b);
+ darray_remove_items(&buf->buf, buf->buf.data, b);
+ ubuf += b;
+ len -= b;
+ copied += b;
+ spin_unlock(&buf->lock);
+
+ wake_up(&buf->wait);
+
+ if (!n && len)
+ goto again;
+out:
+ return copied ?: ret;
+}
+EXPORT_SYMBOL_GPL(stdio_redirect_readline);
+
+__printf(3, 0)
+static void darray_vprintf(darray_char *out, gfp_t gfp, const char *fmt, va_list args)
+{
+ size_t len;
+
+ do {
+ va_list args2;
+ va_copy(args2, args);
+
+ len = vsnprintf(out->data + out->nr, darray_room(*out), fmt, args2);
+ } while (len + 1 > darray_room(*out) && !darray_make_room_gfp(out, len + 1, gfp));
+
+ out->nr += min(len, darray_room(*out));
+}
+
+void stdio_redirect_vprintf(struct stdio_redirect *stdio, bool nonblocking,
+ const char *fmt, va_list args)
+{
+ struct stdio_buf *buf = &stdio->output;
+ unsigned long flags;
+
+ if (!nonblocking)
+ wait_event(buf->wait, stdio_redirect_has_output_space(stdio));
+ else if (!stdio_redirect_has_output_space(stdio))
+ return;
+ if (stdio->done)
+ return;
+
+ spin_lock_irqsave(&buf->lock, flags);
+ darray_vprintf(&buf->buf, nonblocking ? GFP_NOWAIT : GFP_KERNEL, fmt, args);
+ spin_unlock_irqrestore(&buf->lock, flags);
+
+ wake_up(&buf->wait);
+}
+EXPORT_SYMBOL_GPL(stdio_redirect_vprintf);
+
+void stdio_redirect_printf(struct stdio_redirect *stdio, bool nonblocking,
+ const char *fmt, ...)
+{
+
+ va_list args;
+ va_start(args, fmt);
+ stdio_redirect_vprintf(stdio, nonblocking, fmt, args);
+ va_end(args);
+}
+EXPORT_SYMBOL_GPL(stdio_redirect_printf);
+
+/* thread with file: */
+
+void thread_with_file_exit(struct thread_with_file *thr)
{
if (thr->task) {
kthread_stop(thr->task);
put_task_struct(thr->task);
}
}
+EXPORT_SYMBOL_GPL(thread_with_file_exit);

-int bch2_run_thread_with_file(struct thread_with_file *thr,
- const struct file_operations *fops,
- int (*fn)(void *))
+int run_thread_with_file(struct thread_with_file *thr,
+ const struct file_operations *fops,
+ int (*fn)(void *))
{
struct file *file = NULL;
int ret, fd = -1;
@@ -63,37 +197,7 @@ int bch2_run_thread_with_file(struct thread_with_file *thr,
kthread_stop(thr->task);
return ret;
}
-
-/* stdio_redirect */
-
-static bool stdio_redirect_has_input(struct stdio_redirect *stdio)
-{
- return stdio->input.buf.nr || stdio->done;
-}
-
-static bool stdio_redirect_has_output(struct stdio_redirect *stdio)
-{
- return stdio->output.buf.nr || stdio->done;
-}
-
-#define STDIO_REDIRECT_BUFSIZE 4096
-
-static bool stdio_redirect_has_input_space(struct stdio_redirect *stdio)
-{
- return stdio->input.buf.nr < STDIO_REDIRECT_BUFSIZE || stdio->done;
-}
-
-static bool stdio_redirect_has_output_space(struct stdio_redirect *stdio)
-{
- return stdio->output.buf.nr < STDIO_REDIRECT_BUFSIZE || stdio->done;
-}
-
-static void stdio_buf_init(struct stdio_buf *buf)
-{
- spin_lock_init(&buf->lock);
- init_waitqueue_head(&buf->wait);
- darray_init(&buf->buf);
-}
+EXPORT_SYMBOL_GPL(run_thread_with_file);

/* thread_with_stdio */

@@ -126,10 +230,7 @@ static ssize_t thread_with_stdio_read(struct file *file, char __user *ubuf,
ubuf += b;
len -= b;
copied += b;
- buf->buf.nr -= b;
- memmove(buf->buf.data,
- buf->buf.data + b,
- buf->buf.nr);
+ darray_remove_items(&buf->buf, buf->buf.data, b);
}
spin_unlock_irq(&buf->lock);
}
@@ -137,18 +238,6 @@ static ssize_t thread_with_stdio_read(struct file *file, char __user *ubuf,
return copied ?: ret;
}

-static int thread_with_stdio_release(struct inode *inode, struct file *file)
-{
- struct thread_with_stdio *thr =
- container_of(file->private_data, struct thread_with_stdio, thr);
-
- bch2_thread_with_file_exit(&thr->thr);
- darray_exit(&thr->stdio.input.buf);
- darray_exit(&thr->stdio.output.buf);
- thr->exit(thr);
- return 0;
-}
-
static ssize_t thread_with_stdio_write(struct file *file, const char __user *ubuf,
size_t len, loff_t *ppos)
{
@@ -221,6 +310,18 @@ static __poll_t thread_with_stdio_poll(struct file *file, struct poll_table_stru
return mask;
}

+static int thread_with_stdio_release(struct inode *inode, struct file *file)
+{
+ struct thread_with_stdio *thr =
+ container_of(file->private_data, struct thread_with_stdio, thr);
+
+ thread_with_file_exit(&thr->thr);
+ darray_exit(&thr->stdio.input.buf);
+ darray_exit(&thr->stdio.output.buf);
+ thr->exit(thr);
+ return 0;
+}
+
static const struct file_operations thread_with_stdio_fops = {
.llseek = no_llseek,
.read = thread_with_stdio_read,
@@ -242,117 +343,18 @@ static int thread_with_stdio_fn(void *arg)
return 0;
}

-int bch2_run_thread_with_stdio(struct thread_with_stdio *thr,
- void (*exit)(struct thread_with_stdio *),
- void (*fn)(struct thread_with_stdio *))
+int run_thread_with_stdio(struct thread_with_stdio *thr,
+ void (*exit)(struct thread_with_stdio *),
+ void (*fn)(struct thread_with_stdio *))
{
stdio_buf_init(&thr->stdio.input);
stdio_buf_init(&thr->stdio.output);
thr->exit = exit;
thr->fn = fn;

- return bch2_run_thread_with_file(&thr->thr, &thread_with_stdio_fops, thread_with_stdio_fn);
+ return run_thread_with_file(&thr->thr, &thread_with_stdio_fops, thread_with_stdio_fn);
}
+EXPORT_SYMBOL_GPL(run_thread_with_stdio);

-int bch2_stdio_redirect_read(struct stdio_redirect *stdio, char *ubuf, size_t len)
-{
- struct stdio_buf *buf = &stdio->input;
-
- wait_event(buf->wait, stdio_redirect_has_input(stdio));
- if (stdio->done)
- return -1;
-
- spin_lock(&buf->lock);
- int ret = min(len, buf->buf.nr);
- buf->buf.nr -= ret;
- memcpy(ubuf, buf->buf.data, ret);
- memmove(buf->buf.data,
- buf->buf.data + ret,
- buf->buf.nr);
- spin_unlock(&buf->lock);
-
- wake_up(&buf->wait);
- return ret;
-}
-
-int bch2_stdio_redirect_readline(struct stdio_redirect *stdio, char *ubuf, size_t len)
-{
- struct stdio_buf *buf = &stdio->input;
- size_t copied = 0;
- ssize_t ret = 0;
-again:
- wait_event(buf->wait, stdio_redirect_has_input(stdio));
- if (stdio->done) {
- ret = -1;
- goto out;
- }
-
- spin_lock(&buf->lock);
- size_t b = min(len, buf->buf.nr);
- char *n = memchr(buf->buf.data, '\n', b);
- if (n)
- b = min_t(size_t, b, n + 1 - buf->buf.data);
- buf->buf.nr -= b;
- memcpy(ubuf, buf->buf.data, b);
- memmove(buf->buf.data,
- buf->buf.data + b,
- buf->buf.nr);
- ubuf += b;
- len -= b;
- copied += b;
- spin_unlock(&buf->lock);
-
- wake_up(&buf->wait);
-
- if (!n && len)
- goto again;
-out:
- return copied ?: ret;
-}
-
-__printf(3, 0)
-static void bch2_darray_vprintf(darray_char *out, gfp_t gfp, const char *fmt, va_list args)
-{
- size_t len;
-
- do {
- va_list args2;
- va_copy(args2, args);
-
- len = vsnprintf(out->data + out->nr, darray_room(*out), fmt, args2);
- } while (len + 1 > darray_room(*out) && !darray_make_room_gfp(out, len + 1, gfp));
-
- out->nr += min(len, darray_room(*out));
-}
-
-void bch2_stdio_redirect_vprintf(struct stdio_redirect *stdio, bool nonblocking,
- const char *fmt, va_list args)
-{
- struct stdio_buf *buf = &stdio->output;
- unsigned long flags;
-
- if (!nonblocking)
- wait_event(buf->wait, stdio_redirect_has_output_space(stdio));
- else if (!stdio_redirect_has_output_space(stdio))
- return;
- if (stdio->done)
- return;
-
- spin_lock_irqsave(&buf->lock, flags);
- bch2_darray_vprintf(&buf->buf, nonblocking ? GFP_NOWAIT : GFP_KERNEL, fmt, args);
- spin_unlock_irqrestore(&buf->lock, flags);
-
- wake_up(&buf->wait);
-}
-
-void bch2_stdio_redirect_printf(struct stdio_redirect *stdio, bool nonblocking,
- const char *fmt, ...)
-{
-
- va_list args;
- va_start(args, fmt);
- bch2_stdio_redirect_vprintf(stdio, nonblocking, fmt, args);
- va_end(args);
-}
-
-#endif /* NO_BCACHEFS_FS */
+MODULE_AUTHOR("Kent Overstreet");
+MODULE_LICENSE("GPL");


2024-02-24 01:22:11

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 3/5] thread_with_file: create ops structure for thread_with_stdio

From: Darrick J. Wong <[email protected]>

Create an ops structure so we can add more file-based functionality in
the next few patches.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
fs/bcachefs/chardev.c | 18 ++++++++++++------
include/linux/thread_with_file.h | 16 ++++++++++------
lib/thread_with_file.c | 16 ++++++----------
3 files changed, 28 insertions(+), 22 deletions(-)


diff --git a/fs/bcachefs/chardev.c b/fs/bcachefs/chardev.c
index 4cbda66bb6e0f..a2f30f45f93f7 100644
--- a/fs/bcachefs/chardev.c
+++ b/fs/bcachefs/chardev.c
@@ -165,6 +165,11 @@ static void bch2_fsck_offline_thread_fn(struct thread_with_stdio *stdio)
bch2_fs_stop(c);
}

+static const struct thread_with_stdio_ops bch2_offline_fsck_ops = {
+ .exit = bch2_fsck_thread_exit,
+ .fn = bch2_fsck_offline_thread_fn,
+};
+
static long bch2_ioctl_fsck_offline(struct bch_ioctl_fsck_offline __user *user_arg)
{
struct bch_ioctl_fsck_offline arg;
@@ -217,9 +222,7 @@ static long bch2_ioctl_fsck_offline(struct bch_ioctl_fsck_offline __user *user_a

opt_set(thr->opts, stdio, (u64)(unsigned long)&thr->thr.stdio);

- ret = run_thread_with_stdio(&thr->thr,
- bch2_fsck_thread_exit,
- bch2_fsck_offline_thread_fn);
+ ret = run_thread_with_stdio(&thr->thr, &bch2_offline_fsck_ops);
err:
if (ret < 0) {
if (thr)
@@ -794,6 +797,11 @@ static void bch2_fsck_online_thread_fn(struct thread_with_stdio *stdio)
bch2_ro_ref_put(c);
}

+static const struct thread_with_stdio_ops bch2_online_fsck_ops = {
+ .exit = bch2_fsck_thread_exit,
+ .fn = bch2_fsck_online_thread_fn,
+};
+
static long bch2_ioctl_fsck_online(struct bch_fs *c,
struct bch_ioctl_fsck_online arg)
{
@@ -834,9 +842,7 @@ static long bch2_ioctl_fsck_online(struct bch_fs *c,
goto err;
}

- ret = run_thread_with_stdio(&thr->thr,
- bch2_fsck_thread_exit,
- bch2_fsck_online_thread_fn);
+ ret = run_thread_with_stdio(&thr->thr, &bch2_online_fsck_ops);
err:
if (ret < 0) {
bch_err_fn(c, ret);
diff --git a/include/linux/thread_with_file.h b/include/linux/thread_with_file.h
index 7b133a15d3540..445b1b12a0bd6 100644
--- a/include/linux/thread_with_file.h
+++ b/include/linux/thread_with_file.h
@@ -52,19 +52,23 @@ int run_thread_with_file(struct thread_with_file *,
const struct file_operations *,
int (*fn)(void *));

+struct thread_with_stdio;
+
+struct thread_with_stdio_ops {
+ void (*exit)(struct thread_with_stdio *);
+ void (*fn)(struct thread_with_stdio *);
+};
+
struct thread_with_stdio {
struct thread_with_file thr;
struct stdio_redirect stdio;
- void (*exit)(struct thread_with_stdio *);
- void (*fn)(struct thread_with_stdio *);
+ const struct thread_with_stdio_ops *ops;
};

int run_thread_with_stdio(struct thread_with_stdio *,
- void (*exit)(struct thread_with_stdio *),
- void (*fn)(struct thread_with_stdio *));
+ const struct thread_with_stdio_ops *);
int run_thread_with_stdout(struct thread_with_stdio *,
- void (*exit)(struct thread_with_stdio *),
- void (*fn)(struct thread_with_stdio *));
+ const struct thread_with_stdio_ops *);
int stdio_redirect_read(struct stdio_redirect *, char *, size_t);
int stdio_redirect_readline(struct stdio_redirect *, char *, size_t);

diff --git a/lib/thread_with_file.c b/lib/thread_with_file.c
index 70a805ef017f9..2edf33c3e7dc5 100644
--- a/lib/thread_with_file.c
+++ b/lib/thread_with_file.c
@@ -359,7 +359,7 @@ static int thread_with_stdio_release(struct inode *inode, struct file *file)
thread_with_file_exit(&thr->thr);
darray_exit(&thr->stdio.input.buf);
darray_exit(&thr->stdio.output.buf);
- thr->exit(thr);
+ thr->ops->exit(thr);
return 0;
}

@@ -398,33 +398,29 @@ static int thread_with_stdio_fn(void *arg)
{
struct thread_with_stdio *thr = arg;

- thr->fn(thr);
+ thr->ops->fn(thr);

thread_with_stdio_done(thr);
return 0;
}

int run_thread_with_stdio(struct thread_with_stdio *thr,
- void (*exit)(struct thread_with_stdio *),
- void (*fn)(struct thread_with_stdio *))
+ const struct thread_with_stdio_ops *ops)
{
stdio_buf_init(&thr->stdio.input);
stdio_buf_init(&thr->stdio.output);
- thr->exit = exit;
- thr->fn = fn;
+ thr->ops = ops;

return run_thread_with_file(&thr->thr, &thread_with_stdio_fops, thread_with_stdio_fn);
}
EXPORT_SYMBOL_GPL(run_thread_with_stdio);

int run_thread_with_stdout(struct thread_with_stdio *thr,
- void (*exit)(struct thread_with_stdio *),
- void (*fn)(struct thread_with_stdio *))
+ const struct thread_with_stdio_ops *ops)
{
stdio_buf_init(&thr->stdio.input);
stdio_buf_init(&thr->stdio.output);
- thr->exit = exit;
- thr->fn = fn;
+ thr->ops = ops;

return run_thread_with_file(&thr->thr, &thread_with_stdout_fops, thread_with_stdio_fn);
}


2024-02-24 01:22:39

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 4/5] thread_with_file: allow ioctls against these files

From: Darrick J. Wong <[email protected]>

Make it so that a thread_with_stdio user can handle ioctls against the
file descriptor.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/thread_with_file.h | 1 +
lib/thread_with_file.c | 12 ++++++++++++
2 files changed, 13 insertions(+)


diff --git a/include/linux/thread_with_file.h b/include/linux/thread_with_file.h
index 445b1b12a0bd6..33770938d5d9a 100644
--- a/include/linux/thread_with_file.h
+++ b/include/linux/thread_with_file.h
@@ -57,6 +57,7 @@ struct thread_with_stdio;
struct thread_with_stdio_ops {
void (*exit)(struct thread_with_stdio *);
void (*fn)(struct thread_with_stdio *);
+ long (*unlocked_ioctl)(struct thread_with_stdio *, unsigned int, unsigned long);
};

struct thread_with_stdio {
diff --git a/lib/thread_with_file.c b/lib/thread_with_file.c
index 2edf33c3e7dc5..8b129744a48a3 100644
--- a/lib/thread_with_file.c
+++ b/lib/thread_with_file.c
@@ -379,12 +379,23 @@ static __poll_t thread_with_stdout_poll(struct file *file, struct poll_table_str
return mask;
}

+static long thread_with_stdio_ioctl(struct file *file, unsigned int cmd, unsigned long p)
+{
+ struct thread_with_stdio *thr =
+ container_of(file->private_data, struct thread_with_stdio, thr);
+
+ if (thr->ops->unlocked_ioctl)
+ return thr->ops->unlocked_ioctl(thr, cmd, p);
+ return -ENOTTY;
+}
+
static const struct file_operations thread_with_stdio_fops = {
.llseek = no_llseek,
.read = thread_with_stdio_read,
.write = thread_with_stdio_write,
.poll = thread_with_stdio_poll,
.release = thread_with_stdio_release,
+ .unlocked_ioctl = thread_with_stdio_ioctl,
};

static const struct file_operations thread_with_stdout_fops = {
@@ -392,6 +403,7 @@ static const struct file_operations thread_with_stdout_fops = {
.read = thread_with_stdio_read,
.poll = thread_with_stdout_poll,
.release = thread_with_stdio_release,
+ .unlocked_ioctl = thread_with_stdio_ioctl,
};

static int thread_with_stdio_fn(void *arg)


2024-02-24 01:26:41

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 2/5] thread_with_file: fix various printf problems

From: Darrick J. Wong <[email protected]>

Experimentally fix some problems with stdio_redirect_vprintf by creating
a MOO variant with which we can experiment. We can't do a GFP_KERNEL
allocation while holding the spinlock, and I don't like how the printf
function can silently truncate the output if memory allocation fails.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/thread_with_file.h | 4 +--
lib/thread_with_file.c | 55 ++++++++++++++++++++++++++------------
2 files changed, 39 insertions(+), 20 deletions(-)


diff --git a/include/linux/thread_with_file.h b/include/linux/thread_with_file.h
index 5f7e85bc8322b..7b133a15d3540 100644
--- a/include/linux/thread_with_file.h
+++ b/include/linux/thread_with_file.h
@@ -68,7 +68,7 @@ int run_thread_with_stdout(struct thread_with_stdio *,
int stdio_redirect_read(struct stdio_redirect *, char *, size_t);
int stdio_redirect_readline(struct stdio_redirect *, char *, size_t);

-__printf(3, 0) void stdio_redirect_vprintf(struct stdio_redirect *, bool, const char *, va_list);
-__printf(3, 4) void stdio_redirect_printf(struct stdio_redirect *, bool, const char *, ...);
+__printf(3, 0) ssize_t stdio_redirect_vprintf(struct stdio_redirect *, bool, const char *, va_list);
+__printf(3, 4) ssize_t stdio_redirect_printf(struct stdio_redirect *, bool, const char *, ...);

#endif /* _LINUX_THREAD_WITH_FILE_H */
diff --git a/lib/thread_with_file.c b/lib/thread_with_file.c
index 71028611b8d59..70a805ef017f9 100644
--- a/lib/thread_with_file.c
+++ b/lib/thread_with_file.c
@@ -108,49 +108,68 @@ int stdio_redirect_readline(struct stdio_redirect *stdio, char *ubuf, size_t len
EXPORT_SYMBOL_GPL(stdio_redirect_readline);

__printf(3, 0)
-static void darray_vprintf(darray_char *out, gfp_t gfp, const char *fmt, va_list args)
+static ssize_t darray_vprintf(darray_char *out, gfp_t gfp, const char *fmt, va_list args)
{
- size_t len;
+ ssize_t ret;

do {
va_list args2;
+ size_t len;
+
va_copy(args2, args);
-
len = vsnprintf(out->data + out->nr, darray_room(*out), fmt, args2);
- } while (len + 1 > darray_room(*out) && !darray_make_room_gfp(out, len + 1, gfp));
+ if (len + 1 <= darray_room(*out)) {
+ out->nr += len;
+ return len;
+ }

- out->nr += min(len, darray_room(*out));
+ ret = darray_make_room_gfp(out, len + 1, gfp);
+ } while (ret == 0);
+
+ return ret;
}

-void stdio_redirect_vprintf(struct stdio_redirect *stdio, bool nonblocking,
- const char *fmt, va_list args)
+ssize_t stdio_redirect_vprintf(struct stdio_redirect *stdio, bool nonblocking,
+ const char *fmt, va_list args)
{
struct stdio_buf *buf = &stdio->output;
unsigned long flags;
+ ssize_t ret;

- if (!nonblocking)
- wait_event(buf->wait, stdio_redirect_has_output_space(stdio));
- else if (!stdio_redirect_has_output_space(stdio))
- return;
- if (stdio->done)
- return;
-
+again:
spin_lock_irqsave(&buf->lock, flags);
- darray_vprintf(&buf->buf, nonblocking ? GFP_NOWAIT : GFP_KERNEL, fmt, args);
+ ret = darray_vprintf(&buf->buf, GFP_NOWAIT, fmt, args);
spin_unlock_irqrestore(&buf->lock, flags);

+ if (ret < 0) {
+ if (nonblocking)
+ return -EAGAIN;
+
+ ret = wait_event_interruptible(buf->wait,
+ stdio_redirect_has_output_space(stdio));
+ if (ret)
+ return ret;
+ goto again;
+ }
+
wake_up(&buf->wait);
+ return ret;
+
}
EXPORT_SYMBOL_GPL(stdio_redirect_vprintf);

-void stdio_redirect_printf(struct stdio_redirect *stdio, bool nonblocking,
- const char *fmt, ...)
+ssize_t stdio_redirect_printf(struct stdio_redirect *stdio, bool nonblocking,
+ const char *fmt, ...)
{

va_list args;
+ ssize_t ret;
+
va_start(args, fmt);
- stdio_redirect_vprintf(stdio, nonblocking, fmt, args);
+ ret = stdio_redirect_vprintf(stdio, nonblocking, fmt, args);
va_end(args);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(stdio_redirect_printf);



2024-02-24 01:27:33

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 1/5] thread_with_file: allow creation of readonly files

From: Darrick J. Wong <[email protected]>

Create a new run_thread_with_stdout function that opens a file in
O_RDONLY mode so that the kernel can write things to userspace but
userspace cannot write to the kernel. This will be used to convey xfs
health event information to userspace.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/thread_with_file.h | 3 +++
lib/thread_with_file.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 39 insertions(+)


diff --git a/include/linux/thread_with_file.h b/include/linux/thread_with_file.h
index 54091f7ff3383..5f7e85bc8322b 100644
--- a/include/linux/thread_with_file.h
+++ b/include/linux/thread_with_file.h
@@ -62,6 +62,9 @@ struct thread_with_stdio {
int run_thread_with_stdio(struct thread_with_stdio *,
void (*exit)(struct thread_with_stdio *),
void (*fn)(struct thread_with_stdio *));
+int run_thread_with_stdout(struct thread_with_stdio *,
+ void (*exit)(struct thread_with_stdio *),
+ void (*fn)(struct thread_with_stdio *));
int stdio_redirect_read(struct stdio_redirect *, char *, size_t);
int stdio_redirect_readline(struct stdio_redirect *, char *, size_t);

diff --git a/lib/thread_with_file.c b/lib/thread_with_file.c
index b09dc60ba6280..71028611b8d59 100644
--- a/lib/thread_with_file.c
+++ b/lib/thread_with_file.c
@@ -344,6 +344,22 @@ static int thread_with_stdio_release(struct inode *inode, struct file *file)
return 0;
}

+static __poll_t thread_with_stdout_poll(struct file *file, struct poll_table_struct *wait)
+{
+ struct thread_with_stdio *thr =
+ container_of(file->private_data, struct thread_with_stdio, thr);
+
+ poll_wait(file, &thr->stdio.output.wait, wait);
+
+ __poll_t mask = 0;
+
+ if (stdio_redirect_has_output(&thr->stdio))
+ mask |= EPOLLIN;
+ if (thr->thr.done)
+ mask |= EPOLLHUP|EPOLLERR;
+ return mask;
+}
+
static const struct file_operations thread_with_stdio_fops = {
.llseek = no_llseek,
.read = thread_with_stdio_read,
@@ -352,6 +368,13 @@ static const struct file_operations thread_with_stdio_fops = {
.release = thread_with_stdio_release,
};

+static const struct file_operations thread_with_stdout_fops = {
+ .llseek = no_llseek,
+ .read = thread_with_stdio_read,
+ .poll = thread_with_stdout_poll,
+ .release = thread_with_stdio_release,
+};
+
static int thread_with_stdio_fn(void *arg)
{
struct thread_with_stdio *thr = arg;
@@ -375,5 +398,18 @@ int run_thread_with_stdio(struct thread_with_stdio *thr,
}
EXPORT_SYMBOL_GPL(run_thread_with_stdio);

+int run_thread_with_stdout(struct thread_with_stdio *thr,
+ void (*exit)(struct thread_with_stdio *),
+ void (*fn)(struct thread_with_stdio *))
+{
+ stdio_buf_init(&thr->stdio.input);
+ stdio_buf_init(&thr->stdio.output);
+ thr->exit = exit;
+ thr->fn = fn;
+
+ return run_thread_with_file(&thr->thr, &thread_with_stdout_fops, thread_with_stdio_fn);
+}
+EXPORT_SYMBOL_GPL(run_thread_with_stdout);
+
MODULE_AUTHOR("Kent Overstreet");
MODULE_LICENSE("GPL");


2024-02-24 04:15:55

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 09/10] time_stats: report information in json format

On Fri, Feb 23, 2024 at 05:12:26PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <[email protected]>
>
> Export json versions of time statistics information. Given the tabular
> nature of the numbers exposed, this will make it a lot easier for higher
> (than C) level languages (e.g. python) to import information without
> needing to write yet another clumsy string parser.
>
> Signed-off-by: Darrick J. Wong <[email protected]>
> Signed-off-by: Kent Overstreet <[email protected]>
> ---
> include/linux/time_stats.h | 2 +
> lib/time_stats.c | 87 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 89 insertions(+)
>
>
> diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
> index b3c810fff963a..4e1f5485ed039 100644
> --- a/include/linux/time_stats.h
> +++ b/include/linux/time_stats.h
> @@ -156,6 +156,8 @@ static inline bool track_event_change(struct time_stats *stats, bool v)
> struct seq_buf;
> void time_stats_to_seq_buf(struct seq_buf *, struct time_stats *,
> const char *epoch_name, unsigned int flags);
> +void time_stats_to_json(struct seq_buf *, struct time_stats *,
> + const char *epoch_name, unsigned int flags);
>
> void time_stats_exit(struct time_stats *);
> void time_stats_init(struct time_stats *);
> diff --git a/lib/time_stats.c b/lib/time_stats.c
> index 0fb3d854e503b..c0f209dd9f6dd 100644
> --- a/lib/time_stats.c
> +++ b/lib/time_stats.c
> @@ -266,6 +266,93 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,
> }
> EXPORT_SYMBOL_GPL(time_stats_to_seq_buf);
>
> +void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
> + const char *epoch_name, unsigned int flags)
> +{
> + struct quantiles *quantiles = time_stats_to_quantiles(stats);
> + s64 f_mean = 0, d_mean = 0;
> + u64 f_stddev = 0, d_stddev = 0;
> +
> + if (stats->buffer) {
> + int cpu;
> +
> + spin_lock_irq(&stats->lock);
> + for_each_possible_cpu(cpu)
> + __time_stats_clear_buffer(stats, per_cpu_ptr(stats->buffer, cpu));
> + spin_unlock_irq(&stats->lock);
> + }
> +
> + if (stats->freq_stats.n) {
> + /* avoid divide by zero */
> + f_mean = mean_and_variance_get_mean(stats->freq_stats);
> + f_stddev = mean_and_variance_get_stddev(stats->freq_stats);
> + d_mean = mean_and_variance_get_mean(stats->duration_stats);
> + d_stddev = mean_and_variance_get_stddev(stats->duration_stats);
> + } else if (flags & TIME_STATS_PRINT_NO_ZEROES) {
> + /* unless we didn't want zeroes anyway */
> + return;
> + }
> +
> + seq_buf_printf(out, "{\n");
> + seq_buf_printf(out, " \"epoch\": \"%s\",\n", epoch_name);
> + seq_buf_printf(out, " \"count\": %llu,\n", stats->duration_stats.n);
> +
> + seq_buf_printf(out, " \"duration_ns\": {\n");
> + seq_buf_printf(out, " \"min\": %llu,\n", stats->min_duration);
> + seq_buf_printf(out, " \"max\": %llu,\n", stats->max_duration);
> + seq_buf_printf(out, " \"total\": %llu,\n", stats->total_duration);
> + seq_buf_printf(out, " \"mean\": %llu,\n", d_mean);
> + seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
> + seq_buf_printf(out, " },\n");
> +
> + d_mean = mean_and_variance_weighted_get_mean(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT);
> + d_stddev = mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT);
> +
> + seq_buf_printf(out, " \"duration_ewma_ns\": {\n");
> + seq_buf_printf(out, " \"mean\": %llu,\n", d_mean);
> + seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
> + seq_buf_printf(out, " },\n");
> +
> + seq_buf_printf(out, " \"frequency_ns\": {\n");

I took the variable names too literally here; these labels really ought
to be "between_ns" and "between_ewma_ns" to maintain consistency with
the labels in the table format.

> + seq_buf_printf(out, " \"min\": %llu,\n", stats->min_freq);
> + seq_buf_printf(out, " \"max\": %llu,\n", stats->max_freq);
> + seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
> + seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);
> + seq_buf_printf(out, " },\n");
> +
> + f_mean = mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
> + f_stddev = mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
> +
> + seq_buf_printf(out, " \"frequency_ewma_ns\": {\n");
> + seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
> + seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);
> +
> + if (quantiles) {
> + u64 last_q = 0;
> +
> + /* close frequency_ewma_ns but signal more items */

(also this comment)

> + seq_buf_printf(out, " },\n");
> +
> + seq_buf_printf(out, " \"quantiles_ns\": [\n");
> + eytzinger0_for_each(i, NR_QUANTILES) {
> + bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;
> +
> + u64 q = max(quantiles->entries[i].m, last_q);
> + seq_buf_printf(out, " %llu", q);
> + if (!is_last)
> + seq_buf_printf(out, ", ");
> + last_q = q;
> + }
> + seq_buf_printf(out, " ]\n");
> + } else {
> + /* close frequency_ewma_ns without dumping further */

(this one too)

Kent, would you mind making that edit the next time you reflow your
branch?

--D

> + seq_buf_printf(out, " }\n");
> + }
> +
> + seq_buf_printf(out, "}\n");
> +}
> +EXPORT_SYMBOL_GPL(time_stats_to_json);
> +
> void time_stats_exit(struct time_stats *stats)
> {
> free_percpu(stats->buffer);
>
>

2024-02-24 05:10:56

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 09/10] time_stats: report information in json format

On Fri, Feb 23, 2024 at 08:15:45PM -0800, Darrick J. Wong wrote:
> On Fri, Feb 23, 2024 at 05:12:26PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <[email protected]>
> >
> > Export json versions of time statistics information. Given the tabular
> > nature of the numbers exposed, this will make it a lot easier for higher
> > (than C) level languages (e.g. python) to import information without
> > needing to write yet another clumsy string parser.
> >
> > Signed-off-by: Darrick J. Wong <[email protected]>
> > Signed-off-by: Kent Overstreet <[email protected]>
> > ---
> > include/linux/time_stats.h | 2 +
> > lib/time_stats.c | 87 ++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 89 insertions(+)
> >
> >
> > diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
> > index b3c810fff963a..4e1f5485ed039 100644
> > --- a/include/linux/time_stats.h
> > +++ b/include/linux/time_stats.h
> > @@ -156,6 +156,8 @@ static inline bool track_event_change(struct time_stats *stats, bool v)
> > struct seq_buf;
> > void time_stats_to_seq_buf(struct seq_buf *, struct time_stats *,
> > const char *epoch_name, unsigned int flags);
> > +void time_stats_to_json(struct seq_buf *, struct time_stats *,
> > + const char *epoch_name, unsigned int flags);
> >
> > void time_stats_exit(struct time_stats *);
> > void time_stats_init(struct time_stats *);
> > diff --git a/lib/time_stats.c b/lib/time_stats.c
> > index 0fb3d854e503b..c0f209dd9f6dd 100644
> > --- a/lib/time_stats.c
> > +++ b/lib/time_stats.c
> > @@ -266,6 +266,93 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,
> > }
> > EXPORT_SYMBOL_GPL(time_stats_to_seq_buf);
> >
> > +void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
> > + const char *epoch_name, unsigned int flags)
> > +{
> > + struct quantiles *quantiles = time_stats_to_quantiles(stats);
> > + s64 f_mean = 0, d_mean = 0;
> > + u64 f_stddev = 0, d_stddev = 0;
> > +
> > + if (stats->buffer) {
> > + int cpu;
> > +
> > + spin_lock_irq(&stats->lock);
> > + for_each_possible_cpu(cpu)
> > + __time_stats_clear_buffer(stats, per_cpu_ptr(stats->buffer, cpu));
> > + spin_unlock_irq(&stats->lock);
> > + }
> > +
> > + if (stats->freq_stats.n) {
> > + /* avoid divide by zero */
> > + f_mean = mean_and_variance_get_mean(stats->freq_stats);
> > + f_stddev = mean_and_variance_get_stddev(stats->freq_stats);
> > + d_mean = mean_and_variance_get_mean(stats->duration_stats);
> > + d_stddev = mean_and_variance_get_stddev(stats->duration_stats);
> > + } else if (flags & TIME_STATS_PRINT_NO_ZEROES) {
> > + /* unless we didn't want zeroes anyway */
> > + return;
> > + }
> > +
> > + seq_buf_printf(out, "{\n");
> > + seq_buf_printf(out, " \"epoch\": \"%s\",\n", epoch_name);
> > + seq_buf_printf(out, " \"count\": %llu,\n", stats->duration_stats.n);
> > +
> > + seq_buf_printf(out, " \"duration_ns\": {\n");
> > + seq_buf_printf(out, " \"min\": %llu,\n", stats->min_duration);
> > + seq_buf_printf(out, " \"max\": %llu,\n", stats->max_duration);
> > + seq_buf_printf(out, " \"total\": %llu,\n", stats->total_duration);
> > + seq_buf_printf(out, " \"mean\": %llu,\n", d_mean);
> > + seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
> > + seq_buf_printf(out, " },\n");
> > +
> > + d_mean = mean_and_variance_weighted_get_mean(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT);
> > + d_stddev = mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT);
> > +
> > + seq_buf_printf(out, " \"duration_ewma_ns\": {\n");
> > + seq_buf_printf(out, " \"mean\": %llu,\n", d_mean);
> > + seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
> > + seq_buf_printf(out, " },\n");
> > +
> > + seq_buf_printf(out, " \"frequency_ns\": {\n");
>
> I took the variable names too literally here; these labels really ought
> to be "between_ns" and "between_ewma_ns" to maintain consistency with
> the labels in the table format.
>
> > + seq_buf_printf(out, " \"min\": %llu,\n", stats->min_freq);
> > + seq_buf_printf(out, " \"max\": %llu,\n", stats->max_freq);
> > + seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
> > + seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);
> > + seq_buf_printf(out, " },\n");
> > +
> > + f_mean = mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
> > + f_stddev = mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
> > +
> > + seq_buf_printf(out, " \"frequency_ewma_ns\": {\n");
> > + seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
> > + seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);
> > +
> > + if (quantiles) {
> > + u64 last_q = 0;
> > +
> > + /* close frequency_ewma_ns but signal more items */
>
> (also this comment)
>
> > + seq_buf_printf(out, " },\n");
> > +
> > + seq_buf_printf(out, " \"quantiles_ns\": [\n");
> > + eytzinger0_for_each(i, NR_QUANTILES) {
> > + bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;
> > +
> > + u64 q = max(quantiles->entries[i].m, last_q);
> > + seq_buf_printf(out, " %llu", q);
> > + if (!is_last)
> > + seq_buf_printf(out, ", ");
> > + last_q = q;
> > + }
> > + seq_buf_printf(out, " ]\n");
> > + } else {
> > + /* close frequency_ewma_ns without dumping further */
>
> (this one too)
>
> Kent, would you mind making that edit the next time you reflow your
> branch?
>
> --D
>
> > + seq_buf_printf(out, " }\n");
> > + }
> > +
> > + seq_buf_printf(out, "}\n");
> > +}
> > +EXPORT_SYMBOL_GPL(time_stats_to_json);
> > +
> > void time_stats_exit(struct time_stats *stats)
> > {
> > free_percpu(stats->buffer);
> >
> >


From 5885a65fa5a0aace7bdf1a8fa58ac2bca3b15900 Mon Sep 17 00:00:00 2001
From: Kent Overstreet <[email protected]>
Date: Sat, 24 Feb 2024 00:10:06 -0500
Subject: [PATCH] fixup! time_stats: report information in json format


diff --git a/lib/time_stats.c b/lib/time_stats.c
index 0b90c80cba9f..d7dd64baebb8 100644
--- a/lib/time_stats.c
+++ b/lib/time_stats.c
@@ -313,7 +313,7 @@ void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
seq_buf_printf(out, " },\n");

- seq_buf_printf(out, " \"frequency_ns\": {\n");
+ seq_buf_printf(out, " \"between_ns\": {\n");
seq_buf_printf(out, " \"min\": %llu,\n", stats->min_freq);
seq_buf_printf(out, " \"max\": %llu,\n", stats->max_freq);
seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
@@ -323,14 +323,14 @@ void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
f_mean = mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
f_stddev = mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);

- seq_buf_printf(out, " \"frequency_ewma_ns\": {\n");
+ seq_buf_printf(out, " \"between_ewma_ns\": {\n");
seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);

if (quantiles) {
u64 last_q = 0;

- /* close frequency_ewma_ns but signal more items */
+ /* close between_ewma_ns but signal more items */
seq_buf_printf(out, " },\n");

seq_buf_printf(out, " \"quantiles_ns\": [\n");
@@ -345,7 +345,7 @@ void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
}
seq_buf_printf(out, " ]\n");
} else {
- /* close frequency_ewma_ns without dumping further */
+ /* close between_ewma_ns without dumping further */
seq_buf_printf(out, " }\n");
}


2024-02-24 06:02:59

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 09/10] time_stats: report information in json format

On Sat, Feb 24, 2024 at 12:10:33AM -0500, Kent Overstreet wrote:
> On Fri, Feb 23, 2024 at 08:15:45PM -0800, Darrick J. Wong wrote:
> > On Fri, Feb 23, 2024 at 05:12:26PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <[email protected]>
> > >
> > > Export json versions of time statistics information. Given the tabular
> > > nature of the numbers exposed, this will make it a lot easier for higher
> > > (than C) level languages (e.g. python) to import information without
> > > needing to write yet another clumsy string parser.
> > >
> > > Signed-off-by: Darrick J. Wong <[email protected]>
> > > Signed-off-by: Kent Overstreet <[email protected]>
> > > ---
> > > include/linux/time_stats.h | 2 +
> > > lib/time_stats.c | 87 ++++++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 89 insertions(+)
> > >
> > >
> > > diff --git a/include/linux/time_stats.h b/include/linux/time_stats.h
> > > index b3c810fff963a..4e1f5485ed039 100644
> > > --- a/include/linux/time_stats.h
> > > +++ b/include/linux/time_stats.h
> > > @@ -156,6 +156,8 @@ static inline bool track_event_change(struct time_stats *stats, bool v)
> > > struct seq_buf;
> > > void time_stats_to_seq_buf(struct seq_buf *, struct time_stats *,
> > > const char *epoch_name, unsigned int flags);
> > > +void time_stats_to_json(struct seq_buf *, struct time_stats *,
> > > + const char *epoch_name, unsigned int flags);
> > >
> > > void time_stats_exit(struct time_stats *);
> > > void time_stats_init(struct time_stats *);
> > > diff --git a/lib/time_stats.c b/lib/time_stats.c
> > > index 0fb3d854e503b..c0f209dd9f6dd 100644
> > > --- a/lib/time_stats.c
> > > +++ b/lib/time_stats.c
> > > @@ -266,6 +266,93 @@ void time_stats_to_seq_buf(struct seq_buf *out, struct time_stats *stats,
> > > }
> > > EXPORT_SYMBOL_GPL(time_stats_to_seq_buf);
> > >
> > > +void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
> > > + const char *epoch_name, unsigned int flags)
> > > +{
> > > + struct quantiles *quantiles = time_stats_to_quantiles(stats);
> > > + s64 f_mean = 0, d_mean = 0;
> > > + u64 f_stddev = 0, d_stddev = 0;
> > > +
> > > + if (stats->buffer) {
> > > + int cpu;
> > > +
> > > + spin_lock_irq(&stats->lock);
> > > + for_each_possible_cpu(cpu)
> > > + __time_stats_clear_buffer(stats, per_cpu_ptr(stats->buffer, cpu));
> > > + spin_unlock_irq(&stats->lock);
> > > + }
> > > +
> > > + if (stats->freq_stats.n) {
> > > + /* avoid divide by zero */
> > > + f_mean = mean_and_variance_get_mean(stats->freq_stats);
> > > + f_stddev = mean_and_variance_get_stddev(stats->freq_stats);
> > > + d_mean = mean_and_variance_get_mean(stats->duration_stats);
> > > + d_stddev = mean_and_variance_get_stddev(stats->duration_stats);
> > > + } else if (flags & TIME_STATS_PRINT_NO_ZEROES) {
> > > + /* unless we didn't want zeroes anyway */
> > > + return;
> > > + }
> > > +
> > > + seq_buf_printf(out, "{\n");
> > > + seq_buf_printf(out, " \"epoch\": \"%s\",\n", epoch_name);
> > > + seq_buf_printf(out, " \"count\": %llu,\n", stats->duration_stats.n);
> > > +
> > > + seq_buf_printf(out, " \"duration_ns\": {\n");
> > > + seq_buf_printf(out, " \"min\": %llu,\n", stats->min_duration);
> > > + seq_buf_printf(out, " \"max\": %llu,\n", stats->max_duration);
> > > + seq_buf_printf(out, " \"total\": %llu,\n", stats->total_duration);
> > > + seq_buf_printf(out, " \"mean\": %llu,\n", d_mean);
> > > + seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
> > > + seq_buf_printf(out, " },\n");
> > > +
> > > + d_mean = mean_and_variance_weighted_get_mean(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT);
> > > + d_stddev = mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT);
> > > +
> > > + seq_buf_printf(out, " \"duration_ewma_ns\": {\n");
> > > + seq_buf_printf(out, " \"mean\": %llu,\n", d_mean);
> > > + seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
> > > + seq_buf_printf(out, " },\n");
> > > +
> > > + seq_buf_printf(out, " \"frequency_ns\": {\n");
> >
> > I took the variable names too literally here; these labels really ought
> > to be "between_ns" and "between_ewma_ns" to maintain consistency with
> > the labels in the table format.
> >
> > > + seq_buf_printf(out, " \"min\": %llu,\n", stats->min_freq);
> > > + seq_buf_printf(out, " \"max\": %llu,\n", stats->max_freq);
> > > + seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
> > > + seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);
> > > + seq_buf_printf(out, " },\n");
> > > +
> > > + f_mean = mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
> > > + f_stddev = mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
> > > +
> > > + seq_buf_printf(out, " \"frequency_ewma_ns\": {\n");
> > > + seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
> > > + seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);
> > > +
> > > + if (quantiles) {
> > > + u64 last_q = 0;
> > > +
> > > + /* close frequency_ewma_ns but signal more items */
> >
> > (also this comment)
> >
> > > + seq_buf_printf(out, " },\n");
> > > +
> > > + seq_buf_printf(out, " \"quantiles_ns\": [\n");
> > > + eytzinger0_for_each(i, NR_QUANTILES) {
> > > + bool is_last = eytzinger0_next(i, NR_QUANTILES) == -1;
> > > +
> > > + u64 q = max(quantiles->entries[i].m, last_q);
> > > + seq_buf_printf(out, " %llu", q);
> > > + if (!is_last)
> > > + seq_buf_printf(out, ", ");
> > > + last_q = q;
> > > + }
> > > + seq_buf_printf(out, " ]\n");
> > > + } else {
> > > + /* close frequency_ewma_ns without dumping further */
> >
> > (this one too)
> >
> > Kent, would you mind making that edit the next time you reflow your
> > branch?
> >
> > --D
> >
> > > + seq_buf_printf(out, " }\n");
> > > + }
> > > +
> > > + seq_buf_printf(out, "}\n");
> > > +}
> > > +EXPORT_SYMBOL_GPL(time_stats_to_json);
> > > +
> > > void time_stats_exit(struct time_stats *stats)
> > > {
> > > free_percpu(stats->buffer);
> > >
> > >
>
>
> From 5885a65fa5a0aace7bdf1a8fa58ac2bca3b15900 Mon Sep 17 00:00:00 2001
> From: Kent Overstreet <[email protected]>
> Date: Sat, 24 Feb 2024 00:10:06 -0500
> Subject: [PATCH] fixup! time_stats: report information in json format
>
>
> diff --git a/lib/time_stats.c b/lib/time_stats.c
> index 0b90c80cba9f..d7dd64baebb8 100644
> --- a/lib/time_stats.c
> +++ b/lib/time_stats.c
> @@ -313,7 +313,7 @@ void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
> seq_buf_printf(out, " \"stddev\": %llu\n", d_stddev);
> seq_buf_printf(out, " },\n");
>
> - seq_buf_printf(out, " \"frequency_ns\": {\n");
> + seq_buf_printf(out, " \"between_ns\": {\n");
> seq_buf_printf(out, " \"min\": %llu,\n", stats->min_freq);
> seq_buf_printf(out, " \"max\": %llu,\n", stats->max_freq);
> seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
> @@ -323,14 +323,14 @@ void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
> f_mean = mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
> f_stddev = mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT);
>
> - seq_buf_printf(out, " \"frequency_ewma_ns\": {\n");
> + seq_buf_printf(out, " \"between_ewma_ns\": {\n");

Looks good to me,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

> seq_buf_printf(out, " \"mean\": %llu,\n", f_mean);
> seq_buf_printf(out, " \"stddev\": %llu\n", f_stddev);
>
> if (quantiles) {
> u64 last_q = 0;
>
> - /* close frequency_ewma_ns but signal more items */
> + /* close between_ewma_ns but signal more items */
> seq_buf_printf(out, " },\n");
>
> seq_buf_printf(out, " \"quantiles_ns\": [\n");
> @@ -345,7 +345,7 @@ void time_stats_to_json(struct seq_buf *out, struct time_stats *stats,
> }
> seq_buf_printf(out, " ]\n");
> } else {
> - /* close frequency_ewma_ns without dumping further */
> + /* close between_ewma_ns without dumping further */
> seq_buf_printf(out, " }\n");
> }
>