Subject: [PATCH v7 0/4] use bin_attribute to break the size limitation of cpumap ABI

v7:
- update doc in code for new APIs according to the comments of
Andy Shevchenko;
- other minor cleanup and commit log fix according to the comments
of Andy Shevchenko

v6:
-minor cleanup according to Andy Shevchenko's comment;
-take bitmap_print_to_buf back according to Yury Norov's comment and
fix the documents; and also take the bitmap testcase back.
-Sorry, Yury, I don't think it is doable to move memory allocation
to drivers.
Considering a driver like topology.c, we have M CPUs and each CPU
has N various nodes like core_siblings, package_cpus, die_cpus etc,
we can't know the size of each node of each CPU in advance. The best
time to get the size of each node is really when users read the sysfs
node. otherwise, we have to scan M*N nodes in drivers in advance to
figure out the exact size of buffers we need.
On the other hand, it is crazily tricky to ask a bundle of drivers to
find a proper place to save the pointer of allocated buffers so that
they can be re-used in second read of the same bin_attribute node.
And I doubt it is really useful to save the address of buffers
somewhere. Immediately freeing it seems to be a correct option to
avoid runtime waste of memory. We can't predict when users will read
topology ABI and which node users will read.
Finally, reading topology things wouldn't be the really cpu-bound
things in user applications, hardly this kind of ABI things can be
a performance bottleneck. Users use numactl and lstopo commands to
read ABIs but nobody will do it again and again. And a normal
application won't do topology repeatly. So the overhead caused by
malloc/free in the new bitmap API doesn't really matter.
if we really want a place to re-used the buffer and avoid malloc/free,
it seems this should be done in some common place rather than each
driver. still it is hard to find the best place.

Thanks for the comments of Yury and Andy in v5.

v5:
-remove the bitmap API bitmap_print_to_buf, alternatively, only provide
cpumap_print_to_buf API as what we really care about is cpumask for
this moment. we can freely take bitmap_print_to_buf back once we find
the second user.
hopefully this can alleviate Yury's worries on possible abuse of a new
bitmap API.
-correct the document of cpumap_print_to_buf;
-In patch1, clearly explain why we need this new API in commit log;
-Also refine the commit log of patch 2 and 3;
-As the modification is narrowed to the scope of cpumask, the kunit
test of bitmap_print_to_buf doesn't apply in the new patchset. so
test case patch4/4 is removed.

Thanks for the comments of Greg, Yury, Andy. Thanks for Jonathan's
review.

v4:
-add test cases for bitmap_print_to_buf API;
-add Reviewed-by of Jonathan Cameron for patches 1-3, thanks!

v3:
-fixed the strlen issue and patch #1,#2,#3 minor formatting issues, thanks
to Andy Shevchenko and Jonathan Cameron.

v2:
-split the original patch #1 into two patches and use kasprintf() in
-patch #1 to simplify the code. do some minor formatting adjustments.

Background:

the whole story began from this thread when Jonatah and me tried to add a
new topology level-cluster which exists on kunpeng920 and X86 Jacobsville:
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

in the discussion, Greg had some concern about the potential one page size
limitation of sysfs ABI for topology. Greg's comment is reasonable
and I think we should address the problem.

For this moment, numa node, cpu topology and some other drivers are using
cpu bitmap and list to expose hardware topology. When cpu number is large,
the page buffer of sysfs won't be able to hold the whole bitmask or list.
This doesn't really happen nowadays for bitmask as the maximum NR_CPUS
is 8196 for X86_64 and 4096 for ARM64 since
8196 * 9 / 32 = 2305
is still smaller than 4KB page size.

So the existing BUILD_BUG_ON() in drivers/base/node.c is pretty much
preventing future problems when hardware gets more and more CPUs:
static ssize_t node_read_cpumap(struct device *dev, bool list, char *buf)
{
cpumask_var_t mask;
struct node *node_dev = to_node(dev);

/* 2008/04/07: buf currently PAGE_SIZE, need 9 chars per 32 bits. */
BUILD_BUG_ON((NR_CPUS/32 * 9) > (PAGE_SIZE-1));
}

But those ABIs exposing cpu lists are much more tricky as a list could be
like: 0, 3, 5, 7, 9, 11... etc. so nobody knows the size till the last
moment. Comparing to bitmask, list is easier to exceed one page.

In the previous discussion, Greg and Dave Hansen preferred to remove this
kind of limitation totally and remove the BUILD_BUG_ON() in
drivers/base/node.c together:
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

Todo:

right now, only topology and node are addressed. there are many other
drivers are calling cpumap_print_to_pagebuf() and have the similar
problems. we are going to address them one by one after this patchset
settles down.

Barry Song (1):
lib: test_bitmap: add bitmap_print_to_buf test cases

Tian Tao (3):
cpumask: introduce cpumap_print_to_buf to support large bitmask and
list
topology: use bin_attribute to break the size limitation of cpumap ABI
drivers/base/node.c: use bin_attribute to break the size limitation of
cpumap ABI

drivers/base/node.c | 51 +++++++++-----
drivers/base/topology.c | 115 ++++++++++++++++--------------
include/linux/bitmap.h | 2 +
include/linux/cpumask.h | 63 +++++++++++++++++
lib/bitmap.c | 78 +++++++++++++++++++++
lib/test_bitmap.c | 150 ++++++++++++++++++++++++++++++++++++++++
6 files changed, 389 insertions(+), 70 deletions(-)

--
2.25.1


Subject: [PATCH v7 1/4] cpumask: introduce cpumap_print_to_buf to support large bitmask and list

From: Tian Tao <[email protected]>

The existing cpumap_print_to_pagebuf() is used by cpu topology and other
drivers to export hexadecimal bitmask and decimal list to userspace by
sysfs ABI.

Right now, those drivers are using a normal attribute for this kind of
ABIs. A normal attribute typically has show entry as below:

static ssize_t example_dev_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
...
return cpumap_print_to_pagebuf(true, buf, &pmu_mmdc->cpu);
}
show entry of attribute has no offset and count parameters and this
means the file is limited to one page only.

cpumap_print_to_pagebuf() API works terribly well for this kind of
normal attribute with buf parameter and without offset, count:

static inline ssize_t
cpumap_print_to_pagebuf(bool list, char *buf, const struct cpumask *mask)
{
return bitmap_print_to_pagebuf(list, buf, cpumask_bits(mask),
nr_cpu_ids);
}

The problem is once we have many cpus, we have a chance to make bitmask
or list more than one page. Especially for list, it could be as complex
as 0,3,5,7,9,...... We have no simple way to know it exact size.

It turns out bin_attribute is a way to break this limit. bin_attribute
has show entry as below:
static ssize_t
example_bin_attribute_show(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t offset, size_t count)
{
...
}

With the new offset and count parameters, this makes sysfs ABI be able
to support file size more than one page. For example, offset could be
>= 4096.

This patch introduces cpumap_print_to_buf() and its bitmap infrastructure
bitmap_print_to_buf() so that those drivers can move to bin_attribute to
support large bitmask and list. At the same time, we have to pass those
corresponding parameters such as offset, count from bin_attribute to this
new API.

Signed-off-by: Tian Tao <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: "Ma, Jianpeng" <[email protected]>
Cc: Yury Norov <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Barry Song <[email protected]>
---
v7:
explanation deserves to be a paragraph in code according to Andy's
comments;

include/linux/bitmap.h | 2 ++
include/linux/cpumask.h | 63 +++++++++++++++++++++++++++++++++
lib/bitmap.c | 78 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 143 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index a36cfcec4e77..0de6effa2797 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -226,6 +226,8 @@ void bitmap_copy_le(unsigned long *dst, const unsigned long *src, unsigned int n
unsigned int bitmap_ord_to_pos(const unsigned long *bitmap, unsigned int ord, unsigned int nbits);
int bitmap_print_to_pagebuf(bool list, char *buf,
const unsigned long *maskp, int nmaskbits);
+int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
+ int nmaskbits, loff_t off, size_t count);

#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f3689a52bfd0..f81ade866cf7 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -983,6 +983,69 @@ cpumap_print_to_pagebuf(bool list, char *buf, const struct cpumask *mask)
nr_cpu_ids);
}

+/**
+ * cpumap_print_to_buf - copies the cpumask into the buffer
+ * @list: indicates whether the cpumap must be list
+ * true: print in decimal list format
+ * false: print in hexadecimal bitmask format
+ *
+ * The existing cpumap_print_to_pagebuf() is used by cpu topology and other
+ * drivers to export hexadecimal bitmask and decimal list to userspace by
+ * sysfs ABI.
+ * Drivers might be using a normal attribute for this kind of ABIs. A
+ * normal attribute typically has show entry as below:
+ * static ssize_t example_attribute_show(struct device *dev,
+ * struct device_attribute *attr, char *buf)
+ * {
+ * ...
+ * return cpumap_print_to_pagebuf(true, buf, &pmu_mmdc->cpu);
+ * }
+ * show entry of attribute has no offset and count parameters. this means
+ * the file is limited to one page only.
+ * cpumap_print_to_pagebuf() API works terribly well for this kind of
+ * normal attribute with buf parameter and without offset, count:
+ * cpumap_print_to_pagebuf(bool list, char *buf, const struct cpumask *mask)
+ * {
+ * }
+ * The problem is once we have many cpus, we have a chance to make bitmask
+ * or list more than one page. Especially for list, it could be as complex
+ * as 0,3,5,7,9,... We have no simple way to know it exact size.
+ * It turns out bin_attribute is a way to break this limit. bin_attribute
+ * has show entry as below:
+ * static ssize_t
+ * example_bin_attribute_show(struct file *filp, struct kobject *kobj,
+ * struct bin_attribute *attr, char *buf,
+ * loff_t offset, size_t count)
+ * {
+ * ...
+ * }
+ * With the new offset and count parameters, this makes sysfs ABI be able
+ * to support file size more than one page. For example, offset could be
+ * >= 4096.
+ * cpumap_print_to_buf() makes those drivers be able to to support large
+ * bitmask and list after they move to use bin_attribute. In result, we
+ * have to pass the corresponding parameters such as off, count from
+ * bin_attribute show entry to this API.
+ *
+ * @mask: the cpumask to copy
+ * @buf: the buffer to copy into
+ * @off: in the string from which we are copying, We copy to @buf
+ * @count: the maximum number of bytes to print
+ *
+ * The function copies the cpumask into the buffer either as comma-separated
+ * list of cpus or hex values of cpumask; Typically used by bin_attribute to
+ * export cpumask bitmask and list ABI.
+ *
+ * Returns the length of how many bytes have been copied.
+ */
+static inline ssize_t
+cpumap_print_to_buf(bool list, char *buf, const struct cpumask *mask,
+ loff_t off, size_t count)
+{
+ return bitmap_print_to_buf(list, buf, cpumask_bits(mask),
+ nr_cpu_ids, off, count);
+}
+
#if NR_CPUS <= BITS_PER_LONG
#define CPU_MASK_ALL \
(cpumask_t) { { \
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 9401d39e4722..56bcffe2fa8c 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -487,6 +487,84 @@ int bitmap_print_to_pagebuf(bool list, char *buf, const unsigned long *maskp,
}
EXPORT_SYMBOL(bitmap_print_to_pagebuf);

+/**
+ * bitmap_print_to_buf - convert bitmap to list or hex format ASCII string
+ * @list: indicates whether the bitmap must be list
+ * true: print in decimal list format
+ * false: print in hexadecimal bitmask format
+ *
+ * The bitmap_print_to_pagebuf() is used indirectly via its cpumap wrapper
+ * cpumap_print_to_pagebuf() or directly by drivers to export hexadecimal
+ * bitmask and decimal list to userspace by sysfs ABI.
+ * Drivers might be using a normal attribute for this kind of ABIs. A
+ * normal attribute typically has show entry as below:
+ * static ssize_t example_attribute_show(struct device *dev,
+ * struct device_attribute *attr, char *buf)
+ * {
+ * ...
+ * return bitmap_print_to_pagebuf(true, buf, &mask, nr_trig_max);
+ * }
+ * show entry of attribute has no offset and count parameters and this
+ * means the file is limited to one page only.
+ * bitmap_print_to_pagebuf() API works terribly well for this kind of
+ * normal attribute with buf parameter and without offset, count:
+ * bitmap_print_to_pagebuf(bool list, char *buf, const unsigned long *maskp,
+ * int nmaskbits)
+ * {
+ * }
+ * The problem is once we have a large bitmap, we have a chance to get a
+ * bitmask or list more than one page. Especially for list, it could be
+ * as complex as 0,3,5,7,9,... We have no simple way to know it exact size.
+ * It turns out bin_attribute is a way to break this limit. bin_attribute
+ * has show entry as below:
+ * static ssize_t
+ * example_bin_attribute_show(struct file *filp, struct kobject *kobj,
+ * struct bin_attribute *attr, char *buf,
+ * loff_t offset, size_t count)
+ * {
+ * ...
+ * }
+ * With the new offset and count parameters, this makes sysfs ABI be able
+ * to support file size more than one page. For example, offset could be
+ * >= 4096.
+ * bitmap_print_to_buf() and its cpumap wrapper cpumap_print_to_buf() makes
+ * those drivers be able to support large bitmask and list after they move
+ * to use bin_attribute. In result, we have to pass the corresponding
+ * parameters such as off, count from bin_attribute show entry to this API.
+ *
+ * @buf: buffer into which string is placed
+ * @maskp: pointer to bitmap to convert
+ * @nmaskbits: size of bitmap, in bits
+ * @off: in the string from which we are copying, We copy to @buf
+ * @count: the maximum number of bytes to print
+ *
+ * The role of cpumap_print_to_buf() and cpumap_print_to_pagebuf() is similar,
+ * the difference is that bitmap_print_to_pagebuf() mainly serves sysfs
+ * attribute with the assumption the destination buffer is exactly one page
+ * and won't be more than one page. cpumap_print_to_buf(), on the other hand,
+ * mainly serves bin_attribute which doesn't work with exact one page, and it
+ * can break the size limit of converted decimal list and hexadecimal bitmask.
+ *
+ * Returns the number of characters actually printed to @buf
+ */
+int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
+ int nmaskbits, loff_t off, size_t count)
+{
+ const char *fmt = list ? "%*pbl\n" : "%*pb\n";
+ ssize_t size;
+ void *data;
+
+ data = kasprintf(GFP_KERNEL, fmt, nmaskbits, maskp);
+ if (!data)
+ return -ENOMEM;
+
+ size = memory_read_from_buffer(buf, count, &off, data, strlen(data) + 1);
+ kfree(data);
+
+ return size;
+}
+EXPORT_SYMBOL(bitmap_print_to_buf);
+
/*
* Region 9-38:4/10 describes the following bitmap structure:
* 0 9 12 18 38 N
--
2.25.1

Subject: [PATCH v7 4/4] lib: test_bitmap: add bitmap_print_to_buf test cases

The added test items cover both cases where bitmap buf of the printed
result is greater than and less than 4KB.
And it also covers the case where offset for bitmap_print_to_buf is
non-zero which will happen when printed buf is larger than one page
in sysfs bin_attribute.

Signed-off-by: Barry Song <[email protected]>
---
-v7:
minor cleanup according to Andy's comments

lib/test_bitmap.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 150 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 4ea73f5aed41..eb8ebaf12865 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -19,6 +19,7 @@
KSTM_MODULE_GLOBALS();

static char pbl_buffer[PAGE_SIZE] __initdata;
+static char print_buf[PAGE_SIZE * 2] __initdata;

static const unsigned long exp1[] __initconst = {
BITMAP_FROM_U64(1),
@@ -156,6 +157,20 @@ static bool __init __check_eq_clump8(const char *srcfile, unsigned int line,
return true;
}

+static bool __init
+__check_eq_str(const char *srcfile, unsigned int line,
+ const char *exp_str, const char *str,
+ unsigned int len)
+{
+ bool eq;
+
+ eq = strncmp(exp_str, str, len) == 0;
+ if (!eq)
+ pr_err("[%s:%u] expected %s, got %s\n", srcfile, line, exp_str, str);
+
+ return eq;
+}
+
#define __expect_eq(suffix, ...) \
({ \
int result = 0; \
@@ -173,6 +188,7 @@ static bool __init __check_eq_clump8(const char *srcfile, unsigned int line,
#define expect_eq_pbl(...) __expect_eq(pbl, ##__VA_ARGS__)
#define expect_eq_u32_array(...) __expect_eq(u32_array, ##__VA_ARGS__)
#define expect_eq_clump8(...) __expect_eq(clump8, ##__VA_ARGS__)
+#define expect_eq_str(...) __expect_eq(str, ##__VA_ARGS__)

static void __init test_zero_clear(void)
{
@@ -660,6 +676,139 @@ static void __init test_bitmap_cut(void)
}
}

+struct test_bitmap_print {
+ const unsigned long *bitmap;
+ unsigned long nbits;
+ const char *mask;
+ const char *list;
+};
+
+static const unsigned long small_bitmap[] __initconst = {
+ BITMAP_FROM_U64(0x3333333311111111ULL),
+};
+
+static const char small_mask[] __initconst = "33333333,11111111\n";
+static const char small_list[] __initconst = "0,4,8,12,16,20,24,28,32-33,36-37,40-41,44-45,48-49,52-53,56-57,60-61\n";
+
+static const unsigned long large_bitmap[] __initconst = {
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+ BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
+};
+
+static const char large_mask[] __initconst = "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111,"
+ "33333333,11111111,33333333,11111111\n";
+
+static const char large_list[] __initconst = /* more than 4KB */
+ "0,4,8,12,16,20,24,28,32-33,36-37,40-41,44-45,48-49,52-53,56-57,60-61,64,68,72,76,80,84,88,92,96-97,100-101,104-1"
+ "05,108-109,112-113,116-117,120-121,124-125,128,132,136,140,144,148,152,156,160-161,164-165,168-169,172-173,176-1"
+ "77,180-181,184-185,188-189,192,196,200,204,208,212,216,220,224-225,228-229,232-233,236-237,240-241,244-245,248-2"
+ "49,252-253,256,260,264,268,272,276,280,284,288-289,292-293,296-297,300-301,304-305,308-309,312-313,316-317,320,3"
+ "24,328,332,336,340,344,348,352-353,356-357,360-361,364-365,368-369,372-373,376-377,380-381,384,388,392,396,400,4"
+ "04,408,412,416-417,420-421,424-425,428-429,432-433,436-437,440-441,444-445,448,452,456,460,464,468,472,476,480-4"
+ "81,484-485,488-489,492-493,496-497,500-501,504-505,508-509,512,516,520,524,528,532,536,540,544-545,548-549,552-5"
+ "53,556-557,560-561,564-565,568-569,572-573,576,580,584,588,592,596,600,604,608-609,612-613,616-617,620-621,624-6"
+ "25,628-629,632-633,636-637,640,644,648,652,656,660,664,668,672-673,676-677,680-681,684-685,688-689,692-693,696-6"
+ "97,700-701,704,708,712,716,720,724,728,732,736-737,740-741,744-745,748-749,752-753,756-757,760-761,764-765,768,7"
+ "72,776,780,784,788,792,796,800-801,804-805,808-809,812-813,816-817,820-821,824-825,828-829,832,836,840,844,848,8"
+ "52,856,860,864-865,868-869,872-873,876-877,880-881,884-885,888-889,892-893,896,900,904,908,912,916,920,924,928-9"
+ "29,932-933,936-937,940-941,944-945,948-949,952-953,956-957,960,964,968,972,976,980,984,988,992-993,996-997,1000-"
+ "1001,1004-1005,1008-1009,1012-1013,1016-1017,1020-1021,1024,1028,1032,1036,1040,1044,1048,1052,1056-1057,1060-10"
+ "61,1064-1065,1068-1069,1072-1073,1076-1077,1080-1081,1084-1085,1088,1092,1096,1100,1104,1108,1112,1116,1120-1121"
+ ",1124-1125,1128-1129,1132-1133,1136-1137,1140-1141,1144-1145,1148-1149,1152,1156,1160,1164,1168,1172,1176,1180,1"
+ "184-1185,1188-1189,1192-1193,1196-1197,1200-1201,1204-1205,1208-1209,1212-1213,1216,1220,1224,1228,1232,1236,124"
+ "0,1244,1248-1249,1252-1253,1256-1257,1260-1261,1264-1265,1268-1269,1272-1273,1276-1277,1280,1284,1288,1292,1296,"
+ "1300,1304,1308,1312-1313,1316-1317,1320-1321,1324-1325,1328-1329,1332-1333,1336-1337,1340-1341,1344,1348,1352,13"
+ "56,1360,1364,1368,1372,1376-1377,1380-1381,1384-1385,1388-1389,1392-1393,1396-1397,1400-1401,1404-1405,1408,1412"
+ ",1416,1420,1424,1428,1432,1436,1440-1441,1444-1445,1448-1449,1452-1453,1456-1457,1460-1461,1464-1465,1468-1469,1"
+ "472,1476,1480,1484,1488,1492,1496,1500,1504-1505,1508-1509,1512-1513,1516-1517,1520-1521,1524-1525,1528-1529,153"
+ "2-1533,1536,1540,1544,1548,1552,1556,1560,1564,1568-1569,1572-1573,1576-1577,1580-1581,1584-1585,1588-1589,1592-"
+ "1593,1596-1597,1600,1604,1608,1612,1616,1620,1624,1628,1632-1633,1636-1637,1640-1641,1644-1645,1648-1649,1652-16"
+ "53,1656-1657,1660-1661,1664,1668,1672,1676,1680,1684,1688,1692,1696-1697,1700-1701,1704-1705,1708-1709,1712-1713"
+ ",1716-1717,1720-1721,1724-1725,1728,1732,1736,1740,1744,1748,1752,1756,1760-1761,1764-1765,1768-1769,1772-1773,1"
+ "776-1777,1780-1781,1784-1785,1788-1789,1792,1796,1800,1804,1808,1812,1816,1820,1824-1825,1828-1829,1832-1833,183"
+ "6-1837,1840-1841,1844-1845,1848-1849,1852-1853,1856,1860,1864,1868,1872,1876,1880,1884,1888-1889,1892-1893,1896-"
+ "1897,1900-1901,1904-1905,1908-1909,1912-1913,1916-1917,1920,1924,1928,1932,1936,1940,1944,1948,1952-1953,1956-19"
+ "57,1960-1961,1964-1965,1968-1969,1972-1973,1976-1977,1980-1981,1984,1988,1992,1996,2000,2004,2008,2012,2016-2017"
+ ",2020-2021,2024-2025,2028-2029,2032-2033,2036-2037,2040-2041,2044-2045,2048,2052,2056,2060,2064,2068,2072,2076,2"
+ "080-2081,2084-2085,2088-2089,2092-2093,2096-2097,2100-2101,2104-2105,2108-2109,2112,2116,2120,2124,2128,2132,213"
+ "6,2140,2144-2145,2148-2149,2152-2153,2156-2157,2160-2161,2164-2165,2168-2169,2172-2173,2176,2180,2184,2188,2192,"
+ "2196,2200,2204,2208-2209,2212-2213,2216-2217,2220-2221,2224-2225,2228-2229,2232-2233,2236-2237,2240,2244,2248,22"
+ "52,2256,2260,2264,2268,2272-2273,2276-2277,2280-2281,2284-2285,2288-2289,2292-2293,2296-2297,2300-2301,2304,2308"
+ ",2312,2316,2320,2324,2328,2332,2336-2337,2340-2341,2344-2345,2348-2349,2352-2353,2356-2357,2360-2361,2364-2365,2"
+ "368,2372,2376,2380,2384,2388,2392,2396,2400-2401,2404-2405,2408-2409,2412-2413,2416-2417,2420-2421,2424-2425,242"
+ "8-2429,2432,2436,2440,2444,2448,2452,2456,2460,2464-2465,2468-2469,2472-2473,2476-2477,2480-2481,2484-2485,2488-"
+ "2489,2492-2493,2496,2500,2504,2508,2512,2516,2520,2524,2528-2529,2532-2533,2536-2537,2540-2541,2544-2545,2548-25"
+ "49,2552-2553,2556-2557\n";
+
+static const struct test_bitmap_print test_print[] __initconst = {
+ { small_bitmap, sizeof(small_bitmap) * BITS_PER_BYTE, small_mask, small_list },
+ { large_bitmap, sizeof(large_bitmap) * BITS_PER_BYTE, large_mask, large_list },
+};
+
+static void __init test_bitmap_print_buf(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_print); i++) {
+ const struct test_bitmap_print *t = &test_print[i];
+ int n;
+
+ n = bitmap_print_to_buf(false, print_buf, t->bitmap, t->nbits,
+ 0, 2 * PAGE_SIZE);
+ expect_eq_uint(strlen(t->mask) + 1, n);
+ expect_eq_str(t->mask, print_buf, n);
+
+ n = bitmap_print_to_buf(true, print_buf, t->bitmap, t->nbits,
+ 0, 2 * PAGE_SIZE);
+ expect_eq_uint(strlen(t->list) + 1, n);
+ expect_eq_str(t->list, print_buf, n);
+
+ /* test bitmap_print_to_buf by non-zero offset */
+ if (strlen(t->list) > PAGE_SIZE) {
+ n = bitmap_print_to_buf(true, print_buf, t->bitmap, t->nbits,
+ PAGE_SIZE, PAGE_SIZE);
+ expect_eq_uint(strlen(t->list) + 1 - PAGE_SIZE, n);
+ expect_eq_str(t->list + PAGE_SIZE, print_buf, n);
+ }
+ }
+}
+
static void __init selftest(void)
{
test_zero_clear();
@@ -672,6 +821,7 @@ static void __init selftest(void)
test_mem_optimisations();
test_for_each_set_clump8();
test_bitmap_cut();
+ test_bitmap_print_buf();
}

KSTM_MODULE_LOADERS(test_bitmap);
--
2.25.1

Subject: [PATCH v7 3/4] drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI

From: Tian Tao <[email protected]>

Reading /sys/devices/system/cpu/cpuX/nodeX/ returns cpumap and cpulist.
However, the size of this file is limited to PAGE_SIZE because of the
limitation for sysfs attribute.

This patch moves to use bin_attribute to extend the ABI to be more
than one page so that cpumap bitmask and list won't be potentially
trimmed.

Signed-off-by: Tian Tao <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Barry Song <[email protected]>
---
-v7:
sys -> /sys, thanks for Andy's comments

drivers/base/node.c | 51 +++++++++++++++++++++++++++++----------------
1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 4a4ae868ad9f..89a72aba72a3 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -27,42 +27,44 @@ static struct bus_type node_subsys = {
};


-static ssize_t node_read_cpumap(struct device *dev, bool list, char *buf)
+static ssize_t node_read_cpumap(struct device *dev, bool list, char *buf,
+ loff_t off, size_t count)
{
ssize_t n;
cpumask_var_t mask;
struct node *node_dev = to_node(dev);

- /* 2008/04/07: buf currently PAGE_SIZE, need 9 chars per 32 bits. */
- BUILD_BUG_ON((NR_CPUS/32 * 9) > (PAGE_SIZE-1));
-
if (!alloc_cpumask_var(&mask, GFP_KERNEL))
return 0;

cpumask_and(mask, cpumask_of_node(node_dev->dev.id), cpu_online_mask);
- n = cpumap_print_to_pagebuf(list, buf, mask);
+ n = cpumap_print_to_buf(list, buf, mask, off, count);
free_cpumask_var(mask);

return n;
}

-static inline ssize_t cpumap_show(struct device *dev,
- struct device_attribute *attr,
- char *buf)
+static inline ssize_t cpumap_read(struct file *file, struct kobject *kobj,
+ struct bin_attribute *attr, char *buf,
+ loff_t off, size_t count)
{
- return node_read_cpumap(dev, false, buf);
+ struct device *dev = kobj_to_dev(kobj);
+
+ return node_read_cpumap(dev, false, buf, off, count);
}

-static DEVICE_ATTR_RO(cpumap);
+static BIN_ATTR_RO(cpumap, 0);

-static inline ssize_t cpulist_show(struct device *dev,
- struct device_attribute *attr,
- char *buf)
+static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
+ struct bin_attribute *attr, char *buf,
+ loff_t off, size_t count)
{
- return node_read_cpumap(dev, true, buf);
+ struct device *dev = kobj_to_dev(kobj);
+
+ return node_read_cpumap(dev, true, buf, off, count);
}

-static DEVICE_ATTR_RO(cpulist);
+static BIN_ATTR_RO(cpulist, 0);

/**
* struct node_access_nodes - Access class device to hold user visible
@@ -557,15 +559,28 @@ static ssize_t node_read_distance(struct device *dev,
static DEVICE_ATTR(distance, 0444, node_read_distance, NULL);

static struct attribute *node_dev_attrs[] = {
- &dev_attr_cpumap.attr,
- &dev_attr_cpulist.attr,
&dev_attr_meminfo.attr,
&dev_attr_numastat.attr,
&dev_attr_distance.attr,
&dev_attr_vmstat.attr,
NULL
};
-ATTRIBUTE_GROUPS(node_dev);
+
+static struct bin_attribute *node_dev_bin_attrs[] = {
+ &bin_attr_cpumap,
+ &bin_attr_cpulist,
+ NULL
+};
+
+static const struct attribute_group node_dev_group = {
+ .attrs = node_dev_attrs,
+ .bin_attrs = node_dev_bin_attrs
+};
+
+static const struct attribute_group *node_dev_groups[] = {
+ &node_dev_group,
+ NULL
+};

#ifdef CONFIG_HUGETLBFS
/*
--
2.25.1

2021-07-15 16:49:24

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] cpumask: introduce cpumap_print_to_buf to support large bitmask and list

On Thu, Jul 15, 2021 at 11:58:53PM +1200, Barry Song wrote:
> (10.1.198.147)
> X-CFilter-Loop: Reflected
> Status: O
> Content-Length: 10263
> Lines: 252
>
> From: Tian Tao <[email protected]>

[...]

> +int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
> + int nmaskbits, loff_t off, size_t count)
> +{
> + const char *fmt = list ? "%*pbl\n" : "%*pb\n";
> + ssize_t size;
> + void *data;
> +
> + data = kasprintf(GFP_KERNEL, fmt, nmaskbits, maskp);
> + if (!data)
> + return -ENOMEM;
> +
> + size = memory_read_from_buffer(buf, count, &off, data, strlen(data) + 1);
> + kfree(data);
> +
> + return size;
> +}
> +EXPORT_SYMBOL(bitmap_print_to_buf);

In discussion to v4 of this series I've pointed out inefficiency of
this approach. Now it's v7, but the problem is still there.

1. You make user of your API guess aboout proper @count without any
hint. This is worse than how it works now with pure vsnprintf().
2. For big bitmaps and small @count, your code will make enormous
amount of unneeded work. For example, if the full length of string
representation of bitmap is 1M, and length of the output buffer is
1k, one will have to call bitmap_print_to_buf() 1000 times. With
current design it assumes that you allocate the full amount of memory
1000 times, free it 1000 times and print huge bitmap 1000 times to
just return small part of it.

NAK for this, and please stop submitting wrong approach again and
again.

2021-07-15 18:11:28

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH v7 4/4] lib: test_bitmap: add bitmap_print_to_buf test cases

On Thu, Jul 15, 2021 at 11:58:56PM +1200, Barry Song wrote:
>
> The added test items cover both cases where bitmap buf of the printed
> result is greater than and less than 4KB.
> And it also covers the case where offset for bitmap_print_to_buf is
> non-zero which will happen when printed buf is larger than one page
> in sysfs bin_attribute.
>
> Signed-off-by: Barry Song <[email protected]>
> ---
> -v7:
> minor cleanup according to Andy's comments
>
> lib/test_bitmap.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 150 insertions(+)
>
> diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
> index 4ea73f5aed41..eb8ebaf12865 100644
> --- a/lib/test_bitmap.c
> +++ b/lib/test_bitmap.c
> @@ -19,6 +19,7 @@
> KSTM_MODULE_GLOBALS();
>
> static char pbl_buffer[PAGE_SIZE] __initdata;
> +static char print_buf[PAGE_SIZE * 2] __initdata;
>
> static const unsigned long exp1[] __initconst = {
> BITMAP_FROM_U64(1),
> @@ -156,6 +157,20 @@ static bool __init __check_eq_clump8(const char *srcfile, unsigned int line,
> return true;
> }
>
> +static bool __init
> +__check_eq_str(const char *srcfile, unsigned int line,
> + const char *exp_str, const char *str,
> + unsigned int len)
> +{
> + bool eq;
> +
> + eq = strncmp(exp_str, str, len) == 0;
> + if (!eq)
> + pr_err("[%s:%u] expected %s, got %s\n", srcfile, line, exp_str, str);
> +
> + return eq;
> +}
> +
> #define __expect_eq(suffix, ...) \
> ({ \
> int result = 0; \
> @@ -173,6 +188,7 @@ static bool __init __check_eq_clump8(const char *srcfile, unsigned int line,
> #define expect_eq_pbl(...) __expect_eq(pbl, ##__VA_ARGS__)
> #define expect_eq_u32_array(...) __expect_eq(u32_array, ##__VA_ARGS__)
> #define expect_eq_clump8(...) __expect_eq(clump8, ##__VA_ARGS__)
> +#define expect_eq_str(...) __expect_eq(str, ##__VA_ARGS__)
>
> static void __init test_zero_clear(void)
> {
> @@ -660,6 +676,139 @@ static void __init test_bitmap_cut(void)
> }
> }
>
> +struct test_bitmap_print {
> + const unsigned long *bitmap;
> + unsigned long nbits;
> + const char *mask;
> + const char *list;
> +};
> +
> +static const unsigned long small_bitmap[] __initconst = {
> + BITMAP_FROM_U64(0x3333333311111111ULL),
> +};
> +
> +static const char small_mask[] __initconst = "33333333,11111111\n";
> +static const char small_list[] __initconst = "0,4,8,12,16,20,24,28,32-33,36-37,40-41,44-45,48-49,52-53,56-57,60-61\n";
> +
> +static const unsigned long large_bitmap[] __initconst = {
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> + BITMAP_FROM_U64(0x3333333311111111ULL), BITMAP_FROM_U64(0x3333333311111111ULL),
> +};
> +
> +static const char large_mask[] __initconst = "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111,"
> + "33333333,11111111,33333333,11111111\n";
> +
> +static const char large_list[] __initconst = /* more than 4KB */
> + "0,4,8,12,16,20,24,28,32-33,36-37,40-41,44-45,48-49,52-53,56-57,60-61,64,68,72,76,80,84,88,92,96-97,100-101,104-1"
> + "05,108-109,112-113,116-117,120-121,124-125,128,132,136,140,144,148,152,156,160-161,164-165,168-169,172-173,176-1"
> + "77,180-181,184-185,188-189,192,196,200,204,208,212,216,220,224-225,228-229,232-233,236-237,240-241,244-245,248-2"
> + "49,252-253,256,260,264,268,272,276,280,284,288-289,292-293,296-297,300-301,304-305,308-309,312-313,316-317,320,3"
> + "24,328,332,336,340,344,348,352-353,356-357,360-361,364-365,368-369,372-373,376-377,380-381,384,388,392,396,400,4"
> + "04,408,412,416-417,420-421,424-425,428-429,432-433,436-437,440-441,444-445,448,452,456,460,464,468,472,476,480-4"
> + "81,484-485,488-489,492-493,496-497,500-501,504-505,508-509,512,516,520,524,528,532,536,540,544-545,548-549,552-5"
> + "53,556-557,560-561,564-565,568-569,572-573,576,580,584,588,592,596,600,604,608-609,612-613,616-617,620-621,624-6"
> + "25,628-629,632-633,636-637,640,644,648,652,656,660,664,668,672-673,676-677,680-681,684-685,688-689,692-693,696-6"
> + "97,700-701,704,708,712,716,720,724,728,732,736-737,740-741,744-745,748-749,752-753,756-757,760-761,764-765,768,7"
> + "72,776,780,784,788,792,796,800-801,804-805,808-809,812-813,816-817,820-821,824-825,828-829,832,836,840,844,848,8"
> + "52,856,860,864-865,868-869,872-873,876-877,880-881,884-885,888-889,892-893,896,900,904,908,912,916,920,924,928-9"
> + "29,932-933,936-937,940-941,944-945,948-949,952-953,956-957,960,964,968,972,976,980,984,988,992-993,996-997,1000-"
> + "1001,1004-1005,1008-1009,1012-1013,1016-1017,1020-1021,1024,1028,1032,1036,1040,1044,1048,1052,1056-1057,1060-10"
> + "61,1064-1065,1068-1069,1072-1073,1076-1077,1080-1081,1084-1085,1088,1092,1096,1100,1104,1108,1112,1116,1120-1121"
> + ",1124-1125,1128-1129,1132-1133,1136-1137,1140-1141,1144-1145,1148-1149,1152,1156,1160,1164,1168,1172,1176,1180,1"
> + "184-1185,1188-1189,1192-1193,1196-1197,1200-1201,1204-1205,1208-1209,1212-1213,1216,1220,1224,1228,1232,1236,124"
> + "0,1244,1248-1249,1252-1253,1256-1257,1260-1261,1264-1265,1268-1269,1272-1273,1276-1277,1280,1284,1288,1292,1296,"
> + "1300,1304,1308,1312-1313,1316-1317,1320-1321,1324-1325,1328-1329,1332-1333,1336-1337,1340-1341,1344,1348,1352,13"
> + "56,1360,1364,1368,1372,1376-1377,1380-1381,1384-1385,1388-1389,1392-1393,1396-1397,1400-1401,1404-1405,1408,1412"
> + ",1416,1420,1424,1428,1432,1436,1440-1441,1444-1445,1448-1449,1452-1453,1456-1457,1460-1461,1464-1465,1468-1469,1"
> + "472,1476,1480,1484,1488,1492,1496,1500,1504-1505,1508-1509,1512-1513,1516-1517,1520-1521,1524-1525,1528-1529,153"
> + "2-1533,1536,1540,1544,1548,1552,1556,1560,1564,1568-1569,1572-1573,1576-1577,1580-1581,1584-1585,1588-1589,1592-"
> + "1593,1596-1597,1600,1604,1608,1612,1616,1620,1624,1628,1632-1633,1636-1637,1640-1641,1644-1645,1648-1649,1652-16"
> + "53,1656-1657,1660-1661,1664,1668,1672,1676,1680,1684,1688,1692,1696-1697,1700-1701,1704-1705,1708-1709,1712-1713"
> + ",1716-1717,1720-1721,1724-1725,1728,1732,1736,1740,1744,1748,1752,1756,1760-1761,1764-1765,1768-1769,1772-1773,1"
> + "776-1777,1780-1781,1784-1785,1788-1789,1792,1796,1800,1804,1808,1812,1816,1820,1824-1825,1828-1829,1832-1833,183"
> + "6-1837,1840-1841,1844-1845,1848-1849,1852-1853,1856,1860,1864,1868,1872,1876,1880,1884,1888-1889,1892-1893,1896-"
> + "1897,1900-1901,1904-1905,1908-1909,1912-1913,1916-1917,1920,1924,1928,1932,1936,1940,1944,1948,1952-1953,1956-19"
> + "57,1960-1961,1964-1965,1968-1969,1972-1973,1976-1977,1980-1981,1984,1988,1992,1996,2000,2004,2008,2012,2016-2017"
> + ",2020-2021,2024-2025,2028-2029,2032-2033,2036-2037,2040-2041,2044-2045,2048,2052,2056,2060,2064,2068,2072,2076,2"
> + "080-2081,2084-2085,2088-2089,2092-2093,2096-2097,2100-2101,2104-2105,2108-2109,2112,2116,2120,2124,2128,2132,213"
> + "6,2140,2144-2145,2148-2149,2152-2153,2156-2157,2160-2161,2164-2165,2168-2169,2172-2173,2176,2180,2184,2188,2192,"
> + "2196,2200,2204,2208-2209,2212-2213,2216-2217,2220-2221,2224-2225,2228-2229,2232-2233,2236-2237,2240,2244,2248,22"
> + "52,2256,2260,2264,2268,2272-2273,2276-2277,2280-2281,2284-2285,2288-2289,2292-2293,2296-2297,2300-2301,2304,2308"
> + ",2312,2316,2320,2324,2328,2332,2336-2337,2340-2341,2344-2345,2348-2349,2352-2353,2356-2357,2360-2361,2364-2365,2"
> + "368,2372,2376,2380,2384,2388,2392,2396,2400-2401,2404-2405,2408-2409,2412-2413,2416-2417,2420-2421,2424-2425,242"
> + "8-2429,2432,2436,2440,2444,2448,2452,2456,2460,2464-2465,2468-2469,2472-2473,2476-2477,2480-2481,2484-2485,2488-"
> + "2489,2492-2493,2496,2500,2504,2508,2512,2516,2520,2524,2528-2529,2532-2533,2536-2537,2540-2541,2544-2545,2548-25"
> + "49,2552-2553,2556-2557\n";

Indeed it's longer than 80 chars per line, even than 100 chars.

I think it's not possible for reviewers to manually check
correctness of such a huge test line. Could you consider using
bitmap_parselist() + strncmp() instead of manual inspecting.

> +static const struct test_bitmap_print test_print[] __initconst = {
> + { small_bitmap, sizeof(small_bitmap) * BITS_PER_BYTE, small_mask, small_list },
> + { large_bitmap, sizeof(large_bitmap) * BITS_PER_BYTE, large_mask, large_list },
> +};
> +
> +static void __init test_bitmap_print_buf(void)
> +{
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(test_print); i++) {
> + const struct test_bitmap_print *t = &test_print[i];
> + int n;
> +
> + n = bitmap_print_to_buf(false, print_buf, t->bitmap, t->nbits,
> + 0, 2 * PAGE_SIZE);
> + expect_eq_uint(strlen(t->mask) + 1, n);
> + expect_eq_str(t->mask, print_buf, n);
> +
> + n = bitmap_print_to_buf(true, print_buf, t->bitmap, t->nbits,
> + 0, 2 * PAGE_SIZE);
> + expect_eq_uint(strlen(t->list) + 1, n);
> + expect_eq_str(t->list, print_buf, n);
> +
> + /* test bitmap_print_to_buf by non-zero offset */
> + if (strlen(t->list) > PAGE_SIZE) {
> + n = bitmap_print_to_buf(true, print_buf, t->bitmap, t->nbits,
> + PAGE_SIZE, PAGE_SIZE);
> + expect_eq_uint(strlen(t->list) + 1 - PAGE_SIZE, n);
> + expect_eq_str(t->list + PAGE_SIZE, print_buf, n);
> + }
> + }
> +}
> +
> static void __init selftest(void)
> {
> test_zero_clear();
> @@ -672,6 +821,7 @@ static void __init selftest(void)
> test_mem_optimisations();
> test_for_each_set_clump8();
> test_bitmap_cut();
> + test_bitmap_print_buf();
> }
>
> KSTM_MODULE_LOADERS(test_bitmap);
> --
> 2.25.1

Subject: RE: [PATCH v7 1/4] cpumask: introduce cpumap_print_to_buf to support large bitmask and list



> -----Original Message-----
> From: Yury Norov [mailto:[email protected]]
> Sent: Friday, July 16, 2021 3:29 AM
> To: Song Bao Hua (Barry Song) <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; tangchengchang
> <[email protected]>; Zengtao (B) <[email protected]>;
> yangyicong <[email protected]>; [email protected]; Linuxarm
> <[email protected]>; tiantao (H) <[email protected]>
> Subject: Re: [PATCH v7 1/4] cpumask: introduce cpumap_print_to_buf to support
> large bitmask and list
>
> On Thu, Jul 15, 2021 at 11:58:53PM +1200, Barry Song wrote:
> > (10.1.198.147)
> > X-CFilter-Loop: Reflected
> > Status: O
> > Content-Length: 10263
> > Lines: 252
> >
> > From: Tian Tao <[email protected]>
>
> [...]
>
> > +int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
> > + int nmaskbits, loff_t off, size_t count)
> > +{
> > + const char *fmt = list ? "%*pbl\n" : "%*pb\n";
> > + ssize_t size;
> > + void *data;
> > +
> > + data = kasprintf(GFP_KERNEL, fmt, nmaskbits, maskp);
> > + if (!data)
> > + return -ENOMEM;
> > +
> > + size = memory_read_from_buffer(buf, count, &off, data, strlen(data) + 1);
> > + kfree(data);
> > +
> > + return size;
> > +}
> > +EXPORT_SYMBOL(bitmap_print_to_buf);
>
> In discussion to v4 of this series I've pointed out inefficiency of
> this approach. Now it's v7, but the problem is still there.
>
> 1. You make user of your API guess aboout proper @count without any
> hint. This is worse than how it works now with pure vsnprintf().

This isn't true. While this count comes from sysfs bin_attribute,
sysfs show() entry guarantee the count is proper and inside the
valid range of the buffer. Otherwise, sysfs bin_attribute has totally
broken for all users.

> 2. For big bitmaps and small @count, your code will make enormous
> amount of unneeded work. For example, if the full length of string
> representation of bitmap is 1M, and length of the output buffer is
> 1k, one will have to call bitmap_print_to_buf() 1000 times. With
> current design it assumes that you allocate the full amount of memory
> 1000 times, free it 1000 times and print huge bitmap 1000 times to
> just return small part of it.

This isn't true either. Nobody is actually holding a cpumap like 1MB.
4KB has been used in current kernel for a long time, no machine
has really complained it is not enough. So I would expect the real
case would be one time for majority, perhaps twice for some machines
which we haven't seen yet.

>
> NAK for this, and please stop submitting wrong approach again and
> again.

I have always answered your email and explained with a lot of word,
but you totally ignored my explanation and didn't even answer my
explanation in v5 and v6. That seems quite unfair.

Considering a driver which has M cpus and N different topology
entries in its show entry:

example_bin_attribute_show(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t offset, size_t count)
{
...
}
In case what you say is true and this show() is called 1000 times
with different offset if the buffer is as big as 1MB.
How would the code work by reusing a buffer allocated in advance,
like below?

example_bin_attribute_show(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t offset, size_t count)
{
//1st time:
char *bitmap_buf = bitmap_buffer_allocate(....);
save bitmap_buf to somewhere?

//2nd~1000 time
reuse the bitmap_buf?

//1000time
Free the bitmap buf?
}

Or like below?

char *global_bitmap_buf = bitmap_buffer_allocate(....)?

example_bin_attribute_show(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t offset, size_t count)
{
//1st - 1000 time?
Reuse the global_bitmap_buf?
}

Neither of the above way is good to me. The 1st one is not doable
at all. The second one is not doable either. M*N entries will have
different size and the buffer being re-used seems to be wasting
memory.

I'd appreciate if you could post some pseudo code so that v8 could
make some actual difference and go to the way you prefer.
On the other hand, we also need Greg's Ack on driver changes which
might happen afterwards.

So please post some pseudo code rather than simply put a NAK
by ignoring my long explanation.

Thanks
Barry

Subject: RE: [PATCH v7 1/4] cpumask: introduce cpumap_print_to_buf to support large bitmask and list



> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Friday, July 16, 2021 9:08 AM
> To: 'Yury Norov' <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; tangchengchang
> <[email protected]>; Zengtao (B) <[email protected]>;
> yangyicong <[email protected]>; [email protected]; Linuxarm
> <[email protected]>; tiantao (H) <[email protected]>
> Subject: RE: [PATCH v7 1/4] cpumask: introduce cpumap_print_to_buf to support
> large bitmask and list
>
>
>
> > -----Original Message-----
> > From: Yury Norov [mailto:[email protected]]
> > Sent: Friday, July 16, 2021 3:29 AM
> > To: Song Bao Hua (Barry Song) <[email protected]>
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; tangchengchang
> > <[email protected]>; Zengtao (B) <[email protected]>;
> > yangyicong <[email protected]>; [email protected]; Linuxarm
> > <[email protected]>; tiantao (H) <[email protected]>
> > Subject: Re: [PATCH v7 1/4] cpumask: introduce cpumap_print_to_buf to support
> > large bitmask and list
> >
> > On Thu, Jul 15, 2021 at 11:58:53PM +1200, Barry Song wrote:
> > > (10.1.198.147)
> > > X-CFilter-Loop: Reflected
> > > Status: O
> > > Content-Length: 10263
> > > Lines: 252
> > >
> > > From: Tian Tao <[email protected]>
> >
> > [...]
> >
> > > +int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
> > > + int nmaskbits, loff_t off, size_t count)
> > > +{
> > > + const char *fmt = list ? "%*pbl\n" : "%*pb\n";
> > > + ssize_t size;
> > > + void *data;
> > > +
> > > + data = kasprintf(GFP_KERNEL, fmt, nmaskbits, maskp);
> > > + if (!data)
> > > + return -ENOMEM;
> > > +
> > > + size = memory_read_from_buffer(buf, count, &off, data, strlen(data) +
> 1);
> > > + kfree(data);
> > > +
> > > + return size;
> > > +}
> > > +EXPORT_SYMBOL(bitmap_print_to_buf);
> >
> > In discussion to v4 of this series I've pointed out inefficiency of
> > this approach. Now it's v7, but the problem is still there.
> >
> > 1. You make user of your API guess aboout proper @count without any
> > hint. This is worse than how it works now with pure vsnprintf().
>
> This isn't true. While this count comes from sysfs bin_attribute,
> sysfs show() entry guarantee the count is proper and inside the
> valid range of the buffer. Otherwise, sysfs bin_attribute has totally
> broken for all users.

In addition, "You make user of your API guess about proper @count
without any hint" isn't true either. For a sysfs ABI and other
files, users will get EOF when it arrives the end of the file.
This is exactly the hint for users to stop further reading.

The new APIs are serving ABI purpose. hardly other kernel modules
will want a printed list.

Thanks
Barry

2021-07-28 13:42:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v7 0/4] use bin_attribute to break the size limitation of cpumap ABI

On Thu, Jul 15, 2021 at 11:58:52PM +1200, Barry Song wrote:
> v7:
> - update doc in code for new APIs according to the comments of
> Andy Shevchenko;
> - other minor cleanup and commit log fix according to the comments
> of Andy Shevchenko

I'm lost to tell if this is the latest version or if there are more
changes? Can you send this again with the latest changes so I can
review it?

thanks,

greg k-h

2021-07-28 14:59:20

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH v7 0/4] use bin_attribute to break the size limitation of cpumap ABI

On Wed, Jul 28, 2021 at 03:41:00PM +0200, Greg KH wrote:
> On Thu, Jul 15, 2021 at 11:58:52PM +1200, Barry Song wrote:
> > v7:
> > - update doc in code for new APIs according to the comments of
> > Andy Shevchenko;
> > - other minor cleanup and commit log fix according to the comments
> > of Andy Shevchenko
>
> I'm lost to tell if this is the latest version or if there are more
> changes? Can you send this again with the latest changes so I can
> review it?

Barry, Greg,

If you decide to keep bitmap_print_to_buf in lib/bitmap.c, could you
please add the following patch to the series.

Thanks,
Yury


From 58602766dc2877d2103a334db6c2c2e1e6b8c89b Mon Sep 17 00:00:00 2001
From: Yury Norov <[email protected]>
Date: Wed, 28 Jul 2021 07:39:30 -0700
Subject: [PATCH] bitmap: extend comment to bitmap_print_to_buf

Extend comment to new function to warn potential users about caveats.

Signed-off-by: Yury Norov <[email protected]>
---
lib/bitmap.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 56bcffe2fa8c..b9f557ca668c 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -545,6 +545,24 @@ EXPORT_SYMBOL(bitmap_print_to_pagebuf);
* mainly serves bin_attribute which doesn't work with exact one page, and it
* can break the size limit of converted decimal list and hexadecimal bitmask.
*
+ * WARNING!
+ *
+ * This function is not a replacement for sprintf() or bitmap_print_to_pagebuf().
+ * It is intended to workaround sysfs limitations discussed above and should be
+ * used carefully in general case for the following reasons:
+ * - Time complexity is O(nbits^2/count), comparing to O(nbits) for snprintf().
+ * - Memory complexity is O(nbits), comparing to O(1) for snprintf().
+ * - @off and @count are NOT offset and number of bits to print.
+ * - If printing part of bitmap as list, the resulting string is not a correct
+ * list representation of bitmap. Particularly, some bits within or out of
+ * related interval may be erroneously set or unset. The format of the string
+ * may be broken, so bitmap_parselist() may fail parsing it.
+ * - If printing the whole bitmap as list by parts, user must ensure the order
+ * of calls of the function such that the offset is incremented linearly.
+ * - If printing the whole bitmap as list by parts, user must keep bitmap
+ * unchanged between the very first and very last call. Otherwise concatenated
+ * result may be incorrect, and format may be broken.
+ *
* Returns the number of characters actually printed to @buf
*/
int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
--
2.30.2


2021-07-28 15:29:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v7 0/4] use bin_attribute to break the size limitation of cpumap ABI

On Wed, Jul 28, 2021 at 07:53:59AM -0700, Yury Norov wrote:
> On Wed, Jul 28, 2021 at 03:41:00PM +0200, Greg KH wrote:
> > On Thu, Jul 15, 2021 at 11:58:52PM +1200, Barry Song wrote:
> > > v7:
> > > - update doc in code for new APIs according to the comments of
> > > Andy Shevchenko;
> > > - other minor cleanup and commit log fix according to the comments
> > > of Andy Shevchenko
> >
> > I'm lost to tell if this is the latest version or if there are more
> > changes? Can you send this again with the latest changes so I can
> > review it?
>
> Barry, Greg,
>
> If you decide to keep bitmap_print_to_buf in lib/bitmap.c, could you
> please add the following patch to the series.

Feel free to submit this as an add-on patch in the proper way so that it
could be applied.

thanks,

greg k-h

2021-07-28 19:02:36

by Yury Norov

[permalink] [raw]
Subject: [PATCH] bitmap: extend comment to bitmap_print_to_buf

Extend comment to new function to warn potential users about caveats.

Signed-off-by: Yury Norov <[email protected]>
---
lib/bitmap.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 56bcffe2fa8c..b9f557ca668c 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -545,6 +545,24 @@ EXPORT_SYMBOL(bitmap_print_to_pagebuf);
* mainly serves bin_attribute which doesn't work with exact one page, and it
* can break the size limit of converted decimal list and hexadecimal bitmask.
*
+ * WARNING!
+ *
+ * This function is not a replacement for sprintf() or bitmap_print_to_pagebuf().
+ * It is intended to workaround sysfs limitations discussed above and should be
+ * used carefully in general case for the following reasons:
+ * - Time complexity is O(nbits^2/count), comparing to O(nbits) for snprintf().
+ * - Memory complexity is O(nbits), comparing to O(1) for snprintf().
+ * - @off and @count are NOT offset and number of bits to print.
+ * - If printing part of bitmap as list, the resulting string is not a correct
+ * list representation of bitmap. Particularly, some bits within or out of
+ * related interval may be erroneously set or unset. The format of the string
+ * may be broken, so bitmap_parselist-like parser may fail parsing it.
+ * - If printing the whole bitmap as list by parts, user must ensure the order
+ * of calls of the function such that the offset is incremented linearly.
+ * - If printing the whole bitmap as list by parts, user must keep bitmap
+ * unchanged between the very first and very last call. Otherwise concatenated
+ * result may be incorrect, and format may be broken.
+ *
* Returns the number of characters actually printed to @buf
*/
int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
--
2.30.2


Subject: RE: [PATCH] bitmap: extend comment to bitmap_print_to_buf



> -----Original Message-----
> From: Yury Norov [mailto:[email protected]]
> Sent: Thursday, July 29, 2021 6:59 AM
> To: [email protected]; [email protected]; Song Bao Hua
> (Barry Song) <[email protected]>
> Cc: Yury Norov <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; tangchengchang <[email protected]>; Zengtao (B)
> <[email protected]>; yangyicong <[email protected]>;
> [email protected]; Linuxarm <[email protected]>;
> [email protected]; [email protected]
> Subject: [PATCH] bitmap: extend comment to bitmap_print_to_buf
>
> Extend comment to new function to warn potential users about caveats.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---

Looks awesome. Thanks, Yury. I have integrated your patch into
the latest series v8.

> lib/bitmap.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/lib/bitmap.c b/lib/bitmap.c
> index 56bcffe2fa8c..b9f557ca668c 100644
> --- a/lib/bitmap.c
> +++ b/lib/bitmap.c
> @@ -545,6 +545,24 @@ EXPORT_SYMBOL(bitmap_print_to_pagebuf);
> * mainly serves bin_attribute which doesn't work with exact one page, and it
> * can break the size limit of converted decimal list and hexadecimal bitmask.
> *
> + * WARNING!
> + *
> + * This function is not a replacement for sprintf() or
> bitmap_print_to_pagebuf().
> + * It is intended to workaround sysfs limitations discussed above and should
> be
> + * used carefully in general case for the following reasons:
> + * - Time complexity is O(nbits^2/count), comparing to O(nbits) for snprintf().
> + * - Memory complexity is O(nbits), comparing to O(1) for snprintf().
> + * - @off and @count are NOT offset and number of bits to print.
> + * - If printing part of bitmap as list, the resulting string is not a correct
> + * list representation of bitmap. Particularly, some bits within or out of
> + * related interval may be erroneously set or unset. The format of the string
> + * may be broken, so bitmap_parselist-like parser may fail parsing it.
> + * - If printing the whole bitmap as list by parts, user must ensure the order
> + * of calls of the function such that the offset is incremented linearly.
> + * - If printing the whole bitmap as list by parts, user must keep bitmap
> + * unchanged between the very first and very last call. Otherwise concatenated
> + * result may be incorrect, and format may be broken.
> + *
> * Returns the number of characters actually printed to @buf
> */
> int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
> --
> 2.30.2

Thanks
Barry