LinuxLists.cc - [RFC] Printing numbers in SI units

2024-01-24 18:59:14

Subject: [RFC] Printing numbers in SI units

I was looking at hugetlbfs and it has several snippets of code like
this:

string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
h->max_huge_pages_node[nid], buf, nid, i);

That's not terribly ergonomic, so I wondered if I could do better.
Unfortunately, I decided to do it using the SPECIAL flag which GCC
warns about. But I've written the code now, so I'm sending it out in
case anybody has a better idea for how to incorporate it.

diff --git a/lib/test_printf.c b/lib/test_printf.c
index 69b6a5e177f2..69af4d24a814 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -178,6 +178,16 @@ test_number(void)
* behaviour.
*/
test("00|0|0|0|0", "%.2d|%.1d|%.0d|%.*d|%1.0d", 0, 0, 0, 0, 0, 0);
+
+ /*
+ * C23 does not define the effect of "alternative form". Indeed
+ * I think it actually defines it to be Undefined Behaviour which
+ * apparently lets the compiler delete your entire source code.
+ */
+ test("2KiB", "%#d", 2048);
+ test("2MiB", "%#d", 2048 * 1024);
+ test("1GiB", "%#d", 1024 * 1024 * 1024);
+ test("1000MiB", "%#d", 1024 * 1024 * 1000);
}

static void __init
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 552738f14275..a702582c598c 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -370,6 +370,56 @@ char *put_dec(char *buf, unsigned long long n)

#endif

+/*
+ * precision is the number of digits after the decimal place. we limit
+ * it to two, because more would be complicated and unnecessary.
+ */
+static noinline_for_stack
+char *put_si(char *buf, unsigned long long n, int precision)
+{
+ char *name = "KMGTPEZB";
+ int unit = 0;
+ unsigned remainder = 0;
+
+ if (precision > 2)
+ precision = 2;
+
+ while (n >= 1024) {
+ remainder = n % 1024;
+ n /= 1024;
+ unit++;
+ }
+
+ remainder *= 1000;
+ remainder /= 1024;
+
+ /* Round up */
+ if (precision == 2)
+ remainder += 500;
+ else if (precision == 1)
+ remainder += 50;
+ else
+ remainder += 5;
+ if (remainder >= 1000) {
+ remainder -= 1000;
+ n += 1;
+ }
+
+ *buf++ = 'B';
+ if (unit > 0) {
+ *buf++ = 'i';
+ *buf++ = name[unit - 1];
+ }
+
+ if (precision > 0) {
+ *((u16 *)buf) = decpair[remainder / 10];
+ buf += precision;
+ *buf++ = '.';
+ }
+
+ return put_dec_trunc8(buf, n);
+}
+
/*
* Convert passed number to decimal string.
* Returns the length of string. On buffer overflow, returns 0.
@@ -507,6 +557,9 @@ char *number(char *buf, char *end, unsigned long long num,
tmp[i++] = (hex_asc_upper[((unsigned char)num) & mask] | locase);
num >>= shift;
} while (num);
+ } else if (spec.flags & SPECIAL) {
+ i = put_si(tmp, num, precision) - tmp;
+ precision = i;
} else { /* base 10 */
i = put_dec(tmp, num) - tmp;
}

2024-01-24 22:43:31

by Rasmus Villemoes

[permalink] [raw]

Subject: Re: [RFC] Printing numbers in SI units

On 24/01/2024 19.58, Matthew Wilcox wrote:
> I was looking at hugetlbfs and it has several snippets of code like
> this:
>
> string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
> pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
> h->max_huge_pages_node[nid], buf, nid, i);
>
> That's not terribly ergonomic, so I wondered if I could do better.
> Unfortunately, I decided to do it using the SPECIAL flag which GCC
> warns about. But I've written the code now, so I'm sending it out in
> case anybody has a better idea for how to incorporate it.

Well, something that gcc will warn about with Wformat isn't gonna fly,
obviously. But my man page also mentions ' as a possible flag for d
conversions:

' For decimal conversion (i, d, u, f, F, g, G) the output is
to be grouped with thousands'
grouping characters if the locale information indicates any.

Obviously, our printf wouldn't implement that, but "grouping by
(roughly) 1000s" is kinda related to what you want the output to be, so
it seems apt. The man page also says "Note that many versions of gcc(1)
cannot parse this option and will issue a warning.", but I think that's
an ancient and irrelevant note. None of the gcc (or clang) versions
godbolt knows about give that warning.

I'm not really sure the implementation should imply a trailing B; that
might as well be included in the format string itself - perhaps the
quantity is bits, so a lowercase b is better, or perhaps it's some
frequency so the user wants Hz.

As for frequency, one would probably prefer the real, 1000^n, SI
prefixes and not the IEC 1024^n ones, and regardless, I think there
should be some way of doing both STRING_UNITS_2 and STRING_UNITS_10. At
first I thought one could distinguish by using either one or two ', but
gcc does warn for repeated flags. Two options I can think of: (a)
Overload the precision, so a precision p >= 10 means STRING_UNITS_10
with actual precision being p-10 (and yes, by all means cap that at 2).
(b) Currently %i and %d are completely equivalent, we could make them
different in case a ' flag is present and make one do the 2^n and other
the 10^n.

In the over-engineering department, maybe let the space flag indicate to
put a space between the number and the prefix.

>
> diff --git a/lib/test_printf.c b/lib/test_printf.c
> index 69b6a5e177f2..69af4d24a814 100644
> --- a/lib/test_printf.c
> +++ b/lib/test_printf.c
> @@ -178,6 +178,16 @@ test_number(void)
> * behaviour.
> */
> test("00|0|0|0|0", "%.2d|%.1d|%.0d|%.*d|%1.0d", 0, 0, 0, 0, 0, 0);
> +
> + /*
> + * C23 does not define the effect of "alternative form". Indeed
> + * I think it actually defines it to be Undefined Behaviour which
> + * apparently lets the compiler delete your entire source code.
> + */
> + test("2KiB", "%#d", 2048);
> + test("2MiB", "%#d", 2048 * 1024);
> + test("1GiB", "%#d", 1024 * 1024 * 1024);
> + test("1000MiB", "%#d", 1024 * 1024 * 1000);
> }
>
> static void __init
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 552738f14275..a702582c598c 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -370,6 +370,56 @@ char *put_dec(char *buf, unsigned long long n)
>
> #endif
>
> +/*
> + * precision is the number of digits after the decimal place. we limit
> + * it to two, because more would be complicated and unnecessary.
> + */
> +static noinline_for_stack
> +char *put_si(char *buf, unsigned long long n, int precision)

Please don't call the Mi etc. "SI", there's already enough confusion
about what a megabyte really is.

Rasmus

2024-01-27 11:12:22

by David Laight

[permalink] [raw]

Subject: RE: [RFC] Printing numbers in SI units

...
> + * C23 does not define the effect of "alternative form". Indeed
> + * I think it actually defines it to be Undefined Behaviour which
> + * apparently lets the compiler delete your entire source code.

Isn't there a large laser in orbit that fires down and
completely destroys your computer?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2024-02-28 11:26:47

by Petr Mladek

[permalink] [raw]

Subject: Re: [RFC] Printing numbers in SI units

On Wed 2024-01-24 23:43:15, Rasmus Villemoes wrote:
> On 24/01/2024 19.58, Matthew Wilcox wrote:
> > I was looking at hugetlbfs and it has several snippets of code like
> > this:
> >
> > string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
> > pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
> > h->max_huge_pages_node[nid], buf, nid, i);
> >
> > That's not terribly ergonomic, so I wondered if I could do better.
> > Unfortunately, I decided to do it using the SPECIAL flag which GCC
> > warns about. But I've written the code now, so I'm sending it out in
> > case anybody has a better idea for how to incorporate it.
>
> Well, something that gcc will warn about with Wformat isn't gonna fly,
> obviously. But my man page also mentions ' as a possible flag for d
> conversions:
>
> ' For decimal conversion (i, d, u, f, F, g, G) the output is
> to be grouped with thousands'
> grouping characters if the locale information indicates any.
>
> Obviously, our printf wouldn't implement that, but "grouping by
> (roughly) 1000s" is kinda related to what you want the output to be, so
> it seems apt. The man page also says "Note that many versions of gcc(1)
> cannot parse this option and will issue a warning.", but I think that's
> an ancient and irrelevant note. None of the gcc (or clang) versions
> godbolt knows about give that warning.

I am personally not a big fan of using random printf() modifiers for
a non-standard behavior.

For me, it is much easier to look up what string_get_size() does.
The alternative would be to check "man 3 printf" just to realize that
%#d or %'d does something else in the kernel. And where the hell is
the behavior documented. I know it but how many others do?

And if I wanted to use it in a new code then I would not expect
that it is supported by vsprintf() kernel implementation. I would
try to find it in the existing string helpers and get sad that
it is not there.

I see only 18 users:

$> git grep string_get_size | grep -v -e test -e string_helpers | wc -l
18

> I'm not really sure the implementation should imply a trailing B; that
> might as well be included in the format string itself - perhaps the
> quantity is bits, so a lowercase b is better, or perhaps it's some
> frequency so the user wants Hz.
>
> As for frequency, one would probably prefer the real, 1000^n, SI
> prefixes and not the IEC 1024^n ones, and regardless, I think there
> should be some way of doing both STRING_UNITS_2 and STRING_UNITS_10. At
> first I thought one could distinguish by using either one or two ', but
> gcc does warn for repeated flags. Two options I can think of: (a)
> Overload the precision, so a precision p >= 10 means STRING_UNITS_10
> with actual precision being p-10 (and yes, by all means cap that at 2).

Yet another non-standard behavior.

> (b) Currently %i and %d are completely equivalent, we could make them
> different in case a ' flag is present and make one do the 2^n and other
> the 10^n.

Same here.

IMHO, this "feature" would add more harm than good. And it is not worth it.

Best Regards,
Petr

2024-04-26 15:32:34

by Geert Uytterhoeven

[permalink] [raw]

Subject: Re: [RFC] Printing numbers in SI units

Hi Rasmus,

On Wed, Jan 24, 2024 at 11:43 PM Rasmus Villemoes
<[email protected]> wrote:
> On 24/01/2024 19.58, Matthew Wilcox wrote:
> > I was looking at hugetlbfs and it has several snippets of code like
> > this:
> >
> > string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
> > pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
> > h->max_huge_pages_node[nid], buf, nid, i);
> >
> > That's not terribly ergonomic, so I wondered if I could do better.
> > Unfortunately, I decided to do it using the SPECIAL flag which GCC
> > warns about. But I've written the code now, so I'm sending it out in
> > case anybody has a better idea for how to incorporate it.
>
> Well, something that gcc will warn about with Wformat isn't gonna fly,
> obviously. But my man page also mentions ' as a possible flag for d
> conversions:
>
> ' For decimal conversion (i, d, u, f, F, g, G) the output is
> to be grouped with thousands'
> grouping characters if the locale information indicates any.

> Obviously, our printf wouldn't implement that, [...]

Why not? ;-)

Old Gmail-white-space-damaged patch below, which I wrote when I got
fed up with meticulously counting zeros in GHz-range clock
frequencies...

Author: Geert Uytterhoeven <[email protected]>
Date: Thu Aug 11 13:52:46 2016 +0200

lib/vsprintf.c: Add support for thousands' grouping

Use an underscore as the grouping character.

TODO:
- Documentation
- Self test
- Do we want to use this in /sys/kernel/debug/clk/clk_summary ?
RFC patch, compatibility was already broken by commit
e55a839a7a1c561b ("clk: add clock protection mechanism to clk
core")

Signed-off-by: Geert Uytterhoeven <[email protected]>
---
Originally I wanted to use grouping by 4 for octal and hexadecimal
numbers. Unfortunately gcc prints a warning when using the grouping
character with octal and hexadecimal numbers:

warning: ''' flag used with '%x' gnu_printf format [-Wformat=]
warning: ''' flag used with '%o' gnu_printf format [-Wformat=]

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index f4a8bfacb70befbc..51e516cf4cc68156 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -414,6 +414,9 @@ int num_to_str(char *buf, int size, unsigned long
long num, unsigned int width)
#define ZEROPAD 16 /* pad with zero, must be 16
== '0' - ' ' */
#define SMALL 32 /* use lowercase in hex (must be 32 == 0x20) */
#define SPECIAL 64 /* prefix hex with "0x", octal
with "0" */
+#define GROUP 128 /* thousands' grouping */
+// warning: ''' flag used with '%x' gnu_printf format [-Wformat=]
+// warning: ''' flag used with '%o' gnu_printf format [-Wformat=]

static_assert(SIGN == 1);
static_assert(ZEROPAD == ('0' - ' '));
@@ -462,7 +465,7 @@ char *number(char *buf, char *end, unsigned long long num,
char sign;
char locase;
int need_pfx = ((spec.flags & SPECIAL) && spec.base != 10);
- int i;
+ int i, j;
bool is_zero = num == 0LL;
int field_width = spec.field_width;
int precision = spec.precision;
@@ -511,6 +514,11 @@ char *number(char *buf, char *end, unsigned long long num,
i = put_dec(tmp, num) - tmp;
}

+ /* take into account grouping characters */
+ j = i;
+ if (spec.flags & GROUP)
+ i += (j - 1) / 3;
+
/* printing 100 using %2d gives "100", not "00" */
if (i > precision)
precision = i;
@@ -559,10 +567,16 @@ char *number(char *buf, char *end, unsigned long long num,
++buf;
}
/* actual digits of result */
- while (--i >= 0) {
+ while (--j >= 0) {
if (buf < end)
- *buf = tmp[i];
+ *buf = tmp[j];
++buf;
+ if ((spec.flags & GROUP) && j && !(j % 3)) {
+ if (buf < end)
+ // FIXME '\''
+ *buf = '_';
+ ++buf;
+ }
}
/* trailing space padding */
while (--field_width >= 0) {
@@ -2567,6 +2581,7 @@ int format_decode(const char *fmt, struct
printf_spec *spec)
case ' ': spec->flags |= SPACE; break;
case '#': spec->flags |= SPECIAL; break;
case '0': spec->flags |= ZEROPAD; break;
+ case '\'': spec->flags |= GROUP; break;
default: found = false;
}

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68korg

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2024-04-26 15:43:51

by Andy Shevchenko

[permalink] [raw]

Subject: Re: [RFC] Printing numbers in SI units

On Fri, Apr 26, 2024 at 05:27:08PM +0200, Geert Uytterhoeven wrote:
> On Wed, Jan 24, 2024 at 11:43 PM Rasmus Villemoes
> <[email protected]> wrote:
> > On 24/01/2024 19.58, Matthew Wilcox wrote:
> > > I was looking at hugetlbfs and it has several snippets of code like
> > > this:
> > >
> > > string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
> > > pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
> > > h->max_huge_pages_node[nid], buf, nid, i);
> > >
> > > That's not terribly ergonomic, so I wondered if I could do better.
> > > Unfortunately, I decided to do it using the SPECIAL flag which GCC
> > > warns about. But I've written the code now, so I'm sending it out in
> > > case anybody has a better idea for how to incorporate it.
> >
> > Well, something that gcc will warn about with Wformat isn't gonna fly,
> > obviously. But my man page also mentions ' as a possible flag for d
> > conversions:
> >
> > ' For decimal conversion (i, d, u, f, F, g, G) the output is
> > to be grouped with thousands'
> > grouping characters if the locale information indicates any.
>
> > Obviously, our printf wouldn't implement that, [...]
>
> Why not? ;-)

:-)

> Old Gmail-white-space-damaged patch below, which I wrote when I got
> fed up with meticulously counting zeros in GHz-range clock
> frequencies...

..

> lib/vsprintf.c: Add support for thousands' grouping
>
> Use an underscore as the grouping character.
>
> TODO:
> - Documentation
> - Self test

Definitely won't fly without these two.

> - Do we want to use this in /sys/kernel/debug/clk/clk_summary ?
> RFC patch, compatibility was already broken by commit
> e55a839a7a1c561b ("clk: add clock protection mechanism to clk
> core")
>
> Signed-off-by: Geert Uytterhoeven <[email protected]>
> ---
> Originally I wanted to use grouping by 4 for octal and hexadecimal
> numbers.

Not sure about octals, but for the hexadecimal we have hex_dump_to_buffer()
and %ph. The latter one can be modified to support grouping.

--
With Best Regards,
Andy Shevchenko

2024-05-06 15:55:28

by Petr Mladek

[permalink] [raw]

Subject: Re: [RFC] Printing numbers in SI units

On Fri 2024-04-26 17:27:08, Geert Uytterhoeven wrote:
> Hi Rasmus,
>
> On Wed, Jan 24, 2024 at 11:43 PM Rasmus Villemoes
> <[email protected]> wrote:
> > On 24/01/2024 19.58, Matthew Wilcox wrote:
> > > I was looking at hugetlbfs and it has several snippets of code like
> > > this:
> > >
> > > string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
> > > pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
> > > h->max_huge_pages_node[nid], buf, nid, i);
> > >
> > > That's not terribly ergonomic, so I wondered if I could do better.
> > > Unfortunately, I decided to do it using the SPECIAL flag which GCC
> > > warns about. But I've written the code now, so I'm sending it out in
> > > case anybody has a better idea for how to incorporate it.
> >
> > Well, something that gcc will warn about with Wformat isn't gonna fly,
> > obviously. But my man page also mentions ' as a possible flag for d
> > conversions:
> >
> > ' For decimal conversion (i, d, u, f, F, g, G) the output is
> > to be grouped with thousands'
> > grouping characters if the locale information indicates any.
>
> > Obviously, our printf wouldn't implement that, [...]
>
> Why not? ;-)
>
> Old Gmail-white-space-damaged patch below, which I wrote when I got
> fed up with meticulously counting zeros in GHz-range clock
> frequencies...

:-)

> Author: Geert Uytterhoeven <[email protected]>
> Date: Thu Aug 11 13:52:46 2016 +0200
>
> lib/vsprintf.c: Add support for thousands' grouping
>
> Use an underscore as the grouping character.
>
> TODO:
> - Documentation
> - Self test
> - Do we want to use this in /sys/kernel/debug/clk/clk_summary ?
> RFC patch, compatibility was already broken by commit
> e55a839a7a1c561b ("clk: add clock protection mechanism to clk
> core")

Yeah, it should have some users if we want to maintain it.

Best Regards,
Petr