2018-08-27 23:17:00

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 01/13] seq_file: rewrite seq_puts() in terms of seq_write()

Space savings -- 42 bytes!

seq_puts 71 29 [-42]

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/seq_file.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 1dea7a8a5255..0c282a88a896 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -653,14 +653,7 @@ EXPORT_SYMBOL(seq_putc);

void seq_puts(struct seq_file *m, const char *s)
{
- int len = strlen(s);
-
- if (m->count + len >= m->size) {
- seq_set_overflow(m);
- return;
- }
- memcpy(m->buf + m->count, s, len);
- m->count += len;
+ seq_write(m, s, strlen(s));
}
EXPORT_SYMBOL(seq_puts);

--
2.16.4



2018-08-27 23:17:02

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 06/13] proc: convert /proc/self to _print_integer()

Benchmark readlink("/proc/self") 2^23 times:

8.205992458 seconds time elapsed ( +- 0.15% )
7.535168869 seconds time elapsed ( +- 0.09% )

-8.2%

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/self.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/proc/self.c b/fs/proc/self.c
index 127265e5c55f..b2279412237b 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -14,6 +14,7 @@ static const char *proc_self_get_link(struct dentry *dentry,
{
struct pid_namespace *ns = proc_pid_ns(inode);
pid_t tgid = task_tgid_nr_ns(current, ns);
+ char buf[10], *p = buf + sizeof(buf);
char *name;

if (!tgid)
@@ -22,7 +23,11 @@ static const char *proc_self_get_link(struct dentry *dentry,
name = kmalloc(10 + 1, dentry ? GFP_KERNEL : GFP_ATOMIC);
if (unlikely(!name))
return dentry ? ERR_PTR(-ENOMEM) : ERR_PTR(-ECHILD);
- sprintf(name, "%u", tgid);
+
+ p = _print_integer_u32(p, tgid);
+ memcpy(name, p, buf + sizeof(buf) - p);
+ name[buf + sizeof(buf) - p] = '\0';
+
set_delayed_call(done, kfree_link, name);
return name;
}
--
2.16.4


2018-08-27 23:17:02

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 04/13] proc: rename "p" variable in proc_map_files_readdir()

Use "mfi", add "const" and move structure definition closer while I'm at it.

Note: moving "struct map_files_info info;" declaration to the scope
where it is used bloats the code by ~90 bytes. I'm not sure what's
going on.

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/base.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index f96babf3cffc..17666bd61ac8 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2014,12 +2014,6 @@ static int map_files_get_link(struct dentry *dentry, struct path *path)
return rc;
}

-struct map_files_info {
- unsigned long start;
- unsigned long end;
- fmode_t mode;
-};
-
/*
* Only allow CAP_SYS_ADMIN to follow the links, due to concerns about how the
* symlinks may be used to bypass permissions on ancestor directories in the
@@ -2119,6 +2113,12 @@ static const struct inode_operations proc_map_files_inode_operations = {
.setattr = proc_setattr,
};

+struct map_files_info {
+ unsigned long start;
+ unsigned long end;
+ fmode_t mode;
+};
+
static int
proc_map_files_readdir(struct file *file, struct dir_context *ctx)
{
@@ -2128,7 +2128,6 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
unsigned long nr_files, pos, i;
struct flex_array *fa = NULL;
struct map_files_info info;
- struct map_files_info *p;
int ret;

ret = -ENOENT;
@@ -2196,16 +2195,17 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
mmput(mm);

for (i = 0; i < nr_files; i++) {
+ const struct map_files_info *mfi;
char buf[4 * sizeof(long) + 2]; /* max: %lx-%lx\0 */
unsigned int len;

- p = flex_array_get(fa, i);
- len = snprintf(buf, sizeof(buf), "%lx-%lx", p->start, p->end);
+ mfi = flex_array_get(fa, i);
+ len = snprintf(buf, sizeof(buf), "%lx-%lx", mfi->start, mfi->end);
if (!proc_fill_cache(file, ctx,
buf, len,
proc_map_files_instantiate,
task,
- (void *)(unsigned long)p->mode))
+ (void *)(unsigned long)mfi->mode))
break;
ctx->pos++;
}
--
2.16.4


2018-08-27 23:17:05

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 08/13] proc: convert /proc/*/fd to _print_integer()

Benchmark opendir+readdir("/proc/self/fd")+closedir 2^21 times
with 4 descriptors (0, 1, 2, 3 from opendir):

11.802099126 seconds time elapsed ( +- 0.23% )
10.950810068 seconds time elapsed ( +- 0.23% )

-7.2%

Benchmark the same thing with 1000 descriptors:

362.1250 us per iteration
288.4375 us

-20%

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/fd.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index e098302b5101..60ad1935eefc 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -247,8 +247,7 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx,
fd++, ctx->pos++) {
struct file *f;
struct fd_data data;
- char name[10 + 1];
- unsigned int len;
+ char name[10], *p = name + sizeof(name);

f = fcheck_files(files, fd);
if (!f)
@@ -257,9 +256,10 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx,
rcu_read_unlock();
data.fd = fd;

- len = snprintf(name, sizeof(name), "%u", fd);
+ p = _print_integer_u32(p, fd);
if (!proc_fill_cache(file, ctx,
- name, len, instantiate, tsk,
+ p, name + sizeof(name) - p,
+ instantiate, tsk,
&data))
goto out_fd_loop;
cond_resched();
--
2.16.4


2018-08-27 23:17:16

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 13/13] proc: convert /proc/*/task/*/children to _print_integer()

Benchmark pread /proc/1/task/1/children 2^21 times on the same system:

6.766400479 s
4.328648442

-36%

(need to remeasure on a controlled set of children)

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/array.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index d0565527166a..045ce2cac1dd 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -710,9 +710,11 @@ get_children_pid(struct inode *inode, struct pid *pid_prev, loff_t pos)
static int children_seq_show(struct seq_file *seq, void *v)
{
struct inode *inode = file_inode(seq->file);
+ char buf[10 + 1], *p = buf + sizeof(buf);

- seq_printf(seq, "%d ", pid_nr_ns(v, proc_pid_ns(inode)));
- return 0;
+ *--p = ' ';
+ p = _print_integer_u32(p, pid_nr_ns(v, proc_pid_ns(inode)));
+ return seq_write(seq, p, buf + sizeof(buf) - p);
}

static void *children_seq_start(struct seq_file *seq, loff_t *pos)
--
2.16.4


2018-08-27 23:17:30

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 11/13] proc: readdir /proc/*/task

---
fs/proc/base.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 33f444721965..668e465c86b3 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3549,11 +3549,11 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
task;
task = next_tid(task), ctx->pos++) {
- char name[10 + 1];
- unsigned int len;
+ char name[10], *p = name + sizeof(name);
+
tid = task_pid_nr_ns(task, ns);
- len = snprintf(name, sizeof(name), "%u", tid);
- if (!proc_fill_cache(file, ctx, name, len,
+ p = _print_integer_u32(p, tid);
+ if (!proc_fill_cache(file, ctx, p, name + sizeof(name) - p,
proc_task_instantiate, task, NULL)) {
/* returning this tgid failed, save it as the first
* pid for the next readir call */
--
2.16.4


2018-08-27 23:17:32

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 12/13] proc: convert /proc/*/statm to _print_integer()

Benchmark pread("/proc/self/statm") 2^23 times:

6.135596793 seconds time elapsed ( +- 0.11% )
5.685442773 seconds time elapsed ( +- 0.11% )

-7.3%

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/array.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 5016e03a4dba..d0565527166a 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -627,27 +627,27 @@ int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
{
unsigned long size = 0, resident = 0, shared = 0, text = 0, data = 0;
struct mm_struct *mm = get_task_mm(task);
+ /* "%lu %lu %lu %lu 0 %lu 0\n" */
+ char buf[5 * ((sizeof(long) * 5 / 2) + 1) + 2 + 2];
+ char *p = buf + sizeof(buf);

if (mm) {
size = task_statm(mm, &shared, &text, &data, &resident);
mmput(mm);
}
- /*
- * For quick read, open code by putting numbers directly
- * expected format is
- * seq_printf(m, "%lu %lu %lu %lu 0 %lu 0\n",
- * size, resident, shared, text, data);
- */
- seq_put_decimal_ull(m, "", size);
- seq_put_decimal_ull(m, " ", resident);
- seq_put_decimal_ull(m, " ", shared);
- seq_put_decimal_ull(m, " ", text);
- seq_put_decimal_ull(m, " ", 0);
- seq_put_decimal_ull(m, " ", data);
- seq_put_decimal_ull(m, " ", 0);
- seq_putc(m, '\n');

- return 0;
+ p = memcpy(p - 3, " 0\n", 3);
+ p = _print_integer_ul(p, data);
+ p = memcpy(p - 3, " 0 ", 3);
+ p = _print_integer_ul(p, text);
+ *--p = ' ';
+ p = _print_integer_ul(p, shared);
+ *--p = ' ';
+ p = _print_integer_ul(p, resident);
+ *--p = ' ';
+ p = _print_integer_ul(p, size);
+
+ return seq_write(m, p, buf + sizeof(buf) - p);
}

#ifdef CONFIG_PROC_CHILDREN
--
2.16.4


2018-08-27 23:17:47

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 10/13] proc: convert readdir /proc to _print_integer()

Benchmark readdir("/proc") 2^13 times with 2K processes in a pid
namespace:

850.3750 us per readdir
786.5625

-7.5%

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/base.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 79d2f7d72ad1..33f444721965 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3223,16 +3223,15 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
for (iter = next_tgid(ns, iter);
iter.task;
iter.tgid += 1, iter = next_tgid(ns, iter)) {
- char name[10 + 1];
- unsigned int len;
+ char name[10], *p = name + sizeof(name);

cond_resched();
if (!has_pid_permissions(ns, iter.task, HIDEPID_INVISIBLE))
continue;

- len = snprintf(name, sizeof(name), "%u", iter.tgid);
+ p = _print_integer_u32(p, iter.tgid);
ctx->pos = iter.tgid + TGID_OFFSET;
- if (!proc_fill_cache(file, ctx, name, len,
+ if (!proc_fill_cache(file, ctx, p, name + sizeof(name) - p,
proc_pid_instantiate, iter.task, NULL)) {
put_task_struct(iter.task);
return 0;
--
2.16.4


2018-08-27 23:17:55

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 09/13] proc: convert dentry flushing on exit to _print_integer()

Benchmark fork+exit+waitpid 2^16 times:

6.579161299 seconds time elapsed ( +- 0.24% )
6.482729157 seconds time elapsed ( +- 0.42% )

-1.5%

Dentry flushing is very small part of exit(2), effects should be more
visible on a tiny 1-page process which doesn't uses libc.

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/base.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 17666bd61ac8..79d2f7d72ad1 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3022,11 +3022,11 @@ static const struct inode_operations proc_tgid_base_inode_operations = {
static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
{
struct dentry *dentry, *leader, *dir;
- char buf[10 + 1];
+ char buf[10];
struct qstr name;

- name.name = buf;
- name.len = snprintf(buf, sizeof(buf), "%u", pid);
+ name.name = _print_integer_u32(buf + sizeof(buf), pid);
+ name.len = buf + sizeof(buf) - (char *)name.name;
/* no ->d_hash() rejects on procfs */
dentry = d_hash_and_lookup(mnt->mnt_root, &name);
if (dentry) {
@@ -3037,8 +3037,8 @@ static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
if (pid == tgid)
return;

- name.name = buf;
- name.len = snprintf(buf, sizeof(buf), "%u", tgid);
+ name.name = _print_integer_u32(buf + sizeof(buf), tgid);
+ name.len = buf + sizeof(buf) - (char *)name.name;
leader = d_hash_and_lookup(mnt->mnt_root, &name);
if (!leader)
goto out;
@@ -3049,8 +3049,8 @@ static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
if (!dir)
goto out_put_leader;

- name.name = buf;
- name.len = snprintf(buf, sizeof(buf), "%u", pid);
+ name.name = _print_integer_u32(buf + sizeof(buf), pid);
+ name.len = buf + sizeof(buf) - (char *)name.name;
dentry = d_hash_and_lookup(dir, &name);
if (dentry) {
d_invalidate(dentry);
--
2.16.4


2018-08-27 23:18:28

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 03/13] proc: rename "p" variable in proc_readfd_common()

Use slightly more obvious "tsk" and prepare for changes in printing.

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/fd.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 81882a13212d..e098302b5101 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -228,16 +228,16 @@ static struct dentry *proc_lookupfd_common(struct inode *dir,
static int proc_readfd_common(struct file *file, struct dir_context *ctx,
instantiate_t instantiate)
{
- struct task_struct *p = get_proc_task(file_inode(file));
+ struct task_struct *tsk = get_proc_task(file_inode(file));
struct files_struct *files;
unsigned int fd;

- if (!p)
+ if (!tsk)
return -ENOENT;

if (!dir_emit_dots(file, ctx))
goto out;
- files = get_files_struct(p);
+ files = get_files_struct(tsk);
if (!files)
goto out;

@@ -259,7 +259,7 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx,

len = snprintf(name, sizeof(name), "%u", fd);
if (!proc_fill_cache(file, ctx,
- name, len, instantiate, p,
+ name, len, instantiate, tsk,
&data))
goto out_fd_loop;
cond_resched();
@@ -269,7 +269,7 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx,
out_fd_loop:
put_files_struct(files);
out:
- put_task_struct(p);
+ put_task_struct(tsk);
return 0;
}

--
2.16.4


2018-08-27 23:18:32

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 02/13] proc: apply seq_puts() whenever possible

seq_puts() is faster than seq_printf() because it doesn't search for
format specifiers.

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/array.c | 16 ++++++++--------
fs/proc/base.c | 2 +-
2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 0ceb3b6b37e7..5016e03a4dba 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -343,28 +343,28 @@ static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
#ifdef CONFIG_SECCOMP
seq_put_decimal_ull(m, "\nSeccomp:\t", p->seccomp.mode);
#endif
- seq_printf(m, "\nSpeculation_Store_Bypass:\t");
+ seq_puts(m, "\nSpeculation_Store_Bypass:\t");
switch (arch_prctl_spec_ctrl_get(p, PR_SPEC_STORE_BYPASS)) {
case -EINVAL:
- seq_printf(m, "unknown");
+ seq_puts(m, "unknown");
break;
case PR_SPEC_NOT_AFFECTED:
- seq_printf(m, "not vulnerable");
+ seq_puts(m, "not vulnerable");
break;
case PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE:
- seq_printf(m, "thread force mitigated");
+ seq_puts(m, "thread force mitigated");
break;
case PR_SPEC_PRCTL | PR_SPEC_DISABLE:
- seq_printf(m, "thread mitigated");
+ seq_puts(m, "thread mitigated");
break;
case PR_SPEC_PRCTL | PR_SPEC_ENABLE:
- seq_printf(m, "thread vulnerable");
+ seq_puts(m, "thread vulnerable");
break;
case PR_SPEC_DISABLE:
- seq_printf(m, "globally mitigated");
+ seq_puts(m, "globally mitigated");
break;
default:
- seq_printf(m, "vulnerable");
+ seq_puts(m, "vulnerable");
break;
}
seq_putc(m, '\n');
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ccf86f16d9f0..f96babf3cffc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -442,7 +442,7 @@ static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
if (unlikely(!sched_info_on()))
- seq_printf(m, "0 0 0\n");
+ seq_puts(m, "0 0 0\n");
else
seq_printf(m, "%llu %llu %lu\n",
(unsigned long long)task->se.sum_exec_runtime,
--
2.16.4


2018-08-27 23:18:40

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 05/13] proc: new and improved way to print decimals

C lacks a capable preprocess to turn

snprintf(buf, sizeof(buf), "%u", x);

into

print_integer_u32(buf, x);

so vsnprintf() is forced to have a million branches.
Benchmark anything which uses /proc and look for format_decode().

This unfortunate situation was partially fixed by seq_put_decimal_ull()
function which skipped "format specifier" part. However, it still does
unnecessary copies internally and even reflects the digits before
putting them into final buffer. It also does strlen() which is done at
runtime.

The following 3 functions

_print_integer_u32
_print_integer_u64
_print_integer_ul

cut all the overhead by printing backwards one character at a time:

x = 123456789

| <====|
|...123456789|

This is just as fast as current printing by 2 characters at a time,
because pids, fds, uids are small integers so emitting 2 characters
doesn't make much difference. It also generates very small code
(146 bytes total here, not counting the callers).
Current put_dec() and friends are surprisingly large.

All the functions have the following signature:

char *_print_integer_XXX(char *p, T x);

They are written quite in a very specific way to prevent gcc from
inlining everything and making a mess.

They aren't exported and advertised because idiomatic way of using them
is not something you see every day:
* fixed sized buffer on stack capable of holding the worst case,
* pointer past the end of the buffer (yay 6.5.6 p8!)
* no buffer length checks (wheee),
* no NUL terminator (ha-ha-ha),
* emitting output BACKWARDS (one character at a time!),
* finally one copy to the final buffer (one copy, one!).

char buf[10 + 1 + 20 + 1], *p = buf + sizeof(buf);

*--p = '\n';
p = _print_integer_u64(p, y);
*--p = ' ';
p = _print_integer_u32(p, x);

seq_write(seq, p, buf + sizeof(buf) - p);

As the comment says, do not tell anyone about these functions.

The plan is to use them inside /proc and only inside /proc.

Signed-off-by: Alexey Dobriyan <[email protected]>
---
fs/proc/internal.h | 11 +++++++++++
fs/proc/util.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 58 insertions(+)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 5185d7f6a51e..be4965ef8e48 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -127,6 +127,17 @@ void task_dump_owner(struct task_struct *task, umode_t mode,
kuid_t *ruid, kgid_t *rgid);

unsigned name_to_int(const struct qstr *qstr);
+
+char *_print_integer_u32(char *, u32);
+char *_print_integer_u64(char *, u64);
+static inline char *_print_integer_ul(char *p, unsigned long x)
+{
+ if (sizeof(unsigned long) == 4)
+ return _print_integer_u32(p, x);
+ else
+ return _print_integer_u64(p, x);
+}
+
/*
* Offset of the first process in the /proc root directory..
*/
diff --git a/fs/proc/util.c b/fs/proc/util.c
index b161cfa0f9fa..2d9ceab04289 100644
--- a/fs/proc/util.c
+++ b/fs/proc/util.c
@@ -1,4 +1,5 @@
#include <linux/dcache.h>
+#include <linux/math64.h>

unsigned name_to_int(const struct qstr *qstr)
{
@@ -21,3 +22,49 @@ unsigned name_to_int(const struct qstr *qstr)
out:
return ~0U;
}
+
+/*
+ * Print an integer in decimal.
+ * "p" initially points PAST THE END OF THE BUFFER!
+ *
+ * DO NOT USE THESE FUNCTIONS!
+ *
+ * Do not copy these functions.
+ * Do not document these functions.
+ * Do not move these functions to lib/ or elsewhere.
+ * Do not export these functions to modules.
+ * Do not tell anyone about these functions.
+ */
+noinline
+char *_print_integer_u32(char *p, u32 x)
+{
+ do {
+ *--p = '0' + (x % 10);
+ x /= 10;
+ } while (x != 0);
+ return p;
+}
+
+static char *__print_integer_u32(char *p, u32 x)
+{
+ /* 0 <= x < 10^8 */
+ char *p0 = p - 8;
+
+ p = _print_integer_u32(p, x);
+ while (p != p0)
+ *--p = '0';
+ return p;
+}
+
+char *_print_integer_u64(char *p, u64 x)
+{
+ while (x >= 100000000) {
+ u64 q;
+ u32 r;
+
+ q = div_u64_rem(x, 100000000, &r);
+ p = __print_integer_u32(p, r);
+ x = q;
+ }
+ return _print_integer_u32(p, x);
+}
--
2.16.4


2018-08-27 23:18:47

by Alexey Dobriyan

[permalink] [raw]
Subject: [PATCH 07/13] proc: convert /proc/thread-self to _print_integer()

Benchmark readlink("/proc/thread-self") 2^23 times:

9.447948508 seconds time elapsed ( +- 0.06% )
7.846435274 seconds time elapsed ( +- 0.07% )

-17%
---
fs/proc/thread_self.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index b905010ca9eb..03dd644d8ada 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -15,6 +15,7 @@ static const char *proc_thread_self_get_link(struct dentry *dentry,
struct pid_namespace *ns = proc_pid_ns(inode);
pid_t tgid = task_tgid_nr_ns(current, ns);
pid_t pid = task_pid_nr_ns(current, ns);
+ char buf[10 + 6 + 10], *p = buf + sizeof(buf);
char *name;

if (!pid)
@@ -22,7 +23,13 @@ static const char *proc_thread_self_get_link(struct dentry *dentry,
name = kmalloc(10 + 6 + 10 + 1, dentry ? GFP_KERNEL : GFP_ATOMIC);
if (unlikely(!name))
return dentry ? ERR_PTR(-ENOMEM) : ERR_PTR(-ECHILD);
- sprintf(name, "%u/task/%u", tgid, pid);
+
+ p = _print_integer_u32(p, pid);
+ p = memcpy(p - 6, "/task/", 6);
+ p = _print_integer_u32(p, tgid);
+ memcpy(name, p, buf + sizeof(buf) - p);
+ name[buf + sizeof(buf) - p] = '\0';
+
set_delayed_call(done, kfree_link, name);
return name;
}
--
2.16.4


2018-08-28 12:38:00

by Ahmed S. Darwish

[permalink] [raw]
Subject: Re: [PATCH 11/13] proc: readdir /proc/*/task

On Tue, Aug 28, 2018 at 02:15:01AM +0300, Alexey Dobriyan wrote:
> ---
> fs/proc/base.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>

Missing description and S-o-b. Further comments below..

> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 33f444721965..668e465c86b3 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3549,11 +3549,11 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
> for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
> task;
> task = next_tid(task), ctx->pos++) {
> - char name[10 + 1];
> - unsigned int len;
> + char name[10], *p = name + sizeof(name);
> +

Multiple issues:

- len should be 11, as was in the original code
(0xffffffff = 4294967295, 10 letters)

- while we're at it, let's use a constant for the '11' instead of
mysterious magic numbers

- 'p' is clearly overflowing the stack here

> tid = task_pid_nr_ns(task, ns);
> - len = snprintf(name, sizeof(name), "%u", tid);
> - if (!proc_fill_cache(file, ctx, name, len,
> + p = _print_integer_u32(p, tid);
> + if (!proc_fill_cache(file, ctx, p, name + sizeof(name) - p,

You're replacing snprintf() code __that did proper len checking__
with code that does not. That's not good.

I can't see how the fourth proc_fill_cache() parameter, ``name +
sizeof(name)'' safely ever replace the original 'len' parameter.
It's a pointer value .. (!)

Overall this looks like a broken patch submitted by mistake.

Thanks,

--
Darwish
http://darwish.chasingpointers.com

2018-08-28 13:06:38

by Ahmed S. Darwish

[permalink] [raw]
Subject: Re: [PATCH 11/13] proc: readdir /proc/*/task

On Tue, Aug 28, 2018 at 12:36:22PM +0000, Ahmed S. Darwish wrote:
> On Tue, Aug 28, 2018 at 02:15:01AM +0300, Alexey Dobriyan wrote:
> > ---
> > fs/proc/base.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
>
> Missing description and S-o-b. Further comments below..
>
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index 33f444721965..668e465c86b3 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -3549,11 +3549,11 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
> > for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
> > task;
> > task = next_tid(task), ctx->pos++) {
> > - char name[10 + 1];
> > - unsigned int len;
> > + char name[10], *p = name + sizeof(name);
> > +
>
> Multiple issues:
>
> - len should be 11, as was in the original code
> (0xffffffff = 4294967295, 10 letters)
>
> - while we're at it, let's use a constant for the '11' instead of
> mysterious magic numbers
>
> - 'p' is clearly overflowing the stack here
>

See below:

> > tid = task_pid_nr_ns(task, ns);
> > - len = snprintf(name, sizeof(name), "%u", tid);
> > - if (!proc_fill_cache(file, ctx, name, len,
> > + p = _print_integer_u32(p, tid);
> > + if (!proc_fill_cache(file, ctx, p, name + sizeof(name) - p,
>
> You're replacing snprintf() code __that did proper len checking__
> with code that does not. That's not good.
>
> I can't see how the fourth proc_fill_cache() parameter, ``name +
> sizeof(name)'' safely ever replace the original 'len' parameter.
> It's a pointer value .. (!)
>

Ok, there's a "- p" in the end, so the length looks to be Ok.

Nonetheless, the whole patch series is introducing funny code
like:

+/*
+ * Print an integer in decimal.
+ * "p" initially points PAST THE END OF THE BUFFER!
+ *
+ * DO NOT USE THESE FUNCTIONS!
+ *
+ * Do not copy these functions.
+ * Do not document these functions.
+ * Do not move these functions to lib/ or elsewhere.
+ * Do not export these functions to modules.
+ * Do not tell anyone about these functions.
+ */
+noinline
+char *_print_integer_u32(char *p, u32 x)
+{
+ do {
+ *--p = '0' + (x % 10);
+ x /= 10;
+ } while (x != 0);
+ return p;
+}

And thus the code using these functions is throwing invalid
past-the-stack pointers and strings with no NULL terminators
like there's no tomorrow...

IMHO It's an accident waiting to happen to sprinkle pointers
like that everywhere. Are we really in a super hot path to
justify all that?

/me confused

--
Darwish
http://darwish.chasingpointers.com

2018-08-28 18:28:04

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 01/13] seq_file: rewrite seq_puts() in terms of seq_write()

On Tue, 2018-08-28 at 02:14 +0300, Alexey Dobriyan wrote:
> Space savings -- 42 bytes!
>
> seq_puts 71 29 [-42]
>
> Signed-off-by: Alexey Dobriyan <[email protected]>
> ---
> fs/seq_file.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index 1dea7a8a5255..0c282a88a896 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -653,14 +653,7 @@ EXPORT_SYMBOL(seq_putc);
>
> void seq_puts(struct seq_file *m, const char *s)
> {
> - int len = strlen(s);
> -
> - if (m->count + len >= m->size) {
> - seq_set_overflow(m);
> - return;
> - }
> - memcpy(m->buf + m->count, s, len);
> - m->count += len;
> + seq_write(m, s, strlen(s));
> }
> EXPORT_SYMBOL(seq_puts);
>

If execution speed is really an issue, as
almost all of the uses are for const strings,
why not use a #define and avoid the runtime
cost of strlen where possible.


2018-08-28 19:33:23

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 11/13] proc: readdir /proc/*/task

On Tue, Aug 28, 2018 at 12:36:22PM +0000, Ahmed S. Darwish wrote:
> On Tue, Aug 28, 2018 at 02:15:01AM +0300, Alexey Dobriyan wrote:
> > ---
> > fs/proc/base.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
>
> Missing description and S-o-b. Further comments below..
>
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index 33f444721965..668e465c86b3 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -3549,11 +3549,11 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
> > for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
> > task;
> > task = next_tid(task), ctx->pos++) {
> > - char name[10 + 1];
> > - unsigned int len;
> > + char name[10], *p = name + sizeof(name);
> > +
>
> Multiple issues:
>
> - len should be 11, as was in the original code
> (0xffffffff = 4294967295, 10 letters)

len should be 10.

> - while we're at it, let's use a constant for the '11' instead of
> mysterious magic numbers
>
> - 'p' is clearly overflowing the stack here
>
> > tid = task_pid_nr_ns(task, ns);
> > - len = snprintf(name, sizeof(name), "%u", tid);
> > - if (!proc_fill_cache(file, ctx, name, len,
> > + p = _print_integer_u32(p, tid);
> > + if (!proc_fill_cache(file, ctx, p, name + sizeof(name) - p,
>
> You're replacing snprintf() code __that did proper len checking__
> with code that does not. That's not good.

Yes, the whole point of the patch is to skip length checking.

> I can't see how the fourth proc_fill_cache() parameter, ``name +
> sizeof(name)'' safely ever replace the original 'len' parameter.
> It's a pointer value .. (!)
>
> Overall this looks like a broken patch submitted by mistake.

2018-08-28 19:37:50

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 11/13] proc: readdir /proc/*/task

On Tue, Aug 28, 2018 at 01:04:40PM +0000, Ahmed S. Darwish wrote:
> On Tue, Aug 28, 2018 at 12:36:22PM +0000, Ahmed S. Darwish wrote:
> > On Tue, Aug 28, 2018 at 02:15:01AM +0300, Alexey Dobriyan wrote:
> > > ---
> > > fs/proc/base.c | 8 ++++----
> > > 1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> >
> > Missing description and S-o-b. Further comments below..
> >
> > > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > > index 33f444721965..668e465c86b3 100644
> > > --- a/fs/proc/base.c
> > > +++ b/fs/proc/base.c
> > > @@ -3549,11 +3549,11 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
> > > for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
> > > task;
> > > task = next_tid(task), ctx->pos++) {
> > > - char name[10 + 1];
> > > - unsigned int len;
> > > + char name[10], *p = name + sizeof(name);
> > > +
> >
> > Multiple issues:
> >
> > - len should be 11, as was in the original code
> > (0xffffffff = 4294967295, 10 letters)
> >
> > - while we're at it, let's use a constant for the '11' instead of
> > mysterious magic numbers
> >
> > - 'p' is clearly overflowing the stack here
> >
>
> See below:
>
> > > tid = task_pid_nr_ns(task, ns);
> > > - len = snprintf(name, sizeof(name), "%u", tid);
> > > - if (!proc_fill_cache(file, ctx, name, len,
> > > + p = _print_integer_u32(p, tid);
> > > + if (!proc_fill_cache(file, ctx, p, name + sizeof(name) - p,
> >
> > You're replacing snprintf() code __that did proper len checking__
> > with code that does not. That's not good.
> >
> > I can't see how the fourth proc_fill_cache() parameter, ``name +
> > sizeof(name)'' safely ever replace the original 'len' parameter.
> > It's a pointer value .. (!)
> >
>
> Ok, there's a "- p" in the end, so the length looks to be Ok.
>
> Nonetheless, the whole patch series is introducing funny code
> like:
>
> +/*
> + * Print an integer in decimal.
> + * "p" initially points PAST THE END OF THE BUFFER!
> + *
> + * DO NOT USE THESE FUNCTIONS!
> + *
> + * Do not copy these functions.
> + * Do not document these functions.
> + * Do not move these functions to lib/ or elsewhere.
> + * Do not export these functions to modules.
> + * Do not tell anyone about these functions.
> + */
> +noinline
> +char *_print_integer_u32(char *p, u32 x)
> +{
> + do {
> + *--p = '0' + (x % 10);
> + x /= 10;
> + } while (x != 0);
> + return p;
> +}
>
> And thus the code using these functions is throwing invalid
> past-the-stack pointers and strings with no NULL terminators
> like there's no tomorrow...
>
> IMHO It's an accident waiting to happen to sprinkle pointers
> like that everywhere.

It is not if people will be prohibited from moving this code to lib/ and
"improving" it by adding more parameters.

> Are we really in a super hot path to justify all that?

/proc is very slow, try profiling just about anything involving /proc.

2018-08-28 20:50:47

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 01/13] seq_file: rewrite seq_puts() in terms of seq_write()

On Tue, 2018-08-28 at 11:24 -0700, Joe Perches wrote:
> On Tue, 2018-08-28 at 02:14 +0300, Alexey Dobriyan wrote:
> > Space savings -- 42 bytes!
> >
> > seq_puts 71 29 [-42]
> >
> > Signed-off-by: Alexey Dobriyan <[email protected]>
> > ---
> > fs/seq_file.c | 9 +--------
> > 1 file changed, 1 insertion(+), 8 deletions(-)
> >
> > diff --git a/fs/seq_file.c b/fs/seq_file.c
> > index 1dea7a8a5255..0c282a88a896 100644
> > --- a/fs/seq_file.c
> > +++ b/fs/seq_file.c
> > @@ -653,14 +653,7 @@ EXPORT_SYMBOL(seq_putc);
> >
> > void seq_puts(struct seq_file *m, const char *s)
> > {
> > - int len = strlen(s);
> > -
> > - if (m->count + len >= m->size) {
> > - seq_set_overflow(m);
> > - return;
> > - }
> > - memcpy(m->buf + m->count, s, len);
> > - m->count += len;
> > + seq_write(m, s, strlen(s));
> > }
> > EXPORT_SYMBOL(seq_puts);
> >
>
> If execution speed is really an issue, as
> almost all of the uses are for const strings,
> why not use a #define and avoid the runtime
> cost of strlen where possible.

fyi: an x86-64 defconfig increases < .5 kb and
presumably is faster for most all seq_puts uses

$ size vmlinux.new vmlinux.old
text data bss dec hex
filename
17342316 4982128 999628 23324072 163e5a8
vmlinux.new
17341857 4982128 999628 23323613 163e3dd
vmlinux.old

with
---
fs/seq_file.c | 13 -------------
include/linux/seq_file.h | 2 +-
2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 1dea7a8a5255..97d474b6788e 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -651,19 +651,6 @@ void seq_putc(struct seq_file *m, char c)
}
EXPORT_SYMBOL(seq_putc);

-void seq_puts(struct seq_file *m, const char *s)
-{
- int len = strlen(s);
-
- if (m->count + len >= m->size) {
- seq_set_overflow(m);
- return;
- }
- memcpy(m->buf + m->count, s, len);
- m->count += len;
-}
-EXPORT_SYMBOL(seq_puts);
-
/**
* A helper routine for putting decimal numbers without rich format of printf().
* only 'unsigned long long' is supported.
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index a121982af0f5..722028805df6 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -117,7 +117,7 @@ void seq_vprintf(struct seq_file *m, const char *fmt, va_list args);
__printf(2, 3)
void seq_printf(struct seq_file *m, const char *fmt, ...);
void seq_putc(struct seq_file *m, char c);
-void seq_puts(struct seq_file *m, const char *s);
+#define seq_puts(m, s) seq_write(m, s, strlen(s))
void seq_put_decimal_ull_width(struct seq_file *m, const char *delimiter,
unsigned long long num, unsigned int width);
void seq_put_decimal_ull(struct seq_file *m, const char *delimiter,

2018-08-29 09:51:02

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 06/13] proc: convert /proc/self to _print_integer()

From: Alexey Dobriyan
> Sent: 28 August 2018 00:15
> ---
> fs/proc/self.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/self.c b/fs/proc/self.c
> index 127265e5c55f..b2279412237b 100644
> --- a/fs/proc/self.c
> +++ b/fs/proc/self.c
> @@ -14,6 +14,7 @@ static const char *proc_self_get_link(struct dentry *dentry,
> {
> struct pid_namespace *ns = proc_pid_ns(inode);
> pid_t tgid = task_tgid_nr_ns(current, ns);
> + char buf[10], *p = buf + sizeof(buf);
> char *name;
>
> if (!tgid)
> @@ -22,7 +23,11 @@ static const char *proc_self_get_link(struct dentry *dentry,
> name = kmalloc(10 + 1, dentry ? GFP_KERNEL : GFP_ATOMIC);
> if (unlikely(!name))
> return dentry ? ERR_PTR(-ENOMEM) : ERR_PTR(-ECHILD);
> - sprintf(name, "%u", tgid);
> +

Best not to 'hide' the initialisation of 'p' at the top of the function.
Much easier to see what is going on if it is moved here.

> + p = _print_integer_u32(p, tgid);

or just:
p = _print_integer(buf + sizeof(buf), tgid);

(What a horrid interface ...)

> + memcpy(name, p, buf + sizeof(buf) - p);
> + name[buf + sizeof(buf) - p] = '\0';
> +
> set_delayed_call(done, kfree_link, name);
> return name;
> }
> --
> 2.16.4

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2018-08-29 23:47:35

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 11/13] proc: readdir /proc/*/task

On Tue, 28 Aug 2018 22:35:02 +0300 Alexey Dobriyan <[email protected]> wrote:

> > Are we really in a super hot path to justify all that?
>
> /proc is very slow, try profiling just about anything involving /proc.

So how much does this patchset help? Some timing measurements would
really help things along, if they show goodness.


2018-08-30 13:15:48

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 06/13] proc: convert /proc/self to _print_integer()

On Wed, Aug 29, 2018 at 09:50:50AM +0000, David Laight wrote:
> From: Alexey Dobriyan
> > Sent: 28 August 2018 00:15
> > ---
> > fs/proc/self.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/proc/self.c b/fs/proc/self.c
> > index 127265e5c55f..b2279412237b 100644
> > --- a/fs/proc/self.c
> > +++ b/fs/proc/self.c
> > @@ -14,6 +14,7 @@ static const char *proc_self_get_link(struct dentry *dentry,
> > {
> > struct pid_namespace *ns = proc_pid_ns(inode);
> > pid_t tgid = task_tgid_nr_ns(current, ns);
> > + char buf[10], *p = buf + sizeof(buf);
> > char *name;
> >
> > if (!tgid)
> > @@ -22,7 +23,11 @@ static const char *proc_self_get_link(struct dentry *dentry,
> > name = kmalloc(10 + 1, dentry ? GFP_KERNEL : GFP_ATOMIC);
> > if (unlikely(!name))
> > return dentry ? ERR_PTR(-ENOMEM) : ERR_PTR(-ECHILD);
> > - sprintf(name, "%u", tgid);
> > +
>
> Best not to 'hide' the initialisation of 'p' at the top of the function.
> Much easier to see what is going on if it is moved here.

It was written that way to have initialization near the buffer,
so that it is obvious that pointer and buffer are connected and
making

p = _print_integer_uNN(p, x);

standard building block which can be copypasted left and right and
still be correct.

> > + p = _print_integer_u32(p, tgid);
>
> or just:
> p = _print_integer(buf + sizeof(buf), tgid);
>
> (What a horrid interface ...)

It is. The problem is that adding bound checking will get you current
vsnprintf() and number() and num_to_str().

> > + memcpy(name, p, buf + sizeof(buf) - p);
> > + name[buf + sizeof(buf) - p] = '\0';
> > +
> > set_delayed_call(done, kfree_link, name);
> > return name;
> > }

2018-08-30 13:22:26

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 11/13] proc: readdir /proc/*/task

On Wed, Aug 29, 2018 at 04:43:30PM -0700, Andrew Morton wrote:
> On Tue, 28 Aug 2018 22:35:02 +0300 Alexey Dobriyan <[email protected]> wrote:
>
> > > Are we really in a super hot path to justify all that?
> >
> > /proc is very slow, try profiling just about anything involving /proc.
>
> So how much does this patchset help? Some timing measurements would
> really help things along, if they show goodness.

Benchmarking results are in individual patches. They are ~1-7-20%
better overall. However those are microbenchmarks.

2018-09-03 14:19:52

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 01/13] seq_file: rewrite seq_puts() in terms of seq_write()

Joe Perches wrote:

> > If execution speed is really an issue, as
> > almost all of the uses are for const strings,
> > why not use a #define and avoid the runtime
> > cost of strlen where possible.
>
> fyi: an x86-64 defconfig increases < .5 kb and
> presumably is faster for most all seq_puts uses

> -void seq_puts(struct seq_file *m, const char *s);
> +#define seq_puts(m, s) seq_write(m, s, strlen(s))

I don't know, seq_puts() patch was last minute observation.
It is independent of _print_integer() stuff.

Your version should be "static inline" of course.