There're many truncated kthreads in the kernel, which may make trouble
for the user, for example, the user can't get detailed device
information from the task comm.
This patchset tries to improve this problem fundamentally by extending
the task comm size from 16 to 24. In order to do that, we have to do
some cleanups first.
1. Make the copy of task comm always safe no matter what the task
comm size is. For example,
Unsafe Safe
strlcpy strscpy_pad
strncpy strscpy_pad
bpf_probe_read_kernel bpf_probe_read_kernel_str
bpf_core_read_str
bpf_get_current_comm
perf_event__prepare_comm
prctl(2)
After this step, the comm size change won't make any trouble to the
kernel or the in-tree tools for example perf, BPF programs.
2. Cleanup some old hard-coded 16
Actually we don't need to convert all of them to TASK_COMM_LEN or
TASK_COMM_LEN_16, what we really care about is if the convert can
make the code more reasonable or easier to understand. For
example, some in-tree tools read the comm from sched:sched_switch
tracepoint, as it is derived from the kernel, we'd better make them
consistent with the kernel.
3. Extend the task comm size from 16 to 24
task_struct is growing rather regularly by 8 bytes. This size change
should be acceptable. We used to think about extending the size for
CONFIG_BASE_FULL only, but that would be a burden for maintenance
and introduce code complexity.
4. Print a warning if the kthread comm is still truncated.
5. What will happen to the out-of-tree tools after this change?
If the tool get task comm through kernel API, for example prctl(2),
bpf_get_current_comm() and etc, then it doesn't matter how large the
user buffer is, because it will always get a string with a nul
terminator. While if it gets the task comm through direct string copy,
the user tool must make sure the copied string has a nul terminator
itself. As TASK_COMM_LEN is not exposed to userspace, there's no
reason that it must require a fixed-size task comm.
Changes since v5:
- extend the comm size for both CONFIG_BASE_{FULL, SMALL} that could
make the code more simple and easier to maintain.
- avoid changing too much hard-coded 16 in BPF programs per Andrii.
Changes since v4:
- introduce TASK_COMM_LEN_16 and TASK_COMM_LEN_24 per Steven
- replace hard-coded 16 with TASK_COMM_LEN_16 per Kees
- use strscpy_pad() instead of strlcpy()/strncpy() per Kees
- make perf test adopt to task comm size change per Arnaldo and Mathieu
- fix warning reported by kernel test robot
Changes since v3:
- fixes -Wstringop-truncation warning reported by kernel test robot
Changes since v2:
- avoid change UAPI code per Kees
- remove the description of out of tree code from commit log per Peter
Changes since v1:
- extend task comm to 24bytes, per Petr
- improve the warning per Petr
- make the checkpatch warning a separate patch
Yafang Shao (12):
fs/exec: make __set_task_comm always set a nul ternimated string
fs/exec: make __get_task_comm always get a nul terminated string
drivers/connector: make connector comm always nul ternimated
drivers/infiniband: make setup_ctxt always get a nul terminated task
comm
elfcore: make prpsinfo always get a nul terminated task comm
samples/bpf/test_overhead_kprobe_kern: make it adopt to task comm size
change
samples/bpf/offwaketime_kern: make sched_switch tracepoint args adopt
to comm size change
tools/bpf/bpftool/skeleton: make it adopt to task comm size change
tools/perf/test: make perf test adopt to task comm size change
tools/testing/selftests/bpf: make it adopt to task comm size change
sched.h: extend task comm from 16 to 24
kernel/kthread: show a warning if kthread's comm is truncated
drivers/connector/cn_proc.c | 5 +++-
drivers/infiniband/hw/qib/qib.h | 2 +-
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
fs/binfmt_elf.c | 2 +-
fs/exec.c | 5 ++--
include/linux/elfcore-compat.h | 3 ++-
include/linux/elfcore.h | 4 +--
include/linux/sched.h | 9 +++++--
kernel/kthread.c | 7 ++++-
samples/bpf/offwaketime_kern.c | 4 +--
samples/bpf/test_overhead_kprobe_kern.c | 11 ++++----
samples/bpf/test_overhead_tp_kern.c | 5 ++--
tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 4 +--
tools/include/linux/sched.h | 11 ++++++++
tools/perf/tests/evsel-tp-sched.c | 26 ++++++++++++++-----
.../selftests/bpf/progs/test_stacktrace_map.c | 6 ++---
.../selftests/bpf/progs/test_tracepoint.c | 6 ++---
17 files changed, 77 insertions(+), 35 deletions(-)
create mode 100644 tools/include/linux/sched.h
--
2.17.1
Use strscpy_pad() instead of strlcpy() to make the comm always nul
terminated. As the comment above the hard-coded 16, we can replace it
with TASK_COMM_LEN, then it will adopt to the comm size change.
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
drivers/infiniband/hw/qib/qib.h | 2 +-
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index 9363bccfc6e7..a8e1c30c370f 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -196,7 +196,7 @@ struct qib_ctxtdata {
pid_t pid;
pid_t subpid[QLOGIC_IB_MAX_SUBCTXT];
/* same size as task_struct .comm[], command that opened context */
- char comm[16];
+ char comm[TASK_COMM_LEN];
/* pkeys set by this use of this ctxt */
u16 pkeys[4];
/* so file ops can get at unit */
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index 63854f4b6524..7ab2b448c183 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -1321,7 +1321,7 @@ static int setup_ctxt(struct qib_pportdata *ppd, int ctxt,
rcd->tid_pg_list = ptmp;
rcd->pid = current->pid;
init_waitqueue_head(&dd->rcd[ctxt]->wait);
- strlcpy(rcd->comm, current->comm, sizeof(rcd->comm));
+ strscpy_pad(rcd->comm, current->comm, sizeof(rcd->comm));
ctxt_fp(fp) = rcd;
qib_stats.sps_ctxts++;
dd->freectxts--;
--
2.17.1
If the dest buffer size is smaller than sizeof(tsk->comm), the buffer
will be without null ternimator, that may cause problem. We can make sure
the buffer size not smaller than comm at the callsite to avoid that
problem, but there may be callsite that we can't easily change.
Using strscpy_pad() instead of strncpy() in __get_task_comm() can make
the string always nul ternimated.
Suggested-by: Kees Cook <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
fs/exec.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c
index 404156b5b314..bf2a7a91eeea 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1209,7 +1209,8 @@ static int unshare_sighand(struct task_struct *me)
char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
{
task_lock(tsk);
- strncpy(buf, tsk->comm, buf_size);
+ /* The copied value is always null terminated */
+ strscpy_pad(buf, tsk->comm, buf_size);
task_unlock(tsk);
return buf;
}
--
2.17.1
kernel test robot reported a -Wstringop-truncation warning after I
extend task comm from 16 to 24. Below is the detailed warning:
fs/binfmt_elf.c: In function 'fill_psinfo.isra':
>> fs/binfmt_elf.c:1575:9: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 23 [-Wstringop-truncation]
1575 | strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This patch can fix this warning.
Replacing strncpy() with strscpy_pad() can avoid this warning.
This patch also replace the hard-coded 16 with TASK_COMM_LEN to make it
more compatible with task comm size change.
I also verfied if it still work well when I extend the comm size to 24.
struct elf_prpsinfo is used to dump the task information in userspace
coredump or kernel vmcore. Below is the verfication of vmcore,
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff9d21a940 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffffa09e40f85e80 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffffa09e40f81f80 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffffa09e40f83f00 RU 0.0 0 0 [swapper/3]
> 0 0 4 ffffa09e40f80000 RU 0.0 0 0 [swapper/4]
> 0 0 5 ffffa09e40f89f80 RU 0.0 0 0 [swapper/5]
0 0 6 ffffa09e40f8bf00 RU 0.0 0 0 [swapper/6]
> 0 0 7 ffffa09e40f88000 RU 0.0 0 0 [swapper/7]
> 0 0 8 ffffa09e40f8de80 RU 0.0 0 0 [swapper/8]
> 0 0 9 ffffa09e40f95e80 RU 0.0 0 0 [swapper/9]
> 0 0 10 ffffa09e40f91f80 RU 0.0 0 0 [swapper/10]
> 0 0 11 ffffa09e40f93f00 RU 0.0 0 0 [swapper/11]
> 0 0 12 ffffa09e40f90000 RU 0.0 0 0 [swapper/12]
> 0 0 13 ffffa09e40f9bf00 RU 0.0 0 0 [swapper/13]
> 0 0 14 ffffa09e40f98000 RU 0.0 0 0 [swapper/14]
> 0 0 15 ffffa09e40f9de80 RU 0.0 0 0 [swapper/15]
It works well as expected.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
fs/binfmt_elf.c | 2 +-
include/linux/elfcore-compat.h | 3 ++-
include/linux/elfcore.h | 4 ++--
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a813b70f594e..a4ba79fce2a9 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1572,7 +1572,7 @@ static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
SET_UID(psinfo->pr_uid, from_kuid_munged(cred->user_ns, cred->uid));
SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
rcu_read_unlock();
- strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
+ strscpy_pad(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
return 0;
}
diff --git a/include/linux/elfcore-compat.h b/include/linux/elfcore-compat.h
index e272c3d452ce..afa0eb45196b 100644
--- a/include/linux/elfcore-compat.h
+++ b/include/linux/elfcore-compat.h
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/elfcore.h>
#include <linux/compat.h>
+#include <linux/sched.h>
/*
* Make sure these layouts match the linux/elfcore.h native definitions.
@@ -43,7 +44,7 @@ struct compat_elf_prpsinfo
__compat_uid_t pr_uid;
__compat_gid_t pr_gid;
compat_pid_t pr_pid, pr_ppid, pr_pgrp, pr_sid;
- char pr_fname[16];
+ char pr_fname[TASK_COMM_LEN];
char pr_psargs[ELF_PRARGSZ];
};
diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h
index 2aaa15779d50..8d79cd58b09a 100644
--- a/include/linux/elfcore.h
+++ b/include/linux/elfcore.h
@@ -65,8 +65,8 @@ struct elf_prpsinfo
__kernel_gid_t pr_gid;
pid_t pr_pid, pr_ppid, pr_pgrp, pr_sid;
/* Lots missing */
- char pr_fname[16]; /* filename of executable */
- char pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
+ char pr_fname[TASK_COMM_LEN]; /* filename of executable */
+ char pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
};
static inline void elf_core_copy_regs(elf_gregset_t *elfregs, struct pt_regs *regs)
--
2.17.1
bpf_probe_read_kernel_str() will add a nul terminator to the dst, then
we don't care about if the dst size is big enough. This patch also
replaces the hard-coded 16 with TASK_COMM_LEN to make it adopt to task
comm size change.
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
samples/bpf/test_overhead_kprobe_kern.c | 11 ++++++-----
samples/bpf/test_overhead_tp_kern.c | 5 +++--
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/samples/bpf/test_overhead_kprobe_kern.c b/samples/bpf/test_overhead_kprobe_kern.c
index f6d593e47037..8fdd2c9c56b2 100644
--- a/samples/bpf/test_overhead_kprobe_kern.c
+++ b/samples/bpf/test_overhead_kprobe_kern.c
@@ -6,6 +6,7 @@
*/
#include <linux/version.h>
#include <linux/ptrace.h>
+#include <linux/sched.h>
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
@@ -22,17 +23,17 @@ int prog(struct pt_regs *ctx)
{
struct signal_struct *signal;
struct task_struct *tsk;
- char oldcomm[16] = {};
- char newcomm[16] = {};
+ char oldcomm[TASK_COMM_LEN] = {};
+ char newcomm[TASK_COMM_LEN] = {};
u16 oom_score_adj;
u32 pid;
tsk = (void *)PT_REGS_PARM1(ctx);
pid = _(tsk->pid);
- bpf_probe_read_kernel(oldcomm, sizeof(oldcomm), &tsk->comm);
- bpf_probe_read_kernel(newcomm, sizeof(newcomm),
- (void *)PT_REGS_PARM2(ctx));
+ bpf_probe_read_kernel_str(oldcomm, sizeof(oldcomm), &tsk->comm);
+ bpf_probe_read_kernel_str(newcomm, sizeof(newcomm),
+ (void *)PT_REGS_PARM2(ctx));
signal = _(tsk->signal);
oom_score_adj = _(signal->oom_score_adj);
return 0;
diff --git a/samples/bpf/test_overhead_tp_kern.c b/samples/bpf/test_overhead_tp_kern.c
index eaa32693f8fc..80edadacb692 100644
--- a/samples/bpf/test_overhead_tp_kern.c
+++ b/samples/bpf/test_overhead_tp_kern.c
@@ -4,6 +4,7 @@
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
+#include <linux/sched.h>
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
@@ -11,8 +12,8 @@
struct task_rename {
__u64 pad;
__u32 pid;
- char oldcomm[16];
- char newcomm[16];
+ char oldcomm[TASK_COMM_LEN];
+ char newcomm[TASK_COMM_LEN];
__u16 oom_score_adj;
};
SEC("tracepoint/task/task_rename")
--
2.17.1
connector comm was introduced in commit
f786ecba4158 ("connector: add comm change event report to proc connector").
struct comm_proc_event was defined in include/linux/cn_proc.h first and
then been moved into file include/uapi/linux/cn_proc.h in commit
607ca46e97a1 ("UAPI: (Scripted) Disintegrate include/linux").
As this is the UAPI code, we can't change it without potentially breaking
things (i.e. userspace binaries have this size built in, so we can't just
change the size). To prepare for the followup change - extending task
comm, we have to use __get_task_comm() to avoid the BUILD_BUG_ON() in
proc_comm_connector().
__get_task_comm() always get a nul terminated string, so we don't worry
about whether it is truncated or not.
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Vladimir Zapolskiy <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: David Howells <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
drivers/connector/cn_proc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 646ad385e490..c88ba2dc1eae 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -230,7 +230,10 @@ void proc_comm_connector(struct task_struct *task)
ev->what = PROC_EVENT_COMM;
ev->event_data.comm.process_pid = task->pid;
ev->event_data.comm.process_tgid = task->tgid;
- get_task_comm(ev->event_data.comm.comm, task);
+
+ /* This may get truncated. */
+ __get_task_comm(ev->event_data.comm.comm,
+ sizeof(ev->event_data.comm.comm), task);
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
--
2.17.1
Show a warning if task comm is truncated. Below is the result
of my test case:
truncated kthread comm:I-am-a-kthread-with-lon, pid:14 by 6 characters
Suggested-by: Petr Mladek <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
kernel/kthread.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5b37a8567168..46b924c92078 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -399,12 +399,17 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
char name[TASK_COMM_LEN];
+ int len;
/*
* task is already visible to other tasks, so updating
* COMM must be protected.
*/
- vsnprintf(name, sizeof(name), namefmt, args);
+ len = vsnprintf(name, sizeof(name), namefmt, args);
+ if (len >= TASK_COMM_LEN) {
+ pr_warn("truncated kthread comm:%s, pid:%d by %d characters\n",
+ name, task->pid, len - TASK_COMM_LEN + 1);
+ }
set_task_comm(task, name);
/*
* root may have changed our (kthreadd's) priority or CPU mask.
--
2.17.1
The hard-coded 16 is used in various bpf progs. These progs get task
comm either via bpf_get_current_comm() or prctl() or
bpf_core_read_str(), all of which can work well even if the task comm size
is changed.
In these BPF programs, one thing to be improved is the
sched:sched_switch tracepoint args. As the tracepoint args are derived
from the kernel, we'd better make it same with the kernel. So the macro
TASK_COMM_LEN is converted to type enum, then all the BPF programs can
get it through BTF.
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
include/linux/sched.h | 9 +++++++--
tools/testing/selftests/bpf/progs/test_stacktrace_map.c | 6 +++---
tools/testing/selftests/bpf/progs/test_tracepoint.c | 6 +++---
3 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c1a927ddec64..124538db792c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -274,8 +274,13 @@ struct task_group;
#define get_current_state() READ_ONCE(current->__state)
-/* Task command name length: */
-#define TASK_COMM_LEN 16
+/*
+ * Define the task command name length as enum, then it can be visible to
+ * BPF programs.
+ */
+enum {
+ TASK_COMM_LEN = 16,
+};
extern void scheduler_tick(void);
diff --git a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
index 00ed48672620..e9b602a6dc1b 100644
--- a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
+++ b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2018 Facebook
-#include <linux/bpf.h>
+#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#ifndef PERF_MAX_STACK_DEPTH
@@ -41,11 +41,11 @@ struct {
/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
struct sched_switch_args {
unsigned long long pad;
- char prev_comm[16];
+ char prev_comm[TASK_COMM_LEN];
int prev_pid;
int prev_prio;
long long prev_state;
- char next_comm[16];
+ char next_comm[TASK_COMM_LEN];
int next_pid;
int next_prio;
};
diff --git a/tools/testing/selftests/bpf/progs/test_tracepoint.c b/tools/testing/selftests/bpf/progs/test_tracepoint.c
index 4b825ee122cf..f21982681e28 100644
--- a/tools/testing/selftests/bpf/progs/test_tracepoint.c
+++ b/tools/testing/selftests/bpf/progs/test_tracepoint.c
@@ -1,17 +1,17 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2017 Facebook
-#include <linux/bpf.h>
+#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
struct sched_switch_args {
unsigned long long pad;
- char prev_comm[16];
+ char prev_comm[TASK_COMM_LEN];
int prev_pid;
int prev_prio;
long long prev_state;
- char next_comm[16];
+ char next_comm[TASK_COMM_LEN];
int next_pid;
int next_prio;
};
--
2.17.1
bpf_probe_read_kernel_str() will add a nul terminator to the dst, then
we don't care about if the dst size is big enough.
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c
index d9b420972934..f70702fcb224 100644
--- a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c
+++ b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c
@@ -71,8 +71,8 @@ int iter(struct bpf_iter__task_file *ctx)
e.pid = task->tgid;
e.id = get_obj_id(file->private_data, obj_type);
- bpf_probe_read_kernel(&e.comm, sizeof(e.comm),
- task->group_leader->comm);
+ bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm),
+ task->group_leader->comm);
bpf_seq_write(ctx->meta->seq, &e, sizeof(e));
return 0;
--
2.17.1
When I was implementing a new per-cpu kthread cfs_migration, I found the
comm of it "cfs_migration/%u" is truncated due to the limitation of
TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are
all with the same name "cfs_migration/1", which will confuse the user. This
issue is not critical, because we can get the corresponding CPU from the
task's Cpus_allowed. But for kthreads correspoinding to other hardware
devices, it is not easy to get the detailed device info from task comm,
for example,
jbd2/nvme0n1p2-
xfs-reclaim/sdf
We can also shorten the name to work around this problem, but I find
there are so many truncated kthreads:
rcu_tasks_kthre
rcu_tasks_rude_
rcu_tasks_trace
poll_mpt3sas0_s
ext4-rsv-conver
xfs-reclaim/sd{a, b, c, ...}
xfs-blockgc/sd{a, b, c, ...}
xfs-inodegc/sd{a, b, c, ...}
audit_send_repl
ecryptfs-kthrea
vfio-irqfd-clea
jbd2/nvme0n1p2-
...
We should improve this problem fundamentally by extending comm size to
24 bytes. task_struct is growing rather regularly by 8 bytes.
After this change, the truncated kthreads listed above will be
displayed as:
rcu_tasks_kthread
rcu_tasks_rude_kthread
rcu_tasks_trace_kthread
poll_mpt3sas0_statu
ext4-rsv-conversion
xfs-reclaim/sdf1
xfs-blockgc/sdf1
xfs-inodegc/sdf1
audit_send_reply
ecryptfs-kthread
vfio-irqfd-cleanup
jbd2/nvme0n1p2-8
As we have converted all the unsafe copy of task comm to the safe one,
this change won't make any trouble to the kernel or the in-tree tools.
The safe one and unsafe one of comm copy as follows,
Unsafe Safe
strlcpy strscpy_pad
strncpy strscpy_pad
bpf_probe_read_kernel bpf_probe_read_kernel_str
bpf_core_read_str
bpf_get_current_comm
perf_event__prepare_comm
prctl(2)
Regarding the possible risk it may take to the out-of-tree user tools, if
the user tools get the task comm through kernel API like prctl(2),
bpf_get_current_comm() and etc, the tools still work well after this
change. While If the user tools get the task comm through direct string
copy, it must make sure the copied string should be with a nul terminator.
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
include/linux/sched.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 124538db792c..490d12eabe44 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -279,7 +279,7 @@ struct task_group;
* BPF programs.
*/
enum {
- TASK_COMM_LEN = 16,
+ TASK_COMM_LEN = 24,
};
extern void scheduler_tick(void);
--
2.17.1
kernel test robot reported a perf-test failure after I extended task comm
size from 16 to 24. The failure as follows,
2021-10-13 18:00:46 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-317419b91ef4eff4e2f046088201e4dc4065caa0/tools/perf/perf test 15
15: Parse sched tracepoints fields : FAILED!
The reason is perf-test requires a fixed-size task comm. If we extend
task comm size to 24, it will not equil with the required size 16 in perf
test.
After some analyzation, I found perf itself can adopt to the size
change, for example, below is the output of perf-sched after I extend
comm size to 24 -
task 614 ( kthreadd: 84), nr_events: 1
task 615 ( systemd: 843), nr_events: 1
task 616 ( networkd-dispat: 1026), nr_events: 1
task 617 ( systemd: 846), nr_events: 1
$ cat /proc/843/comm
networkd-dispatcher
The task comm can be displayed correctly as expected.
Replace old hard-coded 16 with the new one can fix the warning, but we'd
better make the test accept both old and new sizes, then it can be
backward compatibility.
After this patch, the perf-test succeeds no matter task comm is 16 or
24 -
15: Parse sched tracepoints fields : Ok
This patch is a preparation for the followup patch.
Reported-by: kernel test robot <[email protected]>
Suggested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
tools/include/linux/sched.h | 11 +++++++++++
tools/perf/tests/evsel-tp-sched.c | 26 ++++++++++++++++++++------
2 files changed, 31 insertions(+), 6 deletions(-)
create mode 100644 tools/include/linux/sched.h
diff --git a/tools/include/linux/sched.h b/tools/include/linux/sched.h
new file mode 100644
index 000000000000..0d575afd7f43
--- /dev/null
+++ b/tools/include/linux/sched.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TOOLS_LINUX_SCHED_H
+#define _TOOLS_LINUX_SCHED_H
+
+/* Keep both length for backward compatibility */
+enum {
+ TASK_COMM_LEN_16 = 16,
+ TASK_COMM_LEN = 24,
+};
+
+#endif /* _TOOLS_LINUX_SCHED_H */
diff --git a/tools/perf/tests/evsel-tp-sched.c b/tools/perf/tests/evsel-tp-sched.c
index f9e34bd26cf3..029f2a8c8e51 100644
--- a/tools/perf/tests/evsel-tp-sched.c
+++ b/tools/perf/tests/evsel-tp-sched.c
@@ -1,11 +1,13 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/err.h>
+#include <linux/sched.h>
#include <traceevent/event-parse.h>
#include "evsel.h"
#include "tests.h"
#include "debug.h"
-static int evsel__test_field(struct evsel *evsel, const char *name, int size, bool should_be_signed)
+static int evsel__test_field_alt(struct evsel *evsel, const char *name,
+ int size, int alternate_size, bool should_be_signed)
{
struct tep_format_field *field = evsel__field(evsel, name);
int is_signed;
@@ -23,15 +25,24 @@ static int evsel__test_field(struct evsel *evsel, const char *name, int size, bo
ret = -1;
}
- if (field->size != size) {
- pr_debug("%s: \"%s\" size (%d) should be %d!\n",
+ if (field->size != size && field->size != alternate_size) {
+ pr_debug("%s: \"%s\" size (%d) should be %d",
evsel->name, name, field->size, size);
+ if (alternate_size > 0)
+ pr_debug(" or %d", alternate_size);
+ pr_debug("!\n");
ret = -1;
}
return ret;
}
+static int evsel__test_field(struct evsel *evsel, const char *name,
+ int size, bool should_be_signed)
+{
+ return evsel__test_field_alt(evsel, name, size, -1, should_be_signed);
+}
+
int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtest __maybe_unused)
{
struct evsel *evsel = evsel__newtp("sched", "sched_switch");
@@ -42,7 +53,8 @@ int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtes
return -1;
}
- if (evsel__test_field(evsel, "prev_comm", 16, false))
+ if (evsel__test_field_alt(evsel, "prev_comm", TASK_COMM_LEN_16,
+ TASK_COMM_LEN, false))
ret = -1;
if (evsel__test_field(evsel, "prev_pid", 4, true))
@@ -54,7 +66,8 @@ int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtes
if (evsel__test_field(evsel, "prev_state", sizeof(long), true))
ret = -1;
- if (evsel__test_field(evsel, "next_comm", 16, false))
+ if (evsel__test_field_alt(evsel, "next_comm", TASK_COMM_LEN_16,
+ TASK_COMM_LEN, false))
ret = -1;
if (evsel__test_field(evsel, "next_pid", 4, true))
@@ -72,7 +85,8 @@ int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtes
return -1;
}
- if (evsel__test_field(evsel, "comm", 16, false))
+ if (evsel__test_field_alt(evsel, "comm", TASK_COMM_LEN_16,
+ TASK_COMM_LEN, false))
ret = -1;
if (evsel__test_field(evsel, "pid", 4, true))
--
2.17.1
Make sure the string set to task comm is always nul ternimated.
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
fs/exec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c
index a098c133d8d7..404156b5b314 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1224,7 +1224,7 @@ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
{
task_lock(tsk);
trace_task_rename(tsk, buf);
- strlcpy(tsk->comm, buf, sizeof(tsk->comm));
+ strscpy_pad(tsk->comm, buf, sizeof(tsk->comm));
task_unlock(tsk);
perf_event_comm(tsk, exec);
}
--
2.17.1
The sched:sched_switch tracepoint is derived from kernel, we should make
its args compitable with the kernel.
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
---
samples/bpf/offwaketime_kern.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/samples/bpf/offwaketime_kern.c b/samples/bpf/offwaketime_kern.c
index 4866afd054da..eb4d94742e6b 100644
--- a/samples/bpf/offwaketime_kern.c
+++ b/samples/bpf/offwaketime_kern.c
@@ -113,11 +113,11 @@ static inline int update_counts(void *ctx, u32 pid, u64 delta)
/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
struct sched_switch_args {
unsigned long long pad;
- char prev_comm[16];
+ char prev_comm[TASK_COMM_LEN];
int prev_pid;
int prev_prio;
long long prev_state;
- char next_comm[16];
+ char next_comm[TASK_COMM_LEN];
int next_pid;
int next_prio;
};
--
2.17.1
On 10/25/21 4:33 AM, Yafang Shao wrote:
> Use strscpy_pad() instead of strlcpy() to make the comm always nul
> terminated. As the comment above the hard-coded 16, we can replace it
> with TASK_COMM_LEN, then it will adopt to the comm size change.
>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
> ---
> drivers/infiniband/hw/qib/qib.h | 2 +-
> drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
This qib patch looks fine.
Acked-by: Dennis Dalessandro <[email protected]>
On Mon, 25 Oct 2021 11:10:09 -0700
Alexei Starovoitov <[email protected]> wrote:
> It looks like a churn that doesn't really address the problem.
> If we were to allow long names then make it into a pointer and use 16 byte
> as an optimized storage for short names. Any longer name would be a pointer.
> In other words make it similar to dentry->d_iname.
That would be quite a bigger undertaking too, as it is assumed throughout
the kernel that the task->comm is TASK_COMM_LEN and is nul terminated. And
most locations that save the comm simply use a fixed size string of
TASK_COMM_LEN. Not saying its not feasible, but it would require a lot more
analysis of the impact by changing such a fundamental part of task struct
from a static to something requiring allocation.
Unless you are suggesting that we truncate like normal the 16 byte names
(to a max of 15 characters), and add a way to hold the entire name for
those locations that understand it.
-- Steve
On Mon, Oct 25, 2021 at 05:05:03PM -0400, Steven Rostedt wrote:
> On Mon, 25 Oct 2021 11:10:09 -0700
> Alexei Starovoitov <[email protected]> wrote:
>
> > It looks like a churn that doesn't really address the problem.
> > If we were to allow long names then make it into a pointer and use 16 byte
> > as an optimized storage for short names. Any longer name would be a pointer.
> > In other words make it similar to dentry->d_iname.
>
> That would be quite a bigger undertaking too, as it is assumed throughout
> the kernel that the task->comm is TASK_COMM_LEN and is nul terminated. And
> most locations that save the comm simply use a fixed size string of
> TASK_COMM_LEN. Not saying its not feasible, but it would require a lot more
> analysis of the impact by changing such a fundamental part of task struct
> from a static to something requiring allocation.
>
> Unless you are suggesting that we truncate like normal the 16 byte names
> (to a max of 15 characters), and add a way to hold the entire name for
> those locations that understand it.
Agreed -- this is a small change for what is already an "uncommon"
corner case. I don't think this needs to suddenly become an unbounded
string. :)
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:04AM +0000, Yafang Shao wrote:
> Make sure the string set to task comm is always nul ternimated.
typo nit: "terminated"
>
> Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:06AM +0000, Yafang Shao wrote:
> connector comm was introduced in commit
> f786ecba4158 ("connector: add comm change event report to proc connector").
> struct comm_proc_event was defined in include/linux/cn_proc.h first and
> then been moved into file include/uapi/linux/cn_proc.h in commit
> 607ca46e97a1 ("UAPI: (Scripted) Disintegrate include/linux").
>
> As this is the UAPI code, we can't change it without potentially breaking
> things (i.e. userspace binaries have this size built in, so we can't just
> change the size). To prepare for the followup change - extending task
> comm, we have to use __get_task_comm() to avoid the BUILD_BUG_ON() in
> proc_comm_connector().
I wonder, looking at this again, if it might make more sense to avoid
this cn_proc.c change, and instead, adjust get_task_comm() like so:
#define get_task_comm(buf, tsk)
__get_task_comm(buf, __must_be_array(buf) + sizeof(buf), tsk)
This would still enforce the original goal of making sure
get_task_comm() is being used on a char array, and now that
__get_task_comm() will truncate & pad, it's safe to use on both
too-small and too-big arrays.
--
Kees Cook
On Mon, Oct 25, 2021 at 1:33 AM Yafang Shao <[email protected]> wrote:
>
> There're many truncated kthreads in the kernel, which may make trouble
> for the user, for example, the user can't get detailed device
> information from the task comm.
>
> This patchset tries to improve this problem fundamentally by extending
> the task comm size from 16 to 24. In order to do that, we have to do
> some cleanups first.
It looks like a churn that doesn't really address the problem.
If we were to allow long names then make it into a pointer and use 16 byte
as an optimized storage for short names. Any longer name would be a pointer.
In other words make it similar to dentry->d_iname.
On Mon, Oct 25, 2021 at 08:33:07AM +0000, Yafang Shao wrote:
> Use strscpy_pad() instead of strlcpy() to make the comm always nul
> terminated. As the comment above the hard-coded 16, we can replace it
> with TASK_COMM_LEN, then it will adopt to the comm size change.
>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
> ---
> drivers/infiniband/hw/qib/qib.h | 2 +-
> drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
> index 9363bccfc6e7..a8e1c30c370f 100644
> --- a/drivers/infiniband/hw/qib/qib.h
> +++ b/drivers/infiniband/hw/qib/qib.h
> @@ -196,7 +196,7 @@ struct qib_ctxtdata {
> pid_t pid;
> pid_t subpid[QLOGIC_IB_MAX_SUBCTXT];
> /* same size as task_struct .comm[], command that opened context */
> - char comm[16];
> + char comm[TASK_COMM_LEN];
> /* pkeys set by this use of this ctxt */
> u16 pkeys[4];
> /* so file ops can get at unit */
> diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
> index 63854f4b6524..7ab2b448c183 100644
> --- a/drivers/infiniband/hw/qib/qib_file_ops.c
> +++ b/drivers/infiniband/hw/qib/qib_file_ops.c
> @@ -1321,7 +1321,7 @@ static int setup_ctxt(struct qib_pportdata *ppd, int ctxt,
> rcd->tid_pg_list = ptmp;
> rcd->pid = current->pid;
> init_waitqueue_head(&dd->rcd[ctxt]->wait);
> - strlcpy(rcd->comm, current->comm, sizeof(rcd->comm));
> + strscpy_pad(rcd->comm, current->comm, sizeof(rcd->comm));
This should use (the adjusted) get_task_comm() instead of leaving this
open-coded.
> ctxt_fp(fp) = rcd;
> qib_stats.sps_ctxts++;
> dd->freectxts--;
> --
> 2.17.1
>
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:09AM +0000, Yafang Shao wrote:
> bpf_probe_read_kernel_str() will add a nul terminator to the dst, then
> we don't care about if the dst size is big enough. This patch also
> replaces the hard-coded 16 with TASK_COMM_LEN to make it adopt to task
> comm size change.
>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
As these are samples, I guess it's fine to change their sizes.
Reviewed-by: Kees Cook <[email protected]>
> ---
> samples/bpf/test_overhead_kprobe_kern.c | 11 ++++++-----
> samples/bpf/test_overhead_tp_kern.c | 5 +++--
> 2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/samples/bpf/test_overhead_kprobe_kern.c b/samples/bpf/test_overhead_kprobe_kern.c
> index f6d593e47037..8fdd2c9c56b2 100644
> --- a/samples/bpf/test_overhead_kprobe_kern.c
> +++ b/samples/bpf/test_overhead_kprobe_kern.c
> @@ -6,6 +6,7 @@
> */
> #include <linux/version.h>
> #include <linux/ptrace.h>
> +#include <linux/sched.h>
> #include <uapi/linux/bpf.h>
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_tracing.h>
> @@ -22,17 +23,17 @@ int prog(struct pt_regs *ctx)
> {
> struct signal_struct *signal;
> struct task_struct *tsk;
> - char oldcomm[16] = {};
> - char newcomm[16] = {};
> + char oldcomm[TASK_COMM_LEN] = {};
> + char newcomm[TASK_COMM_LEN] = {};
> u16 oom_score_adj;
> u32 pid;
>
> tsk = (void *)PT_REGS_PARM1(ctx);
>
> pid = _(tsk->pid);
> - bpf_probe_read_kernel(oldcomm, sizeof(oldcomm), &tsk->comm);
> - bpf_probe_read_kernel(newcomm, sizeof(newcomm),
> - (void *)PT_REGS_PARM2(ctx));
> + bpf_probe_read_kernel_str(oldcomm, sizeof(oldcomm), &tsk->comm);
> + bpf_probe_read_kernel_str(newcomm, sizeof(newcomm),
> + (void *)PT_REGS_PARM2(ctx));
> signal = _(tsk->signal);
> oom_score_adj = _(signal->oom_score_adj);
> return 0;
> diff --git a/samples/bpf/test_overhead_tp_kern.c b/samples/bpf/test_overhead_tp_kern.c
> index eaa32693f8fc..80edadacb692 100644
> --- a/samples/bpf/test_overhead_tp_kern.c
> +++ b/samples/bpf/test_overhead_tp_kern.c
> @@ -4,6 +4,7 @@
> * modify it under the terms of version 2 of the GNU General Public
> * License as published by the Free Software Foundation.
> */
> +#include <linux/sched.h>
> #include <uapi/linux/bpf.h>
> #include <bpf/bpf_helpers.h>
>
> @@ -11,8 +12,8 @@
> struct task_rename {
> __u64 pad;
> __u32 pid;
> - char oldcomm[16];
> - char newcomm[16];
> + char oldcomm[TASK_COMM_LEN];
> + char newcomm[TASK_COMM_LEN];
> __u16 oom_score_adj;
> };
> SEC("tracepoint/task/task_rename")
> --
> 2.17.1
>
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:08AM +0000, Yafang Shao wrote:
> kernel test robot reported a -Wstringop-truncation warning after I
> extend task comm from 16 to 24. Below is the detailed warning:
>
> fs/binfmt_elf.c: In function 'fill_psinfo.isra':
> >> fs/binfmt_elf.c:1575:9: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 23 [-Wstringop-truncation]
> 1575 | strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> This patch can fix this warning.
>
> Replacing strncpy() with strscpy_pad() can avoid this warning.
>
> This patch also replace the hard-coded 16 with TASK_COMM_LEN to make it
> more compatible with task comm size change.
>
> I also verfied if it still work well when I extend the comm size to 24.
> struct elf_prpsinfo is used to dump the task information in userspace
> coredump or kernel vmcore. Below is the verfication of vmcore,
>
> crash> ps
> PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 0 0 0 ffffffff9d21a940 RU 0.0 0 0 [swapper/0]
> > 0 0 1 ffffa09e40f85e80 RU 0.0 0 0 [swapper/1]
> > 0 0 2 ffffa09e40f81f80 RU 0.0 0 0 [swapper/2]
> > 0 0 3 ffffa09e40f83f00 RU 0.0 0 0 [swapper/3]
> > 0 0 4 ffffa09e40f80000 RU 0.0 0 0 [swapper/4]
> > 0 0 5 ffffa09e40f89f80 RU 0.0 0 0 [swapper/5]
> 0 0 6 ffffa09e40f8bf00 RU 0.0 0 0 [swapper/6]
> > 0 0 7 ffffa09e40f88000 RU 0.0 0 0 [swapper/7]
> > 0 0 8 ffffa09e40f8de80 RU 0.0 0 0 [swapper/8]
> > 0 0 9 ffffa09e40f95e80 RU 0.0 0 0 [swapper/9]
> > 0 0 10 ffffa09e40f91f80 RU 0.0 0 0 [swapper/10]
> > 0 0 11 ffffa09e40f93f00 RU 0.0 0 0 [swapper/11]
> > 0 0 12 ffffa09e40f90000 RU 0.0 0 0 [swapper/12]
> > 0 0 13 ffffa09e40f9bf00 RU 0.0 0 0 [swapper/13]
> > 0 0 14 ffffa09e40f98000 RU 0.0 0 0 [swapper/14]
> > 0 0 15 ffffa09e40f9de80 RU 0.0 0 0 [swapper/15]
>
> It works well as expected.
>
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
> ---
> fs/binfmt_elf.c | 2 +-
> include/linux/elfcore-compat.h | 3 ++-
> include/linux/elfcore.h | 4 ++--
> 3 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index a813b70f594e..a4ba79fce2a9 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1572,7 +1572,7 @@ static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
> SET_UID(psinfo->pr_uid, from_kuid_munged(cred->user_ns, cred->uid));
> SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
> rcu_read_unlock();
> - strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
> + strscpy_pad(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
This should use get_task_comm().
>
> return 0;
> }
> diff --git a/include/linux/elfcore-compat.h b/include/linux/elfcore-compat.h
> index e272c3d452ce..afa0eb45196b 100644
> --- a/include/linux/elfcore-compat.h
> +++ b/include/linux/elfcore-compat.h
> @@ -5,6 +5,7 @@
> #include <linux/elf.h>
> #include <linux/elfcore.h>
> #include <linux/compat.h>
> +#include <linux/sched.h>
>
> /*
> * Make sure these layouts match the linux/elfcore.h native definitions.
> @@ -43,7 +44,7 @@ struct compat_elf_prpsinfo
> __compat_uid_t pr_uid;
> __compat_gid_t pr_gid;
> compat_pid_t pr_pid, pr_ppid, pr_pgrp, pr_sid;
> - char pr_fname[16];
> + char pr_fname[TASK_COMM_LEN];
> char pr_psargs[ELF_PRARGSZ];
> };
>
> diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h
> index 2aaa15779d50..8d79cd58b09a 100644
> --- a/include/linux/elfcore.h
> +++ b/include/linux/elfcore.h
> @@ -65,8 +65,8 @@ struct elf_prpsinfo
> __kernel_gid_t pr_gid;
> pid_t pr_pid, pr_ppid, pr_pgrp, pr_sid;
> /* Lots missing */
> - char pr_fname[16]; /* filename of executable */
> - char pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
> + char pr_fname[TASK_COMM_LEN]; /* filename of executable */
> + char pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
> };
>
> static inline void elf_core_copy_regs(elf_gregset_t *elfregs, struct pt_regs *regs)
> --
> 2.17.1
>
These structs are externally parsed -- we can't change the size of
pr_fname AFAICT.
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:10AM +0000, Yafang Shao wrote:
> The sched:sched_switch tracepoint is derived from kernel, we should make
> its args compitable with the kernel.
>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
> ---
> samples/bpf/offwaketime_kern.c | 4 ++--
Seems this should be merged with the prior bpf samples patch?
-Kees
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/samples/bpf/offwaketime_kern.c b/samples/bpf/offwaketime_kern.c
> index 4866afd054da..eb4d94742e6b 100644
> --- a/samples/bpf/offwaketime_kern.c
> +++ b/samples/bpf/offwaketime_kern.c
> @@ -113,11 +113,11 @@ static inline int update_counts(void *ctx, u32 pid, u64 delta)
> /* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
> struct sched_switch_args {
> unsigned long long pad;
> - char prev_comm[16];
> + char prev_comm[TASK_COMM_LEN];
> int prev_pid;
> int prev_prio;
> long long prev_state;
> - char next_comm[16];
> + char next_comm[TASK_COMM_LEN];
> int next_pid;
> int next_prio;
> };
> --
> 2.17.1
>
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:11AM +0000, Yafang Shao wrote:
> bpf_probe_read_kernel_str() will add a nul terminator to the dst, then
> we don't care about if the dst size is big enough.
>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
So, if we're ever going to copying these buffers out of the kernel (I
don't know what the object lifetime here in bpf is for "e", etc), we
should be zero-padding (as get_task_comm() does).
Should this, instead, be using a bounce buffer?
get_task_comm(comm, task->group_leader);
bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm), comm);
-Kees
> ---
> tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c
> index d9b420972934..f70702fcb224 100644
> --- a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c
> +++ b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c
> @@ -71,8 +71,8 @@ int iter(struct bpf_iter__task_file *ctx)
>
> e.pid = task->tgid;
> e.id = get_obj_id(file->private_data, obj_type);
> - bpf_probe_read_kernel(&e.comm, sizeof(e.comm),
> - task->group_leader->comm);
> + bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm),
> + task->group_leader->comm);
> bpf_seq_write(ctx->meta->seq, &e, sizeof(e));
>
> return 0;
> --
> 2.17.1
>
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:12AM +0000, Yafang Shao wrote:
> kernel test robot reported a perf-test failure after I extended task comm
> size from 16 to 24. The failure as follows,
>
> 2021-10-13 18:00:46 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-317419b91ef4eff4e2f046088201e4dc4065caa0/tools/perf/perf test 15
> 15: Parse sched tracepoints fields : FAILED!
>
> The reason is perf-test requires a fixed-size task comm. If we extend
> task comm size to 24, it will not equil with the required size 16 in perf
> test.
>
> After some analyzation, I found perf itself can adopt to the size
> change, for example, below is the output of perf-sched after I extend
> comm size to 24 -
>
> task 614 ( kthreadd: 84), nr_events: 1
> task 615 ( systemd: 843), nr_events: 1
> task 616 ( networkd-dispat: 1026), nr_events: 1
> task 617 ( systemd: 846), nr_events: 1
>
> $ cat /proc/843/comm
> networkd-dispatcher
>
> The task comm can be displayed correctly as expected.
>
> Replace old hard-coded 16 with the new one can fix the warning, but we'd
> better make the test accept both old and new sizes, then it can be
> backward compatibility.
>
> After this patch, the perf-test succeeds no matter task comm is 16 or
> 24 -
>
> 15: Parse sched tracepoints fields : Ok
>
> This patch is a preparation for the followup patch.
>
> Reported-by: kernel test robot <[email protected]>
> Suggested-by: Arnaldo Carvalho de Melo <[email protected]>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
I'll let the perf folks comment on this one. :)
-Kees
> ---
> tools/include/linux/sched.h | 11 +++++++++++
> tools/perf/tests/evsel-tp-sched.c | 26 ++++++++++++++++++++------
> 2 files changed, 31 insertions(+), 6 deletions(-)
> create mode 100644 tools/include/linux/sched.h
>
> diff --git a/tools/include/linux/sched.h b/tools/include/linux/sched.h
> new file mode 100644
> index 000000000000..0d575afd7f43
> --- /dev/null
> +++ b/tools/include/linux/sched.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _TOOLS_LINUX_SCHED_H
> +#define _TOOLS_LINUX_SCHED_H
> +
> +/* Keep both length for backward compatibility */
> +enum {
> + TASK_COMM_LEN_16 = 16,
> + TASK_COMM_LEN = 24,
> +};
> +
> +#endif /* _TOOLS_LINUX_SCHED_H */
> diff --git a/tools/perf/tests/evsel-tp-sched.c b/tools/perf/tests/evsel-tp-sched.c
> index f9e34bd26cf3..029f2a8c8e51 100644
> --- a/tools/perf/tests/evsel-tp-sched.c
> +++ b/tools/perf/tests/evsel-tp-sched.c
> @@ -1,11 +1,13 @@
> // SPDX-License-Identifier: GPL-2.0
> #include <linux/err.h>
> +#include <linux/sched.h>
> #include <traceevent/event-parse.h>
> #include "evsel.h"
> #include "tests.h"
> #include "debug.h"
>
> -static int evsel__test_field(struct evsel *evsel, const char *name, int size, bool should_be_signed)
> +static int evsel__test_field_alt(struct evsel *evsel, const char *name,
> + int size, int alternate_size, bool should_be_signed)
> {
> struct tep_format_field *field = evsel__field(evsel, name);
> int is_signed;
> @@ -23,15 +25,24 @@ static int evsel__test_field(struct evsel *evsel, const char *name, int size, bo
> ret = -1;
> }
>
> - if (field->size != size) {
> - pr_debug("%s: \"%s\" size (%d) should be %d!\n",
> + if (field->size != size && field->size != alternate_size) {
> + pr_debug("%s: \"%s\" size (%d) should be %d",
> evsel->name, name, field->size, size);
> + if (alternate_size > 0)
> + pr_debug(" or %d", alternate_size);
> + pr_debug("!\n");
> ret = -1;
> }
>
> return ret;
> }
>
> +static int evsel__test_field(struct evsel *evsel, const char *name,
> + int size, bool should_be_signed)
> +{
> + return evsel__test_field_alt(evsel, name, size, -1, should_be_signed);
> +}
> +
> int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtest __maybe_unused)
> {
> struct evsel *evsel = evsel__newtp("sched", "sched_switch");
> @@ -42,7 +53,8 @@ int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtes
> return -1;
> }
>
> - if (evsel__test_field(evsel, "prev_comm", 16, false))
> + if (evsel__test_field_alt(evsel, "prev_comm", TASK_COMM_LEN_16,
> + TASK_COMM_LEN, false))
> ret = -1;
>
> if (evsel__test_field(evsel, "prev_pid", 4, true))
> @@ -54,7 +66,8 @@ int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtes
> if (evsel__test_field(evsel, "prev_state", sizeof(long), true))
> ret = -1;
>
> - if (evsel__test_field(evsel, "next_comm", 16, false))
> + if (evsel__test_field_alt(evsel, "next_comm", TASK_COMM_LEN_16,
> + TASK_COMM_LEN, false))
> ret = -1;
>
> if (evsel__test_field(evsel, "next_pid", 4, true))
> @@ -72,7 +85,8 @@ int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtes
> return -1;
> }
>
> - if (evsel__test_field(evsel, "comm", 16, false))
> + if (evsel__test_field_alt(evsel, "comm", TASK_COMM_LEN_16,
> + TASK_COMM_LEN, false))
> ret = -1;
>
> if (evsel__test_field(evsel, "pid", 4, true))
> --
> 2.17.1
>
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:13AM +0000, Yafang Shao wrote:
> The hard-coded 16 is used in various bpf progs. These progs get task
> comm either via bpf_get_current_comm() or prctl() or
> bpf_core_read_str(), all of which can work well even if the task comm size
> is changed.
>
> In these BPF programs, one thing to be improved is the
> sched:sched_switch tracepoint args. As the tracepoint args are derived
> from the kernel, we'd better make it same with the kernel. So the macro
> TASK_COMM_LEN is converted to type enum, then all the BPF programs can
> get it through BTF.
>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
> ---
> include/linux/sched.h | 9 +++++++--
> tools/testing/selftests/bpf/progs/test_stacktrace_map.c | 6 +++---
> tools/testing/selftests/bpf/progs/test_tracepoint.c | 6 +++---
> 3 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c1a927ddec64..124538db792c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -274,8 +274,13 @@ struct task_group;
>
> #define get_current_state() READ_ONCE(current->__state)
>
> -/* Task command name length: */
> -#define TASK_COMM_LEN 16
> +/*
> + * Define the task command name length as enum, then it can be visible to
> + * BPF programs.
> + */
> +enum {
> + TASK_COMM_LEN = 16,
> +};
>
> extern void scheduler_tick(void);
>
> diff --git a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
> index 00ed48672620..e9b602a6dc1b 100644
> --- a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
> +++ b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
> @@ -1,7 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
> // Copyright (c) 2018 Facebook
>
> -#include <linux/bpf.h>
> +#include <vmlinux.h>
Why is this change needed here and below?
> #include <bpf/bpf_helpers.h>
>
> #ifndef PERF_MAX_STACK_DEPTH
> @@ -41,11 +41,11 @@ struct {
> /* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
> struct sched_switch_args {
> unsigned long long pad;
> - char prev_comm[16];
> + char prev_comm[TASK_COMM_LEN];
> int prev_pid;
> int prev_prio;
> long long prev_state;
> - char next_comm[16];
> + char next_comm[TASK_COMM_LEN];
> int next_pid;
> int next_prio;
> };
> diff --git a/tools/testing/selftests/bpf/progs/test_tracepoint.c b/tools/testing/selftests/bpf/progs/test_tracepoint.c
> index 4b825ee122cf..f21982681e28 100644
> --- a/tools/testing/selftests/bpf/progs/test_tracepoint.c
> +++ b/tools/testing/selftests/bpf/progs/test_tracepoint.c
> @@ -1,17 +1,17 @@
> // SPDX-License-Identifier: GPL-2.0
> // Copyright (c) 2017 Facebook
>
> -#include <linux/bpf.h>
> +#include <vmlinux.h>
> #include <bpf/bpf_helpers.h>
>
> /* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
> struct sched_switch_args {
> unsigned long long pad;
> - char prev_comm[16];
> + char prev_comm[TASK_COMM_LEN];
> int prev_pid;
> int prev_prio;
> long long prev_state;
> - char next_comm[16];
> + char next_comm[TASK_COMM_LEN];
> int next_pid;
> int next_prio;
> };
> --
> 2.17.1
>
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:14AM +0000, Yafang Shao wrote:
> When I was implementing a new per-cpu kthread cfs_migration, I found the
> comm of it "cfs_migration/%u" is truncated due to the limitation of
> TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are
> all with the same name "cfs_migration/1", which will confuse the user. This
> issue is not critical, because we can get the corresponding CPU from the
> task's Cpus_allowed. But for kthreads correspoinding to other hardware
> devices, it is not easy to get the detailed device info from task comm,
> for example,
>
> jbd2/nvme0n1p2-
> xfs-reclaim/sdf
>
> We can also shorten the name to work around this problem, but I find
> there are so many truncated kthreads:
>
> rcu_tasks_kthre
> rcu_tasks_rude_
> rcu_tasks_trace
> poll_mpt3sas0_s
> ext4-rsv-conver
> xfs-reclaim/sd{a, b, c, ...}
> xfs-blockgc/sd{a, b, c, ...}
> xfs-inodegc/sd{a, b, c, ...}
> audit_send_repl
> ecryptfs-kthrea
> vfio-irqfd-clea
> jbd2/nvme0n1p2-
> ...
>
> We should improve this problem fundamentally by extending comm size to
> 24 bytes. task_struct is growing rather regularly by 8 bytes.
>
> After this change, the truncated kthreads listed above will be
> displayed as:
>
> rcu_tasks_kthread
> rcu_tasks_rude_kthread
> rcu_tasks_trace_kthread
> poll_mpt3sas0_statu
> ext4-rsv-conversion
> xfs-reclaim/sdf1
> xfs-blockgc/sdf1
> xfs-inodegc/sdf1
> audit_send_reply
> ecryptfs-kthread
> vfio-irqfd-cleanup
> jbd2/nvme0n1p2-8
>
> As we have converted all the unsafe copy of task comm to the safe one,
> this change won't make any trouble to the kernel or the in-tree tools.
> The safe one and unsafe one of comm copy as follows,
>
> Unsafe Safe
> strlcpy strscpy_pad
> strncpy strscpy_pad
> bpf_probe_read_kernel bpf_probe_read_kernel_str
> bpf_core_read_str
> bpf_get_current_comm
> perf_event__prepare_comm
> prctl(2)
>
> Regarding the possible risk it may take to the out-of-tree user tools, if
> the user tools get the task comm through kernel API like prctl(2),
> bpf_get_current_comm() and etc, the tools still work well after this
> change. While If the user tools get the task comm through direct string
> copy, it must make sure the copied string should be with a nul terminator.
>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
> ---
> include/linux/sched.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 124538db792c..490d12eabe44 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -279,7 +279,7 @@ struct task_group;
> * BPF programs.
> */
> enum {
> - TASK_COMM_LEN = 16,
> + TASK_COMM_LEN = 24,
> };
I suspect this should be kept in sync with the tools/ copy of sched.h
(i.e. we may need to keep the TASK_COMM_LEN_16 around in the kernel tree
too.)
>
> extern void scheduler_tick(void);
> --
> 2.17.1
>
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:15AM +0000, Yafang Shao wrote:
> Show a warning if task comm is truncated. Below is the result
> of my test case:
>
> truncated kthread comm:I-am-a-kthread-with-lon, pid:14 by 6 characters
>
> Suggested-by: Petr Mladek <[email protected]>
> Signed-off-by: Yafang Shao <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
> ---
> kernel/kthread.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 5b37a8567168..46b924c92078 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -399,12 +399,17 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
> if (!IS_ERR(task)) {
> static const struct sched_param param = { .sched_priority = 0 };
> char name[TASK_COMM_LEN];
> + int len;
>
> /*
> * task is already visible to other tasks, so updating
> * COMM must be protected.
> */
> - vsnprintf(name, sizeof(name), namefmt, args);
> + len = vsnprintf(name, sizeof(name), namefmt, args);
> + if (len >= TASK_COMM_LEN) {
And since this failure case is slow-path, we could improve the warning
as other had kind of suggested earlier with something like this instead:
char *full_comm;
full_comm = kvasprintf(GFP_KERNEL, namefmt, args);
pr_warn("truncated kthread comm '%s' to '%s' (pid:%d)\n",
full_comm, name);
kfree(full_comm);
}
> set_task_comm(task, name);
> /*
> * root may have changed our (kthreadd's) priority or CPU mask.
> --
> 2.17.1
>
--
Kees Cook
On Mon, Oct 25, 2021 at 08:33:05AM +0000, Yafang Shao wrote:
> If the dest buffer size is smaller than sizeof(tsk->comm), the buffer
> will be without null ternimator, that may cause problem. We can make sure
> the buffer size not smaller than comm at the callsite to avoid that
> problem, but there may be callsite that we can't easily change.
>
> Using strscpy_pad() instead of strncpy() in __get_task_comm() can make
> the string always nul ternimated.
>
> Suggested-by: Kees Cook <[email protected]>
> Suggested-by: Steven Rostedt <[email protected]>
> Signed-off-by: Yafang Shao <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Petr Mladek <[email protected]>
> ---
> fs/exec.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 404156b5b314..bf2a7a91eeea 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1209,7 +1209,8 @@ static int unshare_sighand(struct task_struct *me)
> char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
> {
> task_lock(tsk);
> - strncpy(buf, tsk->comm, buf_size);
> + /* The copied value is always null terminated */
This may could say "always NUL terminated and zero-padded"
> + strscpy_pad(buf, tsk->comm, buf_size);
> task_unlock(tsk);
> return buf;
> }
> --
> 2.17.1
>
But for the replacement with strscpy_pad(), yes please:
Reviewed-by: Kees Cook <[email protected]>
--
Kees Cook
On Tue, Oct 26, 2021 at 5:07 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:04AM +0000, Yafang Shao wrote:
> > Make sure the string set to task comm is always nul ternimated.
>
> typo nit: "terminated"
>
Thanks for pointing this out. I will correct lt.
> >
> > Signed-off-by: Yafang Shao <[email protected]>
>
> Reviewed-by: Kees Cook <[email protected]>
>
Thanks for the review.
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 5:16 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:07AM +0000, Yafang Shao wrote:
> > Use strscpy_pad() instead of strlcpy() to make the comm always nul
> > terminated. As the comment above the hard-coded 16, we can replace it
> > with TASK_COMM_LEN, then it will adopt to the comm size change.
> >
> > Signed-off-by: Yafang Shao <[email protected]>
> > Cc: Mathieu Desnoyers <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > ---
> > drivers/infiniband/hw/qib/qib.h | 2 +-
> > drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
> > index 9363bccfc6e7..a8e1c30c370f 100644
> > --- a/drivers/infiniband/hw/qib/qib.h
> > +++ b/drivers/infiniband/hw/qib/qib.h
> > @@ -196,7 +196,7 @@ struct qib_ctxtdata {
> > pid_t pid;
> > pid_t subpid[QLOGIC_IB_MAX_SUBCTXT];
> > /* same size as task_struct .comm[], command that opened context */
> > - char comm[16];
> > + char comm[TASK_COMM_LEN];
> > /* pkeys set by this use of this ctxt */
> > u16 pkeys[4];
> > /* so file ops can get at unit */
> > diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
> > index 63854f4b6524..7ab2b448c183 100644
> > --- a/drivers/infiniband/hw/qib/qib_file_ops.c
> > +++ b/drivers/infiniband/hw/qib/qib_file_ops.c
> > @@ -1321,7 +1321,7 @@ static int setup_ctxt(struct qib_pportdata *ppd, int ctxt,
> > rcd->tid_pg_list = ptmp;
> > rcd->pid = current->pid;
> > init_waitqueue_head(&dd->rcd[ctxt]->wait);
> > - strlcpy(rcd->comm, current->comm, sizeof(rcd->comm));
> > + strscpy_pad(rcd->comm, current->comm, sizeof(rcd->comm));
>
> This should use (the adjusted) get_task_comm() instead of leaving this
> open-coded.
>
Sure, that is better.
> > ctxt_fp(fp) = rcd;
> > qib_stats.sps_ctxts++;
> > dd->freectxts--;
> > --
> > 2.17.1
> >
>
> --
> Kees Cook
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 5:30 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:14AM +0000, Yafang Shao wrote:
> > When I was implementing a new per-cpu kthread cfs_migration, I found the
> > comm of it "cfs_migration/%u" is truncated due to the limitation of
> > TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are
> > all with the same name "cfs_migration/1", which will confuse the user. This
> > issue is not critical, because we can get the corresponding CPU from the
> > task's Cpus_allowed. But for kthreads correspoinding to other hardware
> > devices, it is not easy to get the detailed device info from task comm,
> > for example,
> >
> > jbd2/nvme0n1p2-
> > xfs-reclaim/sdf
> >
> > We can also shorten the name to work around this problem, but I find
> > there are so many truncated kthreads:
> >
> > rcu_tasks_kthre
> > rcu_tasks_rude_
> > rcu_tasks_trace
> > poll_mpt3sas0_s
> > ext4-rsv-conver
> > xfs-reclaim/sd{a, b, c, ...}
> > xfs-blockgc/sd{a, b, c, ...}
> > xfs-inodegc/sd{a, b, c, ...}
> > audit_send_repl
> > ecryptfs-kthrea
> > vfio-irqfd-clea
> > jbd2/nvme0n1p2-
> > ...
> >
> > We should improve this problem fundamentally by extending comm size to
> > 24 bytes. task_struct is growing rather regularly by 8 bytes.
> >
> > After this change, the truncated kthreads listed above will be
> > displayed as:
> >
> > rcu_tasks_kthread
> > rcu_tasks_rude_kthread
> > rcu_tasks_trace_kthread
> > poll_mpt3sas0_statu
> > ext4-rsv-conversion
> > xfs-reclaim/sdf1
> > xfs-blockgc/sdf1
> > xfs-inodegc/sdf1
> > audit_send_reply
> > ecryptfs-kthread
> > vfio-irqfd-cleanup
> > jbd2/nvme0n1p2-8
> >
> > As we have converted all the unsafe copy of task comm to the safe one,
> > this change won't make any trouble to the kernel or the in-tree tools.
> > The safe one and unsafe one of comm copy as follows,
> >
> > Unsafe Safe
> > strlcpy strscpy_pad
> > strncpy strscpy_pad
> > bpf_probe_read_kernel bpf_probe_read_kernel_str
> > bpf_core_read_str
> > bpf_get_current_comm
> > perf_event__prepare_comm
> > prctl(2)
> >
> > Regarding the possible risk it may take to the out-of-tree user tools, if
> > the user tools get the task comm through kernel API like prctl(2),
> > bpf_get_current_comm() and etc, the tools still work well after this
> > change. While If the user tools get the task comm through direct string
> > copy, it must make sure the copied string should be with a nul terminator.
> >
> > Signed-off-by: Yafang Shao <[email protected]>
> > Cc: Mathieu Desnoyers <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > ---
> > include/linux/sched.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 124538db792c..490d12eabe44 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -279,7 +279,7 @@ struct task_group;
> > * BPF programs.
> > */
> > enum {
> > - TASK_COMM_LEN = 16,
> > + TASK_COMM_LEN = 24,
> > };
>
> I suspect this should be kept in sync with the tools/ copy of sched.h
> (i.e. we may need to keep the TASK_COMM_LEN_16 around in the kernel tree
> too.)
>
Sure. I will change it.
> >
> > extern void scheduler_tick(void);
> > --
> > 2.17.1
> >
>
> --
> Kees Cook
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 5:24 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:11AM +0000, Yafang Shao wrote:
> > bpf_probe_read_kernel_str() will add a nul terminator to the dst, then
> > we don't care about if the dst size is big enough.
> >
> > Signed-off-by: Yafang Shao <[email protected]>
> > Cc: Mathieu Desnoyers <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Petr Mladek <[email protected]>
>
> So, if we're ever going to copying these buffers out of the kernel (I
> don't know what the object lifetime here in bpf is for "e", etc), we
> should be zero-padding (as get_task_comm() does).
>
> Should this, instead, be using a bounce buffer?
The comment in bpf_probe_read_kernel_str_common() says
: /*
: * The strncpy_from_kernel_nofault() call will likely not fill the
: * entire buffer, but that's okay in this circumstance as we're probing
: * arbitrary memory anyway similar to bpf_probe_read_*() and might
: * as well probe the stack. Thus, memory is explicitly cleared
: * only in error case, so that improper users ignoring return
: * code altogether don't copy garbage; otherwise length of string
: * is returned that can be used for bpf_perf_event_output() et al.
: */
It seems that it doesn't matter if the buffer is filled as that is
probing arbitrary memory.
>
> get_task_comm(comm, task->group_leader);
This helper can't be used by the BPF programs, as it is not exported to BPF.
> bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm), comm);
>
> -Kees
>
> > ---
> > tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c
> > index d9b420972934..f70702fcb224 100644
> > --- a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c
> > +++ b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c
> > @@ -71,8 +71,8 @@ int iter(struct bpf_iter__task_file *ctx)
> >
> > e.pid = task->tgid;
> > e.id = get_obj_id(file->private_data, obj_type);
> > - bpf_probe_read_kernel(&e.comm, sizeof(e.comm),
> > - task->group_leader->comm);
> > + bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm),
> > + task->group_leader->comm);
> > bpf_seq_write(ctx->meta->seq, &e, sizeof(e));
> >
> > return 0;
> > --
> > 2.17.1
> >
>
> --
> Kees Cook
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 5:29 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:13AM +0000, Yafang Shao wrote:
> > The hard-coded 16 is used in various bpf progs. These progs get task
> > comm either via bpf_get_current_comm() or prctl() or
> > bpf_core_read_str(), all of which can work well even if the task comm size
> > is changed.
> >
> > In these BPF programs, one thing to be improved is the
> > sched:sched_switch tracepoint args. As the tracepoint args are derived
> > from the kernel, we'd better make it same with the kernel. So the macro
> > TASK_COMM_LEN is converted to type enum, then all the BPF programs can
> > get it through BTF.
> >
> > Signed-off-by: Yafang Shao <[email protected]>
> > Cc: Mathieu Desnoyers <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > ---
> > include/linux/sched.h | 9 +++++++--
> > tools/testing/selftests/bpf/progs/test_stacktrace_map.c | 6 +++---
> > tools/testing/selftests/bpf/progs/test_tracepoint.c | 6 +++---
> > 3 files changed, 13 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index c1a927ddec64..124538db792c 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -274,8 +274,13 @@ struct task_group;
> >
> > #define get_current_state() READ_ONCE(current->__state)
> >
> > -/* Task command name length: */
> > -#define TASK_COMM_LEN 16
> > +/*
> > + * Define the task command name length as enum, then it can be visible to
> > + * BPF programs.
> > + */
> > +enum {
> > + TASK_COMM_LEN = 16,
> > +};
> >
> > extern void scheduler_tick(void);
> >
> > diff --git a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
> > index 00ed48672620..e9b602a6dc1b 100644
> > --- a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
> > +++ b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
> > @@ -1,7 +1,7 @@
> > // SPDX-License-Identifier: GPL-2.0
> > // Copyright (c) 2018 Facebook
> >
> > -#include <linux/bpf.h>
> > +#include <vmlinux.h>
>
> Why is this change needed here and below?
>
If the BPF programs want to use the type defined in the kernel, for
example the enum we used here, we must include the vmlinux.h generated
by BTF.
> > #include <bpf/bpf_helpers.h>
> >
> > #ifndef PERF_MAX_STACK_DEPTH
> > @@ -41,11 +41,11 @@ struct {
> > /* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
> > struct sched_switch_args {
> > unsigned long long pad;
> > - char prev_comm[16];
> > + char prev_comm[TASK_COMM_LEN];
> > int prev_pid;
> > int prev_prio;
> > long long prev_state;
> > - char next_comm[16];
> > + char next_comm[TASK_COMM_LEN];
> > int next_pid;
> > int next_prio;
> > };
> > diff --git a/tools/testing/selftests/bpf/progs/test_tracepoint.c b/tools/testing/selftests/bpf/progs/test_tracepoint.c
> > index 4b825ee122cf..f21982681e28 100644
> > --- a/tools/testing/selftests/bpf/progs/test_tracepoint.c
> > +++ b/tools/testing/selftests/bpf/progs/test_tracepoint.c
> > @@ -1,17 +1,17 @@
> > // SPDX-License-Identifier: GPL-2.0
> > // Copyright (c) 2017 Facebook
> >
> > -#include <linux/bpf.h>
> > +#include <vmlinux.h>
> > #include <bpf/bpf_helpers.h>
> >
> > /* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
> > struct sched_switch_args {
> > unsigned long long pad;
> > - char prev_comm[16];
> > + char prev_comm[TASK_COMM_LEN];
> > int prev_pid;
> > int prev_prio;
> > long long prev_state;
> > - char next_comm[16];
> > + char next_comm[TASK_COMM_LEN];
> > int next_pid;
> > int next_prio;
> > };
> > --
> > 2.17.1
> >
>
> --
> Kees Cook
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 5:35 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:15AM +0000, Yafang Shao wrote:
> > Show a warning if task comm is truncated. Below is the result
> > of my test case:
> >
> > truncated kthread comm:I-am-a-kthread-with-lon, pid:14 by 6 characters
> >
> > Suggested-by: Petr Mladek <[email protected]>
> > Signed-off-by: Yafang Shao <[email protected]>
> > Reviewed-by: Kees Cook <[email protected]>
> > Cc: Mathieu Desnoyers <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > ---
> > kernel/kthread.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index 5b37a8567168..46b924c92078 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -399,12 +399,17 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
> > if (!IS_ERR(task)) {
> > static const struct sched_param param = { .sched_priority = 0 };
> > char name[TASK_COMM_LEN];
> > + int len;
> >
> > /*
> > * task is already visible to other tasks, so updating
> > * COMM must be protected.
> > */
> > - vsnprintf(name, sizeof(name), namefmt, args);
> > + len = vsnprintf(name, sizeof(name), namefmt, args);
> > + if (len >= TASK_COMM_LEN) {
>
> And since this failure case is slow-path, we could improve the warning
> as other had kind of suggested earlier with something like this instead:
>
It Makes sense to me. I will do it as you suggested.
> char *full_comm;
>
> full_comm = kvasprintf(GFP_KERNEL, namefmt, args);
> pr_warn("truncated kthread comm '%s' to '%s' (pid:%d)\n",
> full_comm, name);
>
> kfree(full_comm);
> }
> > set_task_comm(task, name);
> > /*
> > * root may have changed our (kthreadd's) priority or CPU mask.
> > --
> > 2.17.1
> >
>
> --
> Kees Cook
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 5:08 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:05AM +0000, Yafang Shao wrote:
> > If the dest buffer size is smaller than sizeof(tsk->comm), the buffer
> > will be without null ternimator, that may cause problem. We can make sure
> > the buffer size not smaller than comm at the callsite to avoid that
> > problem, but there may be callsite that we can't easily change.
> >
> > Using strscpy_pad() instead of strncpy() in __get_task_comm() can make
> > the string always nul ternimated.
> >
> > Suggested-by: Kees Cook <[email protected]>
> > Suggested-by: Steven Rostedt <[email protected]>
> > Signed-off-by: Yafang Shao <[email protected]>
> > Cc: Mathieu Desnoyers <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > ---
> > fs/exec.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 404156b5b314..bf2a7a91eeea 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1209,7 +1209,8 @@ static int unshare_sighand(struct task_struct *me)
> > char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
> > {
> > task_lock(tsk);
> > - strncpy(buf, tsk->comm, buf_size);
> > + /* The copied value is always null terminated */
>
> This may could say "always NUL terminated and zero-padded"
>
Sure. I will change it.
> > + strscpy_pad(buf, tsk->comm, buf_size);
> > task_unlock(tsk);
> > return buf;
> > }
> > --
> > 2.17.1
> >
>
> But for the replacement with strscpy_pad(), yes please:
>
> Reviewed-by: Kees Cook <[email protected]>
>
>
> --
> Kees Cook
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 5:14 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:06AM +0000, Yafang Shao wrote:
> > connector comm was introduced in commit
> > f786ecba4158 ("connector: add comm change event report to proc connector").
> > struct comm_proc_event was defined in include/linux/cn_proc.h first and
> > then been moved into file include/uapi/linux/cn_proc.h in commit
> > 607ca46e97a1 ("UAPI: (Scripted) Disintegrate include/linux").
> >
> > As this is the UAPI code, we can't change it without potentially breaking
> > things (i.e. userspace binaries have this size built in, so we can't just
> > change the size). To prepare for the followup change - extending task
> > comm, we have to use __get_task_comm() to avoid the BUILD_BUG_ON() in
> > proc_comm_connector().
>
> I wonder, looking at this again, if it might make more sense to avoid
> this cn_proc.c change, and instead, adjust get_task_comm() like so:
>
> #define get_task_comm(buf, tsk)
> __get_task_comm(buf, __must_be_array(buf) + sizeof(buf), tsk)
>
> This would still enforce the original goal of making sure
> get_task_comm() is being used on a char array, and now that
> __get_task_comm() will truncate & pad, it's safe to use on both
> too-small and too-big arrays.
>
It Makes sense to me. I will do it as you suggested.
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 5:18 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:08AM +0000, Yafang Shao wrote:
> > kernel test robot reported a -Wstringop-truncation warning after I
> > extend task comm from 16 to 24. Below is the detailed warning:
> >
> > fs/binfmt_elf.c: In function 'fill_psinfo.isra':
> > >> fs/binfmt_elf.c:1575:9: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 23 [-Wstringop-truncation]
> > 1575 | strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
> > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > This patch can fix this warning.
> >
> > Replacing strncpy() with strscpy_pad() can avoid this warning.
> >
> > This patch also replace the hard-coded 16 with TASK_COMM_LEN to make it
> > more compatible with task comm size change.
> >
> > I also verfied if it still work well when I extend the comm size to 24.
> > struct elf_prpsinfo is used to dump the task information in userspace
> > coredump or kernel vmcore. Below is the verfication of vmcore,
> >
> > crash> ps
> > PID PPID CPU TASK ST %MEM VSZ RSS COMM
> > 0 0 0 ffffffff9d21a940 RU 0.0 0 0 [swapper/0]
> > > 0 0 1 ffffa09e40f85e80 RU 0.0 0 0 [swapper/1]
> > > 0 0 2 ffffa09e40f81f80 RU 0.0 0 0 [swapper/2]
> > > 0 0 3 ffffa09e40f83f00 RU 0.0 0 0 [swapper/3]
> > > 0 0 4 ffffa09e40f80000 RU 0.0 0 0 [swapper/4]
> > > 0 0 5 ffffa09e40f89f80 RU 0.0 0 0 [swapper/5]
> > 0 0 6 ffffa09e40f8bf00 RU 0.0 0 0 [swapper/6]
> > > 0 0 7 ffffa09e40f88000 RU 0.0 0 0 [swapper/7]
> > > 0 0 8 ffffa09e40f8de80 RU 0.0 0 0 [swapper/8]
> > > 0 0 9 ffffa09e40f95e80 RU 0.0 0 0 [swapper/9]
> > > 0 0 10 ffffa09e40f91f80 RU 0.0 0 0 [swapper/10]
> > > 0 0 11 ffffa09e40f93f00 RU 0.0 0 0 [swapper/11]
> > > 0 0 12 ffffa09e40f90000 RU 0.0 0 0 [swapper/12]
> > > 0 0 13 ffffa09e40f9bf00 RU 0.0 0 0 [swapper/13]
> > > 0 0 14 ffffa09e40f98000 RU 0.0 0 0 [swapper/14]
> > > 0 0 15 ffffa09e40f9de80 RU 0.0 0 0 [swapper/15]
> >
> > It works well as expected.
> >
> > Reported-by: kernel test robot <[email protected]>
> > Signed-off-by: Yafang Shao <[email protected]>
> > Cc: Mathieu Desnoyers <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > ---
> > fs/binfmt_elf.c | 2 +-
> > include/linux/elfcore-compat.h | 3 ++-
> > include/linux/elfcore.h | 4 ++--
> > 3 files changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index a813b70f594e..a4ba79fce2a9 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -1572,7 +1572,7 @@ static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
> > SET_UID(psinfo->pr_uid, from_kuid_munged(cred->user_ns, cred->uid));
> > SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
> > rcu_read_unlock();
> > - strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
> > + strscpy_pad(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
>
> This should use get_task_comm().
>
Sure.
> >
> > return 0;
> > }
> > diff --git a/include/linux/elfcore-compat.h b/include/linux/elfcore-compat.h
> > index e272c3d452ce..afa0eb45196b 100644
> > --- a/include/linux/elfcore-compat.h
> > +++ b/include/linux/elfcore-compat.h
> > @@ -5,6 +5,7 @@
> > #include <linux/elf.h>
> > #include <linux/elfcore.h>
> > #include <linux/compat.h>
> > +#include <linux/sched.h>
> >
> > /*
> > * Make sure these layouts match the linux/elfcore.h native definitions.
> > @@ -43,7 +44,7 @@ struct compat_elf_prpsinfo
> > __compat_uid_t pr_uid;
> > __compat_gid_t pr_gid;
> > compat_pid_t pr_pid, pr_ppid, pr_pgrp, pr_sid;
> > - char pr_fname[16];
> > + char pr_fname[TASK_COMM_LEN];
> > char pr_psargs[ELF_PRARGSZ];
> > };
> >
> > diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h
> > index 2aaa15779d50..8d79cd58b09a 100644
> > --- a/include/linux/elfcore.h
> > +++ b/include/linux/elfcore.h
> > @@ -65,8 +65,8 @@ struct elf_prpsinfo
> > __kernel_gid_t pr_gid;
> > pid_t pr_pid, pr_ppid, pr_pgrp, pr_sid;
> > /* Lots missing */
> > - char pr_fname[16]; /* filename of executable */
> > - char pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
> > + char pr_fname[TASK_COMM_LEN]; /* filename of executable */
> > + char pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
> > };
> >
> > static inline void elf_core_copy_regs(elf_gregset_t *elfregs, struct pt_regs *regs)
> > --
> > 2.17.1
> >
>
> These structs are externally parsed -- we can't change the size of
> pr_fname AFAICT.
>
Yes, they are parsed by crash utility and other tools.
I will keep pr_fname as-is.
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 5:21 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 25, 2021 at 08:33:10AM +0000, Yafang Shao wrote:
> > The sched:sched_switch tracepoint is derived from kernel, we should make
> > its args compitable with the kernel.
> >
> > Signed-off-by: Yafang Shao <[email protected]>
> > Cc: Mathieu Desnoyers <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > ---
> > samples/bpf/offwaketime_kern.c | 4 ++--
>
> Seems this should be merged with the prior bpf samples patch?
>
Sure
>
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/samples/bpf/offwaketime_kern.c b/samples/bpf/offwaketime_kern.c
> > index 4866afd054da..eb4d94742e6b 100644
> > --- a/samples/bpf/offwaketime_kern.c
> > +++ b/samples/bpf/offwaketime_kern.c
> > @@ -113,11 +113,11 @@ static inline int update_counts(void *ctx, u32 pid, u64 delta)
> > /* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
> > struct sched_switch_args {
> > unsigned long long pad;
> > - char prev_comm[16];
> > + char prev_comm[TASK_COMM_LEN];
> > int prev_pid;
> > int prev_prio;
> > long long prev_state;
> > - char next_comm[16];
> > + char next_comm[TASK_COMM_LEN];
> > int next_pid;
> > int next_prio;
> > };
> > --
> > 2.17.1
> >
>
> --
> Kees Cook
--
Thanks
Yafang
On Mon 2021-10-25 17:05:03, Steven Rostedt wrote:
> On Mon, 25 Oct 2021 11:10:09 -0700
> Alexei Starovoitov <[email protected]> wrote:
>
> > It looks like a churn that doesn't really address the problem.
> > If we were to allow long names then make it into a pointer and use 16 byte
> > as an optimized storage for short names. Any longer name would be a pointer.
> > In other words make it similar to dentry->d_iname.
>
> That would be quite a bigger undertaking too, as it is assumed throughout
> the kernel that the task->comm is TASK_COMM_LEN and is nul terminated. And
> most locations that save the comm simply use a fixed size string of
> TASK_COMM_LEN. Not saying its not feasible, but it would require a lot more
> analysis of the impact by changing such a fundamental part of task struct
> from a static to something requiring allocation.
I fully agree. The evolution of this patchset clearly shows how many
code paths depend on the existing behavior.
> Unless you are suggesting that we truncate like normal the 16 byte names
> (to a max of 15 characters), and add a way to hold the entire name for
> those locations that understand it.
Yup. If the problem is only with kthreads, it might be possible to
store the pointer into "struct kthread" and update proc_task_name().
It would generalize the solution already used by workqueues.
I think that something like this was mentioned in the discussion
about v1.
Best Regards,
Petr
On Tue, 26 Oct 2021 10:18:51 +0800
Yafang Shao <[email protected]> wrote:
> > So, if we're ever going to copying these buffers out of the kernel (I
> > don't know what the object lifetime here in bpf is for "e", etc), we
> > should be zero-padding (as get_task_comm() does).
> >
> > Should this, instead, be using a bounce buffer?
>
> The comment in bpf_probe_read_kernel_str_common() says
>
> : /*
> : * The strncpy_from_kernel_nofault() call will likely not fill the
> : * entire buffer, but that's okay in this circumstance as we're probing
> : * arbitrary memory anyway similar to bpf_probe_read_*() and might
> : * as well probe the stack. Thus, memory is explicitly cleared
> : * only in error case, so that improper users ignoring return
> : * code altogether don't copy garbage; otherwise length of string
> : * is returned that can be used for bpf_perf_event_output() et al.
> : */
>
> It seems that it doesn't matter if the buffer is filled as that is
> probing arbitrary memory.
>
> >
> > get_task_comm(comm, task->group_leader);
>
> This helper can't be used by the BPF programs, as it is not exported to BPF.
>
> > bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm), comm);
I guess Kees is worried that e.comm will have something exported to user
space that it shouldn't. But since e is part of the BPF program, does the
BPF JIT take care to make sure everything on its stack is zero'd out, such
that a user BPF couldn't just read various items off its stack and by doing
so, see kernel memory it shouldn't be seeing?
I'm guessing it does, otherwise this would be a bigger issue than this
patch series.
-- Steve
On Tue, Oct 26, 2021 at 9:12 PM Steven Rostedt <[email protected]> wrote:
>
> On Tue, 26 Oct 2021 10:18:51 +0800
> Yafang Shao <[email protected]> wrote:
>
> > > So, if we're ever going to copying these buffers out of the kernel (I
> > > don't know what the object lifetime here in bpf is for "e", etc), we
> > > should be zero-padding (as get_task_comm() does).
> > >
> > > Should this, instead, be using a bounce buffer?
> >
> > The comment in bpf_probe_read_kernel_str_common() says
> >
> > : /*
> > : * The strncpy_from_kernel_nofault() call will likely not fill the
> > : * entire buffer, but that's okay in this circumstance as we're probing
> > : * arbitrary memory anyway similar to bpf_probe_read_*() and might
> > : * as well probe the stack. Thus, memory is explicitly cleared
> > : * only in error case, so that improper users ignoring return
> > : * code altogether don't copy garbage; otherwise length of string
> > : * is returned that can be used for bpf_perf_event_output() et al.
> > : */
> >
> > It seems that it doesn't matter if the buffer is filled as that is
> > probing arbitrary memory.
> >
> > >
> > > get_task_comm(comm, task->group_leader);
> >
> > This helper can't be used by the BPF programs, as it is not exported to BPF.
> >
> > > bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm), comm);
>
> I guess Kees is worried that e.comm will have something exported to user
> space that it shouldn't. But since e is part of the BPF program, does the
> BPF JIT take care to make sure everything on its stack is zero'd out, such
> that a user BPF couldn't just read various items off its stack and by doing
> so, see kernel memory it shouldn't be seeing?
>
Understood.
It can leak information to the user if the user buffer is large enough.
> I'm guessing it does, otherwise this would be a bigger issue than this
> patch series.
>
I will think about how to fix it.
At first glance, it seems we'd better introduce a new BPF helper like
bpf_probe_read_kernel_str_pad().
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 9:55 PM Yafang Shao <[email protected]> wrote:
>
> On Tue, Oct 26, 2021 at 9:12 PM Steven Rostedt <[email protected]> wrote:
> >
> > On Tue, 26 Oct 2021 10:18:51 +0800
> > Yafang Shao <[email protected]> wrote:
> >
> > > > So, if we're ever going to copying these buffers out of the kernel (I
> > > > don't know what the object lifetime here in bpf is for "e", etc), we
> > > > should be zero-padding (as get_task_comm() does).
> > > >
> > > > Should this, instead, be using a bounce buffer?
> > >
> > > The comment in bpf_probe_read_kernel_str_common() says
> > >
> > > : /*
> > > : * The strncpy_from_kernel_nofault() call will likely not fill the
> > > : * entire buffer, but that's okay in this circumstance as we're probing
> > > : * arbitrary memory anyway similar to bpf_probe_read_*() and might
> > > : * as well probe the stack. Thus, memory is explicitly cleared
> > > : * only in error case, so that improper users ignoring return
> > > : * code altogether don't copy garbage; otherwise length of string
> > > : * is returned that can be used for bpf_perf_event_output() et al.
> > > : */
> > >
> > > It seems that it doesn't matter if the buffer is filled as that is
> > > probing arbitrary memory.
> > >
> > > >
> > > > get_task_comm(comm, task->group_leader);
> > >
> > > This helper can't be used by the BPF programs, as it is not exported to BPF.
> > >
> > > > bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm), comm);
> >
> > I guess Kees is worried that e.comm will have something exported to user
> > space that it shouldn't. But since e is part of the BPF program, does the
> > BPF JIT take care to make sure everything on its stack is zero'd out, such
> > that a user BPF couldn't just read various items off its stack and by doing
> > so, see kernel memory it shouldn't be seeing?
> >
>
Ah, you mean the BPF JIT has already avoided leaking information to user.
I will check the BPF JIT code first.
> Understood.
> It can leak information to the user if the user buffer is large enough.
>
>
> > I'm guessing it does, otherwise this would be a bigger issue than this
> > patch series.
> >
>
> I will think about how to fix it.
> At first glance, it seems we'd better introduce a new BPF helper like
> bpf_probe_read_kernel_str_pad().
>
> --
> Thanks
> Yafang
--
Thanks
Yafang
On Tue, Oct 26, 2021 at 9:12 PM Steven Rostedt <[email protected]> wrote:
>
> On Tue, 26 Oct 2021 10:18:51 +0800
> Yafang Shao <[email protected]> wrote:
>
> > > So, if we're ever going to copying these buffers out of the kernel (I
> > > don't know what the object lifetime here in bpf is for "e", etc), we
> > > should be zero-padding (as get_task_comm() does).
> > >
> > > Should this, instead, be using a bounce buffer?
> >
> > The comment in bpf_probe_read_kernel_str_common() says
> >
> > : /*
> > : * The strncpy_from_kernel_nofault() call will likely not fill the
> > : * entire buffer, but that's okay in this circumstance as we're probing
> > : * arbitrary memory anyway similar to bpf_probe_read_*() and might
> > : * as well probe the stack. Thus, memory is explicitly cleared
> > : * only in error case, so that improper users ignoring return
> > : * code altogether don't copy garbage; otherwise length of string
> > : * is returned that can be used for bpf_perf_event_output() et al.
> > : */
> >
> > It seems that it doesn't matter if the buffer is filled as that is
> > probing arbitrary memory.
> >
> > >
> > > get_task_comm(comm, task->group_leader);
> >
> > This helper can't be used by the BPF programs, as it is not exported to BPF.
> >
> > > bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm), comm);
>
> I guess Kees is worried that e.comm will have something exported to user
> space that it shouldn't. But since e is part of the BPF program, does the
> BPF JIT take care to make sure everything on its stack is zero'd out, such
> that a user BPF couldn't just read various items off its stack and by doing
> so, see kernel memory it shouldn't be seeing?
>
> I'm guessing it does, otherwise this would be a bigger issue than this
> patch series.
>
You guess is correct per my verification.
But the BPF JIT doesn't zero it out, while it really does is adding
some character like '?' in my verification.
Anyway we don't need to worry that the kernel information may be
leaked though bpf_probe_read_kernel_str().
--
Thanks
Yafang
On Mon 2021-10-25 14:35:42, Kees Cook wrote:
> On Mon, Oct 25, 2021 at 08:33:15AM +0000, Yafang Shao wrote:
> > Show a warning if task comm is truncated. Below is the result
> > of my test case:
> >
> > truncated kthread comm:I-am-a-kthread-with-lon, pid:14 by 6 characters
> >
> > Suggested-by: Petr Mladek <[email protected]>
> > Signed-off-by: Yafang Shao <[email protected]>
> > Reviewed-by: Kees Cook <[email protected]>
> > Cc: Mathieu Desnoyers <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > ---
> > kernel/kthread.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index 5b37a8567168..46b924c92078 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -399,12 +399,17 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
> > if (!IS_ERR(task)) {
> > static const struct sched_param param = { .sched_priority = 0 };
> > char name[TASK_COMM_LEN];
> > + int len;
> >
> > /*
> > * task is already visible to other tasks, so updating
> > * COMM must be protected.
> > */
> > - vsnprintf(name, sizeof(name), namefmt, args);
> > + len = vsnprintf(name, sizeof(name), namefmt, args);
> > + if (len >= TASK_COMM_LEN) {
>
> And since this failure case is slow-path, we could improve the warning
> as other had kind of suggested earlier with something like this instead:
>
> char *full_comm;
>
> full_comm = kvasprintf(GFP_KERNEL, namefmt, args);
You need to use va_copy()/va_end() if you want to use the same va_args
twice.
For example, see how kvasprintf() is implemented. It calls
vsnprintf() twice and it uses va_copy()/va_end() around the the first call.
kvasprintf() could also return NULL if there is not enough memory.
> pr_warn("truncated kthread comm '%s' to '%s' (pid:%d)\n",
> full_comm, name);
BTW: Is this message printed during normal boot? I did not tried the
patchset myself.
We should add this warning only if there is a good solution how to
avoid the truncated names. And we should me sure that the most common
kthreads/workqueues do not trigger it. It would be ugly to print many
warnings during boot if people could not get rid of them easily.
> kfree(full_comm);
> }
> > set_task_comm(task, name);
> > /*
> > * root may have changed our (kthreadd's) priority or CPU mask.
Best Regards,
Petr
On Thu, Oct 28, 2021 at 4:10 AM Petr Mladek <[email protected]> wrote:
>
> On Mon 2021-10-25 14:35:42, Kees Cook wrote:
> > On Mon, Oct 25, 2021 at 08:33:15AM +0000, Yafang Shao wrote:
> > > Show a warning if task comm is truncated. Below is the result
> > > of my test case:
> > >
> > > truncated kthread comm:I-am-a-kthread-with-lon, pid:14 by 6 characters
> > >
> > > Suggested-by: Petr Mladek <[email protected]>
> > > Signed-off-by: Yafang Shao <[email protected]>
> > > Reviewed-by: Kees Cook <[email protected]>
> > > Cc: Mathieu Desnoyers <[email protected]>
> > > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > > Cc: Andrii Nakryiko <[email protected]>
> > > Cc: Peter Zijlstra <[email protected]>
> > > Cc: Steven Rostedt <[email protected]>
> > > Cc: Al Viro <[email protected]>
> > > Cc: Kees Cook <[email protected]>
> > > Cc: Petr Mladek <[email protected]>
> > > ---
> > > kernel/kthread.c | 7 ++++++-
> > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > index 5b37a8567168..46b924c92078 100644
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -399,12 +399,17 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
> > > if (!IS_ERR(task)) {
> > > static const struct sched_param param = { .sched_priority = 0 };
> > > char name[TASK_COMM_LEN];
> > > + int len;
> > >
> > > /*
> > > * task is already visible to other tasks, so updating
> > > * COMM must be protected.
> > > */
> > > - vsnprintf(name, sizeof(name), namefmt, args);
> > > + len = vsnprintf(name, sizeof(name), namefmt, args);
> > > + if (len >= TASK_COMM_LEN) {
> >
> > And since this failure case is slow-path, we could improve the warning
> > as other had kind of suggested earlier with something like this instead:
> >
> > char *full_comm;
> >
> > full_comm = kvasprintf(GFP_KERNEL, namefmt, args);
>
> You need to use va_copy()/va_end() if you want to use the same va_args
> twice.
>
> For example, see how kvasprintf() is implemented. It calls
> vsnprintf() twice and it uses va_copy()/va_end() around the the first call.
>
Does it mean that if we want to call vsnprintf() three times, we must
use va_copy()/va_end() around the first call and the second call ?
IOW, if we call vsnprintf() multiple times, all the calls except the
last call should be protected by va_copy()/va_end().
Actually I don't quite understand why we should do it like this. I
will try to understand it, and appreciate it if you could explain it
in detail.
BTW, can we use va_copy()/va_end() in vsnprintf(), then the caller
doesn't need to care how many times it will call vsnprintf().
> kvasprintf() could also return NULL if there is not enough memory.
Right. We need to do the NULL check.
>
> > pr_warn("truncated kthread comm '%s' to '%s' (pid:%d)\n",
> > full_comm, name);
>
> BTW: Is this message printed during normal boot? I did not tried the
> patchset myself.
>
Yes, it will be printed at boot time.
> We should add this warning only if there is a good solution how to
> avoid the truncated names. And we should me sure that the most common
> kthreads/workqueues do not trigger it. It would be ugly to print many
> warnings during boot if people could not get rid of them easily.
>
As we have extended task comm to 24, there's no such warning printed
for the existing kthreads/workqueues.
IOW, it will only print for the newly introduced one if it has a long name.
That means this printing is under control.
> > kfree(full_comm);
> > }
> > > set_task_comm(task, name);
> > > /*
> > > * root may have changed our (kthreadd's) priority or CPU mask.
>
> Best Regards,
> Petr
--
Thanks
Yafang
On Thu, Oct 28, 2021 at 4:10 AM Petr Mladek <[email protected]> wrote:
>
> On Mon 2021-10-25 14:35:42, Kees Cook wrote:
> > On Mon, Oct 25, 2021 at 08:33:15AM +0000, Yafang Shao wrote:
> > > Show a warning if task comm is truncated. Below is the result
> > > of my test case:
> > >
> > > truncated kthread comm:I-am-a-kthread-with-lon, pid:14 by 6 characters
> > >
> > > Suggested-by: Petr Mladek <[email protected]>
> > > Signed-off-by: Yafang Shao <[email protected]>
> > > Reviewed-by: Kees Cook <[email protected]>
> > > Cc: Mathieu Desnoyers <[email protected]>
> > > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > > Cc: Andrii Nakryiko <[email protected]>
> > > Cc: Peter Zijlstra <[email protected]>
> > > Cc: Steven Rostedt <[email protected]>
> > > Cc: Al Viro <[email protected]>
> > > Cc: Kees Cook <[email protected]>
> > > Cc: Petr Mladek <[email protected]>
> > > ---
> > > kernel/kthread.c | 7 ++++++-
> > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > index 5b37a8567168..46b924c92078 100644
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -399,12 +399,17 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
> > > if (!IS_ERR(task)) {
> > > static const struct sched_param param = { .sched_priority = 0 };
> > > char name[TASK_COMM_LEN];
> > > + int len;
> > >
> > > /*
> > > * task is already visible to other tasks, so updating
> > > * COMM must be protected.
> > > */
> > > - vsnprintf(name, sizeof(name), namefmt, args);
> > > + len = vsnprintf(name, sizeof(name), namefmt, args);
> > > + if (len >= TASK_COMM_LEN) {
> >
> > And since this failure case is slow-path, we could improve the warning
> > as other had kind of suggested earlier with something like this instead:
> >
> > char *full_comm;
> >
> > full_comm = kvasprintf(GFP_KERNEL, namefmt, args);
>
> You need to use va_copy()/va_end() if you want to use the same va_args
> twice.
>
Now I understand it.
So the patch will be:
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5b37a8567168..c1ff67283725 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -399,12 +399,29 @@ struct task_struct *__kthread_create_on_node(int
(*threadfn)(void *data),
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
char name[TASK_COMM_LEN];
+ char *full_comm;
+ va_list aq;
+ int len;
/*
* task is already visible to other tasks, so updating
* COMM must be protected.
*/
- vsnprintf(name, sizeof(name), namefmt, args);
+ va_copy(aq, args);
+ len = vsnprintf(name, sizeof(name), namefmt, aq);
+ va_end(aq);
+ if (len >= TASK_COMM_LEN) {
+ full_comm = kvasprintf(GFP_KERNEL, namefmt, args);
+ if (full_comm) {
+ pr_warn("truncated kthread comm '%s'
to '%s' (pid:%d)\n",
+ full_comm, name, task->pid);
+ kfree(full_comm);
+ } else {
+ pr_warn("truncated kthread comm '%s'
(pid:%d) by %d characters\n",
+ name, task->pid, len -
TASK_COMM_LEN + 1);
+
+ }
+ }
set_task_comm(task, name);
/*
* root may have changed our (kthreadd's) priority or CPU mask.
That seems a little overkill to me.
I prefer to keep the v6 as-is.
> For example, see how kvasprintf() is implemented. It calls
> vsnprintf() twice and it uses va_copy()/va_end() around the the first call.
>
> kvasprintf() could also return NULL if there is not enough memory.
>
> > pr_warn("truncated kthread comm '%s' to '%s' (pid:%d)\n",
> > full_comm, name);
>
> BTW: Is this message printed during normal boot? I did not tried the
> patchset myself.
>
> We should add this warning only if there is a good solution how to
> avoid the truncated names. And we should me sure that the most common
> kthreads/workqueues do not trigger it. It would be ugly to print many
> warnings during boot if people could not get rid of them easily.
>
> > kfree(full_comm);
> > }
> > > set_task_comm(task, name);
> > > /*
> > > * root may have changed our (kthreadd's) priority or CPU mask.
>
> Best Regards,
> Petr
--
Thanks
Yafang