2021-11-20 11:29:31

by Yafang Shao

[permalink] [raw]
Subject: [PATCH v2] kthread: dynamically allocate memory to store kthread's full name

When I was implementing a new per-cpu kthread cfs_migration, I found the
comm of it "cfs_migration/%u" is truncated due to the limitation of
TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are
all with the same name "cfs_migration/1", which will confuse the user. This
issue is not critical, because we can get the corresponding CPU from the
task's Cpus_allowed. But for kthreads correspoinding to other hardware
devices, it is not easy to get the detailed device info from task comm,
for example,

jbd2/nvme0n1p2-
xfs-reclaim/sdf

Currently there are so many truncated kthreads:

rcu_tasks_kthre
rcu_tasks_rude_
rcu_tasks_trace
poll_mpt3sas0_s
ext4-rsv-conver
xfs-reclaim/sd{a, b, c, ...}
xfs-blockgc/sd{a, b, c, ...}
xfs-inodegc/sd{a, b, c, ...}
audit_send_repl
ecryptfs-kthrea
vfio-irqfd-clea
jbd2/nvme0n1p2-
...

We can shorten these names to work around this problem, but it may be
not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for
example, it is a nice name, and it is not a good idea to shorten it.

One possible way to fix this issue is extending the task comm size, but
as task->comm is used in lots of places, that may cause some potential
buffer overflows. Another more conservative approach is introducing a new
pointer to store kthread's full name if it is truncated, which won't
introduce too much overhead as it is in the non-critical path. Finally we
make a dicision to use the second approach. See also the discussions in
this thread:
https://lore.kernel.org/lkml/[email protected]/

After this change, the full name of these truncated kthreads will be
displayed via /proc/[pid]/comm:

rcu_tasks_kthread
rcu_tasks_rude_kthread
rcu_tasks_trace_kthread
poll_mpt3sas0_statu
ext4-rsv-conversion
xfs-reclaim/sdf1
xfs-blockgc/sdf1
xfs-inodegc/sdf1
audit_send_reply
ecryptfs-kthread
vfio-irqfd-cleanup
jbd2/nvme0n1p2-8

Suggested-by: Petr Mladek <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Michal Miroslaw <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>

---

Changes since v1:
1. leave it turncated when out of memory (Kees & Petr)
2. do null check in free_kthread_struct (Petr)
---
fs/proc/array.c | 3 +++
include/linux/kthread.h | 1 +
kernel/kthread.c | 32 ++++++++++++++++++++++++++++++--
3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index ff869a66b34e..4321aa63835d 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -92,6 +92,7 @@
#include <linux/string_helpers.h>
#include <linux/user_namespace.h>
#include <linux/fs_struct.h>
+#include <linux/kthread.h>

#include <asm/processor.h>
#include "internal.h"
@@ -102,6 +103,8 @@ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape)

if (p->flags & PF_WQ_WORKER)
wq_worker_comm(tcomm, sizeof(tcomm), p);
+ else if (p->flags & PF_KTHREAD)
+ get_kthread_comm(tcomm, sizeof(tcomm), p);
else
__get_task_comm(tcomm, sizeof(tcomm), p);

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 346b0f269161..2a5c04494663 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -33,6 +33,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
unsigned int cpu,
const char *namefmt);

+void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk);
void set_kthread_struct(struct task_struct *p);

void kthread_set_per_cpu(struct task_struct *k, int cpu);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 7113003fab63..a70cd5dc94e3 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -60,6 +60,8 @@ struct kthread {
#ifdef CONFIG_BLK_CGROUP
struct cgroup_subsys_state *blkcg_css;
#endif
+ /* To store the full name if task comm is truncated. */
+ char *full_name;
};

enum KTHREAD_BITS {
@@ -93,6 +95,18 @@ static inline struct kthread *__to_kthread(struct task_struct *p)
return kthread;
}

+void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk)
+{
+ struct kthread *kthread = to_kthread(tsk);
+
+ if (!kthread || !kthread->full_name) {
+ __get_task_comm(buf, buf_size, tsk);
+ return;
+ }
+
+ strscpy_pad(buf, kthread->full_name, buf_size);
+}
+
void set_kthread_struct(struct task_struct *p)
{
struct kthread *kthread;
@@ -118,9 +132,13 @@ void free_kthread_struct(struct task_struct *k)
* or if kmalloc() in kthread() failed.
*/
kthread = to_kthread(k);
+ if (!kthread)
+ return;
+
#ifdef CONFIG_BLK_CGROUP
- WARN_ON_ONCE(kthread && kthread->blkcg_css);
+ WARN_ON_ONCE(kthread->blkcg_css);
#endif
+ kfree(kthread->full_name);
kfree(kthread);
}

@@ -406,12 +424,22 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
task = create->result;
if (!IS_ERR(task)) {
char name[TASK_COMM_LEN];
+ va_list aq;
+ int len;

/*
* task is already visible to other tasks, so updating
* COMM must be protected.
*/
- vsnprintf(name, sizeof(name), namefmt, args);
+ va_copy(aq, args);
+ len = vsnprintf(name, sizeof(name), namefmt, aq);
+ va_end(aq);
+ if (len >= TASK_COMM_LEN) {
+ struct kthread *kthread = to_kthread(task);
+
+ /* leave it truncated when out of memory. */
+ kthread->full_name = kvasprintf(GFP_KERNEL, namefmt, args);
+ }
set_task_comm(task, name);
}
kfree(create);
--
2.17.1



2021-11-25 09:20:13

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v2] kthread: dynamically allocate memory to store kthread's full name

On Sat 2021-11-20 11:28:50, Yafang Shao wrote:
> When I was implementing a new per-cpu kthread cfs_migration, I found the
> comm of it "cfs_migration/%u" is truncated due to the limitation of
> TASK_COMM_LEN.
>
> One possible way to fix this issue is extending the task comm size, but
> as task->comm is used in lots of places, that may cause some potential
> buffer overflows. Another more conservative approach is introducing a new
> pointer to store kthread's full name if it is truncated, which won't
> introduce too much overhead as it is in the non-critical path. Finally we
> make a dicision to use the second approach. See also the discussions in
> this thread:
> https://lore.kernel.org/lkml/[email protected]/
>
> After this change, the full name of these truncated kthreads will be
> displayed via /proc/[pid]/comm:

> Signed-off-by: Yafang Shao <[email protected]>

Looks good to me:

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2021-11-25 09:38:58

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v2] kthread: dynamically allocate memory to store kthread's full name

On 20.11.21 12:28, Yafang Shao wrote:
> When I was implementing a new per-cpu kthread cfs_migration, I found the
> comm of it "cfs_migration/%u" is truncated due to the limitation of
> TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are
> all with the same name "cfs_migration/1", which will confuse the user. This
> issue is not critical, because we can get the corresponding CPU from the
> task's Cpus_allowed. But for kthreads correspoinding to other hardware
> devices, it is not easy to get the detailed device info from task comm,
> for example,
>
> jbd2/nvme0n1p2-
> xfs-reclaim/sdf
>
> Currently there are so many truncated kthreads:
>
> rcu_tasks_kthre
> rcu_tasks_rude_
> rcu_tasks_trace
> poll_mpt3sas0_s
> ext4-rsv-conver
> xfs-reclaim/sd{a, b, c, ...}
> xfs-blockgc/sd{a, b, c, ...}
> xfs-inodegc/sd{a, b, c, ...}
> audit_send_repl
> ecryptfs-kthrea
> vfio-irqfd-clea
> jbd2/nvme0n1p2-
> ...
>
> We can shorten these names to work around this problem, but it may be
> not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for
> example, it is a nice name, and it is not a good idea to shorten it.
>
> One possible way to fix this issue is extending the task comm size, but
> as task->comm is used in lots of places, that may cause some potential
> buffer overflows. Another more conservative approach is introducing a new
> pointer to store kthread's full name if it is truncated, which won't
> introduce too much overhead as it is in the non-critical path. Finally we
> make a dicision to use the second approach. See also the discussions in
> this thread:
> https://lore.kernel.org/lkml/[email protected]/
>
> After this change, the full name of these truncated kthreads will be
> displayed via /proc/[pid]/comm:
>
> rcu_tasks_kthread
> rcu_tasks_rude_kthread
> rcu_tasks_trace_kthread
> poll_mpt3sas0_statu
> ext4-rsv-conversion
> xfs-reclaim/sdf1
> xfs-blockgc/sdf1
> xfs-inodegc/sdf1
> audit_send_reply
> ecryptfs-kthread
> vfio-irqfd-cleanup
> jbd2/nvme0n1p2-8

I do wonder if that could break some user space that assumes these names
have maximum length ..

But LGTM

Reviewed-by: David Hildenbrand <[email protected]>


--
Thanks,

David / dhildenb


2021-11-25 14:48:19

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v2] kthread: dynamically allocate memory to store kthread's full name

On Thu 2021-11-25 10:36:49, David Hildenbrand wrote:
> On 20.11.21 12:28, Yafang Shao wrote:
> > When I was implementing a new per-cpu kthread cfs_migration, I found the
> > comm of it "cfs_migration/%u" is truncated due to the limitation of
> > TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are
> > all with the same name "cfs_migration/1", which will confuse the user. This
> > issue is not critical, because we can get the corresponding CPU from the
> > task's Cpus_allowed. But for kthreads correspoinding to other hardware
> > devices, it is not easy to get the detailed device info from task comm,
> > for example,
> >
> > jbd2/nvme0n1p2-
> > xfs-reclaim/sdf
> >
> > Currently there are so many truncated kthreads:
> >
> > rcu_tasks_kthre
> > rcu_tasks_rude_
> > rcu_tasks_trace
> > poll_mpt3sas0_s
> > ext4-rsv-conver
> > xfs-reclaim/sd{a, b, c, ...}
> > xfs-blockgc/sd{a, b, c, ...}
> > xfs-inodegc/sd{a, b, c, ...}
> > audit_send_repl
> > ecryptfs-kthrea
> > vfio-irqfd-clea
> > jbd2/nvme0n1p2-
> > ...
> >
> > We can shorten these names to work around this problem, but it may be
> > not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for
> > example, it is a nice name, and it is not a good idea to shorten it.
> >
> > One possible way to fix this issue is extending the task comm size, but
> > as task->comm is used in lots of places, that may cause some potential
> > buffer overflows. Another more conservative approach is introducing a new
> > pointer to store kthread's full name if it is truncated, which won't
> > introduce too much overhead as it is in the non-critical path. Finally we
> > make a dicision to use the second approach. See also the discussions in
> > this thread:
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > After this change, the full name of these truncated kthreads will be
> > displayed via /proc/[pid]/comm:
> >
> > rcu_tasks_kthread
> > rcu_tasks_rude_kthread
> > rcu_tasks_trace_kthread
> > poll_mpt3sas0_statu
> > ext4-rsv-conversion
> > xfs-reclaim/sdf1
> > xfs-blockgc/sdf1
> > xfs-inodegc/sdf1
> > audit_send_reply
> > ecryptfs-kthread
> > vfio-irqfd-cleanup
> > jbd2/nvme0n1p2-8
>
> I do wonder if that could break some user space that assumes these names
> have maximum length ..

There is high chance that we will be on the safe side. Workqueue
kthreads already provided longer names. They are even dynamic
because the currently handled workqueue name is part of the name,
see wq_worker_comm().

Best Regards,
Petr

2021-11-25 14:49:11

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v2] kthread: dynamically allocate memory to store kthread's full name

On 25.11.21 15:46, Petr Mladek wrote:
> On Thu 2021-11-25 10:36:49, David Hildenbrand wrote:
>> On 20.11.21 12:28, Yafang Shao wrote:
>>> When I was implementing a new per-cpu kthread cfs_migration, I found the
>>> comm of it "cfs_migration/%u" is truncated due to the limitation of
>>> TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are
>>> all with the same name "cfs_migration/1", which will confuse the user. This
>>> issue is not critical, because we can get the corresponding CPU from the
>>> task's Cpus_allowed. But for kthreads correspoinding to other hardware
>>> devices, it is not easy to get the detailed device info from task comm,
>>> for example,
>>>
>>> jbd2/nvme0n1p2-
>>> xfs-reclaim/sdf
>>>
>>> Currently there are so many truncated kthreads:
>>>
>>> rcu_tasks_kthre
>>> rcu_tasks_rude_
>>> rcu_tasks_trace
>>> poll_mpt3sas0_s
>>> ext4-rsv-conver
>>> xfs-reclaim/sd{a, b, c, ...}
>>> xfs-blockgc/sd{a, b, c, ...}
>>> xfs-inodegc/sd{a, b, c, ...}
>>> audit_send_repl
>>> ecryptfs-kthrea
>>> vfio-irqfd-clea
>>> jbd2/nvme0n1p2-
>>> ...
>>>
>>> We can shorten these names to work around this problem, but it may be
>>> not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for
>>> example, it is a nice name, and it is not a good idea to shorten it.
>>>
>>> One possible way to fix this issue is extending the task comm size, but
>>> as task->comm is used in lots of places, that may cause some potential
>>> buffer overflows. Another more conservative approach is introducing a new
>>> pointer to store kthread's full name if it is truncated, which won't
>>> introduce too much overhead as it is in the non-critical path. Finally we
>>> make a dicision to use the second approach. See also the discussions in
>>> this thread:
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> After this change, the full name of these truncated kthreads will be
>>> displayed via /proc/[pid]/comm:
>>>
>>> rcu_tasks_kthread
>>> rcu_tasks_rude_kthread
>>> rcu_tasks_trace_kthread
>>> poll_mpt3sas0_statu
>>> ext4-rsv-conversion
>>> xfs-reclaim/sdf1
>>> xfs-blockgc/sdf1
>>> xfs-inodegc/sdf1
>>> audit_send_reply
>>> ecryptfs-kthread
>>> vfio-irqfd-cleanup
>>> jbd2/nvme0n1p2-8
>>
>> I do wonder if that could break some user space that assumes these names
>> have maximum length ..
>
> There is high chance that we will be on the safe side. Workqueue
> kthreads already provided longer names. They are even dynamic
> because the currently handled workqueue name is part of the name,
> see wq_worker_comm().

Great, thanks!


--
Thanks,

David / dhildenb