2021-06-30 00:35:55

by Paul Burton

[permalink] [raw]
Subject: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

The tgid_map array records a mapping from pid to tgid, where the index
of an entry within the array is the pid & the value stored at that index
is the tgid.

The saved_tgids_next() function iterates over pointers into the tgid_map
array & dereferences the pointers which results in the tgid, but then it
passes that dereferenced value to trace_find_tgid() which treats it as a
pid & does a further lookup within the tgid_map array. It seems likely
that the intent here was to skip over entries in tgid_map for which the
recorded tgid is zero, but instead we end up skipping over entries for
which the thread group leader hasn't yet had its own tgid recorded in
tgid_map.

A minimal fix would be to remove the call to trace_find_tgid, turning:

if (trace_find_tgid(*ptr))

into:

if (*ptr)

..but it seems like this logic can be much simpler if we simply let
seq_read() iterate over the whole tgid_map array & filter out empty
entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
approach, removing the incorrect logic here entirely.

Signed-off-by: Paul Burton <[email protected]>
Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: <[email protected]>
---
kernel/trace/trace.c | 38 +++++++++++++-------------------------
1 file changed, 13 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d23a09d3eb37b..9570667310bcc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5608,37 +5608,20 @@ static const struct file_operations tracing_readme_fops = {

static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
{
- int *ptr = v;
+ int pid = ++(*pos);

- if (*pos || m->count)
- ptr++;
-
- (*pos)++;
-
- for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) {
- if (trace_find_tgid(*ptr))
- return ptr;
- }
+ if (pid > PID_MAX_DEFAULT)
+ return NULL;

- return NULL;
+ return &tgid_map[pid];
}

static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
{
- void *v;
- loff_t l = 0;
-
- if (!tgid_map)
+ if (!tgid_map || *pos > PID_MAX_DEFAULT)
return NULL;

- v = &tgid_map[0];
- while (l <= *pos) {
- v = saved_tgids_next(m, v, &l);
- if (!v)
- return NULL;
- }
-
- return v;
+ return &tgid_map[*pos];
}

static void saved_tgids_stop(struct seq_file *m, void *v)
@@ -5647,9 +5630,14 @@ static void saved_tgids_stop(struct seq_file *m, void *v)

static int saved_tgids_show(struct seq_file *m, void *v)
{
- int pid = (int *)v - tgid_map;
+ int *entry = (int *)v;
+ int pid = entry - tgid_map;
+ int tgid = *entry;
+
+ if (tgid == 0)
+ return SEQ_SKIP;

- seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid));
+ seq_printf(m, "%d %d\n", pid, tgid);
return 0;
}


base-commit: 62fb9874f5da54fdb243003b386128037319b219
--
2.32.0.93.g670b81a890-goog


2021-06-30 00:36:43

by Paul Burton

[permalink] [raw]
Subject: [PATCH 2/2] tracing: Resize tgid_map to PID_MAX_LIMIT, not PID_MAX_DEFAULT

Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that
on systems where pid_max is configured higher than PID_MAX_DEFAULT the
ftrace record-tgid option doesn't work so well. Any tasks with PIDs
higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and
don't show up in the saved_tgids file.

In particular since systemd v243 & above configure pid_max to its
highest possible 1<<22 value by default on 64 bit systems this renders
the record-tgids option of little use.

Increase the size of tgid_map to PID_MAX_LIMIT instead, allowing it to
cover the full range of PIDs up to the maximum value of pid_max.

On 64 bit systems this will increase the size of tgid_map from 256KiB to
16MiB. Whilst this 64x increase in memory overhead sounds significant 64
bit systems are presumably best placed to accommodate it, and since
tgid_map is only allocated when the record-tgid option is actually used
presumably the user would rather it spends sufficient memory to actually
record the tgids they expect.

The size of tgid_map will also increase for CONFIG_BASE_SMALL=y
configurations, but these seem unlikely to be systems upon which people
are running ftrace with record-tgid anyway.

Signed-off-by: Paul Burton <[email protected]>
Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: <[email protected]>
---
kernel/trace/trace.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 9570667310bcc..c893c0c2754f8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2460,7 +2460,7 @@ void trace_find_cmdline(int pid, char comm[])

int trace_find_tgid(int pid)
{
- if (unlikely(!tgid_map || !pid || pid > PID_MAX_DEFAULT))
+ if (unlikely(!tgid_map || !pid || pid > PID_MAX_LIMIT))
return 0;

return tgid_map[pid];
@@ -2472,7 +2472,7 @@ static int trace_save_tgid(struct task_struct *tsk)
if (!tsk->pid)
return 1;

- if (unlikely(!tgid_map || tsk->pid > PID_MAX_DEFAULT))
+ if (unlikely(!tgid_map || tsk->pid > PID_MAX_LIMIT))
return 0;

tgid_map[tsk->pid] = tsk->tgid;
@@ -5194,7 +5194,7 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)

if (mask == TRACE_ITER_RECORD_TGID) {
if (!tgid_map)
- tgid_map = kvcalloc(PID_MAX_DEFAULT + 1,
+ tgid_map = kvcalloc(PID_MAX_LIMIT + 1,
sizeof(*tgid_map),
GFP_KERNEL);
if (!tgid_map) {
@@ -5610,7 +5610,7 @@ static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
{
int pid = ++(*pos);

- if (pid > PID_MAX_DEFAULT)
+ if (pid > PID_MAX_LIMIT)
return NULL;

return &tgid_map[pid];
@@ -5618,7 +5618,7 @@ static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)

static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
{
- if (!tgid_map || *pos > PID_MAX_DEFAULT)
+ if (!tgid_map || *pos > PID_MAX_LIMIT)
return NULL;

return &tgid_map[*pos];
--
2.32.0.93.g670b81a890-goog

2021-06-30 12:33:25

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On Tue, 29 Jun 2021 17:34:05 -0700
Paul Burton <[email protected]> wrote:

> The tgid_map array records a mapping from pid to tgid, where the index
> of an entry within the array is the pid & the value stored at that index
> is the tgid.
>
> The saved_tgids_next() function iterates over pointers into the tgid_map
> array & dereferences the pointers which results in the tgid, but then it
> passes that dereferenced value to trace_find_tgid() which treats it as a
> pid & does a further lookup within the tgid_map array. It seems likely
> that the intent here was to skip over entries in tgid_map for which the
> recorded tgid is zero, but instead we end up skipping over entries for
> which the thread group leader hasn't yet had its own tgid recorded in
> tgid_map.
>
> A minimal fix would be to remove the call to trace_find_tgid, turning:
>
> if (trace_find_tgid(*ptr))
>
> into:
>
> if (*ptr)
>
> ..but it seems like this logic can be much simpler if we simply let
> seq_read() iterate over the whole tgid_map array & filter out empty
> entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
> approach, removing the incorrect logic here entirely.
>
> Signed-off-by: Paul Burton <[email protected]>
> Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
> Cc: Steven Rostedt <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: <[email protected]>
> ---

Joel,

Can you review this please.

-- Steve

2021-06-30 12:36:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: Resize tgid_map to PID_MAX_LIMIT, not PID_MAX_DEFAULT

On Tue, 29 Jun 2021 17:34:06 -0700
Paul Burton <[email protected]> wrote:

> On 64 bit systems this will increase the size of tgid_map from 256KiB to
> 16MiB. Whilst this 64x increase in memory overhead sounds significant 64
> bit systems are presumably best placed to accommodate it, and since
> tgid_map is only allocated when the record-tgid option is actually used
> presumably the user would rather it spends sufficient memory to actually
> record the tgids they expect.

NAK. Please see how I fixed this for the saved_cmdlines, and implement
it the same way.

785e3c0a3a87 ("tracing: Map all PIDs to command lines")

It's a cache, it doesn't need to save everything.

-- Steve


>
> The size of tgid_map will also increase for CONFIG_BASE_SMALL=y
> configurations, but these seem unlikely to be systems upon which people
> are running ftrace with record-tgid anyway.

2021-06-30 16:45:01

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On Wed, Jun 30, 2021 at 8:31 AM Steven Rostedt <[email protected]> wrote:
>
> On Tue, 29 Jun 2021 17:34:05 -0700
> Paul Burton <[email protected]> wrote:
>
> > The tgid_map array records a mapping from pid to tgid, where the index
> > of an entry within the array is the pid & the value stored at that index
> > is the tgid.
> >
> > The saved_tgids_next() function iterates over pointers into the tgid_map
> > array & dereferences the pointers which results in the tgid, but then it
> > passes that dereferenced value to trace_find_tgid() which treats it as a
> > pid & does a further lookup within the tgid_map array. It seems likely
> > that the intent here was to skip over entries in tgid_map for which the
> > recorded tgid is zero, but instead we end up skipping over entries for
> > which the thread group leader hasn't yet had its own tgid recorded in
> > tgid_map.
> >
> > A minimal fix would be to remove the call to trace_find_tgid, turning:
> >
> > if (trace_find_tgid(*ptr))
> >
> > into:
> >
> > if (*ptr)
> >
> > ..but it seems like this logic can be much simpler if we simply let
> > seq_read() iterate over the whole tgid_map array & filter out empty
> > entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
> > approach, removing the incorrect logic here entirely.
> >
> > Signed-off-by: Paul Burton <[email protected]>
> > Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > Cc: <[email protected]>
> > ---
>
> Joel,
>
> Can you review this please.

Sure thing Steve, will review it today.

thanks,
-Joel

2021-06-30 21:12:18

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: Resize tgid_map to PID_MAX_LIMIT, not PID_MAX_DEFAULT

Hi Steven,

On Wed, Jun 30, 2021 at 08:35:13AM -0400, Steven Rostedt wrote:
> On Tue, 29 Jun 2021 17:34:06 -0700
> Paul Burton <[email protected]> wrote:
>
> > On 64 bit systems this will increase the size of tgid_map from 256KiB to
> > 16MiB. Whilst this 64x increase in memory overhead sounds significant 64
> > bit systems are presumably best placed to accommodate it, and since
> > tgid_map is only allocated when the record-tgid option is actually used
> > presumably the user would rather it spends sufficient memory to actually
> > record the tgids they expect.
>
> NAK. Please see how I fixed this for the saved_cmdlines, and implement
> it the same way.
>
> 785e3c0a3a87 ("tracing: Map all PIDs to command lines")
>
> It's a cache, it doesn't need to save everything.

Well sure, but it's a cache that (modulo pid recycling) previously had a
100% hit rate for tasks observed in sched_switch events.

It differs from saved_cmdlines in a few key ways that led me to treat it
differently:

1) The cost of allocating map_pid_to_cmdline is paid by all users of
ftrace, whilst as I mentioned in my commit description the cost of
allocating tgid_map is only paid by those who actually enable the
record-tgid option.

2) We verify that the data in map_pid_to_cmdline is valid by
cross-referencing it against map_cmdline_to_pid before reporting it.
We don't currently have an equivalent for tgid_map, so we'd need to
add a second array or make tgid_map an array of struct { int pid; int
tgid; } to avoid reporting incorrect tgids. We therefore need to
double the memory we consume or further reduce the effectiveness of
this cache.

3) As mentioned before, with the default pid_max tgid_map/record-tgid
has a 100% hit rate which was never the case for saved_cmdlines. If
we go with a solution that changes this property then I certainly
think the docs need updating - the description of saved_tgids in
Documentation/trace/ftrace.rst makes no mention of this being
anything but a perfect recreation of pid->tgid relationships, and
unlike the description of saved_cmdlines it doesn't use the word
"cache" at all.

Having said that I think taking a similar approach to saved_cmdlines
would be better than what we have now, though I'm not sure whether it'll
be sufficient to actually be usable for me. My use case is grouping
threads into processes when displaying scheduling information, and
experience tells me that if any threads don't get grouped appropriately
the result will be questions.

Thanks,
Paul

2021-06-30 21:36:55

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: Resize tgid_map to PID_MAX_LIMIT, not PID_MAX_DEFAULT

On Wed, 30 Jun 2021 14:09:42 -0700
Paul Burton <[email protected]> wrote:

> Hi Steven,
>
> On Wed, Jun 30, 2021 at 08:35:13AM -0400, Steven Rostedt wrote:
> > On Tue, 29 Jun 2021 17:34:06 -0700
> > Paul Burton <[email protected]> wrote:
> >
> > > On 64 bit systems this will increase the size of tgid_map from 256KiB to
> > > 16MiB. Whilst this 64x increase in memory overhead sounds significant 64
> > > bit systems are presumably best placed to accommodate it, and since
> > > tgid_map is only allocated when the record-tgid option is actually used
> > > presumably the user would rather it spends sufficient memory to actually
> > > record the tgids they expect.
> >
> > NAK. Please see how I fixed this for the saved_cmdlines, and implement
> > it the same way.
> >
> > 785e3c0a3a87 ("tracing: Map all PIDs to command lines")
> >
> > It's a cache, it doesn't need to save everything.
>
> Well sure, but it's a cache that (modulo pid recycling) previously had a
> 100% hit rate for tasks observed in sched_switch events.

Obviously it wasn't 100% when it went over the PID_MAX_DEFAULT.

>
> It differs from saved_cmdlines in a few key ways that led me to treat it
> differently:
>
> 1) The cost of allocating map_pid_to_cmdline is paid by all users of
> ftrace, whilst as I mentioned in my commit description the cost of
> allocating tgid_map is only paid by those who actually enable the
> record-tgid option.

I'll admit that this is a valid point.

>
> 2) We verify that the data in map_pid_to_cmdline is valid by
> cross-referencing it against map_cmdline_to_pid before reporting it.
> We don't currently have an equivalent for tgid_map, so we'd need to
> add a second array or make tgid_map an array of struct { int pid; int
> tgid; } to avoid reporting incorrect tgids. We therefore need to
> double the memory we consume or further reduce the effectiveness of
> this cache.

Double 256K is just 512K which is still much less than 16M.

>
> 3) As mentioned before, with the default pid_max tgid_map/record-tgid
> has a 100% hit rate which was never the case for saved_cmdlines. If
> we go with a solution that changes this property then I certainly
> think the docs need updating - the description of saved_tgids in
> Documentation/trace/ftrace.rst makes no mention of this being
> anything but a perfect recreation of pid->tgid relationships, and
> unlike the description of saved_cmdlines it doesn't use the word
> "cache" at all.
>
> Having said that I think taking a similar approach to saved_cmdlines
> would be better than what we have now, though I'm not sure whether it'll
> be sufficient to actually be usable for me. My use case is grouping
> threads into processes when displaying scheduling information, and
> experience tells me that if any threads don't get grouped appropriately
> the result will be questions.

I found a few bugs recently in the saved_cmdlines that were causing a
much higher miss rate, but now it's been rather accurate. I wonder how
much pain that's been causing you?

Anyway, I'll wait to hear what Joel says on this. If he thinks this is
worth 16M of memory when enabled, I may take it.

-- Steve

2021-06-30 22:31:35

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On Tue, Jun 29, 2021 at 8:34 PM Paul Burton <[email protected]> wrote:
>
> The tgid_map array records a mapping from pid to tgid, where the index
> of an entry within the array is the pid & the value stored at that index
> is the tgid.
>
> The saved_tgids_next() function iterates over pointers into the tgid_map
> array & dereferences the pointers which results in the tgid, but then it
> passes that dereferenced value to trace_find_tgid() which treats it as a
> pid & does a further lookup within the tgid_map array. It seems likely
> that the intent here was to skip over entries in tgid_map for which the
> recorded tgid is zero, but instead we end up skipping over entries for
> which the thread group leader hasn't yet had its own tgid recorded in
> tgid_map.
>
> A minimal fix would be to remove the call to trace_find_tgid, turning:
>
> if (trace_find_tgid(*ptr))
>
> into:
>
> if (*ptr)
>
> ..but it seems like this logic can be much simpler if we simply let
> seq_read() iterate over the whole tgid_map array & filter out empty
> entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
> approach, removing the incorrect logic here entirely.

Looks reasonable except for one nit:

> Signed-off-by: Paul Burton <[email protected]>
> Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
> Cc: Steven Rostedt <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: <[email protected]>
> ---
> kernel/trace/trace.c | 38 +++++++++++++-------------------------
> 1 file changed, 13 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index d23a09d3eb37b..9570667310bcc 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5608,37 +5608,20 @@ static const struct file_operations tracing_readme_fops = {
>
> static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
> {
> - int *ptr = v;
> + int pid = ++(*pos);
>
> - if (*pos || m->count)
> - ptr++;
> -
> - (*pos)++;
> -
> - for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) {
> - if (trace_find_tgid(*ptr))
> - return ptr;

It would be great if you can add back the check for !tgid_map to both
next() and show() as well, for added robustness (since the old code
previously did it).

With that change:
Reviewed-by: Joel Fernandes (Google) <[email protected]>

thanks,

-Joel


> - }
> + if (pid > PID_MAX_DEFAULT)
> + return NULL;
>
> - return NULL;
> + return &tgid_map[pid];
> }
>
> static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
> {
> - void *v;
> - loff_t l = 0;
> -
> - if (!tgid_map)
> + if (!tgid_map || *pos > PID_MAX_DEFAULT)
> return NULL;
>
> - v = &tgid_map[0];
> - while (l <= *pos) {
> - v = saved_tgids_next(m, v, &l);
> - if (!v)
> - return NULL;
> - }
> -
> - return v;
> + return &tgid_map[*pos];
> }
>
> static void saved_tgids_stop(struct seq_file *m, void *v)
> @@ -5647,9 +5630,14 @@ static void saved_tgids_stop(struct seq_file *m, void *v)
>
> static int saved_tgids_show(struct seq_file *m, void *v)
> {
> - int pid = (int *)v - tgid_map;
> + int *entry = (int *)v;
> + int pid = entry - tgid_map;
> + int tgid = *entry;
> +
> + if (tgid == 0)
> + return SEQ_SKIP;
>
> - seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid));
> + seq_printf(m, "%d %d\n", pid, tgid);
> return 0;
> }
>
>
> base-commit: 62fb9874f5da54fdb243003b386128037319b219
> --
> 2.32.0.93.g670b81a890-goog
>

2021-06-30 22:39:51

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: Resize tgid_map to PID_MAX_LIMIT, not PID_MAX_DEFAULT

On Wed, Jun 30, 2021 at 5:34 PM Steven Rostedt <[email protected]> wrote:
[..]
> > Having said that I think taking a similar approach to saved_cmdlines
> > would be better than what we have now, though I'm not sure whether it'll
> > be sufficient to actually be usable for me. My use case is grouping
> > threads into processes when displaying scheduling information, and
> > experience tells me that if any threads don't get grouped appropriately
> > the result will be questions.
>
> I found a few bugs recently in the saved_cmdlines that were causing a
> much higher miss rate, but now it's been rather accurate. I wonder how
> much pain that's been causing you?
>
> Anyway, I'll wait to hear what Joel says on this. If he thinks this is
> worth 16M of memory when enabled, I may take it.

I am not a huge fan of the 16M, in Android we enable this feature on
all devices. Low end Android devices traced in the field are sometimes
1 GB and the added memory pressure may be unwelcome. Very least, maybe
make it optional only for folks who increase pid_max?

thanks,
-Joel

2021-06-30 23:13:08

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: Resize tgid_map to PID_MAX_LIMIT, not PID_MAX_DEFAULT

On Wed, 30 Jun 2021 18:34:11 -0400
Joel Fernandes <[email protected]> wrote:

> > Anyway, I'll wait to hear what Joel says on this. If he thinks this is
> > worth 16M of memory when enabled, I may take it.
>
> I am not a huge fan of the 16M, in Android we enable this feature on
> all devices. Low end Android devices traced in the field are sometimes
> 1 GB and the added memory pressure may be unwelcome. Very least, maybe
> make it optional only for folks who increase pid_max?

Yeah, can we just set it to the size of pid_max, at whatever it is set
to?

-- Steve

2021-07-01 14:44:16

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: Resize tgid_map to PID_MAX_LIMIT, not PID_MAX_DEFAULT

On Wed, 30 Jun 2021 19:11:45 -0400
Steven Rostedt <[email protected]> wrote:

> On Wed, 30 Jun 2021 18:34:11 -0400
> Joel Fernandes <[email protected]> wrote:
>
> > > Anyway, I'll wait to hear what Joel says on this. If he thinks this is
> > > worth 16M of memory when enabled, I may take it.
> >
> > I am not a huge fan of the 16M, in Android we enable this feature on
> > all devices. Low end Android devices traced in the field are sometimes
> > 1 GB and the added memory pressure may be unwelcome. Very least, maybe
> > make it optional only for folks who increase pid_max?
>
> Yeah, can we just set it to the size of pid_max, at whatever it is set
> to?

Can you send a V2 with this update and address Joel's comments to patch 1,
and get it to me today? I plan on sending Linus a pull request
tomorrow, which means I have to kick off my tests tonight for what I
want to send.

-- Steve

2021-07-01 17:25:52

by Paul Burton

[permalink] [raw]
Subject: [PATCH v2 1/2] tracing: Simplify & fix saved_tgids logic

The tgid_map array records a mapping from pid to tgid, where the index
of an entry within the array is the pid & the value stored at that index
is the tgid.

The saved_tgids_next() function iterates over pointers into the tgid_map
array & dereferences the pointers which results in the tgid, but then it
passes that dereferenced value to trace_find_tgid() which treats it as a
pid & does a further lookup within the tgid_map array. It seems likely
that the intent here was to skip over entries in tgid_map for which the
recorded tgid is zero, but instead we end up skipping over entries for
which the thread group leader hasn't yet had its own tgid recorded in
tgid_map.

A minimal fix would be to remove the call to trace_find_tgid, turning:

if (trace_find_tgid(*ptr))

into:

if (*ptr)

..but it seems like this logic can be much simpler if we simply let
seq_read() iterate over the whole tgid_map array & filter out empty
entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
approach, removing the incorrect logic here entirely.

Signed-off-by: Paul Burton <[email protected]>
Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: <[email protected]>
---
Changes in v2:
- Add comments describing why we know tgid_map is non-NULL in
saved_tgids_next() & saved_tgids_start().
---
kernel/trace/trace.c | 45 ++++++++++++++++++++------------------------
1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d23a09d3eb37..7a37c9e36b88 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5608,37 +5608,23 @@ static const struct file_operations tracing_readme_fops = {

static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
{
- int *ptr = v;
+ int pid = ++(*pos);

- if (*pos || m->count)
- ptr++;
-
- (*pos)++;
-
- for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) {
- if (trace_find_tgid(*ptr))
- return ptr;
- }
+ if (pid > PID_MAX_DEFAULT)
+ return NULL;

- return NULL;
+ // We already know that tgid_map is non-NULL here because the v
+ // argument is by definition a non-NULL pointer into tgid_map returned
+ // by saved_tgids_start() or an earlier call to saved_tgids_next().
+ return &tgid_map[pid];
}

static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
{
- void *v;
- loff_t l = 0;
-
- if (!tgid_map)
+ if (!tgid_map || *pos > PID_MAX_DEFAULT)
return NULL;

- v = &tgid_map[0];
- while (l <= *pos) {
- v = saved_tgids_next(m, v, &l);
- if (!v)
- return NULL;
- }
-
- return v;
+ return &tgid_map[*pos];
}

static void saved_tgids_stop(struct seq_file *m, void *v)
@@ -5647,9 +5633,18 @@ static void saved_tgids_stop(struct seq_file *m, void *v)

static int saved_tgids_show(struct seq_file *m, void *v)
{
- int pid = (int *)v - tgid_map;
+ int *entry = (int *)v;
+ int pid, tgid;
+
+ // We already know that tgid_map is non-NULL here because the v
+ // argument is by definition a non-NULL pointer into tgid_map returned
+ // by saved_tgids_start() or saved_tgids_next().
+ pid = entry - tgid_map;
+ tgid = *entry;
+ if (tgid == 0)
+ return SEQ_SKIP;

- seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid));
+ seq_printf(m, "%d %d\n", pid, tgid);
return 0;
}


base-commit: 62fb9874f5da54fdb243003b386128037319b219
--
2.32.0.93.g670b81a890-goog

2021-07-01 17:28:09

by Paul Burton

[permalink] [raw]
Subject: [PATCH v2 2/2] tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT

Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that
on systems where pid_max is configured higher than PID_MAX_DEFAULT the
ftrace record-tgid option doesn't work so well. Any tasks with PIDs
higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and
don't show up in the saved_tgids file.

In particular since systemd v243 & above configure pid_max to its
highest possible 1<<22 value by default on 64 bit systems this renders
the record-tgids option of little use.

Increase the size of tgid_map to the configured pid_max instead,
allowing it to cover the full range of PIDs up to the maximum value of
PID_MAX_LIMIT if the system is configured that way.

On 64 bit systems with pid_max == PID_MAX_LIMIT this will increase the
size of tgid_map from 256KiB to 16MiB. Whilst this 64x increase in
memory overhead sounds significant 64 bit systems are presumably best
placed to accommodate it, and since tgid_map is only allocated when the
record-tgid option is actually used presumably the user would rather it
spends sufficient memory to actually record the tgids they expect.

The size of tgid_map could also increase for CONFIG_BASE_SMALL=y
configurations, but these seem unlikely to be systems upon which people
are both configuring a large pid_max and running ftrace with record-tgid
anyway.

Of note is that we only allocate tgid_map once, the first time that the
record-tgid option is enabled. Therefore its size is only set once, to
the value of pid_max at the time the record-tgid option is first
enabled. If a user increases pid_max after that point, the saved_tgids
file will not contain entries for any tasks with pids beyond the earlier
value of pid_max.

Signed-off-by: Paul Burton <[email protected]>
Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: <[email protected]>
---
Changes in v2:
- Use the configured value of pid_max at the time record-tgid is enabled
rather than unconditionally using PID_MAX_LIMIT, to avoid added memory
overhead for systems that don't configure such a high pid_max.
---
kernel/trace/trace.c | 60 ++++++++++++++++++++++++++++++--------------
1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7a37c9e36b88..3c4b3b207c06 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2184,8 +2184,13 @@ void tracing_reset_all_online_cpus(void)
}
}

+// The tgid_map array maps from pid to tgid; i.e. the value stored at index i
+// is the tgid last observed corresponding to pid=i.
static int *tgid_map;

+// The maximum valid index into tgid_map.
+static size_t tgid_map_max;
+
#define SAVED_CMDLINES_DEFAULT 128
#define NO_CMDLINE_MAP UINT_MAX
static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED;
@@ -2458,24 +2463,39 @@ void trace_find_cmdline(int pid, char comm[])
preempt_enable();
}

+static int *trace_find_tgid_ptr(int pid)
+{
+ // Pairs with the smp_store_release in set_tracer_flag() to ensure that
+ // if we observe a non-NULL tgid_map then we also observe the correct
+ // tgid_map_max.
+ int *map = smp_load_acquire(&tgid_map);
+
+ if (unlikely(!map || pid > tgid_map_max))
+ return NULL;
+
+ return &map[pid];
+}
+
int trace_find_tgid(int pid)
{
- if (unlikely(!tgid_map || !pid || pid > PID_MAX_DEFAULT))
- return 0;
+ int *ptr = trace_find_tgid_ptr(pid);

- return tgid_map[pid];
+ return ptr ? *ptr : 0;
}

static int trace_save_tgid(struct task_struct *tsk)
{
+ int *ptr;
+
/* treat recording of idle task as a success */
if (!tsk->pid)
return 1;

- if (unlikely(!tgid_map || tsk->pid > PID_MAX_DEFAULT))
+ ptr = trace_find_tgid_ptr(tsk->pid);
+ if (!ptr)
return 0;

- tgid_map[tsk->pid] = tsk->tgid;
+ *ptr = tsk->tgid;
return 1;
}

@@ -5171,6 +5191,8 @@ int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set)

int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
{
+ int *map;
+
if ((mask == TRACE_ITER_RECORD_TGID) ||
(mask == TRACE_ITER_RECORD_CMD))
lockdep_assert_held(&event_mutex);
@@ -5193,10 +5215,17 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
trace_event_enable_cmd_record(enabled);

if (mask == TRACE_ITER_RECORD_TGID) {
- if (!tgid_map)
- tgid_map = kvcalloc(PID_MAX_DEFAULT + 1,
- sizeof(*tgid_map),
- GFP_KERNEL);
+ if (!tgid_map) {
+ tgid_map_max = pid_max;
+ map = kvcalloc(tgid_map_max + 1, sizeof(*tgid_map),
+ GFP_KERNEL);
+
+ // Pairs with smp_load_acquire() in
+ // trace_find_tgid_ptr() to ensure that if it observes
+ // the tgid_map we just allocated then it also observes
+ // the corresponding tgid_map_max value.
+ smp_store_release(&tgid_map, map);
+ }
if (!tgid_map) {
tr->trace_flags &= ~TRACE_ITER_RECORD_TGID;
return -ENOMEM;
@@ -5610,21 +5639,14 @@ static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
{
int pid = ++(*pos);

- if (pid > PID_MAX_DEFAULT)
- return NULL;
-
- // We already know that tgid_map is non-NULL here because the v
- // argument is by definition a non-NULL pointer into tgid_map returned
- // by saved_tgids_start() or an earlier call to saved_tgids_next().
- return &tgid_map[pid];
+ return trace_find_tgid_ptr(pid);
}

static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
{
- if (!tgid_map || *pos > PID_MAX_DEFAULT)
- return NULL;
+ int pid = *pos;

- return &tgid_map[*pos];
+ return trace_find_tgid_ptr(pid);
}

static void saved_tgids_stop(struct seq_file *m, void *v)
--
2.32.0.93.g670b81a890-goog

2021-07-01 17:36:14

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

Hi Joel,

On Wed, Jun 30, 2021 at 06:29:55PM -0400, Joel Fernandes wrote:
> On Tue, Jun 29, 2021 at 8:34 PM Paul Burton <[email protected]> wrote:
> >
> > The tgid_map array records a mapping from pid to tgid, where the index
> > of an entry within the array is the pid & the value stored at that index
> > is the tgid.
> >
> > The saved_tgids_next() function iterates over pointers into the tgid_map
> > array & dereferences the pointers which results in the tgid, but then it
> > passes that dereferenced value to trace_find_tgid() which treats it as a
> > pid & does a further lookup within the tgid_map array. It seems likely
> > that the intent here was to skip over entries in tgid_map for which the
> > recorded tgid is zero, but instead we end up skipping over entries for
> > which the thread group leader hasn't yet had its own tgid recorded in
> > tgid_map.
> >
> > A minimal fix would be to remove the call to trace_find_tgid, turning:
> >
> > if (trace_find_tgid(*ptr))
> >
> > into:
> >
> > if (*ptr)
> >
> > ..but it seems like this logic can be much simpler if we simply let
> > seq_read() iterate over the whole tgid_map array & filter out empty
> > entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
> > approach, removing the incorrect logic here entirely.
>
> Looks reasonable except for one nit:
>
> > Signed-off-by: Paul Burton <[email protected]>
> > Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > Cc: <[email protected]>
> > ---
> > kernel/trace/trace.c | 38 +++++++++++++-------------------------
> > 1 file changed, 13 insertions(+), 25 deletions(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index d23a09d3eb37b..9570667310bcc 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -5608,37 +5608,20 @@ static const struct file_operations tracing_readme_fops = {
> >
> > static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
> > {
> > - int *ptr = v;
> > + int pid = ++(*pos);
> >
> > - if (*pos || m->count)
> > - ptr++;
> > -
> > - (*pos)++;
> > -
> > - for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) {
> > - if (trace_find_tgid(*ptr))
> > - return ptr;
>
> It would be great if you can add back the check for !tgid_map to both
> next() and show() as well, for added robustness (since the old code
> previously did it).

That condition cannot happen, because both next() & show() are called to
iterate through the content of the seq_file & by definition their v
argument is non-NULL (else seq_file would have finished iterating
already). That argument came from either start() or an earlier call to
next(), which would only have returned a non-NULL pointer into tgid_map
if tgid_map is non-NULL.

I've added comments to this effect in v2, though the second patch in v2
does wind up effectively adding back the check in next() anyway in order
to reuse some code.

I was tempted to just add the redundant checks anyway (pick your battles
and all) but for show() in particular it wound up making things seem
non-sensical to me ("display the value describing this non-NULL pointer
into tgid_map only if tgid_map is not NULL?").

Thanks,
Paul

> With that change:
> Reviewed-by: Joel Fernandes (Google) <[email protected]>
>
> thanks,
>
> -Joel
>
>
> > - }
> > + if (pid > PID_MAX_DEFAULT)
> > + return NULL;
> >
> > - return NULL;
> > + return &tgid_map[pid];
> > }
> >
> > static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
> > {
> > - void *v;
> > - loff_t l = 0;
> > -
> > - if (!tgid_map)
> > + if (!tgid_map || *pos > PID_MAX_DEFAULT)
> > return NULL;
> >
> > - v = &tgid_map[0];
> > - while (l <= *pos) {
> > - v = saved_tgids_next(m, v, &l);
> > - if (!v)
> > - return NULL;
> > - }
> > -
> > - return v;
> > + return &tgid_map[*pos];
> > }
> >
> > static void saved_tgids_stop(struct seq_file *m, void *v)
> > @@ -5647,9 +5630,14 @@ static void saved_tgids_stop(struct seq_file *m, void *v)
> >
> > static int saved_tgids_show(struct seq_file *m, void *v)
> > {
> > - int pid = (int *)v - tgid_map;
> > + int *entry = (int *)v;
> > + int pid = entry - tgid_map;
> > + int tgid = *entry;
> > +
> > + if (tgid == 0)
> > + return SEQ_SKIP;
> >
> > - seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid));
> > + seq_printf(m, "%d %d\n", pid, tgid);
> > return 0;
> > }
> >
> >
> > base-commit: 62fb9874f5da54fdb243003b386128037319b219
> > --
> > 2.32.0.93.g670b81a890-goog
> >

2021-07-01 18:10:44

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] tracing: Simplify & fix saved_tgids logic

On Thu, Jul 1, 2021 at 1:24 PM Paul Burton <[email protected]> wrote:
>
> The tgid_map array records a mapping from pid to tgid, where the index
> of an entry within the array is the pid & the value stored at that index
> is the tgid.
>
> The saved_tgids_next() function iterates over pointers into the tgid_map
> array & dereferences the pointers which results in the tgid, but then it
> passes that dereferenced value to trace_find_tgid() which treats it as a
> pid & does a further lookup within the tgid_map array. It seems likely
> that the intent here was to skip over entries in tgid_map for which the
> recorded tgid is zero, but instead we end up skipping over entries for
> which the thread group leader hasn't yet had its own tgid recorded in
> tgid_map.
>
> A minimal fix would be to remove the call to trace_find_tgid, turning:
>
> if (trace_find_tgid(*ptr))
>
> into:
>
> if (*ptr)
>
> ..but it seems like this logic can be much simpler if we simply let
> seq_read() iterate over the whole tgid_map array & filter out empty
> entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
> approach, removing the incorrect logic here entirely.
>
> Signed-off-by: Paul Burton <[email protected]>
> Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
> Cc: Steven Rostedt <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Joel Fernandes <[email protected]>

Reviewed-by: Joel Fernandes (Google) <[email protected]>

2021-07-01 18:11:28

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On Thu, Jul 1, 2021 at 2:07 PM Steven Rostedt <[email protected]> wrote:
>
> On Thu, 1 Jul 2021 10:31:59 -0700
> Paul Burton <[email protected]> wrote:
>
> > I was tempted to just add the redundant checks anyway (pick your battles
> > and all) but for show() in particular it wound up making things seem
> > non-sensical to me ("display the value describing this non-NULL pointer
> > into tgid_map only if tgid_map is not NULL?").
>
> I agree with your assessment, and will actually take your first patch,
> as I don't think the comment is that helpful, not to mention, we don't
> use '//' comments in the kernel, so that would have to be changed.
>
> But for cases like this, I usually have something like:
>
>
> if (WARN_ON_ONCE(!tgid_map))
> return -1;
>
> Because the logic is what makes tgid_map not being NULL, but as
> experience has taught me, the logic can sometimes be mistaken, at least
> as time goes by. And things that are protected by logic, deserve a
> WARN*() when it doesn't go as planned.
>
> We can always add that later, if needed.

Agreed, was thinking similar/same.

thanks,

-Joel

2021-07-01 18:12:41

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On Thu, 1 Jul 2021 10:31:59 -0700
Paul Burton <[email protected]> wrote:

> I was tempted to just add the redundant checks anyway (pick your battles
> and all) but for show() in particular it wound up making things seem
> non-sensical to me ("display the value describing this non-NULL pointer
> into tgid_map only if tgid_map is not NULL?").

I agree with your assessment, and will actually take your first patch,
as I don't think the comment is that helpful, not to mention, we don't
use '//' comments in the kernel, so that would have to be changed.

But for cases like this, I usually have something like:


if (WARN_ON_ONCE(!tgid_map))
return -1;

Because the logic is what makes tgid_map not being NULL, but as
experience has taught me, the logic can sometimes be mistaken, at least
as time goes by. And things that are protected by logic, deserve a
WARN*() when it doesn't go as planned.

We can always add that later, if needed.

-- Steve

2021-07-01 18:14:15

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On Thu, Jul 01, 2021 at 02:07:54PM -0400, Steven Rostedt wrote:
> On Thu, 1 Jul 2021 10:31:59 -0700
> Paul Burton <[email protected]> wrote:
>
> > I was tempted to just add the redundant checks anyway (pick your battles
> > and all) but for show() in particular it wound up making things seem
> > non-sensical to me ("display the value describing this non-NULL pointer
> > into tgid_map only if tgid_map is not NULL?").
>
> I agree with your assessment, and will actually take your first patch,
> as I don't think the comment is that helpful,

Thanks - agreed, the comment doesn't add much.

> not to mention, we don't
> use '//' comments in the kernel, so that would have to be changed.

D'oh! Apparently a year away from the kernel melted my internal style
checker. Interestingly though, checkpatch didn't complain about this as
I would have expected...

Thanks,
Paul

2021-07-01 18:14:15

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT

On Thu, 1 Jul 2021 10:24:07 -0700
Paul Burton <[email protected]> wrote:

> +static int *trace_find_tgid_ptr(int pid)
> +{
> + // Pairs with the smp_store_release in set_tracer_flag() to ensure that
> + // if we observe a non-NULL tgid_map then we also observe the correct
> + // tgid_map_max.

BTW, it's against the Linux kernel coding style to use // for comments.

I can take this patch, but I need to change this to:

/*
* Pairs with the smp_store_release in set_tracer_flag() to ensure that
* if we observe a non-NULL tgid_map then we also observe the correct
* tgid_map_max.
*/

Same with the other comments. Please follow coding style that can be
found in:

Documentation/process/coding-style.rst

And see section 8 on Commenting.

Thanks,

-- Steve


> + int *map = smp_load_acquire(&tgid_map);
> +
> + if (unlikely(!map || pid > tgid_map_max))
> + return NULL;
> +
> + return &map[pid];
> +}
> +

2021-07-01 18:16:56

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT

Hi Steven,

On Thu, Jul 01, 2021 at 02:12:21PM -0400, Steven Rostedt wrote:
> On Thu, 1 Jul 2021 10:24:07 -0700
> Paul Burton <[email protected]> wrote:
>
> > +static int *trace_find_tgid_ptr(int pid)
> > +{
> > + // Pairs with the smp_store_release in set_tracer_flag() to ensure that
> > + // if we observe a non-NULL tgid_map then we also observe the correct
> > + // tgid_map_max.
>
> BTW, it's against the Linux kernel coding style to use // for comments.
>
> I can take this patch, but I need to change this to:
>
> /*
> * Pairs with the smp_store_release in set_tracer_flag() to ensure that
> * if we observe a non-NULL tgid_map then we also observe the correct
> * tgid_map_max.
> */
>
> Same with the other comments. Please follow coding style that can be
> found in:
>
> Documentation/process/coding-style.rst
>
> And see section 8 on Commenting.

Yeah, sorry about that - I should know better having been a maintainer
in a former life...

Just to confirm - are you happy to fix those up when applying or should
I send a v3?

Thanks,
Paul

2021-07-01 18:32:55

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

[ Added Joe Perches ]

On Thu, 1 Jul 2021 11:12:54 -0700
Paul Burton <[email protected]> wrote:

> > not to mention, we don't
> > use '//' comments in the kernel, so that would have to be changed.
>
> D'oh! Apparently a year away from the kernel melted my internal style
> checker. Interestingly though, checkpatch didn't complain about this as
> I would have expected...

Joe, should the above be added to checkpatch?

I do understand that there are a few cases it's acceptable. Like for
SPDX headers.

-- Steve

2021-07-01 18:34:34

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT

On Thu, 1 Jul 2021 11:15:37 -0700
Paul Burton <[email protected]> wrote:

> Yeah, sorry about that - I should know better having been a maintainer
> in a former life...

No problem.

>
> Just to confirm - are you happy to fix those up when applying or should
> I send a v3?

I made the conversion and I'm going to start my testing now.

Joel, I never saw a reviewed-by from you for this patch.

Thanks!

-- Steve

2021-07-01 18:55:47

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On Thu, Jul 1, 2021 at 1:32 PM Paul Burton <[email protected]> wrote:
>
> Hi Joel,
>
> On Wed, Jun 30, 2021 at 06:29:55PM -0400, Joel Fernandes wrote:
> > On Tue, Jun 29, 2021 at 8:34 PM Paul Burton <[email protected]> wrote:
> > >
> > > The tgid_map array records a mapping from pid to tgid, where the index
> > > of an entry within the array is the pid & the value stored at that index
> > > is the tgid.
> > >
> > > The saved_tgids_next() function iterates over pointers into the tgid_map
> > > array & dereferences the pointers which results in the tgid, but then it
> > > passes that dereferenced value to trace_find_tgid() which treats it as a
> > > pid & does a further lookup within the tgid_map array. It seems likely
> > > that the intent here was to skip over entries in tgid_map for which the
> > > recorded tgid is zero, but instead we end up skipping over entries for
> > > which the thread group leader hasn't yet had its own tgid recorded in
> > > tgid_map.
> > >
> > > A minimal fix would be to remove the call to trace_find_tgid, turning:
> > >
> > > if (trace_find_tgid(*ptr))
> > >
> > > into:
> > >
> > > if (*ptr)
> > >
> > > ..but it seems like this logic can be much simpler if we simply let
> > > seq_read() iterate over the whole tgid_map array & filter out empty
> > > entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
> > > approach, removing the incorrect logic here entirely.
> >
> > Looks reasonable except for one nit:
> >
> > > Signed-off-by: Paul Burton <[email protected]>
> > > Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
> > > Cc: Steven Rostedt <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: Joel Fernandes <[email protected]>
> > > Cc: <[email protected]>
> > > ---
> > > kernel/trace/trace.c | 38 +++++++++++++-------------------------
> > > 1 file changed, 13 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > > index d23a09d3eb37b..9570667310bcc 100644
> > > --- a/kernel/trace/trace.c
> > > +++ b/kernel/trace/trace.c
> > > @@ -5608,37 +5608,20 @@ static const struct file_operations tracing_readme_fops = {
> > >
> > > static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
> > > {
> > > - int *ptr = v;
> > > + int pid = ++(*pos);
> > >
> > > - if (*pos || m->count)
> > > - ptr++;
> > > -
> > > - (*pos)++;
> > > -
> > > - for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) {
> > > - if (trace_find_tgid(*ptr))
> > > - return ptr;
> >
> > It would be great if you can add back the check for !tgid_map to both
> > next() and show() as well, for added robustness (since the old code
> > previously did it).
>
> That condition cannot happen, because both next() & show() are called to
> iterate through the content of the seq_file & by definition their v
> argument is non-NULL (else seq_file would have finished iterating
> already). That argument came from either start() or an earlier call to
> next(), which would only have returned a non-NULL pointer into tgid_map
> if tgid_map is non-NULL.

Hmm, You do have a point. Alright then. You could add my Reviewed-by
tag for this patch to subsequent postings.

thanks,
-Joel

2021-07-01 19:42:38

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On 2021-07-01 11:26, Steven Rostedt wrote:
> [ Added Joe Perches ]
>
> On Thu, 1 Jul 2021 11:12:54 -0700
> Paul Burton <[email protected]> wrote:
>
>> > not to mention, we don't
>> > use '//' comments in the kernel, so that would have to be changed.
>>
>> D'oh! Apparently a year away from the kernel melted my internal style
>> checker. Interestingly though, checkpatch didn't complain about this
>> as
>> I would have expected...
>
> Joe, should the above be added to checkpatch?
>
> I do understand that there are a few cases it's acceptable. Like for
> SPDX headers.

C99 comments are allowed since about 5 years ago.

And if you really want there's a checkpatch option to disable them.

https://lore.kernel.org/patchwork/patch/1031132/

2021-07-01 20:01:13

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On Thu, 01 Jul 2021 12:35:29 -0700
Joe Perches <[email protected]> wrote:

> C99 comments are allowed since about 5 years ago.

Really, I thought Linus hated them. Personally, I find them rather ugly
myself. The only user of them I see in the kernel/ directory appears to
be for RCU. But Paul's on the C/C++ committee, so perhaps he favors them.

The net/ directory doesn't have any, except perhaps to comment out code
(which I sometimes use it for that too).

The block/, arch/x86/ directories don't have them either.

I wouldn't go and change checkpatch, but I still rather avoid them,
especially for multi line comments.

/*
* When it comes to multi line comments I prefer using something
* that denotes a start and an end to the comment, as it makes it
* look like a nice clip of information.
*/

Instead of:

// When it comes to multi line comments I prefer using something
// that denotes a start and an end to the comment, as it makes it
// look like a nice clip of information.

Which just looks like noise. But hey, maybe that's just me because I
find "*" as a sign of information and '//' something to ignore. ;-)

-- Steve

2021-07-01 21:08:28

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On 2021-07-01 12:51, Steven Rostedt wrote:
> On Thu, 01 Jul 2021 12:35:29 -0700
> Joe Perches <[email protected]> wrote:
>
>> C99 comments are allowed since about 5 years ago.
>
> Really, I thought Linus hated them. Personally, I find them rather ugly
> myself. The only user of them I see in the kernel/ directory appears to
> be for RCU. But Paul's on the C/C++ committee, so perhaps he favors
> them.
>
> The net/ directory doesn't have any, except perhaps to comment out code
> (which I sometimes use it for that too).
>
> The block/, arch/x86/ directories don't have them either.
>
> I wouldn't go and change checkpatch, but I still rather avoid them,
> especially for multi line comments.
>
> /*
> * When it comes to multi line comments I prefer using something
> * that denotes a start and an end to the comment, as it makes it
> * look like a nice clip of information.
> */
>
> Instead of:
>
> // When it comes to multi line comments I prefer using something
> // that denotes a start and an end to the comment, as it makes it
> // look like a nice clip of information.
>
> Which just looks like noise. But hey, maybe that's just me because I
> find "*" as a sign of information and '//' something to ignore. ;-)

May I suggest using something other than an amber vt220?



2021-07-01 23:51:18

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: Simplify & fix saved_tgids logic

On Thu, Jul 1, 2021 at 5:07 PM Joe Perches <[email protected]> wrote:
>
> On 2021-07-01 12:51, Steven Rostedt wrote:
> > On Thu, 01 Jul 2021 12:35:29 -0700
> > Joe Perches <[email protected]> wrote:
> >
> >> C99 comments are allowed since about 5 years ago.
> >
> > Really, I thought Linus hated them. Personally, I find them rather ugly
> > myself. The only user of them I see in the kernel/ directory appears to
> > be for RCU. But Paul's on the C/C++ committee, so perhaps he favors
> > them.
> >
> > The net/ directory doesn't have any, except perhaps to comment out code
> > (which I sometimes use it for that too).
> >
> > The block/, arch/x86/ directories don't have them either.
> >
> > I wouldn't go and change checkpatch, but I still rather avoid them,
> > especially for multi line comments.
> >
> > /*
> > * When it comes to multi line comments I prefer using something
> > * that denotes a start and an end to the comment, as it makes it
> > * look like a nice clip of information.
> > */
> >
> > Instead of:
> >
> > // When it comes to multi line comments I prefer using something
> > // that denotes a start and an end to the comment, as it makes it
> > // look like a nice clip of information.
> >
> > Which just looks like noise. But hey, maybe that's just me because I
> > find "*" as a sign of information and '//' something to ignore. ;-)
>
> May I suggest using something other than an amber vt220?

Steve - mostly comments are to be ignored and the code is the ultimate
source of truth ;-), so // is fine :-D

That said, don't discard the amber vt220 I recently sent you just
because Joe says so ;-) <:o)

- Joel