2021-11-17 19:40:06

by Gerald Schaefer

[permalink] [raw]
Subject: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

Reading from alloc/free_traces attribute in /sys/kernel/debug/slab/ results
in an endless sequence of "No data". This is because slab_debugfs_start()
does not check for a "past end of file" condition and return NULL.

Fix it by adding such a check and return NULL.

Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs")
Cc: <[email protected]> # v5.14+
Reported-by: Steffen Maier <[email protected]>
Signed-off-by: Gerald Schaefer <[email protected]>
---
mm/slub.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index f7368bfffb7a..336609671bc2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6115,6 +6115,11 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)

static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
{
+ struct loc_track *t = seq->private;
+
+ if (*ppos > t->count)
+ return NULL;
+
return ppos;
}

--
2.25.1



2021-11-19 10:41:42

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On 11/17/21 20:39, Gerald Schaefer wrote:
> Reading from alloc/free_traces attribute in /sys/kernel/debug/slab/ results
> in an endless sequence of "No data". This is because slab_debugfs_start()
> does not check for a "past end of file" condition and return NULL.

I still have no idea how that endless sequence happens.
To get it, we would have to call slab_debugfs_show() repeatedly with such v
that *v == 0. Which should only happen with slab_debugfs_start() with *ppos
== 0. Which your patch won't change because you add a '*ppos > t->count'
condition, so *ppos has to be at least 1 to trigger this.

But yeah, AFAIK we should detect this in slab_debugfs_start() anyway.
But I think the condition should be something like below, because we are
past end of file already with *ppos == t->count. But if both are 0, we want
to proceed for the "No data" output.

// to show the No data
if (!*ppos && !t->count)
return ppos;

if (*ppos >= t->count)
return ppos;

return ppos;

> Fix it by adding such a check and return NULL.
>
> Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs")
> Cc: <[email protected]> # v5.14+
> Reported-by: Steffen Maier <[email protected]>
> Signed-off-by: Gerald Schaefer <[email protected]>
> ---
> mm/slub.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index f7368bfffb7a..336609671bc2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -6115,6 +6115,11 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
>
> static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
> {
> + struct loc_track *t = seq->private;
> +
> + if (*ppos > t->count)
> + return NULL;
> +
> return ppos;
> }
>
>


2021-11-19 20:00:37

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On Fri, 19 Nov 2021 11:41:38 +0100
Vlastimil Babka <[email protected]> wrote:

> On 11/17/21 20:39, Gerald Schaefer wrote:
> > Reading from alloc/free_traces attribute in /sys/kernel/debug/slab/ results
> > in an endless sequence of "No data". This is because slab_debugfs_start()
> > does not check for a "past end of file" condition and return NULL.
>
> I still have no idea how that endless sequence happens.
> To get it, we would have to call slab_debugfs_show() repeatedly with such v
> that *v == 0. Which should only happen with slab_debugfs_start() with *ppos
> == 0. Which your patch won't change because you add a '*ppos > t->count'
> condition, so *ppos has to be at least 1 to trigger this.

Yes, very strange. After a closer look to fs/seq_file.c, especially
seq_read_iter(), it seems that op->next will only be called when m->count == 0,
at least in the first while(1) loop. Printing "No data\n" sets m->count
to 8, so it will continue after Fill:, then call op->next, which returns NULL
and breaks the second while(1) loop, and also calls op->stop. Then it returns
from seq_read_iter(), only to be called again, and again, ...

Only when op->start returns NULL it will end it for good, probably
because seq_read_iter() will then return 0 instead of 8. Not sure if
there is a better way to fix this than by adding a second "return NULL"
to op->start, which feels a bit awkward and makes you wonder why the
"return NULL" from op->next is not enough.

>
> But yeah, AFAIK we should detect this in slab_debugfs_start() anyway.
> But I think the condition should be something like below, because we are
> past end of file already with *ppos == t->count. But if both are 0, we want
> to proceed for the "No data" output.

Ah ok, I wasn't sure about the "t->count > 0" case, i.e. if the check for
"*ppos > t->count" would still be correct there. So apparently it wouldn't,
and we need two checks, like you suggested

>
> // to show the No data
> if (!*ppos && !t->count)
> return ppos;
>
> if (*ppos >= t->count)
> return ppos;

That should be return NULL here, right?

>
> return ppos;
>

Will send a new patch, unless I find a better way after investigating the
endless seq_read_iter() calls mentioned above.
Is there an easy way to test the "t->count > 0" case, i.e. what would need
to be done to get some other reply than "No data"?

2021-11-22 15:02:33

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On 11/19/21 20:59, Gerald Schaefer wrote:
> On Fri, 19 Nov 2021 11:41:38 +0100
> Vlastimil Babka <[email protected]> wrote:
>
>> On 11/17/21 20:39, Gerald Schaefer wrote:
>> > Reading from alloc/free_traces attribute in /sys/kernel/debug/slab/ results
>> > in an endless sequence of "No data". This is because slab_debugfs_start()
>> > does not check for a "past end of file" condition and return NULL.
>>
>> I still have no idea how that endless sequence happens.
>> To get it, we would have to call slab_debugfs_show() repeatedly with such v
>> that *v == 0. Which should only happen with slab_debugfs_start() with *ppos
>> == 0. Which your patch won't change because you add a '*ppos > t->count'
>> condition, so *ppos has to be at least 1 to trigger this.
>
> Yes, very strange. After a closer look to fs/seq_file.c, especially
> seq_read_iter(), it seems that op->next will only be called when m->count == 0,
> at least in the first while(1) loop. Printing "No data\n" sets m->count
> to 8, so it will continue after Fill:, then call op->next, which returns NULL
> and breaks the second while(1) loop, and also calls op->stop. Then it returns
> from seq_read_iter(), only to be called again, and again, ...
>
> Only when op->start returns NULL it will end it for good, probably
> because seq_read_iter() will then return 0 instead of 8.

Ah, thanks for investigating.

> Not sure if
> there is a better way to fix this than by adding a second "return NULL"
> to op->start, which feels a bit awkward and makes you wonder why the
> "return NULL" from op->next is not enough.

I think it's fine to require op->start to return NULL, even if it didn't
cause this infinite loop.

>>
>> But yeah, AFAIK we should detect this in slab_debugfs_start() anyway.
>> But I think the condition should be something like below, because we are
>> past end of file already with *ppos == t->count. But if both are 0, we want
>> to proceed for the "No data" output.
>
> Ah ok, I wasn't sure about the "t->count > 0" case, i.e. if the check for
> "*ppos > t->count" would still be correct there. So apparently it wouldn't,
> and we need two checks, like you suggested
>
>>
>> // to show the No data
>> if (!*ppos && !t->count)
>> return ppos;
>>
>> if (*ppos >= t->count)
>> return ppos;
>
> That should be return NULL here, right?

Doh, right.

>>
>> return ppos;
>>
>
> Will send a new patch, unless I find a better way after investigating the
> endless seq_read_iter() calls mentioned above.
> Is there an easy way to test the "t->count > 0" case, i.e. what would need
> to be done to get some other reply than "No data"?

Hm the debugfs files alloc_tracess/free_traces for any cache with non-zero
objects (see /proc/slabinfo for that) should have t->count > 0. If the files
are created for a cache, it means the related SLAB_STORE_USER debugging was
enabled both during config and boot-time. If you see only a few caches with
alloc_tracess/free_traces (because they are from e.g. some test module that
adds SLAB_STORE_USER explicitly) and all happen to have 0 objects, boot with
slub_debug=U parameter and then all caches will have this enabled and many
will have >0 objects.

2021-11-22 20:14:25

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On Mon, 22 Nov 2021 16:02:28 +0100
Vlastimil Babka <[email protected]> wrote:

> On 11/19/21 20:59, Gerald Schaefer wrote:
> > On Fri, 19 Nov 2021 11:41:38 +0100
> > Vlastimil Babka <[email protected]> wrote:
> >
> >> On 11/17/21 20:39, Gerald Schaefer wrote:
> >> > Reading from alloc/free_traces attribute in /sys/kernel/debug/slab/ results
> >> > in an endless sequence of "No data". This is because slab_debugfs_start()
> >> > does not check for a "past end of file" condition and return NULL.
> >>
> >> I still have no idea how that endless sequence happens.
> >> To get it, we would have to call slab_debugfs_show() repeatedly with such v
> >> that *v == 0. Which should only happen with slab_debugfs_start() with *ppos
> >> == 0. Which your patch won't change because you add a '*ppos > t->count'
> >> condition, so *ppos has to be at least 1 to trigger this.
> >
> > Yes, very strange. After a closer look to fs/seq_file.c, especially
> > seq_read_iter(), it seems that op->next will only be called when m->count == 0,
> > at least in the first while(1) loop. Printing "No data\n" sets m->count
> > to 8, so it will continue after Fill:, then call op->next, which returns NULL
> > and breaks the second while(1) loop, and also calls op->stop. Then it returns
> > from seq_read_iter(), only to be called again, and again, ...
> >
> > Only when op->start returns NULL it will end it for good, probably
> > because seq_read_iter() will then return 0 instead of 8.
>
> Ah, thanks for investigating.
>
> > Not sure if
> > there is a better way to fix this than by adding a second "return NULL"
> > to op->start, which feels a bit awkward and makes you wonder why the
> > "return NULL" from op->next is not enough.
>
> I think it's fine to require op->start to return NULL, even if it didn't
> cause this infinite loop.

Yes, this all seems to work as designed, and it is slowly beginning to
make some sense. The only other way to stop it would be to return from
op->show, w/o printing anything, like it would be done for t->count > 0,
when idx becomes >= t->count. In theory at least, see below.

>
> >>
> >> But yeah, AFAIK we should detect this in slab_debugfs_start() anyway.
> >> But I think the condition should be something like below, because we are
> >> past end of file already with *ppos == t->count. But if both are 0, we want
> >> to proceed for the "No data" output.
> >
> > Ah ok, I wasn't sure about the "t->count > 0" case, i.e. if the check for
> > "*ppos > t->count" would still be correct there. So apparently it wouldn't,
> > and we need two checks, like you suggested
> >
> >>
> >> // to show the No data
> >> if (!*ppos && !t->count)
> >> return ppos;
> >>
> >> if (*ppos >= t->count)
> >> return ppos;
> >
> > That should be return NULL here, right?
>
> Doh, right.
>
> >>
> >> return ppos;
> >>
> >
> > Will send a new patch, unless I find a better way after investigating the
> > endless seq_read_iter() calls mentioned above.
> > Is there an easy way to test the "t->count > 0" case, i.e. what would need
> > to be done to get some other reply than "No data"?
>
> Hm the debugfs files alloc_tracess/free_traces for any cache with non-zero
> objects (see /proc/slabinfo for that) should have t->count > 0. If the files
> are created for a cache, it means the related SLAB_STORE_USER debugging was
> enabled both during config and boot-time. If you see only a few caches with
> alloc_tracess/free_traces (because they are from e.g. some test module that
> adds SLAB_STORE_USER explicitly) and all happen to have 0 objects, boot with
> slub_debug=U parameter and then all caches will have this enabled and many
> will have >0 objects.

Thanks. While testing this properly, yet another bug showed up. The idx
in op->show remains 0 in all iterations, so I always see the same line
printed t->count times (or infinitely, ATM). Not sure if this only shows
on s390 due to endianness, but the reason is this:

unsigned int idx = *(unsigned int *)v;

IIUC, void *v is always the same as loff_t *ppos, and therefore idx also
should be *ppos. De-referencing the loff_t * with an unsigned int * only
gives the upper 32 bit half of the 64 bit value, which remains 0.

This would be fixed e.g. with

unsigned int idx = (unsigned int) *(loff_t *) v;

With this fixed, my original patch actually also works for t->count > 0,
because then op->show would return w/o printing anything when idx reaches
t->count. For t->count > 0, it would even work w/o any extra checks in
op->start because of that, only "No data" would be printed infinitely.

It probably still makes sense to make this explicit in op->start, by
checking separately for !*ppos && !t->count, and returning NULL for
*ppos >= t->count, as you suggested.

I think I will also make idx an unsigned long again, like it was before
commit 64dd68497be7, and similar to t->count. Not sure if it needs to
be, and with proper casting unsigned int is also possible, but why
change it?

2021-11-22 20:34:13

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On Mon, 22 Nov 2021 21:14:00 +0100
Gerald Schaefer <[email protected]> wrote:

[...]
>
> Thanks. While testing this properly, yet another bug showed up. The idx
> in op->show remains 0 in all iterations, so I always see the same line
> printed t->count times (or infinitely, ATM). Not sure if this only shows
> on s390 due to endianness, but the reason is this:
>
> unsigned int idx = *(unsigned int *)v;
>
> IIUC, void *v is always the same as loff_t *ppos, and therefore idx also
> should be *ppos. De-referencing the loff_t * with an unsigned int * only
> gives the upper 32 bit half of the 64 bit value, which remains 0.
>
> This would be fixed e.g. with
>
> unsigned int idx = (unsigned int) *(loff_t *) v;
>
> With this fixed, my original patch actually also works for t->count > 0,
> because then op->show would return w/o printing anything when idx reaches
> t->count. For t->count > 0, it would even work w/o any extra checks in
> op->start because of that, only "No data" would be printed infinitely.

Oh, no, that would actually also fix the "No data" part, as op->show
will then also return w/o printing in the next iteration, so that op->next
would correctly end it all.

This could also explain why it might all have worked fine on x86 (haven't
verified), and really only showed on big-endian s390.

Hmm, now I'm not so sure anymore if we really want the additional
checks and return NULL in op->start, just to make it "double safe".

>
> It probably still makes sense to make this explicit in op->start, by
> checking separately for !*ppos && !t->count, and returning NULL for
> *ppos >= t->count, as you suggested.
>
> I think I will also make idx an unsigned long again, like it was before
> commit 64dd68497be7, and similar to t->count. Not sure if it needs to
> be, and with proper casting unsigned int is also possible, but why
> change it?


2021-11-23 14:19:53

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On 11/22/21 21:33, Gerald Schaefer wrote:
> On Mon, 22 Nov 2021 21:14:00 +0100
> Gerald Schaefer <[email protected]> wrote:
>
> [...]
>>
>> Thanks. While testing this properly, yet another bug showed up. The idx
>> in op->show remains 0 in all iterations, so I always see the same line
>> printed t->count times (or infinitely, ATM). Not sure if this only shows
>> on s390 due to endianness, but the reason is this:
>>
>> unsigned int idx = *(unsigned int *)v;

Uh, good catch. I was actually looking suspiciously at how we cast signed to
unsigned, but didn't occur to me that shortening together with endiannes is
the problem.

>>
>> IIUC, void *v is always the same as loff_t *ppos, and therefore idx also
>> should be *ppos. De-referencing the loff_t * with an unsigned int * only
>> gives the upper 32 bit half of the 64 bit value, which remains 0.
>>
>> This would be fixed e.g. with
>>
>> unsigned int idx = (unsigned int) *(loff_t *) v;

With all this experience I'm now inclined to rather follow more the example
in Documentation/filesystems/seq_file.rst and don't pass around the pointer
that we got as ppos in slab_debugfs_start(), and that seq_file.c points to
m->index.

In that example an own value is kmalloced:

loff_t *spos = kmalloc(sizeof(loff_t), GFP_KERNEL);

while we could just make this a field of loc_track?


>> With this fixed, my original patch actually also works for t->count > 0,
>> because then op->show would return w/o printing anything when idx reaches
>> t->count. For t->count > 0, it would even work w/o any extra checks in
>> op->start because of that, only "No data" would be printed infinitely.
>
> Oh, no, that would actually also fix the "No data" part, as op->show
> will then also return w/o printing in the next iteration, so that op->next
> would correctly end it all.
>
> This could also explain why it might all have worked fine on x86 (haven't
> verified), and really only showed on big-endian s390.
>
> Hmm, now I'm not so sure anymore if we really want the additional
> checks and return NULL in op->start, just to make it "double safe".

I guess we don't.

>>
>> It probably still makes sense to make this explicit in op->start, by
>> checking separately for !*ppos && !t->count, and returning NULL for
>> *ppos >= t->count, as you suggested.
>>
>> I think I will also make idx an unsigned long again, like it was before
>> commit 64dd68497be7, and similar to t->count. Not sure if it needs to
>> be, and with proper casting unsigned int is also possible, but why
>> change it?
>


2021-11-25 16:15:26

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On Tue, 23 Nov 2021 15:19:49 +0100
Vlastimil Babka <[email protected]> wrote:

> On 11/22/21 21:33, Gerald Schaefer wrote:
> > On Mon, 22 Nov 2021 21:14:00 +0100
> > Gerald Schaefer <[email protected]> wrote:
> >
> > [...]
> >>
> >> Thanks. While testing this properly, yet another bug showed up. The idx
> >> in op->show remains 0 in all iterations, so I always see the same line
> >> printed t->count times (or infinitely, ATM). Not sure if this only shows
> >> on s390 due to endianness, but the reason is this:
> >>
> >> unsigned int idx = *(unsigned int *)v;
>
> Uh, good catch. I was actually looking suspiciously at how we cast signed to
> unsigned, but didn't occur to me that shortening together with endiannes is
> the problem.
>
> >>
> >> IIUC, void *v is always the same as loff_t *ppos, and therefore idx also
> >> should be *ppos. De-referencing the loff_t * with an unsigned int * only
> >> gives the upper 32 bit half of the 64 bit value, which remains 0.
> >>
> >> This would be fixed e.g. with
> >>
> >> unsigned int idx = (unsigned int) *(loff_t *) v;
>
> With all this experience I'm now inclined to rather follow more the example
> in Documentation/filesystems/seq_file.rst and don't pass around the pointer
> that we got as ppos in slab_debugfs_start(), and that seq_file.c points to
> m->index.
>
> In that example an own value is kmalloced:
>
> loff_t *spos = kmalloc(sizeof(loff_t), GFP_KERNEL);
>
> while we could just make this a field of loc_track?

Yes, following the example sounds good, and it would also make proper use
of *v in op->next, which might make the code more readable. It also looks
like it already does exactly what is needed here, i.e. have a simple
iterator that just counts the lines.

I don't think the iterator needs to be saved in loc_track. IIUC, it is
already passed around like in the example, and can then be simply compared
to t->count, similar to the existing code.

This is what I'm currently testing, and it seems to work fine. Will send
a new patch, if there are no objections:

---
mm/slub.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index a8626825a829..effb7b8d8f88 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6052,10 +6052,9 @@ __initcall(slab_sysfs_init);
#if defined(CONFIG_SLUB_DEBUG) && defined(CONFIG_DEBUG_FS)
static int slab_debugfs_show(struct seq_file *seq, void *v)
{
-
- struct location *l;
- unsigned int idx = *(unsigned int *)v;
struct loc_track *t = seq->private;
+ loff_t idx = *(loff_t *) v;
+ struct location *l;

if (idx < t->count) {
l = &t->loc[idx];
@@ -6099,23 +6098,29 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)

static void slab_debugfs_stop(struct seq_file *seq, void *v)
{
+ kfree(v);
}

static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
{
struct loc_track *t = seq->private;
+ loff_t *idxp = (loff_t *) v;

- v = ppos;
- ++*ppos;
- if (*ppos <= t->count)
- return v;
+ *ppos = ++(*idxp);
+ if (*idxp <= t->count)
+ return idxp;

return NULL;
}

static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
{
- return ppos;
+ loff_t *idxp = kmalloc(sizeof(loff_t), GFP_KERNEL);
+
+ if (!idxp)
+ return NULL;
+ *idxp = *ppos;
+ return idxp;
}

static const struct seq_operations slab_debugfs_sops = {
--
2.32.0


2021-11-25 20:15:07

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On Thu, 25 Nov 2021 17:13:10 +0100
Gerald Schaefer <[email protected]> wrote:

> On Tue, 23 Nov 2021 15:19:49 +0100
> Vlastimil Babka <[email protected]> wrote:
>
> > On 11/22/21 21:33, Gerald Schaefer wrote:
> > > On Mon, 22 Nov 2021 21:14:00 +0100
> > > Gerald Schaefer <[email protected]> wrote:
> > >
> > > [...]
> > >>
> > >> Thanks. While testing this properly, yet another bug showed up. The idx
> > >> in op->show remains 0 in all iterations, so I always see the same line
> > >> printed t->count times (or infinitely, ATM). Not sure if this only shows
> > >> on s390 due to endianness, but the reason is this:
> > >>
> > >> unsigned int idx = *(unsigned int *)v;
> >
> > Uh, good catch. I was actually looking suspiciously at how we cast signed to
> > unsigned, but didn't occur to me that shortening together with endiannes is
> > the problem.
> >
> > >>
> > >> IIUC, void *v is always the same as loff_t *ppos, and therefore idx also
> > >> should be *ppos. De-referencing the loff_t * with an unsigned int * only
> > >> gives the upper 32 bit half of the 64 bit value, which remains 0.
> > >>
> > >> This would be fixed e.g. with
> > >>
> > >> unsigned int idx = (unsigned int) *(loff_t *) v;
> >
> > With all this experience I'm now inclined to rather follow more the example
> > in Documentation/filesystems/seq_file.rst and don't pass around the pointer
> > that we got as ppos in slab_debugfs_start(), and that seq_file.c points to
> > m->index.
> >
> > In that example an own value is kmalloced:
> >
> > loff_t *spos = kmalloc(sizeof(loff_t), GFP_KERNEL);
> >
> > while we could just make this a field of loc_track?
>
> Yes, following the example sounds good, and it would also make proper use
> of *v in op->next, which might make the code more readable. It also looks
> like it already does exactly what is needed here, i.e. have a simple
> iterator that just counts the lines.
>
> I don't think the iterator needs to be saved in loc_track. IIUC, it is
> already passed around like in the example, and can then be simply compared
> to t->count, similar to the existing code.
>
> This is what I'm currently testing, and it seems to work fine. Will send
> a new patch, if there are no objections:

Oh well, I have one objection, returning NULL from op->next will be
passed to op->stop, and then it will not free the allocated value.

The example is elegantly avoiding this, by not returning NULL anywhere,
and also not stopping. Sigh.

Maybe not return NULL in op->next, but only from op->start, and only
when no allocation was made or it was freed already? Or free it only/
already in op->next, when returning NULL?

2021-11-25 22:03:46

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On 11/25/21 21:12, Gerald Schaefer wrote:
> On Thu, 25 Nov 2021 17:13:10 +0100
> Gerald Schaefer <[email protected]> wrote:
>
>> On Tue, 23 Nov 2021 15:19:49 +0100
>> Vlastimil Babka <[email protected]> wrote:
>>
>>> On 11/22/21 21:33, Gerald Schaefer wrote:
>>>> On Mon, 22 Nov 2021 21:14:00 +0100
>>>> Gerald Schaefer <[email protected]> wrote:
>>>>
>>>> [...]
>>>>>
>>>>> Thanks. While testing this properly, yet another bug showed up. The idx
>>>>> in op->show remains 0 in all iterations, so I always see the same line
>>>>> printed t->count times (or infinitely, ATM). Not sure if this only shows
>>>>> on s390 due to endianness, but the reason is this:
>>>>>
>>>>> unsigned int idx = *(unsigned int *)v;
>>>
>>> Uh, good catch. I was actually looking suspiciously at how we cast signed to
>>> unsigned, but didn't occur to me that shortening together with endiannes is
>>> the problem.
>>>
>>>>>
>>>>> IIUC, void *v is always the same as loff_t *ppos, and therefore idx also
>>>>> should be *ppos. De-referencing the loff_t * with an unsigned int * only
>>>>> gives the upper 32 bit half of the 64 bit value, which remains 0.
>>>>>
>>>>> This would be fixed e.g. with
>>>>>
>>>>> unsigned int idx = (unsigned int) *(loff_t *) v;
>>>
>>> With all this experience I'm now inclined to rather follow more the example
>>> in Documentation/filesystems/seq_file.rst and don't pass around the pointer
>>> that we got as ppos in slab_debugfs_start(), and that seq_file.c points to
>>> m->index.
>>>
>>> In that example an own value is kmalloced:
>>>
>>> loff_t *spos = kmalloc(sizeof(loff_t), GFP_KERNEL);
>>>
>>> while we could just make this a field of loc_track?
>>
>> Yes, following the example sounds good, and it would also make proper use
>> of *v in op->next, which might make the code more readable. It also looks
>> like it already does exactly what is needed here, i.e. have a simple
>> iterator that just counts the lines.
>>
>> I don't think the iterator needs to be saved in loc_track. IIUC, it is
>> already passed around like in the example, and can then be simply compared
>> to t->count, similar to the existing code.

Saving it the loc_track doesn't preclude using the pointer that's being
passed around. It however avoids the extra kmalloc and turns out it
should also solve the return NULL from op->next() issue you describe below?

>> This is what I'm currently testing, and it seems to work fine. Will send
>> a new patch, if there are no objections:
>
> Oh well, I have one objection, returning NULL from op->next will be
> passed to op->stop, and then it will not free the allocated value.
>
> The example is elegantly avoiding this, by not returning NULL anywhere,
> and also not stopping. Sigh.
>
> Maybe not return NULL in op->next, but only from op->start, and only
> when no allocation was made or it was freed already? Or free it only/
> already in op->next, when returning NULL?

From these two probably the "free in op->next". But still inclined to
store it in loc_track.
Why does the API need to be so awkward...



2021-11-26 17:14:03

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

On Thu, 25 Nov 2021 23:00:47 +0100
Vlastimil Babka <[email protected]> wrote:

> On 11/25/21 21:12, Gerald Schaefer wrote:
> > On Thu, 25 Nov 2021 17:13:10 +0100
> > Gerald Schaefer <[email protected]> wrote:
> >
> >> On Tue, 23 Nov 2021 15:19:49 +0100
> >> Vlastimil Babka <[email protected]> wrote:
> >>
> >>> On 11/22/21 21:33, Gerald Schaefer wrote:
> >>>> On Mon, 22 Nov 2021 21:14:00 +0100
> >>>> Gerald Schaefer <[email protected]> wrote:
> >>>>
> >>>> [...]
> >>>>>
> >>>>> Thanks. While testing this properly, yet another bug showed up. The idx
> >>>>> in op->show remains 0 in all iterations, so I always see the same line
> >>>>> printed t->count times (or infinitely, ATM). Not sure if this only shows
> >>>>> on s390 due to endianness, but the reason is this:
> >>>>>
> >>>>> unsigned int idx = *(unsigned int *)v;
> >>>
> >>> Uh, good catch. I was actually looking suspiciously at how we cast signed to
> >>> unsigned, but didn't occur to me that shortening together with endiannes is
> >>> the problem.
> >>>
> >>>>>
> >>>>> IIUC, void *v is always the same as loff_t *ppos, and therefore idx also
> >>>>> should be *ppos. De-referencing the loff_t * with an unsigned int * only
> >>>>> gives the upper 32 bit half of the 64 bit value, which remains 0.
> >>>>>
> >>>>> This would be fixed e.g. with
> >>>>>
> >>>>> unsigned int idx = (unsigned int) *(loff_t *) v;
> >>>
> >>> With all this experience I'm now inclined to rather follow more the example
> >>> in Documentation/filesystems/seq_file.rst and don't pass around the pointer
> >>> that we got as ppos in slab_debugfs_start(), and that seq_file.c points to
> >>> m->index.
> >>>
> >>> In that example an own value is kmalloced:
> >>>
> >>> loff_t *spos = kmalloc(sizeof(loff_t), GFP_KERNEL);
> >>>
> >>> while we could just make this a field of loc_track?
> >>
> >> Yes, following the example sounds good, and it would also make proper use
> >> of *v in op->next, which might make the code more readable. It also looks
> >> like it already does exactly what is needed here, i.e. have a simple
> >> iterator that just counts the lines.
> >>
> >> I don't think the iterator needs to be saved in loc_track. IIUC, it is
> >> already passed around like in the example, and can then be simply compared
> >> to t->count, similar to the existing code.
>
> Saving it the loc_track doesn't preclude using the pointer that's being
> passed around. It however avoids the extra kmalloc and turns out it
> should also solve the return NULL from op->next() issue you describe below?

Yes, storing idx in loc_track seems straight-forward, and should also
improve code readability.

Patch will follow right here. I made idx a loff_t, as it seemed easier to
handle in op->start/next, then casted to unsigned long again in op->show.
Not sure if the unsigned int had any benefit. Given that loff_t is a signed
64 bit type, I guess both are wrong if it should ever turn negative (can it,
i.e. can ppos turn negative?). With unsigned long, chances are better that
it will at least turn into something big enough to not pass the
"idx < t->count" check.

>
> >> This is what I'm currently testing, and it seems to work fine. Will send
> >> a new patch, if there are no objections:
> >
> > Oh well, I have one objection, returning NULL from op->next will be
> > passed to op->stop, and then it will not free the allocated value.
> >
> > The example is elegantly avoiding this, by not returning NULL anywhere,
> > and also not stopping. Sigh.
> >
> > Maybe not return NULL in op->next, but only from op->start, and only
> > when no allocation was made or it was freed already? Or free it only/
> > already in op->next, when returning NULL?
>
> From these two probably the "free in op->next". But still inclined to
> store it in loc_track.
> Why does the API need to be so awkward...

Maybe we are just holding it wrong :-)

2021-11-26 17:21:11

by Gerald Schaefer

[permalink] [raw]
Subject: [PATCH] mm/slub: fix endianness bug for alloc/free_traces attributes

On big-endian s390, the alloc/free_traces attributes produce endless
output, because of always 0 idx in slab_debugfs_show().

idx is de-referenced from *v, which points to a loff_t value, with

unsigned int idx = *(unsigned int *)v;

This will only give the upper 32 bits on big-endian, which remain 0.

Instead of only fixing this de-reference, during discussion it seemed
more appropriate to change the seq_ops so that they use an explicit
iterator in private loc_track struct.

This patch adds idx to loc_track, which will also fix the endianness bug.

Link: https://lore.kernel.org/r/[email protected]
Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs")
Cc: <[email protected]> # v5.14+
Reported-by: Steffen Maier <[email protected]>
Signed-off-by: Gerald Schaefer <[email protected]>
---
mm/slub.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index a8626825a829..abe7db581d68 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5081,6 +5081,7 @@ struct loc_track {
unsigned long max;
unsigned long count;
struct location *loc;
+ loff_t idx;
};

static struct dentry *slab_debugfs_root;
@@ -6052,11 +6053,11 @@ __initcall(slab_sysfs_init);
#if defined(CONFIG_SLUB_DEBUG) && defined(CONFIG_DEBUG_FS)
static int slab_debugfs_show(struct seq_file *seq, void *v)
{
-
- struct location *l;
- unsigned int idx = *(unsigned int *)v;
struct loc_track *t = seq->private;
+ struct location *l;
+ unsigned long idx;

+ idx = (unsigned long) t->idx;
if (idx < t->count) {
l = &t->loc[idx];

@@ -6105,16 +6106,18 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
{
struct loc_track *t = seq->private;

- v = ppos;
- ++*ppos;
+ t->idx = ++(*ppos);
if (*ppos <= t->count)
- return v;
+ return ppos;

return NULL;
}

static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
{
+ struct loc_track *t = seq->private;
+
+ t->idx = *ppos;
return ppos;
}

--
2.32.0


2021-11-29 14:28:57

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH] mm/slub: fix endianness bug for alloc/free_traces attributes

On 11/26/21 18:18, Gerald Schaefer wrote:
> On big-endian s390, the alloc/free_traces attributes produce endless
> output, because of always 0 idx in slab_debugfs_show().
>
> idx is de-referenced from *v, which points to a loff_t value, with
>
> unsigned int idx = *(unsigned int *)v;
>
> This will only give the upper 32 bits on big-endian, which remain 0.
>
> Instead of only fixing this de-reference, during discussion it seemed
> more appropriate to change the seq_ops so that they use an explicit
> iterator in private loc_track struct.
>
> This patch adds idx to loc_track, which will also fix the endianness bug.
>
> Link: https://lore.kernel.org/r/[email protected]
> Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs")
> Cc: <[email protected]> # v5.14+
> Reported-by: Steffen Maier <[email protected]>
> Signed-off-by: Gerald Schaefer <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

With a nit below:

> ---
> mm/slub.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index a8626825a829..abe7db581d68 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5081,6 +5081,7 @@ struct loc_track {
> unsigned long max;
> unsigned long count;
> struct location *loc;
> + loff_t idx;
> };
>
> static struct dentry *slab_debugfs_root;
> @@ -6052,11 +6053,11 @@ __initcall(slab_sysfs_init);
> #if defined(CONFIG_SLUB_DEBUG) && defined(CONFIG_DEBUG_FS)
> static int slab_debugfs_show(struct seq_file *seq, void *v)
> {
> -
> - struct location *l;
> - unsigned int idx = *(unsigned int *)v;
> struct loc_track *t = seq->private;
> + struct location *l;
> + unsigned long idx;
>
> + idx = (unsigned long) t->idx;
> if (idx < t->count) {
> l = &t->loc[idx];
>
> @@ -6105,16 +6106,18 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
> {
> struct loc_track *t = seq->private;
>
> - v = ppos;
> - ++*ppos;
> + t->idx = ++(*ppos);
> if (*ppos <= t->count)
> - return v;
> + return ppos;

What I had in mind, and to be more in line with the seq_file example, would
be to return &t->idx here. Then it's what gets passed to slab_debugfs_show()
as 'v'. But since we ignore 'v' there, it probably doesn't matter.

>
> return NULL;
> }
>
> static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
> {
> + struct loc_track *t = seq->private;
> +
> + t->idx = *ppos;
> return ppos;

And same here.

> }
>
>