2021-09-29 21:35:17

by Rustam Kovhaev

Subject: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

For kmalloc() allocations SLOB prepends each block with a 4-byte header
and stores the size of the allocated block in that header.
Blocks allocated with kmem_cache_alloc() do not have that header.

SLOB explodes when you allocate memory with kmem_cache_alloc() and then
try to free it with kfree() instead of kmem_cache_free().
SLOB will assume that there is a header when there is none, read garbage
into the size variable, and corrupt the adjacent objects, which
eventually leads to a hang or panic.

Let's make XFS work with SLOB by using the proper free function.

Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free")
Signed-off-by: Rustam Kovhaev <[email protected]>
---
fs/xfs/xfs_extfree_item.c | 6 +++---
mm/slob.c | 6 ++++--
2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 3f8a0713573a..a4b8caa2c601 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -482,7 +482,7 @@ xfs_extent_free_finish_item(
free->xefi_startblock,
free->xefi_blockcount,
&free->xefi_oinfo, free->xefi_skip_discard);
- kmem_free(free);
+ kmem_cache_free(xfs_bmap_free_item_zone, free);
return error;
}

@@ -502,7 +502,7 @@ xfs_extent_free_cancel_item(
struct xfs_extent_free_item *free;

free = container_of(item, struct xfs_extent_free_item, xefi_list);
- kmem_free(free);
+ kmem_cache_free(xfs_bmap_free_item_zone, free);
}

const struct xfs_defer_op_type xfs_extent_free_defer_type = {
@@ -564,7 +564,7 @@ xfs_agfl_free_finish_item(
extp->ext_len = free->xefi_blockcount;
efdp->efd_next_extent++;

- kmem_free(free);
+ kmem_cache_free(xfs_bmap_free_item_zone, free);
return error;
}

diff --git a/mm/slob.c b/mm/slob.c
index 74d3f6e60666..d2d859ded5f8 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -389,7 +389,6 @@ static void slob_free(void *block, int size)

if (unlikely(ZERO_OR_NULL_PTR(block)))
return;
- BUG_ON(!size);

sp = virt_to_page(block);
units = SLOB_UNITS(size);
@@ -556,6 +555,7 @@ void kfree(const void *block)
if (PageSlab(sp)) {
int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
unsigned int *m = (unsigned int *)(block - align);
+ BUG_ON(!*m || *m > (PAGE_SIZE - align));
slob_free(m, *m + align);
} else {
unsigned int order = compound_order(sp);
@@ -649,8 +649,10 @@ EXPORT_SYMBOL(kmem_cache_alloc_node);

static void __kmem_cache_free(void *b, int size)
{
- if (size < PAGE_SIZE)
+ if (size < PAGE_SIZE) {
+ BUG_ON(!size);
slob_free(b, size);
+ }
else
slob_free_pages(b, get_order(size));
}
--
2.30.2
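
To make the failure mode described in the commit message concrete, here is a
small userspace model of SLOB's size-header scheme (a simplified sketch, not
the kernel code: the single arena, the offsets and the model_* names are
invented purely for illustration):

#include <stdio.h>
#include <string.h>

#define MINALIGN 8                              /* stands in for ARCH_KMALLOC_MINALIGN */

static unsigned int arena_words[64];            /* one "slab page" holding both objects */
static unsigned char * const arena = (unsigned char *)arena_words;

/* "kmalloc": write a 4-byte size header, hand out the payload after it */
static void *model_kmalloc(size_t off, unsigned int size)
{
        memcpy(&arena[off], &size, sizeof(size));
        return &arena[off + MINALIGN];
}

/* "kmem_cache_alloc": no header, the object starts right at the block */
static void *model_cache_alloc(size_t off)
{
        return &arena[off];
}

/* "kfree": unconditionally assumes a size header sits MINALIGN bytes earlier */
static unsigned int model_kfree(const void *block)
{
        unsigned int size;

        memcpy(&size, (const unsigned char *)block - MINALIGN, sizeof(size));
        return size;    /* what SLOB would pass on to slob_free() */
}

int main(void)
{
        void *a = model_kmalloc(64, 32);        /* header + 32-byte payload */
        void *b = model_cache_alloc(128);       /* headerless "cache" object */

        printf("kmalloc-style object: header says %u\n", model_kfree(a));
        printf("cache-style object:   header says %u\n", model_kfree(b));
        return 0;
}

The kmalloc-style object reports its real size (32); for the headerless
cache-style object the "header" is whatever happens to precede it in memory,
which is the garbage size SLOB would then hand to slob_free(), corrupting the
free list.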


2021-09-30 04:44:19

by Dave Chinner

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On Wed, Sep 29, 2021 at 02:23:47PM -0700, Rustam Kovhaev wrote:
> For kmalloc() allocations SLOB prepends the blocks with a 4-byte header,
> and it puts the size of the allocated blocks in that header.
> Blocks allocated with kmem_cache_alloc() allocations do not have that
> header.
>
> SLOB explodes when you allocate memory with kmem_cache_alloc() and then
> try to free it with kfree() instead of kmem_cache_free().
> SLOB will assume that there is a header when there is none, read some
> garbage to size variable and corrupt the adjacent objects, which
> eventually leads to hang or panic.
>
> Let's make XFS work with SLOB by using proper free function.
>
> Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free")
> Signed-off-by: Rustam Kovhaev <[email protected]>

IOWs, XFS has been broken on SLOB for over 5 years and nobody
anywhere has noticed.

And we've just had a discussion where the very best solution was to
use kfree() on kmem_cache_alloc() objects so we didn't have to spend
CPU doing global type table lookups or use an extra 8 bytes of
memory per object to track the slab cache just so we could call
kmem_cache_free() with the correct slab cache.

But, of course, SLOB doesn't allow this and I was really tempted to
solve that by adding a Kconfig "depends on SLAB|SLUB" option so that
we don't have to care about SLOB not working.

However, given that XFS on SLOB has already been broken
for so long, maybe we should just not care about the SLOB code and
seriously consider adding a specific dependency on SLAB|SLUB...

Thoughts?

Cheers,

Dave.
--
Dave Chinner
[email protected]

2021-09-30 08:35:15

by Vlastimil Babka

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On 9/30/21 06:42, Dave Chinner wrote:
> On Wed, Sep 29, 2021 at 02:23:47PM -0700, Rustam Kovhaev wrote:
>> For kmalloc() allocations SLOB prepends the blocks with a 4-byte header,
>> and it puts the size of the allocated blocks in that header.
>> Blocks allocated with kmem_cache_alloc() allocations do not have that
>> header.
>>
>> SLOB explodes when you allocate memory with kmem_cache_alloc() and then
>> try to free it with kfree() instead of kmem_cache_free().
>> SLOB will assume that there is a header when there is none, read some
>> garbage to size variable and corrupt the adjacent objects, which
>> eventually leads to hang or panic.
>>
>> Let's make XFS work with SLOB by using proper free function.
>>
>> Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free")
>> Signed-off-by: Rustam Kovhaev <[email protected]>
>
> IOWs, XFS has been broken on SLOB for over 5 years and nobody
> anywhere has noticed.
>
> And we've just had a discussion where the very best solution was to
> use kfree() on kmem_cache_alloc() objects so we didn't ahve to spend
> CPU doing global type table lookups or use an extra 8 bytes of
> memory per object to track the slab cache just so we could call
> kmem_cache_free() with the correct slab cache.
>
> But, of course, SLOB doesn't allow this and I was really tempted to
> solve that by adding a Kconfig "depends on SLAB|SLUB" option so that
> we don't have to care about SLOB not working.
>
> However, as it turns out that XFS on SLOB has already been broken
> for so long, maybe we should just not care about SLOB code and
> seriously consider just adding a specific dependency on SLAB|SLUB...

I think it's fair if something like XFS (not meant for tiny systems AFAIK?)
excludes SLOB (meant for tiny systems). Clearly nobody has tried to use these
two together in the last 5 years anyway.
Maybe we could also just add the 4 bytes to all SLOB objects, declare that
kfree() is always fine and be done with it. Yes, it will make the SLOB footprint
somewhat less tiny, but even when we added the kmalloc power-of-two alignment
guarantees, the impact on SLOB was negligible.

> Thoughts?
>
> Cheers,
>
> Dave.
>

2021-09-30 19:32:19

by Rustam Kovhaev

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On Thu, Sep 30, 2021 at 10:13:40AM +0200, Vlastimil Babka wrote:
> On 9/30/21 06:42, Dave Chinner wrote:
> > On Wed, Sep 29, 2021 at 02:23:47PM -0700, Rustam Kovhaev wrote:
> >> For kmalloc() allocations SLOB prepends the blocks with a 4-byte header,
> >> and it puts the size of the allocated blocks in that header.
> >> Blocks allocated with kmem_cache_alloc() allocations do not have that
> >> header.
> >>
> >> SLOB explodes when you allocate memory with kmem_cache_alloc() and then
> >> try to free it with kfree() instead of kmem_cache_free().
> >> SLOB will assume that there is a header when there is none, read some
> >> garbage to size variable and corrupt the adjacent objects, which
> >> eventually leads to hang or panic.
> >>
> >> Let's make XFS work with SLOB by using proper free function.
> >>
> >> Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free")
> >> Signed-off-by: Rustam Kovhaev <[email protected]>
> >
> > IOWs, XFS has been broken on SLOB for over 5 years and nobody
> > anywhere has noticed.
> >
> > And we've just had a discussion where the very best solution was to
> > use kfree() on kmem_cache_alloc() objects so we didn't ahve to spend
> > CPU doing global type table lookups or use an extra 8 bytes of
> > memory per object to track the slab cache just so we could call
> > kmem_cache_free() with the correct slab cache.
> >
> > But, of course, SLOB doesn't allow this and I was really tempted to
> > solve that by adding a Kconfig "depends on SLAB|SLUB" option so that
> > we don't have to care about SLOB not working.
> >
> > However, as it turns out that XFS on SLOB has already been broken
> > for so long, maybe we should just not care about SLOB code and
> > seriously consider just adding a specific dependency on SLAB|SLUB...
>
> I think it's fair if something like XFS (not meant for tiny systems AFAIK?)
> excludes SLOB (meant for tiny systems). Clearly nobody tried to use these
> two together last 5 years anyway.

+1 for adding a Kconfig option; it seems like some things are not meant to
be together.

> Maybe we could also just add the 4 bytes to all SLOB objects, declare
> kfree() is always fine and be done with it. Yes, it will make SLOB footprint
> somewhat less tiny, but even whan we added kmalloc power of two alignment
> guarantees, the impact on SLOB was negligible.

I'll send a patch to add a 4-byte header for kmem_cache_alloc()
allocations.

> > Thoughts?
> >
> > Cheers,
> >
> > Dave.
> >
>

2021-09-30 21:39:26

by Vlastimil Babka

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On 9/30/21 8:48 PM, Rustam Kovhaev wrote:
> On Thu, Sep 30, 2021 at 10:13:40AM +0200, Vlastimil Babka wrote:
>>
>> I think it's fair if something like XFS (not meant for tiny systems AFAIK?)
>> excludes SLOB (meant for tiny systems). Clearly nobody tried to use these
>> two together last 5 years anyway.
>
> +1 for adding Kconfig option, it seems like some things are not meant to
> be together.

But if we patch SLOB, we won't need it.

>> Maybe we could also just add the 4 bytes to all SLOB objects, declare
>> kfree() is always fine and be done with it. Yes, it will make SLOB footprint
>> somewhat less tiny, but even whan we added kmalloc power of two alignment
>> guarantees, the impact on SLOB was negligible.
>
> I'll send a patch to add a 4-byte header for kmem_cache_alloc()
> allocations.

Thanks. Please report in the changelog the slab usage from /proc/meminfo
before and after the patch (at least a snapshot after a full boot).

>>> Thoughts?
>>>
>>> Cheers,
>>>
>>> Dave.
>>>
>>

2021-10-01 00:42:05

by Rustam Kovhaev

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On Thu, Sep 30, 2021 at 11:10:10PM +0200, Vlastimil Babka wrote:
> On 9/30/21 8:48 PM, Rustam Kovhaev wrote:
> > On Thu, Sep 30, 2021 at 10:13:40AM +0200, Vlastimil Babka wrote:
> >>
> >> I think it's fair if something like XFS (not meant for tiny systems AFAIK?)
> >> excludes SLOB (meant for tiny systems). Clearly nobody tried to use these
> >> two together last 5 years anyway.
> >
> > +1 for adding Kconfig option, it seems like some things are not meant to
> > be together.
>
> But if we patch SLOB, we won't need it.

OK, so we consider XFS on SLOB a supported configuration that might be
used and should be tested.
I'll look into maybe adding a config with CONFIG_SLOB and CONFIG_XFS_FS
to syzbot.

It seems that we need to patch SLOB anyway, because any other code can
hit the very same issue.

> >> Maybe we could also just add the 4 bytes to all SLOB objects, declare
> >> kfree() is always fine and be done with it. Yes, it will make SLOB footprint
> >> somewhat less tiny, but even whan we added kmalloc power of two alignment
> >> guarantees, the impact on SLOB was negligible.
> >
> > I'll send a patch to add a 4-byte header for kmem_cache_alloc()
> > allocations.
>
> Thanks. Please report in the changelog slab usage from /proc/meminfo
> before and after patch (at least a snapshot after a full boot).

OK.

2021-10-04 01:08:43

by David Rientjes

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On Thu, 30 Sep 2021, Rustam Kovhaev wrote:

> > >> I think it's fair if something like XFS (not meant for tiny systems AFAIK?)
> > >> excludes SLOB (meant for tiny systems). Clearly nobody tried to use these
> > >> two together last 5 years anyway.
> > >
> > > +1 for adding Kconfig option, it seems like some things are not meant to
> > > be together.
> >
> > But if we patch SLOB, we won't need it.
>
> OK, so we consider XFS on SLOB a supported configuration that might be
> used and should be tested.
> I'll look into maybe adding a config with CONFIG_SLOB and CONFIG_XFS_FS
> to syzbot.
>
> It seems that we need to patch SLOB anyway, because any other code can
> hit the very same issue.
>

It's probably best to introduce both (the SLOB fix and the Kconfig change for
XFS), at least in the interim, because the combo of XFS and SLOB could be
broken in other ways. If syzbot doesn't complain with a patched kernel that
allows SLOB to be used with XFS, then we could potentially allow them to be
used together.

(I'm not sure that this freeing issue is the *only* thing that is broken,
nor that we have sufficient information to make that determination right
now..)

2021-10-12 20:45:01

by Darrick J. Wong

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On Sun, Oct 03, 2021 at 06:07:20PM -0700, David Rientjes wrote:
> On Thu, 30 Sep 2021, Rustam Kovhaev wrote:
>
> > > >> I think it's fair if something like XFS (not meant for tiny systems AFAIK?)
> > > >> excludes SLOB (meant for tiny systems). Clearly nobody tried to use these
> > > >> two together last 5 years anyway.
> > > >
> > > > +1 for adding Kconfig option, it seems like some things are not meant to
> > > > be together.
> > >
> > > But if we patch SLOB, we won't need it.
> >
> > OK, so we consider XFS on SLOB a supported configuration that might be
> > used and should be tested.
> > I'll look into maybe adding a config with CONFIG_SLOB and CONFIG_XFS_FS
> > to syzbot.
> >
> > It seems that we need to patch SLOB anyway, because any other code can
> > hit the very same issue.
> >
>
> It's probably best to introduce both (SLOB fix and Kconfig change for
> XFS), at least in the interim because the combo of XFS and SLOB could be
> broken in other ways. If syzbot doesn't complain with a patched kernel to
> allow SLOB to be used with XFS, then we could potentially allow them to be
> used together.
>
> (I'm not sure that this freeing issue is the *only* thing that is broken,
> nor that we have sufficient information to make that determination right
> now..)

I audited the entire xfs (kernel) codebase and didn't find any other
usage errors. Thanks for the patch; I'll apply it to for-next.

--D

2021-10-12 20:46:44

by Darrick J. Wong

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On Tue, Oct 12, 2021 at 01:43:20PM -0700, Darrick J. Wong wrote:
> On Sun, Oct 03, 2021 at 06:07:20PM -0700, David Rientjes wrote:
> > On Thu, 30 Sep 2021, Rustam Kovhaev wrote:
> >
> > > > >> I think it's fair if something like XFS (not meant for tiny systems AFAIK?)
> > > > >> excludes SLOB (meant for tiny systems). Clearly nobody tried to use these
> > > > >> two together last 5 years anyway.
> > > > >
> > > > > +1 for adding Kconfig option, it seems like some things are not meant to
> > > > > be together.
> > > >
> > > > But if we patch SLOB, we won't need it.
> > >
> > > OK, so we consider XFS on SLOB a supported configuration that might be
> > > used and should be tested.
> > > I'll look into maybe adding a config with CONFIG_SLOB and CONFIG_XFS_FS
> > > to syzbot.
> > >
> > > It seems that we need to patch SLOB anyway, because any other code can
> > > hit the very same issue.
> > >
> >
> > It's probably best to introduce both (SLOB fix and Kconfig change for
> > XFS), at least in the interim because the combo of XFS and SLOB could be
> > broken in other ways. If syzbot doesn't complain with a patched kernel to
> > allow SLOB to be used with XFS, then we could potentially allow them to be
> > used together.
> >
> > (I'm not sure that this freeing issue is the *only* thing that is broken,
> > nor that we have sufficient information to make that determination right
> > now..)
>
> I audited the entire xfs (kernel) codebase and didn't find any other
> usage errors. Thanks for the patch; I'll apply it to for-next.

Also, the obligatory

Reviewed-by: Darrick J. Wong <[email protected]>

--D

>
> --D

2021-10-12 21:36:51

by Vlastimil Babka

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On 10/12/2021 10:43 PM, Darrick J. Wong wrote:
> On Tue, Oct 12, 2021 at 01:43:20PM -0700, Darrick J. Wong wrote:
>> On Sun, Oct 03, 2021 at 06:07:20PM -0700, David Rientjes wrote:
>>> On Thu, 30 Sep 2021, Rustam Kovhaev wrote:
>>>
>>>>>>> I think it's fair if something like XFS (not meant for tiny systems AFAIK?)
>>>>>>> excludes SLOB (meant for tiny systems). Clearly nobody tried to use these
>>>>>>> two together last 5 years anyway.
>>>>>>
>>>>>> +1 for adding Kconfig option, it seems like some things are not meant to
>>>>>> be together.
>>>>>
>>>>> But if we patch SLOB, we won't need it.
>>>>
>>>> OK, so we consider XFS on SLOB a supported configuration that might be
>>>> used and should be tested.
>>>> I'll look into maybe adding a config with CONFIG_SLOB and CONFIG_XFS_FS
>>>> to syzbot.
>>>>
>>>> It seems that we need to patch SLOB anyway, because any other code can
>>>> hit the very same issue.
>>>>
>>>
>>> It's probably best to introduce both (SLOB fix and Kconfig change for
>>> XFS), at least in the interim because the combo of XFS and SLOB could be
>>> broken in other ways. If syzbot doesn't complain with a patched kernel to
>>> allow SLOB to be used with XFS, then we could potentially allow them to be
>>> used together.
>>>
>>> (I'm not sure that this freeing issue is the *only* thing that is broken,
>>> nor that we have sufficient information to make that determination right
>>> now..)
>>
>> I audited the entire xfs (kernel) codebase and didn't find any other
>> usage errors. Thanks for the patch; I'll apply it to for-next.

Which patch, the one that started this thread and uses kmem_cache_free() instead
of kfree()? I thought we said it's not the best way?

> Also, the obligatory
>
> Reviewed-by: Darrick J. Wong <[email protected]>
>
> --D
>
>>
>> --D

2021-10-12 23:24:07

by Darrick J. Wong

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On Tue, Oct 12, 2021 at 11:32:25PM +0200, Vlastimil Babka wrote:
> On 10/12/2021 10:43 PM, Darrick J. Wong wrote:
> > On Tue, Oct 12, 2021 at 01:43:20PM -0700, Darrick J. Wong wrote:
> >> On Sun, Oct 03, 2021 at 06:07:20PM -0700, David Rientjes wrote:
> >>> On Thu, 30 Sep 2021, Rustam Kovhaev wrote:
> >>>
> >>>>>>> I think it's fair if something like XFS (not meant for tiny systems AFAIK?)
> >>>>>>> excludes SLOB (meant for tiny systems). Clearly nobody tried to use these
> >>>>>>> two together last 5 years anyway.
> >>>>>>
> >>>>>> +1 for adding Kconfig option, it seems like some things are not meant to
> >>>>>> be together.
> >>>>>
> >>>>> But if we patch SLOB, we won't need it.
> >>>>
> >>>> OK, so we consider XFS on SLOB a supported configuration that might be
> >>>> used and should be tested.
> >>>> I'll look into maybe adding a config with CONFIG_SLOB and CONFIG_XFS_FS
> >>>> to syzbot.
> >>>>
> >>>> It seems that we need to patch SLOB anyway, because any other code can
> >>>> hit the very same issue.
> >>>>
> >>>
> >>> It's probably best to introduce both (SLOB fix and Kconfig change for
> >>> XFS), at least in the interim because the combo of XFS and SLOB could be
> >>> broken in other ways. If syzbot doesn't complain with a patched kernel to
> >>> allow SLOB to be used with XFS, then we could potentially allow them to be
> >>> used together.
> >>>
> >>> (I'm not sure that this freeing issue is the *only* thing that is broken,
> >>> nor that we have sufficient information to make that determination right
> >>> now..)
> >>
> >> I audited the entire xfs (kernel) codebase and didn't find any other
> >> usage errors. Thanks for the patch; I'll apply it to for-next.
>
> Which patch, the one that started this thread and uses kmem_cache_free() instead
> of kfree()? I thought we said it's not the best way?

It's probably better to fix slob to be able to tell that a kmem_free'd
object actually belongs to a cache and should get freed that way, just
like its larger sl[ua]b cousins.

However, even if that does come to pass, anybody /else/ who wants to
start(?) using XFS on a SLOB system will need this patch to fix the
minor papercut. Now that I've checked the rest of the codebase, I don't
find it reasonable to make XFS mutually exclusive with SLOB over two
instances of slab cache misuse. Hence the RVB. :)

--D

> > Also, the obligatory
> >
> > Reviewed-by: Darrick J. Wong <[email protected]>
> >
> > --D
> >
> >>
> >> --D
>

2021-10-13 07:40:47

by Vlastimil Babka

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On 10/13/21 01:22, Darrick J. Wong wrote:
> On Tue, Oct 12, 2021 at 11:32:25PM +0200, Vlastimil Babka wrote:
>> On 10/12/2021 10:43 PM, Darrick J. Wong wrote:
>> > On Tue, Oct 12, 2021 at 01:43:20PM -0700, Darrick J. Wong wrote:
>> >> On Sun, Oct 03, 2021 at 06:07:20PM -0700, David Rientjes wrote:
>> >>
>> >> I audited the entire xfs (kernel) codebase and didn't find any other
>> >> usage errors. Thanks for the patch; I'll apply it to for-next.
>>
>> Which patch, the one that started this thread and uses kmem_cache_free() instead
>> of kfree()? I thought we said it's not the best way?
>
> It's probably better to fix slob to be able to tell that a kmem_free'd
> object actually belongs to a cache and should get freed that way, just
> like its larger sl[ua]b cousins.

Agreed. Rustam, do you still plan to do that?

> However, even if that does come to pass, anybody /else/ who wants to
> start(?) using XFS on a SLOB system will need this patch to fix the
> minor papercut. Now that I've checked the rest of the codebase, I don't
> find it reasonable to make XFS mutually exclusive with SLOB over two
> instances of slab cache misuse. Hence the RVB. :)

Ok. I was just wondering because Dave's first reply was that actually you'll
need to expand the use of kfree() instead of kmem_cache_free().

2021-10-13 17:01:15

by Rustam Kovhaev

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On Wed, Oct 13, 2021 at 09:38:31AM +0200, Vlastimil Babka wrote:
> On 10/13/21 01:22, Darrick J. Wong wrote:
> > On Tue, Oct 12, 2021 at 11:32:25PM +0200, Vlastimil Babka wrote:
> >> On 10/12/2021 10:43 PM, Darrick J. Wong wrote:
> >> > On Tue, Oct 12, 2021 at 01:43:20PM -0700, Darrick J. Wong wrote:
> >> >> On Sun, Oct 03, 2021 at 06:07:20PM -0700, David Rientjes wrote:
> >> >>
> >> >> I audited the entire xfs (kernel) codebase and didn't find any other
> >> >> usage errors. Thanks for the patch; I'll apply it to for-next.
> >>
> >> Which patch, the one that started this thread and uses kmem_cache_free() instead
> >> of kfree()? I thought we said it's not the best way?
> >
> > It's probably better to fix slob to be able to tell that a kmem_free'd
> > object actually belongs to a cache and should get freed that way, just
> > like its larger sl[ua]b cousins.
>
> Agreed. Rustam, do you still plan to do that?

Yes, I do, thank you.

>
> > However, even if that does come to pass, anybody /else/ who wants to
> > start(?) using XFS on a SLOB system will need this patch to fix the
> > minor papercut. Now that I've checked the rest of the codebase, I don't
> > find it reasonable to make XFS mutually exclusive with SLOB over two
> > instances of slab cache misuse. Hence the RVB. :)
>
> Ok. I was just wondering because Dave's first reply was that actually you'll
> need to expand the use of kfree() instead of kmem_cache_free().
>

2021-10-15 09:57:45

by Darrick J. Wong

Subject: Re: [PATCH] xfs: use kmem_cache_free() for kmem_cache objects

On Wed, Oct 13, 2021 at 09:56:41AM -0700, Rustam Kovhaev wrote:
> On Wed, Oct 13, 2021 at 09:38:31AM +0200, Vlastimil Babka wrote:
> > On 10/13/21 01:22, Darrick J. Wong wrote:
> > > On Tue, Oct 12, 2021 at 11:32:25PM +0200, Vlastimil Babka wrote:
> > >> On 10/12/2021 10:43 PM, Darrick J. Wong wrote:
> > >> > On Tue, Oct 12, 2021 at 01:43:20PM -0700, Darrick J. Wong wrote:
> > >> >> On Sun, Oct 03, 2021 at 06:07:20PM -0700, David Rientjes wrote:
> > >> >>
> > >> >> I audited the entire xfs (kernel) codebase and didn't find any other
> > >> >> usage errors. Thanks for the patch; I'll apply it to for-next.
> > >>
> > >> Which patch, the one that started this thread and uses kmem_cache_free() instead
> > >> of kfree()? I thought we said it's not the best way?
> > >
> > > It's probably better to fix slob to be able to tell that a kmem_free'd
> > > object actually belongs to a cache and should get freed that way, just
> > > like its larger sl[ua]b cousins.
> >
> > Agreed. Rustam, do you still plan to do that?
>
> Yes, I do, thank you.

Note that I left out the parts of the patch that changed mm/slob.c
because I didn't think that was appropriate for a patch titled 'xfs:'.

>
> >
> > > However, even if that does come to pass, anybody /else/ who wants to
> > > start(?) using XFS on a SLOB system will need this patch to fix the
> > > minor papercut. Now that I've checked the rest of the codebase, I don't
> > > find it reasonable to make XFS mutually exclusive with SLOB over two
> > > instances of slab cache misuse. Hence the RVB. :)
> >
> > Ok. I was just wondering because Dave's first reply was that actually you'll
> > need to expand the use of kfree() instead of kmem_cache_free().

I look forward to doing this, but since XFS is a downstream consumer of
the kmem APIs, we'll have to wait until the slob changes land to do
that.

--D

2021-10-18 03:51:24

by Rustam Kovhaev

Subject: [PATCH] slob: add size header to all allocations

Let's prepend all allocations of (PAGE_SIZE - align_offset) or less
with a size header. This way kmem_cache_alloc() memory can be freed
with kfree() and the other way around, as long as the objects are smaller
than (PAGE_SIZE - align_offset).

The main reason for this change is to simplify SLOB a little bit, make
it a bit easier to debug whenever something goes wrong.
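
As a usage sketch of the guarantee this adds (the cache and structure names
below are hypothetical, not taken from any real subsystem):

#include <linux/slab.h>

struct foo_item {                       /* hypothetical object type */
        int data;
};

static struct kmem_cache *foo_cache;    /* assume kmem_cache_create("foo", ...) ran earlier */

static void foo_use_and_free(void)
{
        struct foo_item *p = kmem_cache_alloc(foo_cache, GFP_KERNEL);

        if (!p)
                return;
        /* ... use p ... */
        kfree(p);       /* with this patch, legal on SLOB for objects smaller
                         * than (PAGE_SIZE - align_offset), matching SLAB/SLUB */
}

SLAB and SLUB already accept this pattern; the header gives SLOB the same
behaviour.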

meminfo right after the system boot, without the patch:
Slab: 35500 kB

the same, with the patch:
Slab: 36396 kB

stats for compiling glibc without the patch:
1,493,972.89 msec task-clock # 3.557 CPUs utilized
317,158 context-switches # 212.292 /sec
8,567 cpu-migrations # 5.734 /sec
33,788,323 page-faults # 22.616 K/sec
5,267,687,400,091 cycles # 3.526 GHz
4,388,201,248,601 instructions # 0.83 insn per cycle
885,424,236,657 branches # 592.664 M/sec
14,117,492,893 branch-misses # 1.59% of all branches

420.051843478 seconds time elapsed

472.784856000 seconds user
1024.645256000 seconds sys

the same with the patch:
1,803,990.92 msec task-clock # 3.597 CPUs utilized
330,110 context-switches # 182.989 /sec
9,170 cpu-migrations # 5.083 /sec
33,789,627 page-faults # 18.730 K/sec
6,499,753,661,134 cycles # 3.603 GHz
4,564,216,028,344 instructions # 0.70 insn per cycle
917,120,742,440 branches # 508.384 M/sec
15,068,415,552 branch-misses # 1.64% of all branches

501.519434175 seconds time elapsed

495.587614000 seconds user
1312.652833000 seconds sys

Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Rustam Kovhaev <[email protected]>
---
mm/slob.c | 47 ++++++++++++++++++++++++-----------------------
1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/mm/slob.c b/mm/slob.c
index 74d3f6e60666..3d8fbb33f5a3 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -373,25 +373,28 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
}
if (unlikely(gfp & __GFP_ZERO))
memset(b, 0, size);
+ /* Write size in the header */
+ *(unsigned int *)b = size - align_offset;
+ b = (void *)b + align_offset;
return b;
}

/*
* slob_free: entry point into the slob allocator.
*/
-static void slob_free(void *block, int size)
+static void slob_free(void *block)
{
struct page *sp;
- slob_t *prev, *next, *b = (slob_t *)block;
+ int align_offset = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
+ void *hdr = (void *)block - align_offset;
+ unsigned int size = *(unsigned int *)hdr + align_offset;
+ slob_t *prev, *next, *b = (slob_t *)hdr;
slobidx_t units;
unsigned long flags;
struct list_head *slob_list;

- if (unlikely(ZERO_OR_NULL_PTR(block)))
- return;
- BUG_ON(!size);
-
- sp = virt_to_page(block);
+ BUG_ON(!size || size >= PAGE_SIZE);
+ sp = virt_to_page(hdr);
units = SLOB_UNITS(size);

spin_lock_irqsave(&slob_lock, flags);
@@ -476,7 +479,6 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
static __always_inline void *
__do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
{
- unsigned int *m;
int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
void *ret;

@@ -497,12 +499,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
if (!size)
return ZERO_SIZE_PTR;

- m = slob_alloc(size + minalign, gfp, align, node, minalign);
-
- if (!m)
- return NULL;
- *m = size;
- ret = (void *)m + minalign;
+ ret = slob_alloc(size + minalign, gfp, align, node, minalign);

trace_kmalloc_node(caller, ret,
size, size + minalign, gfp, node);
@@ -554,9 +551,7 @@ void kfree(const void *block)

sp = virt_to_page(block);
if (PageSlab(sp)) {
- int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
- unsigned int *m = (unsigned int *)(block - align);
- slob_free(m, *m + align);
+ slob_free((void *)block);
} else {
unsigned int order = compound_order(sp);
mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
@@ -567,7 +562,6 @@ void kfree(const void *block)
}
EXPORT_SYMBOL(kfree);

-/* can't use ksize for kmem_cache_alloc memory, only kmalloc */
size_t __ksize(const void *block)
{
struct page *sp;
@@ -600,16 +594,17 @@ int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)

static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
{
+ int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
void *b;

flags &= gfp_allowed_mask;

might_alloc(flags);

- if (c->size < PAGE_SIZE) {
- b = slob_alloc(c->size, flags, c->align, node, 0);
+ if (c->size < PAGE_SIZE - minalign) {
+ b = slob_alloc(c->size + minalign, flags, c->align, node, minalign);
trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
- SLOB_UNITS(c->size) * SLOB_UNIT,
+ SLOB_UNITS(c->size + minalign) * SLOB_UNIT,
flags, node);
} else {
b = slob_new_pages(flags, get_order(c->size), node);
@@ -649,8 +644,14 @@ EXPORT_SYMBOL(kmem_cache_alloc_node);

static void __kmem_cache_free(void *b, int size)
{
- if (size < PAGE_SIZE)
- slob_free(b, size);
+ struct page *sp;
+
+ if (unlikely(ZERO_OR_NULL_PTR(b)))
+ return;
+
+ sp = virt_to_page(b);
+ if (PageSlab(sp))
+ slob_free(b);
else
slob_free_pages(b, get_order(size));
}
--
2.30.2

2021-10-18 09:28:16

by Vlastimil Babka

Subject: Re: [PATCH] slob: add size header to all allocations

On 10/18/21 05:38, Rustam Kovhaev wrote:
> Let's prepend all allocations of (PAGE_SIZE - align_offset) and less
> with the size header. This way kmem_cache_alloc() memory can be freed
> with kfree() and the other way around, as long as they are less than
> (PAGE_SIZE - align_offset).

This size limitation seems like an unnecessary gotcha. Couldn't we make
these large allocations in slob_alloc_node() (which use slob_new_pages()
directly) similar enough to large kmalloc() ones, so that kfree() can
recognize them and free them properly? AFAICS it might mean just adding
__GFP_COMP to make sure there's a compound order stored, as these already
don't seem to set PageSlab.
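
A minimal sketch of that direction, assuming the existing
slob_alloc_node()/slob_new_pages() structure (roughly what a later revision
of this patch ends up doing for the large-object branch):

        /* large-object branch of slob_alloc_node(), sketch only: */
        unsigned int order = get_order(c->size);

        if (order)
                flags |= __GFP_COMP;    /* record the order in the compound page,
                                         * the same way large kmalloc() allocations do */
        b = slob_new_pages(flags, order, node);

        /*
         * kfree() then sees !PageSlab() for such an object and can free it
         * with __free_pages(sp, compound_order(sp)), just as it already does
         * for PAGE_SIZE-and-larger kmalloc() allocations.
         */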

> The main reason for this change is to simplify SLOB a little bit, make
> it a bit easier to debug whenever something goes wrong.

I would say the main reason is to simplify the slab API and guarantee that
both kmem_cache_alloc() and kmalloc() can be freed by kfree().

We should also update the comments at the top of slob.c to reflect the change,
and Documentation/core-api/memory-allocation.rst (the last paragraph).

> meminfo right after the system boot, without the patch:
> Slab: 35500 kB
>
> the same, with the patch:
> Slab: 36396 kB

2.5% increase, hopefully acceptable.

Thanks!

2021-10-19 01:28:26

by Rustam Kovhaev

Subject: Re: [PATCH] slob: add size header to all allocations

On Mon, Oct 18, 2021 at 11:22:46AM +0200, Vlastimil Babka wrote:
> On 10/18/21 05:38, Rustam Kovhaev wrote:
> > Let's prepend all allocations of (PAGE_SIZE - align_offset) and less
> > with the size header. This way kmem_cache_alloc() memory can be freed
> > with kfree() and the other way around, as long as they are less than
> > (PAGE_SIZE - align_offset).
>
> This size limitation seems like an unnecessary gotcha. Couldn't we make
> these large allocations in slob_alloc_node() (that use slob_new_pages()
> directly) similar enough to large kmalloc() ones, so that kfree() can
> recognize them and free properly? AFAICS it might mean just adding
> __GFP_COMP to make sure there's a compound order stored, as these already
> don't seem to set PageSlab.

Thanks for the pointers, I'll send a new version.

> > The main reason for this change is to simplify SLOB a little bit, make
> > it a bit easier to debug whenever something goes wrong.
>
> I would say the main reason is to simplify the slab API and guarantee that
> both kmem_cache_alloc() and kmalloc() can be freed by kfree().
>
> We should also update the comments at top of slob.c to reflect the change.
> And Documentation/core-api/memory-allocation.rst (the last paragraph).

OK, thank you!

> > meminfo right after the system boot, without the patch:
> > Slab: 35500 kB
> >
> > the same, with the patch:
> > Slab: 36396 kB
>
> 2.5% increase, hopefully acceptable.
>
> Thanks!

2021-10-20 11:48:21

by Hyeonggon Yoo

Subject: Re: [PATCH] slob: add size header to all allocations

On Sun, Oct 17, 2021 at 08:38:41PM -0700, Rustam Kovhaev wrote:
> Let's prepend all allocations of (PAGE_SIZE - align_offset) and less
> with the size header. This way kmem_cache_alloc() memory can be freed
> with kfree() and the other way around, as long as they are less than
> (PAGE_SIZE - align_offset).

Hello Rustam, I measured its impact on memory usage on a
tiny kernel configuration, as SLOB is used on very small machines.

on x86 32 bit + tinyconfig:
Before:
Slab: 668 kB

After:
Slab: 688~692 kB

it adds 20~24kB.

>
> The main reason for this change is to simplify SLOB a little bit, make
> it a bit easier to debug whenever something goes wrong.
>

It seems acceptable, but I wonder whether it is worth increasing memory
usage to allow freeing kmem_cache_alloc()-ed objects with kfree()?

Thanks,
Hyeonggon

> meminfo right after the system boot, without the patch:
> Slab: 35500 kB
>
> the same, with the patch:
> Slab: 36396 kB
>

2021-10-21 17:39:11

by Vlastimil Babka

Subject: Re: [PATCH] slob: add size header to all allocations

On 10/20/21 13:46, Hyeonggon Yoo wrote:
> On Sun, Oct 17, 2021 at 08:38:41PM -0700, Rustam Kovhaev wrote:
>> Let's prepend all allocations of (PAGE_SIZE - align_offset) and less
>> with the size header. This way kmem_cache_alloc() memory can be freed
>> with kfree() and the other way around, as long as they are less than
>> (PAGE_SIZE - align_offset).
>
> Hello Rustam, I measured its impact on memory usage on
> tiny kernel configuration as SLOB is used in very small machine.
>
> on x86 32 bit + tinyconfig:
> Before:
> Slab: 668 kB
>
> After:
> Slab: 688~692 kB
>
> it adds 20~24kB.

Thanks for the measurement. That's a 3.5% increase.

>
>>
>> The main reason for this change is to simplify SLOB a little bit, make
>> it a bit easier to debug whenever something goes wrong.
>>
>
> It seems acceptable But I wonder it is worth to increase memory usage
> to allow freeing kmem_cache_alloc-ed objects by kfree()?

Not for the reason above, but for providing a useful API guarantee
regardless of the selected slab allocator, IMHO yes.

> Thanks,
> Hyeonggon
>
>> meminfo right after the system boot, without the patch:
>> Slab: 35500 kB
>>
>> the same, with the patch:
>> Slab: 36396 kB
>>
>

2021-10-23 06:43:45

by Rustam Kovhaev

Subject: [PATCH v2] slob: add size header to all allocations

Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
size header.
It simplifies the slab API and guarantees that both kmem_cache_alloc()
and kmalloc() memory can be freed by kfree().

meminfo right after the system boot, without the patch:
Slab: 35456 kB

the same, with the patch:
Slab: 36160 kB

Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Rustam Kovhaev <[email protected]>
---
v2:
- Allocate compound pages in slob_alloc_node()
- Use slob_free_pages() in kfree()
- Update documentation

Documentation/core-api/memory-allocation.rst | 4 +-
mm/slob.c | 114 +++++++++----------
2 files changed, 55 insertions(+), 63 deletions(-)

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 5954ddf6ee13..fea0ed11a7c5 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -172,5 +172,5 @@ wrappers can allocate memory from that cache.

When the allocated memory is no longer needed it must be freed. You can
use kvfree() for the memory allocated with `kmalloc`, `vmalloc` and
-`kvmalloc`. The slab caches should be freed with kmem_cache_free(). And
-don't forget to destroy the cache with kmem_cache_destroy().
+`kvmalloc`. The slab caches can be freed with kmem_cache_free() or kvfree().
+And don't forget to destroy the cache with kmem_cache_destroy().
diff --git a/mm/slob.c b/mm/slob.c
index 74d3f6e60666..0cba0b569877 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -25,23 +25,18 @@
* into the free list in address order, so this is effectively an
* address-ordered first fit.
*
- * Above this is an implementation of kmalloc/kfree. Blocks returned
- * from kmalloc are prepended with a 4-byte header with the kmalloc size.
- * If kmalloc is asked for objects of PAGE_SIZE or larger, it calls
- * alloc_pages() directly, allocating compound pages so the page order
- * does not have to be separately tracked.
- * These objects are detected in kfree() because PageSlab()
- * is false for them.
+ * Blocks that are less than (PAGE_SIZE - minalign) are prepended with a
+ * 4-byte header with the size. Larger blocks do not have the header and
+ * SLOB calls alloc_pages() directly, allocating compound pages so the
+ * page order does not have to be separately tracked. These objects are
+ * detected in kfree() because PageSlab() is false for them.
*
* SLAB is emulated on top of SLOB by simply calling constructors and
* destructors for every SLAB allocation. Objects are returned with the
* 4-byte alignment unless the SLAB_HWCACHE_ALIGN flag is set, in which
* case the low-level allocator will fragment blocks to create the proper
- * alignment. Again, objects of page-size or greater are allocated by
- * calling alloc_pages(). As SLAB objects know their size, no separate
- * size bookkeeping is necessary and there is essentially no allocation
- * space overhead, and compound pages aren't needed for multi-page
- * allocations.
+ * alignment. Again, objects of (PAGE_SIZE - minalign) or greater are
+ * allocated by calling alloc_pages().
*
* NUMA support in SLOB is fairly simplistic, pushing most of the real
* logic down to the page allocator, and simply doing the node accounting
@@ -207,12 +202,14 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
return page_address(page);
}

-static void slob_free_pages(void *b, int order)
+static void slob_free_pages(struct page *sp, int order)
{
- struct page *sp = virt_to_page(b);
-
- if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += 1 << order;
+ if (PageSlab(sp)) {
+ __ClearPageSlab(sp);
+ page_mapcount_reset(sp);
+ if (current->reclaim_state)
+ current->reclaim_state->reclaimed_slab += 1 << order;
+ }

mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
-(PAGE_SIZE << order));
@@ -247,9 +244,7 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
/*
* 'aligned' will hold the address of the slob block so that the
* address 'aligned'+'align_offset' is aligned according to the
- * 'align' parameter. This is for kmalloc() which prepends the
- * allocated block with its size, so that the block itself is
- * aligned when needed.
+ * 'align' parameter.
*/
if (align) {
aligned = (slob_t *)
@@ -373,25 +368,28 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
}
if (unlikely(gfp & __GFP_ZERO))
memset(b, 0, size);
+ /* Write size in the header */
+ *(unsigned int *)b = size - align_offset;
+ b = (void *)b + align_offset;
return b;
}

/*
* slob_free: entry point into the slob allocator.
*/
-static void slob_free(void *block, int size)
+static void slob_free(void *block)
{
struct page *sp;
- slob_t *prev, *next, *b = (slob_t *)block;
+ int align_offset = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
+ void *hdr = block - align_offset;
+ unsigned int size = *(unsigned int *)hdr + align_offset;
+ slob_t *prev, *next, *b = hdr;
slobidx_t units;
unsigned long flags;
struct list_head *slob_list;

- if (unlikely(ZERO_OR_NULL_PTR(block)))
- return;
- BUG_ON(!size);
-
- sp = virt_to_page(block);
+ BUG_ON(!size || size >= PAGE_SIZE);
+ sp = virt_to_page(hdr);
units = SLOB_UNITS(size);

spin_lock_irqsave(&slob_lock, flags);
@@ -401,9 +399,7 @@ static void slob_free(void *block, int size)
if (slob_page_free(sp))
clear_slob_page_free(sp);
spin_unlock_irqrestore(&slob_lock, flags);
- __ClearPageSlab(sp);
- page_mapcount_reset(sp);
- slob_free_pages(b, 0);
+ slob_free_pages(sp, 0);
return;
}

@@ -476,7 +472,6 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
static __always_inline void *
__do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
{
- unsigned int *m;
int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
void *ret;

@@ -497,12 +492,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
if (!size)
return ZERO_SIZE_PTR;

- m = slob_alloc(size + minalign, gfp, align, node, minalign);
-
- if (!m)
- return NULL;
- *m = size;
- ret = (void *)m + minalign;
+ ret = slob_alloc(size + minalign, gfp, align, node, minalign);

trace_kmalloc_node(caller, ret,
size, size + minalign, gfp, node);
@@ -553,21 +543,13 @@ void kfree(const void *block)
kmemleak_free(block);

sp = virt_to_page(block);
- if (PageSlab(sp)) {
- int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
- unsigned int *m = (unsigned int *)(block - align);
- slob_free(m, *m + align);
- } else {
- unsigned int order = compound_order(sp);
- mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
- -(PAGE_SIZE << order));
- __free_pages(sp, order);
-
- }
+ if (PageSlab(sp))
+ slob_free((void *)block);
+ else
+ slob_free_pages(sp, compound_order(sp));
}
EXPORT_SYMBOL(kfree);

-/* can't use ksize for kmem_cache_alloc memory, only kmalloc */
size_t __ksize(const void *block)
{
struct page *sp;
@@ -600,22 +582,26 @@ int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)

static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
{
+ int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
void *b;

flags &= gfp_allowed_mask;

might_alloc(flags);

- if (c->size < PAGE_SIZE) {
- b = slob_alloc(c->size, flags, c->align, node, 0);
+ if (c->size < PAGE_SIZE - minalign) {
+ b = slob_alloc(c->size + minalign, flags, c->align, node, minalign);
trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
- SLOB_UNITS(c->size) * SLOB_UNIT,
+ SLOB_UNITS(c->size + minalign) * SLOB_UNIT,
flags, node);
} else {
- b = slob_new_pages(flags, get_order(c->size), node);
+ unsigned int order = get_order(c->size);
+
+ if (likely(order))
+ flags |= __GFP_COMP;
+ b = slob_new_pages(flags, order, node);
trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
- PAGE_SIZE << get_order(c->size),
- flags, node);
+ PAGE_SIZE << order, flags, node);
}

if (b && c->ctor) {
@@ -647,12 +633,18 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t gfp, int node)
EXPORT_SYMBOL(kmem_cache_alloc_node);
#endif

-static void __kmem_cache_free(void *b, int size)
+static void __kmem_cache_free(void *b)
{
- if (size < PAGE_SIZE)
- slob_free(b, size);
+ struct page *sp;
+
+ if (unlikely(ZERO_OR_NULL_PTR(b)))
+ return;
+
+ sp = virt_to_page(b);
+ if (PageSlab(sp))
+ slob_free(b);
else
- slob_free_pages(b, get_order(size));
+ slob_free_pages(sp, compound_order(sp));
}

static void kmem_rcu_free(struct rcu_head *head)
@@ -660,7 +652,7 @@ static void kmem_rcu_free(struct rcu_head *head)
struct slob_rcu *slob_rcu = (struct slob_rcu *)head;
void *b = (void *)slob_rcu - (slob_rcu->size - sizeof(struct slob_rcu));

- __kmem_cache_free(b, slob_rcu->size);
+ __kmem_cache_free(b);
}

void kmem_cache_free(struct kmem_cache *c, void *b)
@@ -672,7 +664,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
slob_rcu->size = c->size;
call_rcu(&slob_rcu->head, kmem_rcu_free);
} else {
- __kmem_cache_free(b, c->size);
+ __kmem_cache_free(b);
}

trace_kmem_cache_free(_RET_IP_, b, c->name);
--
2.30.2

2021-10-24 10:46:31

by Hyeonggon Yoo

Subject: Re: [PATCH] slob: add size header to all allocations

On Thu, Oct 21, 2021 at 07:36:26PM +0200, Vlastimil Babka wrote:
> On 10/20/21 13:46, Hyeonggon Yoo wrote:
> > On Sun, Oct 17, 2021 at 08:38:41PM -0700, Rustam Kovhaev wrote:
> >> Let's prepend all allocations of (PAGE_SIZE - align_offset) and less
> >> with the size header. This way kmem_cache_alloc() memory can be freed
> >> with kfree() and the other way around, as long as they are less than
> >> (PAGE_SIZE - align_offset).
> >
> > Hello Rustam, I measured its impact on memory usage on
> > tiny kernel configuration as SLOB is used in very small machine.
> >
> > on x86 32 bit + tinyconfig:
> > Before:
> > Slab: 668 kB
> >
> > After:
> > Slab: 688~692 kB
> >
> > it adds 20~24kB.
>
> Thanks for the measurement. That's 3.5% increase.
>

You're welcome.

> >
> >>
> >> The main reason for this change is to simplify SLOB a little bit, make
> >> it a bit easier to debug whenever something goes wrong.
> >>
> >
> > It seems acceptable But I wonder it is worth to increase memory usage
> > to allow freeing kmem_cache_alloc-ed objects by kfree()?
>
> Not for the reason above, but for providing a useful API guarantee
> regardless of selected slab allocator IMHO yes.
>

Mm.. that means some callers free kmem_cache_alloc()-ed objects using
kfree(), and SLAB/SLUB already support that, but SLOB doesn't.

In what situations is freeing with kfree() needed?
Wouldn't this make the code confusing?

> > Thanks,
> > Hyeonggon
> >
> >> meminfo right after the system boot, without the patch:
> >> Slab: 35500 kB
> >>
> >> the same, with the patch:
> >> Slab: 36396 kB
> >>
> >
>

2021-10-25 08:20:44

by Vlastimil Babka

Subject: Re: [PATCH] slob: add size header to all allocations

On 10/24/21 12:43, Hyeonggon Yoo wrote:
>> >>
>> >> The main reason for this change is to simplify SLOB a little bit, make
>> >> it a bit easier to debug whenever something goes wrong.
>> >>
>> >
>> > It seems acceptable But I wonder it is worth to increase memory usage
>> > to allow freeing kmem_cache_alloc-ed objects by kfree()?
>>
>> Not for the reason above, but for providing a useful API guarantee
>> regardless of selected slab allocator IMHO yes.
>>
>
> Mm.. that means some callers free kmem_cache_alloc-ed object using
> kfree, and SLAB/SLUB already support that, and SLOB doesn't.

Exactly. Finding that out started this whole thread.

> In what situations is freeing using kfree needed?
> Wouldn't this make code confusing?

XFS seems to have good reasons: at a common freeing site, objects can
appear from multiple caches, and it would be expensive to track each
object's cache just to free it. See
https://lore.kernel.org/all/[email protected]/

IMHO it really makes sense to support this from an API point of view.
kmem_cache_alloc() is basically a more specific version of the generic
kmalloc(). It makes sense for the generic kind of free, that is kfree(),
to work on those objects too.
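
A hypothetical sketch of that pattern (the structure and function names are
invented; the point is only that the common free path does not know which
cache each object came from):

#include <linux/list.h>
#include <linux/slab.h>

struct defer_item {                     /* objects of this shape may come from
                                         * several different kmem_caches */
        struct list_head list;
};

static void drop_items(struct list_head *items)
{
        struct defer_item *it, *n;

        list_for_each_entry_safe(it, n, items, list) {
                list_del(&it->list);
                kfree(it);      /* no need to track the originating cache just
                                 * to call kmem_cache_free() on the right one */
        }
}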

>> > Thanks,
>> > Hyeonggon
>> >
>> >> meminfo right after the system boot, without the patch:
>> >> Slab: 35500 kB
>> >>
>> >> the same, with the patch:
>> >> Slab: 36396 kB
>> >>
>> >
>>
>

2021-10-25 13:23:08

by Vlastimil Babka

Subject: Re: [PATCH v2] slob: add size header to all allocations

On 10/23/21 08:41, Rustam Kovhaev wrote:
> Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
> size header.
> It simplifies the slab API and guarantees that both kmem_cache_alloc()
> and kmalloc() memory could be freed by kfree().
>
> meminfo right after the system boot, without the patch:
> Slab: 35456 kB
>
> the same, with the patch:
> Slab: 36160 kB
>
> Link: https://lore.kernel.org/lkml/[email protected]
> Signed-off-by: Rustam Kovhaev <[email protected]>

Seems overall correct to me, thanks! I'll just suggest some improvements:

> ---
> v2:
> - Allocate compound pages in slob_alloc_node()
> - Use slob_free_pages() in kfree()
> - Update documentation
>
> Documentation/core-api/memory-allocation.rst | 4 +-
> mm/slob.c | 114 +++++++++----------
> 2 files changed, 55 insertions(+), 63 deletions(-)
>
> diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
> index 5954ddf6ee13..fea0ed11a7c5 100644
> --- a/Documentation/core-api/memory-allocation.rst
> +++ b/Documentation/core-api/memory-allocation.rst
> @@ -172,5 +172,5 @@ wrappers can allocate memory from that cache.
>
> When the allocated memory is no longer needed it must be freed. You can
> use kvfree() for the memory allocated with `kmalloc`, `vmalloc` and
> -`kvmalloc`. The slab caches should be freed with kmem_cache_free(). And
> -don't forget to destroy the cache with kmem_cache_destroy().
> +`kvmalloc`. The slab caches can be freed with kmem_cache_free() or kvfree().
> +And don't forget to destroy the cache with kmem_cache_destroy().

I would phrase it like this (it also improves the odd wording "The slab caches
should be freed with..." that predates your patch):

When the allocated memory is no longer needed it must be freed. Objects
allocated by `kmalloc` can be freed by `kfree` or `kvfree`.
Objects allocated by `kmem_cache_alloc` can be freed with `kmem_cache_free`
or also by `kfree` or `kvfree`.
Memory allocated by `vmalloc` can be freed with `vfree` or `kvfree`.
Memory allocated by `kvmalloc` can be freed with `kvfree`.
Caches created by `kmem_cache_create` should be freed with
`kmem_cache_destroy`.

> -static void slob_free_pages(void *b, int order)
> +static void slob_free_pages(struct page *sp, int order)
> {
> - struct page *sp = virt_to_page(b);
> -
> - if (current->reclaim_state)
> - current->reclaim_state->reclaimed_slab += 1 << order;
> + if (PageSlab(sp)) {
> + __ClearPageSlab(sp);
> + page_mapcount_reset(sp);
> + if (current->reclaim_state)
> + current->reclaim_state->reclaimed_slab += 1 << order;
> + }
>
> mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
> -(PAGE_SIZE << order));
> @@ -247,9 +244,7 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
> /*
> * 'aligned' will hold the address of the slob block so that the
> * address 'aligned'+'align_offset' is aligned according to the
> - * 'align' parameter. This is for kmalloc() which prepends the
> - * allocated block with its size, so that the block itself is
> - * aligned when needed.
> + * 'align' parameter.
> */
> if (align) {
> aligned = (slob_t *)
> @@ -373,25 +368,28 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
> }
> if (unlikely(gfp & __GFP_ZERO))
> memset(b, 0, size);
> + /* Write size in the header */
> + *(unsigned int *)b = size - align_offset;
> + b = (void *)b + align_offset;
> return b;

I would just "return (void *)b + align_offset;" here, no need to update 'b'.

> }
>
> /*
> * slob_free: entry point into the slob allocator.
> */
> -static void slob_free(void *block, int size)
> +static void slob_free(void *block)
> {
> struct page *sp;
> - slob_t *prev, *next, *b = (slob_t *)block;
> + int align_offset = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);

This patch adds a number of these in several functions; previously it was just
in __do_kmalloc_node(). It's a compile-time constant, so I would just #define it
somewhere at the top of slob.c, e.g. something like:

#if ARCH_KMALLOC_MINALIGN < ARCH_SLAB_MINALIGN
#define SLOB_HDR_SIZE ARCH_SLAB_MINALIGN
#else
#define SLOB_HDR_SIZE ARCH_KMALLOC_MINALIGN
#endif

> + void *hdr = block - align_offset;
> + unsigned int size = *(unsigned int *)hdr + align_offset;
> + slob_t *prev, *next, *b = hdr;

IMHO this is too subtle to put in the declaration. I would move these
assignments below the declarations.

That way you can also ditch 'hdr' and just do 'block -= SLOB_HDR_SIZE;'.

> slobidx_t units;
> unsigned long flags;
> struct list_head *slob_list;
>
> - if (unlikely(ZERO_OR_NULL_PTR(block)))
> - return;
> - BUG_ON(!size);
> -
> - sp = virt_to_page(block);
> + BUG_ON(!size || size >= PAGE_SIZE);
> + sp = virt_to_page(hdr);
> units = SLOB_UNITS(size);
>
> spin_lock_irqsave(&slob_lock, flags);

2021-10-26 04:46:31

by Rustam Kovhaev

Subject: Re: [PATCH v2] slob: add size header to all allocations

On Mon, Oct 25, 2021 at 11:36:53AM +0200, Vlastimil Babka wrote:
> On 10/23/21 08:41, Rustam Kovhaev wrote:
> > Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
> > size header.
> > It simplifies the slab API and guarantees that both kmem_cache_alloc()
> > and kmalloc() memory could be freed by kfree().
> >
> > meminfo right after the system boot, without the patch:
> > Slab: 35456 kB
> >
> > the same, with the patch:
> > Slab: 36160 kB
> >
> > Link: https://lore.kernel.org/lkml/[email protected]
> > Signed-off-by: Rustam Kovhaev <[email protected]>
>
> Seems overal correct to me, thanks! I'll just suggest some improvements:

Thank you, I'll send a v3.

2021-10-29 03:08:52

by Rustam Kovhaev

Subject: [PATCH v3] slob: add size header to all allocations

Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
size header.
It simplifies the slab API and guarantees that both kmem_cache_alloc()
and kmalloc() memory can be freed by kfree().

meminfo right after the system boot, x86-64 on xfs, without the patch:
Slab: 35456 kB

the same, with the patch:
Slab: 36100 kB

Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Rustam Kovhaev <[email protected]>
---
v3:
- Add SLOB_HDR_SIZE define
- Remove references to minalign
- Improve documentation wording

v2:
- Allocate compound pages in slob_alloc_node()
- Use slob_free_pages() in kfree()
- Update documentation
---
Documentation/core-api/memory-allocation.rst | 12 +-
mm/slob.c | 132 +++++++++----------
2 files changed, 68 insertions(+), 76 deletions(-)

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 5954ddf6ee13..31806d5ebeec 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -170,7 +170,11 @@ should be used if a part of the cache might be copied to the userspace.
After the cache is created kmem_cache_alloc() and its convenience
wrappers can allocate memory from that cache.

-When the allocated memory is no longer needed it must be freed. You can
-use kvfree() for the memory allocated with `kmalloc`, `vmalloc` and
-`kvmalloc`. The slab caches should be freed with kmem_cache_free(). And
-don't forget to destroy the cache with kmem_cache_destroy().
+When the allocated memory is no longer needed it must be freed.
+Objects allocated by `kmalloc` can be freed with `kfree` or `kvfree`.
+Objects allocated by `kmem_cache_alloc` can be freed with `kmem_cache_free`
+or also by `kfree` or `kvfree`.
+Memory allocated by `vmalloc` can be freed with `vfree` or `kvfree`.
+Memory allocated by `kvmalloc` can be freed with `kvfree`.
+Caches created by `kmem_cache_create` should be freed with
+`kmem_cache_destroy`.
diff --git a/mm/slob.c b/mm/slob.c
index 74d3f6e60666..4a4f3fe40a59 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -25,23 +25,18 @@
* into the free list in address order, so this is effectively an
* address-ordered first fit.
*
- * Above this is an implementation of kmalloc/kfree. Blocks returned
- * from kmalloc are prepended with a 4-byte header with the kmalloc size.
- * If kmalloc is asked for objects of PAGE_SIZE or larger, it calls
- * alloc_pages() directly, allocating compound pages so the page order
- * does not have to be separately tracked.
- * These objects are detected in kfree() because PageSlab()
- * is false for them.
+ * Blocks that are less than (PAGE_SIZE - SLOB_HDR_SIZE) are prepended with
+ * a 4-byte header with the size. Larger blocks do not have the header and
+ * SLOB calls alloc_pages() directly, allocating compound pages so the
+ * page order does not have to be separately tracked. These objects are
+ * detected in kfree() because PageSlab() is false for them.
*
* SLAB is emulated on top of SLOB by simply calling constructors and
* destructors for every SLAB allocation. Objects are returned with the
* 4-byte alignment unless the SLAB_HWCACHE_ALIGN flag is set, in which
* case the low-level allocator will fragment blocks to create the proper
- * alignment. Again, objects of page-size or greater are allocated by
- * calling alloc_pages(). As SLAB objects know their size, no separate
- * size bookkeeping is necessary and there is essentially no allocation
- * space overhead, and compound pages aren't needed for multi-page
- * allocations.
+ * alignment. Again, objects of (PAGE_SIZE - SLOB_HDR_SIZE) or greater are
+ * allocated by calling alloc_pages().
*
* NUMA support in SLOB is fairly simplistic, pushing most of the real
* logic down to the page allocator, and simply doing the node accounting
@@ -88,6 +83,8 @@ typedef s16 slobidx_t;
typedef s32 slobidx_t;
#endif

+#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN)
+
struct slob_block {
slobidx_t units;
};
@@ -207,12 +204,14 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
return page_address(page);
}

-static void slob_free_pages(void *b, int order)
+static void slob_free_pages(struct page *sp, int order)
{
- struct page *sp = virt_to_page(b);
-
- if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += 1 << order;
+ if (PageSlab(sp)) {
+ __ClearPageSlab(sp);
+ page_mapcount_reset(sp);
+ if (current->reclaim_state)
+ current->reclaim_state->reclaimed_slab += 1 << order;
+ }

mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
-(PAGE_SIZE << order));
@@ -247,9 +246,7 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
/*
* 'aligned' will hold the address of the slob block so that the
* address 'aligned'+'align_offset' is aligned according to the
- * 'align' parameter. This is for kmalloc() which prepends the
- * allocated block with its size, so that the block itself is
- * aligned when needed.
+ * 'align' parameter.
*/
if (align) {
aligned = (slob_t *)
@@ -373,25 +370,26 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
}
if (unlikely(gfp & __GFP_ZERO))
memset(b, 0, size);
- return b;
+ /* Write size in the header */
+ *(unsigned int *)b = size - align_offset;
+ return (void *)b + align_offset;
}

/*
* slob_free: entry point into the slob allocator.
*/
-static void slob_free(void *block, int size)
+static void slob_free(void *block)
{
struct page *sp;
- slob_t *prev, *next, *b = (slob_t *)block;
+ slob_t *prev, *next, *b = block - SLOB_HDR_SIZE;
+ unsigned int size;
slobidx_t units;
unsigned long flags;
struct list_head *slob_list;

- if (unlikely(ZERO_OR_NULL_PTR(block)))
- return;
- BUG_ON(!size);
-
- sp = virt_to_page(block);
+ size = *(unsigned int *)b + SLOB_HDR_SIZE;
+ BUG_ON(!size || size >= PAGE_SIZE);
+ sp = virt_to_page(b);
units = SLOB_UNITS(size);

spin_lock_irqsave(&slob_lock, flags);
@@ -401,9 +399,7 @@ static void slob_free(void *block, int size)
if (slob_page_free(sp))
clear_slob_page_free(sp);
spin_unlock_irqrestore(&slob_lock, flags);
- __ClearPageSlab(sp);
- page_mapcount_reset(sp);
- slob_free_pages(b, 0);
+ slob_free_pages(sp, 0);
return;
}

@@ -476,36 +472,29 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
static __always_inline void *
__do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
{
- unsigned int *m;
- int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
void *ret;

gfp &= gfp_allowed_mask;

might_alloc(gfp);

- if (size < PAGE_SIZE - minalign) {
- int align = minalign;
+ if (size < PAGE_SIZE - SLOB_HDR_SIZE) {
+ int align = SLOB_HDR_SIZE;

/*
* For power of two sizes, guarantee natural alignment for
* kmalloc()'d objects.
*/
if (is_power_of_2(size))
- align = max(minalign, (int) size);
+ align = max(align, (int) size);

if (!size)
return ZERO_SIZE_PTR;

- m = slob_alloc(size + minalign, gfp, align, node, minalign);
-
- if (!m)
- return NULL;
- *m = size;
- ret = (void *)m + minalign;
+ ret = slob_alloc(size + SLOB_HDR_SIZE, gfp, align, node, SLOB_HDR_SIZE);

trace_kmalloc_node(caller, ret,
- size, size + minalign, gfp, node);
+ size, size + SLOB_HDR_SIZE, gfp, node);
} else {
unsigned int order = get_order(size);

@@ -553,26 +542,17 @@ void kfree(const void *block)
kmemleak_free(block);

sp = virt_to_page(block);
- if (PageSlab(sp)) {
- int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
- unsigned int *m = (unsigned int *)(block - align);
- slob_free(m, *m + align);
- } else {
- unsigned int order = compound_order(sp);
- mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
- -(PAGE_SIZE << order));
- __free_pages(sp, order);
-
- }
+ if (PageSlab(sp))
+ slob_free((void *)block);
+ else
+ slob_free_pages(sp, compound_order(sp));
}
EXPORT_SYMBOL(kfree);

-/* can't use ksize for kmem_cache_alloc memory, only kmalloc */
size_t __ksize(const void *block)
{
struct page *sp;
- int align;
- unsigned int *m;
+ unsigned int size;

BUG_ON(!block);
if (unlikely(block == ZERO_SIZE_PTR))
@@ -582,9 +562,8 @@ size_t __ksize(const void *block)
if (unlikely(!PageSlab(sp)))
return page_size(sp);

- align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
- m = (unsigned int *)(block - align);
- return SLOB_UNITS(*m) * SLOB_UNIT;
+ size = *(unsigned int *)(block - SLOB_HDR_SIZE);
+ return SLOB_UNITS(size) * SLOB_UNIT;
}
EXPORT_SYMBOL(__ksize);

@@ -606,16 +585,19 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)

might_alloc(flags);

- if (c->size < PAGE_SIZE) {
- b = slob_alloc(c->size, flags, c->align, node, 0);
+ if (c->size < PAGE_SIZE - SLOB_HDR_SIZE) {
+ b = slob_alloc(c->size + SLOB_HDR_SIZE, flags, c->align, node, SLOB_HDR_SIZE);
trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
- SLOB_UNITS(c->size) * SLOB_UNIT,
+ SLOB_UNITS(c->size + SLOB_HDR_SIZE) * SLOB_UNIT,
flags, node);
} else {
- b = slob_new_pages(flags, get_order(c->size), node);
+ unsigned int order = get_order(c->size);
+
+ if (likely(order))
+ flags |= __GFP_COMP;
+ b = slob_new_pages(flags, order, node);
trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
- PAGE_SIZE << get_order(c->size),
- flags, node);
+ PAGE_SIZE << order, flags, node);
}

if (b && c->ctor) {
@@ -647,12 +629,18 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t gfp, int node)
EXPORT_SYMBOL(kmem_cache_alloc_node);
#endif

-static void __kmem_cache_free(void *b, int size)
+static void __kmem_cache_free(void *b)
{
- if (size < PAGE_SIZE)
- slob_free(b, size);
+ struct page *sp;
+
+ if (unlikely(ZERO_OR_NULL_PTR(b)))
+ return;
+
+ sp = virt_to_page(b);
+ if (PageSlab(sp))
+ slob_free(b);
else
- slob_free_pages(b, get_order(size));
+ slob_free_pages(sp, compound_order(sp));
}

static void kmem_rcu_free(struct rcu_head *head)
@@ -660,7 +648,7 @@ static void kmem_rcu_free(struct rcu_head *head)
struct slob_rcu *slob_rcu = (struct slob_rcu *)head;
void *b = (void *)slob_rcu - (slob_rcu->size - sizeof(struct slob_rcu));

- __kmem_cache_free(b, slob_rcu->size);
+ __kmem_cache_free(b);
}

void kmem_cache_free(struct kmem_cache *c, void *b)
@@ -672,7 +660,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
slob_rcu->size = c->size;
call_rcu(&slob_rcu->head, kmem_rcu_free);
} else {
- __kmem_cache_free(b, c->size);
+ __kmem_cache_free(b);
}

trace_kmem_cache_free(_RET_IP_, b, c->name);
--
2.30.2

2021-11-16 11:26:53

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3] slob: add size header to all allocations

On 10/29/21 05:05, Rustam Kovhaev wrote:
> Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
> size header.
> It simplifies the slab API and guarantees that both kmem_cache_alloc()
> and kmalloc() memory could be freed by kfree().
>
> meminfo right after the system boot, x86-64 on xfs, without the patch:
> Slab: 35456 kB
>
> the same, with the patch:
> Slab: 36100 kB
>
> Link: https://lore.kernel.org/lkml/[email protected]
> Signed-off-by: Rustam Kovhaev <[email protected]>

Sorry for the late reply. I think we can further simplify this. We have:

static void *slob_alloc(size_t size, gfp_t gfp, int align,
int node, int align_offset)

Previously there was one caller that passed size as unchanged, align_offset
= 0, the other passed size + SLOB_HDR_SIZE, align_offset = SLOB_HDR_SIZE.
Now both callers do the latter. We can drop the align_offset parameter and
pass size unchanged. slob_alloc() can internally add SLOB_HDR_SIZE to size,
and use SLOB_HDR_SIZE instead of align_offset.

Thanks,
Vlastimil


2021-11-16 23:19:35

by Rustam Kovhaev

[permalink] [raw]
Subject: Re: [PATCH v3] slob: add size header to all allocations

On Tue, Nov 16, 2021 at 12:26:17PM +0100, Vlastimil Babka wrote:
> On 10/29/21 05:05, Rustam Kovhaev wrote:
> > Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
> > size header.
> > It simplifies the slab API and guarantees that both kmem_cache_alloc()
> > and kmalloc() memory could be freed by kfree().
> >
> > meminfo right after the system boot, x86-64 on xfs, without the patch:
> > Slab: 35456 kB
> >
> > the same, with the patch:
> > Slab: 36100 kB
> >
> > Link: https://lore.kernel.org/lkml/[email protected]
> > Signed-off-by: Rustam Kovhaev <[email protected]>
>
> Sorry for the late reply. I think we can further simplify this. We have:
>
> static void *slob_alloc(size_t size, gfp_t gfp, int align,
> int node, int align_offset)
>
> Previously there was one caller that passed size as unchanged, align_offset
> = 0, the other passed size + SLOB_HDR_SIZE, align_offset = SLOB_HDR_SIZE.
> Now both callers do the latter. We can drop the align_offset parameter and
> pass size unchanged. slob_alloc() can internally add SLOB_HDR_SIZE to size,
> and use SLOB_HDR_SIZE instead of align_offset.

Thank you, I'll send a v4.


2021-11-22 01:30:52

by Rustam Kovhaev

[permalink] [raw]
Subject: [PATCH v4] slob: add size header to all allocations

Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
size header.
It simplifies the slab API and guarantees that memory from both
kmem_cache_alloc() and kmalloc() can be freed by kfree().

meminfo right after the system boot, x86-64 on xfs, without the patch:
Slab: 34700 kB

the same, with the patch:
Slab: 35752 kB

Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Rustam Kovhaev <[email protected]>
---
v4:
- Drop the align_offset parameter and pass size unchanged to
slob_alloc()
- Add SLOB_HDR_SIZE to size in slob_alloc()

v3:
- Add SLOB_HDR_SIZE define
- Remove references to minalign
- Improve documentation wording

v2:
- Allocate compound pages in slob_alloc_node()
- Use slob_free_pages() in kfree()
- Update documentation
---
Documentation/core-api/memory-allocation.rst | 12 +-
mm/slob.c | 140 +++++++++----------
2 files changed, 72 insertions(+), 80 deletions(-)

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 5954ddf6ee13..31806d5ebeec 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -170,7 +170,11 @@ should be used if a part of the cache might be copied to the userspace.
After the cache is created kmem_cache_alloc() and its convenience
wrappers can allocate memory from that cache.

-When the allocated memory is no longer needed it must be freed. You can
-use kvfree() for the memory allocated with `kmalloc`, `vmalloc` and
-`kvmalloc`. The slab caches should be freed with kmem_cache_free(). And
-don't forget to destroy the cache with kmem_cache_destroy().
+When the allocated memory is no longer needed it must be freed.
+Objects allocated by `kmalloc` can be freed with `kfree` or `kvfree`.
+Objects allocated by `kmem_cache_alloc` can be freed with `kmem_cache_free`
+or also by `kfree` or `kvfree`.
+Memory allocated by `vmalloc` can be freed with `vfree` or `kvfree`.
+Memory allocated by `kvmalloc` can be freed with `kvfree`.
+Caches created by `kmem_cache_create` should be freed with
+`kmem_cache_destroy`.
diff --git a/mm/slob.c b/mm/slob.c
index 03deee1e6a94..5cc49102e070 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -25,23 +25,18 @@
* into the free list in address order, so this is effectively an
* address-ordered first fit.
*
- * Above this is an implementation of kmalloc/kfree. Blocks returned
- * from kmalloc are prepended with a 4-byte header with the kmalloc size.
- * If kmalloc is asked for objects of PAGE_SIZE or larger, it calls
- * alloc_pages() directly, allocating compound pages so the page order
- * does not have to be separately tracked.
- * These objects are detected in kfree() because PageSlab()
- * is false for them.
+ * Blocks that are less than (PAGE_SIZE - SLOB_HDR_SIZE) are prepended with
+ * a 4-byte header with the size. Larger blocks do not have the header and
+ * SLOB calls alloc_pages() directly, allocating compound pages so the
+ * page order does not have to be separately tracked. These objects are
+ * detected in kfree() because PageSlab() is false for them.
*
* SLAB is emulated on top of SLOB by simply calling constructors and
* destructors for every SLAB allocation. Objects are returned with the
* 4-byte alignment unless the SLAB_HWCACHE_ALIGN flag is set, in which
* case the low-level allocator will fragment blocks to create the proper
- * alignment. Again, objects of page-size or greater are allocated by
- * calling alloc_pages(). As SLAB objects know their size, no separate
- * size bookkeeping is necessary and there is essentially no allocation
- * space overhead, and compound pages aren't needed for multi-page
- * allocations.
+ * alignment. Again, objects of (PAGE_SIZE - SLOB_HDR_SIZE) or greater are
+ * allocated by calling alloc_pages().
*
* NUMA support in SLOB is fairly simplistic, pushing most of the real
* logic down to the page allocator, and simply doing the node accounting
@@ -88,6 +83,8 @@ typedef s16 slobidx_t;
typedef s32 slobidx_t;
#endif

+#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN)
+
struct slob_block {
slobidx_t units;
};
@@ -207,12 +204,14 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
return page_address(page);
}

-static void slob_free_pages(void *b, int order)
+static void slob_free_pages(struct page *sp, int order)
{
- struct page *sp = virt_to_page(b);
-
- if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += 1 << order;
+ if (PageSlab(sp)) {
+ __ClearPageSlab(sp);
+ page_mapcount_reset(sp);
+ if (current->reclaim_state)
+ current->reclaim_state->reclaimed_slab += 1 << order;
+ }

mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
-(PAGE_SIZE << order));
@@ -247,9 +246,7 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
/*
* 'aligned' will hold the address of the slob block so that the
* address 'aligned'+'align_offset' is aligned according to the
- * 'align' parameter. This is for kmalloc() which prepends the
- * allocated block with its size, so that the block itself is
- * aligned when needed.
+ * 'align' parameter.
*/
if (align) {
aligned = (slob_t *)
@@ -298,8 +295,7 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
/*
* slob_alloc: entry point into the slob allocator.
*/
-static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
- int align_offset)
+static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
{
struct page *sp;
struct list_head *slob_list;
@@ -307,6 +303,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
unsigned long flags;
bool _unused;

+ size += SLOB_HDR_SIZE;
if (size < SLOB_BREAK1)
slob_list = &free_slob_small;
else if (size < SLOB_BREAK2)
@@ -330,7 +327,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
if (sp->units < SLOB_UNITS(size))
continue;

- b = slob_page_alloc(sp, size, align, align_offset, &page_removed_from_list);
+ b = slob_page_alloc(sp, size, align, SLOB_HDR_SIZE, &page_removed_from_list);
if (!b)
continue;

@@ -367,31 +364,32 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
INIT_LIST_HEAD(&sp->slab_list);
set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
set_slob_page_free(sp, slob_list);
- b = slob_page_alloc(sp, size, align, align_offset, &_unused);
+ b = slob_page_alloc(sp, size, align, SLOB_HDR_SIZE, &_unused);
BUG_ON(!b);
spin_unlock_irqrestore(&slob_lock, flags);
}
if (unlikely(gfp & __GFP_ZERO))
memset(b, 0, size);
- return b;
+ /* Write size in the header */
+ *(unsigned int *)b = size - SLOB_HDR_SIZE;
+ return (void *)b + SLOB_HDR_SIZE;
}

/*
* slob_free: entry point into the slob allocator.
*/
-static void slob_free(void *block, int size)
+static void slob_free(void *block)
{
struct page *sp;
- slob_t *prev, *next, *b = (slob_t *)block;
+ slob_t *prev, *next, *b = block - SLOB_HDR_SIZE;
+ unsigned int size;
slobidx_t units;
unsigned long flags;
struct list_head *slob_list;

- if (unlikely(ZERO_OR_NULL_PTR(block)))
- return;
- BUG_ON(!size);
-
- sp = virt_to_page(block);
+ size = *(unsigned int *)b + SLOB_HDR_SIZE;
+ BUG_ON(!size || size >= PAGE_SIZE);
+ sp = virt_to_page(b);
units = SLOB_UNITS(size);

spin_lock_irqsave(&slob_lock, flags);
@@ -401,9 +399,7 @@ static void slob_free(void *block, int size)
if (slob_page_free(sp))
clear_slob_page_free(sp);
spin_unlock_irqrestore(&slob_lock, flags);
- __ClearPageSlab(sp);
- page_mapcount_reset(sp);
- slob_free_pages(b, 0);
+ slob_free_pages(sp, 0);
return;
}

@@ -476,36 +472,29 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
static __always_inline void *
__do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
{
- unsigned int *m;
- int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
void *ret;

gfp &= gfp_allowed_mask;

might_alloc(gfp);

- if (size < PAGE_SIZE - minalign) {
- int align = minalign;
+ if (size < PAGE_SIZE - SLOB_HDR_SIZE) {
+ int align = SLOB_HDR_SIZE;

/*
* For power of two sizes, guarantee natural alignment for
* kmalloc()'d objects.
*/
if (is_power_of_2(size))
- align = max(minalign, (int) size);
+ align = max(align, (int) size);

if (!size)
return ZERO_SIZE_PTR;

- m = slob_alloc(size + minalign, gfp, align, node, minalign);
-
- if (!m)
- return NULL;
- *m = size;
- ret = (void *)m + minalign;
+ ret = slob_alloc(size, gfp, align, node);

trace_kmalloc_node(caller, ret,
- size, size + minalign, gfp, node);
+ size, size + SLOB_HDR_SIZE, gfp, node);
} else {
unsigned int order = get_order(size);

@@ -553,26 +542,17 @@ void kfree(const void *block)
kmemleak_free(block);

sp = virt_to_page(block);
- if (PageSlab(sp)) {
- int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
- unsigned int *m = (unsigned int *)(block - align);
- slob_free(m, *m + align);
- } else {
- unsigned int order = compound_order(sp);
- mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
- -(PAGE_SIZE << order));
- __free_pages(sp, order);
-
- }
+ if (PageSlab(sp))
+ slob_free((void *)block);
+ else
+ slob_free_pages(sp, compound_order(sp));
}
EXPORT_SYMBOL(kfree);

-/* can't use ksize for kmem_cache_alloc memory, only kmalloc */
size_t __ksize(const void *block)
{
struct page *sp;
- int align;
- unsigned int *m;
+ unsigned int size;

BUG_ON(!block);
if (unlikely(block == ZERO_SIZE_PTR))
@@ -582,9 +562,8 @@ size_t __ksize(const void *block)
if (unlikely(!PageSlab(sp)))
return page_size(sp);

- align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
- m = (unsigned int *)(block - align);
- return SLOB_UNITS(*m) * SLOB_UNIT;
+ size = *(unsigned int *)(block - SLOB_HDR_SIZE);
+ return SLOB_UNITS(size) * SLOB_UNIT;
}
EXPORT_SYMBOL(__ksize);

@@ -606,16 +585,19 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)

might_alloc(flags);

- if (c->size < PAGE_SIZE) {
- b = slob_alloc(c->size, flags, c->align, node, 0);
+ if (c->size < PAGE_SIZE - SLOB_HDR_SIZE) {
+ b = slob_alloc(c->size, flags, c->align, node);
trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
- SLOB_UNITS(c->size) * SLOB_UNIT,
+ SLOB_UNITS(c->size + SLOB_HDR_SIZE) * SLOB_UNIT,
flags, node);
} else {
- b = slob_new_pages(flags, get_order(c->size), node);
+ unsigned int order = get_order(c->size);
+
+ if (likely(order))
+ flags |= __GFP_COMP;
+ b = slob_new_pages(flags, order, node);
trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
- PAGE_SIZE << get_order(c->size),
- flags, node);
+ PAGE_SIZE << order, flags, node);
}

if (b && c->ctor) {
@@ -647,12 +629,18 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t gfp, int node)
EXPORT_SYMBOL(kmem_cache_alloc_node);
#endif

-static void __kmem_cache_free(void *b, int size)
+static void __kmem_cache_free(void *b)
{
- if (size < PAGE_SIZE)
- slob_free(b, size);
+ struct page *sp;
+
+ if (unlikely(ZERO_OR_NULL_PTR(b)))
+ return;
+
+ sp = virt_to_page(b);
+ if (PageSlab(sp))
+ slob_free(b);
else
- slob_free_pages(b, get_order(size));
+ slob_free_pages(sp, compound_order(sp));
}

static void kmem_rcu_free(struct rcu_head *head)
@@ -660,7 +648,7 @@ static void kmem_rcu_free(struct rcu_head *head)
struct slob_rcu *slob_rcu = (struct slob_rcu *)head;
void *b = (void *)slob_rcu - (slob_rcu->size - sizeof(struct slob_rcu));

- __kmem_cache_free(b, slob_rcu->size);
+ __kmem_cache_free(b);
}

void kmem_cache_free(struct kmem_cache *c, void *b)
@@ -673,7 +661,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
slob_rcu->size = c->size;
call_rcu(&slob_rcu->head, kmem_rcu_free);
} else {
- __kmem_cache_free(b, c->size);
+ __kmem_cache_free(b);
}
}
EXPORT_SYMBOL(kmem_cache_free);
--
2.30.2


Subject: Re: [PATCH v4] slob: add size header to all allocations

On Sun, 21 Nov 2021, Rustam Kovhaev wrote:

> Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
> size header.
> It simplifies the slab API and guarantees that both kmem_cache_alloc()
> and kmalloc() memory could be freed by kfree().
>
> meminfo right after the system boot, x86-64 on xfs, without the patch:
> Slab: 34700 kB
>
> the same, with the patch:
> Slab: 35752 kB

> +#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN)

Ok, that is up to 128 bytes on some architectures. Mostly 32 or 64 bytes.

> @@ -307,6 +303,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
> unsigned long flags;
> bool _unused;
>
> + size += SLOB_HDR_SIZE;

And every object now has this overhead? 128 bytes extra in extreme cases
per object?


> - if (size < PAGE_SIZE - minalign) {
> - int align = minalign;
> + if (size < PAGE_SIZE - SLOB_HDR_SIZE) {
> + int align = SLOB_HDR_SIZE;

And the object is also aligned to 128-byte boundaries on some
architectures.

So a 4-byte object occupies 256 bytes in SLOB?

SLOB will no longer be a low memory overhead allocator then.
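
Back-of-envelope, as a userspace sketch (not kernel code; it assumes both
SLOB_HDR_SIZE and the object alignment are 128 bytes, as on an arch with a
128-byte ARCH_DMA_MINALIGN):

#include <stdio.h>

/* round x up to the next multiple of the power-of-two alignment a */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

int main(void)
{
	size_t hdr = 128;	/* assumed SLOB_HDR_SIZE on such an arch */
	size_t align = 128;	/* each object start must be 128-byte aligned */
	size_t object = 4;	/* a tiny kmalloc(4)-style allocation */

	/*
	 * Each allocation needs hdr + object bytes, and the next header can
	 * only begin at the next aligned boundary, so the effective
	 * per-object footprint is:
	 */
	printf("%zu-byte object costs %zu bytes\n", object,
	       align_up(hdr + object, align));	/* prints 256 */
	return 0;
}

That is, the 4-byte object burns two full 128-byte blocks: one for the
header, one for the object plus the padding up to the next aligned header.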



2021-11-22 09:40:11

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4] slob: add size header to all allocations

On 11/22/21 10:22, Christoph Lameter wrote:
> On Sun, 21 Nov 2021, Rustam Kovhaev wrote:
>
>> Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
>> size header.
>> It simplifies the slab API and guarantees that both kmem_cache_alloc()
>> and kmalloc() memory could be freed by kfree().
>>
>> meminfo right after the system boot, x86-64 on xfs, without the patch:
>> Slab: 34700 kB
>>
>> the same, with the patch:
>> Slab: 35752 kB
>
>> +#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN)
>
> Ok that is up to 128 bytes on some architectues. Mostly 32 or 64 bytes.
>
>> @@ -307,6 +303,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
>> unsigned long flags;
>> bool _unused;
>>
>> + size += SLOB_HDR_SIZE;
>
> And every object now has this overhead? 128 bytes extra in extreme cases
> per object?
>
>
>> - if (size < PAGE_SIZE - minalign) {
>> - int align = minalign;
>> + if (size < PAGE_SIZE - SLOB_HDR_SIZE) {
>> + int align = SLOB_HDR_SIZE;
>
> And the object is also aligned to 128 bytes boundaries on some
> architectures.
>
> So a 4 byte object occupies 256 bytes in SLOB?
>
> SLOB will no longer be a low memory overhead allocator then.

Hm good point, didn't realize those MINALIGN constants can be that large. I
think this overhead was already the case with SLOB for kmalloc caches, but
now it would be worse, as it would be for all kmem caches.

But it seems there's no reason we couldn't do better? I.e. use the value of
SLOB_HDR_SIZE only to align the beginning of the actual object (and name the
define differently than SLOB_HDR_SIZE). But the size of the header, where we
store the object length, could be just a native word - 4 bytes on 32bit, 8 on
64bit. The address of the header shouldn't have a reason to be also aligned
to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN, as only SLOB itself processes
it and not the slab consumers which rely on those alignments?
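
To make that concrete, a userspace toy (not kernel code; all the names below
are made up for the sketch): objects are carved out of a single 'page', the
size sits in one native word immediately before the object, and only the
object start is aligned:

#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SZ		4096u
#define OBJ_ALIGN	64u	/* stand-in for ARCH_SLAB_MINALIGN */

static alignas(OBJ_ALIGN) unsigned char page[PAGE_SZ];
static size_t cursor;

/* carve an object out of 'page', with a native-word size header right before it */
static void *toy_alloc(size_t size)
{
	size_t obj = ((cursor + sizeof(size_t) + OBJ_ALIGN - 1) / OBJ_ALIGN) * OBJ_ALIGN;

	if (obj + size > PAGE_SZ)
		return NULL;
	*(size_t *)(page + obj - sizeof(size_t)) = size;	/* header, not minalign-aligned */
	cursor = obj + size;
	return page + obj;					/* object, OBJ_ALIGN-aligned */
}

/* what kfree()/__ksize() would read back */
static size_t toy_size(const void *obj)
{
	return *(const size_t *)((const unsigned char *)obj - sizeof(size_t));
}

int main(void)
{
	void *a = toy_alloc(40);
	void *b = toy_alloc(100);

	printf("a: size %zu, object aligned: %d\n", toy_size(a),
	       (uintptr_t)a % OBJ_ALIGN == 0);
	printf("b: size %zu, object aligned: %d\n", toy_size(b),
	       (uintptr_t)b % OBJ_ALIGN == 0);
	return 0;
}

Only the object addresses are aligned; the word holding the size lands
wherever it lands, which is all the allocator itself needs in order to read
it back on free.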

Subject: Re: [PATCH v4] slob: add size header to all allocations

On Mon, 22 Nov 2021, Vlastimil Babka wrote:

> But it seems there's no reason we couldn't do better? I.e. use the value of
> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> define different than SLOB_HDR_SIZE). But the size of the header, where we
> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> 64bit. The address of the header shouldn't have a reason to be also aligned
> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> it and not the slab consumers which rely on those alignments?

Well the best way would be to put it at the end of the object in order to
avoid the alignment problem. This is a particular issue with SLOB because
it allows multiple types of objects in a single page frame.

If only one type of object would be allowed then the object size etc can
be stored in the page struct.

So I guess placement at the beginning cannot be avoided. That in turn runs
into trouble with the DMA requirements on some platforms where the
beginning of the object has to be cache line aligned.

I don't know, but it seems that making SLOB that sophisticated is
counterproductive. Remove SLOB?


2021-11-22 10:45:40

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4] slob: add size header to all allocations

On 11/22/21 11:36, Christoph Lameter wrote:
> On Mon, 22 Nov 2021, Vlastimil Babka wrote:
>
>> But it seems there's no reason we couldn't do better? I.e. use the value of
>> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
>> define different than SLOB_HDR_SIZE). But the size of the header, where we
>> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
>> 64bit. The address of the header shouldn't have a reason to be also aligned
>> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
>> it and not the slab consumers which rely on those alignments?
>
> Well the best way would be to put it at the end of the object in order to
> avoid the alignment problem. This is a particular issue with SLOB because
> it allows multiple types of objects in a single page frame.
>
> If only one type of object would be allowed then the object size etc can
> be stored in the page struct.
>
> So I guess placement at the beginning cannot be avoided. That in turn runs
> into trouble with the DMA requirements on some platforms where the
> beginning of the object has to be cache line aligned.

It's no problem to have the real beginning of the object aligned, and the
prepended header not. The code already does that before this patch for the
kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
ARCH_SLAB_MINALIGN.

> I dont know but it seems that making slob that sophisticated is counter
> productive. Remove SLOB?

I wouldn't mind, but somebody might :)



Subject: Re: [PATCH v4] slob: add size header to all allocations

On Mon, 22 Nov 2021, Vlastimil Babka wrote:

> It's no problem to have the real beginning of the object aligned, and the
> prepended header not. The code already does that before this patch for the
> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
> ARCH_SLAB_MINALIGN.

Ok but then the first object in a page may still have those issues.

> > I dont know but it seems that making slob that sophisticated is counter
> > productive. Remove SLOB?
>
> I wouldn't mind, but somebody might :)

Well run a space efficiency analysis after this patch. If the memory used
is larger than SLUB (with the configuration for minimal data footprint)
then there is no reason for SLOB to continue.


2021-11-22 11:49:39

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4] slob: add size header to all allocations

On 11/22/21 12:40, Christoph Lameter wrote:
> On Mon, 22 Nov 2021, Vlastimil Babka wrote:
>
>> It's no problem to have the real beginning of the object aligned, and the
>> prepended header not. The code already does that before this patch for the
>> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
>> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
>> ARCH_SLAB_MINALIGN.
>
> Ok but then the first object in a page may still have those issues.

Hmm, that's right. I guess we should also distinguish ARCH_KMALLOC_MINALIGN
for kmalloc paths, and ARCH_SLAB_MINALIGN for kmem_cache_alloc paths. Seems
the latter is generally smaller, thus some holes left by kmalloc allocations
could be filled later by kmem_cache_alloc allocations.

>
>> > I dont know but it seems that making slob that sophisticated is counter
>> > productive. Remove SLOB?
>>
>> I wouldn't mind, but somebody might :)
>
> Well run a space efficiency analysis after this patch. If the memory used
> is larger than SLUB (with the configuration for minimal data footprint)
> then there is no reason for SLOB to continue.

Makes sense.

2021-11-23 10:18:33

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v4] slob: add size header to all allocations

From: Vlastimil Babka
> Sent: 22 November 2021 10:46
>
> On 11/22/21 11:36, Christoph Lameter wrote:
> > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> >
> >> But it seems there's no reason we couldn't do better? I.e. use the value of
> >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> >> define different than SLOB_HDR_SIZE). But the size of the header, where we
> >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> >> 64bit. The address of the header shouldn't have a reason to be also aligned
> >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> >> it and not the slab consumers which rely on those alignments?
> >
> > Well the best way would be to put it at the end of the object in order to
> > avoid the alignment problem. This is a particular issue with SLOB because
> > it allows multiple types of objects in a single page frame.
> >
> > If only one type of object would be allowed then the object size etc can
> > be stored in the page struct.

Or just a single byte that is the index of the associated free list structure.
For 32bit and for the smaller kmalloc() area it may be reasonable to have
a separate array indexed by the page of the address.

> > So I guess placement at the beginning cannot be avoided. That in turn runs
> > into trouble with the DMA requirements on some platforms where the
> > beginning of the object has to be cache line aligned.
>
> It's no problem to have the real beginning of the object aligned, and the
> prepended header not.

I'm not sure that helps.
The header can't share a cache line with the previous item (because it
might be mapped for DMA), so it will always take a full cache line.

There might be some strange scheme where you put the size at the end
and the offset of the 'last end' into the page struct.
The DMA API should let you safely read the size from an allocated
buffer - but you can't modify it.

There is also all the code that allocates 'power of 2' sized buffers
under the assumption they are efficient - as soon as you add a size
field, that assumption just causes the sizes of items to (often) double.
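
For a rough feel of that effect (plain userspace arithmetic, assuming a
128-byte header and the natural alignment kmalloc() gives power-of-two
sizes):

#include <stdio.h>

static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

int main(void)
{
	size_t hdr = 128;	/* assumed header size on a 128-byte minalign arch */
	size_t sizes[] = { 128, 256, 512, 1024, 2048 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		/* power-of-two kmalloc keeps natural alignment == size */
		size_t align = sizes[i] > hdr ? sizes[i] : hdr;

		printf("kmalloc(%4zu): footprint %4zu bytes\n",
		       sizes[i], align_up(hdr + sizes[i], align));
	}
	return 0;
}

Each of those requests ends up costing twice its nominal size once the
header pushes it past its own alignment boundary.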

David

> The code already does that before this patch for the
> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
> ARCH_SLAB_MINALIGN.
>
> > I dont know but it seems that making slob that sophisticated is counter
> > productive. Remove SLOB?
>
> I wouldn't mind, but somebody might :)
>


2021-11-30 07:00:30

by Rustam Kovhaev

[permalink] [raw]
Subject: Re: [PATCH v4] slob: add size header to all allocations

On Tue, Nov 23, 2021 at 10:18:27AM +0000, David Laight wrote:
> From: Vlastimil Babka
> > Sent: 22 November 2021 10:46
> >
> > On 11/22/21 11:36, Christoph Lameter wrote:
> > > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> > >
> > >> But it seems there's no reason we couldn't do better? I.e. use the value of
> > >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> > >> define different than SLOB_HDR_SIZE). But the size of the header, where we
> > >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> > >> 64bit. The address of the header shouldn't have a reason to be also aligned
> > >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> > >> it and not the slab consumers which rely on those alignments?
> > >
> > > Well the best way would be to put it at the end of the object in order to
> > > avoid the alignment problem. This is a particular issue with SLOB because
> > > it allows multiple types of objects in a single page frame.
> > >
> > > If only one type of object would be allowed then the object size etc can
> > > be stored in the page struct.
>
> Or just a single byte that is the index of the associated free list structure.
> For 32bit and for the smaller kmalloc() area it may be reasonable to have
> a separate array indexed by the page of the address.
>
> > > So I guess placement at the beginning cannot be avoided. That in turn runs
> > > into trouble with the DMA requirements on some platforms where the
> > > beginning of the object has to be cache line aligned.
> >
> > It's no problem to have the real beginning of the object aligned, and the
> > prepended header not.
>
> I'm not sure that helps.
> The header can't share a cache line with the previous item (because it
> might be mapped for DMA) so will always take a full cache line.

I thought that the DMA API allocates buffers that are larger than page size.
DMA pool seems to be able to give out smaller buffers, but underneath it
seems to be calling the page allocator.
The SLOB objects that have this header are all less than page size, and
they cannot end up in DMA code paths, or can they?

>
> There might me some strange scheme where you put the size at the end
> and the offset of the 'last end' into the page struct.
> The DMA API should let you safely read the size from an allocated
> buffer - but you can't modify it.
>
> There is also all the code that allocates 'power of 2' sized buffers
> under the assumption they are efficient - as soon as you add a size
> field that assumption just causes the sizes of item to (often) double.
>
> David
>
> > The code already does that before this patch for the
> > kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
> > bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
> > ARCH_SLAB_MINALIGN.
> >
> > > I dont know but it seems that making slob that sophisticated is counter
> > > productive. Remove SLOB?
> >
> > I wouldn't mind, but somebody might :)
> >
>

2021-11-30 09:23:36

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v4] slob: add size header to all allocations

From: Rustam Kovhaev
> Sent: 30 November 2021 07:00
>
> On Tue, Nov 23, 2021 at 10:18:27AM +0000, David Laight wrote:
> > From: Vlastimil Babka
> > > Sent: 22 November 2021 10:46
> > >
> > > On 11/22/21 11:36, Christoph Lameter wrote:
> > > > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> > > >
> > > >> But it seems there's no reason we couldn't do better? I.e. use the value of
> > > >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> > > >> define different than SLOB_HDR_SIZE). But the size of the header, where we
> > > >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> > > >> 64bit. The address of the header shouldn't have a reason to be also aligned
> > > >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> > > >> it and not the slab consumers which rely on those alignments?
> > > >
> > > > Well the best way would be to put it at the end of the object in order to
> > > > avoid the alignment problem. This is a particular issue with SLOB because
> > > > it allows multiple types of objects in a single page frame.
> > > >
...
> >
> > > > So I guess placement at the beginning cannot be avoided. That in turn runs
> > > > into trouble with the DMA requirements on some platforms where the
> > > > beginning of the object has to be cache line aligned.
> > >
> > > It's no problem to have the real beginning of the object aligned, and the
> > > prepended header not.
> >
> > I'm not sure that helps.
> > The header can't share a cache line with the previous item (because it
> > might be mapped for DMA) so will always take a full cache line.
>
> I thought that DMA API allocates buffers that are larger than page size.
> DMA pool seems to be able to give out smaller buffers, but underneath it
> seems to be calling page allocator.
> The SLOB objects that have this header are all less than page size, and
> they cannot end up in DMA code paths, or can they?

The problem isn't dma_alloc_coherent(); it is when memory allocated
elsewhere is used for DMA.
On systems with non-coherent DMA accesses the data cache has to be
flushed before all DMA transfers and invalidated after read DMA transfers.
The CPU must not dirty any of the cache lines associated with a read DMA.

This is on top of any requirements for the alignment of the returned address.

David



Subject: Re: [PATCH v4] slob: add size header to all allocations

On Mon, 29 Nov 2021, Rustam Kovhaev wrote:

> > I'm not sure that helps.
> > The header can't share a cache line with the previous item (because it
> > might be mapped for DMA) so will always take a full cache line.
>
> I thought that DMA API allocates buffers that are larger than page size.
> DMA pool seems to be able to give out smaller buffers, but underneath it
> seems to be calling page allocator.
> The SLOB objects that have this header are all less than page size, and
> they cannot end up in DMA code paths, or can they?

kmalloc() slab allocations must return DMA-able memory. If the underlying
hardware can only DMA to a cache line boundary then all objects must be
aligned that way.


2021-11-30 15:05:40

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4] slob: add size header to all allocations

On 11/23/21 11:18, David Laight wrote:
> From: Vlastimil Babka
>> Sent: 22 November 2021 10:46
>>
>> On 11/22/21 11:36, Christoph Lameter wrote:
>> > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
>> >
>> >> But it seems there's no reason we couldn't do better? I.e. use the value of
>> >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
>> >> define different than SLOB_HDR_SIZE). But the size of the header, where we
>> >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
>> >> 64bit. The address of the header shouldn't have a reason to be also aligned
>> >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
>> >> it and not the slab consumers which rely on those alignments?
>> >
>> > Well the best way would be to put it at the end of the object in order to
>> > avoid the alignment problem. This is a particular issue with SLOB because
>> > it allows multiple types of objects in a single page frame.
>> >
>> > If only one type of object would be allowed then the object size etc can
>> > be stored in the page struct.
>
> Or just a single byte that is the index of the associated free list structure.
> For 32bit and for the smaller kmalloc() area it may be reasonable to have
> a separate array indexed by the page of the address.
>
>> > So I guess placement at the beginning cannot be avoided. That in turn runs
>> > into trouble with the DMA requirements on some platforms where the
>> > beginning of the object has to be cache line aligned.
>>
>> It's no problem to have the real beginning of the object aligned, and the
>> prepended header not.
>
> I'm not sure that helps.
> The header can't share a cache line with the previous item (because it
> might be mapped for DMA) so will always take a full cache line.

So if this is true, then I think we already have a problem with SLOB today
(and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d
("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") but
older).

Let's say we are on arm64 where (AFAICS):
ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128
ARCH_SLAB_MINALIGN = 64

The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN.

Let's say we call kmalloc(64) and get a completely fresh page.
In SLOB, alloc() or rather __do_kmalloc_node() will calculate minalign to
max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) thus 128.
It will call slob_alloc() for size of size+minalign=64+128=192, align and
align_offset = 128
Thus the allocation will use 128 bytes for the header, 64 for the object.
Both the header and object aligned to 128 bytes.
But the remaining 64 bytes of the second 128 bytes will remain free, as the
allocated size is 192 bytes:

| 128B header, aligned | 64B object | 64B free | rest also free |

If there's another kmalloc allocation, the 128-byte alignment due to
ARCH_KMALLOC_MINALIGN will keep it from using these 64 bytes, so that's
fine. But if there's a kmem_cache_alloc() from a cache serving <=64B
objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64
bytes that share the 128-byte block where the previous kmalloc allocation lies.

So either I missed something or we violate the rule that kmalloc() provides
blocks where ARCH_KMALLOC_MINALIGN is not just the alignment of their
beginning but also nothing else touches the N*ARCH_KMALLOC_MINALIGN area
containing the allocated object.
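
The same arithmetic as a small standalone check (userspace only; the
arm64-like constants above are assumed, not derived):

#include <stdio.h>

static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

int main(void)
{
	size_t kmalloc_minalign = 128;	/* ARCH_KMALLOC_MINALIGN == ARCH_DMA_MINALIGN */
	size_t slab_minalign = 64;	/* ARCH_SLAB_MINALIGN */
	size_t hdr = 128;		/* max of the two minaligns */

	/* kmalloc(64) on a fresh page: header at 0, object at 128, block ends at 192 */
	size_t kmalloc_obj = align_up(hdr, kmalloc_minalign);
	size_t kmalloc_end = kmalloc_obj + 64;

	/* a later kmem_cache_alloc() of a <= 64B object only needs 64-byte alignment */
	size_t cache_obj = align_up(kmalloc_end, slab_minalign);

	printf("kmalloc(64) object at [%zu, %zu)\n", kmalloc_obj, kmalloc_end);
	printf("kmem_cache object at %zu, same 128-byte block: %s\n", cache_obj,
	       cache_obj / kmalloc_minalign == kmalloc_obj / kmalloc_minalign ?
	       "yes" : "no");
	return 0;
}

With these numbers the cache object starts at offset 192, i.e. in the second
half of the same 128-byte block whose first half holds the kmalloc(64)
object at offset 128.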

> There might me some strange scheme where you put the size at the end
> and the offset of the 'last end' into the page struct.
> The DMA API should let you safely read the size from an allocated
> buffer - but you can't modify it.
>
> There is also all the code that allocates 'power of 2' sized buffers
> under the assumption they are efficient - as soon as you add a size
> field that assumption just causes the sizes of item to (often) double.
>
> David
>
>> The code already does that before this patch for the
>> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
>> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
>> ARCH_SLAB_MINALIGN.
>>
>> > I dont know but it seems that making slob that sophisticated is counter
>> > productive. Remove SLOB?
>>
>> I wouldn't mind, but somebody might :)
>>
>
>


2021-11-30 15:23:19

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v4] slob: add size header to all allocations

From: Vlastimil Babka
> Sent: 30 November 2021 14:56
>
> On 11/23/21 11:18, David Laight wrote:
> > From: Vlastimil Babka
> >> Sent: 22 November 2021 10:46
> >>
> >> On 11/22/21 11:36, Christoph Lameter wrote:
> >> > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> >> >
> >> >> But it seems there's no reason we couldn't do better? I.e. use the value of
> >> >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> >> >> define different than SLOB_HDR_SIZE). But the size of the header, where we
> >> >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> >> >> 64bit. The address of the header shouldn't have a reason to be also aligned
> >> >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> >> >> it and not the slab consumers which rely on those alignments?
> >> >
> >> > Well the best way would be to put it at the end of the object in order to
> >> > avoid the alignment problem. This is a particular issue with SLOB because
> >> > it allows multiple types of objects in a single page frame.
> >> >
> >> > If only one type of object would be allowed then the object size etc can
> >> > be stored in the page struct.
> >
> > Or just a single byte that is the index of the associated free list structure.
> > For 32bit and for the smaller kmalloc() area it may be reasonable to have
> > a separate array indexed by the page of the address.
> >
> >> > So I guess placement at the beginning cannot be avoided. That in turn runs
> >> > into trouble with the DMA requirements on some platforms where the
> >> > beginning of the object has to be cache line aligned.
> >>
> >> It's no problem to have the real beginning of the object aligned, and the
> >> prepended header not.
> >
> > I'm not sure that helps.
> > The header can't share a cache line with the previous item (because it
> > might be mapped for DMA) so will always take a full cache line.
>
> So if this is true, then I think we already have a problem with SLOB today
> (and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d
> ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)" but
> older).
>
> Let's say we are on arm64 where (AFAICS):
> ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128
> ARCH_SLAB_MINALIGN = 64

Is that valid?
Isn't SLAB being used to implement kmalloc() so the architecture
defined alignment must apply?

> The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN.
>
> Let's say we call kmalloc(64) and get a completely fresh page.
> In SLOB, alloc() or rather __do_kmalloc_node() will calculate minalign to
> max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) thus 128.
> It will call slob_alloc() for size of size+minalign=64+128=192, align and
> align_offset = 128
> Thus the allocation will use 128 bytes for the header, 64 for the object.
> Both the header and object aligned to 128 bytes.
> But the remaining 64 bytes of the second 128 bytes will remain free, as the
> allocated size is 192 bytes:
>
> | 128B header, aligned | 64B object | 64B free | rest also free |

That is horribly wasteful on memory :-)

> If there's another kmalloc allocation, the 128 bytes aligment due to
> ARCH_KMALLOC_MINALIGN will avoid it from using these 64 bytes, so that's
> fine. But if there's a kmem_cache_alloc() from a cache serving <=64B
> objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64
> bytes that share the 128 block where the previous kmalloc allocation lies.

If the memory returned by kmem_cache_alloc() can be used for DMA then
ARCH_DMA_MINALIGN has to apply to the returned buffers.
So, maybe, that cache can't exist?

I'd expect that ARCH_DMA_MINALIGN forces allocations to be a multiple
of that size.
More particularly the rest of the area can't be allocated to anything else.
So it ought to be valid to return the 2nd half of a 128 byte cache line
provided the first half isn't written while the allocation is active.

But that ARCH_KMALLOC_MINALIGN only applies to 'large' items?
Small items only need aligning to the power of 2 below their size.
So 8-byte items only need 8-byte alignment even though a larger
item might need (say) 64-byte alignment.

David


Subject: Re: [PATCH v4] slob: add size header to all allocations

On Tue, 30 Nov 2021, Vlastimil Babka wrote:

> So either I missed something or we violate the rule that kmalloc() provides
> blocks where ARCH_KMALLOC_MINALIGN is not just the alignment of their
> beginning but also nothing else touches the N*ARCH_KMALLOC_MINALIGN area
> containing the allocated object.

Indeed.... The DMA API documentation in the kernel states:

Documentation/DMA-API-HOWTO.rst

2) ARCH_KMALLOC_MINALIGN

+ Architectures must ensure that kmalloc'ed buffer is
+ DMA-safe. Drivers and subsystems depend on it. If an architecture
+ isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in
+ the CPU cache is identical to data in main memory),
+ ARCH_KMALLOC_MINALIGN must be set so that the memory allocator
+ makes sure that kmalloc'ed buffer doesn't share a cache line with
+ the others. See arch/arm/include/asm/cache.h as an example.


Note that this is only the case for kmalloc. Not for a slab cache setup
separately from the kmalloc array. That is why ARCH_KMALLOC_MINALIGN
exists.

2021-11-30 15:39:33

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4] slob: add size header to all allocations

On 11/30/21 16:21, David Laight wrote:
> From: Vlastimil Babka
>> Sent: 30 November 2021 14:56
>>
>> On 11/23/21 11:18, David Laight wrote:
>> > From: Vlastimil Babka
>> >
>> > Or just a single byte that is the index of the associated free list structure.
>> > For 32bit and for the smaller kmalloc() area it may be reasonable to have
>> > a separate array indexed by the page of the address.
>> >
>> >> > So I guess placement at the beginning cannot be avoided. That in turn runs
>> >> > into trouble with the DMA requirements on some platforms where the
>> >> > beginning of the object has to be cache line aligned.
>> >>
>> >> It's no problem to have the real beginning of the object aligned, and the
>> >> prepended header not.
>> >
>> > I'm not sure that helps.
>> > The header can't share a cache line with the previous item (because it
>> > might be mapped for DMA) so will always take a full cache line.
>>
>> So if this is true, then I think we already have a problem with SLOB today
>> (and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d
>> ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)" but
>> older).
>>
>> Let's say we are on arm64 where (AFAICS):
>> ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128
>> ARCH_SLAB_MINALIGN = 64
>
> Is that valid?
> Isn't SLAB being used to implement kmalloc() so the architecture
> defined alignment must apply?

SLAB is used to implement kmalloc() yes, but that's an implementation
detail. I assume that we provide these DMA guarantees to all kmalloc() users
as we don't know which will use it for DMA and which not, but if somebody
creates their specific SLAB cache, they have to decide explicitly if they
are going to use DMA with those objects, and request such alignment if yes.
If not, we can use smaller alignment that's only required by e.g. the CPU.

>> The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN.
>>
>> Let's say we call kmalloc(64) and get a completely fresh page.
>> In SLOB, alloc() or rather __do_kmalloc_node() will calculate minalign to
>> max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) thus 128.
>> It will call slob_alloc() for size of size+minalign=64+128=192, align and
>> align_offset = 128
>> Thus the allocation will use 128 bytes for the header, 64 for the object.
>> Both the header and object aligned to 128 bytes.
>> But the remaining 64 bytes of the second 128 bytes will remain free, as the
>> allocated size is 192 bytes:
>>
>> | 128B header, aligned | 64B object | 64B free | rest also free |
>
> That is horribly wasteful on memory :-)

Yes. I don't know how this historically came to be for SLOB, which was
supposed to minimize memory usage (at the expense of cpu efficiency). But
the point raised in this thread is that if we extend this to all
kmem_cache_alloc() allocations to make them possible to free with kfree(),
we'll make it a lot worse :/

>> If there's another kmalloc allocation, the 128 bytes aligment due to
>> ARCH_KMALLOC_MINALIGN will avoid it from using these 64 bytes, so that's
>> fine. But if there's a kmem_cache_alloc() from a cache serving <=64B
>> objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64
>> bytes that share the 128 block where the previous kmalloc allocation lies.
>
> If the memory returned by kmem_cache_alloc() can be used for DMA then
> ARCH_DMA_MINALIGN has to apply to the returned buffers.
> So, maybe, that cache can't exist?

See above, I assume that cache cannot be used for DMA. But if we are trying
to protect the kmalloc(64) DMA guarantees, then that shouldn't depend on the
guarantees of the second, unrelated allocation?

> I'd expect that ARCH_DMA_MINALIGN forces allocations to be a multiple
> of that size.

Doesn't seem so. That would indeed fix the problem, assuming it really is a
problem (yet it seems nobody reported it occurring in practice).

> More particularly the rest of the area can't be allocated to anything else.
> So it ought to be valid to return the 2nd half of a 128 byte cache line
> provided the first half isn't written while the allocation is active.

As the allocator cannot know when the first half will be used for DMA by
whoever allocated it, we can only assume it can happen at any time, and not
return the 2nd half, ever?

> But that ARCH_KMALLOC_MINALIGN only applies to 'large' items?
> Small items only need aligning to the power of 2 below their size.
> So 8 bytes items only need 8 byte alignment even though a larger
> item might need (say) 64 byte alignment.

But if we never defined such a threshold (that would probably have to be arch
independent) then we can't start making such assumptions today, as we don't
know which kmalloc() users do expect DMA and which not? It would have to be
a flag or something. And yeah, there is already a __GFP_DMA flag, but it
means something a bit different...

>
> David
>
>