2012-08-02 13:11:28

by Glauber Costa

Subject: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation

The slab allocators provide their users with memory regions, with very few
placement guarantees. No user should assume that an actual page is given by
kmalloc calls whose size is a multiple of the page size. This means that we
can be sure that every sane user of the interface will not mess with
the reference count of the underlying page.

When freeing objects, the slub allocator will most of the time free
empty pages by calling __free_pages(). But high-order kmalloc allocations
are disposed of by means of put_page() instead.

It makes no sense to call put_page() on kernel pages that are not
reference counted, which is the case here.

Signed-off-by: Glauber Costa <[email protected]>
CC: David Rientjes <[email protected]>
CC: Pekka Enberg <[email protected]>
CC: Christoph Lameter <[email protected]>
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index e517d43..9ca4e20 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3453,7 +3453,7 @@ void kfree(const void *x)
 	if (unlikely(!PageSlab(page))) {
 		BUG_ON(!PageCompound(page));
 		kmemleak_free(x);
-		put_page(page);
+		__free_pages(page, compound_order(page));
 		return;
 	}
 	slab_free(page->slab, page, object, _RET_IP_);
--
1.7.11.2


by Christoph Lameter

Subject: Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation

On Thu, 2 Aug 2012, Glauber Costa wrote:

> diff --git a/mm/slub.c b/mm/slub.c
> index e517d43..9ca4e20 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
> if (unlikely(!PageSlab(page))) {
> BUG_ON(!PageCompound(page));
> kmemleak_free(x);
> - put_page(page);
> + __free_pages(page, compound_order(page));

Hmmm... put_page() would have called put_compound_page(), which would have
called the dtor function. The dtor is set to __free_pages_ok(), which does
mlock checks and verifies that the page is in a proper condition for
freeing. Then it calls free_one_page().

__free_pages() decrements the refcount and then calls __free_pages_ok().

So we lose the checking and the dtor stuff with this patch. Guess that is
ok?

Acked-by: Christoph Lameter <[email protected]>
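
[Editorial sketch, not part of the original message.] The two call paths
described in this message can be laid out side by side. The userspace C
fragment below only records the order of steps as this message describes
them; it does not call any kernel code. The trace helper and the
old_path()/new_path() names are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_MAX 256

/* Hypothetical trace buffer: records the order of steps taken. */
struct trace {
	char buf[TRACE_MAX];
};

static void trace_step(struct trace *t, const char *step)
{
	size_t used = strlen(t->buf);

	/* Separate steps with " -> " and never write past the buffer. */
	assert(used < TRACE_MAX);
	snprintf(t->buf + used, TRACE_MAX - used, "%s%s",
		 used ? " -> " : "", step);
}

/* Path before the patch, as described above: put_page() on a compound page. */
static void old_path(struct trace *t)
{
	trace_step(t, "put_page");
	trace_step(t, "put_compound_page");
	trace_step(t, "dtor");
	trace_step(t, "free_one_page");
}

/* Path after the patch, as described above: __free_pages() directly. */
static void new_path(struct trace *t)
{
	trace_step(t, "__free_pages");
	trace_step(t, "__free_pages_ok");
}
```

Starting from a zeroed buffer (struct trace t = { "" };), calling
old_path(&t) or new_path(&t) and printing t.buf shows the respective chain.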

2012-08-02 16:42:13

by Johannes Weiner

Subject: Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation

On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
> On Thu, 2 Aug 2012, Glauber Costa wrote:
>
> > diff --git a/mm/slub.c b/mm/slub.c
> > index e517d43..9ca4e20 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3453,7 +3453,7 @@ void kfree(const void *x)
> > if (unlikely(!PageSlab(page))) {
> > BUG_ON(!PageCompound(page));
> > kmemleak_free(x);
> > - put_page(page);
> > + __free_pages(page, compound_order(page));
>
> Hmmm... put_page() would have called put_compound_page(), which would have
> called the dtor function. The dtor is set to __free_pages_ok(), which does
> mlock checks and verifies that the page is in a proper condition for
> freeing. Then it calls free_one_page().
>
> __free_pages() decrements the refcount and then calls __free_pages_ok().
>
> So we lose the checking and the dtor stuff with this patch. Guess that is
> ok?

The changelog is not correct, however. People DO get pages underlying
slab objects and even free the slab objects before returning the page.
See recent fix:

commit 5bf5f03c271907978489868a4c72aeb42b5127d2
Author: Pravin B Shelar <[email protected]>
Date: Tue May 29 15:06:49 2012 -0700

mm: fix slab->page flags corruption

Transparent huge pages can change page->flags (PG_compound_lock) without
taking Slab lock. Since THP can not break slab pages we can safely access
compound page without taking compound lock.

Specifically this patch fixes a race between compound_unlock() and slab
functions which perform page-flags updates. This can occur when
get_page()/put_page() is called on a page from slab.

2012-08-02 16:51:42

by Glauber Costa

Subject: Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation

On 08/02/2012 08:42 PM, Johannes Weiner wrote:
> On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
>> On Thu, 2 Aug 2012, Glauber Costa wrote:
>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index e517d43..9ca4e20 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
>>> if (unlikely(!PageSlab(page))) {
>>> BUG_ON(!PageCompound(page));
>>> kmemleak_free(x);
>>> - put_page(page);
>>> + __free_pages(page, compound_order(page));
>>
>> Hmmm... put_page() would have called put_compound_page(), which would have
>> called the dtor function. The dtor is set to __free_pages_ok(), which does
>> mlock checks and verifies that the page is in a proper condition for
>> freeing. Then it calls free_one_page().
>>
>> __free_pages() decrements the refcount and then calls __free_pages_ok().
>>
>> So we lose the checking and the dtor stuff with this patch. Guess that is
>> ok?
>
> The changelog is not correct, however. People DO get pages underlying
> slab objects and even free the slab objects before returning the page.
> See recent fix:

Well, yes, in the sense that slab objects are page-backed.

The point is that a user of kmalloc/kfree should not treat a memory area
as if it were a page, even if it is page-sized.

If it is just the Changelog you are unhappy about, I can do another
submission rewording it.

> commit 5bf5f03c271907978489868a4c72aeb42b5127d2
> Author: Pravin B Shelar <[email protected]>
> Date: Tue May 29 15:06:49 2012 -0700
>
> mm: fix slab->page flags corruption
>
> Transparent huge pages can change page->flags (PG_compound_lock) without
> taking Slab lock. Since THP can not break slab pages we can safely access
> compound page without taking compound lock.
>
> Specifically this patch fixes a race between compound_unlock() and slab
> functions which perform page-flags updates. This can occur when
> get_page()/put_page() is called on a page from slab.

This is just another argument not to do put_page on slab pages!

2012-08-02 17:10:27

by Johannes Weiner

Subject: Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation

On Thu, Aug 02, 2012 at 08:51:31PM +0400, Glauber Costa wrote:
> On 08/02/2012 08:42 PM, Johannes Weiner wrote:
> > On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
> >> On Thu, 2 Aug 2012, Glauber Costa wrote:
> >>
> >>> diff --git a/mm/slub.c b/mm/slub.c
> >>> index e517d43..9ca4e20 100644
> >>> --- a/mm/slub.c
> >>> +++ b/mm/slub.c
> >>> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
> >>> if (unlikely(!PageSlab(page))) {
> >>> BUG_ON(!PageCompound(page));
> >>> kmemleak_free(x);
> >>> - put_page(page);
> >>> + __free_pages(page, compound_order(page));
> >>
> >> Hmmm... put_page() would have called put_compound_page(), which would have
> >> called the dtor function. The dtor is set to __free_pages_ok(), which does
> >> mlock checks and verifies that the page is in a proper condition for
> >> freeing. Then it calls free_one_page().
> >>
> >> __free_pages() decrements the refcount and then calls __free_pages_ok().
> >>
> >> So we lose the checking and the dtor stuff with this patch. Guess that is
> >> ok?
> >
> > The changelog is not correct, however. People DO get pages underlying
> > slab objects and even free the slab objects before returning the page.
> > See recent fix:
>
> Well, yes, in the sense that slab objects are page-backed.
>
> The point is that a user of kmalloc/kfree should not treat a memory area
> as if it were a page, even if it is page-sized.

I whole-heartedly agree. But it's hard to verify that there aren't any
users doing that. And even though it's ugly to do, it's technically
working, no? No longer supporting it would be a regression.

> If it is just the Changelog you are unhappy about, I can do another
> submission rewording it.

__free_pages still respects the refcount, so I think the Changelog is
not actually appropriate for the change you're making. You're just
changing what Christoph outlined above, the compound page handling.
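
[Editorial sketch, not part of the original message.] The point that
__free_pages still respects the refcount can be modeled in a few lines of
userspace C. This is a toy under stated assumptions: struct toy_page and
every toy_* function are hypothetical stand-ins, and each free routine is
reduced to "drop one reference, release the page only when the count
reaches zero", which is the behavior described in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only what this sketch needs. */
struct toy_page {
	int refcount;	/* models the page reference count */
	bool freed;	/* true once the page went back to the allocator */
};

/* A fresh allocation starts with a single reference. */
static void toy_alloc(struct toy_page *p)
{
	p->refcount = 1;
	p->freed = false;
}

/* Models get_page(): take an extra reference on a live page. */
static void toy_get_page(struct toy_page *p)
{
	assert(!p->freed);
	p->refcount++;
}

/* Shared helper: drop one reference, report whether it was the last. */
static bool toy_put_testzero(struct toy_page *p)
{
	assert(p->refcount > 0);
	return --p->refcount == 0;
}

/* Models put_page(): the compound dtor would run on the last reference. */
static void toy_put_page(struct toy_page *p)
{
	if (toy_put_testzero(p))
		p->freed = true;
}

/* Models __free_pages(): also frees only on the last reference. */
static void toy_free_pages(struct toy_page *p)
{
	if (toy_put_testzero(p))
		p->freed = true;
}
```

In this model, a caller that took an extra reference with toy_get_page()
keeps the page alive across toy_free_pages(); the page is released only by
the final toy_put_page(). What differs between the two routines in the
kernel is the compound page handling, which the toy does not model.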

2012-08-02 17:24:34

by Glauber Costa

Subject: Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation

On 08/02/2012 09:10 PM, Johannes Weiner wrote:
> On Thu, Aug 02, 2012 at 08:51:31PM +0400, Glauber Costa wrote:
>> On 08/02/2012 08:42 PM, Johannes Weiner wrote:
>>> On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
>>>> On Thu, 2 Aug 2012, Glauber Costa wrote:
>>>>
>>>>> diff --git a/mm/slub.c b/mm/slub.c
>>>>> index e517d43..9ca4e20 100644
>>>>> --- a/mm/slub.c
>>>>> +++ b/mm/slub.c
>>>>> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
>>>>> if (unlikely(!PageSlab(page))) {
>>>>> BUG_ON(!PageCompound(page));
>>>>> kmemleak_free(x);
>>>>> - put_page(page);
>>>>> + __free_pages(page, compound_order(page));
>>>>
>>>> Hmmm... put_page() would have called put_compound_page(), which would have
>>>> called the dtor function. The dtor is set to __free_pages_ok(), which does
>>>> mlock checks and verifies that the page is in a proper condition for
>>>> freeing. Then it calls free_one_page().
>>>>
>>>> __free_pages() decrements the refcount and then calls __free_pages_ok().
>>>>
>>>> So we lose the checking and the dtor stuff with this patch. Guess that is
>>>> ok?
>>>
>>> The changelog is not correct, however. People DO get pages underlying
>>> slab objects and even free the slab objects before returning the page.
>>> See recent fix:
>>
>> Well, yes, in the sense that slab objects are page-backed.
>>
>> The point is that a user of kmalloc/kfree should not treat a memory area
>> as if it were a page, even if it is page-sized.
>
> I whole-heartedly agree. But it's hard to verify that there aren't any
> users doing that. And even though it's ugly to do, it's technically
> working, no? No longer supporting it would be a regression.

I've done an extensive audit per Christoph's request, and although of
course this is not enough to guarantee it 100%, it should at least be
enough to sustain a belief that it is reasonably safe.

About regressions, yes, it is working. But as you know, this area is
undergoing change by me. For kmemcg to work, we need to
explicitly mark instances of __free_pages that are accounted. With this
patch, this is trivial. Without this patch, I need to come up with a
quite ugly hack to mark put_page calls as well, one that would exist for
no reason aside from "avoid touching this".

I could of course just bundle this in my series, but since this is an
independent change, it is better to send it separately so it gets better
review, testing and validation.

>> If it is just the Changelog you are unhappy about, I can do another
>> submission rewording it.
>
> __free_pages still respects the refcount, so I think the Changelog is
> not actually appropriate for the change you're making. You're just
> changing what Christoph outlined above, the compound page handling.

I can update the Changelog, no problem.