2006-09-04 06:56:35

by Aubrey Li

[permalink] [raw]
Subject: kernel BUGs when removing largish files with the SLOB allocator

Hi all,

I'm working on the nommu Blackfin uClinux (2.6.16) platform and hit a
kernel BUG at mm/nommu.c:124 when removing a largish file with the SLOB
allocator.

root:~> df -k
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/mtdblock3 3008 1264 1744 42% /
root:~> cp /bin/busybox /busy
root:~> df -k
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/mtdblock3 3008 1532 1476 51% /
root:~> ls -l /bin/busybox /busy
-rwxr-xr-x 1 0 0 423904 /bin/busybox
-rwxr-xr-x 1 0 0 423904 /busy
root:~> md5sum /bin/busybox
7db253a2259ab71bc854c9e5dac544d6 /bin/busybox
root:~> md5sum /busy
7db253a2259ab71bc854c9e5dac544d6 /busy
root:~> rm /busy
kernel BUG at mm/nommu.c:124!
Kernel panic - not syncing: BUG!

Bug comes from mm/nommu.c:
=======================================
	if (!objp || !((page = virt_to_page(objp))) ||
	    (unsigned long)objp >= memory_end)
		return 0;

	if (PageSlab(page))
		return ksize(objp);

	BUG_ON(page->index < 0);
124:	BUG_ON(page->index >= MAX_ORDER);
=======================================

It seems that the SLOB allocator doesn't set the PG_slab page flag,
while nommu.c seems to be written for SLAB only.

On my side the following patch seems to work around the issue
============================================================
--- nommu.c 2006-06-26 14:49:28.000000000 +0800
+++ nommu.c.new 2006-06-26 14:47:20.000000000 +0800
@@ -18,7 +18,9 @@
#include <linux/file.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>
-#include <linux/slab.h>
+#ifdef CONFIG_SLAB
+# include <linux/slab.h>
+#endif
#include <linux/vmalloc.h>
#include <linux/ptrace.h>
#include <linux/blkdev.h>
@@ -117,7 +119,9 @@
if (!objp || !((page = virt_to_page(objp))) || (unsigned long)objp >= memory_end)
return 0;

+#ifdef CONFIG_SLAB
if (PageSlab(page))
+#endif
return ksize(objp);

BUG_ON(page->index < 0);
============================================================

Is there any solution/patch to fix the issue?

Any suggestions are really appreciated.

Thanks,
-Aubrey



2006-09-04 10:21:57

by David Howells

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator


Aubrey <[email protected]> wrote:

> Is there any solution/patch to fix the issue?

Make the SLOB allocator mark its pages PG_slab, just like the SLAB allocator
does. I think this should be okay as the SLOB allocator and the SLAB
allocator seem to be mutually exclusive.

Using PG_slab would also give an instant check to things like SLOB's kfree().

> +#ifdef CONFIG_SLAB
> if (PageSlab(page))
> +#endif

This is not a valid workaround as the object won't necessarily have been
allocated from a slab (shared ramfs mappings and SYSV SHM for example). You
may not pass to ksize() objects allocated by means other than SLAB/SLOB.

David


2006-09-05 03:52:41

by Aubrey Li

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

IMHO the problem is that nommu.c is written for SLAB only. So when SLOB
is enabled, nommu.c needs to be modified so that two or more memory
allocator algorithms can work properly, rather than forcing all the
other algorithms to be compatible with the current one (SLAB) just to
match the code in nommu.c, which is not general enough.

Does that make sense?

Regards,
-Aubrey

On 9/4/06, David Howells <[email protected]> wrote:
>
> Aubrey <[email protected]> wrote:
>
> > Is there any solution/patch to fix the issue?
>
> Make the SLOB allocator mark its pages PG_slab, just like the SLAB allocator
> does. I think this should be okay as the SLOB allocator and the SLAB
> allocator seem to be mutually exclusive.
>
> Using PG_slab would also give an instant check to things like SLOB's kfree().
>
> > +#ifdef CONFIG_SLAB
> > if (PageSlab(page))
> > +#endif
>
> This is not a valid workaround as the object won't necessarily have been
> allocated from a slab (shared ramfs mappings and SYSV SHM for example). You
> may not pass to ksize() objects allocated by means other than SLAB/SLOB.
>
> David
>

2006-09-05 09:35:28

by David Howells

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Aubrey <[email protected]> wrote:

> IMHO the problem is nommu.c is written for slab only. So when slob is
> enabled, it need to be considered to make some modification to make
> two or more memory allocator algorithms work properly, rather than to
> force all others algorithm to be compatible with the current one(slab)
> to match the code in the nommu.c, which is not common enough.
>
> Does that make sense?

No, not really.

The point is that kobjsize() needs to determine the size of the object it has
been asked to assess. It knows how to do that directly if the page is
allocated by the main page allocator, but not if the page belongs to the slab
allocator. The quickest way it can determine this is to look at PG_slab. In
such a case it defers to the slab allocator for a determination.

What I don't want to happen is that we have to defer immediately to the slob
allocator which then goes and searches various lists to see if it owns the
page. Remember: unless the page is _marked_ as belonging to the slob
allocator, the slob allocator may _not_ assume any of the metadata in struct
page is valid slob metadata. It _has_ to determine the validity of the page
by other means _before_ it can use the metadata, and that most likely means a
search. This is why PG_slab exists: if it is set, you _know_ you can
instantly trust the metadata.

Since slob appears to be an entry-point-by-entry-point replacement for the
slab allocator, the slob allocator can also mark its pages for anything that's
looking to defer to it using PG_slab since the presence of slab and slob are
mutually exclusive.

Also, we already have two major memory allocator algorithms in the kernel at
any one time: (1) the main page allocator and (2) slab or slob. We don't
really want to start going to three or more.


So, I come back to the main question: Why don't you want to use PG_slab?

David

2006-09-06 02:35:57

by Aubrey Li

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Yeah, I agree with most of your opinion. Using PG_slab is really the
quickest way to determine the size of the object. But using a flag
named "PG_slab" for a memory algorithm named "slob" does not seem
reasonable; it may confuse people who are starting to read the kernel
source code. So I'm writing to ask if there is a better solution to
fix the issue.

-Aubrey

On 9/5/06, David Howells <[email protected]> wrote:
> Aubrey <[email protected]> wrote:
>
> > IMHO the problem is nommu.c is written for slab only. So when slob is
> > enabled, it need to be considered to make some modification to make
> > two or more memory allocator algorithms work properly, rather than to
> > force all others algorithm to be compatible with the current one(slab)
> > to match the code in the nommu.c, which is not common enough.
> >
> > Does that make sense?
>
> No, not really.
>
> The point is that kobjsize() needs to determine the size of the object it has
> been asked to assess. It knows how to do that directly if the page is
> allocated by the main page allocator, but not if the page belongs to the slab
> allocator. The quickest way it can determine this is to look at PG_slab. In
> such a case it defers to the slab allocator for a determination.
>
> What I don't want to happen is that we have to defer immediately to the slob
> allocator which then goes and searches various lists to see if it owns the
> page. Remember: unless the page is _marked_ as belonging to the slob
> allocator, the slob allocator may _not_ assume any of the metadata in struct
> page is valid slob metadata. It _has_ to determine the validity of the page
> by other means _before_ it can use the metadata, and that most likely means a
> search. This is why PG_slab exists: if it is set, you _know_ you can
> instantly trust the metadata.
>
> Since slob appears to be an entry-point-by-entry-point replacement for the
> slab allocator, the slob allocator can also mark its pages for anything that's
> looking to defer to it using PG_slab since the presence of slab and slob are
> mutually exclusive.
>
> Also, we already have two major memory allocator algorithms in the kernel at
> any one time: (1) the main page allocator and (2) slab or slob. We don't
> really want to start going to three or more.
>
>
> So, I come back to the main question: Why don't you want to use PG_slab?
>
> David
>

2006-09-06 03:36:14

by Nick Piggin

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Aubrey wrote:

> Yeah, I agree with most of your opinion. Using PG_slab is really a
> quickest way to determine the size of the object. But I think using a
> flag named "PG_slab" on a memory algorithm named "slob" seems not
> reasonable. It may confuse the people who start to read the kernel
> source code. So I'm writing to ask if there is a better solution to
> fix the issue.


No, confusing would be a "slab replacement" that doesn't provide the same
API as slab and thus requires users to use ifdefs.

I already suggested the exact same thing as David in the exact same
situation about six months ago. It is the right way to go.


2006-09-12 08:07:06

by Aubrey Li

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

On 9/6/06, Nick Piggin <[email protected]> wrote:
> Aubrey wrote:
>
> > Yeah, I agree with most of your opinion. Using PG_slab is really a
> > quickest way to determine the size of the object. But I think using a
> > flag named "PG_slab" on a memory algorithm named "slob" seems not
> > reasonable. It may confuse the people who start to read the kernel
> > source code. So I'm writing to ask if there is a better solution to
> > fix the issue.
>
>
> No, confusing would be a "slab replacement" that doesn't provide the same
> API as slab and thus requires users to use ifdefs.
>
> I've already suggested exact same thing as David in the exact same situation
> about 6 months ago. It is the right way to go.

OK. Here is a patch; it works properly on my side.
Any suggestions and comments are welcome.

Thanks,
-Aubrey
=================================================================
diff --git a/mm/slob.c b/mm/slob.c
index 7b52b20..18ed170 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -109,6 +109,7 @@ static void *slob_alloc(size_t size, gfp

slob_free(cur, PAGE_SIZE);
spin_lock_irqsave(&slob_lock, flags);
+ SetPageSlab(virt_to_page(cur));
cur = slobfree;
}
}
@@ -162,6 +163,8 @@ void *kmalloc(size_t size, gfp_t gfp)
slob_t *m;
bigblock_t *bb;
unsigned long flags;
+ int i;
+ struct page *page;

if (size < PAGE_SIZE - SLOB_UNIT) {
m = slob_alloc(size + SLOB_UNIT, gfp, 0);
@@ -173,12 +176,17 @@ void *kmalloc(size_t size, gfp_t gfp)
return 0;

bb->order = find_order(size);
- bb->pages = (void *)__get_free_pages(gfp, bb->order);
+ page = alloc_pages(gfp, bb->order);
+ bb->pages = (void *)page_address(page);

if (bb->pages) {
spin_lock_irqsave(&block_lock, flags);
bb->next = bigblocks;
bigblocks = bb;
+ for (i = 0; i < (1 << bb->order); i++) {
+ SetPageSlab(page);
+ page++;
+ }
spin_unlock_irqrestore(&block_lock, flags);
return bb->pages;
}
@@ -202,8 +210,16 @@ void kfree(const void *block)
spin_lock_irqsave(&block_lock, flags);
for (bb = bigblocks; bb; last = &bb->next, bb = bb->next) {
if (bb->pages == block) {
+ struct page *page = virt_to_page(bb->pages);
+ int i;
+
*last = bb->next;
spin_unlock_irqrestore(&block_lock, flags);
+ for (i = 0; i < (1 << bb->order); i++) {
+ if (!TestClearPageSlab(page))
+ BUG();
+ page++;
+ }
free_pages((unsigned long)block, bb->order);
slob_free(bb, sizeof(bigblock_t));
return;
@@ -332,10 +348,12 @@ static struct timer_list slob_timer = TI

void kmem_cache_init(void)
{
+#if 0
void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);

if (p)
free_page((unsigned long)p);
+#endif

mod_timer(&slob_timer, jiffies + HZ);
}

2006-09-12 08:54:38

by David Howells

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Aubrey <[email protected]> wrote:

> OK. Here is the patch and work properly on my side.
> Welcome any suggestions and comments.

It looks reasonable. Don't forget to sign off the patch.

> void kmem_cache_init(void)
> {
> +#if 0
> void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
>
> if (p)
> free_page((unsigned long)p);
> +#endif

Any idea what that's about?

David

2006-09-12 10:53:30

by Aubrey Li

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

On 9/12/06, David Howells <[email protected]> wrote:
> Aubrey <[email protected]> wrote:
>
> > OK. Here is the patch and work properly on my side.
> > Welcome any suggestions and comments.
>
> It looks reasonable. Don't forget to sign off the patch.
>
> > void kmem_cache_init(void)
> > {
> > +#if 0
> > void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
> >
> > if (p)
> > free_page((unsigned long)p);
> > +#endif
>
> Any idea what that's about?
>
IMHO kmem_cache_init() needn't do anything for the SLOB allocator.

In addition, the original code allocates one page and then frees it.
Here, with the patch, slob_alloc() will set the PG_slab flag on the page,
and a page with this flag can't pass free_pages_check(), in which case
the kernel will run into a bad_page() panic.

So, I removed this piece of code for the draft of the patch.
Welcome any better idea.

Regards,
-Aubrey

2006-09-12 17:45:14

by Matt Mackall

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

On Tue, Sep 12, 2006 at 04:07:03PM +0800, Aubrey wrote:
> On 9/6/06, Nick Piggin <[email protected]> wrote:
> >Aubrey wrote:
> >
> >> Yeah, I agree with most of your opinion. Using PG_slab is really a
> >> quickest way to determine the size of the object. But I think using a
> >> flag named "PG_slab" on a memory algorithm named "slob" seems not
> >> reasonable. It may confuse the people who start to read the kernel
> >> source code. So I'm writing to ask if there is a better solution to
> >> fix the issue.
> >
> >
> >No, confusing would be a "slab replacement" that doesn't provide the same
> >API as slab and thus requires users to use ifdefs.
> >
> >I've already suggested exact same thing as David in the exact same
> >situation
> >about 6 months ago. It is the right way to go.
>
> OK. Here is the patch and work properly on my side.
> Welcome any suggestions and comments.
>
> Thanks,
> -Aubrey
> =================================================================
> diff --git a/mm/slob.c b/mm/slob.c
> index 7b52b20..18ed170 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -109,6 +109,7 @@ static void *slob_alloc(size_t size, gfp
>
> slob_free(cur, PAGE_SIZE);
> spin_lock_irqsave(&slob_lock, flags);
> + SetPageSlab(virt_to_page(cur));
> cur = slobfree;
> }
> }
> @@ -162,6 +163,8 @@ void *kmalloc(size_t size, gfp_t gfp)
> slob_t *m;
> bigblock_t *bb;
> unsigned long flags;
> + int i;
> + struct page *page;
>
> if (size < PAGE_SIZE - SLOB_UNIT) {
> m = slob_alloc(size + SLOB_UNIT, gfp, 0);
> @@ -173,12 +176,17 @@ void *kmalloc(size_t size, gfp_t gfp)
> return 0;
>
> bb->order = find_order(size);
> - bb->pages = (void *)__get_free_pages(gfp, bb->order);
> + page = alloc_pages(gfp, bb->order);
> + bb->pages = (void *)page_address(page);

Useless (void *) casts are out of fashion. You shouldn't copy mine.

> if (bb->pages) {
> spin_lock_irqsave(&block_lock, flags);
> bb->next = bigblocks;
> bigblocks = bb;
> + for (i = 0; i < (1 << bb->order); i++) {
> + SetPageSlab(page);
> + page++;
> + }

for ( ; page < page + (1 << bb->order), page++)
SetPageSlab(page);

I'm not sure it even makes sense to mark the bigblock pages,
especially for the purposes of kobjsize. But see below.

> spin_unlock_irqrestore(&block_lock, flags);
> return bb->pages;
> }
> @@ -202,8 +210,16 @@ void kfree(const void *block)
> spin_lock_irqsave(&block_lock, flags);
> for (bb = bigblocks; bb; last = &bb->next, bb = bb->next) {
> if (bb->pages == block) {
> + struct page *page = virt_to_page(bb->pages);
> + int i;
> +
> *last = bb->next;
> spin_unlock_irqrestore(&block_lock, flags);
> + for (i = 0; i < (1 << bb->order); i++) {
> + if (!TestClearPageSlab(page))
> + BUG();
> + page++;
> + }

Please drop the BUG. We've already established it's on our lists by
this point.

> void kmem_cache_init(void)
> {
> +#if 0
> void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
>
> if (p)
> free_page((unsigned long)p);
> +#endif
>
> mod_timer(&slob_timer, jiffies + HZ);
> }

You just broke the bit that shrinks the arena.


I don't like this patch. We've just grown SLOB by about 10% everywhere
to make nommu happy. Is this needed because nommu is doing something
special for MMU-less machines or because it's doing something bogus?

To my eyes, the answer seems to clearly be the latter. There's nothing
nommu-specific about kobjsize at all, so it ought to be in the main
kernel if it's actually kosher. But kobjsize tries to guess the size
of an object of unknown type, which is a very suspicious business. If
you've got a pointer, I would propose you ought to know what type it
is, or ought not mess with it at all.

Looking through all the users of kobjsize, it seems we always know
what the type is (and it's usually a VMA). I instead propose we use
ksize on objects we know to be SLAB/SLOB-allocated and add a new
function (kpagesize?) to size other objects where nommu needs it.

As for whether SLOB should emulate PG_slab on general principles, as
far as I can tell, PG_slab should be considered a private debugging
aid to SLAB. All the other users look rather bogus to my eye.

If anything, SLOB should internally abuse PG_slab to mark big pages
and dispense with its bigblocks lists.

--
Mathematics is the supreme nostalgia of our time.

2006-09-12 19:10:48

by David Howells

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Matt Mackall <[email protected]> wrote:

> > + for (i = 0; i < (1 << bb->order); i++) {
> > + SetPageSlab(page);
> > + page++;
> > + }
>
> for ( ; page < page + (1 << bb->order), page++)
> SetPageSlab(page);

Ugh. No. You can't do that. "page < page + X" will be true until "page + X"
wraps the end of memory.

> > + for (i = 0; i < (1 << bb->order); i++) {
> > + if (!TestClearPageSlab(page))
> > + BUG();
> > + page++;
> > + }
>
> Please drop the BUG. We've already established it's on our lists by
> this point.

I disagree. Let's catch accidental reuse of pages. It should, however, be
marked unlikely().

> You just broke the bit that shrinks the arena.

How? This is only called once when things are being initialised. There can
be no SLOB objects allocated prior to that point.

> I don't like this patch. We've just grown SLOB by about 10% everywhere
> to make nommu happy. Is this needed because nommu is doing something
> special for MMU-less machines or because it's doing something bogus?

See preceding discussion on LKML.

kobjsize() needs to work out how to calculate the size of an object. One of
the main classes of object it might be given is slab/slob objects. Either
these should be marked with PG_slab as per the slab allocator, or kobjsize()
has to go and walk all the slab allocator or slob allocator lists to find out
whether this page belongs to them. Only then can it know whether or not it
can trust the page metadata as slab/slob metadata.

> Looking through all the users of kobjsize, it seems we always know
> what the type is (and it's usually a VMA). I instead propose we use
> ksize on objects we know to be SLAB/SLOB-allocated and add a new
> function (kpagesize?) to size other objects where nommu needs it.

Yes, we might have a VMA, but no, we do not know how big the object we've
_actually_ been given is. kmalloc() doesn't tell us that and
get_unmapped_area() doesn't tell us that, but we want it to account correctly
for the amount of memory allocated.

You say "use ksize on objects we know to be SLAB/SLOB-allocated" which is the
whole point of PG_slab.

> As for whether SLOB should emulate PG_slab on general principles, as
> far as I can tell, PG_slab should be considered a private debugging
> aid to SLAB. All the other users look rather bogus to my eye.
>
> If anything, SLOB should internally abuse PG_slab to mark big pages
> and dispense with its bigblocks lists.

It's not abuse. SLOB is a drop-in replacement for the SLAB allocator,
therefore PG_slab belongs to it instead of SLAB.


Anyway, if you've got a better idea, then patch please.

David

2006-09-12 19:25:26

by David Howells

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Matt Mackall <[email protected]> wrote:

> Looking through all the users of kobjsize, it seems we always know
> what the type is (and it's usually a VMA). I instead propose we use
> ksize on objects we know to be SLAB/SLOB-allocated and add a new
> function (kpagesize?) to size other objects where nommu needs it.

It sounds like we'd need an op in the VMA to do the per-type size thing (the
VMA itself not the VMA ops table).

David

2006-09-12 20:30:17

by Matt Mackall

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

On Tue, Sep 12, 2006 at 08:25:04PM +0100, David Howells wrote:
> Matt Mackall <[email protected]> wrote:
>
> > Looking through all the users of kobjsize, it seems we always know
> > what the type is (and it's usually a VMA). I instead propose we use
> > ksize on objects we know to be SLAB/SLOB-allocated and add a new
> > function (kpagesize?) to size other objects where nommu needs it.
>
> It sounds like we'd need an op in the VMA to do the per-type size thing (the
> VMA itself not the VMA ops table).

Not sure yet. There's only one user in nommu.c that shouldn't just be
changed to ksize() that I can see, and that's the one in
show_process_blocks(). That could test for VM_MAPPED_COPY and keep its
hands off otherwise.

I can imagine situations where ->mmap returns pointers to something
that's statically allocated anyway (XIP?), where kobjsize doesn't
really make sense.

Also, looks like the WARN_ON_SLACK code has rotten, result isn't
defined in that function. Change it to base, perhaps?

--
Mathematics is the supreme nostalgia of our time.

2006-09-12 20:51:26

by Matt Mackall

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

On Tue, Sep 12, 2006 at 08:25:04PM +0100, David Howells wrote:
> Matt Mackall <[email protected]> wrote:
>
> > Looking through all the users of kobjsize, it seems we always know
> > what the type is (and it's usually a VMA). I instead propose we use
> > ksize on objects we know to be SLAB/SLOB-allocated and add a new
> > function (kpagesize?) to size other objects where nommu needs it.
>
> It sounds like we'd need an op in the VMA to do the per-type size thing (the
> VMA itself not the VMA ops table).

On looking closer, one place where both the current code and my
proposed change are wrong is measurements on the init task, where
many elements of task_struct are statically allocated.

--
Mathematics is the supreme nostalgia of our time.

2006-09-12 20:52:42

by Matt Mackall

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

On Tue, Sep 12, 2006 at 08:10:32PM +0100, David Howells wrote:
> > You just broke the bit that shrinks the arena.
>
> How? This is only called once when things are being initialised. There can
> be no SLOB objects allocated prior to that point.

It's on a timer.

--
Mathematics is the supreme nostalgia of our time.

2006-09-12 20:57:06

by David Howells

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Matt Mackall <[email protected]> wrote:

> > > You just broke the bit that shrinks the arena.
> >
> > How? This is only called once when things are being initialised. There can
> > be no SLOB objects allocated prior to that point.
>
> It's on a timer.

So what then? The timer is still initialised:

void kmem_cache_init(void)
{
+#if 0
void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);

if (p)
free_page((unsigned long)p);
+#endif

mod_timer(&slob_timer, jiffies + HZ);
}

David

2006-09-12 21:02:53

by David Howells

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Matt Mackall <[email protected]> wrote:

> Not sure yet. There's only one user in nommu.c that shouldn't just be
> changed to ksize() that I can see, and that's the one in
> show_process_blocks(). That could test for VM_MAPPED_COPY and keep its
> hands off otherwise.

Hmmm... You're right. However, note binfmt_elf_fdpic(). This calls ksize()
but should really call kobjsize(). It should not assume that the allocation
it's been given is of any particular type. IIRC ksize() changed purpose at
some point.

> I can imagine situations where ->mmap returns pointers to something
> that's statically allocated anyway (XIP?), where kobjsize doesn't
> really make sense.

ramfs for example. See get_unmapped_area() hooks, not mmap() hooks.

> Also, looks like the WARN_ON_SLACK code has rotten, result isn't
> defined in that function. Change it to base, perhaps?

Yeah. It might be worth ditching it entirely too.

David

2006-09-12 21:05:59

by Matt Mackall

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

On Tue, Sep 12, 2006 at 09:56:46PM +0100, David Howells wrote:
> Matt Mackall <[email protected]> wrote:
>
> > > > You just broke the bit that shrinks the arena.
> > >
> > > How? This is only called once when things are being initialised. There can
> > > be no SLOB objects allocated prior to that point.
> >
> > It's on a timer.
>
> So what then? The timer is still initialised:
>
> void kmem_cache_init(void)
> {
> +#if 0
> void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
>
> if (p)
> free_page((unsigned long)p);
> +#endif
>
> mod_timer(&slob_timer, jiffies + HZ);
> }

"allocate a page from the slob arena"
"if successful, release it to the page allocator"
"re-arm timer"

The only tricky part is the timer points back to _this very function_.

--
Mathematics is the supreme nostalgia of our time.

2006-09-12 21:16:39

by Matt Mackall

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

On Tue, Sep 12, 2006 at 10:02:44PM +0100, David Howells wrote:
> Matt Mackall <[email protected]> wrote:
>
> > Not sure yet. There's only one user in nommu.c that shouldn't just be
> > changed to ksize() that I can see, and that's the one in
> > show_process_blocks(). That could test for VM_MAPPED_COPY and keep its
> > hands off otherwise.
>
> Hmmm... You're right. However, note binfmt_elf_fdpic(). This calls ksize()
> but should really call kobjsize(). It should not assume that the allocation
> it's been given is of any particular type.

I presume you mean load_elf_fdpic_binary, which is doing:

fullsize = ksize((char *) current->mm->start_brk);

That's a little troubling.

> IIRC ksize() changed purpose at some point.

Uh, nope. ksize doesn't even exist in 2.4 and has always done the same
thing in 2.6.

--
Mathematics is the supreme nostalgia of our time.

2006-09-12 22:00:09

by David Howells

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Matt Mackall <[email protected]> wrote:

> The only tricky part is the timer points back to _this very function_.

OIC. Brrr. That _definitely_ needs commenting - as has been demonstrated.
SLOB is using the theoretically one-shot initiator to do garbage collection.

The:

if (size == PAGE_SIZE) /* trying to shrink arena? */
return 0;

In slob_alloc() definitely looks very dodgy, now that I see it. What happens
if some normal user asks to allocate exactly a page? Oh... I suppose it never
gets into slob_alloc() to allocate the main piece of storage.

David

2006-09-13 02:16:20

by Nick Piggin

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

David Howells wrote:
> Matt Mackall <[email protected]> wrote:
>
>
>>>+ for (i = 0; i < (1 << bb->order); i++) {
>>>+ SetPageSlab(page);
>>>+ page++;
>>>+ }
>>
>>for ( ; page < page + (1 << bb->order), page++)
>> SetPageSlab(page);
>
>
> Ugh. No. You can't do that. "page < page + X" will be true until "page + X"
> wraps the end of memory.
>
>
>>>+ for (i = 0; i < (1 << bb->order); i++) {
>>>+ if (!TestClearPageSlab(page))
>>>+ BUG();
>>>+ page++;
>>>+ }
>>
>>Please drop the BUG. We've already established it's on our lists by
>>this point.
>
>
> I disagree. Let's catch accidental reuse of pages. It should, however, be
> marked unlikely().

If you do this, the biggest problem with those ops is that they are atomic,
and the latter also requires strong memory barriers. Don't use RMW variants,
and use __ prepended iff you are the only user of the page at this point.

--
SUSE Labs, Novell Inc.

2006-09-13 07:21:50

by Aubrey Li

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

I changed my patch with some of your suggestions. Anyway, my system
works properly with the patch while the current kernel does not.

From: Aubrey.Li <[email protected]>
Date: Wed, 13 Sep 2006 15:01:23 +0800
Subject: [PATCH] Make the SLOB allocator mark its pages with PG_slab.
Signed-off-by: Aubrey.Li <[email protected]>
---
mm/slob.c | 20 ++++++++++++++------
1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/slob.c b/mm/slob.c
index 7b52b20..05f0d16 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -109,6 +109,7 @@ static void *slob_alloc(size_t size, gfp

slob_free(cur, PAGE_SIZE);
spin_lock_irqsave(&slob_lock, flags);
+ SetPageSlab(virt_to_page(cur));
cur = slobfree;
}
}
@@ -173,12 +174,17 @@ void *kmalloc(size_t size, gfp_t gfp)
return 0;

bb->order = find_order(size);
- bb->pages = (void *)__get_free_pages(gfp, bb->order);
+ page = alloc_pages(gfp, bb->order);
+ bb->pages = page_address(page);

if (bb->pages) {
spin_lock_irqsave(&block_lock, flags);
bb->next = bigblocks;
bigblocks = bb;
+ for (i = 0; i < (1 << bb->order); i++) {
+ SetPageSlab(page);
+ page++;
+ }
spin_unlock_irqrestore(&block_lock, flags);
return bb->pages;
}
@@ -202,8 +208,15 @@ void kfree(const void *block)
spin_lock_irqsave(&block_lock, flags);
for (bb = bigblocks; bb; last = &bb->next, bb = bb->next) {
if (bb->pages == block) {
+ struct page *page = virt_to_page(bb->pages);
+ int i;
+
*last = bb->next;
spin_unlock_irqrestore(&block_lock, flags);
+ for (i = 0; i < (1 << bb->order); i++) {
+ ClearPageSlab(page);
+ page++;
+ }
free_pages((unsigned long)block, bb->order);
slob_free(bb, sizeof(bigblock_t));
return;
@@ -332,11 +345,6 @@ static struct timer_list slob_timer = TI

void kmem_cache_init(void)
{
- void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
-
- if (p)
- free_page((unsigned long)p);
-
mod_timer(&slob_timer, jiffies + HZ);
}

--
1.4.0



2006-09-13 19:40:59

by Matt Mackall

[permalink] [raw]
Subject: Re: kernel BUGs when removing largish files with the SLOB allocator

Another issue that occurred to me last night is that the size of
objects allocated with SLOB's slab-like API is implicit and not
calculable from the object. kmalloc'ed objects, in contrast, have a
header that contains the object size.

So ksize(kmalloc(...)) works, but not ksize(kmem_cache_alloc(...)). I
don't know if anything in the kernel is using the latter aside from
kobjsize.

--
Mathematics is the supreme nostalgia of our time.