2005-09-23 06:14:36

by David Miller

[permalink] [raw]
Subject: making kmalloc BUG() might not be a good idea


I'm sort-of concerned about this change:

[PATCH] __kmalloc: Generate BUG if size requested is too large.

it opens a can of worms, and stuff that used to generate
-ENOMEM kinds of failures will now BUG() the kernel.

Unless you're going to audit every user triggerable
path for proper size limiting, I think we should revert
this change.

Thanks.


2005-09-23 06:30:40

by Nick Piggin

[permalink] [raw]
Subject: Re: making kmalloc BUG() might not be a good idea

David S. Miller wrote:

>I'm sort-of concerned about this change:
>
> [PATCH] __kmalloc: Generate BUG if size requested is too large.
>
>it opens a can of worms, and stuff that used to generate
>-ENOMEM kinds of failures will now BUG() the kernel.
>
>Unless you're going to audit every user triggerable
>path for proper size limiting, I think we should revert
>this change.
>
>Thanks.
>
>

Making it WARN might be a good compromise.


Send instant messages to your online friends http://au.messenger.yahoo.com

2005-09-23 06:54:19

by David Miller

[permalink] [raw]
Subject: Re: making kmalloc BUG() might not be a good idea

From: Nick Piggin <[email protected]>
Date: Fri, 23 Sep 2005 16:30:33 +1000

> David S. Miller wrote:
>
> >I'm sort-of concerned about this change:
> >
> > [PATCH] __kmalloc: Generate BUG if size requested is too large.
> >
> >it opens a can of worms, and stuff that used to generate
> >-ENOMEM kinds of failures will now BUG() the kernel.
> >
> >Unless you're going to audit every user triggerable
> >path for proper size limiting, I think we should revert
> >this change.
>
> Making it WARN might be a good compromise.

It's a better, but it's still turning a harmless -ENOMEM
into a DoS log file filler for the cases where a size limit
check is missing and is user triggerable.

Another idea is to put it under CONFIG_DEBUG or something.

2005-09-23 07:10:05

by Ingo Oeser

[permalink] [raw]
Subject: Re: making kmalloc BUG() might not be a good idea

Hi,

On Friday 23 September 2005 08:30, Nick Piggin wrote:
> David S. Miller wrote:
> >I'm sort-of concerned about this change:
> >
> > [PATCH] __kmalloc: Generate BUG if size requested is too large.
> >
> >it opens a can of worms, and stuff that used to generate
> >-ENOMEM kinds of failures will now BUG() the kernel.
> Making it WARN might be a good compromise.

Which has the potential to spam the logs with a user triggerable event
without even killing the responsible process.
Same problem, just worse.

I could live with a solution that enables it based on a config.

KERNEL_HACKING is no such config. That feature is almost always
enabled, because MAGIC_SYSRQ depends on it and a significant amount
of Linux-Admins like it for a "sync, remount ro and reboot" sequence.
So you need a new one.


Regards

Ingo Oeser



Attachments:
(No filename) (827.00 B)
(No filename) (189.00 B)
Download all attachments

2005-09-23 07:57:17

by Nick Piggin

[permalink] [raw]
Subject: Re: making kmalloc BUG() might not be a good idea

Ingo Oeser wrote:
> Hi,
>
> On Friday 23 September 2005 08:30, Nick Piggin wrote:
>
>>David S. Miller wrote:
>>
>>>I'm sort-of concerned about this change:
>>>
>>> [PATCH] __kmalloc: Generate BUG if size requested is too large.
>>>
>>>it opens a can of worms, and stuff that used to generate
>>>-ENOMEM kinds of failures will now BUG() the kernel.
>>
>>Making it WARN might be a good compromise.
>
>
> Which has the potential to spam the logs with a user triggerable event
> without even killing the responsible process.
> Same problem, just worse.
>

As opposed to potentially taking the system down? I don't
think so.

> I could live with a solution that enables it based on a config.
>

Then you'll get people not enabling it on real workloads, or
tuning it off if it bugs them. No, the point of having a WARN
there is really for people like SGI to detect a few rare failure
cases when they first boot up their 1024+ CPU systems. It is not
going to spam anyone's logs (and if it does it *needs* fixing).

What you don't want is to kill the responsible process, because
at that point they're deep in the kernel, probably holding other
locks and resources.

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-09-23 08:09:46

by David Miller

[permalink] [raw]
Subject: Re: making kmalloc BUG() might not be a good idea

From: Nick Piggin <[email protected]>
Date: Fri, 23 Sep 2005 17:58:00 +1000

> Then you'll get people not enabling it on real workloads, or
> tuning it off if it bugs them. No, the point of having a WARN
> there is really for people like SGI to detect a few rare failure
> cases when they first boot up their 1024+ CPU systems. It is not
> going to spam anyone's logs (and if it does it *needs* fixing).

SGI (and people "like" them) can't enable a debug option when bringing
up new changes for the first time on that huge system? Why is this?

What in the world are all these CONFIG_*DEBUG* options for then?
They are there for "I'm doing something radically new, or my new
change isn't working, therefore I need more debugging than usual."

We want it to spam the logs, sure, during _development_. We don't
want it on production systems where any kind of downtime is a very
serious problem. Rate limited, maybe, but not for every call as
that's simply asking for trouble.

This is why we have things like net_ratelimit() in the networking btw.
It's there so you can't remotely spam someone's logs just becuase you
figured out the "bug of the week" magic packet that erroneously
generates a huge number of log messages.

If we know how to make certain classes of bugs non-lethal, we should
do so because there will always be bugs. :-) This change makes
previously non-lethal bugs potentially kill the machine.

2005-09-23 09:02:54

by Nick Piggin

[permalink] [raw]
Subject: Re: making kmalloc BUG() might not be a good idea

David S. Miller wrote:
> From: Nick Piggin <[email protected]>
> Date: Fri, 23 Sep 2005 17:58:00 +1000
>

> If we know how to make certain classes of bugs non-lethal, we should
> do so because there will always be bugs. :-) This change makes
> previously non-lethal bugs potentially kill the machine.
>

Oh the BUG is bad, sure. I just thought WARN would be a better _compromise_
than BUG in that it will achieve the same result without takeing the machine
down.

I think the CONFIG_DEBUG options are there for some major types of debugging
that require significant infrastructure or can slow down the kernel quite
a lot. With that said, I think there is an option somewhere to turn off all
WARNs and remove strings from all BUGs.

Regarding proliferation of assertions and warnings everywhere - without any
official standard, I think we're mostly being sensible with them (at least
in the core code that I look at). A warn in kmalloc for this wouldn't be
anything radical.

I don't much care for it, but I agree the BUG has to go.

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-09-23 09:17:47

by Coywolf Qi Hunt

[permalink] [raw]
Subject: Re: making kmalloc BUG() might not be a good idea

On 9/23/05, Nick Piggin <[email protected]> wrote:
> David S. Miller wrote:
> > From: Nick Piggin <[email protected]>
> > Date: Fri, 23 Sep 2005 17:58:00 +1000
> >
>
> > If we know how to make certain classes of bugs non-lethal, we should
> > do so because there will always be bugs. :-) This change makes
> > previously non-lethal bugs potentially kill the machine.
> >
>
> Oh the BUG is bad, sure. I just thought WARN would be a better _compromise_
> than BUG in that it will achieve the same result without takeing the machine
> down.
>
> I think the CONFIG_DEBUG options are there for some major types of debugging
> that require significant infrastructure or can slow down the kernel quite
> a lot. With that said, I think there is an option somewhere to turn off all
> WARNs and remove strings from all BUGs.
>
> Regarding proliferation of assertions and warnings everywhere - without any
> official standard, I think we're mostly being sensible with them (at least
> in the core code that I look at). A warn in kmalloc for this wouldn't be
> anything radical.
>
> I don't much care for it, but I agree the BUG has to go.
>

Nice to see: + revert-oversized-kmalloc-check.patch added to -mm tree
--
Coywolf Qi Hunt
http://sosdg.org/~coywolf/

2005-09-23 15:58:52

by Christoph Lameter

[permalink] [raw]
Subject: Make kzalloc a macro

How about this patch making kzalloc a macro?

---

Make kzalloc a macro and use __GFP_ZERO for zeroed slab allocations

kzalloc is right now a function call. The optimization that the kmalloc macro
provides does not work for kzalloc invocations. kmalloc also determines the
slab to use at compile time and fails the compilation if the size is too big.
kzalloc cannot do not.

Moreover there is no kzalloc_node.

This patch adds the ability to the slab allocator to indicate that an entity
should be zeroed by using __GFP_ZERO in the same way to the page allocator.

__GFP_ZERO may be specified when using any slab allocation operation.

Then kzalloc can be defined as a simple macro which will then perform all the
compile time checks of kmalloc and also check the sizes.

Signed-off-by: Christoph Lameter

Index: linux-2.6.14-rc2/include/linux/slab.h
===================================================================
--- linux-2.6.14-rc2.orig/include/linux/slab.h 2005-09-19 20:00:41.000000000 -0700
+++ linux-2.6.14-rc2/include/linux/slab.h 2005-09-22 16:25:25.000000000 -0700
@@ -99,7 +99,7 @@ found:
return __kmalloc(size, flags);
}

-extern void *kzalloc(size_t, unsigned int __nocast);
+#define kzalloc(__size, __flags) kmalloc(__size, (__flags) | __GFP_ZERO)

/**
* kcalloc - allocate memory for an array. The memory is set to zero.
Index: linux-2.6.14-rc2/mm/slab.c
===================================================================
--- linux-2.6.14-rc2.orig/mm/slab.c 2005-09-22 11:58:45.000000000 -0700
+++ linux-2.6.14-rc2/mm/slab.c 2005-09-23 08:53:16.000000000 -0700
@@ -1190,7 +1190,7 @@ static void *kmem_getpages(kmem_cache_t
void *addr;
int i;

- flags |= cachep->gfpflags;
+ flags = (flags | cachep->gfpflags) & ~__GFP_ZERO;
if (likely(nodeid == -1)) {
page = alloc_pages(flags, cachep->gfporder);
} else {
@@ -2508,11 +2508,23 @@ cache_alloc_debugcheck_after(kmem_cache_
#define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
#endif

+static inline void *obj_checkout(kmem_cache_t *cachep, unsigned int __nocast flags, void *objp)
+{
+ if (likely(objp)) {
+ objp = cache_alloc_debugcheck_after(cachep, flags, objp,
+ __builtin_return_address(0));
+ if (unlikely(flags & __GFP_ZERO))
+ memset(objp, 0, obj_reallen(cachep));
+ else
+ prefetchw(objp);
+ }
+ return objp;
+}

static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
{
unsigned long save_flags;
- void* objp;
+ void *objp;
struct array_cache *ac;

cache_alloc_debugcheck_before(cachep, flags);
@@ -2528,10 +2540,7 @@ static inline void *__cache_alloc(kmem_c
objp = cache_alloc_refill(cachep, flags);
}
local_irq_restore(save_flags);
- objp = cache_alloc_debugcheck_after(cachep, flags, objp,
- __builtin_return_address(0));
- prefetchw(objp);
- return objp;
+ return obj_checkout(cachep, flags, objp);
}

#ifdef CONFIG_NUMA
@@ -2550,14 +2559,23 @@ static void *__cache_alloc_node(kmem_cac
l3 = cachep->nodelists[nodeid];
BUG_ON(!l3);

+ cache_alloc_debugcheck_before(cachep, flags);
+
retry:
spin_lock(&l3->list_lock);
entry = l3->slabs_partial.next;
if (entry == &l3->slabs_partial) {
l3->free_touched = 1;
entry = l3->slabs_free.next;
- if (entry == &l3->slabs_free)
- goto must_grow;
+ if (entry == &l3->slabs_free) {
+ spin_unlock(&l3->list_lock);
+ x = cache_grow(cachep, flags, nodeid);
+
+ if (!x)
+ return NULL;
+
+ goto retry;
+ }
}

slabp = list_entry(entry, struct slab, list);
@@ -2590,18 +2608,7 @@ retry:
}

spin_unlock(&l3->list_lock);
- goto done;
-
-must_grow:
- spin_unlock(&l3->list_lock);
- x = cache_grow(cachep, flags, nodeid);
-
- if (!x)
- return NULL;
-
- goto retry;
-done:
- return obj;
+ return obj_checkout(cachep, flags, obj);
}
#endif

@@ -2980,20 +2987,6 @@ void kmem_cache_free(kmem_cache_t *cache
EXPORT_SYMBOL(kmem_cache_free);

/**
- * kzalloc - allocate memory. The memory is set to zero.
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate.
- */
-void *kzalloc(size_t size, unsigned int __nocast flags)
-{
- void *ret = kmalloc(size, flags);
- if (ret)
- memset(ret, 0, size);
- return ret;
-}
-EXPORT_SYMBOL(kzalloc);
-
-/**
* kfree - free previously allocated memory
* @objp: pointer returned by kmalloc.
*

2005-09-23 17:50:10

by Christoph Lameter

[permalink] [raw]
Subject: Re: Make kzalloc a macro

Next rev. There was an issue with kmalloc_node:

---
Make kzalloc a macro and use __GFP_ZERO for zeroed slab allocations

kzalloc is right now a function call. The optimization that the kmalloc macro
provides does not work for kzalloc invocations. kmalloc also determines the
slab to use at compile time and fails the compilation if the size is too big.
kzalloc cannot do not.

Moreover there is no kzalloc_node.

This patch adds the ability to the slab allocator to indicate that an entity
should be zeroed by using __GFP_ZERO in the same way to the page allocator.

__GFP_ZERO may be specified when using any slab allocation operation.

Then kzalloc can be defined as a simple macro which will then perform all the
compile time checks of kmalloc and also check the sizes.

Signed-off-by: Christoph Lameter

Index: linux-2.6.14-rc2/include/linux/slab.h
===================================================================
--- linux-2.6.14-rc2.orig/include/linux/slab.h 2005-09-19 20:00:41.000000000 -0700
+++ linux-2.6.14-rc2/include/linux/slab.h 2005-09-23 10:17:20.000000000 -0700
@@ -99,7 +99,7 @@ found:
return __kmalloc(size, flags);
}

-extern void *kzalloc(size_t, unsigned int __nocast);
+#define kzalloc(__size, __flags) kmalloc(__size, (__flags) | __GFP_ZERO)

/**
* kcalloc - allocate memory for an array. The memory is set to zero.
Index: linux-2.6.14-rc2/mm/slab.c
===================================================================
--- linux-2.6.14-rc2.orig/mm/slab.c 2005-09-23 10:17:20.000000000 -0700
+++ linux-2.6.14-rc2/mm/slab.c 2005-09-23 10:17:20.000000000 -0700
@@ -1191,7 +1191,7 @@ static void *kmem_getpages(kmem_cache_t
void *addr;
int i;

- flags |= cachep->gfpflags;
+ flags = (flags | cachep->gfpflags) & ~__GFP_ZERO;
if (likely(nodeid == -1)) {
page = alloc_pages(flags, cachep->gfporder);
} else {
@@ -2510,11 +2510,21 @@ cache_alloc_debugcheck_after(kmem_cache_
#define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
#endif

+static inline void *obj_checkout(kmem_cache_t *cachep, unsigned int __nocast flags, void *objp)
+{
+ if (likely(objp)) {
+ if (unlikely(flags & __GFP_ZERO))
+ memset(objp, 0, obj_reallen(cachep));
+ else
+ prefetchw(objp);
+ }
+ return objp;
+}

static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
{
unsigned long save_flags;
- void* objp;
+ void *objp;
struct array_cache *ac;

cache_alloc_debugcheck_before(cachep, flags);
@@ -2532,8 +2542,7 @@ static inline void *__cache_alloc(kmem_c
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp,
__builtin_return_address(0));
- prefetchw(objp);
- return objp;
+ return obj_checkout(cachep, flags, objp);
}

#ifdef CONFIG_NUMA
@@ -2558,8 +2567,15 @@ retry:
if (entry == &l3->slabs_partial) {
l3->free_touched = 1;
entry = l3->slabs_free.next;
- if (entry == &l3->slabs_free)
- goto must_grow;
+ if (entry == &l3->slabs_free) {
+ spin_unlock(&l3->list_lock);
+ x = cache_grow(cachep, flags, nodeid);
+
+ if (!x)
+ return NULL;
+
+ goto retry;
+ }
}

slabp = list_entry(entry, struct slab, list);
@@ -2592,18 +2608,7 @@ retry:
}

spin_unlock(&l3->list_lock);
- goto done;
-
-must_grow:
- spin_unlock(&l3->list_lock);
- x = cache_grow(cachep, flags, nodeid);
-
- if (!x)
- return NULL;
-
- goto retry;
-done:
- return obj;
+ return obj_checkout(cachep, flags, obj);
}
#endif

@@ -2981,20 +2986,6 @@ void kmem_cache_free(kmem_cache_t *cache
EXPORT_SYMBOL(kmem_cache_free);

/**
- * kzalloc - allocate memory. The memory is set to zero.
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate.
- */
-void *kzalloc(size_t size, unsigned int __nocast flags)
-{
- void *ret = kmalloc(size, flags);
- if (ret)
- memset(ret, 0, size);
- return ret;
-}
-EXPORT_SYMBOL(kzalloc);
-
-/**
* kfree - free previously allocated memory
* @objp: pointer returned by kmalloc.
*

2005-09-23 22:37:00

by Andrew Morton

[permalink] [raw]
Subject: Re: Make kzalloc a macro

Christoph Lameter <[email protected]> wrote:
>
> Make kzalloc a macro and use __GFP_ZERO for zeroed slab allocations
>

I'd question the usefulness of this. It adds more code to a fastpath
(kmem_cache_alloc) so as to speed up a slowpath (kzalloc()) which is
already slow due to its memset.

It makes my kernel a bit fatter too - 150-odd bytes of text for some
reason.

2005-09-24 10:53:36

by Denis Vlasenko

[permalink] [raw]
Subject: Re: Make kzalloc a macro

On Friday 23 September 2005 18:58, Christoph Lameter wrote:
> How about this patch making kzalloc a macro?
>
> ---
>
> Make kzalloc a macro and use __GFP_ZERO for zeroed slab allocations
>
> kzalloc is right now a function call. The optimization that the kmalloc macro
> provides does not work for kzalloc invocations. kmalloc also determines the
> slab to use at compile time and fails the compilation if the size is too big.
> kzalloc cannot do not.
>
>
> -extern void *kzalloc(size_t, unsigned int __nocast);
> +#define kzalloc(__size, __flags) kmalloc(__size, (__flags) | __GFP_ZERO)

Why macro and not an inline function?

> +static inline void *obj_checkout(kmem_cache_t *cachep, unsigned int __nocast flags, void *objp)
> +{
> + if (likely(objp)) {
> + objp = cache_alloc_debugcheck_after(cachep, flags, objp,
> + __builtin_return_address(0));
> + if (unlikely(flags & __GFP_ZERO))

Why unlikely?

> + memset(objp, 0, obj_reallen(cachep));
> + else
> + prefetchw(objp);
> + }
> + return objp;
> +}
--
vda

2005-09-26 17:38:23

by Christoph Lameter

[permalink] [raw]
Subject: Re: Make kzalloc a macro

On Fri, 23 Sep 2005, Andrew Morton wrote:

> I'd question the usefulness of this. It adds more code to a fastpath
> (kmem_cache_alloc) so as to speed up a slowpath (kzalloc()) which is
> already slow due to its memset.

It tries to unify both and make usage consistent over the allocator
functions in slab.c. kzalloc essentially vanishes.

> It makes my kernel a bit fatter too - 150-odd bytes of text for some
> reason.

Yes the inline function doubles the code generated for obj_checkout
because it occurs in __cache_alloc and __cache_alloc_node.
__cache_alloc is also an inline and is expanded three times.

Removing the inline from __cache_alloc could reduced code size.