2007-06-01 07:28:39

by Srinivasa Ds

[permalink] [raw]
Subject: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning


When number of tasks assigned to the specified cpuset becomes zero and
if we try to read tasks file, We get this warning on 2.6.22-rc3 kernel.

=======================================================
------------[ cut here ]------------
Badness at mm/slab.c:777
Call Trace:
[c000000071ebb4a0] [c00000000000eb30] .show_stack+0x68/0x1b0 (unreliable)
[c000000071ebb540] [c00000000021d7b0] .report_bug+0x94/0xe8
[c000000071ebb5d0] [c00000000048eb4c] __kprobes_text_start+0x164/0x5a0
[c000000071ebb650] [c000000000004a84] program_check_common+0x104/0x180
--- Exception: 700 at .__kmalloc+0x44/0x178
LR = .cpuset_tasks_open+0x90/0x1e8
[c000000071ebb940] [c000000071ebb9e0] 0xc000000071ebb9e0 (unreliable)
[c000000071ebb9e0] [c0000000000812b0] .cpuset_tasks_open+0x90/0x1e8
[c000000071ebbaa0] [c0000000000811fc] .cpuset_file_open+0x70/0x94
[c000000071ebbb30] [c0000000000c2578] .__dentry_open+0x13c/0x278
[c000000071ebbbe0] [c0000000000c2828] .do_filp_open+0x50/0x70
[c000000071ebbd00] [c0000000000c28bc] .do_sys_open+0x74/0x130
[c000000071ebbdb0] [c000000000101108] .compat_sys_open+0x24/0x38
[c000000071ebbe30] [c00000000000862c] syscall_exit+0x0/0x40
=====================================================

This is because "npids"(represnets number of pids in that cpuset) in
"cpu_task_open" is zero and it tries allocate 0 bytes through kmalloc.

Below patch fixes this problem. Please let me know your comments on this.

Signed-off-by: Srinivasa DS <[email protected]>



Attachments:
cpuset.patch (669.00 B)

2007-06-01 10:50:31

by Srinivasa Ds

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Friday 01 June 2007 12:57:53 Srinivasa Ds wrote:

I have modified the patch little bit, Thanks Vatsa for input and hence
resending it. Please let me know your comments on this.

Signed-off-by: Srinivasa DS <[email protected]>

---
kernel/cpuset.c | 10 ++++++++++
1 file changed, 10 insertions(+)

Index: linux-2.6.22-rc3/kernel/cpuset.c
===================================================================
--- linux-2.6.22-rc3.orig/kernel/cpuset.c
+++ linux-2.6.22-rc3/kernel/cpuset.c
@@ -1741,6 +1741,13 @@ static int cpuset_tasks_open(struct inod
* show up until sometime later on.
*/
npids = atomic_read(&cs->count);
+ if (!npids) {
+ ctr->buf = NULL;
+ ctr->bufsz = 0;
+ file->private_data = ctr;
+ return 0;
+ }
+
pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL);
if (!pidarray)
goto err1;
@@ -1772,6 +1779,9 @@ static ssize_t cpuset_tasks_read(struct
{
struct ctr_struct *ctr = file->private_data;

+ if (!ctr->buf) /* No tasks in this cpuset */
+ return 0;
+
return simple_read_from_buffer(buf, nbytes, ppos, ctr->buf, ctr->bufsz);
}



2007-06-01 18:13:54

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Srinivasa Ds wrote:

> I have modified the patch little bit, Thanks Vatsa for input and hence
> resending it. Please let me know your comments on this.

Hmmmm.... Isnt it easier to set the pidarray to NULL in the case it
has no objects? Then any deferences will fail. Paul?


---
kernel/cpuset.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/cpuset.c
===================================================================
--- linux-2.6.orig/kernel/cpuset.c 2007-06-01 11:09:39.000000000 -0700
+++ linux-2.6/kernel/cpuset.c 2007-06-01 11:11:59.000000000 -0700
@@ -1723,7 +1723,7 @@ static int cpuset_tasks_open(struct inod
{
struct cpuset *cs = __d_cs(file->f_path.dentry->d_parent);
struct ctr_struct *ctr;
- pid_t *pidarray;
+ pid_t *pidarray = NULL;
int npids;
char c;

@@ -1741,9 +1741,11 @@ static int cpuset_tasks_open(struct inod
* show up until sometime later on.
*/
npids = atomic_read(&cs->count);
- pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL);
- if (!pidarray)
- goto err1;
+ if (npids)
+ pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL);
+ if (!pidarray)
+ goto err1;
+ }

npids = pid_array_load(pidarray, npids, cs);
sort(pidarray, npids, sizeof(pid_t), cmppid, NULL);

2007-06-01 19:11:24

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

Christoph wrote:
> Hmmmm.... Isnt it easier to set the pidarray to NULL in the case it
> has no objects? Then any deferences will fail. Paul?

>From what I can tell, in 30 seconds of reading, either of these
approaches, Christoph's or Srinivasa's, based on Vatsa's suggestions
-could- be made to work.

I don't understand how Christoph's approach works, as stated, however.

What do you mean, Christoph, by "then any dereferences will fail" ?
That sounds bad to me -- we don't want pointer dereferences failing
in the kernel, do we? Wouldn't you have to add some more lines of
code, a bit further down, where pidarray is used, to avoid trying to
use it if its NULL?

I think that Srinivasa's point is that it is legitimate for this array
to be empty, and that we have to code for that case, one way or another.
Now that we can't simply kmalloc zero bytes and get away with it, that
requires explicit tests in the code for the empty case, one way or
another. This sounds like a valid point to me.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2007-06-01 19:18:40

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Paul Jackson wrote:

> What do you mean, Christoph, by "then any dereferences will fail" ?
> That sounds bad to me -- we don't want pointer dereferences failing
> in the kernel, do we? Wouldn't you have to add some more lines of
> code, a bit further down, where pidarray is used, to avoid trying to
> use it if its NULL?

The allocated array has 0 bytes. Any dereference is an error.

2007-06-01 19:48:14

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

> The allocated array has 0 bytes. Any dereference is an error.

Well, then, that doesn't work either.

We -agree- we can't have this failing. You suggested (if I understood
correctly, which I doubt) that we set the pointer to NULL, and noted
that derefencing a NULL pointer would fail. I agree such derefences
would fail, and tried to point out that this would be bad. We can't
have pointer dereferences failing in the kernel ... duh.

Your reply seems like a non sequiter to me, pointing out that having
a pointer to an array of 0 bytes fails as well. Ok - that's bad too.

So we cannot have a NULL pointer, used unchecked, and we cannot have
a non-NULL pointer to a zero byte array, used unchecked.

As I said, and as I thought Srinivasa coded, with these patch lines:

@@ -1772,6 +1779,9 @@ static ssize_t cpuset_tasks_read(struct
{
struct ctr_struct *ctr = file->private_data;

+ if (!ctr->buf) /* No tasks in this cpuset */
+ return 0;
+
return simple_read_from_buffer(buf, nbytes, ppos, ctr->buf, ctr->bufsz);
}

we actually have to check the NULL-ness (or emptiness) of this thing,
down where it would be used, to avoid using NULL or empty things.

Summary - Srinivasa's patch still looks ok to me. This followup
discussion between you and I is just confusing me - sorry.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2007-06-01 19:51:42

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Paul Jackson wrote:

> > The allocated array has 0 bytes. Any dereference is an error.
>
> Well, then, that doesn't work either.
>
> We -agree- we can't have this failing. You suggested (if I understood
> correctly, which I doubt) that we set the pointer to NULL, and noted
> that derefencing a NULL pointer would fail. I agree such derefences
> would fail, and tried to point out that this would be bad. We can't
> have pointer dereferences failing in the kernel ... duh.
>
> Your reply seems like a non sequiter to me, pointing out that having
> a pointer to an array of 0 bytes fails as well. Ok - that's bad too.

Think about it some more...

> So we cannot have a NULL pointer, used unchecked, and we cannot have
> a non-NULL pointer to a zero byte array, used unchecked.

We must have a NULL pointer exactly because it cannot be dereferenced. It
is a safety precaution that the following code will not deference the
pointer to the array of zero size. Because attempting to access an object
in an array with zero object in them is an error.

2007-06-01 20:02:22

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

> We must have a NULL pointer exactly because it cannot be dereferenced.

Well, both patch versions had NULL pointers - either pidarray or ctr->buf.

But now I can make more sense of what you say -- you -want- the pidarray
pointer, in particular, to be NULL, so that we don't accidentally use it.

Does that still mean that your patch suggestion was incomplete, in that
it lacked the additional checks to avoid using a NULL pidarray?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2007-06-01 20:07:01

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Paul Jackson wrote:

> > We must have a NULL pointer exactly because it cannot be dereferenced.
>
> Well, both patch versions had NULL pointers - either pidarray or ctr->buf.
>
> But now I can make more sense of what you say -- you -want- the pidarray
> pointer, in particular, to be NULL, so that we don't accidentally use it.
>
> Does that still mean that your patch suggestion was incomplete, in that
> it lacked the additional checks to avoid using a NULL pidarray?

There are no checks necessary. Your function worked fine so far for
the case of zero objects with the pointer returned by kmalloc. If the
code is correct then it will not dereference the pointer to the zero
sized array. If not then we may find a bug and fix it.

2007-06-01 20:20:05

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

> There are no checks necessary. Your function worked fine so far for
> the case of zero objects with the pointer returned by kmalloc. If the
> code is correct then it will not dereference the pointer to the zero
> sized array. If not then we may find a bug and fix it.

I suspect you got lucky. The check for a full pidarray[] in the routine
pid_array_load() occurs -after- a pid is put in the array. If a task
showed up in this cpuset at the wrong time, we would fall over and die
in the code:

static int pid_array_load(pid_t *pidarray, int npids, struct cpuset *cs)
{
int n = 0;
struct task_struct *g, *p;

read_lock(&tasklist_lock);

do_each_thread(g, p) {
if (p->cpuset == cs) {
pidarray[n++] = p->pid; /* Death if pidarray == NULL */
if (unlikely(n == npids))
goto array_full;
}

} while_each_thread(g, p);

Perhaps if you moved the "if (unlikely(n == npids))" test before the
"pidarray[n++] = p->pid" assignment, it would be safe.

And does the next line of code, the call to sort() after the call of
pid_array_load(), work with pidarray == NULL and npids == 0:

npids = pid_array_load(pidarray, npids, cs);
sort(pidarray, npids, sizeof(pid_t), cmppid, NULL); /* <== ?? */

Off hand, I didn't know. I guess it must, or you would have already
tripped over it.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2007-06-01 20:30:41

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

Srinivasa Ds wrote:
> ---
> kernel/cpuset.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> Index: linux-2.6.22-rc3/kernel/cpuset.c
> ===================================================================
> --- linux-2.6.22-rc3.orig/kernel/cpuset.c
> +++ linux-2.6.22-rc3/kernel/cpuset.c
> @@ -1741,6 +1741,13 @@ static int cpuset_tasks_open(struct inod
> * show up until sometime later on.
> */
> npids = atomic_read(&cs->count);
> + if (!npids) {
> + ctr->buf = NULL;
> + ctr->bufsz = 0;
> + file->private_data = ctr;
> + return 0;
> + }
> +
>

I think this is a good example of why having to special-case kmalloc(0)
is a bad idea. The original code was straightforward and, barring
silliness, should be completely correct with npids==0. This new code
does nothing other than make things more complex.

J

2007-06-01 20:43:19

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Paul Jackson wrote:

> > There are no checks necessary. Your function worked fine so far for
> > the case of zero objects with the pointer returned by kmalloc. If the
> > code is correct then it will not dereference the pointer to the zero
> > sized array. If not then we may find a bug and fix it.
>
> I suspect you got lucky. The check for a full pidarray[] in the routine
> pid_array_load() occurs -after- a pid is put in the array. If a task
> showed up in this cpuset at the wrong time, we would fall over and die
> in the code:

Then you are deferencing an element in the pidarray that you did not
allocate! This is a bug in cpuset code. So we would need to allocate at
mininum one array element? Or would we need to allocate npids + 1 to be
safe???


---
kernel/cpuset.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/cpuset.c
===================================================================
--- linux-2.6.orig/kernel/cpuset.c 2007-06-01 13:41:24.000000000 -0700
+++ linux-2.6/kernel/cpuset.c 2007-06-01 13:42:08.000000000 -0700
@@ -1741,7 +1741,7 @@ static int cpuset_tasks_open(struct inod
* show up until sometime later on.
*/
npids = atomic_read(&cs->count);
- pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL);
+ pidarray = kmalloc(max(1, npids) * sizeof(pid_t), GFP_KERNEL);
if (!pidarray)
goto err1;

2007-06-01 20:44:20

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

> I think this is a good example of why having to special-case kmalloc(0)
> is a bad idea. The original code was straightforward and, barring
> silliness, should be completely correct with npids==0. This new code
> does nothing other than make things more complex.

Perhaps so. Perhaps not.

Actually, the original cpuset pid_array_load() code was not correct,
and Christoph's work is smoking this out.

The original code assumed that it could access the first element of the
kmalloc'd array, before it checked if it could go even further.

But with the kmalloc call of:

kmalloc(npids * sizeof(pid_t), GFP_KERNEL);

it is impossible for kmalloc to know how much to allocate, to guarantee
such behaviour. If npids is zero, all kmalloc sees, in essence, is the
call:

kmalloc(0, GFP_KERNEL);

There is no way it can guess that I might still want to access the
first "size(pid_t)" bytes of whatever it returned.

If we had a different sort of kmalloc API, such as would be called with:

kmalloc(npids, sizeof(pid_t), GFP_KERNEL);

rather like the libc calloc() routine, having separate count and size
fields, then in theory, kmalloc could treat calls of the form:

kmalloc(0, sz, GFP_KERNEL); /* for sz > 0 */

as if they were actually:

kmalloc(1, sz, GFP_KERNEL);

and then my cpuset code might not have been broken. But, since
some of us can never remember whether it is the size or the count
that comes first in such an API, we would still have had such
bugs -- just fewer, for those who got those arguments backwards,
and finally tripped over this anyway.

That doesn't address the other likely broken code, elsewhere in
the kernel somewhere that happened to try to access the 2nd or
3rd element of such an empty array before noticing.

So ... while I started this reply agreeing with you, I end up still
agreeing with Christoph's changes here.

Thanks, Christoph, for finding a bug in the cpuset code - good work!

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2007-06-01 20:47:32

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Jeremy Fitzhardinge wrote:

> I think this is a good example of why having to special-case kmalloc(0)
> is a bad idea. The original code was straightforward and, barring
> silliness, should be completely correct with npids==0. This new code
> does nothing other than make things more complex.

Hehe we got you. The code is indexing the pidarray allocated with
kmalloc(0). So it uncovered a latent bug. It only worked because SLAB gave
him 32 bytes and it now only works because SLUB give him 8. That is enough
to illegally index the first array element.


2007-06-01 20:54:40

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

Christoph wrote:
> Then you are deferencing an element in the pidarray that you did not
> allocate! This is a bug in cpuset code.

Absolutely - as I described in more detail in a reply Jeremy a
few minutes ago. Thanks for smoking it out.

If, as I suggested in my previous message to which you are
responding:

> Perhaps if you moved the "if (unlikely(n == npids))" test before the
> "pidarray[n++] = p->pid" assignment, it would be safe.

then it looks safe to me, without having to add the bogus "+1" to
the allocated size. That is, I suspect the following patch fixes
this long standing cpuset bug (warning - white space mangled):

--- kernel/cpuset.c 2006-12-10 12:27:37.000000000 -0800
+++ kernel/cpuset.c.new 2007-06-01 13:53:00.271010074 -0700
@@ -1661,9 +1661,9 @@ static int pid_array_load(pid_t *pidarra

do_each_thread(g, p) {
if (p->cpuset == cs) {
- pidarray[n++] = p->pid;
if (unlikely(n == npids))
goto array_full;
+ pidarray[n++] = p->pid;
}
} while_each_thread(g, p);


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2007-06-01 20:56:56

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

Christoph Lameter wrote:
> On Fri, 1 Jun 2007, Jeremy Fitzhardinge wrote:
>
>
>> I think this is a good example of why having to special-case kmalloc(0)
>> is a bad idea. The original code was straightforward and, barring
>> silliness, should be completely correct with npids==0. This new code
>> does nothing other than make things more complex.
>>
>
> Hehe we got you.

Bugger ;) Well, my get-out clause was "barring silliness".

J

2007-06-01 20:59:57

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007 13:47:23 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Fri, 1 Jun 2007, Jeremy Fitzhardinge wrote:
>
> > I think this is a good example of why having to special-case kmalloc(0)
> > is a bad idea. The original code was straightforward and, barring
> > silliness, should be completely correct with npids==0. This new code
> > does nothing other than make things more complex.
>
> Hehe we got you. The code is indexing the pidarray allocated with
> kmalloc(0). So it uncovered a latent bug. It only worked because SLAB gave
> him 32 bytes and it now only works because SLUB give him 8. That is enough
> to illegally index the first array element.
>

Poisoning and redzoning could have caught that.

But I guess it doesn't matter now, as this shortcoming is specific to
the zero-length allocations, and we're weeding those out anyway.

2007-06-01 21:45:41

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Andrew Morton wrote:

> Poisoning and redzoning could have caught that.

Redzoning would not have caught it. This was a kmalloc allocation and
SLAB always gave them 32 bytes to play with. Only writes more than 32
bytes behind would have been caught.

Poisoning is only applicable to unallocated objects and these were
allocated.

2007-06-01 22:17:23

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007 14:45:27 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Fri, 1 Jun 2007, Andrew Morton wrote:
>
> > Poisoning and redzoning could have caught that.
>
> Redzoning would not have caught it. This was a kmalloc allocation and
> SLAB always gave them 32 bytes to play with. Only writes more than 32
> bytes behind would have been caught.
>
> Poisoning is only applicable to unallocated objects and these were
> allocated.

Nope and nope.

This is a special case where the user asked for zero bytes and the kernel
gave him 8 (or 32) bytes instead.

If slab was smart enough, it would have poisoned those 8 bytes to some
known pattern, and then checked that they still had that pattern when the
memory got freed again.

But it isn't smart enough, so the bug went undetected.

As I said, it's specific to the kmalloc(0) problem, and we're fixing that
by other means anyway.

2007-06-01 22:20:39

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Andrew Morton wrote:

> If slab was smart enough, it would have poisoned those 8 bytes to some
> known pattern, and then checked that they still had that pattern when the
> memory got freed again.

So this is new feature request?

> But it isn't smart enough, so the bug went undetected.

I should make SLUB put poisoning values in unused areas of a kmalloced
object?

2007-06-01 22:33:53

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007 15:20:05 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Fri, 1 Jun 2007, Andrew Morton wrote:
>
> > If slab was smart enough, it would have poisoned those 8 bytes to some
> > known pattern, and then checked that they still had that pattern when the
> > memory got freed again.
>
> So this is new feature request?
>
> > But it isn't smart enough, so the bug went undetected.
>
> I should make SLUB put poisoning values in unused areas of a kmalloced
> object?

hm, I hadn't thought of it that way actually. I was thinking it was
specific to kmalloc(0) but as you point out, the situation is
generalisable.

Yes, if someone does kmalloc(42) and we satisfy the allocation from the
size-64 slab, we should poison and then check the allegedly-unused 22
bytes.

Please ;)

(vaguely stunned that we didn't think of doing this years ago).

It'll be a large patch, I expect?




Actually, I have this vague memory that slab would take that kmalloc(42)
and would then return kmalloc(64)+22, so the returned memory is
"right-aligned". This way the existing overrun-detection is applicable to
all kmallocs. Maybe I dreamed it.

2007-06-01 22:41:56

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Andrew Morton wrote:

> > I should make SLUB put poisoning values in unused areas of a kmalloced
> > object?
>
> hm, I hadn't thought of it that way actually. I was thinking it was
> specific to kmalloc(0) but as you point out, the situation is
> generalisable.

Right it could catch a lot of other bugs as well.

> Yes, if someone does kmalloc(42) and we satisfy the allocation from the
> size-64 slab, we should poison and then check the allegedly-unused 22
> bytes.
>
> Please ;)
>
> (vaguely stunned that we didn't think of doing this years ago).

Well there are architectural problems. We determine the power of two slab
at compile time. The object size information is currently not available in
the binary :=).

> It'll be a large patch, I expect?

Ummm... Yes. We need to switch off the compile time power of two slab
calculation. Then I need to have some way of storing the object size in
the metainformation of each object. Changes a lot of function calls.

> Actually, I have this vague memory that slab would take that kmalloc(42)
> and would then return kmalloc(64)+22, so the returned memory is
> "right-aligned". This way the existing overrun-detection is applicable to
> all kmallocs. Maybe I dreamed it.

Yes, that would be possible by simply adding a compile time generated
offset to the compile time generated call to kmem_cache_alloc. But then
kfree() will have a difficult time figuring out which object to free.
Hmmm... But I can get to the slab via the page struct which allows me to
figure out the power of two size. This would mean that kfree would work
with an arbitrary pointer anywhere into the object.

2007-06-01 23:00:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning



On Fri, 1 Jun 2007, Christoph Lameter wrote:
>
> Right it could catch a lot of other bugs as well.

I'd actually prefer "malloc(0)" to _not_ return NULL, but some known
(non-NULL) bogus pointer.

Why?

Because it's quite sane to have simple logic like

ptr = malloc(size);
if (!ptr)
return -ENOMEM;

and writing it as

if (size && !ptr)
return -ENOMEM;

is just annoying.

Also, NULL is _special_. There are absolutely tons of code in the kernel
(and elsewhere) that just does something *different* from NULL pointers,
and that totally breaks the whole notion of "you can allocate a zero-sized
allocation, you just must not dereference it". If people special-case
NULL as something else, they won't even go through the normal code-path.

So for *both* of the above reasons, it's actually stupid to return NULL
for a zero-sized allocation. It would be much better to return another
pointer that will trap on access. A good candidate might be to return

#define BADPTR ((void *)16)

which is a portable-enough (where "portable-enough" is "against strict
ANSI C rules, but works in practice on all architectures") way to return
something that will cause the same page fault behaviour as NULL, but will
_not_ trigger the "NULL is special" code.

(Of course, you then need to teach "kfree()" to accept it as another
pointer to be ignored, that's fine).

I bet you'd find *more* problems that way than by returning NULL, and
you'd also avoid the whole problem with "if (!ptr) return -ENOMEM".

Linus

2007-06-01 23:03:12

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007 15:41:48 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Fri, 1 Jun 2007, Andrew Morton wrote:
>
> > > I should make SLUB put poisoning values in unused areas of a kmalloced
> > > object?
> >
> > hm, I hadn't thought of it that way actually. I was thinking it was
> > specific to kmalloc(0) but as you point out, the situation is
> > generalisable.
>
> Right it could catch a lot of other bugs as well.
>
> > Yes, if someone does kmalloc(42) and we satisfy the allocation from the
> > size-64 slab, we should poison and then check the allegedly-unused 22
> > bytes.
> >
> > Please ;)
> >
> > (vaguely stunned that we didn't think of doing this years ago).
>
> Well there are architectural problems. We determine the power of two slab
> at compile time. The object size information is currently not available in
> the binary :=).
>
> > It'll be a large patch, I expect?
>
> Ummm... Yes. We need to switch off the compile time power of two slab
> calculation. Then I need to have some way of storing the object size in
> the metainformation of each object. Changes a lot of function calls.

Oh well. Don't lose any sleep over it ;)

<loses sleep>

We could store the size of the allocation in the allocated object? Just
add four bytes to the user's request, then pick the appropriate cache based
on that, then put the user's `size' at the tail of the resulting allocation?

So a kmalloc(62) would get upped to 66, so we allocate from size-128
and put the number 62 at bytes 124-127 and we poison bytes 62-123?

2007-06-01 23:17:29

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

> So a kmalloc(62) would get upped to 66, so we allocate from size-128
> and put the number 62 at bytes 124-127 and we poison bytes 62-123?

Hmmm... We are going rapidly here. This is a patch that I am testing right
now. It right adjust the object and the patch is manageable:



SLUB mm-only: Right align kmalloc objects to trigger overwrite detection

Right align kmalloc objects if they are less than the full kmalloc slab size.
This will move the object to be flush with the end of the object in order
to allow the easy detection of writes / reads after the end of the kmalloc
object.

Without slub_debug overwrites will destroy the free pointer of the next object
or the next object. Read will yield garbage that is likely zero.

With slub_debug redzone checks will be triggered. Reads will read redzone
poison.

This patch is only for checking things out. There are issues:

1. Alignment of kmalloc objects may now be different. In particular
objects whose size is not a multiple of wordsize may be not word alignmed.

2. __kmalloc and kfree need to touch an additional cacheline in
struct kmem_cache thereby reducing performance.

3. An object allocated via kmalloc may no longer be freed via kmem_cache_free.

So we need to figure out some may to make this configurable. Preferably
runtime configurable.

Signed-off-by: Christoph Lameter <[email protected]>

---
include/linux/slub_def.h | 22 +++++++++++++++++++---
mm/slub.c | 11 ++++++++---
2 files changed, 27 insertions(+), 6 deletions(-)

Index: slub/include/linux/slub_def.h
===================================================================
--- slub.orig/include/linux/slub_def.h 2007-06-01 15:56:42.000000000 -0700
+++ slub/include/linux/slub_def.h 2007-06-01 16:00:03.000000000 -0700
@@ -120,6 +120,19 @@ static inline struct kmem_cache *kmalloc
return &kmalloc_caches[index];
}

+static inline unsigned long kmalloc_size(size_t size)
+{
+ int index = kmalloc_index(size);
+
+ if (index >= KMALLOC_SHIFT_LOW)
+ return 1 << index;
+
+ if (index == 2)
+ return 192;
+ return 96;
+}
+
+
#ifdef CONFIG_ZONE_DMA
#define SLUB_DMA __GFP_DMA
#else
@@ -135,7 +148,8 @@ static inline void *kmalloc(size_t size,
if (!s)
return NULL;

- return kmem_cache_alloc(s, flags);
+ return kmem_cache_alloc(s, flags)
+ + kmalloc_size(size) - size;
} else
return __kmalloc(size, flags);
}
@@ -148,7 +162,8 @@ static inline void *kzalloc(size_t size,
if (!s)
return NULL;

- return kmem_cache_zalloc(s, flags);
+ return kmem_cache_zalloc(s, flags)
+ + kmalloc_size(size) - size;
} else
return __kzalloc(size, flags);
}
@@ -159,7 +174,8 @@ extern void *__kmalloc_node(size_t size,
static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
if (__builtin_constant_p(size) && !(flags & SLUB_DMA)) {
- struct kmem_cache *s = kmalloc_slab(size);
+ struct kmem_cache *s = kmalloc_slab(size) +
+ kmalloc_size(size) - size;

if (!s)
return NULL;
Index: slub/mm/slub.c
===================================================================
--- slub.orig/mm/slub.c 2007-06-01 15:51:05.000000000 -0700
+++ slub/mm/slub.c 2007-06-01 16:15:21.000000000 -0700
@@ -2283,9 +2283,10 @@ static struct kmem_cache *get_slab(size_
void *__kmalloc(size_t size, gfp_t flags)
{
struct kmem_cache *s = get_slab(size, flags);
+ int offset = size - s->size;

if (s)
- return slab_alloc(s, flags, -1, __builtin_return_address(0));
+ return slab_alloc(s, flags, -1, __builtin_return_address(0)) + offset;
return NULL;
}
EXPORT_SYMBOL(__kmalloc);
@@ -2294,9 +2295,10 @@ EXPORT_SYMBOL(__kmalloc);
void *__kmalloc_node(size_t size, gfp_t flags, int node)
{
struct kmem_cache *s = get_slab(size, flags);
+ int offset = size - s->size;

if (s)
- return slab_alloc(s, flags, node, __builtin_return_address(0));
+ return slab_alloc(s, flags, node, __builtin_return_address(0)) + offset;
return NULL;
}
EXPORT_SYMBOL(__kmalloc_node);
@@ -2337,6 +2339,7 @@ void kfree(const void *x)
{
struct kmem_cache *s;
struct page *page;
+ unsigned long addr = (unsigned long) x;

if (!x)
return;
@@ -2344,7 +2347,9 @@ void kfree(const void *x)
page = virt_to_head_page(x);
s = page->slab;

- slab_free(s, page, (void *)x, __builtin_return_address(0));
+ addr &= ~((unsigned long)s->size - 1);
+
+ slab_free(s, page, (void *)addr, __builtin_return_address(0));
}
EXPORT_SYMBOL(kfree);

2007-06-01 23:22:16

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Christoph Lameter wrote:

> Hmmm... We are going rapidly here. This is a patch that I am testing right
> now. It right adjust the object and the patch is manageable:

Does not boot sigh.

2007-06-01 23:26:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning



On Fri, 1 Jun 2007, Andrew Morton wrote:
>
> We could store the size of the allocation in the allocated object? Just
> add four bytes to the user's request, then pick the appropriate cache based
> on that, then put the user's `size' at the tail of the resulting allocation?

It should be easy enough to do it for _most_ allocations by just doing it
when there is already "enough slack" to do it (which is likely true most
of the time).

IOW, if you ask for a 42-byte allocation, and we allocate from a 64-byte
slab, you get the slab allocation at address X, you don't actually have to
return "X" at all. Just return "X+8", and then you do:

- at 32-bit word at X+0 you put the "real length"
- at 32-bit word at X+4 you put some good redzone marker
- at 32-bit word at "X + reallen + 8" you put the endzone marker.

And then you say: if the real length was within 12 bytes of the allocation
length, we just don't do this.

So you wouldn't get any redzoning for those allocations that are exactly
sized (or close enough) to fit in an allocation block, but I bet *most*
allocations would get this for free.

And then, if you actually turn on redzoning, you just always add the 12
byte to the allocation size (assuming the alignment rules allow you to).

The nice thing about this is that the freeing path already knows where the
object is *supposed* to start (because it sees the allocation size in the
slub/slab data structures), so the kfree() path can actually figure out on
its own whether it is given a "X" or an "X+8" kind of address.

So you don't actually need any extra information. You literally just need
enough slop in the allocation that you can do this in the first place, so
there is no cost (except for the cost of checking itself, of course).

Hmm?

Linus

2007-06-01 23:29:37

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Linus Torvalds wrote:

> So for *both* of the above reasons, it's actually stupid to return NULL
> for a zero-sized allocation. It would be much better to return another
> pointer that will trap on access. A good candidate might be to return
>
> #define BADPTR ((void *)16)

Something like this? (Not tested yet just for review):


SLUB: Return BADPTR instead of warning for kmalloc(0)

Remove the WARN_ON_ONCE and simply return BADPTR.

BADPTR can be used legitimately as long as it is not dereferenced.

Signed-off-by: Christoph Lameter <[email protected]>

---
include/linux/slub_def.h | 18 ++++++++----------
mm/slub.c | 8 ++++----
2 files changed, 12 insertions(+), 14 deletions(-)

Index: slub/include/linux/slub_def.h
===================================================================
--- slub.orig/include/linux/slub_def.h 2007-06-01 16:19:36.000000000 -0700
+++ slub/include/linux/slub_def.h 2007-06-01 16:24:54.000000000 -0700
@@ -12,6 +12,8 @@
#include <linux/kobject.h>
#include <linux/log2.h>

+#define BADPTR ((void *)16)
+
struct kmem_cache_node {
spinlock_t list_lock; /* Protect partial list and nr_partial */
unsigned long nr_partial;
@@ -74,13 +76,9 @@ extern struct kmem_cache kmalloc_caches[
*/
static inline int kmalloc_index(size_t size)
{
- /*
- * We should return 0 if size == 0 (which would result in the
- * kmalloc caller to get NULL) but we use the smallest object
- * here for legacy reasons. Just issue a warning so that
- * we can discover locations where we do 0 sized allocations.
- */
- WARN_ON_ONCE(size == 0);
+
+ if (!size)
+ return 0;

if (size > KMALLOC_MAX_SIZE)
return -1;
@@ -133,7 +131,7 @@ static inline void *kmalloc(size_t size,
struct kmem_cache *s = kmalloc_slab(size);

if (!s)
- return NULL;
+ return BADPTR;

return kmem_cache_alloc(s, flags);
} else
@@ -146,7 +144,7 @@ static inline void *kzalloc(size_t size,
struct kmem_cache *s = kmalloc_slab(size);

if (!s)
- return NULL;
+ return BADPTR;

return kmem_cache_zalloc(s, flags);
} else
@@ -162,7 +160,7 @@ static inline void *kmalloc_node(size_t
struct kmem_cache *s = kmalloc_slab(size);

if (!s)
- return NULL;
+ return BADPTR;

return kmem_cache_alloc_node(s, flags, node);
} else
Index: slub/mm/slub.c
===================================================================
--- slub.orig/mm/slub.c 2007-06-01 16:21:00.000000000 -0700
+++ slub/mm/slub.c 2007-06-01 16:27:12.000000000 -0700
@@ -2286,7 +2286,7 @@ void *__kmalloc(size_t size, gfp_t flags

if (s)
return slab_alloc(s, flags, -1, __builtin_return_address(0));
- return NULL;
+ return BADPTR;
}
EXPORT_SYMBOL(__kmalloc);

@@ -2297,7 +2297,7 @@ void *__kmalloc_node(size_t size, gfp_t

if (s)
return slab_alloc(s, flags, node, __builtin_return_address(0));
- return NULL;
+ return BADPTR;
}
EXPORT_SYMBOL(__kmalloc_node);
#endif
@@ -2707,7 +2707,7 @@ void *__kmalloc_track_caller(size_t size
struct kmem_cache *s = get_slab(size, gfpflags);

if (!s)
- return NULL;
+ return BADPTR;

return slab_alloc(s, gfpflags, -1, caller);
}
@@ -2718,7 +2718,7 @@ void *__kmalloc_node_track_caller(size_t
struct kmem_cache *s = get_slab(size, gfpflags);

if (!s)
- return NULL;
+ return BADPTR;

return slab_alloc(s, gfpflags, node, caller);
}

2007-06-01 23:37:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning



On Fri, 1 Jun 2007, Christoph Lameter wrote:
>
> Hmmm... We are going rapidly here. This is a patch that I am testing right
> now. It right adjust the object and the patch is manageable:

No, I don't think you can do it this way.

At a minimum, you'd need to test that the result is word-aligned.
Preferably 8-byte aligned. We literally have stuff that knows about these
things and uses the low bits in the pointer to keep extra data.

(That said, I didn't check whether we actually kmalloc() the data, but I
think we do - things like "struct key" etc).

Now, maybe those things always have a 8-byte-aligned size, and it works
out, but I'd be worried.

Of course, there migth be other (even more subtle) cases where we just
assume certain alignment, and depend on the fact that we just _happen_ to
get it. Who knows..

Linus

2007-06-01 23:42:19

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Linus Torvalds wrote:

> No, I don't think you can do it this way.

Ultimately not. But its worth to see if this works.

> At a minimum, you'd need to test that the result is word-aligned.
> Preferably 8-byte aligned. We literally have stuff that knows about these
> things and uses the low bits in the pointer to keep extra data.

kmalloc allocations are guaranteed to be aligned to KMALLOC_MINALIGN. I
can bring that in but it will make the patch less readable.

> Of course, there migth be other (even more subtle) cases where we just
> assume certain alignment, and depend on the fact that we just _happen_ to
> get it. Who knows..

I tried to get rid of those cased and I hope that work is complete in
2.6.22.

2007-06-01 23:42:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning



On Fri, 1 Jun 2007, Christoph Lameter wrote:
>
> Something like this? (Not tested yet just for review):

I'm not seeing the path on the freeing side that silently accepts BADPTR.

It is not ok to derefence BADPTR, but it's obviously ok to _free_ it.

Also, if I read the patch correctly, you _also_ return BADPTR for slabs
that are too large. No? That would be wrong - those need to return NULL
for "out of memory".

(But I only looked at the diff, not at the end result, so it may be that
all the cases where you changed "return NULL" into "return BADPTR" were
really just the size-zero ones - it just looked suspicious).

Linus

2007-06-01 23:46:58

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Linus Torvalds wrote:

> I'm not seeing the path on the freeing side that silently accepts BADPTR.
>
> It is not ok to derefence BADPTR, but it's obviously ok to _free_ it.

Ok.

> Also, if I read the patch correctly, you _also_ return BADPTR for slabs
> that are too large. No? That would be wrong - those need to return NULL
> for "out of memory".

A too large alloc is >32MB or MAX_ORDER << PAGE_SIZE. A BUG_ON in
kmalloc_slab() will trigger.

Here is the updated patch. It works fine here:


SLUB: Return BADPTR instead of warning for kmalloc(0)

Remove the WARN_ON_ONCE and simply return BADPTR.

BADPTR can be used legitimately as long as it is not dereferenced.
Can even be freed.

Signed-off-by: Christoph Lameter <[email protected]>

---
include/linux/slub_def.h | 18 ++++++++----------
mm/slub.c | 10 +++++-----
2 files changed, 13 insertions(+), 15 deletions(-)

Index: slub/include/linux/slub_def.h
===================================================================
--- slub.orig/include/linux/slub_def.h 2007-06-01 16:19:36.000000000 -0700
+++ slub/include/linux/slub_def.h 2007-06-01 16:43:57.000000000 -0700
@@ -12,6 +12,8 @@
#include <linux/kobject.h>
#include <linux/log2.h>

+#define BADPTR ((void *)16)
+
struct kmem_cache_node {
spinlock_t list_lock; /* Protect partial list and nr_partial */
unsigned long nr_partial;
@@ -74,13 +76,9 @@ extern struct kmem_cache kmalloc_caches[
*/
static inline int kmalloc_index(size_t size)
{
- /*
- * We should return 0 if size == 0 (which would result in the
- * kmalloc caller to get NULL) but we use the smallest object
- * here for legacy reasons. Just issue a warning so that
- * we can discover locations where we do 0 sized allocations.
- */
- WARN_ON_ONCE(size == 0);
+
+ if (!size)
+ return 0;

if (size > KMALLOC_MAX_SIZE)
return -1;
@@ -133,7 +131,7 @@ static inline void *kmalloc(size_t size,
struct kmem_cache *s = kmalloc_slab(size);

if (!s)
- return NULL;
+ return BADPTR;

return kmem_cache_alloc(s, flags);
} else
@@ -146,7 +144,7 @@ static inline void *kzalloc(size_t size,
struct kmem_cache *s = kmalloc_slab(size);

if (!s)
- return NULL;
+ return BADPTR;

return kmem_cache_zalloc(s, flags);
} else
@@ -162,7 +160,7 @@ static inline void *kmalloc_node(size_t
struct kmem_cache *s = kmalloc_slab(size);

if (!s)
- return NULL;
+ return BADPTR;

return kmem_cache_alloc_node(s, flags, node);
} else
Index: slub/mm/slub.c
===================================================================
--- slub.orig/mm/slub.c 2007-06-01 16:21:00.000000000 -0700
+++ slub/mm/slub.c 2007-06-01 16:43:27.000000000 -0700
@@ -2286,7 +2286,7 @@ void *__kmalloc(size_t size, gfp_t flags

if (s)
return slab_alloc(s, flags, -1, __builtin_return_address(0));
- return NULL;
+ return BADPTR;
}
EXPORT_SYMBOL(__kmalloc);

@@ -2297,7 +2297,7 @@ void *__kmalloc_node(size_t size, gfp_t

if (s)
return slab_alloc(s, flags, node, __builtin_return_address(0));
- return NULL;
+ return BADPTR;
}
EXPORT_SYMBOL(__kmalloc_node);
#endif
@@ -2338,7 +2338,7 @@ void kfree(const void *x)
struct kmem_cache *s;
struct page *page;

- if (!x)
+ if (!x || x == BADPTR)
return;

page = virt_to_head_page(x);
@@ -2707,7 +2707,7 @@ void *__kmalloc_track_caller(size_t size
struct kmem_cache *s = get_slab(size, gfpflags);

if (!s)
- return NULL;
+ return BADPTR;

return slab_alloc(s, gfpflags, -1, caller);
}
@@ -2718,7 +2718,7 @@ void *__kmalloc_node_track_caller(size_t
struct kmem_cache *s = get_slab(size, gfpflags);

if (!s)
- return NULL;
+ return BADPTR;

return slab_alloc(s, gfpflags, node, caller);
}

2007-06-01 23:57:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning



On Fri, 1 Jun 2007, Christoph Lameter wrote:
>
> A too large alloc is >32MB or MAX_ORDER << PAGE_SIZE. A BUG_ON in
> kmalloc_slab() will trigger.

Did we use to BUG_ON()? I think that's wrong. There are ways for users to
potentially ask the kernel to do big allocations, and the correct response
is to say "no can do", not to crash!

> Here is the updated patch. It works fine here:
>
> SLUB: Return BADPTR instead of warning for kmalloc(0)

Looks fine to me. My only comment is that

> - if (!x)
> + if (!x || x == BADPTR)
> return;

This could be micro-optimized (again, non-standard, but it should be
"practically portable") to have just a single test using something like

if ((unsigned long)x <= 16)
return;

but I guess it doesn't really matter much.

I think this is better than what we have now, but I also suspect it's
*not* something we should try this late in the -rc sequence ;)

Andrew, want to take this patch to -mm to see if it triggers anything?

Linus

2007-06-02 00:12:52

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Linus Torvalds wrote:

> > A too large alloc is >32MB or MAX_ORDER << PAGE_SIZE. A BUG_ON in
> > kmalloc_slab() will trigger.
>
> Did we use to BUG_ON()? I think that's wrong. There are ways for users to
> potentially ask the kernel to do big allocations, and the correct response
> is to say "no can do", not to crash!

There is no way to distinguish that from out of memory. Failing on large
allocs is what we have always done for kmalloc(). Before 2.6.22 we used to
fail for allocs > 256k which was a big nuisance for NUMAs large allocs.

The patches in 2.6.22 allow us for the first time to allocate arbitrary
sized objects up to MAX_ORDER. So we no longer have troubles with large
NUMA objects.

2007-06-02 00:16:45

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007 16:57:20 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

> Andrew, want to take this patch to -mm to see if it triggers anything?

spose so.

I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there,
because it is exposing some coding warts. But we should turn it off for
2.6.22 and make it conditional on CONFIG_DEVEL_KERNEL (or whatever it will
be called) later.

The BADPTR thing is a little worrying because it will make
previously-working-by-luck code go oops. I guess we can live with that.

So we end up with the BADPTR code enabled even in production kernels, in
which case your ((unsigned long)x <= 16) trick is worth doing.

2007-06-02 00:26:17

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, Andrew Morton wrote:

> I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there,

The trouble with the WARN_ON is that it triggers even for code that is
okay like noted by Jeremy. My initial intend with NULL was to allow the
allocation of a zero sized pointer without extra checks. It should only
trigger a failure if something bad was done.

NULL had the problem of confusion with no memory available. I think BADPTR
is good. If the coding warts are causing trouble then BADPTR will result
in a failure. Otherwise if the code is doing a kmalloc(0) and not
dereferencing the pointer (like Paul's fixed code) then we should be fine
and not issue a warning.

The false positives may be upsetting some people.

2007-06-02 00:41:58

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

Christoph Lameter wrote:
> Well there are architectural problems. We determine the power of two slab
> at compile time. The object size information is currently not available in
> the binary :=).
>

That only applies to allocations with constant sizes. One presumes
nobody is explicitly doing kmalloc(0), so we can use a separate
runtime-computed-size path to do poisoning. (Which is probably 90% of
the problem, since people who kmalloc(sizeof(struct foo)) will generally
stay within bounds without too much effort.)

J

2007-06-02 00:44:00

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

Andrew Morton wrote:
> As I said, it's specific to the kmalloc(0) problem, and we're fixing that
> by other means anyway.
>

I guess I'm not getting much traction in my campaign for equal rights
for zero-sized allocations. They're perfectly reasonable things to have,
if that's the way the code falls out. The warning is just prejudice.

J

2007-06-02 00:46:42

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

Linus Torvalds wrote:
> So for *both* of the above reasons, it's actually stupid to return NULL
> for a zero-sized allocation. It would be much better to return another
> pointer that will trap on access. A good candidate might be to return
>
> #define BADPTR ((void *)16)
>

I think this is a good idea in principle, but I wonder if there's any
code which assumes that kmalloc(x) != kmalloc(x) for all non-NULL
returns from kmalloc.

J

2007-06-02 00:52:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 01 Jun 2007 17:43:46 -0700 Jeremy Fitzhardinge <[email protected]> wrote:

> Andrew Morton wrote:
> > As I said, it's specific to the kmalloc(0) problem, and we're fixing that
> > by other means anyway.
> >
>
> I guess I'm not getting much traction in my campaign for equal rights
> for zero-sized allocations. They're perfectly reasonable things to have,
> if that's the way the code falls out. The warning is just prejudice.
>

In some cases, sure, and we should remove the warning at some stage.

But it has exposed a couple of bugs and a few weird things.

2007-06-02 00:59:48

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

Andrew Morton wrote:
> In some cases, sure, and we should remove the warning at some stage.
>
> But it has exposed a couple of bugs and a few weird things.
>

We just need to scatter better bug powder into the corners of the
allocations.

J

2007-06-02 01:05:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning



On Fri, 1 Jun 2007, Christoph Lameter wrote:
>
> On Fri, 1 Jun 2007, Andrew Morton wrote:
>
> > I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there,
>
> The trouble with the WARN_ON is that it triggers even for code that is
> okay like noted by Jeremy.

Yes. Sometimes it's just more natural to have

ptr = kmalloc(size);
.. use it ..
free(ptr);

and if a *degenerate* case of size=0 happens, who cares? It should just
work, as long as we (obviously) don't actually try to access the pointer.

So I don't much like the WARN_ON(size == 0). I think it potentially just
causes people to write around it, and quite possibly causes the callers to
write code that is not at all more readable or maintainable!

That's why I'd much rather return BADPTR instead: we'll get an oops for
buggy code, but we don't penalize the "natural" and good code! So once you
return BADPTR, there really isn't any good reason for the WARN_ON.

Linus

2007-06-02 01:06:27

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 01 Jun 2007 16:00:30 PDT, Linus Torvalds said:

> #define BADPTR ((void *)16)

> I bet you'd find *more* problems that way than by returning NULL, and
> you'd also avoid the whole problem with "if (!ptr) return -ENOMEM".

Hmm.. this looks like a good contender for "first usage of #ifndef CONFIG_STABLE"

:)


Attachments:
(No filename) (226.00 B)

2007-06-02 01:25:04

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

On Fri, 1 Jun 2007, [email protected] wrote:

> On Fri, 01 Jun 2007 16:00:30 PDT, Linus Torvalds said:
>
> > #define BADPTR ((void *)16)
>
> > I bet you'd find *more* problems that way than by returning NULL, and
> > you'd also avoid the whole problem with "if (!ptr) return -ENOMEM".
>
> Hmm.. this looks like a good contender for "first usage of #ifndef CONFIG_STABLE"

The warning? Sure but the BADPTR (or ZERO_SIZE_PTR now) will stay.
It is definitely an error to deference an object that has no size.