LinuxLists.cc - [PATCH 2/5] mm: remove unlikly NULL from kfree

2009-03-25 05:20:51

Subject: [PATCH 2/5] mm: remove unlikly NULL from kfree

From: Steven Rostedt <[email protected]>

Impact: clean up

A NULL pointer to kfree is no longer unlikely, as seen by the
annotated branch profiler:

correct incorrect % Function File Line
------- --------- - -------- ---- ----
728571 1315540 64 kfree slab.c 3719

This makes sense, since we now encourage developers to just call kfree
without checking for NULL.

Signed-off-by: Steven Rostedt <[email protected]>
---
mm/slab.c | 2 +-
mm/slob.c | 2 +-
mm/slub.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 4d00855..0386c33 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3716,7 +3716,7 @@ void kfree(const void *objp)
struct kmem_cache *c;
unsigned long flags;

- if (unlikely(ZERO_OR_NULL_PTR(objp)))
+ if (ZERO_OR_NULL_PTR(objp))
return;
local_irq_save(flags);
kfree_debugcheck(objp);
diff --git a/mm/slob.c b/mm/slob.c
index 52bc8a2..e077174 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -491,7 +491,7 @@ void kfree(const void *block)
{
struct slob_page *sp;

- if (unlikely(ZERO_OR_NULL_PTR(block)))
+ if (ZERO_OR_NULL_PTR(block))
return;

sp = (struct slob_page *)virt_to_page(block);
diff --git a/mm/slub.c b/mm/slub.c
index 0280eee..65dc436 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2743,7 +2743,7 @@ void kfree(const void *x)
struct page *page;
void *object = (void *)x;

- if (unlikely(ZERO_OR_NULL_PTR(x)))
+ if (ZERO_OR_NULL_PTR(x))
return;

page = virt_to_head_page(x);
--
1.6.2

--

2009-03-25 07:34:42

by Pekka Enberg

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, Mar 25, 2009 at 7:19 AM, Steven Rostedt <[email protected]> wrote:
> From: Steven Rostedt <[email protected]>
>
> Impact: clean up
>
> A NULL pointer to kfree is no longer unlikely, as seen by the
> annotated branch profiler:
>
> ?correct incorrect ?% ? ? ? ?Function ? ? ? ? ? ? ? ? ?File ? ? ? ? ? ? ?Line
> ?------- --------- ?- ? ? ? ?-------- ? ? ? ? ? ? ? ? ?---- ? ? ? ? ? ? ?----
> ?728571 ?1315540 ?64 kfree ? ? ? ? ? ? ? ? ? ? ? ? ?slab.c ? ? ? ? ? ? ? 3719
>
> This makes sense, since we now encourage developers to just call kfree
> without checking for NULL.

But those are _error handling paths_ (at least supposed to be). I
wonder which call-sites are responsible for this. Can frtrace help us
here?

> ---
> ?mm/slab.c | ? ?2 +-
> ?mm/slob.c | ? ?2 +-
> ?mm/slub.c | ? ?2 +-
> ?3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 4d00855..0386c33 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3716,7 +3716,7 @@ void kfree(const void *objp)
> ? ? ? ?struct kmem_cache *c;
> ? ? ? ?unsigned long flags;
>
> - ? ? ? if (unlikely(ZERO_OR_NULL_PTR(objp)))
> + ? ? ? if (ZERO_OR_NULL_PTR(objp))
> ? ? ? ? ? ? ? ?return;
> ? ? ? ?local_irq_save(flags);
> ? ? ? ?kfree_debugcheck(objp);
> diff --git a/mm/slob.c b/mm/slob.c
> index 52bc8a2..e077174 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -491,7 +491,7 @@ void kfree(const void *block)
> ?{
> ? ? ? ?struct slob_page *sp;
>
> - ? ? ? if (unlikely(ZERO_OR_NULL_PTR(block)))
> + ? ? ? if (ZERO_OR_NULL_PTR(block))
> ? ? ? ? ? ? ? ?return;
>
> ? ? ? ?sp = (struct slob_page *)virt_to_page(block);
> diff --git a/mm/slub.c b/mm/slub.c
> index 0280eee..65dc436 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2743,7 +2743,7 @@ void kfree(const void *x)
> ? ? ? ?struct page *page;
> ? ? ? ?void *object = (void *)x;
>
> - ? ? ? if (unlikely(ZERO_OR_NULL_PTR(x)))
> + ? ? ? if (ZERO_OR_NULL_PTR(x))
> ? ? ? ? ? ? ? ?return;
>
> ? ? ? ?page = virt_to_head_page(x);
> --
> 1.6.2
>
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at ?http://www.tux.org/lkml/
>

2009-03-25 07:53:16

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 25 Mar 2009, Pekka Enberg wrote:
> On Wed, Mar 25, 2009 at 7:19 AM, Steven Rostedt <[email protected]> wrote:
> > This makes sense, since we now encourage developers to just call kfree
> > without checking for NULL.
>
> But those are _error handling paths_ (at least supposed to be). I
> wonder which call-sites are responsible for this. Can frtrace help us
> here?

Why is this an error handler. We replaced tons of

if (obj)
kfree(obj);

constructs all over the kernel with kfree(obj); and let kfree deal
with the NULL pointer.

Thanks,

tglx

2009-03-25 08:02:16

by Pekka Enberg

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

Hi Thomas,

On Wed, Mar 25, 2009 at 7:19 AM, Steven Rostedt <[email protected]> wrote:
> > > This makes sense, since we now encourage developers to just call kfree
> > > without checking for NULL.

On Wed, 25 Mar 2009, Pekka Enberg wrote:
> > But those are _error handling paths_ (at least supposed to be). I
> > wonder which call-sites are responsible for this. Can frtrace help us
> > here?

On Wed, 2009-03-25 at 08:50 +0100, Thomas Gleixner wrote:
> Why is this an error handler. We replaced tons of
>
> if (obj)
> kfree(obj);
>
> constructs all over the kernel with kfree(obj); and let kfree deal
> with the NULL pointer.

We encourage developers not to check for kfree() in the common
out-of-memory error handling paths. But what Steven's results suggest is
that the common case is something like this:

void *p = NULL;

if (/* unlikely condition */)
p = kmalloc(...);

kfree(p);

which, quite frankly, doesn't make much sense to me. That's why I would
really want to know which call-sites are causing this before applying
the patch.

Pekka

2009-03-25 08:03:37

by Hua Zhong

[permalink] [raw]

Subject: RE: [PATCH 2/5] mm: remove unlikly NULL from kfree

> But those are _error handling paths_ (at least supposed to be). I
> wonder which call-sites are responsible for this. Can frtrace help us
> here?

I am not sure why you call these error paths.

I submitted the same patch two years ago, and you are still holding the same
argument.

http://www.archivum.info/linux.kernel/2006-04/msg06042.html

Have you used likely profiler? These are real numbers. If you insist on
calling them error paths then error paths are obviously the norm.

Hua

2009-03-25 08:06:55

by Pekka Enberg

[permalink] [raw]

Subject: RE: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 2009-03-25 at 01:02 -0700, Hua Zhong wrote:
> > But those are _error handling paths_ (at least supposed to be). I
> > wonder which call-sites are responsible for this. Can frtrace help us
> > here?
>
> I am not sure why you call these error paths.
>
> I submitted the same patch two years ago, and you are still holding the same
> argument.
>
> http://www.archivum.info/linux.kernel/2006-04/msg06042.html
>
> Have you used likely profiler? These are real numbers. If you insist on
> calling them error paths then error paths are obviously the norm.

I am not denying the results, I am just saying that they don't make much
sense to me. Like I said, I would love to see the actual call-sites to
prove my argument wrong.

Pekka

2009-03-25 13:52:21

by Pekka Enberg

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 2009-03-25 at 01:02 -0700, Hua Zhong wrote:
>> > But those are _error handling paths_ (at least supposed to be). I
>> > wonder which call-sites are responsible for this. Can frtrace help us
>> > here?
>>
>> I am not sure why you call these error paths.
>>
>> I submitted the same patch two years ago, and you are still holding the same
>> argument.
>>
>> http://www.archivum.info/linux.kernel/2006-04/msg06042.html
>>
>> Have you used likely profiler? These are real numbers. If you insist on
>> calling them error paths then error paths are obviously the norm.

On Wed, Mar 25, 2009 at 10:06 AM, Pekka Enberg <[email protected]> wrote:
> I am not denying the results, I am just saying that they don't make much
> sense to me. Like I said, I would love to see the actual call-sites to
> prove my argument wrong.

OK, so according to Steven, audit_syscall_exit() is one such call-site
that shows up in the traces. I don't really understand what's going on
there but if it is sane, maybe that warrants the removal of unlikely()
from kfree(). Hmm?

2009-03-25 14:47:20

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 25 Mar 2009, Pekka Enberg wrote:

> On Wed, 2009-03-25 at 01:02 -0700, Hua Zhong wrote:
> >> > But those are _error handling paths_ (at least supposed to be). I
> >> > wonder which call-sites are responsible for this. Can frtrace help us
> >> > here?
> >>
> >> I am not sure why you call these error paths.
> >>
> >> I submitted the same patch two years ago, and you are still holding the same
> >> argument.
> >>
> >> http://www.archivum.info/linux.kernel/2006-04/msg06042.html
> >>
> >> Have you used likely profiler? These are real numbers. If you insist on
> >> calling them error paths then error paths are obviously the norm.
>
> On Wed, Mar 25, 2009 at 10:06 AM, Pekka Enberg <[email protected]> wrote:
> > I am not denying the results, I am just saying that they don't make much
> > sense to me. Like I said, I would love to see the actual call-sites to
> > prove my argument wrong.
>
> OK, so according to Steven, audit_syscall_exit() is one such call-site
> that shows up in the traces. I don't really understand what's going on
> there but if it is sane, maybe that warrants the removal of unlikely()
> from kfree(). Hmm?

After disabling AUDIT_SYSCALLS I have this:

# cat /debug/tracing/trace | sort -u

record_nulls: ptr=(null) (ext3_get_acl+0x1e0/0x3f0 [ext3])
record_nulls: ptr=(null) (free_bitmap+0x29/0x70)
record_nulls: ptr=(null) (free_tty_struct+0x1d/0x40)
record_nulls: ptr=(null) (ftrace_graph_exit_task+0x1e/0x20)
record_nulls: ptr=(null) (inet_sock_destruct+0x1cb/0x2a0)
record_nulls: ptr=(null) (ip_cork_release+0x24/0x50)
record_nulls: ptr=(null) (keyctl_join_session_keyring+0x5a/0x70)
record_nulls: ptr=(null) (key_user_lookup+0x183/0x220)
record_nulls: ptr=(null) (kobject_set_name_vargs+0x43/0x50)
record_nulls: ptr=(null) (netlink_release+0x1a4/0x2f0)
record_nulls: ptr=(null) (release_sysfs_dirent+0x20/0xc0)
record_nulls: ptr=(null) (sysfs_open_file+0x1c8/0x3e0)
record_nulls: ptr=(null) (tty_write+0x16a/0x290)

I added a hook to only record when NULL is passed into kfree.

Also note, that after disabling AUDIT_SYSCALLS I now only have roughly 7%
NULL hit rate. Still, unlikely is probably not a benefit here.

correct incorrect % Function File Line
------- --------- - -------- ---- ----
108129 8214 7 kfree slub.c 2796

And yes, this is a different kernel than the patch. This is running on tip
and not 2.6.29, and also, you can see, uses slub.c not slab.c. The reason
I changed was because:

1) it has better tracing utilities
2) included your trace point patch ;-)

The header is still messed up, because I'm using the kmemtrace branch and
not the ftrace branch, but I had better go back and make sure they are
fixed.

-- Steve

2009-03-25 14:59:21

by Pekka Enberg

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

Hi Steven,

On Wed, 25 Mar 2009, Pekka Enberg wrote:
> > OK, so according to Steven, audit_syscall_exit() is one such call-site
> > that shows up in the traces. I don't really understand what's going on
> > there but if it is sane, maybe that warrants the removal of unlikely()
> > from kfree(). Hmm?

On Wed, 2009-03-25 at 10:47 -0400, Steven Rostedt wrote:
> After disabling AUDIT_SYSCALLS I have this:
>
> # cat /debug/tracing/trace | sort -u
>
> record_nulls: ptr=(null) (ext3_get_acl+0x1e0/0x3f0 [ext3])
> record_nulls: ptr=(null) (free_bitmap+0x29/0x70)
> record_nulls: ptr=(null) (free_tty_struct+0x1d/0x40)
> record_nulls: ptr=(null) (ftrace_graph_exit_task+0x1e/0x20)
> record_nulls: ptr=(null) (inet_sock_destruct+0x1cb/0x2a0)
> record_nulls: ptr=(null) (ip_cork_release+0x24/0x50)
> record_nulls: ptr=(null) (keyctl_join_session_keyring+0x5a/0x70)
> record_nulls: ptr=(null) (key_user_lookup+0x183/0x220)
> record_nulls: ptr=(null) (kobject_set_name_vargs+0x43/0x50)
> record_nulls: ptr=(null) (netlink_release+0x1a4/0x2f0)
> record_nulls: ptr=(null) (release_sysfs_dirent+0x20/0xc0)
> record_nulls: ptr=(null) (sysfs_open_file+0x1c8/0x3e0)
> record_nulls: ptr=(null) (tty_write+0x16a/0x290)
>
> I added a hook to only record when NULL is passed into kfree.
>
> Also note, that after disabling AUDIT_SYSCALLS I now only have roughly 7%
> NULL hit rate. Still, unlikely is probably not a benefit here.

Thanks for doing this. Do you mean that 93% hit ratio is not enough to
be a performance gain?

Pekka

2009-03-25 15:04:20

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 25 Mar 2009, Pekka Enberg wrote:

> Hi Steven,
>
> On Wed, 25 Mar 2009, Pekka Enberg wrote:
> > > OK, so according to Steven, audit_syscall_exit() is one such call-site
> > > that shows up in the traces. I don't really understand what's going on
> > > there but if it is sane, maybe that warrants the removal of unlikely()
> > > from kfree(). Hmm?
>
> On Wed, 2009-03-25 at 10:47 -0400, Steven Rostedt wrote:
> > After disabling AUDIT_SYSCALLS I have this:
> >
> > # cat /debug/tracing/trace | sort -u
> >
> > record_nulls: ptr=(null) (ext3_get_acl+0x1e0/0x3f0 [ext3])
> > record_nulls: ptr=(null) (free_bitmap+0x29/0x70)
> > record_nulls: ptr=(null) (free_tty_struct+0x1d/0x40)
> > record_nulls: ptr=(null) (ftrace_graph_exit_task+0x1e/0x20)
> > record_nulls: ptr=(null) (inet_sock_destruct+0x1cb/0x2a0)
> > record_nulls: ptr=(null) (ip_cork_release+0x24/0x50)
> > record_nulls: ptr=(null) (keyctl_join_session_keyring+0x5a/0x70)
> > record_nulls: ptr=(null) (key_user_lookup+0x183/0x220)
> > record_nulls: ptr=(null) (kobject_set_name_vargs+0x43/0x50)
> > record_nulls: ptr=(null) (netlink_release+0x1a4/0x2f0)
> > record_nulls: ptr=(null) (release_sysfs_dirent+0x20/0xc0)
> > record_nulls: ptr=(null) (sysfs_open_file+0x1c8/0x3e0)
> > record_nulls: ptr=(null) (tty_write+0x16a/0x290)
> >
> > I added a hook to only record when NULL is passed into kfree.
> >
> > Also note, that after disabling AUDIT_SYSCALLS I now only have roughly 7%
> > NULL hit rate. Still, unlikely is probably not a benefit here.
>
> Thanks for doing this. Do you mean that 93% hit ratio is not enough to
> be a performance gain?

I think it was Christoph Lameter (good you Cc'd him) told me that anything
less that 99% is not worthy of a (un)likely macro.

I honestly don't know.

-- Steve

2009-03-25 15:08:49

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 25 Mar 2009, Steven Rostedt wrote:

>
> On Wed, 25 Mar 2009, Pekka Enberg wrote:
>
> > Hi Steven,
> >
> > On Wed, 25 Mar 2009, Pekka Enberg wrote:
> > > > OK, so according to Steven, audit_syscall_exit() is one such call-site
> > > > that shows up in the traces. I don't really understand what's going on
> > > > there but if it is sane, maybe that warrants the removal of unlikely()
> > > > from kfree(). Hmm?
> >
> > On Wed, 2009-03-25 at 10:47 -0400, Steven Rostedt wrote:
> > > After disabling AUDIT_SYSCALLS I have this:
> > >
> > > # cat /debug/tracing/trace | sort -u
> > >
> > > record_nulls: ptr=(null) (ext3_get_acl+0x1e0/0x3f0 [ext3])
> > > record_nulls: ptr=(null) (free_bitmap+0x29/0x70)
> > > record_nulls: ptr=(null) (free_tty_struct+0x1d/0x40)
> > > record_nulls: ptr=(null) (ftrace_graph_exit_task+0x1e/0x20)
> > > record_nulls: ptr=(null) (inet_sock_destruct+0x1cb/0x2a0)
> > > record_nulls: ptr=(null) (ip_cork_release+0x24/0x50)
> > > record_nulls: ptr=(null) (keyctl_join_session_keyring+0x5a/0x70)
> > > record_nulls: ptr=(null) (key_user_lookup+0x183/0x220)
> > > record_nulls: ptr=(null) (kobject_set_name_vargs+0x43/0x50)
> > > record_nulls: ptr=(null) (netlink_release+0x1a4/0x2f0)
> > > record_nulls: ptr=(null) (release_sysfs_dirent+0x20/0xc0)
> > > record_nulls: ptr=(null) (sysfs_open_file+0x1c8/0x3e0)
> > > record_nulls: ptr=(null) (tty_write+0x16a/0x290)
> > >
> > > I added a hook to only record when NULL is passed into kfree.
> > >
> > > Also note, that after disabling AUDIT_SYSCALLS I now only have roughly 7%
> > > NULL hit rate. Still, unlikely is probably not a benefit here.
> >
> > Thanks for doing this. Do you mean that 93% hit ratio is not enough to
> > be a performance gain?
>
> I think it was Christoph Lameter (good you Cc'd him) told me that anything
> less that 99% is not worthy of a (un)likely macro.
>
> I honestly don't know.

I think the theory is that gcc and the CPU can handle normal branch
predictions well. But if you use one of the prediction macros, gcc
(and some archs) behaves differently, such that taking the wrong branch
can cost more than the time saved with all the other correct hits.

Again, I'm not sure. I haven't done the bench marks. Perhaps someone else
is more apt at knowing the details here.

-- Steve

2009-03-25 15:27:27

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 25 Mar 2009, Steven Rostedt wrote:
>
> # cat /debug/tracing/trace | sort -u
>
> record_nulls: ptr=(null) (ext3_get_acl+0x1e0/0x3f0 [ext3])
> record_nulls: ptr=(null) (free_bitmap+0x29/0x70)
> record_nulls: ptr=(null) (free_tty_struct+0x1d/0x40)
> record_nulls: ptr=(null) (ftrace_graph_exit_task+0x1e/0x20)
> record_nulls: ptr=(null) (inet_sock_destruct+0x1cb/0x2a0)
> record_nulls: ptr=(null) (ip_cork_release+0x24/0x50)
> record_nulls: ptr=(null) (keyctl_join_session_keyring+0x5a/0x70)
> record_nulls: ptr=(null) (key_user_lookup+0x183/0x220)
> record_nulls: ptr=(null) (kobject_set_name_vargs+0x43/0x50)
> record_nulls: ptr=(null) (netlink_release+0x1a4/0x2f0)
> record_nulls: ptr=(null) (release_sysfs_dirent+0x20/0xc0)
> record_nulls: ptr=(null) (sysfs_open_file+0x1c8/0x3e0)
> record_nulls: ptr=(null) (tty_write+0x16a/0x290)
>

Hmm, it is looking like the patter of calling kfree(NULL) is due to this:

struct some_struct *some_init_function(int some_option)
{
some_pointer = kzalloc(sizeof(struct some_struct));

if (some_option)
some_pointer->some_item = kmalloc();

return some_pointer;
}

some_destructor(struct some_struct *some_pointer)
{
kfree(some_pointer->some_item);
kfree(some_pointer);
}

That is, a structure may or may not allocate memory for various items in
the structure. They may stay as NULL. On the destructor side, instead of
testing if the items are NULL, we simply call kfree on them.

This explains why the object is not unlikely to be NULL in kfree.

Note, likely is not the same as majority of the time. If I were to say, 7%
of the next 100 days will snow. Can you say it is unlikely that it will
snow? Perhaps on a certain day it may be unlikely, but it is very likely
it will eventually snow. The same should go with the macros. If you say,
unlikely, it really should mean, I do not expect this to ever happen.
Not, the majority of the times it will not happen.

-- Steve

2009-03-25 16:18:38

by Matt Mackall

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 2009-03-25 at 11:08 -0400, Steven Rostedt wrote:
> On Wed, 25 Mar 2009, Steven Rostedt wrote:
>
> >
> > On Wed, 25 Mar 2009, Pekka Enberg wrote:
> >
> > > Hi Steven,
> > >
> > > On Wed, 25 Mar 2009, Pekka Enberg wrote:
> > > > > OK, so according to Steven, audit_syscall_exit() is one such call-site
> > > > > that shows up in the traces. I don't really understand what's going on
> > > > > there but if it is sane, maybe that warrants the removal of unlikely()
> > > > > from kfree(). Hmm?
> > >
> > > On Wed, 2009-03-25 at 10:47 -0400, Steven Rostedt wrote:
> > > > After disabling AUDIT_SYSCALLS I have this:
> > > >
> > > > # cat /debug/tracing/trace | sort -u
> > > >
> > > > record_nulls: ptr=(null) (ext3_get_acl+0x1e0/0x3f0 [ext3])
> > > > record_nulls: ptr=(null) (free_bitmap+0x29/0x70)
> > > > record_nulls: ptr=(null) (free_tty_struct+0x1d/0x40)
> > > > record_nulls: ptr=(null) (ftrace_graph_exit_task+0x1e/0x20)
> > > > record_nulls: ptr=(null) (inet_sock_destruct+0x1cb/0x2a0)
> > > > record_nulls: ptr=(null) (ip_cork_release+0x24/0x50)
> > > > record_nulls: ptr=(null) (keyctl_join_session_keyring+0x5a/0x70)
> > > > record_nulls: ptr=(null) (key_user_lookup+0x183/0x220)
> > > > record_nulls: ptr=(null) (kobject_set_name_vargs+0x43/0x50)
> > > > record_nulls: ptr=(null) (netlink_release+0x1a4/0x2f0)
> > > > record_nulls: ptr=(null) (release_sysfs_dirent+0x20/0xc0)
> > > > record_nulls: ptr=(null) (sysfs_open_file+0x1c8/0x3e0)
> > > > record_nulls: ptr=(null) (tty_write+0x16a/0x290)
> > > >
> > > > I added a hook to only record when NULL is passed into kfree.
> > > >
> > > > Also note, that after disabling AUDIT_SYSCALLS I now only have roughly 7%
> > > > NULL hit rate. Still, unlikely is probably not a benefit here.
> > >
> > > Thanks for doing this. Do you mean that 93% hit ratio is not enough to
> > > be a performance gain?
> >
> > I think it was Christoph Lameter (good you Cc'd him) told me that anything
> > less that 99% is not worthy of a (un)likely macro.
> >
> > I honestly don't know.
>
> I think the theory is that gcc and the CPU can handle normal branch
> predictions well. But if you use one of the prediction macros, gcc
> (and some archs) behaves differently, such that taking the wrong branch
> can cost more than the time saved with all the other correct hits.
>
> Again, I'm not sure. I haven't done the bench marks. Perhaps someone else
> is more apt at knowing the details here.

>From first principles, we can make a reasonable model of branch
prediction success with a branch cache:

hot cache cold cache cold cache cold cache
w|w/o hint good hint bad hint
p near 0 + + + -
p near .5 0 0 0 0
p near 1 + - + -

(this assumes the CPU is biased against branching in the cold cache
case)

Branch prediction miss has a penalty measured in some smallish number of
cycles. So the impact in cycles/sec[1] is (p(miss) * penalty) * (calls /
sec). Because the branch cache kicks in and hides our unlikely hint with
a hot cache, we can't get a high calls/sec, so to have much impact,
we've got to have a very high probably of a missed branch (p near 1)
_and_ cold cache.

So for CPUs with a branch cache, unlikely hints only make sense in
fairly extreme cases. And I think that includes most CPU families these
days as it's pretty much required to get much advantage from making the
CPU clock faster than the memory bus.

We'd have a lot of trouble benchmarking this meaningfully as hot caches
kill the effect. And it would of course depend directly on a given CPU's
branch cache size and branch miss penalty, numbers that vary from model
to model. So I think unless we can confidently state that a branch is
rarely taken, there's very little upside to adding unlikely.

On the other hand, there's also very little downside until our hint is
grossly inaccurate. So there's a huge hysteresis here: if p is < .99,
not much point adding unlikely. If p is > .1, not much point removing
it.

[1] Note that cycles/sec is dimensionless as cycles and seconds are both
measures of time. So impact here is in units of very small fractions of
a percent.
--
http://selenic.com : development and support for Mercurial and Linux

2009-03-25 16:34:22

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 25 Mar 2009, Matt Mackall wrote:
> >
> > I think the theory is that gcc and the CPU can handle normal branch
> > predictions well. But if you use one of the prediction macros, gcc
> > (and some archs) behaves differently, such that taking the wrong branch
> > can cost more than the time saved with all the other correct hits.
> >
> > Again, I'm not sure. I haven't done the bench marks. Perhaps someone else
> > is more apt at knowing the details here.
>
> >From first principles, we can make a reasonable model of branch
> prediction success with a branch cache:
>
> hot cache cold cache cold cache cold cache
> w|w/o hint good hint bad hint
> p near 0 + + + -
> p near .5 0 0 0 0
> p near 1 + - + -
>
> (this assumes the CPU is biased against branching in the cold cache
> case)
>
> Branch prediction miss has a penalty measured in some smallish number of
> cycles. So the impact in cycles/sec[1] is (p(miss) * penalty) * (calls /
> sec). Because the branch cache kicks in and hides our unlikely hint with
> a hot cache, we can't get a high calls/sec, so to have much impact,
> we've got to have a very high probably of a missed branch (p near 1)
> _and_ cold cache.
>
> So for CPUs with a branch cache, unlikely hints only make sense in
> fairly extreme cases. And I think that includes most CPU families these
> days as it's pretty much required to get much advantage from making the
> CPU clock faster than the memory bus.
>
> We'd have a lot of trouble benchmarking this meaningfully as hot caches
> kill the effect. And it would of course depend directly on a given CPU's
> branch cache size and branch miss penalty, numbers that vary from model
> to model. So I think unless we can confidently state that a branch is
> rarely taken, there's very little upside to adding unlikely.
>
> On the other hand, there's also very little downside until our hint is
> grossly inaccurate. So there's a huge hysteresis here: if p is < .99,
> not much point adding unlikely. If p is > .1, not much point removing
> it.
>
> [1] Note that cycles/sec is dimensionless as cycles and seconds are both
> measures of time. So impact here is in units of very small fractions of
> a percent.

Hi Matt,

Thanks for this info!

Although gcc plays a role too. That is, if we have

if (x)
do something small;

do something large;

this can be broken into:

cmp x
beq 1f
do something small
1:
do something large

Which plays nice with the cache. But, by adding a unlikely(x), gcc will
probably choose to do:

cmp x
bne 2f

1:
do something large

ret;

2:
do something small
b 1b

which hurts in a number of ways.

-- Steve

2009-03-25 20:26:21

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

Steven Rostedt wrote:
> Thanks for this info!
>
> Although gcc plays a role too. That is, if we have
>
> if (x)
> do something small;
>
> do something large;
>
>
> this can be broken into:
>
> cmp x
> beq 1f
> do something small
> 1:
> do something large
>
> Which plays nice with the cache. But, by adding a unlikely(x), gcc will
> probably choose to do:
>
> cmp x
> bne 2f
>
> 1:
> do something large
>
> ret;
>
> 2:
> do something small
> b 1b
>
> which hurts in a number of ways.
>

I think that's probably the dominant effect on x86 systems, because
Intel doesn't recommend using the branch hint prefixes as far as I can
tell (their consumption of icache space outweighs any benefit of priming
the predictor).

J

2009-03-25 21:04:59

by Matt Mackall

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 2009-03-25 at 12:34 -0400, Steven Rostedt wrote:
> On Wed, 25 Mar 2009, Matt Mackall wrote:
> > >
> > > I think the theory is that gcc and the CPU can handle normal branch
> > > predictions well. But if you use one of the prediction macros, gcc
> > > (and some archs) behaves differently, such that taking the wrong branch
> > > can cost more than the time saved with all the other correct hits.
> > >
> > > Again, I'm not sure. I haven't done the bench marks. Perhaps someone else
> > > is more apt at knowing the details here.
> >
> > >From first principles, we can make a reasonable model of branch
> > prediction success with a branch cache:
> >
> > hot cache cold cache cold cache cold cache
> > w|w/o hint good hint bad hint
> > p near 0 + + + -
> > p near .5 0 0 0 0
> > p near 1 + - + -
> >
> > (this assumes the CPU is biased against branching in the cold cache
> > case)
> >
> > Branch prediction miss has a penalty measured in some smallish number of
> > cycles. So the impact in cycles/sec[1] is (p(miss) * penalty) * (calls /
> > sec). Because the branch cache kicks in and hides our unlikely hint with
> > a hot cache, we can't get a high calls/sec, so to have much impact,
> > we've got to have a very high probably of a missed branch (p near 1)
> > _and_ cold cache.
> >
> > So for CPUs with a branch cache, unlikely hints only make sense in
> > fairly extreme cases. And I think that includes most CPU families these
> > days as it's pretty much required to get much advantage from making the
> > CPU clock faster than the memory bus.
> >
> > We'd have a lot of trouble benchmarking this meaningfully as hot caches
> > kill the effect. And it would of course depend directly on a given CPU's
> > branch cache size and branch miss penalty, numbers that vary from model
> > to model. So I think unless we can confidently state that a branch is
> > rarely taken, there's very little upside to adding unlikely.
> >
> > On the other hand, there's also very little downside until our hint is
> > grossly inaccurate. So there's a huge hysteresis here: if p is < .99,
> > not much point adding unlikely. If p is > .1, not much point removing
> > it.
> >
> > [1] Note that cycles/sec is dimensionless as cycles and seconds are both
> > measures of time. So impact here is in units of very small fractions of
> > a percent.
>
> Hi Matt,
>
> Thanks for this info!
>
> Although gcc plays a role too. That is, if we have
>
> if (x)
> do something small;
>
> do something large;
>
>
> this can be broken into:
>
> cmp x
> beq 1f
> do something small
> 1:
> do something large
>
> Which plays nice with the cache. But, by adding a unlikely(x), gcc will
> probably choose to do:
>
> cmp x
> bne 2f
>
> 1:
> do something large
>
> ret;
>
> 2:
> do something small
> b 1b
>
> which hurts in a number of ways.

The cost is an unconditional branch; the ret already exists. There's a
slightly larger cache footprint for the small branch and a slightly
smaller footprint for the large branch. If p is close to .5 and
calls/sec is high, the cache footprint is the sum of the footprint of
both branches. But if calls/sec is close to low, cache footprint is also
low.

So, yeah, I think this is a good additional argument to err on the side
of not adding these things at all. And I certainly wasn't intending to
defend the ones in kfree.

But I'm also skeptical of whether it's worth spending much time actively
routing out the moderately incorrect instances. It's going to be nearly
immune to performance benchmarking. We should instead just actively
discourage using unlikely in new code.

--
http://selenic.com : development and support for Mercurial and Linux

2009-03-25 21:12:18

by Matt Mackall

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 2009-03-25 at 13:26 -0700, Jeremy Fitzhardinge wrote:
> Steven Rostedt wrote:
> > Thanks for this info!
> >
> > Although gcc plays a role too. That is, if we have
> >
> > if (x)
> > do something small;
> >
> > do something large;
> >
> >
> > this can be broken into:
> >
> > cmp x
> > beq 1f
> > do something small
> > 1:
> > do something large
> >
> > Which plays nice with the cache. But, by adding a unlikely(x), gcc will
> > probably choose to do:
> >
> > cmp x
> > bne 2f
> >
> > 1:
> > do something large
> >
> > ret;
> >
> > 2:
> > do something small
> > b 1b
> >
> > which hurts in a number of ways.
> >
>
> I think that's probably the dominant effect on x86 systems, because
> Intel doesn't recommend using the branch hint prefixes as far as I can
> tell (their consumption of icache space outweighs any benefit of priming
> the predictor).

Yeah, I was actually thinking 'hint' in the gcc sense of adding
unlikely() or not, which results in, say, choosing one code layout vs
the other based on the CPU's cold-cache bias.

--
http://selenic.com : development and support for Mercurial and Linux

2009-03-25 21:24:23

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 25 Mar 2009, Matt Mackall wrote:
>
> The cost is an unconditional branch; the ret already exists. There's a
> slightly larger cache footprint for the small branch and a slightly
> smaller footprint for the large branch. If p is close to .5 and
> calls/sec is high, the cache footprint is the sum of the footprint of
> both branches. But if calls/sec is close to low, cache footprint is also
> low.
>
> So, yeah, I think this is a good additional argument to err on the side
> of not adding these things at all. And I certainly wasn't intending to
> defend the ones in kfree.
>
> But I'm also skeptical of whether it's worth spending much time actively
> routing out the moderately incorrect instances. It's going to be nearly
> immune to performance benchmarking. We should instead just actively
> discourage using unlikely in new code.
>

Sure. OK, actually I'd say they are valid for 100% hits. These are for
error conditions and trace point like data. Where, we want the least
amount of overhead when the condition is false.

But I can see, we've been brought up (well some of us) that branches are
horrible for pipelines, and any help in deciding the choice is always
tempting.

-- Steve

2009-03-25 21:28:35

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 25 Mar 2009 08:50:44 +0100 (CET) Thomas Gleixner <[email protected]> wrote:

> On Wed, 25 Mar 2009, Pekka Enberg wrote:
> > On Wed, Mar 25, 2009 at 7:19 AM, Steven Rostedt <[email protected]> wrote:
> > > This makes sense, since we now encourage developers to just call kfree
> > > without checking for NULL.
> >
> > But those are _error handling paths_ (at least supposed to be). I
> > wonder which call-sites are responsible for this. Can frtrace help us
> > here?
>

-mm's profile-likely-unlikely-macros.patch gives the backtraces you're
looking for.

I agree, probably some particular callsite is doing something silly and
is skewing all the instrumentation.

We have at times identified callsites where kfree(0) is so likely that
"remove the NULL test" was an undesirable change.

Many many many kfree() callsites _know_ that the pointer isn't NULL.
Having that test in kfree() was always stupid. What we should do is to
have a kfree_it_isnt_null() for those callsites so they can omit the test
altogether.

2009-03-26 16:10:33

by Al Viro

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, Mar 25, 2009 at 03:51:49PM +0200, Pekka Enberg wrote:
> OK, so according to Steven, audit_syscall_exit() is one such call-site
> that shows up in the traces. I don't really understand what's going on
> there but if it is sane, maybe that warrants the removal of unlikely()
> from kfree(). Hmm?

*Shrug*

We certainly can add explicit check, but that'll keep asking for
patches "removing useless check". The same applies to other places,
really.

And making kfree(NULL) break will keep reintroducing bugs, since people
will expect the behaviour of free(3)...

I don't have any serious objections to adding a check there; indeed,
the normal case there (
if (context->state != AUDIT_RECORD_CONTEXT) {
kfree(context->filterkey);
context->filterkey = NULL;
}
) is context->state != AUDIT_RECORD_CONTEXT, context->filterkey == NULL,
so it might be worth turning into
if (unlikely(context->filterkey)) {
if (context->state != AUDIT_RECORD_CONTEXT) {
kfree(context->filterkey);
context->filterkey = NULL;
}
}
anyway. Just dubious about the goal in general...

2009-03-26 16:23:53

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Thu, 26 Mar 2009 16:10:19 +0000 Al Viro <[email protected]> wrote:

> We certainly can add explicit check, but that'll keep asking for
> patches "removing useless check". The same applies to other places,
> really.

Such patches are easily prevented by commenting the exceptional cases.