2009-04-24 08:01:25

by Metzger, Markus T

Subject: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer

Use vmalloc to allocate the branch trace buffer.

Peter Zijlstra suggested using vmalloc rather than kmalloc to
allocate the potentially multi-page branch trace buffer.

Is there a way to have vmalloc allocate a physically non-contiguous
buffer for test purposes? Ideally, the memory area would have big
holes in it with sensitive data in between so I would know immediately
when this is overwritten.


Reported-by: Peter Zijlstra <[email protected]>
CC: Andrew Morton <[email protected]>
Signed-off-by: Markus Metzger <[email protected]>
---
arch/x86/kernel/ptrace.c | 5 3 + 2 - 0 !
1 file changed, 3 insertions(+), 2 deletions(-)

Index: b/arch/x86/kernel/ptrace.c
===================================================================
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -22,6 +22,7 @@
 #include <linux/seccomp.h>
 #include <linux/signal.h>
 #include <linux/workqueue.h>
+#include <linux/vmalloc.h>

 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -626,7 +627,7 @@ static int alloc_bts_buffer(struct bts_c
 	if (err < 0)
 		return err;

-	buffer = kzalloc(size, GFP_KERNEL);
+	buffer = vmalloc(size);
 	if (!buffer)
 		goto out_refund;

@@ -646,7 +647,7 @@ static inline void free_bts_buffer(struc
 	if (!context->buffer)
 		return;

-	kfree(context->buffer);
+	vfree(context->buffer);
 	context->buffer = NULL;

 	refund_locked_memory(context->mm, context->size);
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


2009-04-24 08:21:58

by Andrew Morton

Subject: Re: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer

On Fri, 24 Apr 2009 10:00:55 +0200 Markus Metzger <[email protected]> wrote:

> Use vmalloc to allocate the branch trace buffer.
>
> Peter Zijlstra suggested to use vmalloc rather than kmalloc to
> allocate the potentially multi-page branch trace buffer.

The changelog provides no reason for this change. It should do so.

> Is there a way to have vmalloc allocate a physically non-contiguous
> buffer for test purposes? Ideally, the memory area would have big
> holes in it with sensitive data in between so I would know immediately
> when this is overwritten.

I suppose you could allocate the pages by hand and then vmap() them.
Allocating 2* the number you need and then freeing every second one
should make them physically holey.
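
A toy model of that trick, in user-space C rather than kernel code. It
assumes the allocator hands out page frames in ascending order and only
shows why keeping every second one leaves physical gaps. The function
names and the frame numbers used with them are made up for illustration;
a real test would allocate with alloc_page() and map the kept pages with
vmap().

```c
#include <stddef.h>

/*
 * Keep every second entry of the `total` allocated frame numbers.
 * `kept` must have room for (total + 1) / 2 entries.
 * Returns the number of frames kept.
 */
size_t sketch_keep_alternate(const unsigned long *frames, size_t total,
			     unsigned long *kept)
{
	size_t i, n = 0;

	for (i = 0; i < total; i += 2)
		kept[n++] = frames[i];
	return n;
}

/* Returns 1 if any two neighbouring entries are physically adjacent. */
int sketch_has_adjacent(const unsigned long *frames, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (frames[i] == frames[i - 1] + 1 ||
		    frames[i] + 1 == frames[i - 1])
			return 1;
	return 0;
}
```

Fed eight consecutive frame numbers, sketch_keep_alternate() keeps four
of them and sketch_has_adjacent() reports no adjacent pair among the
kept ones. A real allocator gives no ordering guarantee, so this is
"should", not "will".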

> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -22,6 +22,7 @@
> #include <linux/seccomp.h>
> #include <linux/signal.h>
> #include <linux/workqueue.h>
> +#include <linux/vmalloc.h>
>
> #include <asm/uaccess.h>
> #include <asm/pgtable.h>
> @@ -626,7 +627,7 @@ static int alloc_bts_buffer(struct bts_c
> if (err < 0)
> return err;
>
> - buffer = kzalloc(size, GFP_KERNEL);
> + buffer = vmalloc(size);
> if (!buffer)
> goto out_refund;
>
> @@ -646,7 +647,7 @@ static inline void free_bts_buffer(struc
> if (!context->buffer)
> return;
>
> - kfree(context->buffer);
> + vfree(context->buffer);
> context->buffer = NULL;
>

The patch looks like a regression to me. vmalloc memory is slower to
allocate, slower to free, slower to access and can exhaust or fragment
the vmalloc arena. Confused.

2009-04-24 08:33:14

by Ingo Molnar

Subject: Re: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer


* Andrew Morton <[email protected]> wrote:

> On Fri, 24 Apr 2009 10:00:55 +0200 Markus Metzger <[email protected]> wrote:
>
> > Use vmalloc to allocate the branch trace buffer.
> >
> > Peter Zijlstra suggested to use vmalloc rather than kmalloc to
> > allocate the potentially multi-page branch trace buffer.
>
> The changelog provides no reason for this change. It should do so.
>
> > Is there a way to have vmalloc allocate a physically non-contiguous
> > buffer for test purposes? Ideally, the memory area would have big
> > holes in it with sensitive data in between so I would know immediately
> > when this is overwritten.
>
> I suppose you could allocate the pages by hand and then vmap() them.
> Allocating 2* the number you need and then freeing every second one
> should make them physically holey.
>
> > --- a/arch/x86/kernel/ptrace.c
> > +++ b/arch/x86/kernel/ptrace.c
> > @@ -22,6 +22,7 @@
> > #include <linux/seccomp.h>
> > #include <linux/signal.h>
> > #include <linux/workqueue.h>
> > +#include <linux/vmalloc.h>
> >
> > #include <asm/uaccess.h>
> > #include <asm/pgtable.h>
> > @@ -626,7 +627,7 @@ static int alloc_bts_buffer(struct bts_c
> > if (err < 0)
> > return err;
> >
> > - buffer = kzalloc(size, GFP_KERNEL);
> > + buffer = vmalloc(size);
> > if (!buffer)
> > goto out_refund;
> >
> > @@ -646,7 +647,7 @@ static inline void free_bts_buffer(struc
> > if (!context->buffer)
> > return;
> >
> > - kfree(context->buffer);
> > + vfree(context->buffer);
> > context->buffer = NULL;
> >
>
> The patch looks like a regression to me. vmalloc memory is slower
> to allocate, slower to free, slower to access and can exhaust or
> fragment the vmalloc arena. Confused.

Performance does not matter here (this is really a slowpath), but
fragmentation does matter, especially on 32-bit systems.

I'd not uglify the code via vmap() - and vmap has the same
fundamental address space limitations on 32-bit as vmalloc().

The existing kmalloc() is fine. We do larger than PAGE_SIZE
allocations elsewhere too (the kernel stack for example), and this
is a debug facility, so failing the allocation is not a big problem
even if it happens.

Ingo

2009-04-24 08:41:05

by Metzger, Markus T

Subject: RE: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer

>-----Original Message-----
>From: Ingo Molnar [mailto:[email protected]]
>Sent: Friday, April 24, 2009 10:31 AM
>To: Andrew Morton
>Cc: Metzger, Markus T; [email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; Villacis, Juan; [email protected]; linux-
>[email protected]; [email protected]; [email protected]
>Subject: Re: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer
>
>
>* Andrew Morton <[email protected]> wrote:
>
>> On Fri, 24 Apr 2009 10:00:55 +0200 Markus Metzger <[email protected]> wrote:
>>
>> > Use vmalloc to allocate the branch trace buffer.
>> >
>> > Peter Zijlstra suggested to use vmalloc rather than kmalloc to
>> > allocate the potentially multi-page branch trace buffer.
>>
>> The changelog provides no reason for this change. It should do so.
>>
>> > Is there a way to have vmalloc allocate a physically non-contiguous
>> > buffer for test purposes? Ideally, the memory area would have big
>> > holes in it with sensitive data in between so I would know immediately
>> > when this is overwritten.
>>
>> I suppose you could allocate the pages by hand and then vmap() them.
>> Allocating 2* the number you need and then freeing every second one
>> should make them physically holey.
>>
>> > --- a/arch/x86/kernel/ptrace.c
>> > +++ b/arch/x86/kernel/ptrace.c
>> > @@ -22,6 +22,7 @@
>> > #include <linux/seccomp.h>
>> > #include <linux/signal.h>
>> > #include <linux/workqueue.h>
>> > +#include <linux/vmalloc.h>
>> >
>> > #include <asm/uaccess.h>
>> > #include <asm/pgtable.h>
>> > @@ -626,7 +627,7 @@ static int alloc_bts_buffer(struct bts_c
>> > if (err < 0)
>> > return err;
>> >
>> > - buffer = kzalloc(size, GFP_KERNEL);
>> > + buffer = vmalloc(size);
>> > if (!buffer)
>> > goto out_refund;
>> >
>> > @@ -646,7 +647,7 @@ static inline void free_bts_buffer(struc
>> > if (!context->buffer)
>> > return;
>> >
>> > - kfree(context->buffer);
>> > + vfree(context->buffer);
>> > context->buffer = NULL;
>> >
>>
>> The patch looks like a regression to me. vmalloc memory is slower
>> to allocate, slower to free, slower to access and can exhaust or
>> fragment the vmalloc arena. Confused.
>
>Performance does not matter here (this is really a slowpath), but
>fragmentation does matter, especially on 32-bit systems.
>
>I'd not uglify the code via vmap() - and vmap has the same
>fundamental address space limitations on 32-bit as vmalloc().
>
>The existing kmalloc() is fine. We do larger than PAGE_SIZE
>allocations elsewhere too (the kernel stack for example), and this
>is a debug facility, so failing the allocation is not a big problem
>even if it happens.

OK. I'll drop 2/2 and send out 1/2 as a patch, then.

The original suggestion was to use the page allocator and vmap().
I assume you don't want that, either.

thanks,
markus.


2009-04-25 06:41:11

by Andi Kleen

Subject: Re: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer

Markus Metzger wrote:
> Use vmalloc to allocate the branch trace buffer.
>
> Peter Zijlstra suggested to use vmalloc rather than kmalloc to
> allocate the potentially multi-page branch trace buffer.
>
> Is there a way to have vmalloc allocate a physically non-contiguous
> buffer for test purposes? Ideally, the memory area would have big
> holes in it with sensitive data in between so I would know immediately
> when this is overwritten.

For test purposes you could hack vmalloc.c to allocate more pages
and put only every second/third/... one into the mapping. You would
just need to add another loop to __vmalloc_area_node() that allocates
more. That should give you non-contiguous mappings unless you're really unlucky.

-Andi


>
>
> Reported-by: Peter Zijlstra <[email protected]>
> CC: Andrew Morton <[email protected]>
> Signed-off-by: Markus Metzger <[email protected]>
> ---
> arch/x86/kernel/ptrace.c | 5 3 + 2 - 0 !
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> Index: b/arch/x86/kernel/ptrace.c
> ===================================================================
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -22,6 +22,7 @@
> #include <linux/seccomp.h>
> #include <linux/signal.h>
> #include <linux/workqueue.h>
> +#include <linux/vmalloc.h>
>
> #include <asm/uaccess.h>
> #include <asm/pgtable.h>
> @@ -626,7 +627,7 @@ static int alloc_bts_buffer(struct bts_c
> if (err < 0)
> return err;
>
> - buffer = kzalloc(size, GFP_KERNEL);
> + buffer = vmalloc(size);
> if (!buffer)
> goto out_refund;
>
> @@ -646,7 +647,7 @@ static inline void free_bts_buffer(struc
> if (!context->buffer)
> return;
>
> - kfree(context->buffer);
> + vfree(context->buffer);
> context->buffer = NULL;
>
> refund_locked_memory(context->mm, context->size);

2009-04-26 16:09:49

by Ingo Molnar

Subject: Re: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer


* Metzger, Markus T <[email protected]> wrote:

> >-----Original Message-----
> >From: Ingo Molnar [mailto:[email protected]]
> >Sent: Friday, April 24, 2009 10:31 AM
> >To: Andrew Morton
> >Cc: Metzger, Markus T; [email protected]; [email protected]; [email protected];
> >[email protected]; [email protected]; Villacis, Juan; [email protected]; linux-
> >[email protected]; [email protected]; [email protected]
> >Subject: Re: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer
> >
> >
> >* Andrew Morton <[email protected]> wrote:
> >
> >> On Fri, 24 Apr 2009 10:00:55 +0200 Markus Metzger <[email protected]> wrote:
> >>
> >> > Use vmalloc to allocate the branch trace buffer.
> >> >
> >> > Peter Zijlstra suggested to use vmalloc rather than kmalloc to
> >> > allocate the potentially multi-page branch trace buffer.
> >>
> >> The changelog provides no reason for this change. It should do so.
> >>
> >> > Is there a way to have vmalloc allocate a physically non-contiguous
> >> > buffer for test purposes? Ideally, the memory area would have big
> >> > holes in it with sensitive data in between so I would know immediately
> >> > when this is overwritten.
> >>
> >> I suppose you could allocate the pages by hand and then vmap() them.
> >> Allocating 2* the number you need and then freeing every second one
> >> should make them physically holey.
> >>
> >> > --- a/arch/x86/kernel/ptrace.c
> >> > +++ b/arch/x86/kernel/ptrace.c
> >> > @@ -22,6 +22,7 @@
> >> > #include <linux/seccomp.h>
> >> > #include <linux/signal.h>
> >> > #include <linux/workqueue.h>
> >> > +#include <linux/vmalloc.h>
> >> >
> >> > #include <asm/uaccess.h>
> >> > #include <asm/pgtable.h>
> >> > @@ -626,7 +627,7 @@ static int alloc_bts_buffer(struct bts_c
> >> > if (err < 0)
> >> > return err;
> >> >
> >> > - buffer = kzalloc(size, GFP_KERNEL);
> >> > + buffer = vmalloc(size);
> >> > if (!buffer)
> >> > goto out_refund;
> >> >
> >> > @@ -646,7 +647,7 @@ static inline void free_bts_buffer(struc
> >> > if (!context->buffer)
> >> > return;
> >> >
> >> > - kfree(context->buffer);
> >> > + vfree(context->buffer);
> >> > context->buffer = NULL;
> >> >
> >>
> >> The patch looks like a regression to me. vmalloc memory is slower
> >> to allocate, slower to free, slower to access and can exhaust or
> >> fragment the vmalloc arena. Confused.
> >
> >Performance does not matter here (this is really a slowpath), but
> >fragmentation does matter, especially on 32-bit systems.
> >
> >I'd not uglify the code via vmap() - and vmap has the same
> >fundamental address space limitations on 32-bit as vmalloc().
> >
> >The existing kmalloc() is fine. We do larger than PAGE_SIZE
> >allocations elsewhere too (the kernel stack for example), and this
> >is a debug facility, so failing the allocation is not a big problem
> >even if it happens.
>
> OK. I'll drop 2/2 and send out 1/2 as a patch, then.

ok - i've already applied 1/2 so unless you can see a bug we should
be fine.

> The original suggestion was to use the page allocator and vmap().
> I assume you don't want that, either.

Yeah - i'd suggest avoiding that complexity - unless there
are good reasons.

Ingo

2009-04-27 06:35:32

by Metzger, Markus T

Subject: RE: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer

>-----Original Message-----
>From: Andi Kleen [mailto:[email protected]]
>Sent: Saturday, April 25, 2009 8:41 AM
>To: Metzger, Markus T; Linux Kernel Mailing List
>Subject: Re: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer
>
>Markus Metzger wrote:
>> Use vmalloc to allocate the branch trace buffer.
>>
>> Peter Zijlstra suggested to use vmalloc rather than kmalloc to
>> allocate the potentially multi-page branch trace buffer.
>>
>> Is there a way to have vmalloc allocate a physically non-contiguous
>> buffer for test purposes? Ideally, the memory area would have big
>> holes in it with sensitive data in between so I would know immediately
>> when this is overwritten.
>
>For test purposes you could hack vmalloc.c to allocate more pages
>and put only every second/third/... one into the mapping. You would
>just need to add another loop to __vmalloc_area_node() that allocates
>more. That should give you non-contiguous mappings unless you're really unlucky.

Thanks.

I got enough feedback that the existing method is preferable to using
vmalloc, so I dropped the patch again.

regards,
markus.


>
>-Andi
>
>
>>
>>
>> Reported-by: Peter Zijlstra <[email protected]>
>> CC: Andrew Morton <[email protected]>
>> Signed-off-by: Markus Metzger <[email protected]>
>> ---
>> arch/x86/kernel/ptrace.c | 5 3 + 2 - 0 !
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> Index: b/arch/x86/kernel/ptrace.c
>> ===================================================================
>> --- a/arch/x86/kernel/ptrace.c
>> +++ b/arch/x86/kernel/ptrace.c
>> @@ -22,6 +22,7 @@
>> #include <linux/seccomp.h>
>> #include <linux/signal.h>
>> #include <linux/workqueue.h>
>> +#include <linux/vmalloc.h>
>>
>> #include <asm/uaccess.h>
>> #include <asm/pgtable.h>
>> @@ -626,7 +627,7 @@ static int alloc_bts_buffer(struct bts_c
>> if (err < 0)
>> return err;
>>
>> - buffer = kzalloc(size, GFP_KERNEL);
>> + buffer = vmalloc(size);
>> if (!buffer)
>> goto out_refund;
>>
>> @@ -646,7 +647,7 @@ static inline void free_bts_buffer(struc
>> if (!context->buffer)
>> return;
>>
>> - kfree(context->buffer);
>> + vfree(context->buffer);
>> context->buffer = NULL;
>>
>> refund_locked_memory(context->mm, context->size);


2009-04-27 10:53:55

by Peter Zijlstra

Subject: Re: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer

On Fri, 2009-04-24 at 10:31 +0200, Ingo Molnar wrote:
> * Andrew Morton <[email protected]> wrote:
>

> > The patch looks like a regression to me. vmalloc memory is slower
> > to allocate, slower to free, slower to access and can exhaust or
> > fragment the vmalloc arena. Confused.
>
> Performance does not matter here (this is really a slowpath), but
> fragmentation does matter, especially on 32-bit systems.
>
> I'd not uglify the code via vmap() - and vmap has the same
> fundamental address space limitations on 32-bit as vmalloc().
>
> The existing kmalloc() is fine. We do larger than PAGE_SIZE
> allocations elsewhere too (the kernel stack for example), and this
> is a debug facility, so failing the allocation is not a big problem
> even if it happens.

Nobody has yet said what the typical size of these allocations is. If
they're large enough to account in pages, one should arguably use the
page allocator, not kmalloc. Also, any allocation above order 3 (>32kb)
is very likely to fail. Having this ptrace interface work in the (unloaded)
development environment but not in a (loaded) production environment
will render it basically useless IMHO.
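
For reference, the order/size relation behind the 32kb figure can be
sketched in user-space C. This assumes 4 KiB pages, as on x86; the
helper name is hypothetical and only models what the kernel's
get_order() computes, it is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Smallest order such that (SKETCH_PAGE_SIZE << order) >= size,
 * i.e. the number of page doublings a contiguous allocation of
 * `size` bytes needs. Intended for small, non-zero sizes only.
 */
unsigned int sketch_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

With these assumptions a 32 KiB buffer is exactly order 3 (eight
contiguous pages), and one byte more already needs order 4.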

Having a regular high order allocation with vmalloc/vmap fallback is
quite normal, esp. if one wants to promote the use of this facility as
usable.

So, no, I very strongly disagree that the existing kmalloc is fine.


PS. I get shitloads of order-4 alloc failures from GEM after a few days
of uptime on my laptop.

2009-04-29 09:14:47

by Metzger, Markus T

Subject: RE: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer

>-----Original Message-----
>From: Peter Zijlstra [mailto:[email protected]]
>Sent: Monday, April 27, 2009 12:54 PM
>To: Ingo Molnar
>Cc: Andrew Morton; Metzger, Markus T; [email protected]; [email protected];
>[email protected]; [email protected]; Villacis, Juan; [email protected]; linux-
>[email protected]; [email protected]; [email protected]
>Subject: Re: [rfc 2/2] x86, bts: use physically non-contiguous trace buffer
>
>On Fri, 2009-04-24 at 10:31 +0200, Ingo Molnar wrote:
>> * Andrew Morton <[email protected]> wrote:
>>
>
>> > The patch looks like a regression to me. vmalloc memory is slower
>> > to allocate, slower to free, slower to access and can exhaust or
>> > fragment the vmalloc arena. Confused.
>>
>> Performance does not matter here (this is really a slowpath), but
>> fragmentation does matter, especially on 32-bit systems.
>>
>> I'd not uglify the code via vmap() - and vmap has the same
>> fundamental address space limitations on 32-bit as vmalloc().
>>
>> The existing kmalloc() is fine. We do larger than PAGE_SIZE
>> allocations elsewhere too (the kernel stack for example), and this
>> is a debug facility, so failing the allocation is not a big problem
>> even if it happens.
>
>Nobody has yet said what the typical size of these allocations is. If
>they're large enough to account in pages, one should arguably use the
>page allocator, not kmalloc. Also, any allocation above order 3 (>32kb)
>is very likely to fail. Having this ptrace interface work in the (unloaded)
>development environment but not in a (loaded) production environment
>will render it basically useless IMHO.

The size of these allocations is limited by the lockable memory ulimit
and debuggers typically need to divide this between multiple threads.

In practice, I would expect the buffers to be small. One page already
gives you ~170 entries and the most interesting part is the tail of
the trace, anyway.
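
The ~170 figure is consistent with a 24-byte record in a 4096-byte page
(4096 / 24 = 170, remainder 16). A minimal sketch of that arithmetic,
assuming 4 KiB pages and a 64-bit record made of three 8-byte fields;
the struct and field names are illustrative, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 64-bit branch trace record: three 8-byte fields. */
struct sketch_bts_record {
	uint64_t from;	/* branch source address */
	uint64_t to;	/* branch destination address */
	uint64_t flags;	/* remaining per-record information */
};

/* Number of whole records that fit into a buffer of the given size. */
size_t sketch_bts_entries(size_t buffer_bytes)
{
	return buffer_bytes / sizeof(struct sketch_bts_record);
}
```

Under the same assumptions a two-page buffer holds 341 records, so even
a small locked-memory budget covers a useful tail of the trace.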


>Having a regular high order allocation with vmalloc/vmap fallback is
>quite normal, esp. if one wants to promote the use of this facility as
>usable.
>
>So, no, I very strongly disagree that the existing kmalloc is fine.

I could add the vmalloc fallback. Is there already some function that
tries kmalloc and falls back to vmalloc which I should use?
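
The pattern being asked about, as a user-space C model rather than
kernel code: try the contiguous allocator first and fall back to the
virtually mapped one, remembering which path was taken so the free side
can call the matching release function. All names here are hypothetical,
the two stand-in allocators just wrap calloc(), and the 32 KiB cut-off
is an assumption taken from the order-3 remark above. Later kernels
provide kvmalloc() and kvfree() for this purpose; they did not exist
when this thread was written.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void *(*sketch_alloc_fn)(size_t size);

struct sketch_buf {
	void *ptr;
	int used_fallback;	/* 1 if ptr came from the fallback allocator */
};

/* Try `primary` first; only if it fails, try `fallback`. */
struct sketch_buf sketch_alloc_fallback(size_t size, sketch_alloc_fn primary,
					sketch_alloc_fn fallback)
{
	struct sketch_buf buf = { NULL, 0 };

	buf.ptr = primary(size);
	if (!buf.ptr) {
		buf.ptr = fallback(size);
		buf.used_fallback = (buf.ptr != NULL);
	}
	return buf;
}

/* Stand-in for a contiguous allocator that refuses more than 32 KiB. */
void *sketch_contig_alloc(size_t size)
{
	return size <= 32768 ? calloc(1, size) : NULL;
}

/* Stand-in for a virtually contiguous allocator; zeroes like kzalloc. */
void *sketch_virt_alloc(size_t size)
{
	return calloc(1, size);
}
```

Both stand-ins return zeroed memory on purpose: the posted patch swaps
kzalloc() for plain vmalloc(), which does not zero the buffer, so a real
fallback would need to keep the two paths equivalent in that respect.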

thanks,
markus.