2004-01-26 01:21:10

by Philippe Elie

[permalink] [raw]
Subject: [PATCH] oprofile per-cpu buffer overrun

Hi Andrew,

In a ring buffer controlled by a read and write positions we
can't use buffer_size but only buffer_size - 1 entry, the last
free entry act as a guard to avoid write pos overrun. This bug
was hidden because the writer, oprofile_add_sample(), request
one more entry than really needed.

regards,
Phil

Index: drivers/oprofile/cpu_buffer.c
===================================================================
RCS file: /usr/local/cvsroot/linux-2.5/drivers/oprofile/cpu_buffer.c,v
retrieving revision 1.9
diff -u -p -r1.9 cpu_buffer.c
--- drivers/oprofile/cpu_buffer.c 26 May 2003 04:42:54 -0000 1.9
+++ drivers/oprofile/cpu_buffer.c 24 Jan 2004 21:07:03 -0000
@@ -86,9 +86,9 @@ static unsigned long nr_available_slots(
unsigned long tail = b->tail_pos;

if (tail > head)
- return tail - head;
+ return (tail - head) - 1;

- return tail + (b->buffer_size - head);
+ return tail + (b->buffer_size - head) - 1;
}



2004-01-26 04:06:56

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] oprofile per-cpu buffer overrun

Philippe Elie <[email protected]> wrote:
>
> Hi Andrew,
>
> In a ring buffer controlled by a read and write positions we
> can't use buffer_size but only buffer_size - 1 entry,

you can, actually.

> the last
> free entry act as a guard to avoid write pos overrun. This bug
> was hidden because the writer, oprofile_add_sample(), request
> one more entry than really needed.
>
>...
> diff -u -p -r1.9 cpu_buffer.c
> --- drivers/oprofile/cpu_buffer.c 26 May 2003 04:42:54 -0000 1.9
> +++ drivers/oprofile/cpu_buffer.c 24 Jan 2004 21:07:03 -0000
> @@ -86,9 +86,9 @@ static unsigned long nr_available_slots(
> unsigned long tail = b->tail_pos;
>
> if (tail > head)
> - return tail - head;
> + return (tail - head) - 1;
>
> - return tail + (b->buffer_size - head);
> + return tail + (b->buffer_size - head) - 1;
> }

When implementing a circular buffer it is better to not constrain the head
and tail indices - just let them grow and wrap without bound. You only need
to bring them in-bounds when you actually use them to index the buffer.

This way,

- head-tail is always the amount of used space, no need to futz around
handling the case where one has wrapped and the other hasn't.

- you get to use all of the buffer, because the cases head-tail == 0
(empty) and head-tail == bufsize (full) are now distinguishable.

It helps if the buffer size is a power of two, of course, but integer
modulus is pretty quick.

All the net drivers and the printk log buffer implement their ring buffers
in this way; it works nicely.

2004-01-26 05:59:31

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH] oprofile per-cpu buffer overrun


> It helps if the buffer size is a power of two, of course, but integer
> modulus is pretty quick.

If its something thats hit real hard, the way the modulus is done can
make a difference. We've seen it in various places (eg disabling tigon 1
support in the acenic changes the driver from using a variable to a
compile time constant and you can actually see it in the profile):

quickest == power of 2 compile time constant (results in a mask)
quickish == compile time constant (results in the multiplication by
an inverse trick)
slow == run time (results in a divide)

Of course you can do like the printk stuff and use a variable but
enforce a power of 2 value and & it.

Anton

--

unsigned long i, j, k;

int quickest()
{
j = i % 2;
}

int quickish()
{
j = i % 3;
}

int slow()
{
j = i % k;
}

--

quickest:
lis 9,i@ha
lwz 0,i@l(9)
lis 9,j@ha
rlwinm 0,0,0,31,31
stw 0,j@l(9)
blr
quickish:
lis 9,i@ha
lis 0,0xaaaa
lwz 11,i@l(9)
ori 0,0,43691
lis 9,j@ha
mulhwu 0,11,0
srwi 0,0,1
mulli 0,0,3
subf 11,0,11
stw 11,j@l(9)
blr
slow:
lis 9,i@ha
lis 11,k@ha
lwz 10,i@l(9)
lwz 9,k@l(11)
divwu 0,10,9
mullw 0,0,9
lis 9,j@ha
subf 10,0,10
stw 10,j@l(9)
blr

2004-01-26 10:32:39

by John Levon

[permalink] [raw]
Subject: Re: [PATCH] oprofile per-cpu buffer overrun

On Sun, Jan 25, 2004 at 08:07:01PM -0800, Andrew Morton wrote:

> When implementing a circular buffer it is better to not constrain the head
> and tail indices - just let them grow and wrap without bound. You only need
> to bring them in-bounds when you actually use them to index the buffer.

I'm not sure why that's better.

> - head-tail is always the amount of used space, no need to futz around
> handling the case where one has wrapped and the other hasn't.

I admit I have a hangover, but it seems to me it would actually be more
complicated to make damn sure that the integer overflow case is
definitely handled properly.

I clearly can't write functioning ring buffers :), so I'd prefer it to
stay as simple as possible.

regards,
john
--
Khendon's Law:
If the same point is made twice by the same person, the thread is over.

2004-01-26 11:10:21

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] oprofile per-cpu buffer overrun



John Levon wrote:

>On Sun, Jan 25, 2004 at 08:07:01PM -0800, Andrew Morton wrote:
>
>
>>When implementing a circular buffer it is better to not constrain the head
>>and tail indices - just let them grow and wrap without bound. You only need
>>to bring them in-bounds when you actually use them to index the buffer.
>>
>
>I'm not sure why that's better.
>
>
>>- head-tail is always the amount of used space, no need to futz around
>> handling the case where one has wrapped and the other hasn't.
>>
>
>I admit I have a hangover, but it seems to me it would actually be more
>complicated to make damn sure that the integer overflow case is
>definitely handled properly.
>

You needn't worry about integer overflow - it is handled properly ;)


2004-01-26 14:36:00

by Philippe Elie

[permalink] [raw]
Subject: Re: [PATCH] oprofile per-cpu buffer overrun

John Levon wrote:
> On Sun, Jan 25, 2004 at 08:07:01PM -0800, Andrew Morton wrote:
>
>
>>When implementing a circular buffer it is better to not constrain the head
>>and tail indices - just let them grow and wrap without bound. You only need
>>to bring them in-bounds when you actually use them to index the buffer.

neat!

> I'm not sure why that's better.

We win in increment_head/increment_tail:

static void increment_head(struct oprofile_cpu_buffer * b)
{
-
unsigned long new_head = b->head_pos + 1;
wmb();
-
if (new_head < (b->buffer_size))
-
b->head_pos = new_head;
-
else
-
b->head_pos = 0;
+
b->head_pos++;
}

for this added cost when accessing the buffer:

b->buffer[b->head & b->buffer_size_mask];

Modulo use is not worth but with buffer_size a power of 2
it's probably a win, I'll try and measure this later, not
urgent since the problem is fixed, I added it in our todo.

regards,
Phil