2010-04-02 23:10:54

by Jiaying Zhang

Subject: Re: lockup in rb_get_reader_page

I finally reproduced the problem again with an instrumented kernel. Here are
the logged debugging messages I put right after the spin label in
rb_get_reader_page():

<4>cpu 0 may deadlock, loops 100
<4>reader pgoff 11 list ffff8100025c9c40 prev ffff8100025c9b00 next ffff8100025c9a41
<4>reader_page pgoff 11 list ffff8100025c9c40 prev ffff8100025c9b00 next ffff8100025c9a41
<4>head_page pgoff 11 list ffff8100025c9c40 prev ffff8100025c9b00 next ffff8100025c9a41
<4>tail_page pgoff 3 list ffff8100025c9a40 prev ffff8100025c9d00 next ffff8100025c9e40
<4>commit_page pgoff 3 list ffff8100025c9a40 prev ffff8100025c9d00 next ffff8100025c9e40
<4>page offset 4 list ffff8100025c9b00 prev ffff8100025c9c80 next ffff8100025c9d00
<4>page offset 12 list ffff8100025c9d00 prev ffff8100025c9b00 next ffff8100025c9a40
<4>page offset 3 list ffff8100025c9a40 prev ffff8100025c9d00 next ffff8100025c9e40
<4>page offset 17 list ffff8100025c9e40 prev ffff8100025c9a40 next ffff8100025c9d80
<4>page offset 14 list ffff8100025c9d80 prev ffff8100025c9e40 next ffff8100025c9cc0
<4>page offset 10 list ffff8100025c9cc0 prev ffff8100025c9d80 next ffff8100025c9b80
<4>page offset 7 list ffff8100025c9b80 prev ffff8100025c9cc0 next ffff8100025c9ac0
<4>page offset 5 list ffff8100025c9ac0 prev ffff8100025c9b80 next ffff8100025c9a00
<4>page offset 1 list ffff8100025c9a00 prev ffff8100025c9ac0 next ffff8100025c9d40
<4>page offset 13 list ffff8100025c9d40 prev ffff8100025c9a00 next ffff8100025c9e00
<4>page offset 15 list ffff8100025c9e00 prev ffff8100025c9d40 next ffff8100025c9a80
<4>page offset 2 list ffff8100025c9a80 prev ffff8100025c9e00 next ffff8100025c9bc0
<4>page offset 6 list ffff8100025c9bc0 prev ffff8100025c9a80 next ffff8100025c9dc0
<4>page offset 16 list ffff8100025c9dc0 prev ffff8100025c9bc0 next ffff8100025c99c0
<4>page offset 0 list ffff8100025c99c0 prev ffff8100025c9dc0 next ffff8100025c9c00
<4>page offset 8 list ffff8100025c9c00 prev ffff8100025c99c0 next ffff8100025c9c80
<4>page offset 9 list ffff8100025c9c80 prev ffff8100025c9c00 next ffff8100025c9b00

The page offset is the index we added in the buffer_page structure.
You can ignore this field. The interesting part here is that both
cpu_buffer->head_page and cpu_buffer->reader_page point to the
same buffer_page. I am not sure yet how we entered this situation,
but the problem is once we get here, we will be in an infinite loop.

At the beginning of the spin loop, we call rb_set_head_page() to grab
the head_page. In that function, we check whether a page is the head_page
with rb_is_head_page(). The problem is that rb_is_head_page() may
return RB_PAGE_MOVED if the head_page has changed to another
page, and that is what has happened as the above messages show.
Shouldn't we just return 0 in case that head_page has moved so that
we can move to the next page in the loop inside rb_set_head_page()?
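
For reference, the loop I am referring to looks roughly like this (paraphrased
from kernel/trace/ring_buffer.c, so the details may differ slightly from the
exact source):

static struct buffer_page *
rb_set_head_page(struct ring_buffer_per_cpu *cpu_buffer)
{
        struct buffer_page *head, *page;
        int i;

        page = head = cpu_buffer->head_page;

        /* walk the page list looking for the page marked as head */
        for (i = 0; i < 3; i++) {
                do {
                        /*
                         * Any nonzero return ends the search here, even
                         * when the return value is RB_PAGE_MOVED rather
                         * than RB_PAGE_HEAD -- which is the case I am
                         * asking about.
                         */
                        if (rb_is_head_page(cpu_buffer, page, page->list.prev)) {
                                cpu_buffer->head_page = page;
                                return page;
                        }
                        rb_inc_page(cpu_buffer, &page);
                } while (page != head);
        }

        return NULL;
}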

Jiaying

On Wed, Mar 31, 2010 at 6:23 PM, Steven Rostedt <[email protected]> wrote:
> On Wed, 2010-03-31 at 18:14 -0700, Jiaying Zhang wrote:
>> On Wed, Mar 31, 2010 at 5:28 PM, Steven Rostedt <[email protected]> wrote:
>> >
>> > Are you doing reads from interrupt context?
>> >
>> No. We are using a user-space reader that reads the ring buffer via a
>> syscall or ioctl. But even though we don't read from interrupt context, as
>> I understand it, we can still have a reader on cpu B reading cpu A's
>> buffer while the writer on cpu A moves the head page. Will we have
>> any problem in that case?
>
> Nope, unless that write hangs for some reason ;-)
>
> The read may spin waiting for a writer to finish, hence it just can not
> preempt a writer in an interrupt context: it could spin forever because
> it would be blocking the very writer it is waiting on. But if a writer is
> on another CPU, then all is well. In fact, the ring buffer was designed
> with that in mind.
>
> -- Steve
>
>


2010-04-03 12:45:48

by Steven Rostedt

Subject: Re: lockup in rb_get_reader_page

On Fri, 2010-04-02 at 16:10 -0700, Jiaying Zhang wrote:

> The page offset is the index we added in the buffer_page structure.
> You can ignore this field. The interesting part here is that both
> cpu_buffer->head_page and cpu_buffer->reader_page point to the
> same buffer_page. I am not sure yet how we entered this situation,

You can ignore the cpu_buffer->head_page, it is used as a reference and
is not part of the main algorithm. It is just there to tell the reader
where the last head page was.

> but the problem is once we get here, we will be in an infinite loop.

But yes, it should never point to the reader page, because the reader
controls the head_page __and__ the reader page.

>
> At the beginning of the spin loop, we call rb_set_head_page() to grab
> the head_page. In that function, we check whether a page is the head_page
> with rb_is_head_page(). The problem is that rb_is_head_page() may
> return RB_PAGE_MOVED if the head_page has changed to another
> page, and that is what has happened as the above messages show.

I don't see where it said that.

If RB_PAGE_MOVED is returned in rb_set_head_page() then something is very
broken, because that is only returned if a reader has modified the list
(i.e. moved the page). And since we only allow one reader at a time (we have
locks to protect that), and rb_set_head_page() is only called by the reader,
this would mean another reader is reading the ring buffer.
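
For reference, rb_is_head_page() looks roughly like this (paraphrased, so
treat it as a sketch rather than the exact source). The head page is marked
by flag bits stored in the low bits of the previous page's ->next pointer,
and RB_PAGE_MOVED is only reported when that pointer no longer points at the
page being tested, i.e. a reader swapped pages underneath us:

#define RB_PAGE_NORMAL          0UL
#define RB_PAGE_HEAD            1UL
#define RB_PAGE_UPDATE          2UL
#define RB_FLAG_MASK            3UL

/* PAGE_MOVED is not an on-list flag, only a return value */
#define RB_PAGE_MOVED           4UL

static int rb_is_head_page(struct ring_buffer_per_cpu *cpu_buffer,
                           struct buffer_page *page, struct list_head *list)
{
        unsigned long val;

        val = (unsigned long)list->next;

        /* the previous page no longer points at us: a reader moved us */
        if ((val & ~RB_FLAG_MASK) != (unsigned long)&page->list)
                return RB_PAGE_MOVED;

        /* otherwise report the HEAD/UPDATE flag bits, or 0 for a normal page */
        return val & RB_FLAG_MASK;
}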

I should add a:

        if ((ret = rb_is_head_page(cpu_buffer, page, page->list.prev))) {
                RB_WARN_ON(cpu_buffer, ret == RB_PAGE_MOVED);
                cpu_buffer->head_page = page;
                return page;
        }


> Shouldn't we just return 0 in case that head_page has moved so that
> we can move to the next page in the loop inside rb_set_head_page()?

No, when the reader moves the page, the RB_PAGE_MOVED forces the writer
to go into the conflict path (conflict between writer and reader).

-- Steve

2010-04-06 21:12:53

by Jiaying Zhang

Subject: Re: lockup in rb_get_reader_page

Hi Steven,

On Sat, Apr 3, 2010 at 5:45 AM, Steven Rostedt <[email protected]> wrote:
> On Fri, 2010-04-02 at 16:10 -0700, Jiaying Zhang wrote:
>
>> The page offset is the index we added in the buffer_page structure.
>> You can ignore this field. The interesting part here is that both
>> cpu_buffer->head_page and cpu_buffer->reader_page point to the
>> same buffer_page. I am not sure yet how we entered this situation,
>
> You can ignore the cpu_buffer->head_page, it is used as a reference and
> is not part of the main algorithm. It is just there to tell the reader
> where the last head page was.
>
>> but the problem is once we get here, we will be in an infinite loop.
>
> But yes, it should never point to the reader page, because the reader
> controls the head_page __and__ the reader page.
>
>>
>> At the beginning of the spin loop, we call rb_set_head_page() to grab
>> the head_page. In that function, we check whether a page is the head_page
>> with rb_is_head_page(). The problem is that rb_is_head_page() may
>> return RB_PAGE_MOVED if the head_page has changed to another
>> page, and that is what has happened as the above messages show.
>
> I don't see where it said that.
>
> If RB_PAGE_MOVED is returned in rb_set_head_page() then something is very
> broken, because that is only returned if a reader has modified the list
> (i.e. moved the page). And since we only allow one reader at a time (we have
> locks to protect that), and rb_set_head_page() is only called by the reader,
> this would mean another reader is reading the ring buffer.
>
> I should add a:
>
>        if ((ret = rb_is_head_page(cpu_buffer, page, page->list.prev))) {
>                RB_WARN_ON(cpu_buffer, ret == RB_PAGE_MOVED);
>                cpu_buffer->head_page = page;
>                return page;
>        }
>
>
I added the RB_WARN_ON(cpu_buffer, ret == RB_PAGE_MOVED) check in
rb_set_head_page() as you suggested and I think it has helped me figure
out the problem.

I saw a warning triggered by this WARN_ON this morning and realized
that although we are not doing reads from interrupt context, we sometimes
call ring_buffer_empty() from a timer interrupt handler that checks whether
new data has arrived in the trace buffer and, if so, wakes up the
user-space reader. ring_buffer_empty() calls rb_set_head_page(),
which can move the head_page. As far as I understand, it should be
ok to have ring_buffer_empty() preempt a writer, so I guess we should leave
that RB_WARN_ON out of rb_set_head_page(). The problem in our case
is that we use our own locking mechanism to guarantee a single reader
instead of using cpu_buffer->reader_lock, so the reader is not synchronized
with ring_buffer_empty(). So when ring_buffer_empty() is called while
we are in the process of swapping the reader_page and head_page, the
head_page pointer can point to the old head, i.e., the new reader_page,
and we enter an infinite loop.
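
Schematically, the wakeup path looks something like the sketch below; the
names (trace_buffer, trace_poll_timer_fn, trace_reader_wait) are illustrative
stand-ins, not our actual code:

#include <linux/ring_buffer.h>
#include <linux/wait.h>

static struct ring_buffer *trace_buffer;        /* the buffer being polled */
static DECLARE_WAIT_QUEUE_HEAD(trace_reader_wait);

/* timer handler (old-style timer callback taking an unsigned long) */
static void trace_poll_timer_fn(unsigned long data)
{
        /*
         * ring_buffer_empty() ends up in rb_set_head_page() (via
         * rb_per_cpu_empty()), so it can race with a reader that swaps
         * the reader page without holding cpu_buffer->reader_lock.
         */
        if (!ring_buffer_empty(trace_buffer))
                wake_up_interruptible(&trace_reader_wait);
}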

I wrapped our rb_get_reader_page() calls with the cpu_buffer->reader_lock
spinlock and it seems to have solved the problem.
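
Roughly, the change amounts to something like this sketch; read_one_page()
is a placeholder for our out-of-tree reader path (which has access to the
ring buffer internals), not the actual patch:

#include <linux/spinlock.h>

static struct buffer_page *read_one_page(struct ring_buffer_per_cpu *cpu_buffer)
{
        struct buffer_page *reader;
        unsigned long flags;

        /*
         * Take the per-cpu reader_lock so this reader is serialized with
         * ring_buffer_empty() and anything else that walks the page list
         * under reader_lock.
         */
        spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
        reader = rb_get_reader_page(cpu_buffer);
        spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);

        return reader;
}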

Thank you very much for the help!

Jiaying

>> Shouldn't we just return 0 in case that head_page has moved so that
>> we can move to the next page in the loop inside rb_set_head_page()?
>
> No, when the reader moves the page, the RB_PAGE_MOVED forces the writer
> to go into the conflict path (conflict between writer and reader).
>
> -- Steve
>
>
>