2013-07-10 02:20:37

by zhangwei(Jovi)

Subject: [PATCH V2] relay: fix timer madness

When I use a ktap script to trace all event tracepoints over the relay
transport, the system hangs within a few seconds unless this patch is applied.

I found the original patch discussion from 2007:
http://marc.info/?l=linux-kernel&m=118544794717162&w=2
(In that mail thread the patch did not fix the problem under discussion,
but it does fix the problem I am encountering now.)

Changes from v1:
mod_timer interval changed from jiffies+1 to HZ/10, as Ingo suggested.

Original patch changelog from Ingo in 2007:

Remove timer calls (!!!) from deep within the tracing infrastructure.
This was totally bogus code that can cause lockups and worse.
Poll the buffer every 2 jiffies for now.

Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: "zhangwei(Jovi)" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
kernel/relay.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/kernel/relay.c b/kernel/relay.c
index b91488b..87af4ce 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -339,6 +339,10 @@ static void wakeup_readers(unsigned long data)
{
struct rchan_buf *buf = (struct rchan_buf *)data;
wake_up_interruptible(&buf->read_wait);
+ /*
+ * Stupid polling for now:
+ */
+ mod_timer(&buf->timer, HZ / 10);
}

/**
@@ -356,6 +360,7 @@ static void __relay_reset(struct rchan_buf *buf, unsigned int init)
init_waitqueue_head(&buf->read_wait);
kref_init(&buf->kref);
setup_timer(&buf->timer, wakeup_readers, (unsigned long)buf);
+ mod_timer(&buf->timer, HZ / 10);
} else
del_timer_sync(&buf->timer);

@@ -739,15 +744,6 @@ size_t relay_switch_subbuf(struct rchan_buf *buf, size_t length)
else
buf->early_bytes += buf->chan->subbuf_size -
buf->padding[old_subbuf];
- smp_mb();
- if (waitqueue_active(&buf->read_wait))
- /*
- * Calling wake_up_interruptible() from here
- * will deadlock if we happen to be logging
- * from the scheduler (trying to re-grab
- * rq->lock), so defer it.
- */
- mod_timer(&buf->timer, jiffies + 1);
}

old = buf->data;
--
1.7.9.7


2013-07-10 03:49:53

by Zefan Li

Subject: Re: [PATCH V2] relay: fix timer madness

On 2013/7/10 10:18, zhangwei(Jovi) wrote:
> When I use a ktap script to trace all event tracepoints over the relay
> transport, the system hangs within a few seconds unless this patch is applied.
>
> I found the original patch discussion from 2007:
> http://marc.info/?l=linux-kernel&m=118544794717162&w=2
> (In that mail thread the patch did not fix the problem under discussion,
> but it does fix the problem I am encountering now.)
>
> Changes from v1:
> mod_timer interval changed from jiffies+1 to HZ/10, as Ingo suggested.
>
> Original patch changelog from Ingo in 2007:
>
> Remove timer calls (!!!) from deep within the tracing infrastructure.
> This was totally bogus code that can cause lockups and worse.
> Poll the buffer every 2 jiffies for now.
>
> Signed-off-by: Ingo Molnar <[email protected]>
> Signed-off-by: "zhangwei(Jovi)" <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>

I don't think this patch should have Andrew's signed-off-by?

2013-07-10 03:57:28

by Andrew Morton

Subject: Re: [PATCH V2] relay: fix timer madness

On Wed, 10 Jul 2013 11:37:26 +0800 Li Zefan <[email protected]> wrote:

> On 2013/7/10 10:18, zhangwei(Jovi) wrote:
> > When I use a ktap script to trace all event tracepoints over the relay
> > transport, the system hangs within a few seconds unless this patch is applied.
> >
> > I found the original patch discussion from 2007:
> > http://marc.info/?l=linux-kernel&m=118544794717162&w=2
> > (In that mail thread the patch did not fix the problem under discussion,
> > but it does fix the problem I am encountering now.)
> >
> > Changes from v1:
> > mod_timer interval changed from jiffies+1 to HZ/10, as Ingo suggested.
> >
> > Original patch changelog from Ingo in 2007:
> >
> > Remove timer calls (!!!) from deep within the tracing infrastructure.
> > This was totally bogus code that can cause lockups and worse.
> > Poll the buffer every 2 jiffies for now.
> >
> > Signed-off-by: Ingo Molnar <[email protected]>
> > Signed-off-by: "zhangwei(Jovi)" <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Jens Axboe <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: Eric Dumazet <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
>
> I don't think this patch should have Andrew's signed-off-by?

I guess not, unless it was taken from -mm, which would be odd, as I
have the old version.

v1 has been in my tree for a few months - Ingo requested some updates
but nothing happened and I have not checked whether v2 addresses his
requests.

2013-07-19 22:45:29

by Dan Carpenter

Subject: Re: [PATCH V2] relay: fix timer madness

On Wed, Jul 10, 2013 at 10:18:54AM +0800, zhangwei(Jovi) wrote:
> @@ -339,6 +339,10 @@ static void wakeup_readers(unsigned long data)
> {
> struct rchan_buf *buf = (struct rchan_buf *)data;
> wake_up_interruptible(&buf->read_wait);
> + /*
> + * Stupid polling for now:
> + */
> + mod_timer(&buf->timer, HZ / 10);

mod_timer() takes an absolute expiry time in jiffies, not a relative
delay, so "jiffies + HZ / 10" was probably intended here and also below.
Passing a bare "HZ / 10" doesn't make sense.

regards,
dan carpenter