2007-08-16 11:53:50

by Dan Aloni

Subject: Re: [PATCH 3/3] tty_io.c: don't use flush_scheduled_work()

On Sun, Jul 01, 2007 at 07:37:49PM +0400, Oleg Nesterov wrote:
> I don't know how to test this patch, the ack/nack from maintainer is wanted.
>
> flush_scheduled_work() is evil and should be avoided. Change tty_set_ldisc()
> and release_dev() to use cancel_delayed_work_sync/cancel_work_sync.
>
> I am not sure we really need to call do_tty_hangup() when cancel_work_sync()
> returns true, but this matches the current behaviour.

I also noticed this problem recently with 2.6.22, on a 2-CPU box where there
was one SCHED_RR userspace process stuck in a busy loop. The box was completely
responsive but had this annoyance where all tty closings were stuck in
flush_scheduled_work(). It's especially noticeable when you ssh to the machine
and then try to log out.

A temporary workaround was to give just the workqueue events/* threads a
SCHED_FIFO static priority of 99, but I have kept that small patch to
myself (figured it's just too nasty).
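
A minimal userspace sketch of the kind of workaround described above,
assuming the pid of each events/N thread is looked up by hand (e.g. with
ps). This is not the patch mentioned in the message; the helper names
clamp_fifo_priority() and boost_to_fifo() are hypothetical, and only the
standard sched_setscheduler() interface is relied on.

```c
#include <sched.h>
#include <sys/types.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/*
 * Hypothetical helper: give one thread (for example an events/N kernel
 * thread) a SCHED_FIFO static priority. Returns 0 on success, or -1 with
 * errno set on failure (typically EPERM when not run as root).
 */
int boost_to_fifo(pid_t pid, int prio)
{
    struct sched_param param = {
        .sched_priority = clamp_fifo_priority(prio),
    };

    return sched_setscheduler(pid, SCHED_FIFO, &param);
}
```

Calling boost_to_fifo(pid, 99) as root for each events/N thread would
have roughly the same effect as "chrt -f -p 99 <pid>". It lets the
workqueue threads preempt the SCHED_RR busy loop, which is also why it
is nasty: any long-running work item then runs above everything else.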

--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il


2007-08-16 16:00:59

by Oleg Nesterov

Subject: Re: [PATCH 3/3] tty_io.c: don't use flush_scheduled_work()

On 08/16, Dan Aloni wrote:
>
> On Sun, Jul 01, 2007 at 07:37:49PM +0400, Oleg Nesterov wrote:
> > I don't know how to test this patch, the ack/nack from maintainer is wanted.
> >
> > flush_scheduled_work() is evil and should be avoided. Change tty_set_ldisc()
> > and release_dev() to use cancel_delayed_work_sync/cancel_work_sync.
> >
> > I am not sure we really need to call do_tty_hangup() when cancel_work_sync()
> > returns true, but this matches the current behaviour.
>
> I also noticed this problem recently with 2.6.22, on a 2-CPU box where there
> was one SCHED_RR userspace process stuck in a busy loop. The box was completely
> responsive but had this annoyance where all tty closings were stuck in
> flush_scheduled_work(). It's especially noticeable when you ssh to the machine
> and then try to log out.

cancel_work_sync(work) can hang too if some SCHED_RR userspace process does not
relinquish the CPU, but the probability is much lower: that process would have
to preempt work->func of that particular work_struct.
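
The difference in what each call depends on can be sketched with a toy,
single-threaded model. This is not kernel code: every toy_* name below is
hypothetical, and the model leaves out the case where the work is already
running, which is exactly the case where the real cancel_work_sync() still
has to wait.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 16

struct toy_work {
    void (*func)(struct toy_work *work);
    bool pending;
};

struct toy_queue {
    struct toy_work *items[TOY_QUEUE_LEN];
    size_t head;    /* next slot to run */
    size_t tail;    /* next free slot */
};

int toy_calls;      /* number of callbacks executed so far */

/* Sample callback used to count how much work actually ran. */
void toy_count_call(struct toy_work *work)
{
    (void)work;
    toy_calls++;
}

/* Queue a work item; fails if it is already pending or the queue is full. */
bool toy_queue_work(struct toy_queue *q, struct toy_work *w)
{
    if (w->pending || q->tail == TOY_QUEUE_LEN)
        return false;
    w->pending = true;
    q->items[q->tail++] = w;
    return true;
}

/* Model of a flush: every queued item must run before the caller returns. */
size_t toy_flush(struct toy_queue *q)
{
    size_t ran = 0;

    while (q->head < q->tail) {
        struct toy_work *w = q->items[q->head++];

        if (!w)
            continue;   /* slot was cancelled earlier */
        w->pending = false;
        w->func(w);
        ran++;
    }
    return ran;
}

/* Model of a cancel: only w matters. Returns true if w was pending. */
bool toy_cancel_work(struct toy_queue *q, struct toy_work *w)
{
    for (size_t i = q->head; i < q->tail; i++) {
        if (q->items[i] == w) {
            q->items[i] = NULL;
            w->pending = false;
            return true;
        }
    }
    return false;
}
```

In this model toy_flush() cannot return until every queued callback has
run, so one starved callback stalls the caller, while toy_cancel_work()
touches only the one item it was asked about and runs nothing.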

see also http://marc.info/?l=linux-kernel&m=118115098120503.

Oleg.

2007-08-21 06:09:06

by Jarek Poplawski

Subject: Re: [PATCH 3/3] tty_io.c: don't use flush_scheduled_work()

On Thu, Aug 16, 2007 at 02:53:50PM +0300, Dan Aloni wrote:
> On Sun, Jul 01, 2007 at 07:37:49PM +0400, Oleg Nesterov wrote:
> > I don't know how to test this patch, the ack/nack from maintainer is wanted.
> >
> > flush_scheduled_work() is evil and should be avoided. Change tty_set_ldisc()
> > and release_dev() to use cancel_delayed_work_sync/cancel_work_sync.
> >
> > I am not sure we really need to call do_tty_hangup() when cancel_work_sync()
> > returns true, but this matches the current behaviour.
>
> I also noticed this problem recently with 2.6.22, on a 2-CPU box where there
> was one SCHED_RR userspace process stuck in a busy loop. The box was completely

IMHO, it was rather a busy sleep.

> responsive but had this annoyance where all tty closings were stuck in
> flush_scheduled_work(). It's especially noticeable when you ssh to the machine
> and then try to log out.
>
> A temporary workaround was to give just the workqueue events/* threads a
> SCHED_FIFO static priority of 99, but I have kept that small patch to
> myself (figured it's just too nasty).

It looks like there was something more than this one SCHED_RR process:
probably some high-priority task(s) could have preempted the workqueue
thread, delaying run_workqueues. If so, it should be an interesting test
for the new 2.6.23 scheduler.

Regards,
Jarek P.

PS: sorry for such a delayed response.