LinuxLists.cc - [PATCH] [2.5] Non-blocking write can block

Subject: Re: [PATCH] [2.5] Non-blocking write can block

On Wed, 4 Jun 2003, Christoph Hellwig wrote:
>
> The else should be on the same line as the closing brace, else
> the patch looks fine.

No no no, it's wrong.

If you do something like this, then you also have to teach "select()"
about this, otherwise you just get busy looping in applications.

In general, we shouldn't do this, unless somebody can show an application
where it really matters. Taking internal kernel locking into account for
"blockingness" easily gets quite complicated, and there is seldom any real
point to it.

Remember: perfect is the enemy of good. I'll happily apply the patch (if
it also updates the tty poll() functionality), _if_ there is some
real-world situation where it matters.

Linus

2003-06-04 14:44:41

Subject: Re: [PATCH] [2.5] Non-blocking write can block

On Wed, 4 Jun 2003, Linus Torvalds wrote:

> No no no, it's wrong.
>
> If you do something like this, then you also have to teach "select()"
> about this, otherwise you just get busy looping in applications.
>
> In general, we shouldn't do this, unless somebody can show an application
> where it really matters.

I wrote the patch to solve a real-world problem with wall(1), which
occasionally gets stuck writing to somebody's tty. I think it's reasonable
for wall to assume that non-blocking writes are non-blocking.
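The behaviour Peter relies on can be sketched as a wall(1)-style helper that marks the descriptor non-blocking, tries the write, and skips the terminal if it would block. This is an illustration under that assumption, not wall's actual source. `notify_fd()` and `make_full_pipe()` are hypothetical helpers, and a full pipe stands in for a stuck tty so the example runs without a terminal:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wall(1)-style helper: try to deliver msg without sleeping.
 * Note: leaves O_NONBLOCK set on fd.
 * Returns 1 if written, 0 if the device would block (skip it), -1 on error. */
int notify_fd(int fd, const char *msg)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    ssize_t n = write(fd, msg, strlen(msg));
    if (n >= 0)
        return 1;              /* possibly short; good enough for a notice */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;              /* output queue full: skip instead of hanging */
    return -1;
}

/* Demo helper (hypothetical): fill a pipe until write() reports EAGAIN.
 * Returns the write end; the read end is kept open and stored in *rd. */
int make_full_pipe(int *rd)
{
    int p[2];
    char buf[4096];

    if (pipe(p) < 0)
        return -1;
    memset(buf, 'x', sizeof buf);
    fcntl(p[1], F_SETFL, fcntl(p[1], F_GETFL) | O_NONBLOCK);
    while (write(p[1], buf, sizeof buf) > 0)
        ;                      /* keep writing until the pipe is full */
    *rd = p[0];
    return p[1];
}
```

As the rest of the thread explains, this pattern only protects the caller if write() really returns EAGAIN instead of sleeping inside the kernel.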

I'll think about how to do the patch correctly.

Peter

2003-06-04 17:01:51

Subject: RE: [PATCH] [2.5] Non-blocking write can block

We ran into this problem here in an embedded environment. It causes
syslogd to hang and when this happens, everybody who talks to syslogd
hangs. Which means you may not even be able to login. In the end we used
exactly the same fix which seems to work.

I am curious to know the correct fix.

> On Wed, 4 Jun 2003, Christoph Hellwig wrote:
> >
> > The else should be on the same line as the closing brace, else
> > the patch looks fine.
>
> No no no, it's wrong.
>
> If you do something like this, then you also have to teach "select()"
> about this, otherwise you just get busy looping in applications.
>
> In general, we shouldn't do this, unless somebody can show an application
> where it really matters. Taking internal kernel locking into account for
> "blockingness" easily gets quite complicated, and there is seldom any real
> point to it.
>
> Remember: perfect is the enemy of good. I'll happily apply the patch (if
> it also updates the tty poll() functionality), _if_ there is some
> real-world situation where it matters.
>
> Linus
>
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2003-06-04 17:31:58

by Alan

Subject: Re: [PATCH] [2.5] Non-blocking write can block

On Wed, 2003-06-04 at 15:35, Linus Torvalds wrote:
> In general, we shouldn't do this, unless somebody can show an application
> where it really matters. Taking internal kernel locking into account for
> "blockingness" easily gets quite complicated, and there is seldom any real
> point to it.

Hanging shutdown is the obvious one. With 2.0/2.2 we had a similar
problem and fixed it.

2003-06-04 17:29:28

Subject: RE: [PATCH] [2.5] Non-blocking write can block

On Wed, 4 Jun 2003, Hua Zhong wrote:
>
> We ran into this problem here in an embedded environment. It causes
> syslogd to hang and when this happens, everybody who talks to syslogd
> hangs. Which means you may not even be able to login. In the end we used
> exactly the same fix which seems to work.
>
> I am curious to know the correct fix.

[ First off: your embedded syslog problem is fixed by making sure that
syslog doesn't try to write to a tty that somebody else might be
blocked on. In other words, to me it sounds like a "well, don't do that
then" scenario, rather than a real kernel problem. ]

[ Secondly, you should all realize that O_NONBLOCK has _never_ meant that
the IO can't ever block. Even O_NONBLOCK reads and writes will always
block on things like having page faults on the user buffer, and a lot of
drivers still use the kernel lock and will block on that. O_NONBLOCK is
not an absolute "this is atomic" thing, it's a "don't wait for data if
there is none" thing ]
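What O_NONBLOCK does promise is narrow: when there is no data to read (or no room to write), the call returns at once with EAGAIN instead of waiting. A minimal POSIX sketch of that contract, using a pipe so it runs without a tty; `nonblock_read_errno()` is a hypothetical helper name:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read from an empty pipe with O_NONBLOCK set and report the errno.
 * Without O_NONBLOCK this read() would sleep until a writer appeared. */
int nonblock_read_errno(void)
{
    int p[2];
    char c;
    int err = 0;

    if (pipe(p) < 0)
        return -1;
    fcntl(p[0], F_SETFL, fcntl(p[0], F_GETFL) | O_NONBLOCK);

    if (read(p[0], &c, 1) < 0)
        err = errno;           /* expected: EAGAIN, "no data, not waiting" */
    close(p[0]);
    close(p[1]);
    return err;
}
```

Nothing in that contract covers sleeping on a page fault or on an internal kernel lock, which is the point being made above.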

With that in mind, if you feel strongly about this particular path, then I
can only warn you that the correct fix actually looks fairly hard, as far
as I can tell. Yes, the posted patch is a small part of it, but the more
complex side is how to make poll() agree with the write semantics that the
posted patch changed.

If you have a write() that returns -EAGAIN, and a poll() function that
says "it's ok to write", any select-loop based application will start
busy-looping calling poll/write, and use up 100% CPU time.
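The busy loop is easy to reproduce in a toy model. In the sketch below, `struct fake_dev`, `fake_poll()` and `fake_write()` are hypothetical stand-ins for a tty whose poll() reports "writable" while write() fails with EAGAIN; `count_spins()` has the usual shape of a poll-driven writer, plus an iteration cap so the demo terminates:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: poll() and write() may disagree. */
struct fake_dev {
    bool poll_says_writable;   /* what poll()/select() reports */
    bool write_would_block;    /* what write() actually does   */
};

/* Stand-in for poll(): true means "POLLOUT set, go ahead and write". */
static bool fake_poll(const struct fake_dev *d) { return d->poll_says_writable; }

/* Stand-in for a non-blocking write(): -EAGAIN if it would block. */
static int fake_write(const struct fake_dev *d) { return d->write_would_block ? -EAGAIN : 1; }

/* Event-loop writer. Returns how many times it went round the loop
 * without the write succeeding and without sleeping in poll().
 * A real loop has no cap; max_iter only keeps the demo finite. */
int count_spins(const struct fake_dev *d, int max_iter)
{
    int spins = 0;

    for (int i = 0; i < max_iter; i++) {
        if (!fake_poll(d))
            break;             /* real poll() would sleep here: no CPU burned */
        if (fake_write(d) != -EAGAIN)
            break;             /* write made progress */
        spins++;               /* poll said "ready", write said "try again" */
    }
    return spins;
}
```

With `{ true, true }` the loop runs until the cap; if poll() agreed with write() it would sleep in poll() instead.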

Which may be acceptable for some users, of course, but what you're doing
with the simple patch is just replacing one bug with another one. And I
personally think the bug you're introducing is the worse one.

But which bug you "prefer" ends up depending entirely on the machine load
and usage - the current behaviour has clearly not resulted in very many
complaints, and even if the patch fixes it for those few people who didn't
like the historical behaviour, it may well end up breaking a hell of a lot
more distributions that until now were perfectly happy.

For example: what happens when your real-time application starts
busy-looping due to this? Right. The system is totally _dead_, since the
application that is busy writing to the tty will never be scheduled.

And yes, something like syslogd could easily be marked high-priority in
some setup. You do NOT want to make it busy-loop.

As to how to expand the patch to avoid the busy-loop: it's definitely
non-trivial. Semaphores do not have poll() qualities, and I don't see a
good way to get them. Something like

static unsigned int tty_poll(struct file * filp, poll_table * wait)
{
	struct tty_struct * tty;
	struct semaphore *sem;
	int retval;

	tty = (struct tty_struct *)filp->private_data;
	if (tty_paranoia_check(tty, filp->f_dentry->d_inode->i_rdev, "tty_poll"))
		return 0;

	sem = &tty->atomic_write;
	/* down_trylock() returns 0 when it gets the semaphore */
	if (down_trylock(sem)) {
		/* Busy: queue on the semaphore's internal wait queue, then retry */
		poll_wait(filp, &sem->wait, wait);
		if (down_trylock(sem))
			return 0;	/* another writer still holds it: not writable */
	}
	retval = 0;
	if (tty->ldisc.poll)
		retval = tty->ldisc.poll(tty, filp, wait);
	up(sem);
	return retval;
}

MIGHT work, but as you can see it actually now depends on knowing the
internals of the semaphore implementation, and quite frankly I don't know
if it works at all. As a result, I'm not horribly keen on the idea.

And as I tried to explain, I'm also not horribly keen on having a write()
that doesn't match poll() and can cause busy looping.

Linus

2003-06-04 17:40:08

by Mike Dresser

Subject: RE: [PATCH] [2.5] Non-blocking write can block

On Wed, 4 Jun 2003, Hua Zhong wrote:

> We ran into this problem here in an embedded environment. It causes
> syslogd to hang and when this happens, everybody who talks to syslogd
> hangs. Which means you may not even be able to login. In the end we used
> exactly the same fix which seems to work.

I get this problem with writing to a remote syslog server, if the remote
syslog server hangs up or crashes no one can login to the machine that is
writing to the syslog server, even when the syslog server comes back.

Mike

2003-06-04 17:44:32

Subject: Re: [PATCH] [2.5] Non-blocking write can block

On 4 Jun 2003, Alan Cox wrote:
>
> On Wed, 2003-06-04 at 15:35, Linus Torvalds wrote:
> > In general, we shouldn't do this, unless somebody can show an application
> > where it really matters. Taking internal kernel locking into account for
> > "blockingness" easily gets quite complicated, and there is seldom any real
> > point to it.
>
> Hanging shutdown is the obvious one. With 2.0/2.2 we had a similar
> problem and fixed it.

As I tried to point out, the current patch on the table doesn't actually
"fix" anything, in that it can break things even _worse_ than the current
situation.

A much better fix might well be to actually not allow over-long tty writes
at all, and thus avoid the "block out" thing at the source of the problem,
instead of trying to make programs who play nice be the ones that suffer.

If somebody does a 1MB write to a tty, do we actually have any reason to
try to make it so damn atomic and not return early?
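Returning early means letting write() make partial progress and report the short count, instead of holding the write lock until every byte is queued. A toy model of that policy, assuming a fixed 4096-byte receive buffer (the N_TTY size mentioned later in the thread); `struct rx_buf` and `tty_write_early()` are hypothetical, not kernel code:

```c
#include <stddef.h>
#include <string.h>

#define RX_SIZE 4096           /* assumed receive-buffer size */

/* Hypothetical receive buffer on the far side of the tty. */
struct rx_buf {
    char data[RX_SIZE];
    size_t used;
};

/* Copy as much as fits and return the number of bytes accepted.
 * A 1MB write therefore returns early with a short count instead of
 * holding the (notional) write lock until the reader drains everything. */
size_t tty_write_early(struct rx_buf *rx, const char *src, size_t len)
{
    size_t room = RX_SIZE - rx->used;
    size_t n = len < room ? len : room;

    if (n)
        memcpy(rx->data + rx->used, src, n);
    rx->used += n;
    return n;                  /* may be less than len, or 0 when full */
}
```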

Linus

2003-06-04 18:34:26

Subject: RE: [PATCH] [2.5] Non-blocking write can block

On Wed, 4 Jun 2003, Hua Zhong wrote:
> This particular patch is in 2.4.20 already. There is another patch in
> 2.4.20 (?) which seems to fix the "main problem" (the n_tty_write_wakeup
> function in n_tty.c), but I didn't verify it.

Yes - that's because I submitted the patch ages ago. All that means is
that the distributions are relying on it, not that the patch is correct!

Peter

2003-06-04 18:31:46

Subject: RE: [PATCH] [2.5] Non-blocking write can block

> -----Original Message-----
> From: Linus Torvalds [mailto:[email protected]]
> Sent: Wednesday, June 04, 2003 10:42 AM
> To: Hua Zhong
> Cc: 'Christoph Hellwig'; 'P. Benie'; 'Kernel Mailing List'
> Subject: RE: [PATCH] [2.5] Non-blocking write can block
>
>
>
> On Wed, 4 Jun 2003, Hua Zhong wrote:
> >
> > We ran into this problem here in an embedded environment. It causes
> > syslogd to hang and when this happens, everybody who talks to syslogd
> > hangs. Which means you may not even be able to login. In the end we
> > used exactly the same fix which seems to work.
> >
> > I am curious to know the correct fix.
>
> [ First off: your embedded syslog problem is fixed by making sure that
> syslog doesn't try to write to a tty that somebody else might be
> blocked on. In other words, to me it sounds like a "well, don't do that
> then" scenario, rather than a real kernel problem. ]

It's hard. The shell might be printing and you cannot prevent that.

That said, the main problem was that somebody could be stuck waiting for the
tty *forever*, and thus everyone who tries to write also hangs.

This particular patch is in 2.4.20 already. There is another patch in
2.4.20 (?) which seems to fix the "main problem" (the n_tty_write_wakeup
function in n_tty.c), but I didn't verify it.
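The pile-up Hua describes is what the patch under discussion targets: as the thread describes it, an O_NONBLOCK write should fail with EAGAIN rather than queue behind a writer stuck holding the tty's write lock (`tty->atomic_write` in Linus's poll sketch). A userspace analogue of that control flow, with a C11 `atomic_flag` standing in for the kernel semaphore; `struct wlock`, `wlock_trylock()` and `do_write()` are hypothetical, and the blocking branch spins only as a stand-in for sleeping in down():

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for tty->atomic_write. */
struct wlock {
    atomic_flag held;
};

static bool wlock_trylock(struct wlock *l) { return !atomic_flag_test_and_set(&l->held); }
static void wlock_unlock(struct wlock *l)  { atomic_flag_clear(&l->held); }

/* Returns 0 after "writing", or -EAGAIN if nonblock is set and another
 * writer holds the lock. */
int do_write(struct wlock *l, bool nonblock)
{
    if (nonblock) {
        if (!wlock_trylock(l))
            return -EAGAIN;    /* don't queue behind a stuck writer */
    } else {
        while (!wlock_trylock(l))
            ;                  /* stand-in for sleeping in down() */
    }
    /* ... copy data to the driver here ... */
    wlock_unlock(l);
    return 0;
}
```

Linus's objection still applies to this shape: unless poll() also learns about the lock, a caller that gets EAGAIN here and then polls may be told the tty is writable and spin.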

2003-06-04 19:00:11

by Ben Pfaff

Subject: Re: [PATCH] [2.5] Non-blocking write can block

Linus Torvalds <[email protected]> writes:

> On Wed, 4 Jun 2003, Hua Zhong wrote:
> >
> > We ran into this problem here in an embedded environment. It causes
> > syslogd to hang and when this happens, everybody who talks to syslogd
> > hangs. Which means you may not even be able to login. In the end we used
> > exactly the same fix which seems to work.
> >
> > I am curious to know the correct fix.
>
> [ First off: your embedded syslog problem is fixed by making sure that
> syslog doesn't try to write to a tty that somebody else might be
> blocked on. In other words, to me it sounds like a "well, don't do that
> then" scenario, rather than a real kernel problem. ]

One day I managed to keep myself from logging in or su'ing or
doing a number of things that needed the log for quite a while
by accidentally hitting Scroll Lock on a console that syslog was
set up to log to. I suppose the answer is "don't do that" but it
was a mysterious problem for several minutes that day.
--
"Let others praise ancient times; I am glad I was born in these."
--Ovid (43 BC-18 AD)

2003-06-04 19:10:28

Subject: RE: [PATCH] [2.5] Non-blocking write can block

On Wed, 4 Jun 2003, P. Benie wrote:

> On Wed, 4 Jun 2003, Hua Zhong wrote:
> > This particular patch is in 2.4.20 already. There is another patch in
> > 2.4.20 (?) which seems to fix the "main problem" (the n_tty_write_wakeup
> > function in n_tty.c), but I didn't verify it.
>
> Yes - that's because I submitted the patch ages ago. All that means is
> that the distributions are relying on it, not that the patch is correct!

Sorry Hua, I wasn't reading your mail correctly. Please ignore the above
comment.

Peter

2003-06-04 19:08:01

Subject: RE: [PATCH] [2.5] Non-blocking write can block

On Wed, 4 Jun 2003, Hua Zhong wrote:
>
> That said, the main problem was somebody could be stuck in waiting for
> tty *forever* and thus everyone who tries to write also hangs.
>
> This particular patch is in 2.4.20 already. There is another patch in
> 2.4.20 (?) which seems to fix the "main problem" (the n_tty_write_wakeup
> function in n_tty.c), but I didn't verify it.

Do you have that other patch handy? It sounds like that is the real cause
of the problem, and the patch quoted originally in this thread was a
(broken) work-around.

Linus

2003-06-04 19:22:43

Subject: RE: [PATCH] [2.5] Non-blocking write can block

> Do you have that other patch handy? It sounds like that is
> the real cause of the problem, and the patch quoted originally
> in this thread was a (broken) work-around.
>
> Linus
>

Something like this:

--- n_tty.c.old 2003-06-04 12:28:36.000000000 -0700
+++ n_tty.c 2003-06-04 12:28:51.000000000 -0700
@@ -711,6 +711,23 @@
return 0;
}

+
+/*
+ * Required for the ptys, serial driver etc. since processes
+ * that attach themselves to the master and rely on ASYNC
+ * IO must be woken up
+ */
+
+static void n_tty_write_wakeup(struct tty_struct *tty)
+{
+ if (tty->fasync)
+ {
+ set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+ kill_fasync(&tty->fasync, SIGIO, POLL_OUT);
+ }
+ return;
+}
+
static void n_tty_receive_buf(struct tty_struct *tty, const unsigned char *cp,
char *fp, int count)
{
@@ -1157,6 +1174,8 @@
while (nr > 0) {
ssize_t num = opost_block(tty, b, nr);
if (num < 0) {
+ if (num == -EAGAIN)
+ break;
retval = num;
goto break_out;
}
@@ -1236,6 +1255,6 @@
normal_poll, /* poll */
n_tty_receive_buf, /* receive_buf */
n_tty_receive_room, /* receive_room */
- 0 /* write_wakeup */
+ n_tty_write_wakeup /* write_wakeup */
};
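From userspace, the `kill_fasync(&tty->fasync, SIGIO, POLL_OUT)` call in this patch is what a process observes after asking for asynchronous notification with F_SETOWN and O_ASYNC. The sketch below shows the same mechanism on a Linux pipe, whose read side likewise signals fasync writers with SIGIO when it frees space. It assumes Linux pipe semantics, and `sigio_on_drain()` is a hypothetical helper:

```c
#define _GNU_SOURCE            /* F_SETOWN, O_ASYNC */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigio;

static void on_sigio(int sig) { (void)sig; got_sigio = 1; }

/* Fill a pipe, ask for SIGIO on the write end, then drain the read end.
 * Returns 1 if the writer was notified that room became available. */
int sigio_on_drain(void)
{
    int p[2];
    char buf[4096];
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0 || pipe(p) < 0)
        return -1;

    /* Both ends non-blocking: fill until EAGAIN, later drain until EAGAIN. */
    fcntl(p[0], F_SETFL, fcntl(p[0], F_GETFL) | O_NONBLOCK);
    fcntl(p[1], F_SETFL, fcntl(p[1], F_GETFL) | O_NONBLOCK);
    memset(buf, 'x', sizeof buf);
    while (write(p[1], buf, sizeof buf) > 0)
        ;

    /* Register this process as the async owner of the write end. */
    got_sigio = 0;
    fcntl(p[1], F_SETOWN, getpid());
    fcntl(p[1], F_SETFL, fcntl(p[1], F_GETFL) | O_ASYNC);

    while (read(p[0], buf, sizeof buf) > 0)
        ;                      /* freeing space should raise SIGIO for the writer */

    close(p[0]);
    close(p[1]);
    return got_sigio;
}
```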

2003-06-04 19:33:26

Subject: Re: [PATCH] [2.5] Non-blocking write can block

On Wed, 4 Jun 2003, Linus Torvalds wrote:
>
> A much better fix might well be to actually not allow over-long tty writes
> at all, and thus avoid the "block out" thing at the source of the problem,
> instead of trying to make programs who play nice be the ones that suffer.
>
> If somebody does a 1MB write to a tty, do we actually have any reason to
> try to make it so damn atomic and not return early?

The problem isn't to do with large writes. It's to do with any sequence of
writes that fills up the receive buffer, which is only 4K for N_TTY. If
the receiving program is suspended, the buffer will fill sooner or later.

I am half-tempted by this style of fix, but I can't help but feel that
we'll discover a huge set of programs that assume short writes never
happen if they aren't playing with signals.
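Code that is robust against the short writes Peter is worried about simply loops until everything is written, treating a partial count as progress rather than an error. A minimal POSIX sketch; `write_all()` is a common idiom, not a standard library function:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error (errno set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;                /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```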

It's also not as easy a fix as it sounds: for blocking writes, we've gone
into ldisc.write and then into tty->driver->write before we discover
that we can't write any bytes, by which time we already have the
write semaphore. I suspect that it requires just as much effort to ensure
that this case is handled correctly as it does to stop the non-blocking
write/poll loop.

I compared 2.4.20 and 2.5.70 to see if I could find the patch Hua
referred to. n_tty.c and pty.c look almost the same - I don't think the
patch is in 2.4.20.

Peter

2003-06-04 19:43:06

Subject: Re: [PATCH] [2.5] Non-blocking write can block

On Wed, 4 Jun 2003, P. Benie wrote:
>
> The problem isn't to do with large writes. It's to do with any sequence of
> writes that fills up the receive buffer, which is only 4K for N_TTY. If
> the receiving program is suspended, the buffer will fill sooner or later.

Well, even then we could just drop the "atomic_write" lock.

The thing is, I don't know what the tty atomicity guarantees are. I know
what they are for pipes (quite reasonable), but tty's?

Linus

2003-06-04 19:56:33

Subject: Re: [PATCH] [2.5] Non-blocking write can block

There is a missing piece in the previous mail. The complete patch is as
follows. I just googled it and the author is Sapan Bhatia (Cc-ed).

diff -urN linux-old/drivers/char/CVS/Entries linux/drivers/char/CVS/Entries
--- linux-old/drivers/char/CVS/Entries 2003-06-04 12:57:32.000000000 -0700
+++ linux/drivers/char/CVS/Entries 2003-06-04 13:01:35.000000000 -0700
@@ -194,5 +194,5 @@
D/pcmcia////
D/rio////
/tty_io.c/1.3/Wed Jun 4 19:28:08 2003//
-/pty.c/1.1/Wed Jun 4 19:57:20 2003//T1.1
-/n_tty.c/1.1/Wed Jun 4 19:57:32 2003//T1.1
+/n_tty.c/1.2/Wed Jun 4 20:01:32 2003//
+/pty.c/1.2/Wed Jun 4 20:01:32 2003//
diff -urN linux-old/drivers/char/n_tty.c linux/drivers/char/n_tty.c
--- linux-old/drivers/char/n_tty.c 2003-06-04 13:00:51.000000000 -0700
+++ linux/drivers/char/n_tty.c 2003-06-04 13:01:32.000000000 -0700
@@ -711,6 +711,23 @@
return 0;
}

+
+/*
+ * Required for the ptys, serial driver etc. since processes
+ * that attach themselves to the master and rely on ASYNC
+ * IO must be woken up
+ */
+
+static void n_tty_write_wakeup(struct tty_struct *tty)
+{
+ if (tty->fasync)
+ {
+ set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+ kill_fasync(&tty->fasync, SIGIO, POLL_OUT);
+ }
+ return;
+}
+
static void n_tty_receive_buf(struct tty_struct *tty, const unsigned char *cp,
char *fp, int count)
{
@@ -1157,6 +1174,8 @@
while (nr > 0) {
ssize_t num = opost_block(tty, b, nr);
if (num < 0) {
+ if (num == -EAGAIN)
+ break;
retval = num;
goto break_out;
}
@@ -1236,6 +1255,6 @@
normal_poll, /* poll */
n_tty_receive_buf, /* receive_buf */
n_tty_receive_room, /* receive_room */
- 0 /* write_wakeup */
+ n_tty_write_wakeup /* write_wakeup */
};

diff -urN linux-old/drivers/char/pty.c linux/drivers/char/pty.c
--- linux-old/drivers/char/pty.c 2003-06-04 13:01:04.000000000 -0700
+++ linux/drivers/char/pty.c 2003-06-04 13:01:32.000000000 -0700
@@ -331,6 +331,7 @@
clear_bit(TTY_OTHER_CLOSED, &tty->link->flags);
wake_up_interruptible(&pty->open_wait);
set_bit(TTY_THROTTLED, &tty->flags);
+ set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
/* Register a slave for the master */
if (tty->driver.major == PTY_MASTER_MAJOR)
tty_register_devfs(&tty->link->driver,

2003-06-04 20:32:19

Subject: RE: [PATCH] [2.5] Non-blocking write can block

Oops, yes, that patch is already in 2.5. It got merged in 2.4 sometime
between 2.4.17 and 2.4.20.

> I compared 2.4.20 and 2.5.70 to see if I could find the patch Hua
> referred to. n_tty.c and pty.c look almost the same - I don't think the
> patch is in 2.4.20.
>
> Peter

2003-06-04 20:37:00