2005-02-15 00:54:47

by Andreas Schwab

[permalink] [raw]
Subject: Pty is losing bytes

Recent kernel are losing bytes on a pty. Try running this program (needs
to be linked against -lutil) with a moderately large input (10K - 20K).
The output should match its input, but instead there is always one byte
missing at the end of the first 4K chunk read by the child.

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pty.h>
#include <sys/wait.h>

char buf[4096];

int main (void)
{
int master;
int stdout = dup (1);
int pid = forkpty (&master, 0, 0, 0);
int bytes_read;
int bytes_written;

if (pid < 0) exit (1);
if (pid > 0) {
while ((bytes_read = read (0, buf, sizeof (buf))) > 0) {
char *p = buf;
while (bytes_read > 0 &&
(bytes_written = write (master, p, bytes_read)) > 0)
bytes_read -= bytes_written, p += bytes_written;
}
write (master, "EOF\n", 4);
waitpid (pid, 0, 0);
exit (0);
}

while ((bytes_read = read (0, buf, sizeof (buf))) > 0) {
char *p = buf;
if (strncmp (buf, "EOF\n", 4) == 0) break;
while (bytes_read > 0 &&
(bytes_written = write (stdout, p, bytes_read)) > 0)
bytes_read -= bytes_written, p += bytes_written;
}
exit (0);
}

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."


2005-02-15 02:00:20

by Alex Davis

[permalink] [raw]
Subject: Re: Pty is losing bytes

Problem does not exist on 2.6.8.1. Compiling your program and running

./a.out < README | diff README -

produces no output.

I tested various files ranging in size from 10 to 60k.

-Alex


=====
I code, therefore I am



__________________________________
Do you Yahoo!?
Yahoo! Mail - 250MB free storage. Do more. Manage less.
http://info.mail.yahoo.com/mail_250

2005-02-15 10:48:59

by Andreas Schwab

[permalink] [raw]
Subject: Re: Pty is losing bytes

Alex Davis <[email protected]> writes:

> Problem does not exist on 2.6.8.1.

Yes, it is a pretty recent regression, reproducable since 2.6.10.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2005-02-15 11:01:49

by Jan De Luyck

[permalink] [raw]
Subject: Re: Pty is losing bytes

On Tuesday 15 February 2005 11:48, Andreas Schwab wrote:
> Alex Davis <[email protected]> writes:
> > Problem does not exist on 2.6.8.1.
>
> Yes, it is a pretty recent regression, reproducable since 2.6.10.

Confirmed against 2.6.11-rc4

devilkin@precious:/tmp$ ./a.out < laptop-mode.txt | diff laptop-mode.txt -
97c97
< ext3 or ReiserFS filesystems (also done automatically by the control script),
---
> ext3 or eiserFS filesystems (also done automatically by the control script),


Jan

--
An American is a man with two arms and four wheels.
-- A Chinese child

2005-02-15 19:08:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: Pty is losing bytes



On Tue, 15 Feb 2005, Andreas Schwab wrote:
>
> Recent kernel are losing bytes on a pty.

Great catch.

I think it may be a n_tty line discipline bug, brought on by the fact that
the PTY buffering is now 4kB rather than 2kB. 4kB is also the
N_TTY_BUF_SIZE, and if n_tty has some off-by-one error, that would explain
it.

Does the problem go away if you change the default value of "chunk" (in
drivers/char/tty_io.c:do_tty_write) from 4096 to 2048? If so, that means
that the pty code has _claimed_ to have written 4kB, and only ever wrote
4kB-1 bytes. That in turn implies that "ldisc.receive_room()" disagrees
with "ldisc.receive_buf()".

Linus

2005-02-15 19:50:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: Pty is losing bytes



On Tue, 15 Feb 2005, Linus Torvalds wrote:
>
> On Tue, 15 Feb 2005, Andreas Schwab wrote:
> >
> > Recent kernel are losing bytes on a pty.
>
> Great catch.
>
> I think it may be a n_tty line discipline bug, brought on by the fact that
> the PTY buffering is now 4kB rather than 2kB. 4kB is also the
> N_TTY_BUF_SIZE, and if n_tty has some off-by-one error, that would explain
> it.
>
> Does the problem go away if you change the default value of "chunk" (in
> drivers/char/tty_io.c:do_tty_write) from 4096 to 2048? If so, that means
> that the pty code has _claimed_ to have written 4kB, and only ever wrote
> 4kB-1 bytes. That in turn implies that "ldisc.receive_room()" disagrees
> with "ldisc.receive_buf()".

Ok, I just tried this myself, and yes, the change from 4kB chunks to 2kB
chunks seems to fix your PTY test code.

However, then when I start looking at n_tty_receive_room() and
n_tty_receive_buf(), my stomach gets a bit queasy. I have this horrid
feeling that I had something to do with the mess, but I'm going to lash
out and blame somebody else, like tytso, for most of it. That's some _old_
code, regardless (gone through a lot of "let's fix up that detail", but
not a lot of "boy, I bet we could fix it entirely").

We should get rid of the separate "how much room do you have to write" and
"do the write" stuff, and just make "receive_buf()" able to write partial
results. As it is, if (as it seems to be the case) n_tty_receive_room()
claims to have more room than "n_tty_receive_buf()" can actually fill,
then the tty layer will never know that somebody lied, and that characters
got dropped on the floor.

This bug was apparently hidden just because the PTY layer used the flip
buffers as it's staging area, and that happens to be 2kB in size. That,
in turn, is only half of what the actual N_TTY_BUF_SIZE is, so a single
call could never fill up N_TTY_BUF_SIZE, even if characters were expanded
(ie LF -> CRLF translation etc).

I'd love for somebody to try to take a look at where n_tty goes wrong, but
I think that for now I'll just make the fix be the cheezy "limit tty
chunks to 2kB". It's worked for a decade, it can work for a bit longer ;)

Who here feels they know n_tty and have a strong stomach? Raise your
hands now. Don't be shy.

Linus

2005-02-15 20:03:12

by Sergey Vlasov

[permalink] [raw]
Subject: Re: Pty is losing bytes

On Tue, 15 Feb 2005 11:08:07 -0800 (PST) Linus Torvalds wrote:

> On Tue, 15 Feb 2005, Andreas Schwab wrote:
> >
> > Recent kernel are losing bytes on a pty.
>
> Great catch.
>
> I think it may be a n_tty line discipline bug, brought on by the fact that
> the PTY buffering is now 4kB rather than 2kB. 4kB is also the
> N_TTY_BUF_SIZE, and if n_tty has some off-by-one error, that would explain
> it.
>
> Does the problem go away if you change the default value of "chunk" (in
> drivers/char/tty_io.c:do_tty_write) from 4096 to 2048? If so, that means
> that the pty code has _claimed_ to have written 4kB, and only ever wrote
> 4kB-1 bytes. That in turn implies that "ldisc.receive_room()" disagrees
> with "ldisc.receive_buf()".

The problem also goes away after unsetting ECHO on the slave terminal.
This seems to point to this code in n_tty_receive_char():

if (L_ECHO(tty)) {
if (tty->read_cnt >= N_TTY_BUF_SIZE-1) {
put_char('\a', tty); /* beep if no space */
return;
}
.......
}

This code sets the maximum number of buffered characters to
N_TTY_BUF_SIZE-1, however, put_tty_queue() considers the maximum to be
N_TTY_BUF_SIZE, and n_tty_receive_room() also returns N_TTY_BUF_SIZE for
canonical mode if the canon_data buffer is empty - therefore after
unsetting ECHO bytes are no longer lost.

BTW, for the noncanonical mode n_tty_receive_room() calculates the
result assuming that the buffer can hold at most N_TTY_BUF_SIZE-1
characters - not the full N_TTY_BUF_SIZE.


Attachments:
(No filename) (1.46 kB)
(No filename) (189.00 B)
Download all attachments

2005-02-15 20:11:46

by Alan Curry

[permalink] [raw]
Subject: Re: Pty is losing bytes

Linus Torvalds writes the following:
>
>Does the problem go away if you change the default value of "chunk" (in
>drivers/char/tty_io.c:do_tty_write) from 4096 to 2048? If so, that means
>that the pty code has _claimed_ to have written 4kB, and only ever wrote
>4kB-1 bytes. That in turn implies that "ldisc.receive_room()" disagrees
>with "ldisc.receive_buf()".
>

That does fix it for me, in 2.6.10-as3 (the -as3 patch doesn't touch anything
near the pty code, so it should be good for vanilla 2.6.10 too).

I noticed while testing that the "lost byte" is occasionally more than one
byte. It's always one byte at the 4k mark, but sometimes a larger group of
bytes is lost around the 8k mark, and (more rarely) an even larger group of
bytes is lost at the 12k mark.

$ ls -l /etc/mime.types
-rw-r--r-- 1 root root 15723 Apr 9 2002 /etc/mime.types
$ while :; do ./ptytest < /etc/mime.types | wc -c ; sleep 1 ; done
15722
15710
15722
15710
15722
15710
15722
15722
15682
(and so on)

2005-02-15 20:21:33

by Sergey Vlasov

[permalink] [raw]
Subject: Re: Pty is losing bytes

On Tue, Feb 15, 2005 at 10:58:02PM +0300, Sergey Vlasov wrote:
> On Tue, 15 Feb 2005 11:08:07 -0800 (PST) Linus Torvalds wrote:
>
> > On Tue, 15 Feb 2005, Andreas Schwab wrote:
> > >
> > > Recent kernel are losing bytes on a pty.
> >
> > Great catch.
> >
> > I think it may be a n_tty line discipline bug, brought on by the fact that
> > the PTY buffering is now 4kB rather than 2kB. 4kB is also the
> > N_TTY_BUF_SIZE, and if n_tty has some off-by-one error, that would explain
> > it.
> >
> > Does the problem go away if you change the default value of "chunk" (in
> > drivers/char/tty_io.c:do_tty_write) from 4096 to 2048? If so, that means
> > that the pty code has _claimed_ to have written 4kB, and only ever wrote
> > 4kB-1 bytes. That in turn implies that "ldisc.receive_room()" disagrees
> > with "ldisc.receive_buf()".
>
> The problem also goes away after unsetting ECHO on the slave terminal.
> This seems to point to this code in n_tty_receive_char():
>
> if (L_ECHO(tty)) {
> if (tty->read_cnt >= N_TTY_BUF_SIZE-1) {
> put_char('\a', tty); /* beep if no space */
> return;
> }
> .......
> }
>
> This code sets the maximum number of buffered characters to
> N_TTY_BUF_SIZE-1, however, put_tty_queue() considers the maximum to be
> N_TTY_BUF_SIZE, and n_tty_receive_room() also returns N_TTY_BUF_SIZE for
> canonical mode if the canon_data buffer is empty - therefore after
> unsetting ECHO bytes are no longer lost.

But all this really is not important, because n_tty_receive_char() can
put more than one char into buffer for a single input char in lots of
cases. Therefore n_tty_receive_room() can overestimate the available
space at least by 2 times.


Attachments:
(No filename) (1.65 kB)
(No filename) (189.00 B)
Download all attachments

2005-02-15 20:36:04

by Andreas Schwab

[permalink] [raw]
Subject: Re: Pty is losing bytes

Linus Torvalds <[email protected]> writes:

> I think it may be a n_tty line discipline bug, brought on by the fact that
> the PTY buffering is now 4kB rather than 2kB. 4kB is also the
> N_TTY_BUF_SIZE, and if n_tty has some off-by-one error, that would explain
> it.

I've also seen more than one byte missing. For example when sending a big
chunk of bytes down the pty via an Emacs *shell* buffer up to 16 bytes are
missing somewhere in the middle.

> Does the problem go away if you change the default value of "chunk" (in
> drivers/char/tty_io.c:do_tty_write) from 4096 to 2048?

Yes, that helps.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2005-02-15 20:59:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: Pty is losing bytes



On Tue, 15 Feb 2005, Andreas Schwab wrote:
>
> Linus Torvalds <[email protected]> writes:
>
> > I think it may be a n_tty line discipline bug, brought on by the fact that
> > the PTY buffering is now 4kB rather than 2kB. 4kB is also the
> > N_TTY_BUF_SIZE, and if n_tty has some off-by-one error, that would explain
> > it.
>
> I've also seen more than one byte missing. For example when sending a big
> chunk of bytes down the pty via an Emacs *shell* buffer up to 16 bytes are
> missing somewhere in the middle.

If it's NTTY (and I'm pretty certain it is - the generic tty code looks
fine, the pty code itself is too simple for words), then I'd actually have
expected more variability than a simple off-by-one.

I'd have expected the problems to be due to character expansion, ie the
CR->LF thing etc, and that would have resulted in off-by-N, where N is the
number of expanded characters in a 4096 byte block. With XTAB, you could
even have each \t be expanded to 8 space characters on ECHO, and you could
lose a whole lot more than just one byte at the end.

That's clearly not the case, and I haven't looked into exactly what
termios settings "forkpty()" uses, so I suspect that it's something else
than pure expansion on write. There's a lot of things going on in a tty
driver: the character flow itself, the "backflow" in the form of ECHO, etc
etc.

Also, there's the interaction with "flushing" the buffer: we do a
"flush_chars()" at regular intervals, and that will flush it to the next
buffer, which may actually cause things to fit a lot better than they
would otherwise have fit. In fact, I'd not be at all surprised if this
thing was timing-dependent too, ie depended on how quickly the reader
emptied the other side buffer.

The tty layer is pretty ugly, but in its defense, it has to be said that
tty handling _is_ one of the more complex parts of traditional UNIX, so
the ugliness is at least partly due to the problem space, not just the
fact that the code is old.

Linus

2005-02-15 21:54:19

by Andreas Schwab

[permalink] [raw]
Subject: Re: Pty is losing bytes

Linus Torvalds <[email protected]> writes:

> That's clearly not the case, and I haven't looked into exactly what
> termios settings "forkpty()" uses

If no termios is passed then the defaults are unchanged.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2005-02-15 22:15:00

by Alan

[permalink] [raw]
Subject: Re: Pty is losing bytes

On Maw, 2005-02-15 at 19:44, Linus Torvalds wrote:
> However, then when I start looking at n_tty_receive_room() and
> n_tty_receive_buf(), my stomach gets a bit queasy. I have this horrid
> feeling that I had something to do with the mess, but I'm going to lash

You did.
Then Ted tided it up
Then Bill added hacks to "fix" up the locking

The real fix is shooting the flip buffers, at that point a lot of the
locking goes away (because the tty owns everything relevant). I just
don't have time to do major kernel hacking and finish the MBA thesis at
the same time.

Sorry yes I know how n_tty works, and no I'm not going to fix it

2005-02-16 03:12:52

by Roman Zippel

[permalink] [raw]
Subject: Re: Pty is losing bytes

Hi,

On Tue, 15 Feb 2005, Linus Torvalds wrote:

> > I've also seen more than one byte missing. For example when sending a big
> > chunk of bytes down the pty via an Emacs *shell* buffer up to 16 bytes are
> > missing somewhere in the middle.
>
> If it's NTTY (and I'm pretty certain it is - the generic tty code looks
> fine, the pty code itself is too simple for words), then I'd actually have
> expected more variability than a simple off-by-one.

The patch below seems to do the trick too.
It seems the initial receive_room() call in pty_write() returns
N_TTY_BUF_SIZE and receive_buf() will happily drop the last byte.

bye, Roman


n_tty.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/drivers/char/n_tty.c
===================================================================
--- linux-2.6.orig/drivers/char/n_tty.c 2005-02-16 03:53:53.000000000 +0100
+++ linux-2.6/drivers/char/n_tty.c 2005-02-16 03:56:39.000000000 +0100
@@ -863,7 +863,7 @@ static int n_tty_receive_room(struct tty
* characters will be beeped.
*/
if (tty->icanon && !tty->canon_data)
- return N_TTY_BUF_SIZE;
+ return N_TTY_BUF_SIZE - 1;

if (left > 0)
return left;

2005-02-16 04:05:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: Pty is losing bytes



On Wed, 16 Feb 2005, Roman Zippel wrote:
>
> The patch below seems to do the trick too.
> It seems the initial receive_room() call in pty_write() returns
> N_TTY_BUF_SIZE and receive_buf() will happily drop the last byte.

Why have that "tty->icanon && !tty->canon_data" test in the first place, I
wonder? Isn't the "left" calculation always correct? That's really how
many bytes free we have in the tty, that "canon_data" thing is just about
how much of it is available for _reading_ as canon, no? (Ie "how many
characters that have seen a finishing end-of-line"). So I don't see why
that canon_data test is relevant to the question of filling the buffer..

In particular, afaikt, "canon_data" can be zero even if we _have_
characters in the queue (just not aany EOLN), so I'd expect your version
to still return "N_TTY_BUF_SIZE-1" even though the queue can only actually
take fewer characters..

So I'd still worry whether that added -1 actually fixes the bug, or just
means that a off-by-one has to now be off-by-two to be noticeable.. (In
contrast, changing the chunking to 2kB not only changes a "off-by-one" to
a "off-by-2048", but more importantly is what we've tested for the last
decade or so ;)

In fact, I _think_ that the whole

if (tty->icanon && !tty->canon_data)

test is about the fact that we _want_ the writer to start dropping
characters at some point, since if we're in canon mode, we need to get a
full line in the end, and if there is no EOLN to be had, then we _need_ to
drop characters.

So I'd think that the n_tty_receive_room() function should look something
like this:

static int n_tty_receive_room(struct tty_struct *tty)
{
int left = N_TTY_BUF_SIZE - tty->read_cnt - 1;

if (left <= 0)
left = tty->icanon && !tty->canon_data;
return left;
}

which basically says: only accept as much data as we can fit in the
buffer, BUT if we can't really fit anything and we're in canon mode (but
haven't seen a newline yet), we need to allow the writer to continut to
trickle data (that we will discard) in the hopes of _eventually_ seeing an
EOLN.

Does that make sense to you? Maybe the "input full, but no canon_data"
special case would be better handled in the read path, rather than the
write path?

Linus

2005-02-16 14:43:39

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Pty is losing bytes

On Tue, Feb 15, 2005 at 08:05:05PM -0800, Linus Torvalds wrote:
> Why have that "tty->icanon && !tty->canon_data" test in the first place, I
> wonder? Isn't the "left" calculation always correct? That's really how
> many bytes free we have in the tty, that "canon_data" thing is just about
> how much of it is available for _reading_ as canon, no? (Ie "how many
> characters that have seen a finishing end-of-line"). So I don't see why
> that canon_data test is relevant to the question of filling the buffer..

The comment above the test explains why that test is there in
n_tty_receive_room. If that test isn't there, and we are doing input
canonicalization, when the buffer gets full, the low-level driver will
either flow control the source (so the ERASE or EOLN characters won't
get sent through) or the low-level driver will drop the characters
(and the line discpline will never see the ERASE or EOLN characters).

So the idea behind this code was to lie to the low-level driver, so
that the line discpline would always get the characters, and then the
line discpline could process ERASE, WERASE, KILL, or EOL, and drop the
rest on the floor. At least, that was the basic idea. Yes, it was a
kludge; no I'm not particularly proud of it.

The whole structure badly needs to be ripped apart and rewritten, but
unfortunately few employers seem to be willing to dedicate their
hackers' time to Do This Right. Sigh....

- Ted

2005-02-16 16:06:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: Pty is losing bytes



On Wed, 16 Feb 2005, Theodore Ts'o wrote:
>
> The comment above the test explains why that test is there in
> n_tty_receive_room. If that test isn't there, and we are doing input
> canonicalization, when the buffer gets full

Yes, yes, but did you see my suggested version that I had just below that
explained what I thought the real fix was?

Th eproblem with checking for the "canon but no canon data" is that it's a
special case that IS ONLY VALID WHEN THE BUFFER IS FULL! Until that
happens, it means that the code returns the wrong value, and then can
(obviously, as seen by the bug) drop bytes even when it shouldn't.

That's why my suggested work-around moved things around, to only return
the "we'll take anything" thing if the buffer really was full.

Linus

2005-02-16 19:50:20

by Roman Zippel

[permalink] [raw]
Subject: Re: Pty is losing bytes

Hi,

On Tue, 15 Feb 2005, Linus Torvalds wrote:

> So I'd still worry whether that added -1 actually fixes the bug, or just
> means that a off-by-one has to now be off-by-two to be noticeable..

You're right, even if one just writes 4095 bytes, it will also drop
tty->read_cnt characters later.

> Does that make sense to you? Maybe the "input full, but no canon_data"
> special case would be better handled in the read path, rather than the
> write path?

There might be no reader at this time. The basic problem is that this
should be handled in the receive_buf path, but for that it had to return
the numbers of characters written (or dropped in the no canon_data case).
Relying on receive_room value instead is rather bogus.

Below is a new patch, which also fixes problems with very long lines.
1. if receive_room() returns a small number, write_chan() has to
continue writing, otherwise it will sleep with nobody waking it up.
2. we mustn't simply drop eol characters in n_tty_receive_char otherwise
canon_data isn't increased and the reader isn't woken up.

bye, Roman

n_tty.c | 33 +++++++++++++++------------------
1 files changed, 15 insertions(+), 18 deletions(-)

Index: linux-2.6/drivers/char/n_tty.c
===================================================================
--- linux-2.6.orig/drivers/char/n_tty.c 2005-02-16 16:01:35.000000000 +0100
+++ linux-2.6/drivers/char/n_tty.c 2005-02-16 19:17:13.053000548 +0100
@@ -770,10 +770,8 @@ send_signal:
}
if (c == '\n') {
if (L_ECHO(tty) || L_ECHONL(tty)) {
- if (tty->read_cnt >= N_TTY_BUF_SIZE-1) {
+ if (tty->read_cnt >= N_TTY_BUF_SIZE-1)
put_char('\a', tty);
- return;
- }
opost('\n', tty);
}
goto handle_newline;
@@ -790,10 +788,8 @@ send_signal:
* XXX are EOL_CHAR and EOL2_CHAR echoed?!?
*/
if (L_ECHO(tty)) {
- if (tty->read_cnt >= N_TTY_BUF_SIZE-1) {
+ if (tty->read_cnt >= N_TTY_BUF_SIZE-1)
put_char('\a', tty);
- return;
- }
/* Record the column of first canon char. */
if (tty->canon_head == tty->read_head)
tty->canon_column = tty->column;
@@ -862,12 +858,9 @@ static int n_tty_receive_room(struct tty
* that erase characters will be handled. Other excess
* characters will be beeped.
*/
- if (tty->icanon && !tty->canon_data)
- return N_TTY_BUF_SIZE;
-
- if (left > 0)
- return left;
- return 0;
+ if (left <= 0)
+ left = tty->icanon && !tty->canon_data;
+ return left;
}

/**
@@ -1473,13 +1466,17 @@ static ssize_t write_chan(struct tty_str
if (tty->driver->flush_chars)
tty->driver->flush_chars(tty);
} else {
- c = tty->driver->write(tty, b, nr);
- if (c < 0) {
- retval = c;
- goto break_out;
+ while (nr > 0) {
+ c = tty->driver->write(tty, b, nr);
+ if (c < 0) {
+ retval = c;
+ goto break_out;
+ }
+ if (!c)
+ break;
+ b += c;
+ nr -= c;
}
- b += c;
- nr -= c;
}
if (!nr)
break;

2005-02-16 19:59:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: Pty is losing bytes



On Wed, 16 Feb 2005, Roman Zippel wrote:
>
> Below is a new patch, which also fixes problems with very long lines.

Ok, I agree with this one, but won't dare to apply it right now. Remind me
post-2.6.11, or - even better - see if one of the alternate trees wants to
fix and test this (-mm, -ac, vendors, who-ever).

And testing the dang thing sounds wonderful.

Andreas' testprogram is a good place to start, probably coupled with some
test-files with really really long lines or something..

Linus

2005-02-16 22:41:22

by Andrew Morton

[permalink] [raw]
Subject: Re: Pty is losing bytes

Linus Torvalds <[email protected]> wrote:
>
>
>
> On Wed, 16 Feb 2005, Roman Zippel wrote:
> >
> > Below is a new patch, which also fixes problems with very long lines.
>
> Ok, I agree with this one, but won't dare to apply it right now. Remind me
> post-2.6.11, or - even better - see if one of the alternate trees wants to
> fix and test this (-mm, -ac, vendors, who-ever).

I scooped it up.

2005-02-17 04:45:14

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Pty is losing bytes

On Wed, Feb 16, 2005 at 08:06:00AM -0800, Linus Torvalds wrote:
> Yes, yes, but did you see my suggested version that I had just below that
> explained what I thought the real fix was?
>
> Th eproblem with checking for the "canon but no canon data" is that it's a
> special case that IS ONLY VALID WHEN THE BUFFER IS FULL! Until that
> happens, it means that the code returns the wrong value, and then can
> (obviously, as seen by the bug) drop bytes even when it shouldn't.
>
> That's why my suggested work-around moved things around, to only return
> the "we'll take anything" thing if the buffer really was full.

Yes, but then when the buffer is full, and we return the "we'll take
anything" return value, the code that was getting confused with the
"incorrect" receive_room value will still be getting confused....

- Ted

2005-02-17 15:45:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: Pty is losing bytes



On Wed, 16 Feb 2005, Theodore Ts'o wrote:
>
> Yes, but then when the buffer is full, and we return the "we'll take
> anything" return value, the code that was getting confused with the
> "incorrect" receive_room value will still be getting confused....

But that's fine - at that point we're literally _supposed_ to drop
characters: we have to, in order to see the EOLN.

But we're only supposed to drop characters IFF:
- the buffer is full
- we are in canon mode
- we _still_ haven't seen a single EOLN in the whole buffer

If any of these three isn't true, we're not supposed to drop anything.

That's my argument.

Linus

2005-02-18 16:33:09

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Pty is losing bytes

On Thu, Feb 17, 2005 at 07:45:56AM -0800, Linus Torvalds wrote:
>
>
> On Wed, 16 Feb 2005, Theodore Ts'o wrote:
> >
> > Yes, but then when the buffer is full, and we return the "we'll take
> > anything" return value, the code that was getting confused with the
> > "incorrect" receive_room value will still be getting confused....
>
> But that's fine - at that point we're literally _supposed_ to drop
> characters: we have to, in order to see the EOLN.
>
> But we're only supposed to drop characters IFF:
> - the buffer is full
> - we are in canon mode
> - we _still_ haven't seen a single EOLN in the whole buffer

Sure, but in that case, if we return "no worries, we have plenty of
memory", then we're opening ourselves up to the memory overrun problem
that was the original issue in the first place. I suspect that the
underlying problem is that somewhere in the tty layer there is code
that is using receive_room() as permission to simply push that many
characters into the receive queue, instead of using that function as
an hint of how many characters would be profitable to feed into the
line discpline.

- Ted