2009-10-06 09:52:24

by Felipe Balbi

[permalink] [raw]
Subject: TTY loosing bytes ?

Hi all,

I'm facing a weird issue with TTY when using u_serial gadget.

The problem seems to be related to ldisc's buffer getting full. Imagine
a situation where you have 510 bytes available on N_TTY's buffer and you
try to flush 512 bytes then the code path would look like:

--> flush_to_ldisc();
--> n_tty_receive_buf(512 bytes);
--> i = 510;
--> memcpy(i);

--> i = 0;
--> memcpy(i);
<-- wake_up_interruptible();

What happens is that we loose those 2 extra bytes. I tried digging this
problem and try to find a fix for it but it's been really complicated
for me.

Hope you guys can help. Actually, I've been reproducing the problem on
2.6.28 but looking at the code current mainline should have the same
problem. I also cherry-picked the relevant patches to my 2.6.28 tree
(hope I got all of them) and it still failed.

Sorry I'm reporting such a problem with such an old kernel but upgrading
the kernel right now is just impossible.

If you guys could give any pointers I would be really glad.

--
balbi


2009-10-06 11:49:20

by Felipe Balbi

[permalink] [raw]
Subject: Re: TTY loosing bytes ?

On Tue, Oct 06, 2009 at 11:48:45AM +0200, Balbi Felipe (Nokia-D/Helsinki) wrote:
> Hi all,
>
> I'm facing a weird issue with TTY when using u_serial gadget.
>
> The problem seems to be related to ldisc's buffer getting full. Imagine
> a situation where you have 510 bytes available on N_TTY's buffer and you
> try to flush 512 bytes then the code path would look like:
>
> --> flush_to_ldisc();
> --> n_tty_receive_buf(512 bytes);
> --> i = 510;
> --> memcpy(i);
>
> --> i = 0;
> --> memcpy(i);
> <-- wake_up_interruptible();
>
> What happens is that we loose those 2 extra bytes. I tried digging this
> problem and try to find a fix for it but it's been really complicated
> for me.
>
> Hope you guys can help. Actually, I've been reproducing the problem on
> 2.6.28 but looking at the code current mainline should have the same
> problem. I also cherry-picked the relevant patches to my 2.6.28 tree
> (hope I got all of them) and it still failed.
>
> Sorry I'm reporting such a problem with such an old kernel but upgrading
> the kernel right now is just impossible.
>
> If you guys could give any pointers I would be really glad.

one more information/question:

why doesn't receive_buf() return the amount of bytes actually received ?
I see flush_to_ldisc() believes it can flush everything before even
calling receive_buf() then it will never act on the possibility of
receive_buf() not being able to receive the entire data.

Am I right ? Should receive_buf() return the amount of bytes actually
received ? Also, why isn't receive_room enough to be sure there's
enough space to really receive that block of data ?

--
balbi

2009-10-06 12:11:56

by Alan

[permalink] [raw]
Subject: Re: TTY loosing bytes ?

> why doesn't receive_buf() return the amount of bytes actually received ?

You'd have to ask whoever wrote the code in 199something.

> I see flush_to_ldisc() believes it can flush everything before even
> calling receive_buf() then it will never act on the possibility of
> receive_buf() not being able to receive the entire data.

The ldisc is responsible for maintaining tty->receive_room correctly at
all times.

> Am I right ? Should receive_buf() return the amount of bytes actually
> received ? Also, why isn't receive_room enough to be sure there's
> enough space to really receive that block of data ?

I've not seen this reported elsewhere so I assume you are somehow
tripping a bug in the n_tty ldisc code. The other possibility is that you
are in canonical mode and some of your input is intentionally discarded
by the ldisc either as errors, overruns or through things like quoting or
flow control.

2009-10-06 13:05:33

by Felipe Balbi

[permalink] [raw]
Subject: Re: TTY loosing bytes ?

On Tue, Oct 06, 2009 at 02:12:30PM +0200, ext Alan Cox wrote:
> > why doesn't receive_buf() return the amount of bytes actually received ?
>
> You'd have to ask whoever wrote the code in 199something.

fair enough ;-)

> > I see flush_to_ldisc() believes it can flush everything before even
> > calling receive_buf() then it will never act on the possibility of
> > receive_buf() not being able to receive the entire data.
>
> The ldisc is responsible for maintaining tty->receive_room correctly at
> all times.
>
> > Am I right ? Should receive_buf() return the amount of bytes actually
> > received ? Also, why isn't receive_room enough to be sure there's
> > enough space to really receive that block of data ?
>
> I've not seen this reported elsewhere so I assume you are somehow
> tripping a bug in the n_tty ldisc code. The other possibility is that you
> are in canonical mode and some of your input is intentionally discarded
> by the ldisc either as errors, overruns or through things like quoting or
> flow control.

hmm, not canonical, no. I'm falling on the if (tty->real_raw) in
n_tty_receive_buf() for sure. Have prints there.

The following patch helps a whole lot but sometimes it still gets stuck
and I'm now debugging that:

diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 2e50f4d..a00bd8d 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -1348,7 +1348,7 @@ static void n_tty_write_wakeup(struct tty_struct *tty)
* calls one at a time and in order (or using flush_to_ldisc)
*/

-static void n_tty_receive_buf(struct tty_struct *tty, const unsigned char *cp,
+static int n_tty_receive_buf(struct tty_struct *tty, const unsigned char *cp,
char *fp, int count)
{
const unsigned char *p;
@@ -1356,9 +1356,10 @@ static void n_tty_receive_buf(struct tty_struct *tty, const unsigned char *cp,
int i;
char buf[64];
unsigned long cpuflags;
+ int ret = 0;

if (!tty->read_buf)
- return;
+ return 0;

if (tty->real_raw) {
spin_lock_irqsave(&tty->read_lock, cpuflags);
@@ -1370,6 +1371,7 @@ static void n_tty_receive_buf(struct tty_struct *tty, const unsigned char *cp,
tty->read_cnt += i;
cp += i;
count -= i;
+ ret += i;

i = min(N_TTY_BUF_SIZE - tty->read_cnt,
N_TTY_BUF_SIZE - tty->read_head);
@@ -1377,8 +1379,11 @@ static void n_tty_receive_buf(struct tty_struct *tty, const unsigned char *cp,
memcpy(tty->read_buf + tty->read_head, cp, i);
tty->read_head = (tty->read_head + i) & (N_TTY_BUF_SIZE-1);
tty->read_cnt += i;
+ ret += i;
spin_unlock_irqrestore(&tty->read_lock, cpuflags);
+
} else {
+ ret = count;
for (i = count, p = cp, f = fp; i; i--, p++) {
if (f)
flags = *f++;
@@ -1421,6 +1426,8 @@ static void n_tty_receive_buf(struct tty_struct *tty, const unsigned char *cp,
*/
if (tty->receive_room < TTY_THRESHOLD_THROTTLE)
tty_throttle(tty);
+
+ return ret;
}

int is_ignored(int sig)
diff --git a/drivers/char/tty_buffer.c b/drivers/char/tty_buffer.c
index 3108991..e53adb7 100644
--- a/drivers/char/tty_buffer.c
+++ b/drivers/char/tty_buffer.c
@@ -58,7 +58,7 @@ static struct tty_buffer *tty_buffer_alloc(struct tty_struct *tty, size_t size)
{
struct tty_buffer *p;

- if (tty->buf.memory_used + size > 65536)
+ if (tty->buf.memory_used + size > 96 * 1024)
return NULL;
p = kmalloc(sizeof(struct tty_buffer) + 2 * size, GFP_ATOMIC);
if (p == NULL)
@@ -417,6 +417,7 @@ static void flush_to_ldisc(struct work_struct *work)
if (head != NULL) {
tty->buf.head = NULL;
for (;;) {
+ int copied;
int count = head->commit - head->read;
if (!count) {
if (head->next == NULL)
@@ -439,11 +440,11 @@ static void flush_to_ldisc(struct work_struct *work)
count = tty->receive_room;
char_buf = head->char_buf_ptr + head->read;
flag_buf = head->flag_buf_ptr + head->read;
- head->read += count;
spin_unlock_irqrestore(&tty->buf.lock, flags);
- disc->ops->receive_buf(tty, char_buf,
+ copied = disc->ops->receive_buf(tty, char_buf,
flag_buf, count);
spin_lock_irqsave(&tty->buf.lock, flags);
+ head->read += copied;
}
/* Restore the queue head */
tty->buf.head = head;
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index 0c4ee9b..e1c940f 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -133,7 +133,7 @@ struct tty_ldisc_ops {
/*
* The following routines are called from below.
*/
- void (*receive_buf)(struct tty_struct *, const unsigned char *cp,
+ int (*receive_buf)(struct tty_struct *, const unsigned char *cp,
char *fp, int count);
void (*write_wakeup)(struct tty_struct *);


to me it seems that receive_room is being mis-set as I can see from some
debugging messages I added:

[ 517.793792] first: read_cnt 3586 read_head 2904, second: read_cnt 4096 read_head 3414, room 1021
[ 517.800994] Fuck, lost bytes (510, 512)!
[ 524.998687] first: read_cnt 3591 read_head 3408, second: read_cnt 4096 read_head 3913, room 1016
[ 525.005889] Fuck, lost bytes (505, 512)!

and it goes on and on.

With the patch above I still get this messages but it still goes through
since not receive_buf is returning the amount of bytes actually
received. Then flush_to_ldisc() will retry those bytes on the next
iteration. Maybe this is not the desired patch though ?

Thanks a lot for the comments Alan.

--
balbi

2009-10-06 13:10:42

by Alan

[permalink] [raw]
Subject: Re: TTY loosing bytes ?

> With the patch above I still get this messages but it still goes through
> since not receive_buf is returning the amount of bytes actually
> received. Then flush_to_ldisc() will retry those bytes on the next
> iteration. Maybe this is not the desired patch though ?
>
> Thanks a lot for the comments Alan.

First guess try the following

n_tty: For some I/O patterns n_tty will lose bytes

From: Alan Cox <[email protected]>

Fix the obvious disagreement between the receive path and the receive room
logic.

Signed-off-by: Alan Cox <[email protected]>
---

drivers/char/n_tty.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)


diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 2e50f4d..27b28ca 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -95,6 +95,9 @@ static void n_tty_set_room(struct tty_struct *tty)
/* tty->read_cnt is not read locked ? */
int left = N_TTY_BUF_SIZE - tty->read_cnt - 1;

+ if (tty->real_raw)
+ left = min(left, N_TTY_BUF_SIZE - tty->read_head - 1);
+
/*
* If we are doing input canonicalization, and there are no
* pending newlines, let characters through without limit, so

2009-10-06 13:31:31

by Felipe Balbi

[permalink] [raw]
Subject: Re: TTY loosing bytes ?

On Tue, Oct 06, 2009 at 03:11:15PM +0200, ext Alan Cox wrote:
> > With the patch above I still get this messages but it still goes through
> > since not receive_buf is returning the amount of bytes actually
> > received. Then flush_to_ldisc() will retry those bytes on the next
> > iteration. Maybe this is not the desired patch though ?
> >
> > Thanks a lot for the comments Alan.
>
> First guess try the following
>
> n_tty: For some I/O patterns n_tty will lose bytes
>
> From: Alan Cox <[email protected]>
>
> Fix the obvious disagreement between the receive path and the receive room
> logic.
>
> Signed-off-by: Alan Cox <[email protected]>
> ---
>
> drivers/char/n_tty.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
>
> diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
> index 2e50f4d..27b28ca 100644
> --- a/drivers/char/n_tty.c
> +++ b/drivers/char/n_tty.c
> @@ -95,6 +95,9 @@ static void n_tty_set_room(struct tty_struct *tty)
> /* tty->read_cnt is not read locked ? */
> int left = N_TTY_BUF_SIZE - tty->read_cnt - 1;
>
> + if (tty->real_raw)
> + left = min(left, N_TTY_BUF_SIZE - tty->read_head - 1);
> +
> /*
> * If we are doing input canonicalization, and there are no
> * pending newlines, let characters through without limit, so

unfortunately it didn't work. Although now I don't the debugging prints
complaining we've lost bytes, the transfer does get stuck in lot earlier
stage.

Maybe this is one fix but there's still other problem with that code ?

--
balbi

2009-10-06 13:39:40

by Alan

[permalink] [raw]
Subject: Re: TTY loosing bytes ?

> unfortunately it didn't work. Although now I don't the debugging prints
> complaining we've lost bytes, the transfer does get stuck in lot earlier
> stage.
>
> Maybe this is one fix but there's still other problem with that code ?

Or with completely unrelated code. I guess you need to measure what goes
into receive_buf versus what appears in userspace

Alan

2009-10-06 13:46:04

by Felipe Balbi

[permalink] [raw]
Subject: Re: TTY loosing bytes ?

On Tue, Oct 06, 2009 at 03:40:12PM +0200, ext Alan Cox wrote:
> > unfortunately it didn't work. Although now I don't the debugging prints
> > complaining we've lost bytes, the transfer does get stuck in lot earlier
> > stage.
> >
> > Maybe this is one fix but there's still other problem with that code ?
>
> Or with completely unrelated code. I guess you need to measure what goes
> into receive_buf versus what appears in userspace

will do it soonish. Thanks Alan, I'll return to you when I have more
debugging.

--
balbi

2009-10-06 13:59:11

by Felipe Balbi

[permalink] [raw]
Subject: Re: TTY loosing bytes ?

On Tue, Oct 06, 2009 at 03:42:09PM +0200, Balbi Felipe (Nokia-D/Helsinki) wrote:
> On Tue, Oct 06, 2009 at 03:40:12PM +0200, ext Alan Cox wrote:
> > > unfortunately it didn't work. Although now I don't the debugging prints
> > > complaining we've lost bytes, the transfer does get stuck in lot earlier
> > > stage.
> > >
> > > Maybe this is one fix but there's still other problem with that code ?
> >
> > Or with completely unrelated code. I guess you need to measure what goes
> > into receive_buf versus what appears in userspace
>
> will do it soonish. Thanks Alan, I'll return to you when I have more
> debugging.

ok, here we are. For some reason the host side seems that it just
stopped sending data to us:

> [ 177.413879] 37 37 37: read for 3 bytes
> [ 177.413909] 37 37 37: read actually 3 bytes (total 3)
> [ 177.413940] 37 37 37: read for 34 bytes
> [ 177.413940] 37 37 37: read actually 34 bytes (total 37)
> [ 177.414916] 37 37 37: write
> [ 177.416992] 40 40 40: read for 3 bytes
> [ 177.417022] 40 40 40: read actually 3 bytes (total 3)
> [ 177.417053] 40 40 40: read for 37 bytes
> [ 177.417053] 40 40 40: read actually 37 bytes (total 40)
> [ 178.115966] 40 40 40: write
> [ 178.116058] 0 0 0: write
> [ 178.175689] 19 19 19: read for 3 bytes
> [ 178.175720] 19 19 19: read actually 3 bytes (total 3)
> [ 178.175750] 19 19 19: read for 16 bytes
> [ 178.175781] 19 19 19: read actually 16 bytes (total 19)
> [ 178.175842] 19 19 19: write
> [ 180.784820] musb_hdrc periph: enabled ep1out for bulk OUT, maxpacket 512
> [ 180.784851] musb_hdrc periph: enabled ep1in for bulk IN, maxpacket 512
> [ 180.848358] musb_hdrc periph: enabled ep1out for bulk OUT, maxpacket 512
> [ 180.848388] musb_hdrc periph: enabled ep1in for bulk IN, maxpacket 512
> [ 186.659851] musb_hdrc periph: enabled ep2in for bulk IN, maxpacket 512
> [ 186.659912] musb_hdrc periph: enabled ep2out for bulk OUT, maxpacket 512
> [ 186.723907] musb_hdrc periph: enabled ep2in for bulk IN, maxpacket 512
> [ 186.723937] musb_hdrc periph: enabled ep2out for bulk OUT, maxpacket 512
> [ 186.848602] 37 37 37: read for 3 bytes
> [ 186.848632] 37 37 37: read actually 3 bytes (total 3)
> [ 186.848663] 37 37 37: read for 34 bytes
> [ 186.848663] 37 37 37: read actually 34 bytes (total 37)
> [ 186.849182] 37 37 37: write
> [ 186.852539] 40 40 40: read for 3 bytes
> [ 186.852569] 40 40 40: read actually 3 bytes (total 3)
> [ 186.852600] 40 40 40: read for 37 bytes
> [ 186.852630] 40 40 40: read actually 37 bytes (total 40)
> [ 187.435577] 40 40 40: write
> [ 187.435699] 0 0 0: write
> [ 187.644683] 47 47 47: read for 3 bytes
> [ 187.644714] 47 47 47: read actually 3 bytes (total 3)
> [ 187.644744] 47 47 47: read for 44 bytes
> [ 187.644775] 47 47 47: read actually 44 bytes (total 47)
> [ 187.648620] 47 47 47: write
> [ 192.939239] 24 24 24: read for 3 bytes
> [ 192.939300] 24 24 24: read actually 3 bytes (total 3)
> [ 192.939300] 24 24 24: read for 21 bytes
> [ 192.939331] 24 24 24: read actually 21 bytes (total 24)
> [ 192.939453] 24 24 24: write
> [ 192.940246] 34 34 34: read for 3 bytes
> [ 192.940246] 34 34 34: read actually 3 bytes (total 3)
> [ 192.940277] 34 34 34: read for 31 bytes
> [ 192.940307] 34 34 34: read actually 31 bytes (total 34)
> [ 192.943206] 34 34 34: write
> [ 192.944152] 34 34 34: read for 3 bytes
> [ 192.944183] 34 34 34: read actually 3 bytes (total 3)
> [ 192.944213] 34 34 34: read for 31 bytes
> [ 192.944213] 34 34 34: read actually 31 bytes (total 34)
> [ 192.944488] 34 34 34: write
> [ 192.945373] 47 47 47: read for 3 bytes
> [ 192.945373] 47 47 47: read actually 3 bytes (total 3)
> [ 192.945404] 47 47 47: read for 44 bytes
> [ 192.945434] 47 47 47: read actually 44 bytes (total 47)
> [ 192.946594] 47 47 47: write
> [ 192.981689] 6656 4096 3832: read for 3 bytes
> [ 192.981750] 7168 4096 3832: read actually 3 bytes (total 3)
> [ 192.981811] 7680 4096 3832: read for 65532 bytes
> [ 192.982177] 11264 11264 3832: read actually 3829 bytes (total 3832)

this is weird. Why didn't we get more data ? Where's the rest of this
block ??

> [ 194.410125] musb_hdrc periph: enabled ep3in for bulk IN, maxpacket 512
> [ 194.410156] musb_hdrc periph: enabled ep3out for bulk OUT, maxpacket 512
> [ 194.474456] musb_hdrc periph: enabled ep3in for bulk IN, maxpacket 512
> [ 194.474487] musb_hdrc periph: enabled ep3out for bulk OUT, maxpacket 512
> [ 200.706115] 65556 65556 3853: read for 3 bytes
> [ 200.706176] 65556 65556 3853: read actually 3 bytes (total 3835)
> [ 200.706207] 65556 65556 3853: read for 18 bytes
> [ 200.706207] 65556 65556 3853: read actually 18 bytes (total 3853)
> [ 200.706542] 65556 65556 3853: write

I'll add more messages to be sure what's going on.

--
balbi