2013-09-25 12:26:01

by Mikael Pettersson

[permalink] [raw]
Subject: [REGRESSION][BISECTED] 3.12-rc "n_tty: Don't wait for buffer work in read() loop" patch breaks gcc's testsuite

With 3.12-rc[12] I see unexpected failures in gcc's Ada acats testsuite, e.g.

=== acats tests ===
FAIL: a83009b
FAIL: c37209a
FAIL: c45531e
FAIL: c45614a
FAIL: c67005d
FAIL: c730a01
FAIL: c74302b
FAIL: cc3004a
FAIL: cd2a24j
FAIL: cd2a53a
FAIL: cxa3001
FAIL: cxf3a07
FAIL: cxf3a08

=== acats Summary ===
# of expected passes 2307
# of unexpected failures 13
Native configuration is x86_64-unknown-linux-gnu

The exact failures vary from run to run, but some failures always occur on my
x86_64 machines, and all three open gcc branches (trunk, 4.8, 4.7) are affected.
With a 3.11 kernel the acats testsuite is always clean.

A bisection identified:

>From f95499c3030fe1bfad57745f2db1959c5b43dca8 Mon Sep 17 00:00:00 2001
From: Peter Hurley <[email protected]>
Date: Sat, 15 Jun 2013 13:14:29 +0000
Subject: n_tty: Don't wait for buffer work in read() loop

User-space read() can run concurrently with receiving from device;
waiting for receive_buf() to complete is not required.

Signed-off-by: Peter Hurley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index fe1c399..a6eea30 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1724,7 +1724,6 @@ static inline int input_available_p(struct tty_struct *tty, int amt)
{
struct n_tty_data *ldata = tty->disc_data;

- tty_flush_to_ldisc(tty);
if (ldata->icanon && !L_EXTPROC(tty)) {
if (ldata->canon_head != ldata->read_tail)
return 1;

as the culprit. Reverting that from 3.12-rc2 eliminates the acats failures
and brings the gcc testsuite results to what one gets with 3.11.

I can't pretend to understand exactly what goes wrong, suffice it to say that
the gcc testsuite harness uses a combination of shell, expect, and tcl. I
suspect ptys are also involved.

To repeat, bootstrap a recent gcc 4.8 snapshot w/ ada in --enable-languages,
then run the test suite with "make -j6 -k check; make mail-report.log".
(Adjust -jN as appropriate, but -j6 is what I'm using on my quad-core i7s.)

Please consider reverting or fixing this patch.

/Mikael


2013-09-25 13:50:23

by Peter Hurley

[permalink] [raw]
Subject: Re: [REGRESSION][BISECTED] 3.12-rc "n_tty: Don't wait for buffer work in read() loop" patch breaks gcc's testsuite

On 09/25/2013 08:18 AM, Mikael Pettersson wrote:
> With 3.12-rc[12] I see unexpected failures in gcc's Ada acats testsuite, e.g.
>
> === acats tests ===
> FAIL: a83009b
> FAIL: c37209a
> FAIL: c45531e
> FAIL: c45614a
> FAIL: c67005d
> FAIL: c730a01
> FAIL: c74302b
> FAIL: cc3004a
> FAIL: cd2a24j
> FAIL: cd2a53a
> FAIL: cxa3001
> FAIL: cxf3a07
> FAIL: cxf3a08
>
> === acats Summary ===
> # of expected passes 2307
> # of unexpected failures 13
> Native configuration is x86_64-unknown-linux-gnu

Thanks for the report.
Would you please send me the acats.log file from a failed testsuite run with its
matching screen output?

Regards,
Peter Hurley


> The exact failures vary from run to run, but some failures always occur on my
> x86_64 machines, and all three open gcc branches (trunk, 4.8, 4.7) are affected.
> With a 3.11 kernel the acats testsuite is always clean.
>
> A bisection identified:
>
> From f95499c3030fe1bfad57745f2db1959c5b43dca8 Mon Sep 17 00:00:00 2001
> From: Peter Hurley <[email protected]>
> Date: Sat, 15 Jun 2013 13:14:29 +0000
> Subject: n_tty: Don't wait for buffer work in read() loop
>
> User-space read() can run concurrently with receiving from device;
> waiting for receive_buf() to complete is not required.
>
> Signed-off-by: Peter Hurley <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index fe1c399..a6eea30 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1724,7 +1724,6 @@ static inline int input_available_p(struct tty_struct *tty, int amt)
> {
> struct n_tty_data *ldata = tty->disc_data;
>
> - tty_flush_to_ldisc(tty);
> if (ldata->icanon && !L_EXTPROC(tty)) {
> if (ldata->canon_head != ldata->read_tail)
> return 1;
>
> as the culprit. Reverting that from 3.12-rc2 eliminates the acats failures
> and brings the gcc testsuite results to what one gets with 3.11.
>
> I can't pretend to understand exactly what goes wrong, suffice it to say that
> the gcc testsuite harness uses a combination of shell, expect, and tcl. I
> suspect ptys are also involved.
>
> To repeat, bootstrap a recent gcc 4.8 snapshot w/ ada in --enable-languages,
> then run the test suite with "make -j6 -k check; make mail-report.log".
> (Adjust -jN as appropriate, but -j6 is what I'm using on my quad-core i7s.)
>
> Please consider reverting or fixing this patch.
>
> /Mikael
>

2013-09-25 13:52:36

by Peter Hurley

[permalink] [raw]
Subject: Re: [REGRESSION][BISECTED] 3.12-rc "n_tty: Don't wait for buffer work in read() loop" patch breaks gcc's testsuite

[ +cc Greg Kroah-Hartman ]

On 09/25/2013 09:50 AM, Peter Hurley wrote:
> On 09/25/2013 08:18 AM, Mikael Pettersson wrote:
>> With 3.12-rc[12] I see unexpected failures in gcc's Ada acats testsuite, e.g.
>>
>> === acats tests ===
>> FAIL: a83009b
>> FAIL: c37209a
>> FAIL: c45531e
>> FAIL: c45614a
>> FAIL: c67005d
>> FAIL: c730a01
>> FAIL: c74302b
>> FAIL: cc3004a
>> FAIL: cd2a24j
>> FAIL: cd2a53a
>> FAIL: cxa3001
>> FAIL: cxf3a07
>> FAIL: cxf3a08
>>
>> === acats Summary ===
>> # of expected passes 2307
>> # of unexpected failures 13
>> Native configuration is x86_64-unknown-linux-gnu
>
> Thanks for the report.
> Would you please send me the acats.log file from a failed testsuite run with its
> matching screen output?
>
> Regards,
> Peter Hurley
>
>
>> The exact failures vary from run to run, but some failures always occur on my
>> x86_64 machines, and all three open gcc branches (trunk, 4.8, 4.7) are affected.
>> With a 3.11 kernel the acats testsuite is always clean.
>>
>> A bisection identified:
>>
>> From f95499c3030fe1bfad57745f2db1959c5b43dca8 Mon Sep 17 00:00:00 2001
>> From: Peter Hurley <[email protected]>
>> Date: Sat, 15 Jun 2013 13:14:29 +0000
>> Subject: n_tty: Don't wait for buffer work in read() loop
>>
>> User-space read() can run concurrently with receiving from device;
>> waiting for receive_buf() to complete is not required.
>>
>> Signed-off-by: Peter Hurley <[email protected]>
>> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>> ---
>> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
>> index fe1c399..a6eea30 100644
>> --- a/drivers/tty/n_tty.c
>> +++ b/drivers/tty/n_tty.c
>> @@ -1724,7 +1724,6 @@ static inline int input_available_p(struct tty_struct *tty, int amt)
>> {
>> struct n_tty_data *ldata = tty->disc_data;
>>
>> - tty_flush_to_ldisc(tty);
>> if (ldata->icanon && !L_EXTPROC(tty)) {
>> if (ldata->canon_head != ldata->read_tail)
>> return 1;
>>
>> as the culprit. Reverting that from 3.12-rc2 eliminates the acats failures
>> and brings the gcc testsuite results to what one gets with 3.11.
>>
>> I can't pretend to understand exactly what goes wrong, suffice it to say that
>> the gcc testsuite harness uses a combination of shell, expect, and tcl. I
>> suspect ptys are also involved.
>>
>> To repeat, bootstrap a recent gcc 4.8 snapshot w/ ada in --enable-languages,
>> then run the test suite with "make -j6 -k check; make mail-report.log".
>> (Adjust -jN as appropriate, but -j6 is what I'm using on my quad-core i7s.)
>>
>> Please consider reverting or fixing this patch.
>>
>> /Mikael

2013-09-26 20:07:09

by Peter Hurley

[permalink] [raw]
Subject: Re: [REGRESSION][BISECTED] 3.12-rc "n_tty: Don't wait for buffer work in read() loop" patch breaks gcc's testsuite

On 09/25/2013 09:52 AM, Peter Hurley wrote:
> [ +cc Greg Kroah-Hartman ]
>
> On 09/25/2013 09:50 AM, Peter Hurley wrote:
>> On 09/25/2013 08:18 AM, Mikael Pettersson wrote:
>>> With 3.12-rc[12] I see unexpected failures in gcc's Ada acats testsuite, e.g.
>>>
>>> === acats tests ===
>>> FAIL: a83009b
>>> FAIL: c37209a
>>> FAIL: c45531e
>>> FAIL: c45614a
>>> FAIL: c67005d
>>> FAIL: c730a01
>>> FAIL: c74302b
>>> FAIL: cc3004a
>>> FAIL: cd2a24j
>>> FAIL: cd2a53a
>>> FAIL: cxa3001
>>> FAIL: cxf3a07
>>> FAIL: cxf3a08
>>>
>>> === acats Summary ===
>>> # of expected passes 2307
>>> # of unexpected failures 13
>>> Native configuration is x86_64-unknown-linux-gnu
>>
>> Thanks for the report.
>> Would you please send me the acats.log file from a failed testsuite run with its
>> matching screen output?
>>
>> Regards,
>> Peter Hurley
>>
>>
>>> The exact failures vary from run to run, but some failures always occur on my
>>> x86_64 machines, and all three open gcc branches (trunk, 4.8, 4.7) are affected.
>>> With a 3.11 kernel the acats testsuite is always clean.
>>>
>>> A bisection identified:
>>>
>>> From f95499c3030fe1bfad57745f2db1959c5b43dca8 Mon Sep 17 00:00:00 2001
>>> From: Peter Hurley <[email protected]>
>>> Date: Sat, 15 Jun 2013 13:14:29 +0000
>>> Subject: n_tty: Don't wait for buffer work in read() loop
>>>
>>> User-space read() can run concurrently with receiving from device;
>>> waiting for receive_buf() to complete is not required.
>>>
>>> Signed-off-by: Peter Hurley <[email protected]>
>>> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>>> ---
>>> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
>>> index fe1c399..a6eea30 100644
>>> --- a/drivers/tty/n_tty.c
>>> +++ b/drivers/tty/n_tty.c
>>> @@ -1724,7 +1724,6 @@ static inline int input_available_p(struct tty_struct *tty, int amt)
>>> {
>>> struct n_tty_data *ldata = tty->disc_data;
>>>
>>> - tty_flush_to_ldisc(tty);
>>> if (ldata->icanon && !L_EXTPROC(tty)) {
>>> if (ldata->canon_head != ldata->read_tail)
>>> return 1;
>>>
>>> as the culprit. Reverting that from 3.12-rc2 eliminates the acats failures
>>> and brings the gcc testsuite results to what one gets with 3.11.
>>>
>>> I can't pretend to understand exactly what goes wrong, suffice it to say that
>>> the gcc testsuite harness uses a combination of shell, expect, and tcl. I
>>> suspect ptys are also involved.
>>>
>>> To repeat, bootstrap a recent gcc 4.8 snapshot w/ ada in --enable-languages,
>>> then run the test suite with "make -j6 -k check; make mail-report.log".
>>> (Adjust -jN as appropriate, but -j6 is what I'm using on my quad-core i7s.)

Ok, I've managed to reproduce this (epic adventure).

What happens is the child process (the test) writes to its stdout (which is
the slave end of a pty pair) and exits.

Then, the parent (expect), waiting for output from the child, is scheduled
and run before the tty buffer i/o loop has pushed any data to the pty master
read buffer.

IOW, at that particular instant, the pty appears to be closed (read return -EIO).

(The converse is also possible: ie., writing to the master and then closing
the master may not be read by the slave.)

>>> Please consider reverting or fixing this patch.

I need to think a little on the right way to fix this.

Regards,
Peter Hurley

2013-09-27 17:27:32

by Peter Hurley

[permalink] [raw]
Subject: [PATCH] tty: Fix pty master read() after slave closes

Commit f95499c3030fe1bfad57745f2db1959c5b43dca8,
n_tty: Don't wait for buffer work in read() loop
creates a race window which can cause a pty master read()
to miss the last pty slave write(s) and return -EIO instead,
thus signalling the pty slave is closed. This can happen when
the pty slave is written and immediately closed but before the
tty buffer i/o loop receives the new input; the pty master
read() is scheduled, sees its read buffer is empty and the
pty slave has been closed, and exits.

Because tty_flush_to_ldisc() has significant performance impact
for parallel i/o, rather than revert the commit, special case this
condition (ie., when the read buffer is empty and the 'other' pty
has been closed) and, only then, wait for buffer work to complete
before re-testing if the read buffer is still empty.

As before, subsequent pty master reads return any available data
until no more data is available, and then returns -EIO to
indicate the pty slave has closed.

Reported-by: Mikael Pettersson <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/n_tty.c | 46 ++++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 09505ff..4e69f3b 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -2176,28 +2176,34 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,

if (!input_available_p(tty, 0)) {
if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) {
- retval = -EIO;
- break;
- }
- if (tty_hung_up_p(file))
- break;
- if (!timeout)
- break;
- if (file->f_flags & O_NONBLOCK) {
- retval = -EAGAIN;
- break;
- }
- if (signal_pending(current)) {
- retval = -ERESTARTSYS;
- break;
- }
- n_tty_set_room(tty);
- up_read(&tty->termios_rwsem);
+ up_read(&tty->termios_rwsem);
+ tty_flush_to_ldisc(tty);
+ down_read(&tty->termios_rwsem);
+ if (!input_available_p(tty, 0)) {
+ retval = -EIO;
+ break;
+ }
+ } else {
+ if (tty_hung_up_p(file))
+ break;
+ if (!timeout)
+ break;
+ if (file->f_flags & O_NONBLOCK) {
+ retval = -EAGAIN;
+ break;
+ }
+ if (signal_pending(current)) {
+ retval = -ERESTARTSYS;
+ break;
+ }
+ n_tty_set_room(tty);
+ up_read(&tty->termios_rwsem);

- timeout = schedule_timeout(timeout);
+ timeout = schedule_timeout(timeout);

- down_read(&tty->termios_rwsem);
- continue;
+ down_read(&tty->termios_rwsem);
+ continue;
+ }
}
__set_current_state(TASK_RUNNING);

--
1.8.1.2

2013-09-27 17:32:36

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH] tty: Fix pty master read() after slave closes

On 09/27/2013 01:27 PM, Peter Hurley wrote:
> Commit f95499c3030fe1bfad57745f2db1959c5b43dca8,
> n_tty: Don't wait for buffer work in read() loop
> creates a race window which can cause a pty master read()
> to miss the last pty slave write(s) and return -EIO instead,
> thus signalling the pty slave is closed. This can happen when
> the pty slave is written and immediately closed but before the
> tty buffer i/o loop receives the new input; the pty master
> read() is scheduled, sees its read buffer is empty and the
> pty slave has been closed, and exits.
>
> Because tty_flush_to_ldisc() has significant performance impact
> for parallel i/o, rather than revert the commit, special case this
> condition (ie., when the read buffer is empty and the 'other' pty
> has been closed) and, only then, wait for buffer work to complete
> before re-testing if the read buffer is still empty.
>
> As before, subsequent pty master reads return any available data
> until no more data is available, and then returns -EIO to
> indicate the pty slave has closed.
>
> Reported-by: Mikael Pettersson <[email protected]>
> Signed-off-by: Peter Hurley <[email protected]>

I tested this patch with the gcc ada ACATS testsuite and got this
summary so all seems ok with this patch.

=== acats Summary ===
# of expected passes 805
# of unexpected failures 0

...

=== acats Summary ===
# of expected passes 772
# of unexpected failures 0

...

=== acats Summary ===
# of expected passes 743
# of unexpected failures 0

Mikael,

I'd appreciate if you could re-test with this patch and
confirm the issue is fixed.

Regards,
Peter Hurley

2013-09-28 17:37:24

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [PATCH] tty: Fix pty master read() after slave closes

Peter Hurley writes:
> On 09/27/2013 01:27 PM, Peter Hurley wrote:
> > Commit f95499c3030fe1bfad57745f2db1959c5b43dca8,
> > n_tty: Don't wait for buffer work in read() loop
> > creates a race window which can cause a pty master read()
> > to miss the last pty slave write(s) and return -EIO instead,
> > thus signalling the pty slave is closed. This can happen when
> > the pty slave is written and immediately closed but before the
> > tty buffer i/o loop receives the new input; the pty master
> > read() is scheduled, sees its read buffer is empty and the
> > pty slave has been closed, and exits.
> >
> > Because tty_flush_to_ldisc() has significant performance impact
> > for parallel i/o, rather than revert the commit, special case this
> > condition (ie., when the read buffer is empty and the 'other' pty
> > has been closed) and, only then, wait for buffer work to complete
> > before re-testing if the read buffer is still empty.
> >
> > As before, subsequent pty master reads return any available data
> > until no more data is available, and then returns -EIO to
> > indicate the pty slave has closed.
> >
> > Reported-by: Mikael Pettersson <[email protected]>
> > Signed-off-by: Peter Hurley <[email protected]>
>
> I tested this patch with the gcc ada ACATS testsuite and got this
> summary so all seems ok with this patch.
>
> === acats Summary ===
> # of expected passes 805
> # of unexpected failures 0
>
> ...
>
> === acats Summary ===
> # of expected passes 772
> # of unexpected failures 0
>
> ...
>
> === acats Summary ===
> # of expected passes 743
> # of unexpected failures 0
>
> Mikael,
>
> I'd appreciate if you could re-test with this patch and
> confirm the issue is fixed.

I did a big pile of gcc bootstrap + regression test runs today with
this patch, and everything looks good. Thus:

Tested-by: Mikael Pettersson <[email protected]>

Thanks,

/Mikael