2015-05-01 06:20:53

by NeilBrown

[permalink] [raw]
Subject: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'


Hi Peter,
I recently had a report of a regression in 3.12. I bisected it down to your
patch
f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")

Sometimes a poll on a master-pty will report there is nothing to read after
the slave has written something.
As test program is below.
On a kernel prior to your commit, this program never reports

Total bytes read is 0. PollRC=0

On a kernel subsequent to your commit, that message is produced quite often.

This was found while working on a debugger.

Following the test program is my proposed patch which allows the program to
run as it used to. It re-introduces the call to tty_flush_to_ldisc(), but
only if it appears that there is nothing to read.

Do you think this is a suitable fix? Do you even agree that it is a real
bug?

Thanks,
NeilBrown



------------------------------------------------------
#define _XOPEN_SOURCE
#include<unistd.h>
#include<stdlib.h>
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<errno.h>
#include<sys/wait.h>
#include<sys/types.h>
#include<sys/ptrace.h>
#include<asm/ptrace.h>
#include<fcntl.h>
#include<sys/poll.h>


#define USEPTY
#define COUNT_MAX (5000000)
#define MY_BREAKPOINT { asm("int $3"); }
#define PTRACE_IGNORED (void *)0

/*
** Open s pseudo-tty pair.
**
** Return the master fd, set spty to the slave fd.
*/
int
my_openpt(int *spty)
{
int mfd = posix_openpt(O_RDWR | O_NOCTTY);
char *slavedev;
int sfd=-1;
if(mfd == -1) return -1;
if(grantpt(mfd) == -1) return -1;
if(unlockpt(mfd) == -1) return -1;

slavedev = (char *)ptsname(mfd);

if((sfd = open(slavedev, O_RDWR | O_NOCTTY)) == -1)
{
close(mfd);
return -1;
}
if(spty != NULL)
{
*spty = sfd;
}
return mfd;
}


/*
** Read from the provided file descriptor if poll says there's
** anything there..
*/
int
DoPollRead(int mpty)
{
struct pollfd fds;
int pollrc;
ssize_t bread=0, totread=0;
char readbuf[101];

/*
** Set up the poll.
*/
fds.fd = mpty;
fds.events = POLLIN | POLLRDNORM | POLLRDBAND | POLLPRI;

/*
** poll for any output.
*/

while((pollrc = poll(&fds, 1, 0)) == 1)
{
if(fds.revents & POLLIN)
{
bread = read( mpty, readbuf, 100 );
totread += bread;
if(bread > 0)
{
//printf("Read %d bytes.\n", (int)bread);
readbuf[bread] = '\0';
//printf("\t%s", readbuf);
} else
{
//puts("Nothing read.\n");
}
} else if (fds.revents & (POLLHUP | POLLERR | POLLNVAL)) {
printf ("hangup/error/invalid on poll\n");
return totread;
} else { printf("No POLLIN, revents=%d\n", fds.revents); };
}

/*
** This sometimes happens - we're expecting input on the pty,
** but nothing is there.
*/
if(totread == 0)
printf("Total bytes read is 0. PollRC=%d\n", pollrc);

return totread;
}

static
void writeall (int fd, const char *buf, size_t count)
{
while (count)
{
ssize_t r = write (fd, buf, count);
if (r == 0)
break;
if (r < 0 && errno == EINTR)
continue;
if (r < 0)
exit (2);
count -= r;
buf += r;
}
}

int
thechild(void)
{
unsigned int i;

writeall (1, "debuggee starts\n", strlen ("debuggee starts\n"));

for(i=0 ; i<COUNT_MAX ; i++)
{
char buf[100];
sprintf(buf, "This is the debuggee. Count is %d\n", i);
writeall (1, buf, strlen (buf));

MY_BREAKPOINT
}

writeall (1, "debuggee finishing now.\n", strlen ("debuggee finishing now.\n"));
exit (0);
}

int
main()
{
int rv, status, i=0;
pid_t pid;
int sfd = -1;
int mfd;
#ifdef USEPTY
mfd = my_openpt(&sfd); /* Get a pseudo-tty pair. */
if(mfd < 0)
{
fprintf(stderr, "Failed to create pty\n");
return(1);
}
#else
int pipefd[2];
if (pipe (pipefd) < 0)
{
perror ("pipe");
return 1;
}
mfd = pipefd[0];
sfd = pipefd[1];
#endif

/*
** Create a child process.
*/
pid = fork();
switch(pid)
{
case -1: /* failed fork */
return -1;
case 0: /* child process */

close (mfd);
/*
** Close stdout, use the slave pty for output.
*/
dup2(sfd, STDOUT_FILENO);


/*
** Set 'TRACEME' so this child process can be traced by the
** parent process.
*/
ptrace(PTRACE_TRACEME,
PTRACE_IGNORED, PTRACE_IGNORED, PTRACE_IGNORED);
thechild ();
break;

default: /* parent process drops out of switch */
close (sfd);
break;
}

/*
** Wait for the debuggee to hit the traceme.
** When we see this, immediately send a PTRACE_CONT to kick off
** the debuggee..
*/
rv = waitpid(pid, &status, 0);
if(WIFSTOPPED(status))
{
printf("Debuggee initial stop. Sending PTRACE_CONT..\n");
ptrace(PTRACE_CONT, pid, PTRACE_IGNORED,PTRACE_IGNORED);
}
else
{
printf("Failure of debuggee to hit initial 'traceme'\n");
return 99;
}

/*
** Loop around, waiting for the debuggee to hit its own breakpoint.
** Look for output on the pty. Each time around we should see a line
** from the debuggee.
**/
while(1)
{
int pollrc, stopped, exited;

//printf("Debugger doing a waitpid on debuggee..(%d)\n", i);
i++;

rv = waitpid(pid, &status, 0);
stopped=0;
exited=0;

/*
** Ok, the waitpid has returned. What's happened to the debuggee?
** Note we do recheck rv and drop out later if it's -1.
*/
if(rv == -1)
{
printf("Debuggee process has died?. Checking for output.\n");
}
else if(WIFSTOPPED(status))
{
//printf("Debuggee process has stopped. Checking for output.\n");
stopped=1;
}
else if(!WIFEXITED(status))
{
printf("Debuggee has exited. Checking for output.\n");
exited=1;
}
else
{
printf("WEXITSTATUS is %d\n", WEXITSTATUS(status));
exited=1;
}

/*
** See if there's anything to read. If so display it.
** There always should be, the debuggee writes output on each
** iteration and fflush's it.
*/
(void) DoPollRead(mfd);

/*
** If the debuggee has stopped tell it to continue. If it's
** exited detach from it.
*/
if(stopped)
{
//puts("Sending PTRACE_CONT");
ptrace(PTRACE_CONT, pid, PTRACE_IGNORED,PTRACE_IGNORED);
} else if(exited)
{
printf("Debuggee exited, detaching\n");
ptrace(PTRACE_DETACH, pid, PTRACE_IGNORED,PTRACE_IGNORED);
return 0;
} else if(rv == -1)
{
printf("Debuggee died? Leaving.\n");
return 98;
}
}
}
------------------------------------------------------



From: NeilBrown <[email protected]>
Subject: [PATCH] n_tty: Sometimes wait for buffer work in read() loop

Since commit
f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")

it as been possible for poll to report that there is no data to read
on a master-pty even if a write to the slave has actually completed.

That patch removes a 'wait' when the wait isn't really necessary.
Unfortunately it also removed it in the case when it *is* necessary.
If the simple tests show that there is nothing to read, we really need
to flush the work queue in case there is something ready but which
hasn't arrived yet.

This patch restores the wait, but only if simple tests suggest there
is nothing ready.

Reported-by: Nic Percival <[email protected]>
Reported-by: Michael Matz <[email protected]>
Fixes: f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")
Signed-off-by: NeilBrown <[email protected]>

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index cf6e0f2e1331..9884091819b6 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1942,11 +1942,18 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
{
struct n_tty_data *ldata = tty->disc_data;
int amt = poll && !TIME_CHAR(tty) && MIN_CHAR(tty) ? MIN_CHAR(tty) : 1;
-
- if (ldata->icanon && !L_EXTPROC(tty))
- return ldata->canon_head != ldata->read_tail;
- else
- return ldata->commit_head - ldata->read_tail >= amt;
+ int i;
+ int ret = 0;
+
+ for (i = 0; !ret && i < 2; i++) {
+ if (i)
+ tty_flush_to_ldisc(tty);
+ if (ldata->icanon && !L_EXTPROC(tty))
+ ret = (ldata->canon_head != ldata->read_tail);
+ else
+ ret = (ldata->commit_head - ldata->read_tail >= amt);
+ }
+ return ret;
}

/**


Attachments:
(No filename) (811.00 B)
OpenPGP digital signature

2015-05-01 15:06:12

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

Hi Neil,

On 05/01/2015 02:20 AM, NeilBrown wrote:
>
> Hi Peter,
> I recently had a report of a regression in 3.12. I bisected it down to your
> patch
> f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")
>
> Sometimes a poll on a master-pty will report there is nothing to read after
> the slave has written something.
> As test program is below.
> On a kernel prior to your commit, this program never reports
>
> Total bytes read is 0. PollRC=0
>
> On a kernel subsequent to your commit, that message is produced quite often.
>
> This was found while working on a debugger.
>
> Following the test program is my proposed patch which allows the program to
> run as it used to. It re-introduces the call to tty_flush_to_ldisc(), but
> only if it appears that there is nothing to read.
>
> Do you think this is a suitable fix? Do you even agree that it is a real
> bug?

I don't think this a real bug, in the sense that pty i/o is not synchronous,
in the same way that tty i/o is not synchronous.

However, that said, if this is a regression (regression as in "it broke something
that used to work", not regression as in "this new thing I'm writing doesn't
behave the way I want it to" :) )

Help me understand the use-case here: are you using pty i/o to debug the
debugger?

Regards,
Peter Hurley

> ------------------------------------------------------
> #define _XOPEN_SOURCE
> #include<unistd.h>
> #include<stdlib.h>
> #include<stdio.h>
> #include<stdlib.h>
> #include<string.h>
> #include<errno.h>
> #include<sys/wait.h>
> #include<sys/types.h>
> #include<sys/ptrace.h>
> #include<asm/ptrace.h>
> #include<fcntl.h>
> #include<sys/poll.h>
>
>
> #define USEPTY
> #define COUNT_MAX (5000000)
> #define MY_BREAKPOINT { asm("int $3"); }
> #define PTRACE_IGNORED (void *)0
>
> /*
> ** Open s pseudo-tty pair.
> **
> ** Return the master fd, set spty to the slave fd.
> */
> int
> my_openpt(int *spty)
> {
> int mfd = posix_openpt(O_RDWR | O_NOCTTY);
> char *slavedev;
> int sfd=-1;
> if(mfd == -1) return -1;
> if(grantpt(mfd) == -1) return -1;
> if(unlockpt(mfd) == -1) return -1;
>
> slavedev = (char *)ptsname(mfd);
>
> if((sfd = open(slavedev, O_RDWR | O_NOCTTY)) == -1)
> {
> close(mfd);
> return -1;
> }
> if(spty != NULL)
> {
> *spty = sfd;
> }
> return mfd;
> }
>
>
> /*
> ** Read from the provided file descriptor if poll says there's
> ** anything there..
> */
> int
> DoPollRead(int mpty)
> {
> struct pollfd fds;
> int pollrc;
> ssize_t bread=0, totread=0;
> char readbuf[101];
>
> /*
> ** Set up the poll.
> */
> fds.fd = mpty;
> fds.events = POLLIN | POLLRDNORM | POLLRDBAND | POLLPRI;
>
> /*
> ** poll for any output.
> */
>
> while((pollrc = poll(&fds, 1, 0)) == 1)
> {
> if(fds.revents & POLLIN)
> {
> bread = read( mpty, readbuf, 100 );
> totread += bread;
> if(bread > 0)
> {
> //printf("Read %d bytes.\n", (int)bread);
> readbuf[bread] = '\0';
> //printf("\t%s", readbuf);
> } else
> {
> //puts("Nothing read.\n");
> }
> } else if (fds.revents & (POLLHUP | POLLERR | POLLNVAL)) {
> printf ("hangup/error/invalid on poll\n");
> return totread;
> } else { printf("No POLLIN, revents=%d\n", fds.revents); };
> }
>
> /*
> ** This sometimes happens - we're expecting input on the pty,
> ** but nothing is there.
> */
> if(totread == 0)
> printf("Total bytes read is 0. PollRC=%d\n", pollrc);
>
> return totread;
> }
>
> static
> void writeall (int fd, const char *buf, size_t count)
> {
> while (count)
> {
> ssize_t r = write (fd, buf, count);
> if (r == 0)
> break;
> if (r < 0 && errno == EINTR)
> continue;
> if (r < 0)
> exit (2);
> count -= r;
> buf += r;
> }
> }
>
> int
> thechild(void)
> {
> unsigned int i;
>
> writeall (1, "debuggee starts\n", strlen ("debuggee starts\n"));
>
> for(i=0 ; i<COUNT_MAX ; i++)
> {
> char buf[100];
> sprintf(buf, "This is the debuggee. Count is %d\n", i);
> writeall (1, buf, strlen (buf));
>
> MY_BREAKPOINT
> }
>
> writeall (1, "debuggee finishing now.\n", strlen ("debuggee finishing now.\n"));
> exit (0);
> }
>
> int
> main()
> {
> int rv, status, i=0;
> pid_t pid;
> int sfd = -1;
> int mfd;
> #ifdef USEPTY
> mfd = my_openpt(&sfd); /* Get a pseudo-tty pair. */
> if(mfd < 0)
> {
> fprintf(stderr, "Failed to create pty\n");
> return(1);
> }
> #else
> int pipefd[2];
> if (pipe (pipefd) < 0)
> {
> perror ("pipe");
> return 1;
> }
> mfd = pipefd[0];
> sfd = pipefd[1];
> #endif
>
> /*
> ** Create a child process.
> */
> pid = fork();
> switch(pid)
> {
> case -1: /* failed fork */
> return -1;
> case 0: /* child process */
>
> close (mfd);
> /*
> ** Close stdout, use the slave pty for output.
> */
> dup2(sfd, STDOUT_FILENO);
>
>
> /*
> ** Set 'TRACEME' so this child process can be traced by the
> ** parent process.
> */
> ptrace(PTRACE_TRACEME,
> PTRACE_IGNORED, PTRACE_IGNORED, PTRACE_IGNORED);
> thechild ();
> break;
>
> default: /* parent process drops out of switch */
> close (sfd);
> break;
> }
>
> /*
> ** Wait for the debuggee to hit the traceme.
> ** When we see this, immediately send a PTRACE_CONT to kick off
> ** the debuggee..
> */
> rv = waitpid(pid, &status, 0);
> if(WIFSTOPPED(status))
> {
> printf("Debuggee initial stop. Sending PTRACE_CONT..\n");
> ptrace(PTRACE_CONT, pid, PTRACE_IGNORED,PTRACE_IGNORED);
> }
> else
> {
> printf("Failure of debuggee to hit initial 'traceme'\n");
> return 99;
> }
>
> /*
> ** Loop around, waiting for the debuggee to hit its own breakpoint.
> ** Look for output on the pty. Each time around we should see a line
> ** from the debuggee.
> **/
> while(1)
> {
> int pollrc, stopped, exited;
>
> //printf("Debugger doing a waitpid on debuggee..(%d)\n", i);
> i++;
>
> rv = waitpid(pid, &status, 0);
> stopped=0;
> exited=0;
>
> /*
> ** Ok, the waitpid has returned. What's happened to the debuggee?
> ** Note we do recheck rv and drop out later if it's -1.
> */
> if(rv == -1)
> {
> printf("Debuggee process has died?. Checking for output.\n");
> }
> else if(WIFSTOPPED(status))
> {
> //printf("Debuggee process has stopped. Checking for output.\n");
> stopped=1;
> }
> else if(!WIFEXITED(status))
> {
> printf("Debuggee has exited. Checking for output.\n");
> exited=1;
> }
> else
> {
> printf("WEXITSTATUS is %d\n", WEXITSTATUS(status));
> exited=1;
> }
>
> /*
> ** See if there's anything to read. If so display it.
> ** There always should be, the debuggee writes output on each
> ** iteration and fflush's it.
> */
> (void) DoPollRead(mfd);
>
> /*
> ** If the debuggee has stopped tell it to continue. If it's
> ** exited detach from it.
> */
> if(stopped)
> {
> //puts("Sending PTRACE_CONT");
> ptrace(PTRACE_CONT, pid, PTRACE_IGNORED,PTRACE_IGNORED);
> } else if(exited)
> {
> printf("Debuggee exited, detaching\n");
> ptrace(PTRACE_DETACH, pid, PTRACE_IGNORED,PTRACE_IGNORED);
> return 0;
> } else if(rv == -1)
> {
> printf("Debuggee died? Leaving.\n");
> return 98;
> }
> }
> }
> ------------------------------------------------------
>
>
>
> From: NeilBrown <[email protected]>
> Subject: [PATCH] n_tty: Sometimes wait for buffer work in read() loop
>
> Since commit
> f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")
>
> it as been possible for poll to report that there is no data to read
> on a master-pty even if a write to the slave has actually completed.
>
> That patch removes a 'wait' when the wait isn't really necessary.
> Unfortunately it also removed it in the case when it *is* necessary.
> If the simple tests show that there is nothing to read, we really need
> to flush the work queue in case there is something ready but which
> hasn't arrived yet.
>
> This patch restores the wait, but only if simple tests suggest there
> is nothing ready.
>
> Reported-by: Nic Percival <[email protected]>
> Reported-by: Michael Matz <[email protected]>
> Fixes: f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")
> Signed-off-by: NeilBrown <[email protected]>
>
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index cf6e0f2e1331..9884091819b6 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1942,11 +1942,18 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
> {
> struct n_tty_data *ldata = tty->disc_data;
> int amt = poll && !TIME_CHAR(tty) && MIN_CHAR(tty) ? MIN_CHAR(tty) : 1;
> -
> - if (ldata->icanon && !L_EXTPROC(tty))
> - return ldata->canon_head != ldata->read_tail;
> - else
> - return ldata->commit_head - ldata->read_tail >= amt;
> + int i;
> + int ret = 0;
> +
> + for (i = 0; !ret && i < 2; i++) {
> + if (i)
> + tty_flush_to_ldisc(tty);
> + if (ldata->icanon && !L_EXTPROC(tty))
> + ret = (ldata->canon_head != ldata->read_tail);
> + else
> + ret = (ldata->commit_head - ldata->read_tail >= amt);
> + }
> + return ret;
> }
>
> /**
>

2015-05-04 12:24:22

by Michael Matz

[permalink] [raw]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

Hi,

On Fri, 1 May 2015, Peter Hurley wrote:

> I don't think this a real bug, in the sense that pty i/o is not
> synchronous, in the same way that tty i/o is not synchronous.

Here's what I wrote internally about my speculations about this being a
bug or not:

> > I also never hit it with pipes (remove the USEPTY define), also not on
> > sle12, so it must be some change specific to the pty implementation.
> >
> > Now, all of this is of course unspecified. There are two asynchronous
> > processes involved, and a buffered tube between them. Just because
> > one process filled one end of the tube (the breakpoint was hit)
> > doesn't mean the contents have to appear at that instant at the other
> > end. So the change in behaviour in sle12 is not a genuine bug. It
> > _might_ be an unintented change, though, that's why kernel people
> > should comment on this. If there are no terribly good reasons for
> > this change I'd consider it a quality-of-implementation regression in
> > sle12.

So, I'd accept this being declared a non-bug, but it is certainly a change
in behaviour that's visible for our debugger team.

> However, that said, if this is a regression (regression as in "it broke
> something that used to work", not regression as in "this new thing I'm
> writing doesn't behave the way I want it to" :) )
>
> Help me understand the use-case here: are you using pty i/o to debug the
> debugger?

Nic is working on the Cobol debugger, but I think this pty i/o is rather a
part of the normal interaction between a debugged Cobol process and the
debugger; that's just a theory, Nic is authorative here. But this change
in behaviour _did_ result in real testsuite regressions, so it's not
something that he wanted to write from scratch.

(FWIW: I do think it's a better QoI factor if something returns data from
a tube if we can know via side channels (break points) that something must
have been written locally to the other end of the tube, if that can be
ensured without too much other work)


Ciao,
Michael.

2015-05-04 16:32:15

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

On 05/04/2015 08:24 AM, Michael Matz wrote:
> Hi,
>
> On Fri, 1 May 2015, Peter Hurley wrote:
>
>> I don't think this a real bug, in the sense that pty i/o is not
>> synchronous, in the same way that tty i/o is not synchronous.
>
> Here's what I wrote internally about my speculations about this being a
> bug or not:
>
>>> I also never hit it with pipes (remove the USEPTY define), also not on
>>> sle12, so it must be some change specific to the pty implementation.
>>>
>>> Now, all of this is of course unspecified. There are two asynchronous
>>> processes involved, and a buffered tube between them. Just because
>>> one process filled one end of the tube (the breakpoint was hit)
>>> doesn't mean the contents have to appear at that instant at the other
>>> end. So the change in behaviour in sle12 is not a genuine bug. It
>>> _might_ be an unintented change, though, that's why kernel people
>>> should comment on this. If there are no terribly good reasons for
>>> this change I'd consider it a quality-of-implementation regression in
>>> sle12.
>
> So, I'd accept this being declared a non-bug, but it is certainly a change
> in behaviour that's visible for our debugger team.
>
>> However, that said, if this is a regression (regression as in "it broke
>> something that used to work", not regression as in "this new thing I'm
>> writing doesn't behave the way I want it to" :) )
>>
>> Help me understand the use-case here: are you using pty i/o to debug the
>> debugger?
>
> Nic is working on the Cobol debugger, but I think this pty i/o is rather a
> part of the normal interaction between a debugged Cobol process and the
> debugger; that's just a theory, Nic is authorative here. But this change
> in behaviour _did_ result in real testsuite regressions, so it's not
> something that he wanted to write from scratch.

I'd like to understand why the debugger cares about when pty i/o shows up
and why there is a testsuite to check for that.

Does the debuggee know about the debugger, or is the pty i/o just stdout/stderr?

This doesn't seem stable in the face of multiple threads of execution in
the debuggee (or grandchild processes); IOW, pty slave writes from the
debuggee may continue from other non-TRACEME threads. Presumably that i/o
isn't being read either.

> (FWIW: I do think it's a better QoI factor if something returns data from
> a tube if we can know via side channels (break points) that something must
> have been written locally to the other end of the tube, if that can be
> ensured without too much other work)

Well, if the debugger simply continues to monitor the pty master, the i/o
will arrive.

I think it would be a shame if ptrace() usage forced a whole class of
i/o to be synchronous.

Regards,
Peter Hurley

2015-05-04 16:56:55

by Michael Matz

[permalink] [raw]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

Hi,

On Mon, 4 May 2015, Peter Hurley wrote:

> I think it would be a shame if ptrace() usage forced a whole class of
> i/o to be synchronous.

I think Neils patch doesn't do that, does it? If it has an indication
that in fact some data must be there but isn't it pulls it, otherwise it's
the same as after your patch/current state?

(Leaving the debugger question to Nic; but I guess it's similar interface
like gdb. Once you come back into the debugger (breakpoint hit) it looks
once for input on the tubes of the debuggee and then enters a prompt; it
doesn't continue looking for input until you continue the debuggee (1).
ptys would be used because it's Cobol, the programs contain data entry
masks presumably needing a real tty, not just pipes. That usecase would
be broken now; the tty provided by the debugger doesn't reflect the real
screen that the debuggee actually generated before the breakpoint. Note
how pipes in my test program _are_ synchronuous in this sense, just ptys
aren't.)


Ciao,
Michael.

(1) And in a single threaded debugger (no matter if the debuggee is
multithreaded) it would be awkward to implement. After read returns 0
you'd again call poll, which indicates data is there, and read again.
You repeat that until $SOMEWHEN. But when is it enough? Two loops, 10,
1000? To not sit in a tight loop you'd also add some nanosleeps, but that
would add unnecessary lags.

Basically, whenever poll indicates that read won't block then it should
also return some data, not 0, if at all reasonably implementable; i.e.
some progress should be guaranteed. I realize that this isn't always the
case, but here it is. In code, this loop:

while (poll ([fd, POLLIN], 0) == 1)
// So, read won't block, yippie
if (read (fd, ...) == 0)
continue;

shouldn't become a tight loop, without the read making progress but the
kernel continuously stating "yep, there's data available", until some
random point in the future.

2015-05-04 18:42:53

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

On 05/04/2015 12:56 PM, Michael Matz wrote:
> Hi,
>
> On Mon, 4 May 2015, Peter Hurley wrote:
>
>> I think it would be a shame if ptrace() usage forced a whole class of
>> i/o to be synchronous.
>
> I think Neils patch doesn't do that, does it?

Yes. Non-blocking i/o would now have to wait until the input worker has
completed (at a time when the input worker may not even be running on a
cpu yet).

> If it has an indication
> that in fact some data must be there but isn't it pulls it, otherwise it's
> the same as after your patch/current state?

At the point when the write() returns from the child process, the input
worker may not even have run yet. The patch will now force the reader to
wait until the input worker has started and run to completion.

Moreover, prior to 4.1, the read i/o loop is sloppy wrt kicking the
the input worker, so there is every likelihood that this patch would
force extra waits on a non-blocking reader even though no input was actually
being written.

> (Leaving the debugger question to Nic; but I guess it's similar interface
> like gdb. Once you come back into the debugger (breakpoint hit) it looks
> once for input on the tubes of the debuggee and then enters a prompt; it
> doesn't continue looking for input until you continue the debuggee (1).

That's exactly what gdb does. Below is simple test jig [2] where the child
outputs while at the gdb prompt.


> ptys would be used because it's Cobol, the programs contain data entry
> masks presumably needing a real tty, not just pipes. That usecase would
> be broken now; the tty provided by the debugger doesn't reflect the real
> screen that the debuggee actually generated before the breakpoint. Note
> how pipes in my test program _are_ synchronuous in this sense, just ptys
> aren't.)

What's interesting about that expectation is that it would never work
on an actual tty.


> (1) And in a single threaded debugger (no matter if the debuggee is
> multithreaded) it would be awkward to implement. After read returns 0
> you'd again call poll, which indicates data is there, and read again.
> You repeat that until $SOMEWHEN. But when is it enough? Two loops, 10,
> 1000? To not sit in a tight loop you'd also add some nanosleeps, but that
> would add unnecessary lags.

That's not what's happening.

poll() with a timeout of 0 returns immediately, even if no file descriptors
are ready for i/o. The poll() is returning 0 because there is no data to read,
so the loop is exiting.

If poll() returned non-zero and read() returned no data, that would
definitely be a bug.

> Basically, whenever poll indicates that read won't block then it should
> also return some data, not 0, if at all reasonably implementable; i.e.
> some progress should be guaranteed.

ttys have a further refinement of the behavior of read() and poll(), which
is controlled by the VMIN and VTIME values in the termios structure.

But note: the pty master does not allow its termios to be programmed --
a pty master read() is always non-blocking.

> I realize that this isn't always the
> case, but here it is. In code, this loop:
>
> while (poll ([fd, POLLIN], 0) == 1)
> // So, read won't block, yippie
> if (read (fd, ...) == 0)
> continue;
>
> shouldn't become a tight loop, without the read making progress but the
> kernel continuously stating "yep, there's data available", until some
> random point in the future.

Yeah, that would be broken but, again, that's not what's happening here.

Regards,
Peter Hurley


--- >% ---
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>

#define BRKPT asm("int $3")

void child()
{
sleep(1);
printf("Hello, world!");
}

int main()
{
int child_id;

setbuf(stdout, NULL);


child_id = fork();
switch (child_id) {
case -1:
printf("fork: %s (code: %d)\n", strerror(errno), errno);
exit(EXIT_FAILURE);

case 0:
child();
break;

default: /* parent */
BRKPT;
break;
}

return 0;
}

2015-05-05 09:56:23

by Nic Percival

[permalink] [raw]
Subject: RE: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

Michael is correct.
Our COBOL debugger has a test feature whereby we can drive it to step through debugging code, hitting breakpoints and so on.
The debugger maintains a 'user screen' which is what the 'debuggee' process has displayed.
This is communicated to the debugger with pseudo-tty's.
The state of this user screen is checked as part of this (and other) tests.

The actual test failure is a failure of some text to be displayed on the debuggee user screen when we know, given it has hit a certain breakpoint, that the text has been written.

What is worse is its non-deterministic. Sometimes the text makes it and is displayed, so it wouldn't even be practical to modify the test to make it pass.
We wouldn't really want to do that anyway - the test is just fine on other earlier SUSE, on RedHat (intel and 390), HP/UX, AIX and Solaris.

Thanks,
Nic

-----Original Message-----
From: Michael Matz [mailto:[email protected]]
Sent: 04 May 2015 13:24
To: Peter Hurley
Cc: NeilBrown; Nic Percival; Greg Kroah-Hartman; Jiri Slaby; [email protected]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

Hi,

On Fri, 1 May 2015, Peter Hurley wrote:

> I don't think this a real bug, in the sense that pty i/o is not
> synchronous, in the same way that tty i/o is not synchronous.

Here's what I wrote internally about my speculations about this being a bug or not:

> > I also never hit it with pipes (remove the USEPTY define), also not
> > on sle12, so it must be some change specific to the pty implementation.
> >
> > Now, all of this is of course unspecified. There are two
> > asynchronous processes involved, and a buffered tube between them.
> > Just because one process filled one end of the tube (the breakpoint
> > was hit) doesn't mean the contents have to appear at that instant at
> > the other end. So the change in behaviour in sle12 is not a genuine
> > bug. It _might_ be an unintented change, though, that's why kernel
> > people should comment on this. If there are no terribly good
> > reasons for this change I'd consider it a quality-of-implementation
> > regression in sle12.

So, I'd accept this being declared a non-bug, but it is certainly a change in behaviour that's visible for our debugger team.

> However, that said, if this is a regression (regression as in "it
> broke something that used to work", not regression as in "this new
> thing I'm writing doesn't behave the way I want it to" :) )
>
> Help me understand the use-case here: are you using pty i/o to debug
> the debugger?

Nic is working on the Cobol debugger, but I think this pty i/o is rather a part of the normal interaction between a debugged Cobol process and the debugger; that's just a theory, Nic is authorative here. But this change in behaviour _did_ result in real testsuite regressions, so it's not something that he wanted to write from scratch.

(FWIW: I do think it's a better QoI factor if something returns data from a tube if we can know via side channels (break points) that something must have been written locally to the other end of the tube, if that can be ensured without too much other work)


Ciao,
Michael.


This message has been scanned for malware by Websense. http://www.websense.com

2015-05-05 11:19:01

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'


A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top


On 05/05/2015 04:20 AM, Nic Percival wrote:
> Michael is correct.
> Our COBOL debugger has a test feature whereby we can drive it to step through debugging code, hitting breakpoints and so on.
> The debugger maintains a 'user screen' which is what the 'debuggee' process has displayed.
> This is communicated to the debugger with pseudo-tty's.
> The state of this user screen is checked as part of this (and other) tests.

So the debugger doesn't display output from other non-TRACEME threads or
child processes of the debuggee, right?

When that's fixed, you'll see that the "test failure" has gone away.

> The actual test failure is a failure of some text to be displayed on the debuggee user screen when we know, given it has hit a certain breakpoint, that the text has been written.
>
> What is worse is its non-deterministic.

That your test is non-deterministic stems from the fact that
the i/o is asynchronous.

You would experience the same problem if your test setup
was a tty in loopback.

> Sometimes the text makes it and is displayed, so it wouldn't even be practical to modify the test to make it pass.
> We wouldn't really want to do that anyway - the test is just fine on other earlier SUSE, on RedHat (intel and 390), HP/UX, AIX and Solaris.

There is a reason Linux is the platform of choice for scalability.

Regards,
Peter Hurley

> -----Original Message-----
> From: Michael Matz [mailto:[email protected]]
> Sent: 04 May 2015 13:24
> To: Peter Hurley
> Cc: NeilBrown; Nic Percival; Greg Kroah-Hartman; Jiri Slaby; [email protected]
> Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'
>
> Hi,
>
> On Fri, 1 May 2015, Peter Hurley wrote:
>
>> I don't think this a real bug, in the sense that pty i/o is not
>> synchronous, in the same way that tty i/o is not synchronous.
>
> Here's what I wrote internally about my speculations about this being a bug or not:
>
>>> I also never hit it with pipes (remove the USEPTY define), also not
>>> on sle12, so it must be some change specific to the pty implementation.
>>>
>>> Now, all of this is of course unspecified. There are two
>>> asynchronous processes involved, and a buffered tube between them.
>>> Just because one process filled one end of the tube (the breakpoint
>>> was hit) doesn't mean the contents have to appear at that instant at
>>> the other end. So the change in behaviour in sle12 is not a genuine
>>> bug. It _might_ be an unintented change, though, that's why kernel
>>> people should comment on this. If there are no terribly good
>>> reasons for this change I'd consider it a quality-of-implementation
>>> regression in sle12.
>
> So, I'd accept this being declared a non-bug, but it is certainly a change in behaviour that's visible for our debugger team.
>
>> However, that said, if this is a regression (regression as in "it
>> broke something that used to work", not regression as in "this new
>> thing I'm writing doesn't behave the way I want it to" :) )
>>
>> Help me understand the use-case here: are you using pty i/o to debug
>> the debugger?
>
> Nic is working on the Cobol debugger, but I think this pty i/o is rather a part of the normal interaction between a debugged Cobol process and the debugger; that's just a theory, Nic is authorative here. But this change in behaviour _did_ result in real testsuite regressions, so it's not something that he wanted to write from scratch.
>
> (FWIW: I do think it's a better QoI factor if something returns data from a tube if we can know via side channels (break points) that something must have been written locally to the other end of the tube, if that can be ensured without too much other work)
>
>
> Ciao,
> Michael.
>
>
> This message has been scanned for malware by Websense. http://www.websense.com
>

2015-05-05 12:04:26

by Nic Percival

[permalink] [raw]
Subject: RE: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'


There is only ever one debuggee process.
My original demo (and indeed the original test failure) is not threaded. The debugger is multi-threaded.

I've brought in Chris, Fletch and Paul, my immediate colleagues, into the discussion.

The email thread is getting a little tangled, however, from my standpoint I have..

1) poll tells us we have nothing to read on a pty, when we know something was written into the other end.

2) Given that 'poll' is not telling us that data has been written into the pty, what can we use? Surely that is what poll is for.

3) If a debuggee program has displayed 'how old are you?' and then hit a breakpoint on the 'ACCEPT' response, then the question might very well not be displayed, despite the debugger sitting on the statement some way subsequent to the display.

4) If I understand correctly, the modification is a performance enhancement. Obviously in the case of 'ptrace' debugging, performance is not a requirement.

5) Given 'xterm' use pty's, could a scenario happen where a user is prompted 'How old are you?' in the xterm, but an input (getchar, whatever) is hit before that output is displayed? With or without ptrace?

Thanks,
Nic



-----Original Message-----
From: Peter Hurley [mailto:[email protected]]
Sent: 05 May 2015 12:19
To: Nic Percival; Michael Matz
Cc: NeilBrown; Greg Kroah-Hartman; Jiri Slaby; [email protected]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'


A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top


On 05/05/2015 04:20 AM, Nic Percival wrote:
> Michael is correct.
> Our COBOL debugger has a test feature whereby we can drive it to step through debugging code, hitting breakpoints and so on.
> The debugger maintains a 'user screen' which is what the 'debuggee' process has displayed.
> This is communicated to the debugger with pseudo-tty's.
> The state of this user screen is checked as part of this (and other) tests.

So the debugger doesn't display output from other non-TRACEME threads or child processes of the debuggee, right?

When that's fixed, you'll see that the "test failure" has gone away.

> The actual test failure is a failure of some text to be displayed on the debuggee user screen when we know, given it has hit a certain breakpoint, that the text has been written.
>
> What is worse is its non-deterministic.

That your test is non-deterministic stems from the fact that the i/o is asynchronous.

You would experience the same problem if your test setup was a tty in loopback.

> Sometimes the text makes it and is displayed, so it wouldn't even be practical to modify the test to make it pass.
> We wouldn't really want to do that anyway - the test is just fine on other earlier SUSE, on RedHat (intel and 390), HP/UX, AIX and Solaris.

There is a reason Linux is the platform of choice for scalability.

Regards,
Peter Hurley

> -----Original Message-----
> From: Michael Matz [mailto:[email protected]]
> Sent: 04 May 2015 13:24
> To: Peter Hurley
> Cc: NeilBrown; Nic Percival; Greg Kroah-Hartman; Jiri Slaby;
> [email protected]
> Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'
>
> Hi,
>
> On Fri, 1 May 2015, Peter Hurley wrote:
>
>> I don't think this a real bug, in the sense that pty i/o is not
>> synchronous, in the same way that tty i/o is not synchronous.
>
> Here's what I wrote internally about my speculations about this being a bug or not:
>
>>> I also never hit it with pipes (remove the USEPTY define), also not
>>> on sle12, so it must be some change specific to the pty implementation.
>>>
>>> Now, all of this is of course unspecified. There are two
>>> asynchronous processes involved, and a buffered tube between them.
>>> Just because one process filled one end of the tube (the breakpoint
>>> was hit) doesn't mean the contents have to appear at that instant at
>>> the other end. So the change in behaviour in sle12 is not a genuine
>>> bug. It _might_ be an unintented change, though, that's why kernel
>>> people should comment on this. If there are no terribly good
>>> reasons for this change I'd consider it a quality-of-implementation
>>> regression in sle12.
>
> So, I'd accept this being declared a non-bug, but it is certainly a change in behaviour that's visible for our debugger team.
>
>> However, that said, if this is a regression (regression as in "it
>> broke something that used to work", not regression as in "this new
>> thing I'm writing doesn't behave the way I want it to" :) )
>>
>> Help me understand the use-case here: are you using pty i/o to debug
>> the debugger?
>
> Nic is working on the Cobol debugger, but I think this pty i/o is rather a part of the normal interaction between a debugged Cobol process and the debugger; that's just a theory, Nic is authorative here. But this change in behaviour _did_ result in real testsuite regressions, so it's not something that he wanted to write from scratch.
>
> (FWIW: I do think it's a better QoI factor if something returns data
> from a tube if we can know via side channels (break points) that
> something must have been written locally to the other end of the tube,
> if that can be ensured without too much other work)
>
>
> Ciao,
> Michael.
>
>
> This message has been scanned for malware by Websense.
> http://www.websense.com
>

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-05-05 13:29:16

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

Stop top-posting.

On 05/05/2015 08:03 AM, Nic Percival wrote:
>
> There is only ever one debuggee process.
> My original demo (and indeed the original test failure) is not threaded. The debugger is multi-threaded.
>
> I've brought in Chris, Fletch and Paul, my immediate colleagues, into the discussion.
>
> The email thread is getting a little tangled, however, from my standpoint I have..
>
> 1) poll tells us we have nothing to read on a pty, when we know something was written into the other end.

You're using a synchronization mechanism (ptrace) to validate an
asynchronous process (tty i/o). That's not going to work.

> 2) Given that 'poll' is not telling us that data has been written into the pty, what can we use? Surely that is what poll is for.

poll() doesn't tell you that nothing has been written.

You're inferring that using a broken understanding of terminal i/o:
ttys are not synchronous pipes.

> 3) If a debuggee program has displayed 'how old are you?' and then hit a breakpoint on the 'ACCEPT' response, then the question might very well not be displayed, despite the debugger sitting on the statement some way subsequent to the display.

Let's extend your logic process here to a general-purpose debugger
that can control all output devices.

1. The debugger and debuggee are running on X-Windows.
2. The debuggee outputs 'how old are you?'
3. The debugger immediately halts the debuggee and all output devices.

The output will not appear on the monitor because X-Windows
output is asynchronous. So is terminal i/o.


> 4) If I understand correctly, the modification is a performance enhancement. Obviously in the case of 'ptrace' debugging, performance is not a requirement.

Nothing obvious about it. Not all uses of ptrace are interactive, and certainly don't
want alternate behavior based on whether the process is ptraced.

> 5) Given 'xterm' use pty's, could a scenario happen where a user is prompted 'How old are you?' in the xterm, but an input (getchar, whatever) is hit before that output is displayed? With or without ptrace?

Of course. It's called typeahead. Since tty i/o is buffered, the following is possible:

1. The user types '15\r'
2. The process writes 'How old are you?'
3. The process reads '15\n'

Processes that don't want typeahead call tcflush() before reading input.

Regards,
Peter Hurley

> Thanks,
> Nic
>
>
>
> -----Original Message-----
> From: Peter Hurley [mailto:[email protected]]
> Sent: 05 May 2015 12:19
> To: Nic Percival; Michael Matz
> Cc: NeilBrown; Greg Kroah-Hartman; Jiri Slaby; [email protected]
> Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'
>
>
> A: No.
> Q: Should I include quotations after my reply?
>
> http://daringfireball.net/2007/07/on_top
>
>
> On 05/05/2015 04:20 AM, Nic Percival wrote:
>> Michael is correct.
>> Our COBOL debugger has a test feature whereby we can drive it to step through debugging code, hitting breakpoints and so on.
>> The debugger maintains a 'user screen' which is what the 'debuggee' process has displayed.
>> This is communicated to the debugger with pseudo-tty's.
>> The state of this user screen is checked as part of this (and other) tests.
>
> So the debugger doesn't display output from other non-TRACEME threads or child processes of the debuggee, right?
>
> When that's fixed, you'll see that the "test failure" has gone away.
>
>> The actual test failure is a failure of some text to be displayed on the debuggee user screen when we know, given it has hit a certain breakpoint, that the text has been written.
>>
>> What is worse is its non-deterministic.
>
> That your test is non-deterministic stems from the fact that the i/o is asynchronous.
>
> You would experience the same problem if your test setup was a tty in loopback.
>
>> Sometimes the text makes it and is displayed, so it wouldn't even be practical to modify the test to make it pass.
>> We wouldn't really want to do that anyway - the test is just fine on other earlier SUSE, on RedHat (intel and 390), HP/UX, AIX and Solaris.
>
> There is a reason Linux is the platform of choice for scalability.
>
> Regards,
> Peter Hurley
>
>> -----Original Message-----
>> From: Michael Matz [mailto:[email protected]]
>> Sent: 04 May 2015 13:24
>> To: Peter Hurley
>> Cc: NeilBrown; Nic Percival; Greg Kroah-Hartman; Jiri Slaby;
>> [email protected]
>> Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'
>>
>> Hi,
>>
>> On Fri, 1 May 2015, Peter Hurley wrote:
>>
>>> I don't think this a real bug, in the sense that pty i/o is not
>>> synchronous, in the same way that tty i/o is not synchronous.
>>
>> Here's what I wrote internally about my speculations about this being a bug or not:
>>
>>>> I also never hit it with pipes (remove the USEPTY define), also not
>>>> on sle12, so it must be some change specific to the pty implementation.
>>>>
>>>> Now, all of this is of course unspecified. There are two
>>>> asynchronous processes involved, and a buffered tube between them.
>>>> Just because one process filled one end of the tube (the breakpoint
>>>> was hit) doesn't mean the contents have to appear at that instant at
>>>> the other end. So the change in behaviour in sle12 is not a genuine
>>>> bug. It _might_ be an unintented change, though, that's why kernel
>>>> people should comment on this. If there are no terribly good
>>>> reasons for this change I'd consider it a quality-of-implementation
>>>> regression in sle12.
>>
>> So, I'd accept this being declared a non-bug, but it is certainly a change in behaviour that's visible for our debugger team.
>>
>>> However, that said, if this is a regression (regression as in "it
>>> broke something that used to work", not regression as in "this new
>>> thing I'm writing doesn't behave the way I want it to" :) )
>>>
>>> Help me understand the use-case here: are you using pty i/o to debug
>>> the debugger?
>>
>> Nic is working on the Cobol debugger, but I think this pty i/o is rather a part of the normal interaction between a debugged Cobol process and the debugger; that's just a theory, Nic is authorative here. But this change in behaviour _did_ result in real testsuite regressions, so it's not something that he wanted to write from scratch.
>>
>> (FWIW: I do think it's a better QoI factor if something returns data
>> from a tube if we can know via side channels (break points) that
>> something must have been written locally to the other end of the tube,
>> if that can be ensured without too much other work)
>>
>>
>> Ciao,
>> Michael.
>>
>>
>> This message has been scanned for malware by Websense.
>> http://www.websense.com
>>
>

2015-05-05 16:09:44

by Chris Purvis

[permalink] [raw]
Subject: RE: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

Peter,

So is calling tcflush() a solution here?

Regards,
Chris
--
Chris Purvis
Senior Development Manager

Micro Focus

[email protected]
The Lawn, 22-30 Old Bath Road
Newbury, Berkshire, RG14 1QN, UK
Direct: +44 1635 565282


-----Original Message-----
From: Peter Hurley [mailto:[email protected]]
Sent: 05 May 2015 14:29
To: Nic Percival; Michael Matz; Kevin Fletcher; Paul Matthews; Chris Purvis
Cc: NeilBrown; Greg Kroah-Hartman; Jiri Slaby; [email protected]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

Stop top-posting.

On 05/05/2015 08:03 AM, Nic Percival wrote:
>
> There is only ever one debuggee process.
> My original demo (and indeed the original test failure) is not threaded. The debugger is multi-threaded.
>
> I've brought in Chris, Fletch and Paul, my immediate colleagues, into the discussion.
>
> The email thread is getting a little tangled, however, from my standpoint I have..
>
> 1) poll tells us we have nothing to read on a pty, when we know something was written into the other end.

You're using a synchronization mechanism (ptrace) to validate an
asynchronous process (tty i/o). That's not going to work.

> 2) Given that 'poll' is not telling us that data has been written into the pty, what can we use? Surely that is what poll is for.

poll() doesn't tell you that nothing has been written.

You're inferring that using a broken understanding of terminal i/o:
ttys are not synchronous pipes.

> 3) If a debuggee program has displayed 'how old are you?' and then hit a breakpoint on the 'ACCEPT' response, then the question might very well not be displayed, despite the debugger sitting on the statement some way subsequent to the display.

Let's extend your logic process here to a general-purpose debugger
that can control all output devices.

1. The debugger and debuggee are running on X-Windows.
2. The debuggee outputs 'how old are you?'
3. The debugger immediately halts the debuggee and all output devices.

The output will not appear on the monitor because X-Windows
output is asynchronous. So is terminal i/o.


> 4) If I understand correctly, the modification is a performance enhancement. Obviously in the case of 'ptrace' debugging, performance is not a requirement.

Nothing obvious about it. Not all uses of ptrace are interactive, and certainly don't
want alternate behavior based on whether the process is ptraced.

> 5) Given 'xterm' use pty's, could a scenario happen where a user is prompted 'How old are you?' in the xterm, but an input (getchar, whatever) is hit before that output is displayed? With or without ptrace?

Of course. It's called typeahead. Since tty i/o is buffered, the following is possible:

1. The user types '15\r'
2. The process writes 'How old are you?'
3. The process reads '15\n'

Processes that don't want typeahead call tcflush() before reading input.

Regards,
Peter Hurley

> Thanks,
> Nic
>
>
>
> -----Original Message-----
> From: Peter Hurley [mailto:[email protected]]
> Sent: 05 May 2015 12:19
> To: Nic Percival; Michael Matz
> Cc: NeilBrown; Greg Kroah-Hartman; Jiri Slaby; [email protected]
> Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'
>
>
> A: No.
> Q: Should I include quotations after my reply?
>
> http://daringfireball.net/2007/07/on_top
>
>
> On 05/05/2015 04:20 AM, Nic Percival wrote:
>> Michael is correct.
>> Our COBOL debugger has a test feature whereby we can drive it to step through debugging code, hitting breakpoints and so on.
>> The debugger maintains a 'user screen' which is what the 'debuggee' process has displayed.
>> This is communicated to the debugger with pseudo-tty's.
>> The state of this user screen is checked as part of this (and other) tests.
>
> So the debugger doesn't display output from other non-TRACEME threads or child processes of the debuggee, right?
>
> When that's fixed, you'll see that the "test failure" has gone away.
>
>> The actual test failure is a failure of some text to be displayed on the debuggee user screen when we know, given it has hit a certain breakpoint, that the text has been written.
>>
>> What is worse is its non-deterministic.
>
> That your test is non-deterministic stems from the fact that the i/o is asynchronous.
>
> You would experience the same problem if your test setup was a tty in loopback.
>
>> Sometimes the text makes it and is displayed, so it wouldn't even be practical to modify the test to make it pass.
>> We wouldn't really want to do that anyway - the test is just fine on other earlier SUSE, on RedHat (intel and 390), HP/UX, AIX and Solaris.
>
> There is a reason Linux is the platform of choice for scalability.
>
> Regards,
> Peter Hurley
>
>> -----Original Message-----
>> From: Michael Matz [mailto:[email protected]]
>> Sent: 04 May 2015 13:24
>> To: Peter Hurley
>> Cc: NeilBrown; Nic Percival; Greg Kroah-Hartman; Jiri Slaby;
>> [email protected]
>> Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'
>>
>> Hi,
>>
>> On Fri, 1 May 2015, Peter Hurley wrote:
>>
>>> I don't think this a real bug, in the sense that pty i/o is not
>>> synchronous, in the same way that tty i/o is not synchronous.
>>
>> Here's what I wrote internally about my speculations about this being a bug or not:
>>
>>>> I also never hit it with pipes (remove the USEPTY define), also not
>>>> on sle12, so it must be some change specific to the pty implementation.
>>>>
>>>> Now, all of this is of course unspecified. There are two
>>>> asynchronous processes involved, and a buffered tube between them.
>>>> Just because one process filled one end of the tube (the breakpoint
>>>> was hit) doesn't mean the contents have to appear at that instant at
>>>> the other end. So the change in behaviour in sle12 is not a genuine
>>>> bug. It _might_ be an unintented change, though, that's why kernel
>>>> people should comment on this. If there are no terribly good
>>>> reasons for this change I'd consider it a quality-of-implementation
>>>> regression in sle12.
>>
>> So, I'd accept this being declared a non-bug, but it is certainly a change in behaviour that's visible for our debugger team.
>>
>>> However, that said, if this is a regression (regression as in "it
>>> broke something that used to work", not regression as in "this new
>>> thing I'm writing doesn't behave the way I want it to" :) )
>>>
>>> Help me understand the use-case here: are you using pty i/o to debug
>>> the debugger?
>>
>> Nic is working on the Cobol debugger, but I think this pty i/o is rather a part of the normal interaction between a debugged Cobol process and the debugger; that's just a theory, Nic is authorative here. But this change in behaviour _did_ result in real testsuite regressions, so it's not something that he wanted to write from scratch.
>>
>> (FWIW: I do think it's a better QoI factor if something returns data
>> from a tube if we can know via side channels (break points) that
>> something must have been written locally to the other end of the tube,
>> if that can be ensured without too much other work)
>>
>>
>> Ciao,
>> Michael.
>>
>>
>> This message has been scanned for malware by Websense.
>> http://www.websense.com
>>
>

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-05-05 13:36:00

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

On 05/05/2015 09:34 AM, Chris Purvis wrote:
> Peter,
>
> So is calling tcflush() a solution here?
>
> Regards,
> Chris

What is with the top-posting?

2015-05-05 13:40:02

by Chris Purvis

[permalink] [raw]
Subject: RE: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

A way of giving context.



This message has been scanned for malware by Websense. http://www.websense.com
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-05-05 22:17:42

by Chris Purvis

[permalink] [raw]
Subject: RE: [PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

Apologies all. Nic and I did not realise that an internal email had been cross-posted to a public mail list (let alone one with strict email formatting rules!), and were having a hard time understanding the context for the emails we were receiving.

I have a certain amount of experience of asynchronous communication and protocol design: we aren't novices in this area. One would hope in the kind of intra-machine protocol that we're using here that, if we know the sender is halted, there should be a way of clearing the contents of the channel so that the receiver can get hold of whatever has been put in to it.
Can tcflush() (or some similar API) be used to resolve our debugging scenario?

Regards,
Chris
--
Chris Purvis
Senior Development Manager

Micro Focus

[email protected]
The Lawn, 22-30 Old Bath Road
Newbury, Berkshire, RG14 1QN, UK
Direct: +44 1635 565282



This message has been scanned for malware by Websense. http://www.websense.com
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-05-05 22:59:21

by NeilBrown

[permalink] [raw]
Subject: [PATCH man-pages] pty.7: clarify asynchronous nature of PTY IO.


A PTY is not like a pipe - there may be delayed between
data being written at one end and it being available at the other.

This became particularly apparent after
commit f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")

in Linux 3.12

Signed-off-by: NeilBrown <[email protected]>

---

Peter: does this seem reasonable and accurate to you?

MichaelK: Would you prefer the commit ID in the man page. It isn't so much
a deliberate change as a code improvement which caused problems for certain
use cases which depended on undefined behaviour.
Thread at https://lkml.org/lkml/2015/5/1/35

NeilBrown


diff --git a/man7/pty.7 b/man7/pty.7
index 1332d11d9ca2..6c9ae182925c 100644
--- a/man7/pty.7
+++ b/man7/pty.7
@@ -56,6 +56,12 @@ terminal emulators,
and
.BR expect (1).

+Data flow between master and slave is handle asynchronously, much like
+data flow with a physical TTY. Data written to the slave will be
+available at the master promptly, but may not be available
+immediately. Similarly there may be a small processing delay between
+a write to the master, and the effect being visible at the slave.
+
Historically, two pseudoterminal APIs have evolved: BSD and System V.
SUSv1 standardized a pseudoterminal API based on the System V API,
and this API should be employed in all new programs that use


Attachments:
(No filename) (811.00 B)
OpenPGP digital signature
Subject: Re: [PATCH man-pages] pty.7: clarify asynchronous nature of PTY IO.

Hi Neil,

On 05/06/2015 12:59 AM, NeilBrown wrote:
>
> A PTY is not like a pipe - there may be delayed between
> data being written at one end and it being available at the other.
> This became particularly apparent after
> commit f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")
>
> in Linux 3.12
>
> Signed-off-by: NeilBrown <[email protected]>
>
> ---

Thanks! Applied and tweaked a very little.

> Peter: does this seem reasonable and accurate to you?

I've got the patch sitting in a branch, in case Peter has suggestions for changes.

> MichaelK: Would you prefer the commit ID in the man page. It isn't so much
> a deliberate change as a code improvement which caused problems for certain
> use cases which depended on undefined behaviour.
> Thread at https://lkml.org/lkml/2015/5/1/35

No, it's fine as is, thanks.

Cheers,

Michael


> diff --git a/man7/pty.7 b/man7/pty.7
> index 1332d11d9ca2..6c9ae182925c 100644
> --- a/man7/pty.7
> +++ b/man7/pty.7
> @@ -56,6 +56,12 @@ terminal emulators,
> and
> .BR expect (1).
>
> +Data flow between master and slave is handle asynchronously, much like
> +data flow with a physical TTY. Data written to the slave will be
> +available at the master promptly, but may not be available
> +immediately. Similarly there may be a small processing delay between
> +a write to the master, and the effect being visible at the slave.
> +
> Historically, two pseudoterminal APIs have evolved: BSD and System V.
> SUSv1 standardized a pseudoterminal API based on the System V API,
> and this API should be employed in all new programs that use
>


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2015-05-06 13:36:14

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH man-pages] pty.7: clarify asynchronous nature of PTY IO.

On 05/06/2015 08:26 AM, Michael Kerrisk (man-pages) wrote:
> On 05/06/2015 12:59 AM, NeilBrown wrote:
>> Peter: does this seem reasonable and accurate to you?
>
> I've got the patch sitting in a branch, in case Peter has suggestions for changes.

This is fine with me.

Regards,
Peter Hurley

Subject: Re: [PATCH man-pages] pty.7: clarify asynchronous nature of PTY IO.

On 6 May 2015 at 15:36, Peter Hurley <[email protected]> wrote:
> On 05/06/2015 08:26 AM, Michael Kerrisk (man-pages) wrote:
>> On 05/06/2015 12:59 AM, NeilBrown wrote:
>>> Peter: does this seem reasonable and accurate to you?
>>
>> I've got the patch sitting in a branch, in case Peter has suggestions for changes.
>
> This is fine with me.

Okay. thanks, Peter.

Cheers,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/