2016-04-07 12:53:34

by Cyrill Gorcunov

[permalink] [raw]
Subject: [PATCH resend v2] tty: n_tty -- Add new TIOCPEEKRAW ioctl to peek unread data

Currently when we checkpoint PTY connections we ignore data queued inside
read buffer of ldisk, such data is barely lost after the restore. So we
would like to have an ability to dump and restore it.

Here is a new ioctl code which simply copies data from read buffer into
the userspace without any additional processing (just like terminal is
sitting in a raw mode).

The idea behind is when we checkpointing PTY pairs we peek this unread
data into images on disk and on restore phase we find the linked peer
and write it back thus reading peer will see it in read buffer.

I tried several approaches (including simply reuse of canon_copy_from_read_buf
and copy_from_read_buf) but it makes code more complicated, so I end up
in plain copying of buffer data.

Strictly speaking, we could try handle it inside CRIU itself, as
Peter Hurley proposed (switch terminal to raw mode, read data,
and return original termios back) but in this case we may hit
the scenario when we read data and failed to restore it back
(due to any error) leaving the task we're dumping without
read buffer, thus peeking data looks like be the only
safe way.

v2:
- Use @char in ioctl definition.

Signed-off-by: Cyrill Gorcunov <[email protected]>
CC: Jiri Slaby <[email protected]>
CC: Peter Hurley <[email protected]>
CC: Greg Kroah-Hartman <[email protected]>
CC: Andrey Vagin <[email protected]>
CC: Pavel Emelianov <[email protected]>
CC: Vladimir Davydov <[email protected]>
CC: Konstantin Khorenko <[email protected]>
---
drivers/tty/n_tty.c | 22 ++++++++++++++++++++++
include/uapi/asm-generic/ioctls.h | 1 +
2 files changed, 23 insertions(+)

Index: linux-ml.git/drivers/tty/n_tty.c
===================================================================
--- linux-ml.git.orig/drivers/tty/n_tty.c
+++ linux-ml.git/drivers/tty/n_tty.c
@@ -2420,6 +2420,26 @@ static unsigned long inq_canon(struct n_
return nr;
}

+static ssize_t n_tty_peek_raw(struct tty_struct *tty, unsigned char __user *buf)
+{
+ struct n_tty_data *ldata = tty->disc_data;
+ ssize_t retval;
+
+ if (!mutex_trylock(&ldata->atomic_read_lock))
+ return -EAGAIN;
+
+ down_read(&tty->termios_rwsem);
+ retval = read_cnt(ldata);
+ if (retval) {
+ const unsigned char *from = read_buf_addr(ldata, ldata->read_tail);
+ if (copy_to_user(buf, from, retval))
+ retval = -EFAULT;
+ }
+ up_read(&tty->termios_rwsem);
+ mutex_unlock(&ldata->atomic_read_lock);
+ return retval;
+}
+
static int n_tty_ioctl(struct tty_struct *tty, struct file *file,
unsigned int cmd, unsigned long arg)
{
@@ -2437,6 +2457,8 @@ static int n_tty_ioctl(struct tty_struct
retval = read_cnt(ldata);
up_write(&tty->termios_rwsem);
return put_user(retval, (unsigned int __user *) arg);
+ case TIOCPEEKRAW:
+ return n_tty_peek_raw(tty, (unsigned char __user *) arg);
default:
return n_tty_ioctl_helper(tty, file, cmd, arg);
}
Index: linux-ml.git/include/uapi/asm-generic/ioctls.h
===================================================================
--- linux-ml.git.orig/include/uapi/asm-generic/ioctls.h
+++ linux-ml.git/include/uapi/asm-generic/ioctls.h
@@ -77,6 +77,7 @@
#define TIOCGPKT _IOR('T', 0x38, int) /* Get packet mode state */
#define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */
#define TIOCGEXCL _IOR('T', 0x40, int) /* Get exclusive mode state */
+#define TIOCPEEKRAW _IOR('T', 0x41, char)

#define FIONCLEX 0x5450
#define FIOCLEX 0x5451


2016-04-11 05:43:34

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH resend v2] tty: n_tty -- Add new TIOCPEEKRAW ioctl to peek unread data

Hi Cyrill,

On 04/07/2016 05:53 AM, Cyrill Gorcunov wrote:
> Currently when we checkpoint PTY connections we ignore data queued inside
> read buffer of ldisk, such data is barely lost after the restore. So we
> would like to have an ability to dump and restore it.

Do you want to be able to restore the exact state (like column and
newline markers)?

I'm wondering if a for-purpose checkpoint/restore ioctl would be more
appropriate?

A generic peek() may find other uses which could make future improvements
difficult.

Plus this ioctl() is only reading the 4k line discipline read buffer, and
not the tty buffers which could contain up to another 8k of unprocessed
input.


> Here is a new ioctl code which simply copies data from read buffer into
> the userspace without any additional processing (just like terminal is
> sitting in a raw mode).
>
> The idea behind is when we checkpointing PTY pairs we peek this unread
> data into images on disk and on restore phase we find the linked peer
> and write it back thus reading peer will see it in read buffer.

Have you frozen every process with either end open? I could see how this would
be easy for an entire process group but what if the processes are unrelated?
How do you ensure you have the correct linked peer with multiple devpts instances?


> I tried several approaches (including simply reuse of canon_copy_from_read_buf
> and copy_from_read_buf) but it makes code more complicated, so I end up
> in plain copying of buffer data.
>
> Strictly speaking, we could try handle it inside CRIU itself, as
> Peter Hurley proposed (switch terminal to raw mode, read data,
> and return original termios back) but in this case we may hit
> the scenario when we read data and failed to restore it back
> (due to any error) leaving the task we're dumping without
> read buffer, thus peeking data looks like be the only
> safe way.

Code comments below.


> v2:
> - Use @char in ioctl definition.
>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> CC: Jiri Slaby <[email protected]>
> CC: Peter Hurley <[email protected]>
> CC: Greg Kroah-Hartman <[email protected]>
> CC: Andrey Vagin <[email protected]>
> CC: Pavel Emelianov <[email protected]>
> CC: Vladimir Davydov <[email protected]>
> CC: Konstantin Khorenko <[email protected]>
> ---
> drivers/tty/n_tty.c | 22 ++++++++++++++++++++++
> include/uapi/asm-generic/ioctls.h | 1 +
> 2 files changed, 23 insertions(+)
>
> Index: linux-ml.git/drivers/tty/n_tty.c
> ===================================================================
> --- linux-ml.git.orig/drivers/tty/n_tty.c
> +++ linux-ml.git/drivers/tty/n_tty.c
> @@ -2420,6 +2420,26 @@ static unsigned long inq_canon(struct n_
> return nr;
> }
>
> +static ssize_t n_tty_peek_raw(struct tty_struct *tty, unsigned char __user *buf)

function name is redundant; n_tty_peek() is sufficient.

Needs a buffer-length parameter; could return the size required to completely
copy the ldisc read buffer, if smaller than required.


> +{
> + struct n_tty_data *ldata = tty->disc_data;
> + ssize_t retval;
> +
> + if (!mutex_trylock(&ldata->atomic_read_lock))
> + return -EAGAIN;
> +
> + down_read(&tty->termios_rwsem);

Why take these two locks if parallel operations are not expected?
Conversely, I would expect this to assert current state is TASK_TRACED.

Assuming the other pty end is halted, this ioctl would also need to
wait for the input kworker to stop:

--- >% ---
diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index a946e49..744cb87 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -614,3 +614,8 @@ bool tty_buffer_cancel_work(struct tty_port *port)
{
return cancel_work_sync(&port->buf.work);
}
+
+void tty_buffer_flush_work(struct tty_port *port)
+{
+ flush_work(&port->buf.work);
+}
diff --git a/include/linux/tty.h b/include/linux/tty.h
index bf1bcdb..a541132 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -480,6 +480,7 @@ extern void tty_buffer_init(struct tty_port *port);
extern void tty_buffer_set_lock_subclass(struct tty_port *port);
extern bool tty_buffer_restart_work(struct tty_port *port);
extern bool tty_buffer_cancel_work(struct tty_port *port);
+extern void tty_buffer_flush_work(struct tty_port *port);
extern speed_t tty_termios_baud_rate(struct ktermios *termios);
extern speed_t tty_termios_input_baud_rate(struct ktermios *termios);
extern void tty_termios_encode_baud_rate(struct ktermios *termios,





tty_buffer_flush_work(tty->port);

> + retval = read_cnt(ldata);
> + if (retval) {
> + const unsigned char *from = read_buf_addr(ldata, ldata->read_tail);
> + if (copy_to_user(buf, from, retval))
> + retval = -EFAULT;
> + }
> + up_read(&tty->termios_rwsem);
> + mutex_unlock(&ldata->atomic_read_lock);
> + return retval;
> +}
> +
> static int n_tty_ioctl(struct tty_struct *tty, struct file *file,
> unsigned int cmd, unsigned long arg)
> {
> @@ -2437,6 +2457,8 @@ static int n_tty_ioctl(struct tty_struct
> retval = read_cnt(ldata);
> up_write(&tty->termios_rwsem);
> return put_user(retval, (unsigned int __user *) arg);
> + case TIOCPEEKRAW:

As with the function name, the ioctl symbol is redundant; simply TIOCPEEK
is ok.

> + return n_tty_peek_raw(tty, (unsigned char __user *) arg);
> default:
> return n_tty_ioctl_helper(tty, file, cmd, arg);
> }
> Index: linux-ml.git/include/uapi/asm-generic/ioctls.h
> ===================================================================
> --- linux-ml.git.orig/include/uapi/asm-generic/ioctls.h
> +++ linux-ml.git/include/uapi/asm-generic/ioctls.h
> @@ -77,6 +77,7 @@
> #define TIOCGPKT _IOR('T', 0x38, int) /* Get packet mode state */
> #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */
> #define TIOCGEXCL _IOR('T', 0x40, int) /* Get exclusive mode state */
> +#define TIOCPEEKRAW _IOR('T', 0x41, char)
>
> #define FIONCLEX 0x5450
> #define FIOCLEX 0x5451
>

2016-04-11 09:01:57

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH resend v2] tty: n_tty -- Add new TIOCPEEKRAW ioctl to peek unread data

On Sun, Apr 10, 2016 at 10:43:28PM -0700, Peter Hurley wrote:
> Hi Cyrill,
>
> On 04/07/2016 05:53 AM, Cyrill Gorcunov wrote:
> > Currently when we checkpoint PTY connections we ignore data queued inside
> > read buffer of ldisk, such data is barely lost after the restore. So we
> > would like to have an ability to dump and restore it.

Hi Peter! First of all, thanks a huge for comments!

> Do you want to be able to restore the exact state (like column and
> newline markers)?
>
> I'm wondering if a for-purpose checkpoint/restore ioctl would be more
> appropriate?

Strictly speaking saving complete state would be ideal but this require
exporting internal n_tty structure into user space as api (sure we
can name the members in any way we like, but this would force then
to keep backward compatibility when n_tty code get changed in future).
I think loosing some information like column and marker is acceptable
and fetching unread buffers is more general. I can try to implement
c/r'ing ioctl though which would not only fetch buffers but column
and such, and see how it goes, sounds ok?

> A generic peek() may find other uses which could make future improvements
> difficult.
>
> Plus this ioctl() is only reading the 4k line discipline read buffer, and
> not the tty buffers which could contain up to another 8k of unprocessed
> input.

Which is scheduled for flush into port and then landed into ld when
write() called. True, I managed to miss this moment, thanks! So the
proper general peek() handling should be implemented on tty code level
rather than ld, and fetch both -- start with tty buffer and proceed with
ld buffer then, right?

> > Here is a new ioctl code which simply copies data from read buffer into
> > the userspace without any additional processing (just like terminal is
> > sitting in a raw mode).
> >
> > The idea behind is when we checkpointing PTY pairs we peek this unread
> > data into images on disk and on restore phase we find the linked peer
> > and write it back thus reading peer will see it in read buffer.
>
> Have you frozen every process with either end open? I could see how this would
> be easy for an entire process group but what if the processes are unrelated?
> How do you ensure you have the correct linked peer with multiple devpts instances?

Currently we freeze all processes which are requested for dumping and then
we walk over all files. Once tty peer is met (actually we support pty
and vt only at the moment) we remember its index via TIOCGPTN (for master
peers) and expect to find appropriate slave peer. You know, having multiple
devpts mounted may be a problem actually in current criu code, I track indices
via one global array but rather should do that via per-sb basis.

Still we support that named external peers, where one of the ends do not
belong process tree we're dumping. In such scenario it's allowed to not
dump queued data because we don't control external peer anyhow.

> > I tried several approaches (including simply reuse of canon_copy_from_read_buf
> > and copy_from_read_buf) but it makes code more complicated, so I end up
> > in plain copying of buffer data.
> >
> > Strictly speaking, we could try handle it inside CRIU itself, as
> > Peter Hurley proposed (switch terminal to raw mode, read data,
> > and return original termios back) but in this case we may hit
> > the scenario when we read data and failed to restore it back
> > (due to any error) leaving the task we're dumping without
> > read buffer, thus peeking data looks like be the only
> > safe way.
>
> Code comments below.
>
>
> > v2:
> > - Use @char in ioctl definition.
> >
> > Signed-off-by: Cyrill Gorcunov <[email protected]>
> > CC: Jiri Slaby <[email protected]>
> > CC: Peter Hurley <[email protected]>
> > CC: Greg Kroah-Hartman <[email protected]>
> > CC: Andrey Vagin <[email protected]>
> > CC: Pavel Emelianov <[email protected]>
> > CC: Vladimir Davydov <[email protected]>
> > CC: Konstantin Khorenko <[email protected]>
> > ---
> > drivers/tty/n_tty.c | 22 ++++++++++++++++++++++
> > include/uapi/asm-generic/ioctls.h | 1 +
> > 2 files changed, 23 insertions(+)
> >
> > Index: linux-ml.git/drivers/tty/n_tty.c
> > ===================================================================
> > --- linux-ml.git.orig/drivers/tty/n_tty.c
> > +++ linux-ml.git/drivers/tty/n_tty.c
> > @@ -2420,6 +2420,26 @@ static unsigned long inq_canon(struct n_
> > return nr;
> > }
> >
> > +static ssize_t n_tty_peek_raw(struct tty_struct *tty, unsigned char __user *buf)
>
> function name is redundant; n_tty_peek() is sufficient.
>
> Needs a buffer-length parameter; could return the size required to completely
> copy the ldisc read buffer, if smaller than required.

Seems better to provide some kind of structure with data address
and size would be kept, on read we can update @size and put there
the number of bytes unread if buffer is too small.

> > +{
> > + struct n_tty_data *ldata = tty->disc_data;
> > + ssize_t retval;
> > +
> > + if (!mutex_trylock(&ldata->atomic_read_lock))
> > + return -EAGAIN;
> > +
> > + down_read(&tty->termios_rwsem);
>
> Why take these two locks if parallel operations are not expected?
> Conversely, I would expect this to assert current state is TASK_TRACED.

If someone is already reading the data it will either exit soon with
data read or will sit here waiting for data to appear. So I thought
if we're fetching unread data we should not interfere with any other
readers? If someone is sitting in read procedure it implies that there
either no data on buffer and reader is waiting for it, or the read
procedure will complete soon, but indeed I rather should be checking
for read_cnt locklessly and return -EAGAIN if > 0 and can't lock
the atomic_read_lock. I implied that task can be in non-traced state.
If require traced indeed we won't need the lock. For criu lockless
+ tracing state is fine but i thought someone else might be needing
to peek data without tracing the task. Hm?

> Assuming the other pty end is halted, this ioctl would also need to
> wait for the input kworker to stop:

Yes, thanks!

>
> --- >% ---
> diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
> index a946e49..744cb87 100644
> --- a/drivers/tty/tty_buffer.c
> +++ b/drivers/tty/tty_buffer.c
> @@ -614,3 +614,8 @@ bool tty_buffer_cancel_work(struct tty_port *port)
> {
> return cancel_work_sync(&port->buf.work);
> }
> +
> +void tty_buffer_flush_work(struct tty_port *port)
> +{
> + flush_work(&port->buf.work);
> +}
> diff --git a/include/linux/tty.h b/include/linux/tty.h
> index bf1bcdb..a541132 100644
> --- a/include/linux/tty.h
> +++ b/include/linux/tty.h
> @@ -480,6 +480,7 @@ extern void tty_buffer_init(struct tty_port *port);
> extern void tty_buffer_set_lock_subclass(struct tty_port *port);
> extern bool tty_buffer_restart_work(struct tty_port *port);
> extern bool tty_buffer_cancel_work(struct tty_port *port);
> +extern void tty_buffer_flush_work(struct tty_port *port);
> extern speed_t tty_termios_baud_rate(struct ktermios *termios);
> extern speed_t tty_termios_input_baud_rate(struct ktermios *termios);
> extern void tty_termios_encode_baud_rate(struct ktermios *termios,
>
> tty_buffer_flush_work(tty->port);
>
> > + retval = read_cnt(ldata);
> > + if (retval) {
> > + const unsigned char *from = read_buf_addr(ldata, ldata->read_tail);
> > + if (copy_to_user(buf, from, retval))
> > + retval = -EFAULT;
> > + }
> > + up_read(&tty->termios_rwsem);
> > + mutex_unlock(&ldata->atomic_read_lock);
> > + return retval;
> > +}
> > +
> > static int n_tty_ioctl(struct tty_struct *tty, struct file *file,
> > unsigned int cmd, unsigned long arg)
> > {
> > @@ -2437,6 +2457,8 @@ static int n_tty_ioctl(struct tty_struct
> > retval = read_cnt(ldata);
> > up_write(&tty->termios_rwsem);
> > return put_user(retval, (unsigned int __user *) arg);
> > + case TIOCPEEKRAW:
>
> As with the function name, the ioctl symbol is redundant; simply TIOCPEEK
> is ok.

OK

2016-04-11 17:20:16

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH resend v2] tty: n_tty -- Add new TIOCPEEKRAW ioctl to peek unread data

On 04/11/2016 02:01 AM, Cyrill Gorcunov wrote:
> On Sun, Apr 10, 2016 at 10:43:28PM -0700, Peter Hurley wrote:
>> Hi Cyrill,
>>
>> On 04/07/2016 05:53 AM, Cyrill Gorcunov wrote:
>>> Currently when we checkpoint PTY connections we ignore data queued inside
>>> read buffer of ldisk, such data is barely lost after the restore. So we
>>> would like to have an ability to dump and restore it.
>
> Hi Peter! First of all, thanks a huge for comments!
>
>> Do you want to be able to restore the exact state (like column and
>> newline markers)?
>>
>> I'm wondering if a for-purpose checkpoint/restore ioctl would be more
>> appropriate?

My thoughts here are two-fold.

1. I'd like whatever functionality is to be added to support checkpoint/
restore to be a restrictive as possible while still accomplishing your
objectives _exactly because I don't want it to grow other uses_.
That's why I suggested asserting the task state (and maybe adding
ptrace state checks) so that the interface can only be used for this
purpose.

2. If we want to make changes because the existing read()/write()
interface is inadequate, let's add a complete interface.

> Strictly speaking saving complete state would be ideal but this require
> exporting internal n_tty structure into user space as api (sure we
> can name the members in any way we like, but this would force then
> to keep backward compatibility when n_tty code get changed in future).
> I think loosing some information like column and marker is acceptable
> and fetching unread buffers is more general. I can try to implement
> c/r'ing ioctl though which would not only fetch buffers but column
> and such, and see how it goes, sounds ok?

My thinking here was to use an opaque buffer; it could be sized in the
same manner as the peek interface; ie., provide a buffer address with
size, and the call fails with an updated size required if inadequate
(or whatever, but I think you get the idea).


>> A generic peek() may find other uses which could make future improvements
>> difficult.
>>
>> Plus this ioctl() is only reading the 4k line discipline read buffer, and
>> not the tty buffers which could contain up to another 8k of unprocessed
>> input.
>
> Which is scheduled for flush into port and then landed into ld when
> write() called. True, I managed to miss this moment, thanks! So the
> proper general peek() handling should be implemented on tty code level
> rather than ld, and fetch both -- start with tty buffer and proceed with
> ld buffer then, right?

To safely copy the tty buffers, it needs to be implemented like
tty_buffer_flush():

1. get an ldisc reference (ioctls don't automatically do this)

ld = tty_ldisc_ref(tty);
if (!ld)
return -EIO;

2. lock out any other consumers of tty buffers:

tty_buffer_lock_exclusive(tty->port);

This will abort any in-progress input kworker and ensures neither
buf->head nor head->read advance.

* This does not lock out parallel producers so it's important to *
* guarantee there are no parallel writers (see my lock discussion) *

3. walk the list of tty buffers, copying each one:

struct tty_buffer *head = port->buf->head;
while (head) {
count = smp_load_acquire(&head->commit) - head->read)
if (count)
/* copy tty buffer */
head = smp_load_acquire(head->next);
}

4. Now call into the line discipline to copy its data:

if (ld->ops->checkpoint)
ld->ops->checkpoint(....)

5. Restart kworker

tty_buffer_unlock_exclusive(tty->port);

6. Release ldisc reference

tty_ldisc_deref(ld);



>>> Here is a new ioctl code which simply copies data from read buffer into
>>> the userspace without any additional processing (just like terminal is
>>> sitting in a raw mode).
>>>
>>> The idea behind is when we checkpointing PTY pairs we peek this unread
>>> data into images on disk and on restore phase we find the linked peer
>>> and write it back thus reading peer will see it in read buffer.
>>
>> Have you frozen every process with either end open? I could see how this would
>> be easy for an entire process group but what if the processes are unrelated?
>> How do you ensure you have the correct linked peer with multiple devpts instances?
>
> Currently we freeze all processes which are requested for dumping and then
> we walk over all files. Once tty peer is met (actually we support pty
> and vt only at the moment) we remember its index via TIOCGPTN (for master
> peers) and expect to find appropriate slave peer. You know, having multiple
> devpts mounted may be a problem actually in current criu code, I track indices
> via one global array but rather should do that via per-sb basis.

Ok.


> Still we support that named external peers, where one of the ends do not
> belong process tree we're dumping. In such scenario it's allowed to not
> dump queued data because we don't control external peer anyhow.

Ok.


>>> I tried several approaches (including simply reuse of canon_copy_from_read_buf
>>> and copy_from_read_buf) but it makes code more complicated, so I end up
>>> in plain copying of buffer data.
>>>
>>> Strictly speaking, we could try handle it inside CRIU itself, as
>>> Peter Hurley proposed (switch terminal to raw mode, read data,
>>> and return original termios back) but in this case we may hit
>>> the scenario when we read data and failed to restore it back
>>> (due to any error) leaving the task we're dumping without
>>> read buffer, thus peeking data looks like be the only
>>> safe way.
>>
>> Code comments below.
>>
>>
>>> v2:
>>> - Use @char in ioctl definition.
>>>
>>> Signed-off-by: Cyrill Gorcunov <[email protected]>
>>> CC: Jiri Slaby <[email protected]>
>>> CC: Peter Hurley <[email protected]>
>>> CC: Greg Kroah-Hartman <[email protected]>
>>> CC: Andrey Vagin <[email protected]>
>>> CC: Pavel Emelianov <[email protected]>
>>> CC: Vladimir Davydov <[email protected]>
>>> CC: Konstantin Khorenko <[email protected]>
>>> ---
>>> drivers/tty/n_tty.c | 22 ++++++++++++++++++++++
>>> include/uapi/asm-generic/ioctls.h | 1 +
>>> 2 files changed, 23 insertions(+)
>>>
>>> Index: linux-ml.git/drivers/tty/n_tty.c
>>> ===================================================================
>>> --- linux-ml.git.orig/drivers/tty/n_tty.c
>>> +++ linux-ml.git/drivers/tty/n_tty.c
>>> @@ -2420,6 +2420,26 @@ static unsigned long inq_canon(struct n_
>>> return nr;
>>> }
>>>
>>> +static ssize_t n_tty_peek_raw(struct tty_struct *tty, unsigned char __user *buf)
>>
>> function name is redundant; n_tty_peek() is sufficient.
>>
>> Needs a buffer-length parameter; could return the size required to completely
>> copy the ldisc read buffer, if smaller than required.
>
> Seems better to provide some kind of structure with data address
> and size would be kept, on read we can update @size and put there
> the number of bytes unread if buffer is too small.
>
>>> +{
>>> + struct n_tty_data *ldata = tty->disc_data;
>>> + ssize_t retval;
>>> +
>>> + if (!mutex_trylock(&ldata->atomic_read_lock))
>>> + return -EAGAIN;
>>> +
>>> + down_read(&tty->termios_rwsem);
>>
>> Why take these two locks if parallel operations are not expected?
>> Conversely, I would expect this to assert current state is TASK_TRACED.
>
> If someone is already reading the data it will either exit soon with
> data read or will sit here waiting for data to appear. So I thought
> if we're fetching unread data we should not interfere with any other
> readers? If someone is sitting in read procedure it implies that there
> either no data on buffer and reader is waiting for it, or the read
> procedure will complete soon, but indeed I rather should be checking
> for read_cnt locklessly and return -EAGAIN if > 0 and can't lock
> the atomic_read_lock. I implied that task can be in non-traced state.
> If require traced indeed we won't need the lock. For criu lockless
> + tracing state is fine but i thought someone else might be needing
> to peek data without tracing the task. Hm?

Well, I'd rather not have other uses for peeking data. In fact, I need
to check if this interface needs to be CAP_SYSADMIN; I'm pretty sure
that ioctl() ignores file permissions and write-only file permissions
(used by /usr/bin/write to send messages to other users) should not
allow tty reads.

Ok, so we want to defend against parallel operations in case not
all tty users are currently stopped by ptrace (such as when unrelated
tasks are potentially reading or writing to either end and were not
specified for dumping)?

In that case, I think just write trylocking the termios rwsem should
prevent any parallel i/o, at least for the N_TTY line discipline.

This should only interfere with processes not being dumped because
ptrace signalling should have ejected any process to be dumped out
of their i/o loops (readers or writers).


Also, I think we should further limit the interface based on what is
supported currently; IOW, check that the driver is either pty or vt,
assert that the line discipline is N_TTY at both ends, etc.


>> Assuming the other pty end is halted, this ioctl would also need to
>> wait for the input kworker to stop:
>
> Yes, thanks!
>
>>
>> --- >% ---
>> diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
>> index a946e49..744cb87 100644
>> --- a/drivers/tty/tty_buffer.c
>> +++ b/drivers/tty/tty_buffer.c
>> @@ -614,3 +614,8 @@ bool tty_buffer_cancel_work(struct tty_port *port)
>> {
>> return cancel_work_sync(&port->buf.work);
>> }
>> +
>> +void tty_buffer_flush_work(struct tty_port *port)
>> +{
>> + flush_work(&port->buf.work);
>> +}
>> diff --git a/include/linux/tty.h b/include/linux/tty.h
>> index bf1bcdb..a541132 100644
>> --- a/include/linux/tty.h
>> +++ b/include/linux/tty.h
>> @@ -480,6 +480,7 @@ extern void tty_buffer_init(struct tty_port *port);
>> extern void tty_buffer_set_lock_subclass(struct tty_port *port);
>> extern bool tty_buffer_restart_work(struct tty_port *port);
>> extern bool tty_buffer_cancel_work(struct tty_port *port);
>> +extern void tty_buffer_flush_work(struct tty_port *port);
>> extern speed_t tty_termios_baud_rate(struct ktermios *termios);
>> extern speed_t tty_termios_input_baud_rate(struct ktermios *termios);
>> extern void tty_termios_encode_baud_rate(struct ktermios *termios,
>>
>> tty_buffer_flush_work(tty->port);
>>
>>> + retval = read_cnt(ldata);
>>> + if (retval) {
>>> + const unsigned char *from = read_buf_addr(ldata, ldata->read_tail);
>>> + if (copy_to_user(buf, from, retval))
>>> + retval = -EFAULT;
>>> + }
>>> + up_read(&tty->termios_rwsem);
>>> + mutex_unlock(&ldata->atomic_read_lock);
>>> + return retval;
>>> +}
>>> +
>>> static int n_tty_ioctl(struct tty_struct *tty, struct file *file,
>>> unsigned int cmd, unsigned long arg)
>>> {
>>> @@ -2437,6 +2457,8 @@ static int n_tty_ioctl(struct tty_struct
>>> retval = read_cnt(ldata);
>>> up_write(&tty->termios_rwsem);
>>> return put_user(retval, (unsigned int __user *) arg);
>>> + case TIOCPEEKRAW:
>>
>> As with the function name, the ioctl symbol is redundant; simply TIOCPEEK
>> is ok.
>
> OK
>

2016-04-11 21:42:18

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH resend v2] tty: n_tty -- Add new TIOCPEEKRAW ioctl to peek unread data

On Mon, Apr 11, 2016 at 10:20:08AM -0700, Peter Hurley wrote:
...
>
> My thoughts here are two-fold.
>
> 1. I'd like whatever functionality is to be added to support checkpoint/
> restore to be a restrictive as possible while still accomplishing your
> objectives _exactly because I don't want it to grow other uses_.
> That's why I suggested asserting the task state (and maybe adding
> ptrace state checks) so that the interface can only be used for this
> purpose.
>
> 2. If we want to make changes because the existing read()/write()
> interface is inadequate, let's add a complete interface.

I completely agree!

> > Strictly speaking saving complete state would be ideal but this require
> > exporting internal n_tty structure into user space as api (sure we
> > can name the members in any way we like, but this would force then
> > to keep backward compatibility when n_tty code get changed in future).
> > I think loosing some information like column and marker is acceptable
> > and fetching unread buffers is more general. I can try to implement
> > c/r'ing ioctl though which would not only fetch buffers but column
> > and such, and see how it goes, sounds ok?
>
> My thinking here was to use an opaque buffer; it could be sized in the
> same manner as the peek interface; ie., provide a buffer address with
> size, and the call fails with an updated size required if inadequate
> (or whatever, but I think you get the idea).

Yeah, got it.

> >> A generic peek() may find other uses which could make future improvements
> >> difficult.
> >>
> >> Plus this ioctl() is only reading the 4k line discipline read buffer, and
> >> not the tty buffers which could contain up to another 8k of unprocessed
> >> input.
> >
> > Which is scheduled for flush into port and then landed into ld when
> > write() called. True, I managed to miss this moment, thanks! So the
> > proper general peek() handling should be implemented on tty code level
> > rather than ld, and fetch both -- start with tty buffer and proceed with
> > ld buffer then, right?
>
> To safely copy the tty buffers, it needs to be implemented like
> tty_buffer_flush():

...

> 6. Release ldisc reference
>
> tty_ldisc_deref(ld);
>

Thanks a huge, got it!

...
> >>> +{
> >>> + struct n_tty_data *ldata = tty->disc_data;
> >>> + ssize_t retval;
> >>> +
> >>> + if (!mutex_trylock(&ldata->atomic_read_lock))
> >>> + return -EAGAIN;
> >>> +
> >>> + down_read(&tty->termios_rwsem);
> >>
> >> Why take these two locks if parallel operations are not expected?
> >> Conversely, I would expect this to assert current state is TASK_TRACED.
> >
> > If someone is already reading the data it will either exit soon with
> > data read or will sit here waiting for data to appear. So I thought
> > if we're fetching unread data we should not interfere with any other
> > readers? If someone is sitting in read procedure it implies that there
> > either no data on buffer and reader is waiting for it, or the read
> > procedure will complete soon, but indeed I rather should be checking
> > for read_cnt locklessly and return -EAGAIN if > 0 and can't lock
> > the atomic_read_lock. I implied that task can be in non-traced state.
> > If require traced indeed we won't need the lock. For criu lockless
> > + tracing state is fine but i thought someone else might be needing
> > to peek data without tracing the task. Hm?
>
> Well, I'd rather not have other uses for peeking data. In fact, I need
> to check if this interface needs to be CAP_SYSADMIN; I'm pretty sure
> that ioctl() ignores file permissions and write-only file permissions
> (used by /usr/bin/write to send messages to other users) should not
> allow tty reads.
>
> Ok, so we want to defend against parallel operations in case not
> all tty users are currently stopped by ptrace (such as when unrelated
> tasks are potentially reading or writing to either end and were not
> specified for dumping)?

I think so. At least this gives some kind of consistensy while we're
fetching data. Something close to peeking data from pipes/sockets.

> In that case, I think just write trylocking the termios rwsem should
> prevent any parallel i/o, at least for the N_TTY line discipline.
>
> This should only interfere with processes not being dumped because
> ptrace signalling should have ejected any process to be dumped out
> of their i/o loops (readers or writers).
>
> Also, I think we should further limit the interface based on what is
> supported currently; IOW, check that the driver is either pty or vt,
> assert that the line discipline is N_TTY at both ends, etc.

Thats a good idea, thanks, will do!