Hi,
Recently the kernel has implemented the TIOCGPTPEER ioctl() call which allows
users to retrieve an fd for the slave side of a pty solely based on the
corresponding fd for the master side. The ioctl()-based fd retrieval however
causes the "/proc/<pid>/fd/<pty-slave-fd>" symlink to point to the wrong dentry.
Currently, it will always point to "/". The following simple program can be used
to illustrate the problem when run on a system that implements the TIOCGPTPEER
ioctl() call.
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
int main()
{
int master;
int ret = -1, slave = -1;
char buf[4096], path[4096];
master = open("/dev/ptmx", O_RDWR | O_NOCTTY);
if (master < 0)
return -1;
ret = grantpt(master);
if (ret < 0)
return -1;
ret = unlockpt(master);
if (ret < 0)
goto on_error;
slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY);
if (slave < 0)
goto on_error;
ret = snprintf(path, 4096, "/proc/self/fd/%d", slave);
if (ret < 0 || ret >= 4096)
goto on_error;
ret = readlink(path, buf, sizeof(buf));
if (ret < 0)
goto on_error;
printf("I point to \"%s\"\n", buf);
on_error:
close(master);
if (slave >= 0)
close(slave);
return ret;
}
With the symlink pointing to the wrong path any interesting path-based operation
on the slave side will fail. Also this will cause ttyname{_r}() to falsely
report that this fd does not return to a tty. So I really think this needs to be
fixed. The fix however doesn't seem super obvious to me. It seems the most
straightforward way for now is to behave like the implementation of pipes and
sockets that implement a dynamic_dname() method to correctly set the content of
the "/proc/<pid>/fd/<n>" symlink without requiring special-casing in the proc
implementation itself. I prefer this approach. The downside of this however is
that in case the devpts is not mounted at its standard "/dev/pts" location but
e.g. at "/mnt" the content of the corresponding "/proc/<pid>/fd/<n>" symlink
will still be "/dev/pts/<idx>" although it should likely be "/mnt/<idx". I've
gone over this back and forth in my head but I think that this is not a deal
breaker. All libcs currently hard-code "/dev/pts/<idx>" into their
implementation of ptsname{_r}() and so wouldn't be affected by this change at
all. Furthermore, mounting devpts somewhere other than "/dev/pts" (e.g. "/mnt")
doesn't seem to work and from what I gather from LKML is not really expected nor
supported to work.
All ioctl() I've seens so far that return fds seems to implement a
dynamic_dname() method and I didn't see any other way how to do this here. One
thing that came to mind was to somehow get at the "absolute path" for the slave
side dentry such that you retrieve the mountpoint for the devpts fs and then
combine this with the pty slave index to generate the proc name. However, the
concept of an absolute path does not really make sense for dentries and in the
face of bind-mounts it becomes questionable what devpts instance should win.
Furthermore, the dynamic_dname() method will only allow you to access the dentry
itself and not a struct path which would contain the vfsmount information. In
any case, here is my patch, when applied the fd returned by ioctl(fd,
TIOCGPTPEER) will have the correct content ("/dev/pts/<idx>"):
Christian Brauner (1):
devpts: use dynamic_dname() to generate proc name
fs/devpts/inode.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
--
2.13.3
Recently the kernel has implemented the TIOCGPTPEER ioctl() call which allows
users to retrieve an fd for the slave side of a pty solely based on the
corresponding fd for the master side. The ioctl()-based fd retrieval however
causes the "/proc/<pid>/fd/<pty-slave-fd>" symlink to point to the wrong dentry.
Currently, it will always point to "/".
With the symlink pointing to the wrong path any interesting path-based operation
on the slave side will fail. Also this will cause ttyname{_r}() to falsely
report that this fd does not return to a tty. So I really think this needs to be
fixed. The fix however doesn't seem super obvious to me. It seems the most
straightforward way for now is to behave like the implementation of pipes and
sockets that implement a dynamic_dname() method to correctly set the content of
the "/proc/<pid>/fd/<n>" symlink without requiring special-casing in the proc
implementation itself. I prefer this approach. The downside of this however is
that in case the devpts is not mounted at its standard "/dev/pts" location but
e.g. at "/mnt" the content of the corresponding "/proc/<pid>/fd/<n>" symlink
will still be "/dev/pts/<idx>" although it should likely be "/mnt/<idx". I've
gone over this back and forth in my head but I think that this is not a deal
breaker. All libcs currently hard-code "/dev/pts/<idx>" into their
implementation of ptsname{_r}() and so wouldn't be affected by this change at
all. Furthermore, mounting devpts somewhere other than "/dev/pts" (e.g. "/mnt")
doesn't seem to work and from what I gather from LKML is not really expected nor
supported to work.
Signed-off-by: Christian Brauner <[email protected]>
---
fs/devpts/inode.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 108df2e3602c..1b1b9b0813e8 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -369,6 +369,18 @@ static const struct super_operations devpts_sops = {
.show_options = devpts_show_options,
};
+/*
+ * pty_peer_dname() is called from d_path().
+ */
+static char *pty_peer_dname(struct dentry *dentry, char *buffer, int buflen)
+{
+ return dynamic_dname(dentry, buffer, buflen, "/dev/pts/%pd", dentry);
+}
+
+static const struct dentry_operations devpts_dops = {
+ .d_dname = pty_peer_dname,
+};
+
static void *new_pts_fs_info(struct super_block *sb)
{
struct pts_fs_info *fsi;
@@ -397,6 +409,7 @@ devpts_fill_super(struct super_block *s, void *data, int silent)
s->s_magic = DEVPTS_SUPER_MAGIC;
s->s_op = &devpts_sops;
s->s_time_gran = 1;
+ s->s_d_op = &devpts_dops;
error = -ENOMEM;
s->s_fs_info = new_pts_fs_info(s);
--
2.13.3
On Wed, Aug 16, 2017 at 10:12 AM, Christian Brauner
<[email protected]> wrote:
>
> Recently the kernel has implemented the TIOCGPTPEER ioctl() call which allows
> users to retrieve an fd for the slave side of a pty solely based on the
> corresponding fd for the master side. The ioctl()-based fd retrieval however
> causes the "/proc/<pid>/fd/<pty-slave-fd>" symlink to point to the wrong dentry.
> Currently, it will always point to "/". The following simple program can be used
> to illustrate the problem when run on a system that implements the TIOCGPTPEER
> ioctl() call.
I think your patch is wrong - we need to actually use the *right*
path, rather than hardcode "/dev/pts/%d" in there.
Hardcoding "/dev/pts/%d" is something that user space can already do.
The kernel can and should do better.
That "dynamic_dname()" helper is for things that don't really have a
real pathname at all, so things like pipes and other file descriptors
that were opened without an associated entry in a filesystem (sockets,
other "special" inodes with no filesystem backing).
For things like pts slaves, we actually *do* have a real pathname -
and we should expose it. If the devpts filesystem is mounted somewhere
else than /dev/pts, it should give that correct pathname.
I also think your test program is a bit buggy, and silly:
> ret = snprintf(path, 4096, "/proc/self/fd/%d", slave);
> if (ret < 0 || ret >= 4096)
> goto on_error;
>
> ret = readlink(path, buf, sizeof(buf));
> if (ret < 0)
> goto on_error;
> printf("I point to \"%s\"\n", buf);
This just smells wrong to me. And not just because you don't
NUL-terminate the readlink() return value (or just use "%.*s" with
ret, buf) .
We should just give people a better way to get the pathname than that
readlink on /proc/self/fd/<fd> thing. It's kind of ridiculous that we
don't. We already have a "getcwd()" system call that only does it for
cwd.
So I think we could/should just add a system call for 'fdname()' or similar.
Linus
On Wed, Aug 16, 2017 at 11:26 AM, Linus Torvalds
<[email protected]> wrote:
>
> Hardcoding "/dev/pts/%d" is something that user space can already do.
> The kernel can and should do better.
Put another way: there's no point in applying the patch as-is, since
existing glibc ptsname() does the same thing better and faster
entirely in user space.
Also, we already do special things to get a path for this, but it
clearly isn't working. See the
/* We need to cache a fake path for TIOCGPTPEER. */
comment in ptmx_open(). Why doesn't the file d_path get filled in
correctly there, I wonder.
Because The regular
readlink("/proc/self/fd/0", ...)
that 'tty' does works correctly. I think we've done something
incorrect in pty_open_peer(), which means that the fd path hasn't been
fully filled in.
Fixing that *should* fix the readlink() automatically, since it
clearly works for the 'tty' binary.
I'm wondering why it's not working as-is. "vfs_open()" does that
file->f_path = *path;
thing. Why aren't we getting the right path? The ptmx_open() code
looks ok to me.
Al, do you see what the issue is, and why we don't get a proper path
on that readlink?
Linus
On Wed, Aug 16, 2017 at 11:48:48AM -0700, Linus Torvalds wrote:
> On Wed, Aug 16, 2017 at 11:26 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > Hardcoding "/dev/pts/%d" is something that user space can already do.
> > The kernel can and should do better.
>
> Put another way: there's no point in applying the patch as-is, since
> existing glibc ptsname() does the same thing better and faster
> entirely in user space.
Right. I actually took that to be an argument for this patch not against it. :)
Another point I was trying to make in my initial mail is that ttyname{_r}() will
currently look at "/proc/self/fd/<nr> to detect the name of the associated pts
device and it will error out if the content it reads is not a "/dev/pts/<n>"
path. Afaict musl and glibc both currently rely on this. This would be broken
with the current TIOCGPTPEER change.
>
> Also, we already do special things to get a path for this, but it
> clearly isn't working. See the
>
> /* We need to cache a fake path for TIOCGPTPEER. */
>
> comment in ptmx_open(). Why doesn't the file d_path get filled in
> correctly there, I wonder.
>
> Because The regular
>
> readlink("/proc/self/fd/0", ...)
>
> that 'tty' does works correctly. I think we've done something
> incorrect in pty_open_peer(), which means that the fd path hasn't been
> fully filled in.
>
> Fixing that *should* fix the readlink() automatically, since it
> clearly works for the 'tty' binary.
>
> I'm wondering why it's not working as-is. "vfs_open()" does that
>
> file->f_path = *path;
>
> thing. Why aren't we getting the right path? The ptmx_open() code
> looks ok to me.
I thought - and sorry if I'm completely wrong here - that the proc name came
from the open(const char *pathname, ...) call. Currently the only way to
retrieve a slave side fd for a given pty pair is by calling open("/dev/pts/<n>",
O_RDWR | O_NOCTTY). This would take care of placing the correct path in the proc
symlink through do_filp_open() and friends. However, for ioctl() calls that open
dentries to return an fd this is not possible and - or so I thought - the proc
name is/has to be generated via dynamic_dname(). But again, I might be totally
off here.
Christian
>
> Al, do you see what the issue is, and why we don't get a proper path
> on that readlink?
>
> Linus
On Wed, Aug 16, 2017 at 12:48 PM, Christian Brauner
<[email protected]> wrote:
>
> I thought - and sorry if I'm completely wrong here - that the proc name came
> from the open(const char *pathname, ...) call.
No. It comes from the path associated with the file descriptor, and is
expanded from the dentry tree.
Which is why you get a full pathname even when you only opened
something using a relative pathname.
So the fact that we _don't_ get the right pathname for the pts entry
here means that something got screwed up in setting filp->f_path to
the right thing. We have all the code in place that _tries_ to do it,
but it clearly has a bug somewhere.
Linus
On Wed, Aug 16, 2017 at 12:56 PM, Linus Torvalds
<[email protected]> wrote:
>
> So the fact that we _don't_ get the right pathname for the pts entry
> here means that something got screwed up in setting filp->f_path to
> the right thing. We have all the code in place that _tries_ to do it,
> but it clearly has a bug somewhere.
Ok, I think I see what the bug is, although I don't have a fix for it yet.
We generate the path largely correctly: the path has a nice dentry
that contains the right pts number, and has the right parent pointer
that points to the root of the pts mount.
And we also fill in the path 'mnt' field. Everything should be fine.
Except when we actually hit that root dentry of the pts mount, the
code in prepend_path() hits this condition:
if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
struct mount *parent = ACCESS_ONCE(mnt->mnt_parent);
/* Escaped? */
if (dentry != vfsmnt->mnt_root) {
and we break out, and reset the path to '/' because we think it
somehow escaped out of the user namespace.
So it looks like we filled in the path with the *wrong* mount information.
And THAT in turn is because we fill the path with the mount
information for the "/dev/ptmx" field - which is *not* in the
/dev/pts/ mount - that's the mount for '/dev'.
So we have a dentry and a mnt, but they simply aren't paired up correctly.
And you can see this with your test program: if you open /dev/pts/ptmx
for the master, it actually works correctly (but you need to make sure
the permissions for that ptmx node allow that).
Anyway, I know what's wrong, next step is to figure out what the fix is.
Linus
On Wed, Aug 16, 2017 at 1:19 PM, Linus Torvalds
<[email protected]> wrote:
>
> Anyway, I know what's wrong, next step is to figure out what the fix is.
Grr, We actually look up the right mnt in devpts_acquire(), but we
don't save it. We only save the superblock pointer, because that's
traditionally what we used.
I suspect the easiest fix is to just add a "mnt" argument to
devpts_acquire(), It shouldn't be too painful. Let me try.
Linus
On Wed, Aug 16, 2017 at 1:30 PM, Linus Torvalds
<[email protected]> wrote:
>
> I suspect the easiest fix is to just add a "mnt" argument to
> devpts_acquire(), It shouldn't be too painful. Let me try.
Ok, here's a *very* lightly tested patch. It might have new bugs, but
it makes your test program DTRT.
Al, mind going over this and making sure I didn't miss anything?
And Christian, if you can beat on this, that would be good.
Linus
On Wed, Aug 16, 2017 at 11:03 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Aug 16, 2017 at 1:30 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> I suspect the easiest fix is to just add a "mnt" argument to
>> devpts_acquire(), It shouldn't be too painful. Let me try.
>
> Ok, here's a *very* lightly tested patch. It might have new bugs, but
> it makes your test program DTRT.
Cool. Very happy this could be fixed so quickly!
>
> Al, mind going over this and making sure I didn't miss anything?
>
> And Christian, if you can beat on this, that would be good.
Yes, I can pound on this nicely with liblxc. We have patch
( https://github.com/lxc/lxc/pull/1728 ) up for review that
allocates pty fds from private devpts mounts in different namespaces
and sends those fds around between different namespaces. This
was one of the original motivations for implementing TIOCGPTPEER.
I had marked it as blocked until this bug was fixed which now seems
to be the case.
Thanks Linus!
Christian
>
> Linus
On Wed, Aug 16, 2017 at 2:37 PM, Christian Brauner
<[email protected]> wrote:
>> And Christian, if you can beat on this, that would be good.
>
> Yes, I can pound on this nicely with liblxc. We have patch
> ( https://github.com/lxc/lxc/pull/1728 ) up for review that
> allocates pty fds from private devpts mounts in different namespaces
> and sends those fds around between different namespaces.
Good. Testing that this works with different pts filesystems in
different places is exactly the kind of thing I'd like to see. I only
tested with my single pts filesystem that is mounted at /dev/pts, and
making sure it works when there are multiple mounts and in different
places is exactly the kind of testing this should get.
For example, if some namespace has it's pty's in _its_ /dev/pts/
hierarchy, and you then pass such a pty to somebody else that either
doesn't have that pts mount at all, or has it visible somewhere
entirely different, the result should now be something else than that
"/dev/pts/n" path.
But it would be good to just test this in general too, and make sure I
didn't screw up some reference count or something. The patch *looks*
obviously correct, but ...
Linus
On Wed, Aug 16, 2017 at 2:45 PM, Linus Torvalds
<[email protected]> wrote:
>
> But it would be good to just test this in general too, and make sure I
> didn't screw up some reference count or something. The patch *looks*
> obviously correct, but ...
Side note: I suspect it should be marked for stable, but it's going to
be basically impossible to back-port this any further than 4.7 or
something where we made each pts mount its own proper filesystem. The
code before that was a mess, and this is never going to work right in
old kernels.
Of course, TIOCGPTPEER itself is much more recent than that and was
done in this merge window, so hopefully you haven't backported it and
expect this all to work in some older kernel?
Linus
On Wed, Aug 16, 2017 at 11:55 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Aug 16, 2017 at 2:45 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> But it would be good to just test this in general too, and make sure I
>> didn't screw up some reference count or something. The patch *looks*
>> obviously correct, but ...
>
> Side note: I suspect it should be marked for stable, but it's going to
> be basically impossible to back-port this any further than 4.7 or
> something where we made each pts mount its own proper filesystem. The
> code before that was a mess, and this is never going to work right in
> old kernels.
Yeah, it would be nice if this could make it into stable at least and of course
into 4.13 but I take it that's a given anyway.
>
> Of course, TIOCGPTPEER itself is much more recent than that and was
> done in this merge window, so hopefully you haven't backported it and
> expect this all to work in some older kernel?
No, I haven't backported anything and we never intended to.
Serge, Stéphane, and I were very careful to not blindly rely on anything
that recent and - imho - security sensitive in our api. I see this more as a
"tech-preview" until this has seen some decent userspace testing. Even
if the kernel side seems fine now a lot of the userspace pty api is currently
using path-based operations instead of operations on the pty fds themselves.
This is tricky and error-prone when sending around pty fds between
different namespaces as one can see from the ttyname{_r}() example above.
Christian
On Wed, Aug 16, 2017 at 11:45 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Aug 16, 2017 at 2:37 PM, Christian Brauner
> <[email protected]> wrote:
>>> And Christian, if you can beat on this, that would be good.
>>
>> Yes, I can pound on this nicely with liblxc. We have patch
>> ( https://github.com/lxc/lxc/pull/1728 ) up for review that
>> allocates pty fds from private devpts mounts in different namespaces
>> and sends those fds around between different namespaces.
>
> Good. Testing that this works with different pts filesystems in
> different places is exactly the kind of thing I'd like to see. I only
> tested with my single pts filesystem that is mounted at /dev/pts, and
> making sure it works when there are multiple mounts and in different
> places is exactly the kind of testing this should get.
I'm compiling a kernel now and depending on how good the in-flight
wifi is I try to test this right away and answer here if that helps. If the
in-flight wifi sucks it might take me until tomorrow.
>
> For example, if some namespace has it's pty's in _its_ /dev/pts/
> hierarchy, and you then pass such a pty to somebody else that either
> doesn't have that pts mount at all, or has it visible somewhere
> entirely different, the result should now be something else than that
> "/dev/pts/n" path.
>
> But it would be good to just test this in general too, and make sure I
> didn't screw up some reference count or something. The patch *looks*
> obviously correct, but ...
>
> Linus
Linus Torvalds <[email protected]> writes:
> On Wed, Aug 16, 2017 at 1:30 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> I suspect the easiest fix is to just add a "mnt" argument to
>> devpts_acquire(), It shouldn't be too painful. Let me try.
>
> Ok, here's a *very* lightly tested patch. It might have new bugs, but
> it makes your test program DTRT.
>
> Al, mind going over this and making sure I didn't miss anything?
>
> And Christian, if you can beat on this, that would be good.
Linus reading through this it looks like your error handling is wrong.
> diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
> index 284749fb0f6b..432f514e3f42 100644
> --- a/drivers/tty/pty.c
> +++ b/drivers/tty/pty.c
> @@ -793,6 +793,7 @@ static int ptmx_open(struct inode *inode, struct file *filp)
> struct tty_struct *tty;
> struct path *pts_path;
> struct dentry *dentry;
> + struct vfsmount *mnt;
> int retval;
> int index;
>
> @@ -805,7 +806,7 @@ static int ptmx_open(struct inode *inode, struct file *filp)
> if (retval)
> return retval;
>
> - fsi = devpts_acquire(filp);
> + fsi = devpts_acquire(filp, &mnt);
> if (IS_ERR(fsi)) {
> retval = PTR_ERR(fsi);
> goto out_free_file;
> @@ -849,9 +850,14 @@ static int ptmx_open(struct inode *inode, struct file *filp)
> pts_path = kmalloc(sizeof(struct path), GFP_KERNEL);
> if (!pts_path)
> goto err_release;
> - pts_path->mnt = filp->f_path.mnt;
> - pts_path->dentry = dentry;
> - path_get(pts_path);
> +
> + /*
> + * The mnt already got a ref from devpts_acquire(),
> + * so we only dget() on the dentry.
> + */
> + pts_path->mnt = mnt;
> + pts_path->dentry = dget(dentry);
> +
> tty->link->driver_data = pts_path;
>
> retval = ptm_driver->ops->open(tty, filp);
^^^^^^^
If this open fails the code jumps to err_put_path which falls
through into out_put_fsi. But it also does path_put(pts_path).
Which will result in a double mntput of mnt.
So I think err_path_put needs to be updated to just put the dentry,
and let the later mntput put the mount.
> @@ -874,6 +880,7 @@ static int ptmx_open(struct inode *inode, struct file *filp)
> devpts_kill_index(fsi, index);
> out_put_fsi:
> devpts_release(fsi);
> + mntput(mnt);
> out_free_file:
> tty_free_file(filp);
> return retval;
Eric
On Wed, Aug 16, 2017 at 3:46 PM, Eric W. Biederman
<[email protected]> wrote:
>> tty->link->driver_data = pts_path;
>>
>> retval = ptm_driver->ops->open(tty, filp);
> ^^^^^^^
>
> If this open fails the code jumps to err_put_path which falls
> through into out_put_fsi.
No it doesn't.
err_path_put falls through to err_release, but that then does a
"return retval;". It doesn't get to out_put_fsi.
Now, I _do_ want people to check the release path, but I don't think
this was it. Or am I blind and missing something?
Linus
Linus Torvalds <[email protected]> writes:
> On Wed, Aug 16, 2017 at 3:46 PM, Eric W. Biederman
> <[email protected]> wrote:
>>> tty->link->driver_data = pts_path;
>>>
>>> retval = ptm_driver->ops->open(tty, filp);
>> ^^^^^^^
>>
>> If this open fails the code jumps to err_put_path which falls
>> through into out_put_fsi.
>
> No it doesn't.
>
> err_path_put falls through to err_release, but that then does a
> "return retval;". It doesn't get to out_put_fsi.
*Blink* You are right I missed that.
In which case I am concerned about failures that make it to err_release.
Unless I am missing something (again) failures that jump to err_release
won't call mntput and will result in a mnt leak.
> Now, I _do_ want people to check the release path, but I don't think
> this was it. Or am I blind and missing something?
Eric
On Wed, Aug 16, 2017 at 4:51 PM, Eric W. Biederman
<[email protected]> wrote:
>
> *Blink* You are right I missed that.
>
> In which case I am concerned about failures that make it to err_release.
> Unless I am missing something (again) failures that jump to err_release
> won't call mntput and will result in a mnt leak.
Yes, I think you're right.
Maybe this attached patch is better anyway. It's smaller, because it
keeps more closely to the old code, and just adds a mntput() in all
the exit cases, and depends on the "path_get()" to have incremented
the mnt refcount one extra time.
Can you find something in this one?
ENTIRELY UNTESTED!
Linus
Linus Torvalds <[email protected]> writes:
> On Wed, Aug 16, 2017 at 4:51 PM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> *Blink* You are right I missed that.
>>
>> In which case I am concerned about failures that make it to err_release.
>> Unless I am missing something (again) failures that jump to err_release
>> won't call mntput and will result in a mnt leak.
>
> Yes, I think you're right.
>
> Maybe this attached patch is better anyway. It's smaller, because it
> keeps more closely to the old code, and just adds a mntput() in all
> the exit cases, and depends on the "path_get()" to have incremented
> the mnt refcount one extra time.
>
> Can you find something in this one?
My eyeballs don't see any problems with the patch below.
Acked-by: "Eric W. Biederman" <[email protected]>
Eric
> ENTIRELY UNTESTED!
>
> Linus
>
> drivers/tty/pty.c | 7 +++++--
> fs/devpts/inode.c | 4 +++-
> include/linux/devpts_fs.h | 2 +-
> 3 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
> index 284749fb0f6b..1fc80ea87c13 100644
> --- a/drivers/tty/pty.c
> +++ b/drivers/tty/pty.c
> @@ -793,6 +793,7 @@ static int ptmx_open(struct inode *inode, struct file *filp)
> struct tty_struct *tty;
> struct path *pts_path;
> struct dentry *dentry;
> + struct vfsmount *mnt;
> int retval;
> int index;
>
> @@ -805,7 +806,7 @@ static int ptmx_open(struct inode *inode, struct file *filp)
> if (retval)
> return retval;
>
> - fsi = devpts_acquire(filp);
> + fsi = devpts_acquire(filp, &mnt);
> if (IS_ERR(fsi)) {
> retval = PTR_ERR(fsi);
> goto out_free_file;
> @@ -849,7 +850,7 @@ static int ptmx_open(struct inode *inode, struct file *filp)
> pts_path = kmalloc(sizeof(struct path), GFP_KERNEL);
> if (!pts_path)
> goto err_release;
> - pts_path->mnt = filp->f_path.mnt;
> + pts_path->mnt = mnt;
> pts_path->dentry = dentry;
> path_get(pts_path);
> tty->link->driver_data = pts_path;
> @@ -866,6 +867,7 @@ static int ptmx_open(struct inode *inode, struct file *filp)
> path_put(pts_path);
> kfree(pts_path);
> err_release:
> + mntput(mnt);
> tty_unlock(tty);
> // This will also put-ref the fsi
> tty_release(inode, filp);
> @@ -874,6 +876,7 @@ static int ptmx_open(struct inode *inode, struct file *filp)
> devpts_kill_index(fsi, index);
> out_put_fsi:
> devpts_release(fsi);
> + mntput(mnt);
> out_free_file:
> tty_free_file(filp);
> return retval;
> diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
> index 108df2e3602c..44dfbca9306f 100644
> --- a/fs/devpts/inode.c
> +++ b/fs/devpts/inode.c
> @@ -133,7 +133,7 @@ static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb)
> return sb->s_fs_info;
> }
>
> -struct pts_fs_info *devpts_acquire(struct file *filp)
> +struct pts_fs_info *devpts_acquire(struct file *filp, struct vfsmount **ptsmnt)
> {
> struct pts_fs_info *result;
> struct path path;
> @@ -142,6 +142,7 @@ struct pts_fs_info *devpts_acquire(struct file *filp)
>
> path = filp->f_path;
> path_get(&path);
> + *ptsmnt = NULL;
>
> /* Has the devpts filesystem already been found? */
> sb = path.mnt->mnt_sb;
> @@ -165,6 +166,7 @@ struct pts_fs_info *devpts_acquire(struct file *filp)
> * pty code needs to hold extra references in case of last /dev/tty close
> */
> atomic_inc(&sb->s_active);
> + *ptsmnt = mntget(path.mnt);
> result = DEVPTS_SB(sb);
>
> out:
> diff --git a/include/linux/devpts_fs.h b/include/linux/devpts_fs.h
> index 277ab9af9ac2..7883e901f65c 100644
> --- a/include/linux/devpts_fs.h
> +++ b/include/linux/devpts_fs.h
> @@ -19,7 +19,7 @@
>
> struct pts_fs_info;
>
> -struct pts_fs_info *devpts_acquire(struct file *);
> +struct pts_fs_info *devpts_acquire(struct file *, struct vfsmount **ptsmnt);
> void devpts_release(struct pts_fs_info *);
>
> int devpts_new_index(struct pts_fs_info *);
Linus Torvalds <[email protected]> writes:
> On Wed, Aug 16, 2017 at 12:56 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> So the fact that we _don't_ get the right pathname for the pts entry
>> here means that something got screwed up in setting filp->f_path to
>> the right thing. We have all the code in place that _tries_ to do it,
>> but it clearly has a bug somewhere.
>
> Ok, I think I see what the bug is, although I don't have a fix for it yet.
>
> We generate the path largely correctly: the path has a nice dentry
> that contains the right pts number, and has the right parent pointer
> that points to the root of the pts mount.
>
> And we also fill in the path 'mnt' field. Everything should be fine.
>
> Except when we actually hit that root dentry of the pts mount, the
> code in prepend_path() hits this condition:
>
> if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
> struct mount *parent = ACCESS_ONCE(mnt->mnt_parent);
> /* Escaped? */
> if (dentry != vfsmnt->mnt_root) {
>
> and we break out, and reset the path to '/' because we think it
> somehow escaped out of the user namespace.
Escaped it's bind mount actually. There should always be a path
from mnt_root to a dentry under that mount point. In some rare caseses
involving bind mounts a rename that moves a dentry from one directory to
another can result in dentries that are not reachable from mnt_root.
As those entries do not have a path in a meaningful sense setting the
path to '/' is the best we can do.
This condition should be limited to bind mounts as any dentry on a
filesystem is descendent from the filesystems root directory.
The rest of your analysis below is correct.
My apologies for the pendantic reply. I am repling just so that someone
doesn't find this in an email archive 20 years from now and become
impossibly confused.
> So it looks like we filled in the path with the *wrong* mount information.
>
> And THAT in turn is because we fill the path with the mount
> information for the "/dev/ptmx" field - which is *not* in the
> /dev/pts/ mount - that's the mount for '/dev'.
>
> So we have a dentry and a mnt, but they simply aren't paired up correctly.
>
> And you can see this with your test program: if you open /dev/pts/ptmx
> for the master, it actually works correctly (but you need to make sure
> the permissions for that ptmx node allow that).
>
> Anyway, I know what's wrong, next step is to figure out what the fix is.
>
> Linus
Christian Brauner <[email protected]> writes:
> On Wed, Aug 16, 2017 at 11:45 PM, Linus Torvalds
> <[email protected]> wrote:
>> On Wed, Aug 16, 2017 at 2:37 PM, Christian Brauner
>> <[email protected]> wrote:
>>>> And Christian, if you can beat on this, that would be good.
>>>
>>> Yes, I can pound on this nicely with liblxc. We have patch
>>> ( https://github.com/lxc/lxc/pull/1728 ) up for review that
>>> allocates pty fds from private devpts mounts in different namespaces
>>> and sends those fds around between different namespaces.
>>
>> Good. Testing that this works with different pts filesystems in
>> different places is exactly the kind of thing I'd like to see. I only
>> tested with my single pts filesystem that is mounted at /dev/pts, and
>> making sure it works when there are multiple mounts and in different
>> places is exactly the kind of testing this should get.
>
> I'm compiling a kernel now and depending on how good the in-flight
> wifi is I try to test this right away and answer here if that helps. If the
> in-flight wifi sucks it might take me until tomorrow.
Linus has merged the fix but have you been able to test and verify all
is well from your side?
Eric
On Wed, Aug 23, 2017 at 10:31:53AM -0500, Eric W. Biederman wrote:
> Christian Brauner <[email protected]> writes:
>
> > On Wed, Aug 16, 2017 at 11:45 PM, Linus Torvalds
> > <[email protected]> wrote:
> >> On Wed, Aug 16, 2017 at 2:37 PM, Christian Brauner
> >> <[email protected]> wrote:
> >>>> And Christian, if you can beat on this, that would be good.
> >>>
> >>> Yes, I can pound on this nicely with liblxc. We have patch
> >>> ( https://github.com/lxc/lxc/pull/1728 ) up for review that
> >>> allocates pty fds from private devpts mounts in different namespaces
> >>> and sends those fds around between different namespaces.
> >>
> >> Good. Testing that this works with different pts filesystems in
> >> different places is exactly the kind of thing I'd like to see. I only
> >> tested with my single pts filesystem that is mounted at /dev/pts, and
> >> making sure it works when there are multiple mounts and in different
> >> places is exactly the kind of testing this should get.
> >
> > I'm compiling a kernel now and depending on how good the in-flight
> > wifi is I try to test this right away and answer here if that helps. If the
> > in-flight wifi sucks it might take me until tomorrow.
>
> Linus has merged the fix but have you been able to test and verify all
> is well from your side?
Hi Eric,
Sorry for the late reply! So I've tested the patch and it does what I expect it
to do, i.e. it places the correct path as the content of /proc/<pid>/fd/<n>.
However, if I'm correct not in all cases. But I need to confirm this first and I
didn't want to start pointless discussions before having something reliable.
Thanks!
The reason for the late reply is that I was investigating some other "weirdness"
related to this patch which also - I believe - touches on the bind-mount
escaping you mentioned in your previous mail. It relates to an idea I had in
mind for a while now but never thought through sufficiently. I hope to find some
time on the weekend to think about this more clearly. I had hoped to bundle this
up with my testing but didn't get around to it.
Christian
Hi
On 2017-08-16, Linus Torvalds wrote:
> On Wed, Aug 16, 2017 at 4:51 PM, Eric W. Biederman
> <[email protected]> wrote:
[...]
> Maybe this attached patch is better anyway. It's smaller, because it
> keeps more closely to the old code, and just adds a mntput() in all
> the exit cases, and depends on the "path_get()" to have incremented
> the mnt refcount one extra time.
>
> Can you find something in this one?
>
> ENTIRELY UNTESTED!
This patch[1] as part of 4.13-rc6 (up to, at least,
v4.13-rc6-45-g6470812e2226) introduces a regression for me when using
pbuilder 0.228.7[2] (a helper to build Debian packages in a chroot and
to create and update its chroots) when trying to umount /dev/ptmx (inside
the chroot) on Debian/ unstable (full log and pbuilder configuration
file[3] attached).
[...]
Setting up build-essential (12.3) ...
Processing triggers for libc-bin (2.24-15) ...
I: unmounting dev/ptmx filesystem
W: Could not unmount dev/ptmx: umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
W: Retrying to unmount dev/ptmx in 5s
umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
Could not unmount dev/ptmx, some programs might
still be using files in /proc (klogd?).
Please check and kill these processes manually
so that I can unmount dev/ptmx. Last umount error was:
umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
[...]
lsof isn't revealing (but this might point towards gvfs 1.30.4-1+b1
involvement), fuser -k doesn't release the ressource.
Kernel v4.13-rc5 and before (at least 4.11.x and 4.12.x) are not
affected, but this problem is reliably reproducible on three different
x86_64 systems running Debian/ unstable when the host is running a
kernel >=4.6.13-rc6. Unfortunately I haven't really found an easier/
smaller way to reproduce this issue yet, but creating a new build
chroot[4] always triggers this problem, updating an existing build
chroot[5] (which also mounts and umounts /dev/ptmx) triggers most of
the time, but not reliably - building a package (e.g. the kernel, also
mounting and umounting /dev/ptmx) also triggered this issue at least
once (but I didn't try this more often).
My git bisection log is this:
$ git bisect log
git bisect start
# good: [ef954844c7ace62f773f4f23e28d2d915adc419f] Linux 4.13-rc5
git bisect good ef954844c7ace62f773f4f23e28d2d915adc419f
# bad: [14ccee78fc82f5512908f4424f541549a5705b89] Linux 4.13-rc6
git bisect bad 14ccee78fc82f5512908f4424f541549a5705b89
# bad: [cb247857f3dae0bdb843362c35027a0066b963a4] Merge tag 'sound-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad cb247857f3dae0bdb843362c35027a0066b963a4
# good: [88a5c690b66110ad255380d8f629c629cf6ca559] bpf: fix bpf_trace_printk on 32 bit archs
git bisect good 88a5c690b66110ad255380d8f629c629cf6ca559
# bad: [3bc6c906eacec34f0d8dcfd3c7e4513edf152297] Merge branch 'parisc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
git bisect bad 3bc6c906eacec34f0d8dcfd3c7e4513edf152297
# good: [40c6d1b9e2fc4251ca19fa69398f6fa34e813e27] Merge tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
git bisect good 40c6d1b9e2fc4251ca19fa69398f6fa34e813e27
# good: [ac9a40905a610fb02086a37b11ff4bf046825a88] Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good ac9a40905a610fb02086a37b11ff4bf046825a88
# bad: [99f781b1bfc199ec8eb86d4e015920faf79d5d57] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect bad 99f781b1bfc199ec8eb86d4e015920faf79d5d57
# good: [41e327b586762833e48b3703d53312ac32f05f24] quota: correct space limit check
git bisect good 41e327b586762833e48b3703d53312ac32f05f24
Reverting just c8c03f1858331e85d397bacccd34ef409aae993c from
v4.13-rc6-65-g2acf097f16ab reliably fixes the problem for me.
Regards
Stefan Lippers-Hollmann
[1] commit c8c03f1858331e85d397bacccd34ef409aae993c (HEAD)
Author: Linus Torvalds <[email protected]>
Date: Wed Aug 16 17:08:07 2017 -0700
Subject: pty: fix the cached path of the pty slave file
descriptor in the master
[2] https://packages.qa.debian.org/p/pbuilder.html
https://pbuilder.alioth.debian.org/
https://anonscm.debian.org/git/pbuilder/pbuilder.git
mounting/ umounting /dev/ptmx happens in
https://anonscm.debian.org/git/pbuilder/pbuilder.git/tree/pbuilder-modules
[3] the configuration file as attached relies on BUILDUSERNAME="pbuilder"
and BUILDUSERID="$(getent passwd $BUILDUSERNAME | cut -d\: -f3)",
commenting those out should be possible, my full BUILDUSER* setup
is:
# addgroup --system pbuilder
# adduser --system --disabled-password --ingroup pbuilder --gecos "pbuilder buildd user" --home /var/run/pbuilder pbuilder
# chown pbuilder:pbuilder /var/cache/pbuilder/deps /var/cache/pbuilder/build-result
# chmod 2775 /var/cache/pbuilder/deps /var/cache/pbuilder/build-result
[4] sudo /usr/sbin/pbuilder create --configfile pbuilderrc.debian.sid.amd64
[5] sudo /usr/sbin/pbuilder update --configfile pbuilderrc.debian.sid.amd64
On Wed, Aug 23, 2017 at 5:24 PM, Stefan Lippers-Hollmann <[email protected]> wrote:
>
> This patch[1] as part of 4.13-rc6 (up to, at least,
> v4.13-rc6-45-g6470812e2226) introduces a regression for me when using
> pbuilder 0.228.7[2] (a helper to build Debian packages in a chroot and
> to create and update its chroots) when trying to umount /dev/ptmx (inside
> the chroot) on Debian/ unstable (full log and pbuilder configuration
> file[3] attached).
>
> [...]
> Setting up build-essential (12.3) ...
> Processing triggers for libc-bin (2.24-15) ...
> I: unmounting dev/ptmx filesystem
> W: Could not unmount dev/ptmx: umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
Yes, that patch definitely keeps a reference to the pts filesystem
around while a pty is open.
We always used to do that, but we did it differently - we would keep
the 's_active' count elevated so that the superblock never went away,
even after it was unmounted.
Now it does an actual mntget(), and that makes umount _notice_ that
the filesystem is still busy.
How annoying.
Because in a very real sehse the filesystem really is busy, but we
used to hide it (perhaps on purpose - it's possible that people hit
this problem before).
Let me try to think about alteratives. Clearly this is a regression
and I need to fix it, I just need to figure out _how_.
Linus
On Wed, Aug 23, 2017 at 5:42 PM, Linus Torvalds
<[email protected]> wrote:
>
> Let me try to think about alteratives. Clearly this is a regression
> and I need to fix it, I just need to figure out _how_.
Ok, sadly, I think it's unfixable with the current model.
We literally used to keep the wrong 'struct path' around, and sadly,
fixing the struct path to point to the right vfsmount fundamentally
means that we'd be keeping the mount count elevated for that pts
mount.
And that fundamentally means that umount() will return -EBUSY. There's
no way around it.
So I think I will have to just revert that fix.
Damn.
Now, I think there's a way forward: get rid of the 'struct path'
(which is bogus anyway), and only remember the pts denty.
Then, at TIOCGPTPEER time (which is why we currently have that 'struct
path' anyway), look up the right 'vfsmount' by looking up the 'pts'
path again.
That's a rather bigger patch than the one I'll have to revert, I'm afraid ;(
Linus
Linus Torvalds <[email protected]> writes:
> On Wed, Aug 23, 2017 at 5:24 PM, Stefan Lippers-Hollmann <[email protected]> wrote:
>>
>> This patch[1] as part of 4.13-rc6 (up to, at least,
>> v4.13-rc6-45-g6470812e2226) introduces a regression for me when using
>> pbuilder 0.228.7[2] (a helper to build Debian packages in a chroot and
>> to create and update its chroots) when trying to umount /dev/ptmx (inside
>> the chroot) on Debian/ unstable (full log and pbuilder configuration
>> file[3] attached).
>>
>> [...]
>> Setting up build-essential (12.3) ...
>> Processing triggers for libc-bin (2.24-15) ...
>> I: unmounting dev/ptmx filesystem
>> W: Could not unmount dev/ptmx: umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
>
> Yes, that patch definitely keeps a reference to the pts filesystem
> around while a pty is open.
>
> We always used to do that, but we did it differently - we would keep
> the 's_active' count elevated so that the superblock never went away,
> even after it was unmounted.
>
> Now it does an actual mntget(), and that makes umount _notice_ that
> the filesystem is still busy.
>
> How annoying.
>
> Because in a very real sehse the filesystem really is busy, but we
> used to hide it (perhaps on purpose - it's possible that people hit
> this problem before).
>
> Let me try to think about alteratives. Clearly this is a regression
> and I need to fix it, I just need to figure out _how_.
The new behavior is that when we open ptmx we cache a path the slave
pty.
If instead of caching that path we call devpts_acquire to compute the
mount point of the dentry we should be able to skip caching mountpoint
in ptmx_open.
That should trivially remove the regression.
We will have to fail if someone crazy unmounted the devpts filesystem
before we ask for the peer file descriptor. But otherwise the behavior
should be exactly the same.
Eric
On Wed, Aug 23, 2017 at 6:25 PM, Eric W. Biederman
<[email protected]> wrote:
>
> The new behavior is that when we open ptmx we cache a path the slave
> pty.
Yes. It's not strictly "new", though - we've done that for a while,
and if you used /dev/pts/ptmx you'd even have had the *right* path
for a while ;)
And this exact issue that Stefan is reporting.
But nobody ever used /dev/pts/ptmx, so nobody got the right path, and
nobody kept an extra reference to the pts mount.
> If instead of caching that path we call devpts_acquire to compute the
> mount point of the dentry we should be able to skip caching mountpoint
> in ptmx_open.
Yes, that's my plan - get rid of the 'struct path' entirely, make
'driver_data' point to just the dentry, and then at TIOCGPTPEER time
just re-create the path by looking up the vfsmount again (by doingf
that "pts" lookup again)
It should all be _fairly_ straightforward, but it's definitely a
rather bigger change than that "just fix the path" patch was.
Anyway, it's already reverted in my tree, I'll push it out after I've
verified that there isn't some silly build issue (there won't be, but
I've been burned by "this is obviously correct" too many times, so now
I always build before pushing anything out unless I'm on my laptop of
something when it's just too inconvenient).
Linus
On Wed, Aug 23, 2017 at 6:32 PM, Linus Torvalds
<[email protected]> wrote:
>
> It should all be _fairly_ straightforward, but it's definitely a
> rather bigger change than that "just fix the path" patch was.
Argh. And it's *not* fairly straightforward, because the
tty_operations "ioctl()" function pointer only gets 'struct tty *'.
So in the TIOCGPTPEER path, we don't actually have access to the file
pointer of the fd we're doing the ioctl on.
And that's where the 'struct path' to the 'ptmx' node is - which we
need to then look up the 'pts' directory.
How very annoying. I think that's why we did it all at ptmx_open()
time, because then we had all the information.
Linus
On Wed, Aug 23, 2017 at 6:49 PM, Linus Torvalds
<[email protected]> wrote:
>
> Argh. And it's *not* fairly straightforward, because the
> tty_operations "ioctl()" function pointer only gets 'struct tty *'.
>
> So in the TIOCGPTPEER path, we don't actually have access to the file
> pointer of the fd we're doing the ioctl on.
>
> And that's where the 'struct path' to the 'ptmx' node is - which we
> need to then look up the 'pts' directory.
>
> How very annoying. I think that's why we did it all at ptmx_open()
> time, because then we had all the information.
Anyway, the revert is pushed out. So we're back to the old behavior
that gives the wrong pathname in /proc.
And I think I can handle the lack of a 'struct file *' to the ioctl
operations by just special-casing TIOCGPTPEER directly in tty_ioctl()
itself.
That's where we handle "generic" tty ioctls, and doing pty stuff there
is kind of wrong, but pty's are special.
But I think I'll leave it for tomorrow. So Eric, if you feel like
looking at this, I'd appreciate it.
Linus
Linus Torvalds <[email protected]> writes:
> On Wed, Aug 23, 2017 at 6:49 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Argh. And it's *not* fairly straightforward, because the
>> tty_operations "ioctl()" function pointer only gets 'struct tty *'.
>>
>> So in the TIOCGPTPEER path, we don't actually have access to the file
>> pointer of the fd we're doing the ioctl on.
>>
>> And that's where the 'struct path' to the 'ptmx' node is - which we
>> need to then look up the 'pts' directory.
>>
>> How very annoying. I think that's why we did it all at ptmx_open()
>> time, because then we had all the information.
>
> Anyway, the revert is pushed out. So we're back to the old behavior
> that gives the wrong pathname in /proc.
>
> And I think I can handle the lack of a 'struct file *' to the ioctl
> operations by just special-casing TIOCGPTPEER directly in tty_ioctl()
> itself.
>
> That's where we handle "generic" tty ioctls, and doing pty stuff there
> is kind of wrong, but pty's are special.
>
> But I think I'll leave it for tomorrow. So Eric, if you feel like
> looking at this, I'd appreciate it.
This is so far untested (except for compiling) but I think this will
work.
I factor out devpts_ptmx_path out of devpts_acquire so the code
doesn't have to do unnecessary and confusing work, and add the
new function devpts_mnt.
I revert the code to keep anything except a dentry in
tty->link->driver_data.
And reduce the peer opening to a single function ptm_open_peer.
It takes lines of code but the result is very straightforward code.
Eric
drivers/tty/pty.c | 63 ++++++++++++++++++++---------------------------
drivers/tty/tty_io.c | 3 +++
fs/devpts/inode.c | 60 +++++++++++++++++++++++++++++++++-----------
include/linux/devpts_fs.h | 10 ++++++++
4 files changed, 85 insertions(+), 51 deletions(-)
diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 284749fb0f6b..269e6ea65a33 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -69,13 +69,8 @@ static void pty_close(struct tty_struct *tty, struct file *filp)
#ifdef CONFIG_UNIX98_PTYS
if (tty->driver == ptm_driver) {
mutex_lock(&devpts_mutex);
- if (tty->link->driver_data) {
- struct path *path = tty->link->driver_data;
-
- devpts_pty_kill(path->dentry);
- path_put(path);
- kfree(path);
- }
+ if (tty->link->driver_data)
+ devpts_pty_kill(tty->link->driver_data);
mutex_unlock(&devpts_mutex);
}
#endif
@@ -607,25 +602,25 @@ static inline void legacy_pty_init(void) { }
static struct cdev ptmx_cdev;
/**
- * pty_open_peer - open the peer of a pty
- * @tty: the peer of the pty being opened
+ * ptm_open_peer - open the peer of a pty
+ * @master: the open struct file of the ptmx device node
+ * @tty: the master of the pty being opened
+ * @flags: the flags for open
*
- * Open the cached dentry in tty->link, providing a safe way for userspace
- * to get the slave end of a pty (where they have the master fd and cannot
- * access or trust the mount namespace /dev/pts was mounted inside).
+ * Provide a race free way for userspace to open the slave end of a pty
+ * (where they have the master fd and cannot access or trust the mount
+ * namespace /dev/pts was mounted inside).
*/
-static struct file *pty_open_peer(struct tty_struct *tty, int flags)
-{
- if (tty->driver->subtype != PTY_TYPE_MASTER)
- return ERR_PTR(-EIO);
- return dentry_open(tty->link->driver_data, flags, current_cred());
-}
-
-static int pty_get_peer(struct tty_struct *tty, int flags)
+int ptm_open_peer(struct file *master, struct tty_struct *tty, int flags)
{
int fd = -1;
struct file *filp = NULL;
int retval = -EINVAL;
+ struct path path;
+
+ if ((tty->driver->type != TTY_DRIVER_TYPE_PTY) ||
+ (tty->driver->subtype != PTY_TYPE_MASTER))
+ return -EIO;
fd = get_unused_fd_flags(0);
if (fd < 0) {
@@ -633,7 +628,16 @@ static int pty_get_peer(struct tty_struct *tty, int flags)
goto err;
}
- filp = pty_open_peer(tty, flags);
+ /* Compute the slave's path */
+ path.mnt = devpts_mnt(filp);
+ if (IS_ERR(path.mnt)) {
+ retval = PTR_ERR(path.mnt);
+ goto err_put;
+ }
+ path.dentry = tty->link->driver_data;
+
+ filp = dentry_open(&path, flags, current_cred());
+ mntput(path.mnt);
if (IS_ERR(filp)) {
retval = PTR_ERR(filp);
goto err_put;
@@ -662,8 +666,6 @@ static int pty_unix98_ioctl(struct tty_struct *tty,
return pty_get_pktmode(tty, (int __user *)arg);
case TIOCGPTN: /* Get PT Number */
return put_user(tty->index, (unsigned int __user *)arg);
- case TIOCGPTPEER: /* Open the other end */
- return pty_get_peer(tty, (int) arg);
case TIOCSIG: /* Send signal to other side of pty */
return pty_signal(tty, (int) arg);
}
@@ -791,7 +793,6 @@ static int ptmx_open(struct inode *inode, struct file *filp)
{
struct pts_fs_info *fsi;
struct tty_struct *tty;
- struct path *pts_path;
struct dentry *dentry;
int retval;
int index;
@@ -845,26 +846,16 @@ static int ptmx_open(struct inode *inode, struct file *filp)
retval = PTR_ERR(dentry);
goto err_release;
}
- /* We need to cache a fake path for TIOCGPTPEER. */
- pts_path = kmalloc(sizeof(struct path), GFP_KERNEL);
- if (!pts_path)
- goto err_release;
- pts_path->mnt = filp->f_path.mnt;
- pts_path->dentry = dentry;
- path_get(pts_path);
- tty->link->driver_data = pts_path;
+ tty->link->driver_data = dentry;
retval = ptm_driver->ops->open(tty, filp);
if (retval)
- goto err_path_put;
+ goto err_release;
tty_debug_hangup(tty, "opening (count=%d)\n", tty->count);
tty_unlock(tty);
return 0;
-err_path_put:
- path_put(pts_path);
- kfree(pts_path);
err_release:
tty_unlock(tty);
// This will also put-ref the fsi
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 974b13d24401..ba3371449a5c 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -2518,6 +2518,9 @@ long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case TIOCSSERIAL:
tty_warn_deprecated_flags(p);
break;
+ case TIOCGPTPEER:
+ retval = ptm_open_peer(file, tty, (int)arg);
+ break;
default:
retval = tty_jobctrl_ioctl(tty, real_tty, file, cmd, arg);
if (retval != -ENOIOCTLCMD)
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 108df2e3602c..6e8816cf7d54 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -133,37 +133,67 @@ static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb)
return sb->s_fs_info;
}
-struct pts_fs_info *devpts_acquire(struct file *filp)
+static int devpts_ptmx_path(struct path *path)
{
- struct pts_fs_info *result;
- struct path path;
struct super_block *sb;
- int err;
-
- path = filp->f_path;
- path_get(&path);
+ int err = 0;
/* Has the devpts filesystem already been found? */
- sb = path.mnt->mnt_sb;
+ sb = path->mnt->mnt_sb;
if (sb->s_magic != DEVPTS_SUPER_MAGIC) {
/* Is a devpts filesystem at "pts" in the same directory? */
- err = path_pts(&path);
- if (err) {
- result = ERR_PTR(err);
+ err = path_pts(path);
+ if (err)
goto out;
- }
/* Is the path the root of a devpts filesystem? */
- result = ERR_PTR(-ENODEV);
- sb = path.mnt->mnt_sb;
+ err = -ENODEV;
+ sb = path->mnt->mnt_sb;
if ((sb->s_magic != DEVPTS_SUPER_MAGIC) ||
- (path.mnt->mnt_root != sb->s_root))
+ (path->mnt->mnt_root != sb->s_root))
goto out;
}
+out:
+ return err;
+}
+
+struct vfsmount *devpts_mnt(struct file *filp)
+{
+ struct path path;
+ int err;
+
+ path = filp->f_path;
+ path_get(&path);
+
+ err = devpts_ptmx_path(&path);
+ if (err) {
+ path_put(&path);
+ path.mnt = ERR_PTR(err);
+ }
+ return path.mnt;
+}
+
+struct pts_fs_info *devpts_acquire(struct file *filp)
+{
+ struct pts_fs_info *result;
+ struct path path;
+ struct super_block *sb;
+ int err;
+
+ path = filp->f_path;
+ path_get(&path);
+
+ err = devpts_ptmx_path(&path);
+ if (err) {
+ result = ERR_PTR(err);
+ goto out;
+ }
+
/*
* pty code needs to hold extra references in case of last /dev/tty close
*/
+ sb = path.mnt->mnt_sb;
atomic_inc(&sb->s_active);
result = DEVPTS_SB(sb);
diff --git a/include/linux/devpts_fs.h b/include/linux/devpts_fs.h
index 277ab9af9ac2..e27c548acfb0 100644
--- a/include/linux/devpts_fs.h
+++ b/include/linux/devpts_fs.h
@@ -19,6 +19,7 @@
struct pts_fs_info;
+struct vfsmount *devpts_mnt(struct file *);
struct pts_fs_info *devpts_acquire(struct file *);
void devpts_release(struct pts_fs_info *);
@@ -32,6 +33,15 @@ void *devpts_get_priv(struct dentry *);
/* unlink */
void devpts_pty_kill(struct dentry *);
+/* in pty.c */
+int ptm_open_peer(struct file *master, struct tty_struct *tty, int flags);
+
+#else
+static inline int
+ptm_open_peer(struct file *master, struct tty_struct *tty, int flags)
+{
+ return -EIO;
+}
#endif
On Wed, Aug 23, 2017 at 8:11 PM, Eric W. Biederman
<[email protected]> wrote:
> -static int pty_get_peer(struct tty_struct *tty, int flags)
> +int ptm_open_peer(struct file *master, struct tty_struct *tty, int flags)
> {
> int fd = -1;
> struct file *filp = NULL;
> int retval = -EINVAL;
> + struct path path;
> +
> + if ((tty->driver->type != TTY_DRIVER_TYPE_PTY) ||
> + (tty->driver->subtype != PTY_TYPE_MASTER))
> + return -EIO;
No. Afaik, that could be a legact PTY, which wouldn't be ok.
I think you need to do
if (tty->driver != ptm_driver)
return -EIO;
which should check both that it's the unix98 pty, and that it's the master.
Maybe I'm missing something.
That check used to be implicit, in that only the unix98 pty's could
reach that pty_unix98_ioctl() function, so then testing just that it
was a master was sufficient.
> - /* We need to cache a fake path for TIOCGPTPEER. */
> - pts_path = kmalloc(sizeof(struct path), GFP_KERNEL);
> - if (!pts_path)
> - goto err_release;
> - pts_path->mnt = filp->f_path.mnt;
> - pts_path->dentry = dentry;
> - path_get(pts_path);
> - tty->link->driver_data = pts_path;
> + tty->link->driver_data = dentry;
We used to do "path_get()". Shouldn't we now use "dget()"?
But maybe the slave dentry is guaranteed to be around and we don't
need to do that. So your approach may be fine. You did remove all the
path_put() calls too, so I guess it all matches up.
So this looks like it could be fine, but I'd like to make sure.
> +struct vfsmount *devpts_mnt(struct file *filp)
> +{
> + struct path path;
> + int err;
> +
> + path = filp->f_path;
> + path_get(&path);
> +
> + err = devpts_ptmx_path(&path);
> + if (err) {
> + path_put(&path);
> + path.mnt = ERR_PTR(err);
> + }
> + return path.mnt;
> +}
That can't be right. You're leaking the dentry that you're not returning, no?
But yes, apart from those comments, this looks like what I envisioned.
Needs testing, and needs more looking at those reference counts, but
otherwise looks good.
And while the patch is a bit bigger, I do like getting rid of that
'struct path' thing, and keeping just the dentry.
Linus
Hi
On 2017-08-23, Eric W. Biederman wrote:
> Linus Torvalds <[email protected]> writes:
> > On Wed, Aug 23, 2017 at 6:49 PM, Linus Torvalds <[email protected]> wrote:
[...]
> This is so far untested (except for compiling) but I think this will
> work.
>
> I factor out devpts_ptmx_path out of devpts_acquire so the code
> doesn't have to do unnecessary and confusing work, and add the
> new function devpts_mnt.
>
> I revert the code to keep anything except a dentry in
> tty->link->driver_data.
>
> And reduce the peer opening to a single function ptm_open_peer.
>
> It takes lines of code but the result is very straightforward code.
I've given this a quick test, while it seems to fix the initial problem
with umounting /dev/ptmx, it does introduce a new one - trying to open
an xterm (KDE5's konsole to be exact) doesn't open a shell (the shell
window remains totally empty) and trying to ssh into the system fails
with "PTY allocation request failed on channel 0", logging in via a
real tty and creating a new pbuilder chroot from there succeeds.
Regards
Stefan Lippers-Hollmann
Linus Torvalds <[email protected]> writes:
> On Wed, Aug 23, 2017 at 8:11 PM, Eric W. Biederman
> <[email protected]> wrote:
>> -static int pty_get_peer(struct tty_struct *tty, int flags)
>> +int ptm_open_peer(struct file *master, struct tty_struct *tty, int flags)
>> {
>> int fd = -1;
>> struct file *filp = NULL;
>> int retval = -EINVAL;
>> + struct path path;
>> +
>> + if ((tty->driver->type != TTY_DRIVER_TYPE_PTY) ||
>> + (tty->driver->subtype != PTY_TYPE_MASTER))
>> + return -EIO;
>
> No. Afaik, that could be a legact PTY, which wouldn't be ok.
>
> I think you need to do
>
> if (tty->driver != ptm_driver)
> return -EIO;
>
> which should check both that it's the unix98 pty, and that it's the master.
>
> Maybe I'm missing something.
>
> That check used to be implicit, in that only the unix98 pty's could
> reach that pty_unix98_ioctl() function, so then testing just that it
> was a master was sufficient.
No. That seems correct. Change made. If nothing else it is cheaper
and clearer so even if the other version wasn't wrong it is a good idea.
>> - /* We need to cache a fake path for TIOCGPTPEER. */
>> - pts_path = kmalloc(sizeof(struct path), GFP_KERNEL);
>> - if (!pts_path)
>> - goto err_release;
>> - pts_path->mnt = filp->f_path.mnt;
>> - pts_path->dentry = dentry;
>> - path_get(pts_path);
>> - tty->link->driver_data = pts_path;
>> + tty->link->driver_data = dentry;
>
> We used to do "path_get()". Shouldn't we now use "dget()"?
>
> But maybe the slave dentry is guaranteed to be around and we don't
> need to do that. So your approach may be fine. You did remove all the
> path_put() calls too, so I guess it all matches up.
>
> So this looks like it could be fine, but I'd like to make sure.
That change is a revert to the old v4.12 code. So it is definitely
not regression inducing.
Further devpts_pty_new allocates a dentry keeps it in the devpts
filesystem. The dentry is good until devpts_pty_kill where the
dentry is unlinked and killed.
I figure not differences from v4.12 if the logic hasn't changed
seems a good good way to cut down on the search for bugs/regressions.
>> +struct vfsmount *devpts_mnt(struct file *filp)
>> +{
>> + struct path path;
>> + int err;
>> +
>> + path = filp->f_path;
>> + path_get(&path);
>> +
>> + err = devpts_ptmx_path(&path);
>> + if (err) {
>> + path_put(&path);
>> + path.mnt = ERR_PTR(err);
>> + }
>> + return path.mnt;
>> +}
>
> That can't be right. You're leaking the dentry that you're not returning, no?
Correct. That is buggy. Will fix before I resend.
> But yes, apart from those comments, this looks like what I envisioned.
>
> Needs testing, and needs more looking at those reference counts, but
> otherwise looks good.
>
> And while the patch is a bit bigger, I do like getting rid of that
> 'struct path' thing, and keeping just the dentry.
Eric
Stefan Lippers-Hollmann <[email protected]> writes:
> Hi
>
> On 2017-08-23, Eric W. Biederman wrote:
>> Linus Torvalds <[email protected]> writes:
>> > On Wed, Aug 23, 2017 at 6:49 PM, Linus Torvalds <[email protected]> wrote:
> [...]
>> This is so far untested (except for compiling) but I think this will
>> work.
>>
>> I factor out devpts_ptmx_path out of devpts_acquire so the code
>> doesn't have to do unnecessary and confusing work, and add the
>> new function devpts_mnt.
>>
>> I revert the code to keep anything except a dentry in
>> tty->link->driver_data.
>>
>> And reduce the peer opening to a single function ptm_open_peer.
>>
>> It takes lines of code but the result is very straightforward code.
>
> I've given this a quick test, while it seems to fix the initial problem
> with umounting /dev/ptmx, it does introduce a new one - trying to open
> an xterm (KDE5's konsole to be exact) doesn't open a shell (the shell
> window remains totally empty) and trying to ssh into the system fails
> with "PTY allocation request failed on channel 0", logging in via a
> real tty and creating a new pbuilder chroot from there succeeds.
Weird. There is at least one leak inducing bug in there. So perhaps
that is the cause. *Scratches my head* Are you also testing the new
ioctl?
I will resend shortly with a version that has no differences in the old
code from v4.12 (other than the refactoring in fs/devpts/inode.c).
Which should make it very hard to have a regression.
Eric
On Thu, Aug 24, 2017 at 8:54 AM, Eric W. Biederman
<[email protected]> wrote:
>
> Weird. There is at least one leak inducing bug in there. So perhaps
> that is the cause. *Scratches my head* Are you also testing the new
> ioctl?
I can verify, and it's not the leak. I tried your patch with that leak
fix (and the other fixes I pointed out), and I see similar issues that
Stefan noted.
With gnome-terminal, the terminal window opens, and then it says
Failed to open PTY: No such device
and the terminal obviously doesn't work.
And I see the error: your devpts_ptmx_path() code always returns
-ENODEV because you set the error unconditionally before an error
check, and then you don't clear it if the error didn't happen.
I'll test the fix.
Linus
On Thu, Aug 24, 2017 at 10:52 AM, Linus Torvalds
<[email protected]> wrote:
>
> I'll test the fix.
Yes, that was it, and things work with that fixed.
But that still fails the TIOCGPTPEER ioctl with an NULL pointer
dereference in devpts_mnt, I probably messed up when I fixed the
dentry refcount leak.
Linus
On Thu, Aug 24, 2017 at 11:06 AM, Linus Torvalds
<[email protected]> wrote:
>
> But that still fails the TIOCGPTPEER ioctl with an NULL pointer
> dereference in devpts_mnt, I probably messed up when I fixed the
> dentry refcount leak.
No, that was another bug in the original patch.
ptm_open_peer() passed in 'filp' to devpts_mnt(), but it should
obviously pass in the 'master' one.
filp is NULL at that point.
The attached patch should work. It's Eric's original patch with
various cleanups and fixes.
Linus
On Thu, Aug 24, 2017 at 11:13 AM, Linus Torvalds
<[email protected]> wrote:
>
> The attached patch should work. It's Eric's original patch with
> various cleanups and fixes.
No, one more bug in there: Eric did
+ case TIOCGPTPEER:
+ retval = ptm_open_peer(file, tty, (int)arg);
+ break;
to call that ptm_open_peer(), but that's bogus. The "break;" will just
cause it to continue with the tty ioctl handling, and result in
-ENOTTY in the end.
So it actually *did* open the slave, but then threw the returned fd value away.
It should just do a "return ptm_open_peer(file, tty, (int)arg);" instead.
Linus
On Thu, Aug 24, 2017 at 11:31 AM, Linus Torvalds
<[email protected]> wrote:
>
> It should just do a "return ptm_open_peer(file, tty, (int)arg);" instead.
Here's the actual tested patch. It "WorksForMe(tm)", including the
TIOCGPTPEER ioctl, and also verified that it gets the pathname right
in /proc, which was the original problem.
But I did *not* check that pbuilder is still happy. Stefan?
Linus
Linus Torvalds <[email protected]> writes:
> On Thu, Aug 24, 2017 at 11:06 AM, Linus Torvalds
> <[email protected]> wrote:
>>
>> But that still fails the TIOCGPTPEER ioctl with an NULL pointer
>> dereference in devpts_mnt, I probably messed up when I fixed the
>> dentry refcount leak.
>
> No, that was another bug in the original patch.
>
> ptm_open_peer() passed in 'filp' to devpts_mnt(), but it should
> obviously pass in the 'master' one.
>
> filp is NULL at that point.
>
> The attached patch should work. It's Eric's original patch with
> various cleanups and fixes.
It still retains two of my bugs. I forgot to return from ioctl in
tty_io.c without which the return code is lost. I failed to verify
that the devpts filesystem we find via path lookup is the same one
that was found when /dev/ptmx was opened.
Here is my tested version of the patch.
It survives light testing here.
Eric
From: "Eric W. Biederman" <[email protected]>
Date: Thu, 24 Aug 2017 10:55:19 -0500
Subject: [PATCH] pty: Repair TIOCGPTPEER
The implementation of TIOCGPTPEER has two issues.
When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened the wrong
vfsmount is passed to dentry_open. Which results in the kernel displaying
the wrong pathname for the peer.
The second is simply by caching the vfsmount and dentry of the peer it leaves
them open, in a way they were not previously Which because of the inreased
reference counts can cause unnecessary behaviour differences resulting in
regressions.
To fix these move the ioctl into tty_io.c at a generic level allowing
the ioctl to have access to the struct file on which the ioctl is
being called. This allows the path of the slave to be derived when
opening the slave through TIOCGPTPEER instead of requiring the path to
the slave be cached. Thus removing the need for caching the path.
A new function devpts_ptmx_path is factored out of devpts_acquire and
used to implement a function devpts_mntget. The new function devpts_mntget
takes a filp to perform the lookup on and fsi so that it can confirm
that the superblock that is found by devpts_ptmx_path is the proper superblock.
Fixes: 54ebbfb16034 ("tty: add TIOCGPTPEER ioctl")
Reported-by: Christian Brauner <[email protected]>
Reported-by: Stefan Lippers-Hollmann <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
drivers/tty/pty.c | 62 +++++++++++++++++++--------------------------
drivers/tty/tty_io.c | 3 +++
fs/devpts/inode.c | 64 ++++++++++++++++++++++++++++++++++++-----------
include/linux/devpts_fs.h | 10 ++++++++
4 files changed, 89 insertions(+), 50 deletions(-)
diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 284749fb0f6b..3856bd228fa9 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -69,13 +69,8 @@ static void pty_close(struct tty_struct *tty, struct file *filp)
#ifdef CONFIG_UNIX98_PTYS
if (tty->driver == ptm_driver) {
mutex_lock(&devpts_mutex);
- if (tty->link->driver_data) {
- struct path *path = tty->link->driver_data;
-
- devpts_pty_kill(path->dentry);
- path_put(path);
- kfree(path);
- }
+ if (tty->link->driver_data)
+ devpts_pty_kill(tty->link->driver_data);
mutex_unlock(&devpts_mutex);
}
#endif
@@ -607,25 +602,24 @@ static inline void legacy_pty_init(void) { }
static struct cdev ptmx_cdev;
/**
- * pty_open_peer - open the peer of a pty
- * @tty: the peer of the pty being opened
+ * ptm_open_peer - open the peer of a pty
+ * @master: the open struct file of the ptmx device node
+ * @tty: the master of the pty being opened
+ * @flags: the flags for open
*
- * Open the cached dentry in tty->link, providing a safe way for userspace
- * to get the slave end of a pty (where they have the master fd and cannot
- * access or trust the mount namespace /dev/pts was mounted inside).
+ * Provide a race free way for userspace to open the slave end of a pty
+ * (where they have the master fd and cannot access or trust the mount
+ * namespace /dev/pts was mounted inside).
*/
-static struct file *pty_open_peer(struct tty_struct *tty, int flags)
-{
- if (tty->driver->subtype != PTY_TYPE_MASTER)
- return ERR_PTR(-EIO);
- return dentry_open(tty->link->driver_data, flags, current_cred());
-}
-
-static int pty_get_peer(struct tty_struct *tty, int flags)
+int ptm_open_peer(struct file *master, struct tty_struct *tty, int flags)
{
int fd = -1;
struct file *filp = NULL;
int retval = -EINVAL;
+ struct path path;
+
+ if (tty->driver != ptm_driver)
+ return -EIO;
fd = get_unused_fd_flags(0);
if (fd < 0) {
@@ -633,7 +627,16 @@ static int pty_get_peer(struct tty_struct *tty, int flags)
goto err;
}
- filp = pty_open_peer(tty, flags);
+ /* Compute the slave's path */
+ path.mnt = devpts_mntget(master, tty->driver_data);
+ if (IS_ERR(path.mnt)) {
+ retval = PTR_ERR(path.mnt);
+ goto err_put;
+ }
+ path.dentry = tty->link->driver_data;
+
+ filp = dentry_open(&path, flags, current_cred());
+ mntput(path.mnt);
if (IS_ERR(filp)) {
retval = PTR_ERR(filp);
goto err_put;
@@ -662,8 +665,6 @@ static int pty_unix98_ioctl(struct tty_struct *tty,
return pty_get_pktmode(tty, (int __user *)arg);
case TIOCGPTN: /* Get PT Number */
return put_user(tty->index, (unsigned int __user *)arg);
- case TIOCGPTPEER: /* Open the other end */
- return pty_get_peer(tty, (int) arg);
case TIOCSIG: /* Send signal to other side of pty */
return pty_signal(tty, (int) arg);
}
@@ -791,7 +792,6 @@ static int ptmx_open(struct inode *inode, struct file *filp)
{
struct pts_fs_info *fsi;
struct tty_struct *tty;
- struct path *pts_path;
struct dentry *dentry;
int retval;
int index;
@@ -845,26 +845,16 @@ static int ptmx_open(struct inode *inode, struct file *filp)
retval = PTR_ERR(dentry);
goto err_release;
}
- /* We need to cache a fake path for TIOCGPTPEER. */
- pts_path = kmalloc(sizeof(struct path), GFP_KERNEL);
- if (!pts_path)
- goto err_release;
- pts_path->mnt = filp->f_path.mnt;
- pts_path->dentry = dentry;
- path_get(pts_path);
- tty->link->driver_data = pts_path;
+ tty->link->driver_data = dentry;
retval = ptm_driver->ops->open(tty, filp);
if (retval)
- goto err_path_put;
+ goto err_release;
tty_debug_hangup(tty, "opening (count=%d)\n", tty->count);
tty_unlock(tty);
return 0;
-err_path_put:
- path_put(pts_path);
- kfree(pts_path);
err_release:
tty_unlock(tty);
// This will also put-ref the fsi
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 974b13d24401..10c4038c0e8d 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -2518,6 +2518,9 @@ long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case TIOCSSERIAL:
tty_warn_deprecated_flags(p);
break;
+ case TIOCGPTPEER:
+ /* Special because the struct file is needed */
+ return ptm_open_peer(file, tty, (int)arg);
default:
retval = tty_jobctrl_ioctl(tty, real_tty, file, cmd, arg);
if (retval != -ENOIOCTLCMD)
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 108df2e3602c..809da242a452 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -133,37 +133,73 @@ static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb)
return sb->s_fs_info;
}
-struct pts_fs_info *devpts_acquire(struct file *filp)
+static int devpts_ptmx_path(struct path *path)
{
- struct pts_fs_info *result;
- struct path path;
struct super_block *sb;
int err;
- path = filp->f_path;
- path_get(&path);
-
/* Has the devpts filesystem already been found? */
- sb = path.mnt->mnt_sb;
+ sb = path->mnt->mnt_sb;
if (sb->s_magic != DEVPTS_SUPER_MAGIC) {
/* Is a devpts filesystem at "pts" in the same directory? */
- err = path_pts(&path);
- if (err) {
- result = ERR_PTR(err);
+ err = path_pts(path);
+ if (err)
goto out;
- }
/* Is the path the root of a devpts filesystem? */
- result = ERR_PTR(-ENODEV);
- sb = path.mnt->mnt_sb;
+ err = -ENODEV;
+ sb = path->mnt->mnt_sb;
if ((sb->s_magic != DEVPTS_SUPER_MAGIC) ||
- (path.mnt->mnt_root != sb->s_root))
+ (path->mnt->mnt_root != sb->s_root))
goto out;
}
+ err = 0;
+out:
+ return err;
+}
+
+struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi)
+{
+ struct path path;
+ int err;
+
+ path = filp->f_path;
+ path_get(&path);
+
+ err = devpts_ptmx_path(&path);
+ dput(path.dentry);
+ if (err) {
+ mntput(path.mnt);
+ path.mnt = ERR_PTR(err);
+ }
+ if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) {
+ mntput(path.mnt);
+ path.mnt = ERR_PTR(-ENODEV);
+ }
+ return path.mnt;
+}
+
+struct pts_fs_info *devpts_acquire(struct file *filp)
+{
+ struct pts_fs_info *result;
+ struct path path;
+ struct super_block *sb;
+ int err;
+
+ path = filp->f_path;
+ path_get(&path);
+
+ err = devpts_ptmx_path(&path);
+ if (err) {
+ result = ERR_PTR(err);
+ goto out;
+ }
+
/*
* pty code needs to hold extra references in case of last /dev/tty close
*/
+ sb = path.mnt->mnt_sb;
atomic_inc(&sb->s_active);
result = DEVPTS_SB(sb);
diff --git a/include/linux/devpts_fs.h b/include/linux/devpts_fs.h
index 277ab9af9ac2..100cb4343763 100644
--- a/include/linux/devpts_fs.h
+++ b/include/linux/devpts_fs.h
@@ -19,6 +19,7 @@
struct pts_fs_info;
+struct vfsmount *devpts_mntget(struct file *, struct pts_fs_info *);
struct pts_fs_info *devpts_acquire(struct file *);
void devpts_release(struct pts_fs_info *);
@@ -32,6 +33,15 @@ void *devpts_get_priv(struct dentry *);
/* unlink */
void devpts_pty_kill(struct dentry *);
+/* in pty.c */
+int ptm_open_peer(struct file *master, struct tty_struct *tty, int flags);
+
+#else
+static inline int
+ptm_open_peer(struct file *master, struct tty_struct *tty, int flags)
+{
+ return -EIO;
+}
#endif
--
2.14.1
On Thu, Aug 24, 2017 at 11:40 AM, Eric W. Biederman
<[email protected]> wrote:
>
> Here is my tested version of the patch.
Can you please take my cleanups to devpts_ptmx_path() too?
Those 'goto err' statements are disgusting, when a plain 'return
-ERRNO' works cleaner.
And that "struct file *filp = NULL;" is bogus - you added the NULL
initialization because you mis-used "filp" early, and with that fixed
it's just garbage.
Other than that, it looks fine to me.
Linus
On Thu, Aug 24, 2017 at 12:01 PM, Christian Brauner
<[email protected]> wrote:
>
> I've touched on this in my original message, I wonder whether we currently
> support mounting devpts at a different a location and expect an open on a
> newly created slave to work.
Yes. That is very much intended to work.
> Say I mount devpts at /mnt and to open("/mnt/ptmx", O_RDWR | O_NOCTTY) and get a new slave pty at /mnt/1 do we
> expect open("/mnt/1, O_RDWR | O_NOCTTY) to work?
Yes.
Except you actually don't want to use "/mnt/ptmx". That ptmx node
inside the pts filesystem is garbage that we never actually used,
because the permissions aren't sane. It should probably be removed,
but because somebody *might* have used it, we have left it alone.
So what you're actually *supposed* to do is
- create a ptmx node and a pts directory in /mnt
- mount devpts on /mnt/pts
- use /mnt/ptmx to create new pty's, which should just look up that
pts mount directly.
And yes, the pathname should then be /mnt/pts/X for the slave side,
and /mnt/ptmx for the master.
In fact, I just tested that TIOCGPTPEER, including using your original
test-program (this is me as root in my home directory):
[root@i7 torvalds]# mkdir dummy
[root@i7 torvalds]# cd dummy/
[root@i7 dummy]# mknod ptmx c 5 2
[root@i7 dummy]# mkdir pts
[root@i7 dummy]# mount -t devpts devpts pts
[root@i7 dummy]# ../a.out
I point to "/home/torvalds/dummy/pts/0"
[root@i7 dummy]# umount pts/
[root@i7 dummy]# cd ..
[root@i7 torvalds]# rm -rf dummy
There's two things to note there:
- look at that "I point to" - it's not hardcoded to /dev/pts/X
- look at the pts number: each pts filesystem has its own private
numbers, so despite the fact that in another window I *also* have that
ptx/0:
[torvalds@i7 linux]$ tty
/dev/pts/0
that new devpts instance has its *own* pts/0, which is a
completely different pty.
this is one of those big cleanups we did with the pts filesystem some time ago.
Linus
Linus Torvalds <[email protected]> writes:
> On Thu, Aug 24, 2017 at 11:40 AM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> Here is my tested version of the patch.
>
> Can you please take my cleanups to devpts_ptmx_path() too?
Let met take a look.
> Those 'goto err' statements are disgusting, when a plain 'return
> -ERRNO' works cleaner.
Yes those look like good cleanups. I had tried to preserve the original
logic in devpts_ptmx_path from devpts_acquire to make it easier to
see if I had goofed. But that out and out failed so cleanups
so the code is easier to read look like a very good thing.
> And that "struct file *filp = NULL;" is bogus - you added the NULL
> initialization because you mis-used "filp" early, and with that fixed
> it's just garbage.
Actually the NULL initialization is a hold over from the original
version of that function. But I agree without it gcc could have
caught my use of the wrong variable, so removing it looks like a good
idea.
> Other than that, it looks fine to me.
Thanks. I will respin and retest and see where things are at.
Eric
On Thu, Aug 24, 2017 at 12:22 PM, Linus Torvalds
<[email protected]> wrote:
> [root@i7 dummy]# ../a.out
> I point to "/home/torvalds/dummy/pts/0"
Note: that "a.out" binary is modified from your original code.
It's modified to correctly print out the readdir() results, but it's
also modified to open just "ptmx" for the above trial (so it doesn't
open /dev/ptmx, it's opening that ptmx node in the current directory,
which is why it then reports that
I point to "/home/torvalds/dummy/pts/0"
thing.
So it's not your *original* test-program, it's slightly tweaked for this test.
Linus
Christian Brauner <[email protected]> writes:
> On Aug 24, 2017 20:41, "Eric W. Biederman" <[email protected]> wrote:
>
> > The implementation of TIOCGPTPEER has two issues.
> >
> > When /dev/ptmx (as opposed to
>
> I've touched on this in my original message, I wonder whether we
> currently support mounting devpts at a different a location and expect
> an open on a newly created slave to work. Say I mount devpts at /mnt
> and to open("/mnt/ptmx", O_RDWR | O_NOCTTY) and get a new slave pty at
> /mnt/1 do we expect open("/mnt/1, O_RDWR | O_NOCTTY) to work?
Yes.
In particular one of my crazy test cases when I did the last round
of cleanups to devpts was someone had created a chroot including
a /dev/ptmx device node and mounted devpts at the appropriate path
inside the chroot. Which is in part why /dev/ptmx does a relative
lookup of the devpts filesystem.
Now glibc won't work with devpts mounted somewhere else. As it has
/dev/pts/... hardcoded. But the kernel should work fine. The case you
described using /mnt/ptmx instead of a random /dev/ptmx device now
should work especially well as none of this crazy relative path lookup
work needs to happen.
There are little things such as TIOCSPTLCK and perhaps chmod that need
to be called in your example before the slave open will succeed (without
O_PATH) but yes that case most definitely should work.
Eric
The implementation of TIOCGPTPEER has two issues.
When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened the wrong
vfsmount is passed to dentry_open. Which results in the kernel displaying
the wrong pathname for the peer.
The second is simply by caching the vfsmount and dentry of the peer it leaves
them open, in a way they were not previously Which because of the inreased
reference counts can cause unnecessary behaviour differences resulting in
regressions.
To fix these move the ioctl into tty_io.c at a generic level allowing
the ioctl to have access to the struct file on which the ioctl is
being called. This allows the path of the slave to be derived when
opening the slave through TIOCGPTPEER instead of requiring the path to
the slave be cached. Thus removing the need for caching the path.
A new function devpts_ptmx_path is factored out of devpts_acquire and
used to implement a function devpts_mntget. The new function devpts_mntget
takes a filp to perform the lookup on and fsi so that it can confirm
that the superblock that is found by devpts_ptmx_path is the proper superblock.
v2: Lots of fixes to make the code actually work
v3: Suggestions by Linus
- Removed the unnecessary initialization of filp in ptm_open_peer
- Simplified devpts_ptmx_path as gotos are no longer required
Fixes: 54ebbfb16034 ("tty: add TIOCGPTPEER ioctl")
Reported-by: Christian Brauner <[email protected]>
Reported-by: Stefan Lippers-Hollmann <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
drivers/tty/pty.c | 64 ++++++++++++++++++++--------------------------
drivers/tty/tty_io.c | 3 +++
fs/devpts/inode.c | 65 +++++++++++++++++++++++++++++++++++------------
include/linux/devpts_fs.h | 10 ++++++++
4 files changed, 89 insertions(+), 53 deletions(-)
diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 284749fb0f6b..a6d5164c33a9 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -69,13 +69,8 @@ static void pty_close(struct tty_struct *tty, struct file *filp)
#ifdef CONFIG_UNIX98_PTYS
if (tty->driver == ptm_driver) {
mutex_lock(&devpts_mutex);
- if (tty->link->driver_data) {
- struct path *path = tty->link->driver_data;
-
- devpts_pty_kill(path->dentry);
- path_put(path);
- kfree(path);
- }
+ if (tty->link->driver_data)
+ devpts_pty_kill(tty->link->driver_data);
mutex_unlock(&devpts_mutex);
}
#endif
@@ -607,25 +602,24 @@ static inline void legacy_pty_init(void) { }
static struct cdev ptmx_cdev;
/**
- * pty_open_peer - open the peer of a pty
- * @tty: the peer of the pty being opened
+ * ptm_open_peer - open the peer of a pty
+ * @master: the open struct file of the ptmx device node
+ * @tty: the master of the pty being opened
+ * @flags: the flags for open
*
- * Open the cached dentry in tty->link, providing a safe way for userspace
- * to get the slave end of a pty (where they have the master fd and cannot
- * access or trust the mount namespace /dev/pts was mounted inside).
+ * Provide a race free way for userspace to open the slave end of a pty
+ * (where they have the master fd and cannot access or trust the mount
+ * namespace /dev/pts was mounted inside).
*/
-static struct file *pty_open_peer(struct tty_struct *tty, int flags)
-{
- if (tty->driver->subtype != PTY_TYPE_MASTER)
- return ERR_PTR(-EIO);
- return dentry_open(tty->link->driver_data, flags, current_cred());
-}
-
-static int pty_get_peer(struct tty_struct *tty, int flags)
+int ptm_open_peer(struct file *master, struct tty_struct *tty, int flags)
{
int fd = -1;
- struct file *filp = NULL;
+ struct file *filp;
int retval = -EINVAL;
+ struct path path;
+
+ if (tty->driver != ptm_driver)
+ return -EIO;
fd = get_unused_fd_flags(0);
if (fd < 0) {
@@ -633,7 +627,16 @@ static int pty_get_peer(struct tty_struct *tty, int flags)
goto err;
}
- filp = pty_open_peer(tty, flags);
+ /* Compute the slave's path */
+ path.mnt = devpts_mntget(master, tty->driver_data);
+ if (IS_ERR(path.mnt)) {
+ retval = PTR_ERR(path.mnt);
+ goto err_put;
+ }
+ path.dentry = tty->link->driver_data;
+
+ filp = dentry_open(&path, flags, current_cred());
+ mntput(path.mnt);
if (IS_ERR(filp)) {
retval = PTR_ERR(filp);
goto err_put;
@@ -662,8 +665,6 @@ static int pty_unix98_ioctl(struct tty_struct *tty,
return pty_get_pktmode(tty, (int __user *)arg);
case TIOCGPTN: /* Get PT Number */
return put_user(tty->index, (unsigned int __user *)arg);
- case TIOCGPTPEER: /* Open the other end */
- return pty_get_peer(tty, (int) arg);
case TIOCSIG: /* Send signal to other side of pty */
return pty_signal(tty, (int) arg);
}
@@ -791,7 +792,6 @@ static int ptmx_open(struct inode *inode, struct file *filp)
{
struct pts_fs_info *fsi;
struct tty_struct *tty;
- struct path *pts_path;
struct dentry *dentry;
int retval;
int index;
@@ -845,26 +845,16 @@ static int ptmx_open(struct inode *inode, struct file *filp)
retval = PTR_ERR(dentry);
goto err_release;
}
- /* We need to cache a fake path for TIOCGPTPEER. */
- pts_path = kmalloc(sizeof(struct path), GFP_KERNEL);
- if (!pts_path)
- goto err_release;
- pts_path->mnt = filp->f_path.mnt;
- pts_path->dentry = dentry;
- path_get(pts_path);
- tty->link->driver_data = pts_path;
+ tty->link->driver_data = dentry;
retval = ptm_driver->ops->open(tty, filp);
if (retval)
- goto err_path_put;
+ goto err_release;
tty_debug_hangup(tty, "opening (count=%d)\n", tty->count);
tty_unlock(tty);
return 0;
-err_path_put:
- path_put(pts_path);
- kfree(pts_path);
err_release:
tty_unlock(tty);
// This will also put-ref the fsi
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 974b13d24401..10c4038c0e8d 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -2518,6 +2518,9 @@ long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case TIOCSSERIAL:
tty_warn_deprecated_flags(p);
break;
+ case TIOCGPTPEER:
+ /* Special because the struct file is needed */
+ return ptm_open_peer(file, tty, (int)arg);
default:
retval = tty_jobctrl_ioctl(tty, real_tty, file, cmd, arg);
if (retval != -ENOIOCTLCMD)
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 108df2e3602c..7eae33ffa3fc 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -133,6 +133,50 @@ static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb)
return sb->s_fs_info;
}
+static int devpts_ptmx_path(struct path *path)
+{
+ struct super_block *sb;
+ int err;
+
+ /* Has the devpts filesystem already been found? */
+ if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
+ return 0;
+
+ /* Is a devpts filesystem at "pts" in the same directory? */
+ err = path_pts(path);
+ if (err)
+ return err;
+
+ /* Is the path the root of a devpts filesystem? */
+ sb = path->mnt->mnt_sb;
+ if ((sb->s_magic != DEVPTS_SUPER_MAGIC) ||
+ (path->mnt->mnt_root != sb->s_root))
+ return -ENODEV;
+
+ return 0;
+}
+
+struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi)
+{
+ struct path path;
+ int err;
+
+ path = filp->f_path;
+ path_get(&path);
+
+ err = devpts_ptmx_path(&path);
+ dput(path.dentry);
+ if (err) {
+ mntput(path.mnt);
+ path.mnt = ERR_PTR(err);
+ }
+ if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) {
+ mntput(path.mnt);
+ path.mnt = ERR_PTR(-ENODEV);
+ }
+ return path.mnt;
+}
+
struct pts_fs_info *devpts_acquire(struct file *filp)
{
struct pts_fs_info *result;
@@ -143,27 +187,16 @@ struct pts_fs_info *devpts_acquire(struct file *filp)
path = filp->f_path;
path_get(&path);
- /* Has the devpts filesystem already been found? */
- sb = path.mnt->mnt_sb;
- if (sb->s_magic != DEVPTS_SUPER_MAGIC) {
- /* Is a devpts filesystem at "pts" in the same directory? */
- err = path_pts(&path);
- if (err) {
- result = ERR_PTR(err);
- goto out;
- }
-
- /* Is the path the root of a devpts filesystem? */
- result = ERR_PTR(-ENODEV);
- sb = path.mnt->mnt_sb;
- if ((sb->s_magic != DEVPTS_SUPER_MAGIC) ||
- (path.mnt->mnt_root != sb->s_root))
- goto out;
+ err = devpts_ptmx_path(&path);
+ if (err) {
+ result = ERR_PTR(err);
+ goto out;
}
/*
* pty code needs to hold extra references in case of last /dev/tty close
*/
+ sb = path.mnt->mnt_sb;
atomic_inc(&sb->s_active);
result = DEVPTS_SB(sb);
diff --git a/include/linux/devpts_fs.h b/include/linux/devpts_fs.h
index 277ab9af9ac2..100cb4343763 100644
--- a/include/linux/devpts_fs.h
+++ b/include/linux/devpts_fs.h
@@ -19,6 +19,7 @@
struct pts_fs_info;
+struct vfsmount *devpts_mntget(struct file *, struct pts_fs_info *);
struct pts_fs_info *devpts_acquire(struct file *);
void devpts_release(struct pts_fs_info *);
@@ -32,6 +33,15 @@ void *devpts_get_priv(struct dentry *);
/* unlink */
void devpts_pty_kill(struct dentry *);
+/* in pty.c */
+int ptm_open_peer(struct file *master, struct tty_struct *tty, int flags);
+
+#else
+static inline int
+ptm_open_peer(struct file *master, struct tty_struct *tty, int flags)
+{
+ return -EIO;
+}
#endif
--
2.14.1
Hi
[ sorry for the re-send, this accidentally only reached you, rather than
the mailing list and the other recipients as well ]
On 2017-08-24, Linus Torvalds wrote:
> On Thu, Aug 24, 2017 at 11:31 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > It should just do a "return ptm_open_peer(file, tty, (int)arg);" instead.
>
> Here's the actual tested patch. It "WorksForMe(tm)", including the
> TIOCGPTPEER ioctl, and also verified that it gets the pathname right
> in /proc, which was the original problem.
>
> But I did *not* check that pbuilder is still happy. Stefan?
This patch seems to work, ssh, xterm (konsole5), real tty and pbuilder
(creating- and updating the build chroots, just as well as building
several fairly involved packages) are fine with this patch on top of
v4.13-rc6-66-g143c97cc6529 (tested on x86_64).
Thanks a lot
Stefan Lippers-Hollmann
On Thu, Aug 24, 2017 at 1:24 PM, Stefan Lippers-Hollmann <[email protected]> wrote:
>
> This patch seems to work, ssh, xterm (konsole5), real tty and pbuilder
> (creating- and updating the build chroots, just as well as building
> several fairly involved packages) are fine with this patch on top of
> v4.13-rc6-66-g143c97cc6529 (tested on x86_64).
Ok. I just committed Eric's final patch, which is pretty much exactly
the same as what I sent out except with an added verification that the
mount has not changed.
So I marked you as having "Reported-and-tested" it, even if the
version you tested was not 100% identical.
Will push out after the usual final build test,
Linus
Linus Torvalds <[email protected]> writes:
> On Thu, Aug 24, 2017 at 12:01 PM, Christian Brauner
> <[email protected]> wrote:
>>
>> I've touched on this in my original message, I wonder whether we currently
>> support mounting devpts at a different a location and expect an open on a
>> newly created slave to work.
>
> Yes. That is very much intended to work.
>
>> Say I mount devpts at /mnt and to open("/mnt/ptmx", O_RDWR | O_NOCTTY) and get a new slave pty at /mnt/1 do we
>> expect open("/mnt/1, O_RDWR | O_NOCTTY) to work?
>
> Yes.
>
> Except you actually don't want to use "/mnt/ptmx". That ptmx node
> inside the pts filesystem is garbage that we never actually used,
> because the permissions aren't sane. It should probably be removed,
> but because somebody *might* have used it, we have left it alone.
The ptmx node on devpts is used.
Use of that device node is way more prevalent then crazy weird cases
that required us to make /dev/ptmx perform relative lookups. People
just set the ptmxmode= boot parameter when mounting devpts if they care.
Every use case I am aware of where people actually knew about multiple
instances of devpts used the ptmx node in the devpts filesystem.
If everyone used devtmpfs we could have fixed the permissions on the
ptmx node in devpts and made /dev/ptmx a symlink instead of a device
node. Saving lots of complexity.
Unfortunately there were crazy weird cases out there where people
created chroots or mounted devpts multiple times during boot that
defeated every strategy except making /dev/ptmx perform a relative
lookup for devpts.
The reasons I did not fix the permissions on the ptmx deivcd node was
that given the magnitude of the change needed to get to the sensible
behavior of every mount of devpts creating a new filesystem, any
unnecessary changes were just plain scary.
Further the kind of regression that would be introduced if we changed
the permissions would be a security hole if someone has some really
weird and crazy permissions on /dev/ptmx and does not use devtmpfs.
That said I could not find a distribution being that crazy and I had a
very good sample of them. So I expect we can fix the default permissions
on ptmx node of devpts and not have anyone notice or care.
I would encourage people who are doing new things to actually use the
ptmx node on devpts because there is less overhead and it is simpler.
There are just enough weird one off scripts like xen image builder (I
think that was the nasty test case that broke in debian) that I can't
imagine ever being able to responsibly remove the path based lookups in
/dev/ptmx. I do dream of it sometimes.
It might be worth fixing the default permissions on the devpts ptmx node
and updating glibc to try /dev/pts/ptmx first. That would shave off a
few cycles in opening ptys. If you add TIOCGPTPEER there are probably
enough cleanups and simplifications that it would be worth it just
for the code improvements.
With glibc fixed we could even dream of a day when /dev/ptmx could be
completely removed.
Eric
Hi
On 2017-08-24, Eric W. Biederman wrote:
> The implementation of TIOCGPTPEER has two issues.
>
> When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened the wrong
> vfsmount is passed to dentry_open. Which results in the kernel displaying
> the wrong pathname for the peer.
[...]
> v2: Lots of fixes to make the code actually work
> v3: Suggestions by Linus
> - Removed the unnecessary initialization of filp in ptm_open_peer
> - Simplified devpts_ptmx_path as gotos are no longer required
This version of the patch is working for me as well in all my test
(including pbuilder) so far, thanks a lot.
Regards
Stefan Lippers-Hollmann
On Thu, Aug 24, 2017 at 1:43 PM, Eric W. Biederman
<[email protected]> wrote:
>
> There are just enough weird one off scripts like xen image builder (I
> think that was the nasty test case that broke in debian) that I can't
> imagine ever being able to responsibly remove the path based lookups in
> /dev/ptmx. I do dream of it sometimes.
Not going to happen.
The fact is, /dev/ptmx is the simply the standard location.
/dev/pts/ptmx simply is *not*.
So pretty much every single user that ever uses pty's will use
/dev/ptmx, it's just how it has always worked.
Trying to change it to anything else is just stupid. There's no
upside, there is only downsides - mainly the "we'll have to support
the standard way anyway, that newfangled way doesn't add anything".
Our "pts" lookup isn't expensive.
So quite frankly, we should discourage people from using the
non-standard place. It really has no real advantages, and it's simply
not worth it.
Linus
Linus Torvalds <[email protected]> writes:
> On Thu, Aug 24, 2017 at 1:43 PM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> There are just enough weird one off scripts like xen image builder (I
>> think that was the nasty test case that broke in debian) that I can't
>> imagine ever being able to responsibly remove the path based lookups in
>> /dev/ptmx. I do dream of it sometimes.
>
> Not going to happen.
Which is what I said.
> The fact is, /dev/ptmx is the simply the standard location.
> /dev/pts/ptmx simply is *not*.
The standard is posix_openpt(). That is a syscall on the bsds.
Opening something called ptmx at this point is a Linuxism.
There are a lot of programs that are going to be calling posix_openpt()
simply because /dev/ptmx can not be counted on to exist.
> So pretty much every single user that ever uses pty's will use
> /dev/ptmx, it's just how it has always worked.
>
> Trying to change it to anything else is just stupid. There's no
> upside, there is only downsides - mainly the "we'll have to support
> the standard way anyway, that newfangled way doesn't add anything".
Except the new fangled way does add quite a bit. Not everyone who
mounts devpts has permission to call mknod. So /dev/ptmx frequently
winds up either being a bind mount or a symlink to /dev/pts/ptmx in
containers.
It is going to take a long time but device nodes like one of those
filesystem features thare are very slowly on their way out.
> Our "pts" lookup isn't expensive.
>
> So quite frankly, we should discourage people from using the
> non-standard place. It really has no real advantages, and it's simply
> not worth it.
The "pts" lookup admitted isn't runtime expensive. I could propbably
measure a cost but anyone who is creating ptys fast enough to care
likely has other issues.
The "pts" lookup does have some real maintenance costs as it takes
someone with a pretty deep understanding of things to figure out what is
going on. I hope things have finally been abstracted well enough, and
the code is used heavily enough we don't have to worry about a
regression there. I still worry.
As for non-standard locations. Anything that isn't /dev/ptmx and
/dev/pts/NNN simply won't work for anything isn't very specialized.
At which point I don't think there is any reason to skip using the ptmx
node on the devpts filesystem as you have already given up compatibility
with everything else.
But I agree it doesn't look worth it to change glibc to deal with an
alternate location for /dev/ptmx. I see a huge point in changing glibc
to use the new TIOCGPTPEER ioctl when available as that is really the
functionality the glibc internals are after.
Eric
On Thu, Aug 24, 2017 at 4:01 PM, Eric W. Biederman
<[email protected]> wrote:
> Linus Torvalds <[email protected]> writes:
>
>> On Thu, Aug 24, 2017 at 1:43 PM, Eric W. Biederman
>> <[email protected]> wrote:
>>>
>>> There are just enough weird one off scripts like xen image builder (I
>>> think that was the nasty test case that broke in debian) that I can't
>>> imagine ever being able to responsibly remove the path based lookups in
>>> /dev/ptmx. I do dream of it sometimes.
>>
>> Not going to happen.
>
> Which is what I said.
Yes, but you then went on to say that we should encourage "/dev/pts/ptmx"
Which is BS.
It's not a standard location, and it doesn't have any advantages.
It was a bad idea and due to a bad implementation. We fixed it. Let it go.
>> The fact is, /dev/ptmx is the simply the standard location.
>> /dev/pts/ptmx simply is *not*.
>
> The standard is posix_openpt(). That is a syscall on the bsds.
> Opening something called ptmx at this point is a Linuxism.
Bzzt. Thank you for playing, but you're completely and utterly wrong.
Look around a bit more.
posix_openpt() may be what you *wish* the standard was, but no,
/dev/ptmx is not a linuxism.
Really. It's the SysV STREAMS standard location, and it is what Sysv
pty users _will_ use directly.
Linux didn't make up that name.
Solaris, HP-UX-11, other sysv code bases all use /dev/ptmx
The whole "posix_openpt()" thing came later, in an attempt to just
unify the BSD and Sysv models.
Just google for "streams pty" if you don't believe me.
So really. The only linuxism here is that stupid /dev/pts/ptmx.
> There are a lot of programs that are going to be calling posix_openpt()
> simply because /dev/ptmx can not be counted on to exist.
.. and there are probably even more programs that simply use
"/dev/ptmx". If you came from a sysv world, or if you just happened to
copy any of the hundreds of examples on the interenet, that's what you
would do.
Christ, just go to Wikipedia. And I quote:
'BSD PTYs have been rendered obsolete by Unix98 ptys whose naming
system does not limit the number of pseudo-terminals and access to
which occurs without danger of race conditions. /dev/ptmx is the
"pseudo-terminal master multiplexer". Opening it returns a file
descriptor of a master node and causes an associated slave node
/dev/pts/N to be created.[5]'
That's from
https://en.wikipedia.org/wiki/Pseudoterminal
so stop blathering garbage. The fact is, /dev/ptmx is the standard
location, and /dev/pts/ptmx is, and always has been, an abomination.
Now, if you want to be portable, "posix_openpt()" is indeed what you
should use, but that doesn't change the basic point.
There's a very real reason why people use "/dev/ptmx", and no, it's
not a linuxism.
Linus
On Thu, Aug 24, 2017 at 06:01:36PM -0500, Eric W. Biederman wrote:
> Linus Torvalds <[email protected]> writes:
>
> > On Thu, Aug 24, 2017 at 1:43 PM, Eric W. Biederman
> > <[email protected]> wrote:
> >>
> >> There are just enough weird one off scripts like xen image builder (I
> >> think that was the nasty test case that broke in debian) that I can't
> >> imagine ever being able to responsibly remove the path based lookups in
> >> /dev/ptmx. I do dream of it sometimes.
> >
> > Not going to happen.
>
> Which is what I said.
>
> > The fact is, /dev/ptmx is the simply the standard location.
> > /dev/pts/ptmx simply is *not*.
>
> The standard is posix_openpt(). That is a syscall on the bsds.
> Opening something called ptmx at this point is a Linuxism.
>
> There are a lot of programs that are going to be calling posix_openpt()
> simply because /dev/ptmx can not be counted on to exist.
>
> > So pretty much every single user that ever uses pty's will use
> > /dev/ptmx, it's just how it has always worked.
> >
> > Trying to change it to anything else is just stupid. There's no
> > upside, there is only downsides - mainly the "we'll have to support
> > the standard way anyway, that newfangled way doesn't add anything".
>
> Except the new fangled way does add quite a bit. Not everyone who
> mounts devpts has permission to call mknod. So /dev/ptmx frequently
> winds up either being a bind mount or a symlink to /dev/pts/ptmx in
> containers.
In fact, /dev/ptmx being a symlink or bind-mount is the *standard* in containers
even for non-user namespaced containers or containers that do not retain
CAP_MKNOD.
>
> It is going to take a long time but device nodes like one of those
> filesystem features thare are very slowly on their way out.
This related to the point above: The fact that we can mount a devpts at its
standard location but are unable to also have/create an additional device node
at the *standard location* is usually quite irritating for people who do not
know about this "legacy" behaviour. But yeah, it's probably going away but
that's going to be a long long time. I agree that userspace is the place to
slowly make the transition though. :)
>
> > Our "pts" lookup isn't expensive.
> >
> > So quite frankly, we should discourage people from using the
> > non-standard place. It really has no real advantages, and it's simply
> > not worth it.
>
> The "pts" lookup admitted isn't runtime expensive. I could propbably
> measure a cost but anyone who is creating ptys fast enough to care
> likely has other issues.
>
> The "pts" lookup does have some real maintenance costs as it takes
> someone with a pretty deep understanding of things to figure out what is
> going on. I hope things have finally been abstracted well enough, and
> the code is used heavily enough we don't have to worry about a
> regression there. I still worry.
>
> As for non-standard locations. Anything that isn't /dev/ptmx and
> /dev/pts/NNN simply won't work for anything isn't very specialized.
I was mainly asking about non-standard locations because I experienced weird
behaviour when trying to open("/mnt/<slave-idx", O_RDWR | O_NOCTTY). Mind you I
did all the steps that grantpt() + unlockpt() usually do purely file descriptor
based. But I think this was due to the faulty TIOCGPTPEER implemenation before
which should now be fixed.
> At which point I don't think there is any reason to skip using the ptmx
> node on the devpts filesystem as you have already given up compatibility
> with everything else.
>
> But I agree it doesn't look worth it to change glibc to deal with an
> alternate location for /dev/ptmx. I see a huge point in changing glibc
> to use the new TIOCGPTPEER ioctl when available as that is really the
> functionality the glibc internals are after.
That's a patch I've been looking into. But TIOCGPTPEER alone won't be enough. A
couple of other function such as grantpt() need to switch from path-based
operation to file descriptor based operations too (Something I tried to point
out in one of my previous mails.). The whole user-space api could do - imho -
with a redo. The kernel is doing the right thing and exposing the right bits
mostly; TIOCGPTPEER being a good step. But user-space wise it's actually a
little security nightmare as soon as namespaces and - sorry for the buzzword -
*containers* come into play. @Eric, are you going to be at Plumbers again this
year? That's maybe a good chance to discuss some of this if there's still
interest.
Christian
>
> Eric
On Thu, Aug 24, 2017 at 4:37 PM, Christian Brauner
<[email protected]> wrote:
>
> In fact, /dev/ptmx being a symlink or bind-mount is the *standard* in containers
> even for non-user namespaced containers or containers that do not retain
> CAP_MKNOD.
Yes.
I think using /dev/pts/ptmx is nice from a kernel standpoint, but I
really think that user space should *never* use it.
The distro or container setup can do whatever it wants to made
/dev/ptmx then point into the pts directory. Either the traditional
device node, the symlink, or the bind mount works fine. But the point
is that glibc definitely should *not* point to /dev/pts/ptmx itself,
because it's simply not the right path. On lots of distributions that
path simply will not work.
And yes, I agree that the user interface to this all is particularly
nasty. With TIOCGPTPEER we have a nice way to get the pts file
descriptor, but the "normal" way to get to it involves opening a path
given by ptsname(), so we en dup in the crazy situation that we can
easily open the file without the path, but then we use the fd to get
the path (that we didn't need) and then people open it with that path,
because the standard sequence to get a pts is
master = getpt() / posix_openpt() / open("/dev/ptmx", O_RDWR | O_NOCTTY);
grantpt(master);
unlockpt(master);
name = ptsname(master);
slave = open(name, O_RDWR);
which is kind of silly. And I'm not talking about the three different
ways to open the master side. I'm talking about all the rest, which is
all just pretty much garbage.
But I guess none of this is really performance-critical.
Linus