2004-09-10 19:22:50

by Timothy Miller

[permalink] [raw]
Subject: Re: silent semantic changes with reiser4



Linus Torvalds wrote:

>
> The same goes for something like a "container file". Whether you see it as
> "dir-as-file" or "file-as-dir" (and I agree with Jan that the two are
> totally equivalent), the point of having the capability in the kernel is
> not that the operations cannot be done in user space - the point is that
> they cannot be done in user space _safely_. The kernel is kind of the
> thing that guarantees that everybody follows the rules.
>

Well, it CAN be done safely if every client has to go through the kernel
which does all the appropriate locks/semaphors and then passes the
request down to the daemon. (Isn't NFS implemented this way?)

This is very micro-kernel-ish, but it's a reasonable idea to do it this
way in cases where things are non-critical.

Say there's a way to cd into a tgz file to look around. If the access
methods through the kernel get routed back to a user-space process
(which probably does some amount of caching in memory and on disk of
uncompressed bits of the archive), it could be a bit slower than if it
were all in-kernel. The thing is that the processing time in the daemon
is probably quite high compared to the overhead of the context switches,
so it doesn't cost much. (And if it CAN be done in userspace without
being horribly convoluted or unsafe, then it SHOULD be done in
userspace.) Besides, even if it were a LOT slower to access a tgz file
without extracting it first, I would STILL think it was wonderful AND
use it a LOT.

Heh... now, the next thing is to be able to do some sort of union-mount
like thing where the tgz is really read-only but if you WRITE to it, it
gets stored somewhere else. This way, you can compile a kernel tree
without extracting it first.

Taking this a bit further, the union-mount could be done away with if
the daemon were to lazily rebuild the tgz file from its cache when the
tgz is modified. That is, writes have little or no penalty because they
just get stored in the daemon's cache, and the daemon rebuilds the
archive in the background.

The VFS layer would have to recognize that a particular file has been
accessed in such a way that it must go through the daemon for ANY access
to it, which means that if someone tries to access the original archive
as a file, requests have to go through the daemon. This makes it
coherent, although with an expected performance penalty.

One issue to consider is that of shutdowns, both intentional and
crashes. This is a non-issue with an underlying journaling file system.
As long as the tgz daemon comes up early enough, it would see its
cache (which is on-disk anyway), recognize which archive it belongs to,
and continue where it left off. This way, the daemon doesn't have to go
through a long process of rebuilding every archive you've ever modified
on the disk on shut-down. If, however, you needed to access the archive
without the daemon, that could be a problem (but what you'd get is
probably the archive before it was modified). Also, some sort of sync
command would be needed to cause the daemon to rebuild all archives
immediately.

I really dig the idea of being able to access archives without
extracting them first, regardless of any performance penalty.


2004-09-10 22:18:54

by Elladan

[permalink] [raw]
Subject: Re: silent semantic changes with reiser4

On Fri, Sep 10, 2004 at 03:22:59PM -0400, Timothy Miller wrote:
>
> Linus Torvalds wrote:
>
> >The same goes for something like a "container file". Whether you see it as
> >"dir-as-file" or "file-as-dir" (and I agree with Jan that the two are
> >totally equivalent), the point of having the capability in the kernel is
> >not that the operations cannot be done in user space - the point is that
> >they cannot be done in user space _safely_. The kernel is kind of the
> >thing that guarantees that everybody follows the rules.
>
> Well, it CAN be done safely if every client has to go through the kernel
> which does all the appropriate locks/semaphors and then passes the
> request down to the daemon. (Isn't NFS implemented this way?)
>
> This is very micro-kernel-ish, but it's a reasonable idea to do it this
> way in cases where things are non-critical.
>
> Say there's a way to cd into a tgz file to look around. If the access
> methods through the kernel get routed back to a user-space process
> (which probably does some amount of caching in memory and on disk of
> uncompressed bits of the archive), it could be a bit slower than if it
> were all in-kernel. The thing is that the processing time in the daemon
> is probably quite high compared to the overhead of the context switches,
> so it doesn't cost much. (And if it CAN be done in userspace without
> being horribly convoluted or unsafe, then it SHOULD be done in
> userspace.) Besides, even if it were a LOT slower to access a tgz file
> without extracting it first, I would STILL think it was wonderful AND
> use it a LOT.

This is only safe for some limited definitions of safety.

Leaving aside the obvious complexities of making sure the user space
daemon doesn't just do something crazy, you have a number of problems:

* Which user does the daemon run as? If it runs as root, it needs to
enforce strict security requirements in terms of VFS operations coming
in, and also, it has all the problems of a SUID application running on
an arbitrary user file. I don't know about you, but I don't trust
joebob's tarfsd to be suid when running on some script kiddie tarball.

If it runs as the user making the request, then what happens when two
users try to open the same file at the same time. Do they see two
different views of the file? Is one of them locked out? How can they
possibly synchronize if the views are writable?

* If the daemon runs as a user, then what happens if you try to run a
suid program inside of the view? What if root tries to walk into a
tar file? The conservative view is that the kernel would need to
always return ENOENT for any process with elevated capabilities, but
even this is not safe.

* Consider what happens if foo.tar/blah_blah is automatically bound to
enter a tarfsd view of foo.tar. What happens if I point the web
server at foo.tar/blah? The web server runs as httpd or something, so
presumably httpd ends up running some sort of tarfsd view on the
file. But if the tarball was made by a malicious person, presumably
it can obtain httpd user access now by exploiting a bug in tarfsd.

* Even if you assume the view is read-only and the kernel coerces all
permissions and ownership and such, there's the possibility of tarfsd
presenting unexpected syscall results - weird error codes, short
reads, file data changing underneath mmap, etc. that user applications
don't expect and may be susceptible to.

* Besides all these security issues, if this scheme is writable it has
all the resource problems that loopback network filesystems suffer.
What if the kernel is short on memory and tries to flush dirty buffers
to reclaim it. If those buffers are running against the user FS
daemon, then that daemon must wake up to clear dirty buffers. If it
tries to allocate memory, deadlock in the kernel.


Probably the security problems can be solved to some degree by being
very paranoid in the kernel (at an associated loss in utility), and the
resource issues can be solved by restricting dirty buffers for loopback
mounts (at an associated loss in performance), but it's hardly simple.

-J

2004-09-17 20:52:07

by Pavel Machek

[permalink] [raw]
Subject: Re: silent semantic changes with reiser4

Hi!

> Say there's a way to cd into a tgz file to look around. If the
> access methods through the kernel get routed back to a user-space
> process (which probably does some amount of caching in memory and on
> disk of uncompressed bits of the archive), it could be a bit slower
> than if it were all in-kernel. The thing is that the processing time

Exactly. See uservfs.sf.net. It is using coda hooks into the kernel.

> be done in userspace.) Besides, even if it were a LOT slower to
> access a tgz file without extracting it first, I would STILL think it
> was wonderful AND use it a LOT.

Go ahead :-).


--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-09-17 20:52:05

by Pavel Machek

[permalink] [raw]
Subject: Re: silent semantic changes with reiser4

Hi!

I'm talking about uservfs.sf.net, aka podfuk, which implements tarfs
(and more).

> Leaving aside the obvious complexities of making sure the user space
> daemon doesn't just do something crazy, you have a number of problems:
>
> * Which user does the daemon run as? If it runs as root, it needs to
> enforce strict security requirements in terms of VFS operations coming
> in, and also, it has all the problems of a SUID application running on
> an arbitrary user file. I don't know about you, but I don't trust
> joebob's tarfsd to be suid when running on some script kiddie tarball.

It needs to run as root. If there's bug in tar, that script kiddie is going
to get your users anyway. Audit the code.

> * If the daemon runs as a user, then what happens if you try to run a

It does not.

> * Consider what happens if foo.tar/blah_blah is automatically bound to
> enter a tarfsd view of foo.tar. What happens if I point the web
> server at foo.tar/blah? The web server runs as httpd or something, so
> presumably httpd ends up running some sort of tarfsd view on the
> file. But if the tarball was made by a malicious person, presumably
> it can obtain httpd user access now by exploiting a bug in tarfsd.

If there's a bug in tar, I can do a lot of interesting things. Audit tar.

> * Even if you assume the view is read-only and the kernel coerces all
> permissions and ownership and such, there's the possibility of tarfsd
> presenting unexpected syscall results - weird error codes, short
> reads, file data changing underneath mmap, etc. that user applications
> don't expect and may be susceptible to.

FUD. Fix tarfsd if its problem.

> * Besides all these security issues, if this scheme is writable it has

There are no security issues. Yes, uservfs is security-sensitive code.

> all the resource problems that loopback network filesystems suffer.
> What if the kernel is short on memory and tries to flush dirty buffers
> to reclaim it. If those buffers are running against the user FS
> daemon, then that daemon must wake up to clear dirty buffers. If it
> tries to allocate memory, deadlock in the kernel.

Coda solves this: all writable things are file-backed on file in /tmp =>
bad performance if you only need piece of very large file,
but no deadlocks.

> Probably the security problems can be solved to some degree by being
> very paranoid in the kernel (at an associated loss in utility), and the
> resource issues can be solved by restricting dirty buffers for loopback
> mounts (at an associated loss in performance), but it's hardly simple.

See sources. It is <1000 lines of code.
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms